Batted balls and park effects | The Hardball Times

# Batted balls and park effects

A few years ago, Dave Studeman and I exchanged a few dozen e-mails and a few hundred thoughts discussing park factors for batted balls. The result was an entertaining article by Dave in The Hardball Times Annual 2006 that pointed out some of his more interesting findings and had a larger data summary at the end.

This article will be in much the same vein, except that we now have more data, I’ll correct for the randomness in the data to present more meaningful numbers, and of course, you don’t have to buy a book to read this piece.

Most readers should be familiar with run park factors, which tell us how many more or fewer runs are scored at a given park when compared with a neutral context. For example, ESPN shows a run factor of 1.16 for the Colorado Rockies last season, which means that a game played at Coors would have 16 percent more run scoring than a game played by the same two teams at an average park.
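As a minimal sketch of where a number like 1.16 comes from, assume the simple home-versus-road construction: total runs per game in a team's home games divided by total runs per game in its road games. The function name and the season totals below are hypothetical, chosen only to illustrate the arithmetic; published factors such as ESPN's may apply further adjustments.

```python
def run_park_factor(home_scored, home_allowed, home_games,
                    road_scored, road_allowed, road_games):
    """Runs per game (both teams) at home, relative to the same team's road games."""
    home_rate = (home_scored + home_allowed) / home_games
    road_rate = (road_scored + road_allowed) / road_games
    return home_rate / road_rate


# Hypothetical season totals, not real data.
factor = run_park_factor(450, 420, 81, 380, 370, 81)
print(round(factor, 2))  # 1.16
```

A factor above 1.00 means more scoring at home than on the road; comparing the same team's home and road games keeps the quality of the team itself out of the ratio.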

But although run park factors are by far the most ubiquitous type, technically you can construct a park factor for any event. ESPN, for example, shows a 1.22 park factor for home runs at Coors, a 1.12 factor for hits, and so on. One of the things no one has really explored to my knowledge, except for Dave, is park factors for batted balls.

At THT, we buy data that breaks down every player’s numbers into infield and outfield fly balls, line drives, groundballs, and bunts, and further breaks those batted balls into all their possible outcomes, such as singles, doubles, triples, home runs, etc. We can construct park factors for all those events, and others, and if we do—and shortly, you won’t have to take my word for this—some very interesting conclusions come out of that work.
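The same idea extends to any event, as long as we divide by the right opportunity. The sketch below assumes each outcome is measured per outfield fly and compares the home rate to the road rate. The dictionary layout, the function name, and all counts are hypothetical, not taken from the purchased data.

```python
def event_park_factors(home, road, denominator="outfield_flies"):
    """Park factor for each outcome: (home rate per fly) / (road rate per fly)."""
    factors = {}
    for outcome in home:
        if outcome == denominator:
            continue
        home_rate = home[outcome] / home[denominator]
        road_rate = road[outcome] / road[denominator]
        factors[outcome] = home_rate / road_rate
    return factors


# Hypothetical multi-year counts for one team's home and road games.
home_counts = {"outfield_flies": 2000, "singles": 100, "doubles": 150, "home_runs": 220}
road_counts = {"outfield_flies": 2000, "singles": 100, "doubles": 120, "home_runs": 200}

for outcome, pf in event_park_factors(home_counts, road_counts).items():
    print(outcome, round(pf, 2))
```

These are raw factors; they still carry all the sample noise discussed next.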

But before we get to the numbers, we do have to take care of one not-so-little issue. The problem with park factors, or really any statistic, is that they are based on sample data, which can be strongly affected by randomness. That can be a serious issue when we try to make conclusions from a number that seems significant, but really isn’t. The best example of that is batting average on balls in play (BABIP) for pitchers, which turns out to be heavily influenced by luck—in turn, that makes ERA a much less meaningful statistic than one might expect.

Luckily, there is something we can do to measure and take into account the effect of randomness on any statistic, park factors included. That concept is known as regression to the mean. Essentially, we can add a certain amount of average park effects to estimate a park’s “true” impact on a given statistic. The more luck involved, the more heavily we move the measured park factors toward the average (which is by definition 1.00).

So how do we figure out how much to regress? One thing we can do is find the correlation between park factors in a given category in one year and the next. The stronger the correlation, the less luck is involved and the less we regress.
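Here is a rough sketch of that procedure. It assumes the simplest version of the idea: use the year-to-year correlation r directly as the weight on the measured factor, so that the regressed factor is 1 + r × (measured − 1). The exact weighting behind the numbers in this article is not spelled out, so treat this formula, the function names, and the sample factors as illustrative assumptions.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def regress_to_mean(measured, r, mean=1.0):
    """Shrink a measured park factor toward the mean; r near 0 shrinks heavily."""
    return mean + r * (measured - mean)


# Hypothetical factors for the same five parks in consecutive years.
year1 = [1.10, 0.95, 1.02, 0.90, 1.05]
year2 = [1.04, 0.99, 1.00, 0.96, 1.03]
r = pearson(year1, year2)
regressed = [round(regress_to_mean(f, r), 3) for f in year1]
print(round(r, 2), regressed)
```

With r = 0.32, for instance, a measured factor of 1.20 would be pulled back to about 1.06 under this assumption.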

Some categories show little or no correlation and get regressed a lot. But in some categories, the results were so interesting, even after being regressed to the mean, that I just had to share them with you.

For example, did you know that parks can have a very large effect on strikeout rates? Here are the five best and worst ballparks for strikeouts:

```
Team            K
Marlins         1.09
Mariners        1.09
Twins           1.07
Dodgers         1.06
---
Diamondbacks    0.93
Athletics       0.93
Orioles         0.93
Royals          0.91
Rockies         0.89
```

A pitcher who would record 100 strikeouts in a neutral park would punch out 20 more hitters pitching in Florida than he would in Colorado! That effect is huge, and it suggests that when we look at strikeout rates, we definitely need to keep a player’s home park in mind.
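The arithmetic behind that comparison is just the neutral total scaled by each park's factor. This small sketch assumes, as the sentence above does, that the pitcher throws every inning in the park in question; the function name is hypothetical.

```python
def park_adjusted(neutral_total, park_factor):
    """Expected total if every inning were pitched in a park with this factor."""
    return neutral_total * park_factor


k_florida = park_adjusted(100, 1.09)
k_colorado = park_adjusted(100, 0.89)
print(round(k_florida - k_colorado))  # 20
```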

For example, last year’s strikeout leader, Jake Peavy, pitched in one of the most strikeout-friendly parks in the major leagues. In a neutral environment, he would have struck out around 230 batters, which would have dropped him to fourth in MLB.

It has long been known that parks affect strikeouts, but it’s nice to finally know just how great the effect is and where it’s most important. Given the presence of the Marlins and Mariners at the top of the list, and the Rockies and Diamondbacks near the bottom, it seems that humidity is one of the biggest determinants of strikeout park effects.

Now what about walks? I’m fairly certain that sabermetric conventional wisdom is that the effect of parks on walk totals is minimal, if not non-existent. I found, however, that this was not really the case.

```
Team            BB
Brewers         1.07
Mariners        1.06
Indians         1.05
Marlins         1.05
Diamondbacks    1.04
---
Yankees         0.97
Astros          0.97
Angels          0.97
Twins           0.96
Pirates         0.94
```

The spread here isn’t quite as great as it is for strikeouts, but it is certainly meaningful. What is unclear is why this effect exists. The parks at the top and bottom seem fairly random, but there is a definite correlation (.32) in walk park factors from year to year.

However, even though the spread in walk park factors is not much different from the strikeout park factor spread, the run effect is much smaller. Whereas a pitcher going from the best strikeout park (Florida) to the worst (Colorado) would see his ERA rise by around 0.15, a pitcher going from the best park for limiting walks (Pittsburgh) to the worst (Milwaukee) would see his ERA rise by only 0.07.

Thus, it’s true that there really is little reason to pay attention to park effects for walks.

One of the more interesting effects I found is that parks have a strong effect on the proportion of batted balls that are infield flies.

```
Team            FlysIF
Brewers         1.15
Mariners        1.12
Marlins         1.12
Reds            1.08
Devil Rays      1.07
---
Phillies        0.93
Royals          0.93
Indians         0.92
Diamondbacks    0.92
Giants          0.90
```

A player is 28 percent more likely to hit an infield fly in Milwaukee than he is in San Francisco. Why is that? My guess is that it has to do with foul territory. Since infield flies are only recorded when the ball is put into play, parks with a lot of foul territory are more likely to see foul pop-ups stay in and get caught, whereas ballparks with little foul territory will see a lot of pop-ups land in the stands and go unrecorded.
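The 28 percent figure comes from dividing one park factor by the other rather than subtracting them. A quick sketch, with a hypothetical function name, using the two factors from the table above:

```python
def relative_likelihood(factor_a, factor_b):
    """How much more likely an event is in park A than in park B, as a fraction."""
    return factor_a / factor_b - 1


milwaukee_vs_sf = relative_likelihood(1.15, 0.90)
print(round(milwaukee_vs_sf * 100))  # 28
```

The same ratio works for comparing any two parks in any of the categories in this article.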

The problem with that theory is that while the parks at the bottom of the list do tend to be a little smaller in terms of foul territory, those at the top seem to be pretty average on the whole. Perhaps my data source is off, but there may be some other variable I’m not thinking of.

Let’s now turn our attention to the other kind of fly ball, the outfield fly, or more specifically, to the potential outcomes of an outfield fly. It turns out that there are large effects everywhere we look. Let’s start with singles.

```
Team            1BOF
Angels          1.22
Red Sox         1.21
Astros          1.15
Mets            1.10
Cubs            1.10
---
Diamondbacks    0.85
Mariners        0.84
Blue Jays       0.81
Brewers         0.78
Indians         0.77
```

I’m willing to bet that the Red Sox and Astros are up there because of their short and tall left fields and huge right and center fields. Hard hit balls can turn into singles off the wall, while fly balls that would normally be outs drop in front of outfielders playing deep.

Overall, the effects are pretty huge. Anaheim sees 58 percent more singles on outfield flies than Cleveland. However, since singles on outfield flies are fairly rare, the run effect is on the order of that of the strikeout park factors, around .15 runs at the extremes.

The doubles park factor has a similar spread, except for one huge outlier.

```
Team            2BOF
Red Sox         1.53
Indians         1.19
Rockies         1.10
Cubs            1.07
Diamondbacks    1.06
---
Braves          0.87
Tigers          0.86
Orioles         0.86
Devil Rays      0.84
Phillies        0.83
```

Almost all the teams are around plus/minus 20 percent, except for the Red Sox, where the Green Monster results in 53 percent more doubles per outfield fly than in the average park. So if Aaron Rowand had played for the Sox last season instead of the Phillies, he would have gone from 45 doubles to around 62. No wonder five of the past 20 American League doubles champions have played for the Red Sox.
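The Rowand estimate can be reproduced with a short sketch under three simplifying assumptions: half of a player's games are at home, his road parks are average on the whole (factor 1.00), and all of his doubles come on outfield flies. The function names are hypothetical, and this is one plausible reading of the calculation rather than a statement of exactly how the number was derived.

```python
def season_multiplier(park_factor, home_share=0.5):
    """Blend the home park factor with neutral (1.00) road games."""
    return home_share * park_factor + (1 - home_share) * 1.0


def translate(total, from_factor, to_factor, home_share=0.5):
    """Move a season total from one home park to another via a neutral baseline."""
    neutral = total / season_multiplier(from_factor, home_share)
    return neutral * season_multiplier(to_factor, home_share)


# 45 doubles with the Phillies (0.83) translated to the Red Sox (1.53).
rowand_in_boston = translate(45, from_factor=0.83, to_factor=1.53)
print(round(rowand_in_boston))  # 62
```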

The run impact here is greater, with a swing of roughly .35 earned runs per game from Fenway to Citizens Bank Park.

But if we’re talking about outfield flies, the most important possible event is, of course, a home run. So let’s take a look at the park factors for home runs per outfield fly.

```
Team            HROF
White Sox       1.26
Rockies         1.22
Blue Jays       1.19
Phillies        1.16
Cubs            1.16
---
Cardinals       0.87
Angels          0.87
Mets            0.87
Giants          0.86
```

How’s that for a surprise? The Cell makes more outfield flies into home runs than Coors Field. At first, I thought this might be due to the installation of the humidor, but the park factor for home runs per outfield fly at Coors has been fairly stable since 2003, so that doesn’t look to be the case. Instead, it appears that the Cell itself is just an extreme home run park, though I have to wonder how much of that is Chicago itself, given Wrigley Field’s placement as one of the top home run parks in the league.

PETCO Park in San Diego is well known for suppressing run scoring, and a large part of that is its effect on home runs. No wonder the Padres continually finish last in the National League in home run hitting.

The run effect here is by far the biggest of all the categories we’ve examined, almost .6 runs a game at the extremes. That is why you absolutely can’t compare pitchers from different teams without correcting for park effects first—it just isn’t a fair comparison.

Let’s look at one more cool finding. The spread here isn’t as great as for other categories, but because this event happens so often, its effect is not necessarily smaller. As well, while we can offer up explanations for the other effects we’ve discussed here today, this one is really confusing.

```
Team            Grndrs
Indians         1.04
Giants          1.03
Phillies        1.02
Red Sox         1.02
Blue Jays       1.02
---
Marlins         0.98
Yankees         0.98
Brewers         0.98
Mariners        0.98
Reds            0.97
```

I’m talking about groundballs, and it turns out that they’re more likely to occur at some parks than at others. In his article in the THT Annual 2006, Dave wrote:

> In each of the four years we examined, for both batters and pitchers, Jacobs Field was a groundball park. This was the single most surprising finding to me. The groundball factor is Jacobs’s most important ballpark influence.

Now we have confirmation that this effect was not just a fluke. Unfortunately, I have no more explanation than Dave for why this may be the case. If you have any ideas, feel free to send them—I’d love to see what you’ve got.

Otherwise, feel free to download a spreadsheet with all batted ball park factors for 2003-07, properly regressed, of course. I couldn’t get to all the fun stuff in there, so feel free to peruse it for your own pleasure.