BABIP Park Factors and the Batted Ball Connection

Some of you may recall that before being promoted from a FanGraphs Community Research writer to an actual FanGraphs writer, my primary focus was on the relationship between batted ball types (infield fly balls, in particular) and BABIP for pitchers.  At the time, I’d been leaving park factors out of the equation in a [vain] attempt to keep things simple, but now I want to give them a bit of attention.

Now, Guts! is a great resource on FanGraphs, but it does leave out BABIP, HR/FB, and — I believe — something else I’d like to talk to you about for a second.  If you’re a big fan of the batted ball stats here, this bit of information might be completely earth-shattering, leaving you sobbing in a heap on the floor, pondering how your life will never be the same again: IFFB% may not mean what you think it does.  Now, FB%, for example — that’s defined as fly balls divided by batted balls, right?  Many of us might therefore assume that IFFB% equals infield fly balls divided by batted balls… but it doesn’t.  IFFB% is actually infield fly balls divided by fly balls.  This means that IFFB% doesn’t tell you much about a player unless you have the context of his FB% to go with it.  It also means that IFFB% * FB% equals what you probably thought IFFB% was, which is IFFB/(Batted Balls).

Hopefully you’ll be able to read this clearly through your tears: I’m going to introduce a new, not-officially-FanGraphs-sanctioned term here: IFFB% * FB% = PU%.  PU%, or popup percentage — again, it’s what you probably thought IFFB% meant — is the percentage of batted balls that are infield flies.  This leaves OFFB%, or outfield fly balls, as the remainder of FB% (i.e., FB% = PU%+OFFB%).
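To make the arithmetic concrete, here's a minimal Python sketch of the relationship above. The function name and the sample rates are hypothetical, made up purely for illustration; they aren't any real pitcher's numbers.

```python
def popup_rates(fb_pct, iffb_pct):
    """Split FB% into PU% and OFFB%.

    fb_pct   -- fly balls / batted balls (e.g. 0.40)
    iffb_pct -- infield flies / fly balls, i.e. IFFB% as FanGraphs lists it
    """
    pu_pct = iffb_pct * fb_pct    # infield flies / batted balls
    offb_pct = fb_pct - pu_pct    # outfield flies / batted balls
    return pu_pct, offb_pct


# Hypothetical pitcher: 40% fly balls, 10% of which are infield flies.
pu, offb = popup_rates(0.40, 0.10)
print(f"PU% = {pu:.1%}, OFFB% = {offb:.1%}")
```

The same 10% IFFB% works out to a 3% PU% for a pitcher with a 30% FB% and a 5% PU% for one with a 50% FB%, which is why IFFB% needs FB% next to it to mean much.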

So, without further ado, here’s the list of park factors I came up with for the 2009-2012 seasons, with the exceptions noted at the bottom:

Team            BABIP  GB/FB    LD%    GB%    FB%  IFFB%    PU%  OFFB%  HR/FB
Angels           98.4   98.0   99.3   99.2  101.3   95.4   96.5  101.8   93.8
Astros           99.4   99.8   98.7  100.3  100.4   98.4   98.9  100.6  103.2
Athletics        98.0   98.9  101.2   99.0  100.5  102.7  103.2  100.2   91.5
Blue Jays        99.7   99.6  100.0   99.7  100.3   98.6   98.9  100.4  105.9
Braves          100.6  100.7  101.8   99.9   99.2   95.8   95.0   99.6   97.7
Brewers          99.5   96.0   98.7   98.4  102.8   98.2  100.8  103.0  109.3
Cardinals        99.3  102.8  101.4  100.7   98.1  102.1  100.1   98.0   91.8
Cubs            100.9   98.4   99.6   99.4  101.0   99.2  100.2  101.1   99.5
Diamondbacks    102.3  100.4  101.5   99.8   99.4   97.5   96.9   99.7  104.1
Dodgers          98.8  100.4   97.8  100.8  100.2  109.5  109.4   99.3   98.9
Giants           99.8  106.6  100.4  102.7   96.6  101.3   97.9   96.5   90.5
Indians          98.9  104.5  101.0  101.7   97.5  101.9   99.3   97.3   96.3
Mariners         98.2   99.4  101.1   99.4  100.2  103.9  104.1   99.8   90.4
Marlins **
Mets ***         98.3   96.6   97.2   99.2  102.5  107.0  110.0  101.8   92.6
Nationals        99.5   99.4   99.3   99.9  100.5   98.9   99.4  100.6   99.3
Orioles         101.4   99.8   98.6  100.1  100.6   95.2   95.7  101.1  108.9
Padres           96.6  102.4   97.2  101.7   99.4   96.4   95.9   99.8   89.6
Phillies         99.5  100.4  100.8  100.1   99.6   97.7   97.3   99.8  103.4
Pirates          98.2  103.4  100.4  101.3   98.2   96.2   94.5   98.6   90.6
Rangers         102.2   97.7  103.8   98.0  100.3   96.6   96.7  100.6  109.9
Rays             98.3   96.5   99.1   98.4  102.3  112.2  115.1  101.0   93.6
Red Sox         104.3  100.7  101.1  100.1   99.4  103.7  103.1   99.0   97.0
Reds            100.2   99.6  100.5   99.7  100.1  105.3  105.5   99.6  115.5
Rockies         105.5  103.8  104.1  100.5   97.2   90.6   88.3   98.2  115.5
Royals          101.8  103.4   97.7  102.1   98.7   91.1   90.0   99.7   91.3
Tigers           99.8   98.7   97.9   99.9  101.2  104.2  105.5  100.8   96.3
Twins *         101.1  103.4  102.9  100.9   97.4  102.8  100.0  101.0   94.5
White Sox        99.7   95.2   99.7   97.8  102.8   97.3   99.9  103.2  113.6
Yankees          98.9   99.0   98.2   99.9  101.0  102.9  103.9  100.7  112.7

* Twins’ factors based on 2010-2012 data only

** Marlins Park excluded due to 2012 being first year (insufficient sample size)

*** Citi Field’s walls were moved closer in 2012

Park factors are halved (based on the assumption that a player will play half of their games there).

If you hadn’t heard, the Padres and Mariners will be moving the fences in a bit this year, by the way.  My apologies if I neglected to mention any significant park dimension changes that happened between 2009 and 2012.

If you’re like me, you might find this table interesting; if you’re a normal person, skip right ahead:

Correlations Between Park Factors

         BABIP   GB/FB     LD%     GB%     FB%   IFFB%     PU%   OFFB%
GB/FB    0.198
LD%      0.544   0.296
GB%      0.003   0.922  -0.087
FB%     -0.322  -0.966  -0.527  -0.799
IFFB%   -0.389  -0.249  -0.228  -0.165   0.274
PU%     -0.432  -0.501  -0.357  -0.378   0.534   0.959
OFFB%   -0.167  -0.867  -0.366  -0.753   0.863   0.009   0.258
HR/FB    0.450  -0.342   0.205  -0.435   0.257  -0.239  -0.137   0.306

An obligatory refresher for those who haven’t taken statistics in a while (or ever): correlation coefficients (“r”) range between -1 and 1. A correlation of “0” means the two factors being compared have no apparent connection, whereas “1” indicates the two factors move together in a perfectly linear way, and “-1” means they move perfectly linearly in opposite directions.
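For anyone who'd like to see the mechanics, here's a minimal, plain-Python sketch of the usual Pearson correlation coefficient. The `pearson_r` name and the toy data are hypothetical, chosen only to show the three textbook cases; this isn't necessarily the tool used to build the table above.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and of squared deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Note: divides by zero if either sequence is constant.
    return cov / math.sqrt(var_x * var_y)


# Toy data (made up): one series rising with x, one falling, one unrelated.
x = [1, 2, 3, 4, 5]
print(f"{pearson_r(x, [2, 4, 6, 8, 10]):.3f}")   # moves perfectly with x
print(f"{pearson_r(x, [10, 8, 6, 4, 2]):.3f}")   # moves perfectly against x
print(f"{pearson_r(x, [2, 1, 3, 1, 2]):.3f}")    # no linear connection
```

The three calls come out at roughly 1, -1, and 0, matching the three cases described above.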

I bolded the connections I thought were the most interesting.  Now, to discuss them more in-depth:

High LD% factor = high BABIP factor

This should come as no surprise to those of you who read my first Community article, in which I pointed out LD% and [what I’m now calling] PU% as the two main factors for explaining pitcher BABIPs.  LD% for a pitcher is hard to predict from year-to-year, and park factors aren’t entirely consistent on a yearly basis either, but many of the line drive park factors do make a lot of sense, and you can reasonably expect the factors behind them to exert their influence yearly.

Specifically, let’s look at the top two parks in terms of LD% factor: Colorado and Texas.  What do they have in common?  Well, the most obvious thing is thin air: Colorado due to its altitude, and Texas presumably due to heat and perhaps dryness.  Thin air, of course, offers less resistance to a batted ball, but it also should theoretically allow for less break on pitches.  Most of the stadia (that’s fancy talk for “stadiums”) at the low end of the list also make sense, having thick marine air.  KC is an exception… but then again, its BABIP factor isn’t in line with those of its surrounding teams on the LD% list.  This could have something to do with scorer’s bias issues, such as the one discussed here.  Another possible contributor to LD% differences is the batter’s eye in each stadium.

High PU% = low BABIP

This shouldn’t be a shocker to those of you who’ve read my previous work.  Popups are pretty close to automatic outs.  Let’s talk about how stadium characteristics might influence PU%.  The first thing that comes to mind is that a greater amount of foul territory should lead to a higher PU%; that’s because a foul IFFB is only recorded if caught.

Another possible factor, judging by the Rays’ home field being firmly at the top of the list, is the dome factor.  You might think the whitish background of the dome against a popup might not be so conducive to catching it, but perhaps the lack of sun and wind helps to make up for that.  And it’s not like fielders in non-domed parks never have to deal with whitish backgrounds — clouds and haze are a thing, after all.

High HR/FB equals high BABIP, high OFFB%, and low PU%?

It’s worth reminding you at this point that home runs are excluded from consideration in BABIP, but not in batted ball stats.  That’s one reason why fly ball pitchers tend to have lower BABIPs — they may allow more HR, but those don’t count as a knock against their BABIPs.  The other reason is that fly balls, especially popups, make for easier putouts.
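As a rough illustration of why home runs drop out, here's a minimal Python sketch using the standard BABIP definition, (H - HR) / (AB - K - HR + SF). The stat line is hypothetical, invented for the example.

```python
def babip(hits, home_runs, at_bats, strikeouts, sac_flies):
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    return (hits - home_runs) / balls_in_play


# Hypothetical pitcher line (made-up numbers, for illustration only).
before = babip(hits=150, home_runs=20, at_bats=600, strikeouts=100, sac_flies=5)

# Same line, except one caught fly ball clears the fence instead:
# one more hit and one more home run, same number of at-bats.
after = babip(hits=151, home_runs=21, at_bats=600, strikeouts=100, sac_flies=5)

print(f"before: {before:.4f}, after: {after:.4f}")
```

The extra home run adds nothing to the numerator but removes an out from the denominator, so BABIP ticks up slightly rather than down.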

So, if HR aren’t part of BABIP, why would HR/FB have an apparent strong-ish connection to BABIP?  The most obvious explanation is that a high HR/FB is a sign of harder contact being made, for whatever reason, which you might expect to lead to a higher BABIP.  Of course, you would also expect a higher HR/FB in small stadia, where perhaps more balls are bouncing uncatchably off of outfield walls and dropping for hits.

Now, PU% and OFFB% generally move together, both moving against BABIP, whereas HR/FB moves together with BABIP.  That’s why I found it interesting that HR/FB divides PU% and OFFB%.  I think that’s easy to explain in the context of an individual pitcher, but maybe not so much in the context of park factors.  I’d like to hear your theories on it.

Oh, but before we make too much of this, I should tell you that the HR/FB factor appears to be the most prone to fluctuation of the bunch.

Putting it all together, kind of

As I am wont to do, I’ve regressed some of the various park factors to see how they might be able to explain each park’s BABIP factor:

BABIP = 0.48*LD% + 0.37*GB% – 0.05*PU% + 0.11*OFFB% + 0.09*HR/FB
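Here's a small Python sketch that plugs two rows of the park factor table into the regression above. The coefficients and factor values come straight from this article; the dictionary layout and the function name are just an illustrative choice.

```python
# Coefficients from the regression above, keyed by park-factor column.
COEFFS = {"LD%": 0.48, "GB%": 0.37, "PU%": -0.05, "OFFB%": 0.11, "HR/FB": 0.09}


def predicted_babip_factor(factors):
    """Weighted sum of a park's batted-ball factors (100 = neutral)."""
    return sum(COEFFS[name] * factors[name] for name in COEFFS)


# Rows copied from the park factor table.
rockies = {"LD%": 104.1, "GB%": 100.5, "PU%": 88.3, "OFFB%": 98.2, "HR/FB": 115.5}
red_sox = {"LD%": 101.1, "GB%": 100.1, "PU%": 103.1, "OFFB%": 99.0, "HR/FB": 97.0}

for team, row, listed in [("Rockies", rockies, 105.5), ("Red Sox", red_sox, 104.3)]:
    print(f"{team}: predicted {predicted_babip_factor(row):.1f}, listed {listed}")
```

The coefficients sum to 1.00, so a park that's neutral across the board comes out at 100. The formula gets Coors to about 103.9 against a listed 105.5, but puts Fenway at about 100.0 against a listed 104.3.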

The formula itself, since it only applies to park factors, is as useful as a poopie-flavored lollipop (Patches O’Houlihan), but it does have a 0.696 correlation to a stadium’s BABIP factor, meaning it can explain nearly half of the differences in BABIP factors (with a 0.484 R-squared).  The park it has the hardest time explaining, by far, is Fenway, no doubt largely thanks to The Green Monster’s extreme BABIP-boosting ways.  Take Boston out of the mix, and the correlation shoots to 0.772 (0.596 R-squared).  Remove the second-biggest outlier, Kauffman Stadium in KC (with its suspiciously low LD% factor), and the correlation goes to 0.816, explaining about 2/3 of the differences.  The exclusion of these outliers lends itself to the creation of a formula not tainted by them, which you probably don’t care about, yet here it is anyway:

BABIP = 0.52*LD% + 0.28*GB% – 0.03*PU% + 0.12*OFFB% + 0.11*HR/FB

That one achieves a 0.829 correlation to the remaining BABIP park factors (0.687 R-squared).

That can be whittled down to:

BABIP = 0.552*LD% + 0.320*GB% + 0.124*HR/FB

…which has a 0.821 correlation to BABIP factor, but if you remove any of those three factors, the correlation takes a major hit (though PU% and OFFB% together can mostly compensate for the loss of GB%).
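The R-squared figures quoted in this section are just the squares of the correlations between each formula's output and the actual BABIP factors. A quick Python check, using only numbers quoted in the text:

```python
# (correlation, R-squared) pairs as quoted in the text.
quoted = [(0.696, 0.484), (0.772, 0.596), (0.829, 0.687)]

for r, r_squared in quoted:
    print(f"r = {r:.3f}  r^2 = {r * r:.3f}  quoted R-squared = {r_squared:.3f}")

# The two correlations quoted without an R-squared.
for r in (0.816, 0.821):
    print(f"r = {r:.3f}  r^2 = {r * r:.3f}")
```

The 0.816 correlation squares to about 0.666, which is where the "2/3 of the differences" figure comes from.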

I haven’t talked about what might contribute to a park’s GB% factor… well, groundskeeping might have a bit to do with it, but my guess is that it’s mainly due to less foul territory, and therefore fewer easy foul ball outs.

Methodology

For those who are curious, the formula I used to calculate each factor was:

(Home Pitching + Home Batting) / (Away Pitching + Away Batting) * 100

… which is a pretty standard park factor formula.  I then halved it like so: 0.5 + Factor/2 (with the factor expressed as a ratio; on the 100 scale used in the table, that’s 50 + Factor/2) … this is based on the assumption that the player plays half their games away at a neutral-factor stadium.  When you consider that some teams play in divisions full of non-neutral opponent stadiums (e.g., Texas faces a bunch of pitcher’s parks), that’s probably not such a safe assumption to make, buuut it’s how park factors are done, and it’s a topic for a different conversation.
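A minimal Python sketch of the two steps, for the curious. It assumes the four inputs are a team's rate for the stat in question (BABIP here) in each home/away split, which is one plausible reading of the formula; the function names and the sample BABIPs are hypothetical.

```python
def raw_park_factor(home_pitching, home_batting, away_pitching, away_batting):
    """(Home Pitching + Home Batting) / (Away Pitching + Away Batting) * 100."""
    return (home_pitching + home_batting) / (away_pitching + away_batting) * 100


def halve(factor):
    """Pull a 100-scale factor halfway back toward neutral (100).

    On the 100 scale, "0.5 + Factor/2" becomes 50 + factor / 2.
    """
    return 50 + factor / 2


# Hypothetical team BABIPs in each split (made-up numbers).
raw = raw_park_factor(home_pitching=0.300, home_batting=0.310,
                      away_pitching=0.290, away_batting=0.295)
print(f"raw: {raw:.1f}, halved: {halve(raw):.1f}")
```

With these made-up splits the raw factor is about 104.3 and the halved factor about 102.1; a perfectly neutral park stays at 100 either way.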

Steve is a robot created for the purpose of writing about baseball statistics. One day, he may become self-aware, and...attempt to make money or something?

RMD
Guest
3 years 3 months ago

Does anyone know why Wrigley’s HR/FB is so much lower than Comiskey’s? Are the fences that different? Do they get more wind on the South Side?

ralph
Guest
3 years 3 months ago

Are pitcher PAs excluded from this analysis? I don’t know how big of an impact they would make, but I’d feel pretty confident that HR/FB will be lower for pitchers hitting than for non-pitchers hitting.

thrasius
Guest
3 years 3 months ago

Actually for the Rays, I’d say that the size of the foul territory has more influence. IIRC, the Trop has the largest infield foul territory in the majors.

J. Cross
Guest
3 years 3 months ago

Steve, this is awesome… and I really wish you’d done this a few weeks ago.

So, let’s say (just for the sake of argument) that I imagine that HR’s and HR/FB will go up 8% for Mariners pitchers and hitters (16% in home games) and I’m trying to figure out how to adjust the other park factors. I assumed that extra base hits would be down but by less than HR’s are up (say 2%) but I also assumed that BABIP would be down a little. If it’s harder contact, as you say, leading to both high BABIP and high HR/FB, then that wouldn’t apply to moving fences, of course. Is it possible that teams with larger OFs (low HR/FB) tend to have better defensive outfielders (low BABIP)? Is there something I’m missing here? Also, do you have some rough confidence ranges on those correlations?

Jack Donaghy
Guest
3 years 3 months ago

Steve, this need you have to be the smartest guy in the room is… off-putting

Baltar
Guest
3 years 3 months ago

That’s not hard to do in this room, and he is.

O'Jones
Guest
3 years 3 months ago

interesting article! you mention some of the parks being higher and some lower, but so many of these are so close to 100. Which of these values are actual statistical outliers? or are they all (except maybe Rox, Sox, and Pads) essentially 100%?

Chad Young
Member
3 years 3 months ago

Might be a small detail, but Re: HR/FB and BABIP – the other factor which might be pretty meaningful, is that every FB that turns into a HR does NOT turn into an out. BABIP on fly balls is low (I want to say it is like .150 or lower) which means that if a FB is hit, and it stays in play, it will lower BABIP, on average, but by becoming a HR, it does not impact BABIP. So each additional FB that becomes a HR takes 1 out of the BABIP denominator, but 0 (or .150 or something) out of the numerator, which increases BABIP.

Chad Young
Member
3 years 3 months ago

Also (and this should have come first) – excellent article; really fascinating research!

Bip
Guest
3 years 3 months ago

A comment on the content:

In stadiums with thin air, one would expect balls to carry farther, leading to more home runs, but one would also expect balls to slow down less as they carry. If a ball faces less air resistance, it will maintain more of its speed, meaning that it gives the fielder less time to get to the ball before it drops. This factor could increase HR/FB and BABIP at the same time. This also doesn’t depend on variation in the contact that hitters make.

Also, holding OFFB%, PU%, LD% and GB% constant, BABIP should still positively correlate with HR/FB since an OFFB that goes for a homer reduces the denominator of BABIP, whereas one that isn’t a homer is likely to be caught, increasing the denominator and not the numerator.

As for BABIP, PU%, OFFB% and HR/FB, I don’t see what requires explanation. Since no combination of those factors 100% explains the others, there is room for confounding factors to cause the relationships to be unclear.

An observation (I’m using ~ to mean correlates and !~ to mean negative correlation for convenience):
BABIP ~ HR/FB for reasons above.
HR/FB ~ OFFB% since more OFFBs probably means balls carry farther in general
therefore
BABIP !~ OFFB% seems to violate some transitivity rule. However, consider more OFFB% means slightly more homers and way more outs. The extra outs from extra catchable fly balls lowers BABIP, and the extra homers slightly increase BABIP, if you consider that they would be outs, given OFFB%. However, even though the outs to homers ratio is still very high in terms of BABIP (batting average on OFFB is probably around .120), it is very low on the scale of HR/FB. HR/FB is typically in the range of 11%. If we increase OFFB%, then we are adding batted balls with a HR/FB rate probably closer to 20%. This will greatly increase HR/FB ratio, but since OFFB% are still outs over 80% of the time, it will lower BABIP.

So, the issue is increasing OFFBs raises HR/FB and lowers BABIP. However, when you compare HR/FB and BABIP, you hold OFFB% constant, so we notice that increasing homers without increasing fly balls actually increases BABIP.

A comment on the writing:

It was too self-referential. We don’t need to be reminded all the time that you are weird, and that statistics is boring for normal people, or that we’re busy. If we’re reading your article, you can assume we’re interested and probably know something about statistics. You’re not giving a speech on a street corner, this is FanGraphs.

Bip
Guest
3 years 3 months ago

My math is a little weird. Batting average of OFFB is probably closer to .200 than I said.

Jeremiah
Guest
3 years 3 months ago

How much of the correlation between HR/FB and PU% is driven by Coors field? It’s the most extreme for both. Perhaps removing Coors would change that relationship. On the other hand, places like Texas and Arizona that have thinner air and less pitch movement might also produce more solid contact, leading to fewer popups and more home runs.

Also, I personally think that park factors should be calculated with an implicit method. It seems to me that there are too many factors which affect scoring to separate them out. In other words, park factors go along with the strength of an offense and the opposing pitching staff to affect the final score. All of these need to be solved simultaneously, relative to each other. Not that the current method is bad, but I think an implicit approach would be better.

AC_Butcha_AC
Member
3 years 3 months ago

first of all: very nice article.. very informative..

one minor mistake, though:

“thick” marine air, as you called it is actually lighter air. Dry air is actually thicker, heavier air.

confusing?
intuitively this seems to be false… but since we are humans and we breathe oxygen, we feel like non-humid air is light, whereas the air with high amounts of humidity seems thick and heavy. this air is obviously filled with water vapor i.e. water molecules which makes it tougher to breathe for us humans. but humid air is lighter because water molecules are lighter than the usual “stuff” in our air.

therefore, a ball flying through humid, sticky, “thick” air experiences less drag than in the heavier, dry air, leading to a greater distance in humid air.

Da MathEmagician
Guest
3 years 3 months ago

I appreciate the research, but I have to question the inferences made. You are essentially inferring that the ballpark is the cause of PU%, etc. But this could very well be a case of spurious causation. The data are only over three years; in my opinion there’s no reason to believe the cause of these data couldn’t lie somewhere else, perhaps in the pitching staff and/or the defense behind the pitchers. Without controlling for these possible “lurking” variables, enthusiasm about these assumed ballpark factors should be tempered.

Tom P.
Guest
3 years 2 months ago

I may have missed it, but I haven’t seen tall outfield walls (such as Fenway’s Green Monster) discussed as a possible contributor to a park’s higher BABIP.

HRs are excluded from BABIP because they’re not playable. But neither are balls hit high off the wall at Fenway (and to a lesser extent, at other parks). A large percentage of wall ball doubles and singles are generally NOT playable balls. Since it’s not practical to exclude these from BABIP because of the difficulty in gathering accurate data on these hits from conventional sources, it makes sense that they’ll show up as increased BABIP in ballparks where these kinds of hits are frequent.