## The Correlation Between BABIP Rate and Three True Outcomes

First things first, I would like to credit my friend Elling Hofland for coming up with the main idea of this piece. He’s the one who provided me with his thoughts and theories that allowed me to expand on this topic in the first place. Give him a follow on Twitter for sports and stats-related banter; his handle is @ellinghofland.

BABIP, or batting average on balls in play, is an incredibly useful stat. It does a fantastic job of capturing both luck and quality of contact, giving a better grasp of how a player actually performs during batted-ball events. These batted-ball events only take up a certain percentage of a player’s plate appearances. BABIP rate measures how many batted-ball events a player has relative to his plate appearances. To calculate BABIP rate, you take at bats minus strikeouts and home runs, plus sacrifice flies, and divide that by plate appearances. For example, if a player has 600 PA during a single season along with 300 batted-ball events, they have a BABIP rate of .500.
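As a minimal sketch of that calculation, the formula can be written as a small Python function. The function and argument names are my own labels for illustration, and the stat line in the usage example is hypothetical, chosen only to reproduce the 300-in-600 case described above.

```python
def babip_rate(ab, so, hr, sf, pa):
    """Share of plate appearances that end with a ball in play.

    Follows the formula in the text: (AB - K - HR + SF) / PA.
    """
    if pa <= 0:
        raise ValueError("plate appearances must be positive")
    balls_in_play = ab - so - hr + sf
    return balls_in_play / pa


# Hypothetical stat line (not a real player): 300 balls in play in 600 PA.
rate = babip_rate(ab=520, so=190, hr=35, sf=5, pa=600)
print(f"{rate:.3f}")  # prints 0.500
```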

Now, if you look at the outcomes left out of that equation, you’re left with walks, strikeouts, and home runs, otherwise known as the “three true outcomes.” These are called true outcomes because none of them (for the most part) involve the defense on the field. A shortstop can’t screw up a strikeout, a walk, or a home run. You can take these three true outcomes and turn them into a rate as well. If you add up a player’s strikeouts, walks, and home runs and then divide the total by plate appearances, you get TTO rate.
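The TTO rate calculation can be sketched the same way. Again, the function name and the stat line below are assumptions made for illustration, not real player data.

```python
def tto_rate(so, bb, hr, pa):
    """Share of plate appearances ending in a strikeout, walk, or home run."""
    if pa <= 0:
        raise ValueError("plate appearances must be positive")
    return (so + bb + hr) / pa


# Hypothetical stat line: 190 K, 70 BB, and 35 HR in 600 PA.
print(f"{tto_rate(so=190, bb=70, hr=35, pa=600):.3f}")  # prints 0.492
```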

Let’s look at Mike Trout. In 2017, Trout’s BABIP currently sits at .369. However, he has a BABIP rate of .550 along with a TTO rate of .435, meaning that 55% of his plate appearances end with a ball in play, while 43.5% of his plate appearances result in a strikeout, walk, or home run. Both BABIP rate and TTO rate are useful stats, as they essentially show how often a player makes contact and what happens when he doesn’t. While BABIP itself is useful, it can be hard to tell how much luck is involved in a batted-ball event when the ball isn’t hit over a fence for a homer. BABIP rate attempts to bridge the gap between BABIP and the three true outcomes.

Miguel Sano is a well-known slugger. In his three seasons in the majors, he’s smashed the ball when he’s hit it, boasting average exit velocities of 94.0 mph in 2015, 92.3 mph in 2016, and 93.1 mph in 2017. Despite these consistent EVs, his BABIP has fluctuated from 2015 to 2017, with marks of .396, .329, and .385, respectively. His BABIP rates from 2015-2017 look like this: .429, .478, and .473. Despite the difference in his BABIP from 2016 to 2017, his BABIP rate has stayed nearly the same, meaning that he’s still putting the ball in play about as often despite fewer balls falling for hits in 2016. Looking solely at BABIP, it could be argued that 2016 was his “regression” to where he should be after sporting an incredibly high BABIP in 2015. In 2017, one could say his high BABIP is a cause for concern, as he may just be getting lucky. However, his BABIP rate suggests that isn’t the case.

Let’s look at another player, Brandon Phillips. Phillips’ BABIP has been incredibly consistent during his past three years, sitting at .315 in 2015, .312 in 2016, and .305 in 2017. Additionally, his BABIP rates have been .820, .816, and .802. Phillips regularly puts the ball in play in a little over 80% of his plate appearances.

So, as you can imagine, there is a real link between BABIP rate and TTO rate. The more often a player puts the ball in play, the less they tend to walk or strike out. Thus, a high BABIP rate goes hand in hand with a low TTO rate. This is exactly what we see if we attempt to correlate these two stats. Below is a snapshot of a graph that shows TTO rate vs. BABIP rate.

Player names aren’t included because, A) they clutter the graph, and B) they aren’t necessary at this point. Accompanying this graph is a trend line with an R-squared value, otherwise known as the coefficient of determination. Essentially, an R-squared value measures how well your model fits your data, or in this case, how closely TTO rate and BABIP rate track each other. It turns out that the R-squared value is .991, which means that the linear relationship between BABIP rate and TTO rate fits very well: in fact, you’ll find that TTO rate and BABIP rate are almost the exact opposites of each other. The players with the 10 lowest BABIP rates in MLB all have TTO rates of .437 or higher, meaning that at least 43.7% of their plate appearances end in a walk, home run, or strikeout. Inversely, the players with the highest BABIP rates all have TTO rates of .225 or lower.
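For readers who want to reproduce this kind of number: for a straight-line fit with a single predictor, R squared is simply the square of the Pearson correlation coefficient r. The sketch below computes both in plain Python. The five data points are made up for illustration and are not the player data behind the graph, so the printed values will not match the .991 quoted above.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length (at least 2 points)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up (BABIP rate, TTO rate) pairs for illustration only.
babip_rates = [0.45, 0.55, 0.65, 0.75, 0.80]
tto_rates = [0.54, 0.43, 0.34, 0.24, 0.19]

r = pearson_r(babip_rates, tto_rates)
r_squared = r ** 2
print(f"r = {r:.3f}, R^2 = {r_squared:.3f}")
```

Note that r comes out negative (one rate falls as the other rises) while R squared is close to 1, which is why a high R squared here describes a strong inverse relationship.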

We can also derive more information from these numbers using this correlation. Players who have a low BABIP rate tend to have a very high OPS. Remember, these players also have high TTO rates. The 10 players with the lowest BABIP rates (Judge, Sano, K. Davis, Souza Jr., Reynolds, Morrison, J. Upton, C. Santana, Lamb, and Stanton) all have an OPS of .841 or higher. The players with the highest BABIP rates (or lowest TTO rates) have an OPS of .798 or lower.

BABIP rate can tell us a lot about a player. Just by glancing at a player’s BABIP rate, you can have an instant idea of how often the player walks, strikes out, or hits dingers. Not only that, but it can tell you a lot about their offensive production. High TTO rates usually mean high hard-hit rates along with high exit velocities. BABIP rate also helps us understand BABIP itself better and teaches that you can’t judge a player by BABIP all the time. In most cases, players with an over-inflated BABIP (relative to past performances) just tend to mash the absolute heck out of the ball, as told by their low BABIP rates and high TTO rates. On the opposite end, players with a steady BABIP will have very high BABIP rates and tend to be contact hitters who put the ball in play and don’t hit for power. BABIP rate, along with its correlation to TTO rate, has the potential to be a powerful, tell-all offensive stat.

Scientist by trade. Annual hopeful/idiotic Twins fan. Writing as a new hobby.

Why not just do 1-tto rate?

I would assume a high tto guy has more babip fluctuations since the sample size is smaller

You’re right, that’s something that didn’t occur to me while I was writing. I think the main point I wanted to get across in this article was maybe BABIP itself should be looked at a bit differently. I should have articulated that better.

1. “his BABIP rate has stayed nearly the same, meaning that he’s still making the same amount of contact with the ball despite fewer balls falling for hit in 2016”…

I think that a consistent ball-in-play rate only means that any gains or losses across the three “true outcomes” (plus HBP and sacrifices) have offset one another, and only one of those outcomes, the strikeout, actually reflects how much contact he makes. AKA, if his K% increases but his BB% decreases by more, his BIP Rate may increase despite his making less contact.

2. “In 2017, one could say his high BABIP is a cause for concern, as he may just be getting lucky. However, his BABIP rate shows that isn’t the case.”

How does BIP Rate show that we shouldn’t expect “elevated” BABiPs to regress?

3. The reason your BABiP Rate (I think you mean BIP Rate) and TTO Rate don’t have an R² of 1 is hit-by-pitches and sac hits. BIPs are definitionally (almost) everything that isn’t a TTO.

4. “Players who have a low BABIP rate have a very high OPS”

They do? show this. (showing that the worst OPS amongst the qualified hitters with the 10 lowest BIP Rates > the best OPS amongst those with the 10 highest BIP Rates, while it tells at least part of some story, doesn’t explain your general statement here, especially when the gap is so small.)

5. “In most cases, players with an over-inflated BABIP (relative to past performances), just tend to mash the absolute heck out of the ball, as told by their low BABIP rates and high TTO rates. On the opposite end, players with a steady BABIP will have very high BABIP rates and tend to be contact hitters that put the ball in play and don’t hit for power.”

It seems like there’s some meat here. Is this true? if so, why? is it correlation or causation? etc.

6. Sorry I’ve been a bit harsh in my criticism, there is some interesting stuff here, but more work to be done to flesh it out. Keep it up, this is a good start. And thank you for sharing it.

I love all the points you’ve made, thank you for your feedback. I’ll try to address a few things.

For #2 of your points, I think BIP rate (and yes, I should have been calling it that) can tell us that if a player is continually producing fewer batted-ball events, yet still has a high BABIP, then his hits are getting hit hard and far, usually to the gaps. BIP rate eliminates luck, in my opinion, because it doesn’t factor in errors. For #3, I would have to do more research and have a better spreadsheet. That’s something I can work on! My thoughts on #3 have a lot to do with my thoughts on #5. For #5, I’d argue it’s causation. Because power hitters like Judge and Sano hit the ball so hard and so far but also strike out and walk a ton, they will always have high TTO rates, which means they’ll have fewer batted-ball events. A steady BABIP implies that a player puts the ball in play more often, because he has a bigger sample size of batted-ball events, leading to high BIP rates and low TTO rates. I don’t know if that makes sense, but that’s what I can rattle off the top of my head.

“BABIP rate” is basically 1-TTO-rate. of course they’re correlated. this is basically as meaningless as showing “home run rate” and “non-home run rate” are correlated inversely

I should have spent less time on BIP rate in relation to TTO and more time on BIP rate in relation to BABIP itself.

What the article calls “BABIP rate” is actually just “BIP rate”, i.e. balls-in-play per plate appearance. In any case, the fact that the correlation is extremely high (“in fact, you’ll find that TTO rate and BABIP rate are almost the exact opposites of each other”) is not surprising, as the two quantities are almost exactly equal by definition.

Given…

BIP = AB+SF-(HR+K)

BIP_rate = BIP/PA

TTO = HR+K+BB

TTO_rate = TTO/PA

And

PA = AB+BB+HBP+SF

(This is the definition of PA used for calculating OBP).

I can add and subtract BB on the right hand side of the BIP definition, i.e.

BIP = AB+SF+BB-(HR+K+BB)

However, in this expression, AB+SF+BB = PA-HBP, and HR+K+BB = TTO. Therefore, by definition:

BIP = PA – HBP – TTO

or (moving TTO to the left hand side and BIP to the right hand side)

TTO = PA – BIP – HBP

or

—————————————

TTO_rate = 1 – BIP_rate – HBP_rate

—————————————

where HBP_rate = HBP/PA

Thus, the correlation between TTO_rate and BIP_rate would be exactly -1 (an R² of exactly 1) if HBP were not included in the definition of plate appearances.
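The identity can also be checked numerically. This is only a sketch under the definitions given in this comment (in particular the OBP-style PA = AB+BB+HBP+SF, which leaves out sacrifice bunts); the function name and the stat line are hypothetical.

```python
def bip_tto_hbp_rates(ab, bb, hbp, sf, so, hr):
    """Return (BIP_rate, TTO_rate, HBP_rate) using the definitions above."""
    pa = ab + bb + hbp + sf        # OBP-style plate appearances (assumption)
    bip = ab + sf - (hr + so)      # BIP = AB+SF-(HR+K)
    tto = hr + so + bb             # TTO = HR+K+BB
    return bip / pa, tto / pa, hbp / pa


# Hypothetical stat line, not a real player.
bip_rate, tto_rate, hbp_rate = bip_tto_hbp_rates(
    ab=520, bb=70, hbp=5, sf=5, so=190, hr=35
)

# TTO_rate = 1 - BIP_rate - HBP_rate should hold up to floating-point error.
assert abs(tto_rate - (1 - bip_rate - hbp_rate)) < 1e-12
print(f"BIP {bip_rate:.3f} + TTO {tto_rate:.3f} + HBP {hbp_rate:.3f}")
```

Because the three rates always sum to 1 under these definitions, any scatter in a BIP_rate vs. TTO_rate plot comes only from differences in HBP_rate between players.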

Generally, over the history of the game, contact and BABIP seemed to be somewhat inversely correlated: until the mid-90s, Ks went up and BABIP did too, before it plateaued at .300.

That is probably due in large part to the two-strike approach. If you take a full cut with two strikes you have more Ks but also better contact quality, and if you shorten up and reach for pitches there is more bad contact.

Also, there is likely a selection bias: high-K hitters who don’t make loud contact lose their jobs fast. And hard hitters tend to have higher BABIPs unless they are extreme pull and fly-ball guys like Bautista (pull means more shifts when the ball is on the ground, and a fly ball is often either a HR or an out).