## Sloan Analytics: Rosenheck on BABIP

Last weekend’s MIT Sloan Sports Analytics Conference included a number of Evolution of Sport presentations. Among the best was a study of BABIP factors titled “Hitting ‘Em Where They Are,” by Dan Rosenheck. He is the sports editor of The Economist and a writer for The New York Times’s Keeping Score column on sports statistics, and he gave an overview of his study prior to presenting it on Day Two of the conference.

——

**Dan Rosenheck:** “It was a great surprise to find out that one of the distinguished presenters on the Baseball Analytics panel was Voros McCracken. His discovery, in 1999, was that BABIP allowed by starting pitchers is, at the very least, extremely noisy and hard to predict from year to year. It was a revolution in sabermetrics and opened the door to a vast amount of research. It changed the way many of us understand the game.

“The BABIP question has been the Great White Whale of the sabermetric enterprise. It is the mystery that, 14 years later, has continued to defy the best efforts of quantitative analysts using publicly available data. Tom Tango’s FIP assumes that all pitchers have exactly league-average BABIP ability. Even a small increase in predictive ability on that question leads to a huge increase in the accuracy with which you can predict how valuable players will be.

“I studied a bunch of variables I thought might have something to do with hit suppression on balls in play. I came up with two — both FanGraphs stats — that seem to have significant predictive power. The first is pop up rate. The second is z-contact, which is when batters swing at a strike — balls in the strike zone — thrown by a pitcher. What percent of those times does the batter make contact? It turns out that, just like inducing pop ups, it reduces BABIP and correlates consistently year to year. Getting batters to swing and miss at your strikes has strong predictive power on hit suppression.
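To make the z-contact definition above concrete, here is a minimal Python sketch. The function name and the sample counts are hypothetical, and FanGraphs’ published Z-Contact% may involve pitch-classification details that are not covered here.

```python
def z_contact_pct(zone_swings, zone_contacts):
    """Share of swings at in-zone pitches on which the batter made contact.

    Both arguments are raw counts. The names are illustrative only and are
    not FanGraphs field names.
    """
    if zone_swings <= 0:
        raise ValueError("zone_swings must be positive")
    if not 0 <= zone_contacts <= zone_swings:
        raise ValueError("zone_contacts must be between 0 and zone_swings")
    return zone_contacts / zone_swings


# Hypothetical pitcher: batters swung at 400 strikes and made contact on 340.
print(round(z_contact_pct(400, 340), 3))  # 0.85
```

A lower value means more swings and misses inside the zone, which is the direction the study associates with hit suppression.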

“I came up with a simple model with two curved fits, using data from 2005-2011, with an R-squared of .15. It accounted for 15 percent of the variance in BABIP for starting pitchers relative to the rest of that team’s starting rotation. That factors out for defense and ballpark.
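The "relative to the rest of the rotation" target can be sketched as follows. This is only one plausible reading: pooling the other starters’ balls in play (rather than taking a simple average of their BABIPs) is an assumption, and the pitcher labels and counts are made up.

```python
def relative_babip(rotation):
    """Return each starter's BABIP minus the pooled BABIP of his rotation-mates.

    rotation maps a pitcher label to (hits_on_balls_in_play, balls_in_play).
    Pooling the rest of the rotation is an assumption, not the study's
    documented method.
    """
    total_hits = sum(hits for hits, _ in rotation.values())
    total_bip = sum(bip for _, bip in rotation.values())
    result = {}
    for name, (hits, bip) in rotation.items():
        rest_bip = total_bip - bip
        if bip <= 0 or rest_bip <= 0:
            raise ValueError("every comparison needs balls in play on both sides")
        rest_hits = total_hits - hits
        result[name] = hits / bip - rest_hits / rest_bip
    return result


# Hypothetical three-man rotation sharing the same park and defense.
rotation = {"A": (150, 600), "B": (180, 600), "C": (180, 600)}
rel = relative_babip(rotation)
print(round(rel["A"], 3))  # -0.05
```

Because every starter on a team pitches in front of the same fielders in the same home park, differencing against teammates is a simple way to net out those shared effects.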

“Fifteen percent might not sound like a lot, and the data is noisy, but it’s a lot relative to zero, which is what FIP will tell you. This little equation correctly identifies every single major BABIP outlier of the last decade. If you look at its leader boards, the guys who most often appear as being projected to have the lowest BABIPs relative to their team, using only data from prior seasons — no cheating — it is Tim Wakefield, Ted Lilly, Barry Zito, Johan Santana, Matt Cain. It is the famous exceptions, one right after the other, after the other.

“The second thing is that it works out of sample. I calculated this equation in March 2012 on data from 2005-2011, and when I applied it to the 2012 season, the R-squared actually went up. It predicted the out-of-sample data even better than the in-sample data. There’s no over-fitting, no cheating or spurious relationships. This is real.
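The in-sample versus out-of-sample check described above can be sketched like this. A straight-line fit stands in for the study’s two curved fits, which are not published here, and all numbers are invented. R-squared is computed as one minus the ratio of residual to total sum of squares on the held-out season, which may differ from the definition used in the study.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(actual, predicted):
    """1 - SS_res / SS_tot, evaluated on whatever sample is passed in."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical training seasons: prior z-contact rate -> next-year relative BABIP.
train_x = [0.82, 0.85, 0.88, 0.90]
train_y = [-0.020, -0.005, 0.004, 0.011]
slope, intercept = fit_line(train_x, train_y)

# Hypothetical hold-out season, never used in fitting.
test_x = [0.84, 0.89]
test_y = [-0.012, 0.006]
test_pred = [slope * x + intercept for x in test_x]
print(r_squared(test_y, test_pred))
```

The point of the hold-out is that the coefficients are frozen before the new season is seen, so a good score cannot come from fitting noise in that season.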

“The third thing that works well is you could have a 15 percent R-squared with a very narrow range of predictions. Let’s say you have the best guy at five points below his teammates and the worst at five points above. That might marginally improve your forecast, but it’s not game changing. This equation gets the magnitudes right. It can forecast very big outliers. The guys who have the lowest BABIPs — Chris Young when he was with the Padres, Jered Weaver now, some of the Ted Lilly seasons — it’s projecting these guys for 30, 40 points of BABIP below their teammates. Huge magnitudes, far and above what you would see in any of the standard projection systems like ZIPS, Steamer or PECOTA. I don’t think any of them are projecting anything close to 40 points of differential. And it’s getting them right.

“The reason the R-squared went up last year is that it made a very bold prediction that Jered Weaver was going to have a BABIP over 40 points lower than his teammates. It got it right to .001 of accuracy. That’s lucky, and just one great prediction, but overall it’s not just improving your accuracy at the margins. It’s identifying big outliers to a big degree.

“I will post my data online, so if anyone wants to poke holes in it, all the better for our understanding of this troublesome phenomenon. I think the best avenue for future research is looking at this equation — at basically the favorite and least favorite pitchers — and asking, ‘What do they have in common?’ The guys who have high pop up rates and low z-contact rates are the guys projected to be good hit suppressors, so what do they throw? How hard do they throw? Are they deceptive? And vice versa for the pitchers the equation doesn’t like.

“I had two hypotheses. I thought tall pitchers, like Young and Weaver, might be good at this. I also thought guys who throw a lot of changeups might be good at this. Cole Hamels and Johan Santana come up very high and they’re great changeup artists. But, in fact, the height and changeup percentage in my high and low BABIP samples were identical.

“I don’t have any piercing insights as to what the guys who are good at this are doing to be good at this. Fortunately, the data is available to everybody and the internet has plenty of smart people who can move our understanding of this issue even farther forward.”


*Interviews from Red Sox Nation*, was published by Maple Street Press in 2006. He can be followed on Twitter @DavidLaurilaQA.

Fascinating! I know SIERA takes BABIP “control” into account. Does anyone know how much? Just curious, is it more or less than this model?

So how does this affect how I should be looking at a pitcher’s BABIP, FIP, xFIP, tERA, and SIERA? Thanks!

Studies on those stats should have revealed it if one were especially better than the others. They haven’t, so you can continue to view them the same, though you might view one as suggesting superior methodology for future research.

You should also be aware that this study produces descriptive, not predictive, stats.

No. Perhaps you missed the “out of sample” part.

SIERA’s the only one of those that tries to account for differences in pitcher BABIPs, but it could do a better job of that.

I figured out that zone contact% and popup rate were pretty important predictors of BABIP, so I used those (and line drive%) in basically an enhanced FIP I called BERA: http://www.fangraphs.com/community/index.php/introducing-bera-another-era-estimator-to-confuse-you-all/

My results showed that of the ones you listed, SIERA worked best for relievers and FIP worked best for starters. xFIP was a pretty good all-arounder. tERA was last place in everything. But my BERA formula beat both FIP and SIERA at their own strengths.

This isn’t really comparable to any of the ERA estimators out there. All of those equations (SIERA, xFIP etc.) test the relationship between ERA in a given year and its input variables in that same year. So sure, if you know a guy gave up a ton of line drives or got very few popups in a season, there’s a pretty good chance his BABIP was worse than average. But that doesn’t do you much good, because things like line drives allowed themselves are so noisy as to be virtually useless for predictive purposes. This is a purely predictive equation, that uses IFFB% and Z-Contact% from *past* years to forecast BABIP in *future* years. It’s basically comparable to the number you’d get if you looked at ZiPS or Steamer projections or whatever, and subtracted each starter’s forecast BABIP from that of the average of the rest of his team’s rotation.

I stood up and applauded my laptop.

Exciting stuff! I wish I could have gone to his presentation.

This seems similar to the pitcher x-BABIP predictor that Steve S. presented in Community Research.

Sounds neat. If it holds up hopefully we can have some kind of FIP that adjusts for this, so the numbers on guys like Matt Cain match better to what is actually going on.

CJ in Austin Tx–yes, it turns out we were studying the same questions and got similar results. I actually did this research a year ago, but didn’t get around to presenting it until now (though it was incorporated into my projections that won the 2012 Fangraphs forecasting competition). One big difference is that this is exclusively designed to be a projection of future BABIP, whereas Staude’s work followed the pattern of FIP, SIERA etc. and studied the relationships between BABIP and a host of variables within the same season. I believe he said he would publish more predictive work later on, but I haven’t seen it yet. Also, I don’t think he’s factoring out for park and defense, at least not using the same method that I do.

Steve 1–We *do* have “some kind of FIP that adjusts for this”–that’s exactly what SIERA and tRA and all those formulas attempt to do. My criticism of those equations is that they are of limited use, since some of their inputs are just as noisy as BABIP itself.

Well, my BERA formulas which incorporate Zone Contact% and popup rates are predictive, as I viewed next season’s ERA as one measure of “true ERA.” But I handcuffed myself to using one season’s worth of data to draw these conclusions, as in FIP and SIERA, so it’s not as good as it could be. (You probably know all this, Dan — I was just clarifying for everybody).

You’re right that I didn’t factor out park or defense, though. The predictive model of BABIP I was working on had an R-squared of 0.164, though it’s an apples-to-oranges comparison to your 0.15, as mine is not relative to just a pitcher’s teammates.

I am a die-hard Angels fan and I’ve seen Jered pitch many times. It is clear that his cross-body motion provides some deception to his delivery, so that could be a contributor.

Jered also appears to enter each game well prepared, i.e. knowing the enemy. He has some great speed differentials between his fastball (89-90) and his curveball which I’ve seen touch the high 60’s at times. He mixes in his slider and change up quite well.

He also seems to throw from the exact same arm slot on all pitches making it a guessing game for the hitters.

The game plan combined with the deceptive delivery, speed differentials, mix of pitches, and repeated arm slot really seem to confound hitters. I’d say all of those factors contribute to the phenomenon you see with Jered.

Lucky us!

Attention, Steamer! Maybe roll this into the 2014 projections?

This has our attention, no question.

Dang. I was hoping you wouldn’t notice.

I don’t get the point. What Voros McCracken found is that BABIP is inconsistent for a pitcher year by year. That means there is no pattern in BABIP and it is randomly distributed.

But now what Dan Rosenheck said is basically that he found two consistent variables that can predict BABIP. So two consistent variables drive BABIP, which is inconsistent?

Price results in sales, and if price is consistent, sales should be consistent, right? If an independent variable is consistent while a dependent variable is not, you shouldn’t see such a good R-squared, should you?

I think what you’re pointing out is that this BABIP model is descriptive, taking year n variables to get year n BABIP. If you want a predictive (as opposed to descriptive) model, you need to account for year-to-year correlations in the variables themselves.

As far as I know, no one has ever found IFFBs to be a repeatable skill for pitchers in general, so that suggests that the present model won’t actually be as successful in prediction as description. Getting swings and misses on balls in the zone probably is a repeatable skill, so there will be some predictive value here.

Thanks for the explanation. I could be wrong, but as per my understanding, if it shows that Z-contact% is highly correlated year by year, that only tells you that Z-contact% is a repeatable skill.

If you are talking about the correlation between z-contact% and BABIP, my question would be the same. How can a consistent variable be highly correlated with an inconsistent variable? For example, Greg Maddux’s BABIPs from 1995 to 1999 were .244, .280, .280, .262 and .324. FG only provides z-contact% after 2002. For Maddux, it was quite consistent at about 89%. You won’t be able to get a meaningful coefficient by taking his almost-random BABIP and 89%.

Maybe I’m the one missing the point, but I thought this research dealt with the relationship between a pitcher’s BABIP and his rotation-mates’ BABIP. I.e. Maddux’s z-contact% predicts a BABIP of +-X% relative to the other pitchers on the team. Does that make sense?

Well, then still, if a pitcher’s BABIP is random year by year, is it possible to “predict” the difference between his and his teammates’?

I mean, it’s not possible to “predict” a random number, or the relation between a random number and others.

Unless BABIP is not random?

“Unless BABIP is not random?”

Dun dun DUNNNNNNNN

random is not the same as inconsistent.

I am sorry for the language. What I meant is inconsistent, not random. I am not a native English speaker and I would appreciate if someone can explain to me what “dun dun dun” means. Thanks!

It’s an… onomatopoeia? It implies a significant and suspenseful twist in a plot. It comes from movies in the earlier days of cinema, where commonplace sound effects were used to elicit particular emotions. In this case, Inspector Gadget (whose name references a television show that parodied such movies) was using the phrase to imply that you had probably never considered that the point of this article was that BABIP isn’t random. The minor semantic point of inconsistent vs. random (it’s totally not your fault if you aren’t a native speaker) completely changes the nature of your statement. The article is telling us about an advancement in predicting BABIP on a case-by-case level. This, by its nature, means that BABIP isn’t random and can be predicted. It still can’t be predicted very well (in comparison to some other skills) because it is very inconsistent.
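The difference between "inconsistent" and "random" can be illustrated with a small simulation. Every number in it is an assumption chosen for illustration: a league BABIP of .300, a true-talent spread of 8 points, 500 balls in play per season, and a normal approximation to the sampling noise.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate_year_to_year(n_pitchers=5000, true_sd=0.008,
                          balls_in_play=500, seed=0):
    """Correlate two simulated seasons of observed BABIP for the same pitchers."""
    rng = random.Random(seed)
    league = 0.300  # assumed league-average BABIP
    # Normal approximation to binomial sampling noise over one season.
    noise_sd = (league * (1 - league) / balls_in_play) ** 0.5
    year1, year2 = [], []
    for _ in range(n_pitchers):
        talent = rng.gauss(league, true_sd)  # fixed skill, hypothetical spread
        year1.append(talent + rng.gauss(0, noise_sd))
        year2.append(talent + rng.gauss(0, noise_sd))
    return pearson(year1, year2)


print(simulate_year_to_year())             # small but clearly positive
print(simulate_year_to_year(true_sd=0.0))  # no skill differences: near zero
```

Under these assumptions, seasonal noise swamps the skill differences, so one pitcher’s BABIP bounces around from year to year, yet the skill is still there and a model with good inputs can recover part of it.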

Whoa, what? IFFB’s most certainly are a repeatable skill for pitchers. I get year-to-year correlations (on IFFB/(PA-HBP-BB-K-Bunts)) in the same ballpark as BB. Who told you otherwise?
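The rate quoted above, IFFB/(PA-HBP-BB-K-Bunts), can be written out as a small helper. The function name and the sample counts are hypothetical; the denominator follows the comment exactly as written.

```python
def iffb_rate(iffb, pa, hbp, bb, k, bunts):
    """Infield fly balls per plate appearance that ended in a non-bunt batted ball.

    Denominator as given in the comment: PA - HBP - BB - K - Bunts.
    """
    denominator = pa - hbp - bb - k - bunts
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return iffb / denominator


# Hypothetical season line: 900 PA, 5 HBP, 55 BB, 230 K, 10 bunts, 30 pop-ups.
print(round(iffb_rate(iffb=30, pa=900, hbp=5, bb=55, k=230, bunts=10), 3))  # 0.05
```

Computing this rate for the same pitchers in consecutive seasons and correlating the two columns is how a year-to-year repeatability figure like the one mentioned would be obtained.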

My equation is purely predictive–it uses only Z-Contact% and IFFB% from past years to forecast next year’s BABIP (relative to rotation-mates).

Thanks for the clarification, Dan. That was really confusing the hell out of me because as I read it I thought your full comments made that pretty darn clear.

I hate to be a dick, but I just can’t resist.

Great, another excellent piece of research correcting prior assumptions, which 99% of Fangraphs writers will completely ignore.

I recall reading an article in Hardball Times that found certain pitchers had consistently higher infield fly ball rates. And it makes some sense that it would be a repeatable skill that is associated with throwing 4 seam fastballs high in the zone. However, I have to think it is acting as proxy variable of some kind in the regression discussed in the article, because the number of infield fly balls is so small that the impact on BABIP is likely to be minor. For example, Justin Verlander is one of the highest IF flyball pitchers; yet the 35 infield flies that he induced is a small share of the 956 batters he faced.

That’s not right at all. Popups are basically equivalent to strikeouts; they’re virtually automatic outs and runners cannot advance. True talent among starters ranges from around 1.5% of batted balls to 6.5%. All other things equal, that’s a gap between the best and worst of 5% of batted balls in which the best will have a BABIP of .000 and the worst will have a BABIP of ~.300. So that’s 15 “guaranteed” points of BABIP right there.
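The 15-point figure follows from simple arithmetic, sketched below. It assumes, as the comment does, that pop-ups are always outs and that every other batted ball falls in at a .300 rate for both pitchers; the function names are illustrative.

```python
def overall_babip(popup_share, babip_on_other_balls=0.300):
    """BABIP when pop-ups are automatic outs and all other balls share one rate."""
    return (1 - popup_share) * babip_on_other_balls


def popup_gap_in_points(low_share, high_share, babip_on_other_balls=0.300):
    """BABIP points separating a low-pop-up pitcher from a high-pop-up pitcher."""
    gap = (overall_babip(low_share, babip_on_other_balls)
           - overall_babip(high_share, babip_on_other_balls))
    return gap * 1000


# True-talent range quoted in the comment: 1.5% to 6.5% of batted balls.
print(round(popup_gap_in_points(0.015, 0.065), 1))  # 15.0
```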

I suspect that pitchers who generate high popup rates also probably induce slightly weaker contact than average in general, and thus have a marginally better-than-average BABIP on their non-popup batted balls as well.

Basically, Voros couldn’t really figure out what caused the variability in BABIP. Rosenheck is saying he can account for some of it.

I can’t wait to see the data posted online that supports this. I have always intuitively thought, “If a guy gets a lot of swings and misses in the zone, he must have good and/or effective/deceptive ‘stuff.’” Glad to see this hypothesis holds water for BABIP suppression/weak contact, possibly.

This is really interesting work. What is the equation?

I’m not sure I want to put it up publicly–I put a lot of time into this, and I’ve got a high-stakes fantasy league to win! I can email it if you promise not to redistribute.

I wonder if anyone has ever compared a pitcher’s actual BABIP against the seasonal BABIP of all the batters he’s faced. By averaging the season BABIPs of all opposing batters, I wonder if you would reach some sort of baseline for what a pitcher’s BABIP “should” have been. Since the composite of all the batters’ BABIPs will be different every year, I don’t think this would be very predictive, but there might be a trend over time where a pitcher might be consistently over or under his expected BABIP based on that composite of those seasonal averages.
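One way to build the baseline suggested above is a weighted average of opposing batters’ season BABIPs. Weighting each batter by the balls he put in play against this pitcher is an assumption, as are the function name and the sample numbers.

```python
def expected_babip(matchups):
    """Opponent-based BABIP baseline for one pitcher.

    matchups is a list of (batter_season_babip, balls_in_play_vs_pitcher).
    Each batter is weighted by how often he put the ball in play against
    this pitcher; that weighting is a hypothetical choice.
    """
    total_bip = sum(bip for _, bip in matchups)
    if total_bip <= 0:
        raise ValueError("need at least one ball in play")
    return sum(babip * bip for babip, bip in matchups) / total_bip


# Hypothetical pitcher who faced two batters.
baseline = expected_babip([(0.300, 10), (0.320, 30)])
print(round(baseline, 3))  # 0.315
```

Subtracting this baseline from the pitcher’s actual BABIP, season after season, would show whether he is consistently above or below what his opponents’ averages imply.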

The Economist has a Sports Editor?

Indeed. I edit our sports blog, Game Theory (http://www.economist.com/blogs/gametheory), which I would strongly encourage you to bookmark/sign up for RSS/follow on Twitter at @EconSports. I also write most of our (occasional) print sports stories.

Hi Dan,

Will you be releasing any of these award-winning projections for the 2013 season?

Furthermore, is there somewhere we can follow your work?

Sincerely,

A big fan of yours since 3 minutes ago when I read this article.

I haven’t done my 2013 projections yet. When I do, I don’t plan to post them publicly–again, I’m trying to win a fantasy league here, and a lot of my competitors read Fangraphs. But feel free to contact me privately.

Well, most of my sportswriting now appears on Game Theory, The Economist’s sports blog (http://www.economist.com/blogs/gametheory). All the baseball and most of the basketball writing there is by me (all signed with my initials D.R.) I also still do occasional columns for the New York Times’s Keeping Score column, and if you search the NYT website for my name you should get them all. When I have research worth reporting, I tend to present it at the MIT Sloan conference (as I did here, in 2012 at http://www.sloansportsconference.com/?p=8069 and in 2011 at http://video.mit.edu/watch/ease-of-domination-dan-rosenheck-7210). I don’t have a personal website or Twitter account or anything, though.

huh? you have to release your projections. everyone who does this kind of thing is in leagues. just man up and do release.

not releasing your projections sounds like you are not ready to be critiqued when you are wrong.

by the way, i love what you did and i’m rooting for it.

starting pitchers only.

Interesting stuff, and I would love to see the formula you’ve devised, though this shouldn’t be groundbreaking for too many folks. Pop ups almost always go for outs, so of course they’re going to suppress BABIP, and guys that miss bats in the zone usually do so because of good stuff that is hard to square up. Softly hit balls, whether in the air on the infield or on the ground, should lead to a lower BABIP. Still, it’s always nice to read someone confirm this intuition, and thanks for the hard work.

Have you tried adding any other variables to your regressions? Do they improve or impair the R^2? I’m thinking specifically about Edge%, Velocity, Well-Hit Avg, etc.: skills that, unlike LD%, are relatively consistent year-to-year.

I did try with a broader range of variables, including velocity, but the accuracy gain was minimal and the equation would have gotten much more complicated. I like to keep it simple.

I agree — velocity is pretty much a non-factor in BABIP, it appears (it is a major factor in K%, though). Based on my research, I think the biggest thing that pitchers measurably do to lower their BABIP is to have good “rise” to their pitches, which causes batters to pop them up.

http://www.fangraphs.com/community/index.php/babip-and-innings-pitched-plus-explaining-popups/

http://www.fangraphs.com/community/index.php/proejcting-babip-using-batted-ball-data/

Great, thanks for the responses, guys. I am curious about Edge% in particular, though. Tracking of it is extremely new and, as far as I know, almost no year-to-year data or anything else has been studied with it. Regardless, did either of you try it?

Yeah, I was talking about trying it out when it came out, but I’ve been busy with other assignments since then. I was also saying I’d like to see the vertical equivalent of it. I’ll get around to it eventually.

awesome

If you take the study a step further, you’ll find that although these types of pitchers are adept at reducing BABIP, when they do give up hits they are far more likely to go for extra bases and for home runs.