xBABIP-BABIP Leaders and Laggards

How’s that for a nerdy title? At least you know what you’re going to get. Using slash12’s updated xBABIP formula outlined by Jeff Zimmerman yesterday (with the 2009-2011 constants in place), we can look at players who ‘should’ be showing better BABIPs than they are right now. Since the constants in the formula change a little from year to year, let’s use the list as a general guide to batted ball luck instead of a specific prescription for doom or boon for each player.

First, the laggards. With the way we’ve defined our list, these are the players that have enjoyed the best luck so far this season. Their xBABIPs are lower than their actual BABIPs, and the ball might not bounce their way as much in the coming months.

Player             BABIP   xBABIP        xBABIP-BABIP
Mark Trumbo        0.368   0.294395833   -0.073604167
David Wright       0.39    0.328930481   -0.061069519
Paul Konerko       0.391   0.330159763   -0.060840237
Andrew McCutchen   0.37    0.311959538   -0.058040462
Melky Cabrera      0.399   0.347390135   -0.051609865
Kirk Nieuwenhuis   0.388   0.337238095   -0.050761905
Bryan LaHair       0.396   0.345792453   -0.050207547
Joey Votto         0.432   0.38387037    -0.04812963
Miguel Montero     0.321   0.275167939   -0.045832061
Mike Moustakas     0.315   0.269212121   -0.045787879
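
The last column is just xBABIP minus BABIP, so a negative value marks a hitter whose results have outrun his batted-ball profile. A minimal Python sketch of that bookkeeping, using three rows from the table above (the function and variable names are illustrative and are not part of slash12's formula):

```python
# Rows copied from the table above: (player, BABIP, xBABIP).
rows = [
    ("Mark Trumbo", 0.368, 0.294395833),
    ("David Wright", 0.390, 0.328930481),
    ("Joey Votto", 0.432, 0.383870370),
]


def luck_gap(babip, xbabip):
    """xBABIP minus BABIP: negative means results are ahead of the profile."""
    return xbabip - babip


# Sort from the most negative gap (luckiest so far) upward.
ranked = sorted(rows, key=lambda r: luck_gap(r[1], r[2]))

for name, babip, xbabip in ranked:
    print(f"{name:<14} {babip:.3f} {xbabip:.3f} {luck_gap(babip, xbabip):+.3f}")
```

Sorting in the opposite direction would give the "leaders" list instead.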

The first two names on this list encapsulate perfectly why this sort of analysis is worth doing. If you just looked at straight BABIP and didn’t figure in the player’s batted ball mix, you’d think that David Wright was worse off than Mark Trumbo. But Trumbo has hit almost half as many line drives as Wright, and while they’ve hit exactly the same number of fly balls, Wright has hit 17 more ground balls… and is faster to boot. Basic BABIP theory holds that faster guys who hit the ball on the ground more can have higher BABIPs. Uh-oh, Trumbo.

Even though Wright, Andrew McCutchen and Melky Cabrera fit this mold — fast ground ball hitters — you have to be skeptical of their ability to keep it up all year. And in the case of Cabrera in particular, if his BABIP falls closer to his .347 xBABIP, his fantasy value will plummet. He’s a bit of an empty batting average.

In some cases, this analysis piles on. We know Kirk Nieuwenhuis has a bad strikeout rate, and that his glove is iffy in center field, which might mean less playing time when his whole outfield is healthy. Seeing the disparity between his xBABIP and BABIP underlines the fact that he’s more of a deep leaguer than a mixed leaguer. He’s not going to play every day, and when he does, he might strike out too much, or see his balls in play find gloves more often.

In other cases, this list is almost mitigating. Bryan LaHair strikes out too much and almost has a .400 BABIP! But look, he actually hits more ground balls than fly balls, which is rare for a slugger. That alone could keep his BABIP around .340 if we believe his xBABIP. The slide might not be as bad as some think. In some leagues, he’s a sneaky acquisition if the price has fallen far enough.

Let’s look at the leaders, or the least lucky.

Player             BABIP   xBABIP        xBABIP-BABIP
Eric Hosmer        0.222   0.314083744   0.092083744
Todd Helton        0.248   0.328020134   0.080020134
Ichiro Suzuki      0.263   0.337031373   0.074031373
Freddie Freeman    0.298   0.371503311   0.073503311
Orlando Hudson     0.24    0.313051948   0.073051948
Jose Bautista      0.199   0.271729282   0.072729282
Curtis Granderson  0.278   0.346567901   0.068567901
Justin Morneau     0.25    0.317463235   0.067463235
Jemile Weeks       0.252   0.312791262   0.060791262
Carlos Santana     0.264   0.321655405   0.057655405

Eric Hosmer was the inspiration for all of this, and with his 113 ground balls to 59 fly balls, he really should have a better BABIP. Jose Bautista is another name that leaps off the list, with his .199 BABIP, but his 18 infield fly balls have dragged down his xBABIP. Instead, look at Freddie Freeman! He’s got a decent-looking .298 BABIP, so what’s the problem? Well, he hits more ground balls than fly balls, and has the second-most line drives on this list. He’s only hit two infield fly balls, and he’s still young enough that his feet aren’t going to fail him now. At least not completely.

Do we trust this completely? Curtis Granderson and Justin Morneau don’t have great career batting averages anyway, and yet this analysis puts them as unlucky currently. Maybe this is why it’s a good idea to use this as a general guide. Yes, Todd Helton and Ichiro Suzuki have better xBABIPs than Eric Hosmer and Curtis Granderson, but wouldn’t you rather go with the younger legs?

Finally, we close with two young players that have disappointed in Jemile Weeks and Carlos Santana. Both have hit more ground balls than fly balls, and while Weeks has 14 more line drives than Santana, it’s the catcher that has more power — and also one more infield hit than the speedster! Both could be sporting better BABIPs, though, and the primary problem with each is their batting average. They’re useful in all leagues, with upside to be better than useful.





Graphs: Baseball, Roto, Beer, brats (OK, no graphs for that...yet), repeat. Follow him on Twitter @enosarris.


35 Responses to “xBABIP-BABIP Leaders and Laggards”

  1. pianow says:

    Love this piece. Great time to try to get Kid Weeks low. In some leagues, get him for free off the wire.


    • JR says:

      I’ll post this here and see if anyone has any insight, re: Jemile Weeks.

      When he stole his 10th base of the season, he was leading the AL in SB (if I remember correctly). Since then, he has gone 28 games without a SB. He has gotten on base 43 times in 126 PA over that span, for a .341 OBP. But he’s had 0 SB and 3 CS.

      He had a “minor ankle injury” just before his last SB, and then a hip strain in early June. I’m not sure if he’s still not fully healthy, or what happened to his speed? Without the SB, he becomes *much* less useful in fantasy.


    • BStu185 says:

      Jemile Weeks has already been owned by 5 teams (myself included) in my shallow bench 12 team roto league. Seems like a lot of guys want to believe that he’ll get better, but he just can’t be started right now. If he’s not even stealing bases, he is utterly useless.


  2. stefangfg says:

    Said this before, will say again, BABIP is unhelpful in this context unless you break down contact intensity for each type of batted ball. In 2011 hard hit GBs went for hits 58% of the time while weak hit ones only 19%. Hard hit fly balls 40% of the time vs 5% for weak … We do have Contact/Batting F/X numbers avail. You can learn the speed off the bat and ball trajectory. The guy who left BP to work for the Astros did a series on this 11/2011. Please refine BABIP to include contact intensity and quality before trying to use BABIP as a predictive stat!!!


    • Eno Sarris says:

      The minute you add hard hit / soft hit into anything, you add even more stringer bias than you do with line drive percentage in there. If anything, I’d like to see a speed factor in there that’s not IFH, because that seems really random. Put Bill James speed score in there over IFH and it’s better.

      Put hard hit / softly hit, it’s more bias. If you’re talking about velocity and angle, you’re talking hit f/x, which is not available to the public.


      • stefangfg says:

        What you call stringent I call refinement I guess. LDs btw are not always hard hit balls. Weakly hit LDs however tend to go for hits at almost the same rate as hard hit ones: in 2011 73% vs 68%. But weakly hit GBs and FBs rarely go for hits. Of course there is some bias if you are not actually using the hit f/x, but the discrepancy between weak and hard contact for GBs and FBs is SO HUGE if you discount the accuracy of the hard/weak determination by 50%, you still have a statistical refinement. And people won’t be scratching their heads about why their guy is so lucky/unlucky. Guys who hit the ball harder will get hits 25% more of the time on GBs and FBs.

        So there is some bias without access to hit f/x (is it released after the year ends?), but it is better than using a stat, BABIP, that’s devoid of a crucial variable and leads to silly and misleading conclusions, i.e., Hos my god he has been so unlucky for the past two months, how can that happen? The correct deduction is that Hoz is hitting weak ground balls, trying to pull the ball. You’ll confirm that deduction by looking at his Pull vs Oppo data and comparing it to last year.
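
The contact-quality argument in this thread can be made concrete with a rough sketch. It weights each ground-ball and fly-ball bucket by the 2011 hit rates quoted earlier in the thread (58% and 19% for hard and weak grounders, 40% and 5% for hard and weak flies). The two hitters and their counts are invented for illustration, line drives are left out, and the result is a plain hit rate rather than a true BABIP.

```python
# 2011 hit rates by contact quality, as quoted in the comment thread above.
HIT_RATE = {
    ("GB", "hard"): 0.58,
    ("GB", "weak"): 0.19,
    ("FB", "hard"): 0.40,
    ("FB", "weak"): 0.05,
}


def expected_hit_rate(counts):
    """Weighted average hit rate for a dict of {(type, quality): count}."""
    total = sum(counts.values())
    hits = sum(HIT_RATE[bucket] * n for bucket, n in counts.items())
    return hits / total


# Two hypothetical hitters with the same mix (60 GB, 40 FB) but
# different contact quality. These counts are made up.
hard_contact = {("GB", "hard"): 40, ("GB", "weak"): 20,
                ("FB", "hard"): 25, ("FB", "weak"): 15}
weak_contact = {("GB", "hard"): 20, ("GB", "weak"): 40,
                ("FB", "hard"): 15, ("FB", "weak"): 25}

print(f"hard-contact hitter: {expected_hit_rate(hard_contact):.4f}")
print(f"weak-contact hitter: {expected_hit_rate(weak_contact):.4f}")
```

With identical GB/FB counts, the two hitters end up more than 100 points apart, which is the gap a GB/FB-only formula cannot see. Whether the hard/weak labels can be assigned reliably is a separate question.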


      • Eno Sarris says:

        ‘stringer’ not ‘stringent.’ as in, you have to pay someone to watch the game and decide which balls are weakly hit ground balls. I guarantee you that adds noise. If we watched the same game and recorded what we thought were weak line drives and hard hit line drives, you’d see very different results. Look at BIS. they have Line Drive (Fly ball), Fliner (liner), Fly (fliner) and all sorts of crazy definitions between a fly ball and a line drive… and the more is decided by an eye ball, the worse the confusion will be. If we could get batted ball angle off bat (vertical angle) and velocity, we’d be getting somewhere. I’ve seen batted ball distance and horizontal angle, but I’ve never seen this weak grounder vs hard-hit grounder you’re talking about, and if it comes from a stringer, I’m skeptical it refines much.


      • stefangfg says:

        Here’s the link: http://www.mastersball.com/index.php?option=com_content&view=article&id=1926:taking-babip-to-the-next-level&catid=957:chance-favors-the-prepared-mind&Itemid=70

        With GBs and FBs you have a 40% hit discrepancy to deal with between hard and weak contact. You can drive a truck through that much “noise” and still come out with valuable cargo. Subjective, sure, but don’t fall prey to the fallacy of only looking for your keys under the streetlight. There are valuable insights out there in the subjective darkness. At the very least, these empirical observations should force you to heavily qualify your use of BABIP, to recognize its limitations and understand how faulty a lucky/unlucky deduction is. Cameron came close to recognizing this with his timmy piece. Timmy is getting hit HARD so his BABIP is gonna be high. He’s not unlucky, he’s serving up meatballs and guys are making hard contact quite a bit.

        Aren’t you guys powerful enough to get access to hit f/x? Time to push for refinement or get left behind I think.


      • Eno Sarris says:

        I have no idea what this means “In the name of full disclosure, I only have data classified as hard and soft for my personal use.”

        Who gave him that data? Did he classify it himself? Where did it come from?

        Here’s the great thing about ground balls and fly balls: they are easy to classify, and they are predictive ( http://www.fangraphs.com/blogs/index.php/525600-minutes-how-do-you-measure-a-player-in-a-year/ ).

        Here’s the bad thing about line drives: they are really hard to classify, and they are not predictive. (http://www.hardballtimes.com/main/article/when-is-a-fly-ball-a-line-drive/)

        So you want to take ground balls and make them harder to classify? A weak versus a hard ground ball is by nature harder to classify. To me, that makes them more like a line drive. Which is less predictive.

        You can say that a hitter has hit a bunch of weak ground balls and that’s why his BABIP is bad. It’s another thing entirely to say that he will CONTINUE to hit weak ground balls and will continue to be bad. And then a third different thing to say that what you classified as a weak ground ball will continue to be classified as a weak ground ball.

        I barely want line drive rate in the xBABIP formula because line drive rates are not predictive and therefore muss the formula up. If you want to make ground balls and fly balls less easy to determine — I return to the fact that you and I would have really different ideas about what a weak ground ball, strong ground ball and a line drive are — then I doubt that they are more predictive because of it.

        If Todd wants to show that x degrees of vertical angle and x mph of speed off the bat predict that the hitter will continue to hit at x degrees of vertical angle with x mph of speed (and have xBABIP), then that might be interesting. That doesn’t seem to be what he’s saying. I have no idea what a weak ground ball is to him.


      • slash12 says:

        How does speed help your babip outside of IFH and BUH? If it’s a ground ball out of the infield, it doesn’t matter how fast you are, it’s a hit. I think having IFH and BUH setup the way they are in the equation, it does reward players using speed to their advantage.


      • stefangfg says:

        Ah Eno you keep harping on the subjective bias of the information, which is just a circumstantial thing. We can take that out of the equation with access to hit f/x. As far as predictive power, it should interest you to know that adjusted BABIP taking into account speed off bat and angle has an r squared double that of “dumb” BABIP, so double the predictive power. 80 mph seems to be the breaking point between hard and weak contact. Please read the two-article series by Mike Fast over at BP in Nov of last year. See this conclusion:

        Such an adjusted BABIP metric has a better split-half correlation for pitchers with at least 300 balls in play in 2008, with a correlation coefficient of r=0.26, compared to r=0.13 for normal BABIP. In fact, if we further adjust the BABIP by also crediting the pitcher with .536 of an out for every ball in play hit harder than 80 mph, we can improve the split-half correlation coefficient to r=0.34.
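
For readers unfamiliar with the split-half correlation quoted above, here is a minimal sketch of the general technique: split each pitcher's balls in play into two halves, compute the hit rate in each half, and correlate the two halves across pitchers. The data are simulated, and the talent spread, pitcher count, and seed are assumptions chosen only for illustration. This is not Mike Fast's data or his exact procedure.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def split_half_r(samples):
    """Correlate each player's hit rate on even- vs odd-numbered balls in play."""
    first = [sum(s[0::2]) / len(s[0::2]) for s in samples]
    second = [sum(s[1::2]) / len(s[1::2]) for s in samples]
    return pearson(first, second)


# Simulated pitchers: each gets a made-up "true" BABIP near .300 and
# 300 balls in play recorded as 1 (hit) or 0 (out).
rng = random.Random(2008)
samples = []
for _ in range(200):
    talent = rng.gauss(0.300, 0.015)
    samples.append([1 if rng.random() < talent else 0 for _ in range(300)])

r = split_half_r(samples)
print(f"split-half r = {r:.2f}")
```

A higher split-half r means the metric measured in one half of a sample says more about the other half, which is the sense in which the quoted passage calls the adjusted metric an improvement.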


      • Eno Sarris says:

        I have no problem with using Hit F/x to refine BABIP. It is my understanding that MLBAM is keeping Hit F/x numbers for its members alone. I do know that Baseballheatmaps has batted ball distance (but not time, so we don’t know the arc), and that HitTrackerOnline has batted ball velocity (and some other data), but the ‘whole picture’ that I’m looking for from Hit F/x is not available to non-baseball entities from my inquiries.


      • jessef says:

        Taking linedrive% out of xBABIP essentially makes it useless. The reason that Joey Votto is good is because he hits linedrives. Just because data are qualitative doesn’t mean that they aren’t useful.


    • slash12 says:

      Taking LDs out of the equation really cripples it. Guys like Joey Votto and Matt Kemp regularly put up huge LD%s, and that’s the reason they consistently hit for high average. Yes, it fluctuates a LOT, but there’s still some skill to hitting line drives. If you take out LD%, almost everyone is going to fall in the (roughly) .290 to .320 range or so; GB% and FB% just don’t have a large predictive power towards BABIP.


      • Eno Sarris says:

        It’s true about line drives. I guess what I’m saying is that if we want to refine anything, I’d like to refine line drives so that they become more predictive and stable. I’m guessing stringer bias is at the heart of it.

        As for speed, I’m not saying that speed helps outside of IFH or BUH. I’m saying that speed is not necessarily accurately reflected by IFH and BUH because there are inconsistent opportunities for IFH and BUH. So if we used, just as an example, Bill James’ speed score, that might be more stable and more accurately reflect their true talent speed.

        I didn’t mean to speak ill of your work. It’s a great formula, and obviously I like it. This commenter had a specific issue so I talked about where I might go with the formula if I had better multi-variable regression skills.


  3. kid says:

    Bautista’s BABIP is crazy. I get that hitting infield flies will suppress that, but still, .199? Sheesh.

    Granderson’s LD rate is 26% – could a BA rebound (to say .270 or higher) be coming?


    • JR says:

      Maybe this has already been done, but I wonder what sorts of BABIPs three-true outcome guys generally have. Cause none of those three outcomes affects BABIP.

      Maybe everything Bautista hits hard this year is going over the fence? :-)


      • Eno Sarris says:

        I do think three true outcome affects BABIP a little — in the denominator. Fewer balls in play. Also, a TTO hitter probably is a fly-ball hitter, which means lower BABIPs (see Bautista’s low BABIP).

        As for Granderson, I think there’s some chance he betters his BA for sure.


  4. Andrew says:

    This is excellent, Eno.


  5. AF says:

    And in the case of Cabrera in particular, if his BABIP falls closer to his .347 xBABIP, his fantasy value will plummet. He’s a bit of an empty batting average.

    I enjoyed the article, but the above kind of comment is a pet peeve of mine, as it fails to distinguish between the regression that is obviously going to occur, and a truly worrying fall in value. Obviously, Melky Cabrera, like every other player in the majors, is unlikely to hit .364 for the rest of the year. But according to ZIPS, he should hit .304 for the rest of the year– and that’s based on a projected BABIP of .333. If his BABIP instead matches his xBABIP of .347, and he otherwise meets his ZIPS projection, he will hit about .315 for the rest of the year, with an additional 8 HR and 10 steals, for a final line of around .340/97/14/73/20. That’s pretty damn valuable. It’s actually better than I would have expected from eyeballing his standard stats.


    • Eno Sarris says:

      This is a great point. Don’t have much to add.


      • AF says:

        I note your comment on Bryan LaHair, which makes almost the opposite point, even though xBABIP predicts about the same amount of regression: “The slide might not be as bad as some think.” What the LaHair comment gets right and the Melky comment gets wrong is making the call in relation to what “some think.” Everyone knows that hot hitters will regress. What the advanced stats help you do is assess whether a particular hot hitter is likely to regress more or less than what you’d predict just by eyeballing his conventional stats.


  6. Dingbat says:

    Great column, but is there any way you can round off those xBABIP and xBABIP-BABIP numbers to 3 or 4 decimal places? All of those extra digits don’t add any information and simply make the numbers hard to read.


    • Eno Sarris says:

      I left those digits because I find them mostly irrelevant. Just look at the general placement. This xBABIP is not so refined that we ‘know’ it to even four decimal points.


      • schmenkman says:

        Agree with Dingbat on this one. There is something to be said for 1) cleaning up the clutter to make it easier to read, and 2) presenting batting averages in the customary format (three decimals and no leading zero). In Excel that’s easy enough to do with the #.000 format, but I don’t know where this data is copied from (I’m assuming you didn’t manually enter all 9 decimal places).


      • schmenkman says:

        *three decimal places, that is


  7. Nate says:

    Looks like Coco Crisp just missed the cutoff with a disparity of .059. His LD% is down and his GB% and FB% are up, which must take a little off its value, but a .219 BABIP must suggest some sort of progression. Does anyone see any stats suggesting anything different?


  8. MikeInNJ says:

    OK, I know I’m probably missing something obvious here…but how can a BABIP be *lower than* a guy’s batting average? Bautista is sporting a BABIP well below his real BA and this is befuddling me. Help a confused brother out…???


    • AF says:

      To get BABIP, you subtract HR from hits and K’s+HR from at-bats. So if you have a lot of homers and not too many strikeouts, it’s possible to have a BABIP lower than BA.

      For example, say you hit .250 with 11 HR and 20 Ks in 100 AB. Your BABIP is (25-11)/(100-11-20) = 14/69 = .203, which is less than your .250 batting average.
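
The arithmetic behind a BABIP that sits below batting average can be checked with a short sketch. It uses the simplified formula from this comment, (H - HR) / (AB - K - HR), with the hypothetical line of 25 hits, 11 HR and 20 K in 100 AB. The full definition also adds sacrifice flies to the denominator, which this sketch ignores, and the function name is illustrative.

```python
def babip_simplified(hits, home_runs, strikeouts, at_bats):
    """(H - HR) / (AB - K - HR); sacrifice flies are ignored here."""
    return (hits - home_runs) / (at_bats - strikeouts - home_runs)


# The hypothetical line from the comment above: .250 average, 11 HR, 20 K, 100 AB.
hits, home_runs, strikeouts, at_bats = 25, 11, 20, 100

batting_average = hits / at_bats
babip = babip_simplified(hits, home_runs, strikeouts, at_bats)

print(f"BA    = {batting_average:.3f}")
print(f"BABIP = {babip:.3f}")
```

Home runs come out of both the numerator and the denominator, so a hitter with many homers and few strikeouts loses a larger share of his hits than of his at-bats, and his BABIP can fall below his batting average.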


      • MikeInNJ says:

        Ahh, thanks. Good to know I wasn’t missing something obvious. That’s a bit different than what the glossary says which implies BABIP=H/BIP.

        “Batting Average on Balls In Play (BABIP) measures how many of a batter’s balls in play go for hits. While typically around 30% of all balls in play fall for hits, there are three main variables that can affect BABIP rates for individual players:”


      • AF says:

        Mike, what you’re missing is that HRs are not balls in play. So it’s not H/BIP, it’s (H-HR)/BIP.


  9. jdm says:

    This may be a dumb question but I am assuming this linear regression for xBABIP uses the data for all MLB players for multiple seasons and not a select group of hitters. Then this would suggest it represents an equation for the interaction between all hitters and pitchers. If this is the case, can the same concept, xBABIP-BABIP be applied to pitchers to analyze individuals? I feel like I’ve read an article on here that articulates the notion that pitchers have more control over BABIP than hitters (i.e. inducing weakly hit grounders), if true it would limit the usefulness of this metric, but are there some limited statistical applications of xBABIP to pitchers?


    • jdm says:

      I am guessing that it would be a poor metric to use, considering that BABIP is highly variable and that ERA in relation to BABIP would be highly dependent on the interaction of LOB% and BABIP. Isolating BABIP’s effect on ERA and considering xBABIP in a season wouldn’t be all that great of a predictor or all that meaningful, but I’m just a little curious about your perspective on it.


  10. Sam says:

    I think what could be even more useful, particularly for pitchers, in evaluating players and determining whether BA will regress or improve is taking a close look at BABIP by league and even division.

    Especially for pitchers NL vs AL could make a difference, or AL East vs AL west could also make a difference, particularly since AL East Parks such as Fenway and Yankee Stadium play hitter heavy (doubles off the monster anyone? That could drive a BABIP).

    I guess what I’m saying is that if we are trying, in a way, to predict a player’s BABIP over the course of the rest of the season in order to try and figure out whether he will sustain a BA or not, it might be helpful to know that a particular player is going to play a certain percentage of his remaining games against a certain set of opponents and in a certain set of parks and versus a certain set of pitchers.

    If player A is going to play 60% of his games at Fenway and Yankee Stadium versus Petco and Safeco. Or if a guy is gonna get a bunch of games in the second half versus the Royals pitching staff versus the Nationals staff.

    Considering this might shed light on why a guy has a .368 BABIP versus another guy with a .250 BABIP. Sometimes it’s who you’re playing, 30-40 AB a year in the AL Central versus Verlander can’t be good for these kinds of metrics, you know?


    • Josh says:

      Sam,

      Who is getting 30-40 ABs a year vs Verlander? At the most they might see him 4 times (12-16 ABs). More likely in the 6-12 range.

