Eating Crow: xBABIP and the Shift

A few days ago, I looked at the effects the shift may be having on players using the difference between their BABIP and xBABIP. The observed drop in a player’s BABIP, compared to their xBABIP, was 41 points. As reader phoenix2042 pointed out, I was using a dated formula for xBABIP. By using an updated xBABIP formula, I still found a difference, but not as much of one.

The main problem with using the old xBABIP formula is that the league-wide BABIP has dropped over the last couple of years. Here are the BABIP values for all seasons for which batted ball data is available here on FanGraphs:

Season: BABIP
2002: 0.293
2003: 0.294
2004: 0.297
2005: 0.295
2006: 0.301
2007: 0.303
2008: 0.300
2009: 0.299
2010: 0.297
2011: 0.295
2012: 0.295
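
As a rough check on the size of the decline, the sketch below averages the seasonal values above over two windows, 2007 to 2009 and 2010 to 2012. It takes an unweighted mean of the season figures, which is a simplifying assumption; a true multi-year league BABIP would weight each season by its balls in play. The function name is made up for illustration.

```python
# League-wide BABIP by season, copied from the list above.
LEAGUE_BABIP = {
    2002: 0.293, 2003: 0.294, 2004: 0.297, 2005: 0.295,
    2006: 0.301, 2007: 0.303, 2008: 0.300, 2009: 0.299,
    2010: 0.297, 2011: 0.295, 2012: 0.295,
}


def mean_babip(first_season, last_season):
    """Unweighted mean of seasonal BABIP over an inclusive range (assumption)."""
    values = [LEAGUE_BABIP[year] for year in range(first_season, last_season + 1)]
    return sum(values) / len(values)


early = mean_babip(2007, 2009)
late = mean_babip(2010, 2012)
print(f"2007-2009 mean: {early:.3f}")
print(f"2010-2012 mean: {late:.3f}")
print(f"Drop: {(early - late) * 1000:.0f} points")
```

Under that assumption, the later window sits roughly five points below the earlier one.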

The original published xBABIP formula used 2007 to 2009 data, which were some of the highest BABIP values for the years being examined. I got hold of Robert Boden (aka slash12) and asked him to re-run the xBABIP formula with recent data. He gladly re-ran it, and here are his comments on the new formula:

I know more now than I did when I originally developed my first xBABIP equation. So I decided to go back to the drawing board and do something from scratch. I think the resulting equation should be significantly more accurate at calculating xBABIP. One big improvement is that it now incorporates bunt hits. I also re-worked the regression to use individual batted balls instead of batted ball percentages. This will result in a better, more accurate equation.

The logic of the new equation is a little different. You earn your bunt hits, and your infield hits, so you get 100% credit for these in your xBABIP. Likewise, if you hit an infield fly ball you get 0% credit for that. What remains are line drives, outfield fly balls (non-home run), and ground balls (that weren’t infield hits). The equation assigns an expected BABIP to each of these remaining batted ball types.

He was able to keep the same basic formula and just change the year-to-year constants. Here is the formula and constants:

xBABIP = (( GB – IFH ) * (GB-IFH constant) + (FB – HR – IFFB) * (OFFB constant) + LD * (LD constant) + IFH + BUH ) / (GB + FB + LD + BU – HR – SH)

Constants   2009    2010    2011    2012    2009-2011 avg
GB-IFH      0.221   0.161   0.182   0.159   0.195
OFFB        0.098   0.156   0.148   0.121   0.134
LD          0.763   0.800   0.710   0.750   0.740
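
For readers who would rather script the calculation than use a spreadsheet, here is a minimal Python sketch of the formula and constants above. It reads the denominator as GB + FB + LD + BU - HR - SH, which is an assumption about the formula as printed. The function name, argument names, and the sample batting line are hypothetical and chosen only for illustration.

```python
# Constants copied from the table above. "avg" is the 2009-2011 avg column.
XBABIP_CONSTANTS = {
    "2009": {"gb": 0.221, "offb": 0.098, "ld": 0.763},
    "2010": {"gb": 0.161, "offb": 0.156, "ld": 0.800},
    "2011": {"gb": 0.182, "offb": 0.148, "ld": 0.710},
    "2012": {"gb": 0.159, "offb": 0.121, "ld": 0.750},
    "avg": {"gb": 0.195, "offb": 0.134, "ld": 0.740},
}


def xbabip(gb, fb, ld, bu, ifh, buh, iffb, hr, sh, season="avg"):
    """Expected BABIP from raw batted ball counts.

    Infield hits (ifh) and bunt hits (buh) get full credit, infield
    fly balls (iffb) get none, and the remaining ground balls, outfield
    fly balls, and line drives are multiplied by the season's constants.
    """
    c = XBABIP_CONSTANTS[season]
    expected_hits = (
        (gb - ifh) * c["gb"]
        + (fb - hr - iffb) * c["offb"]
        + ld * c["ld"]
        + ifh
        + buh
    )
    # Assumed denominator: balls in play, excluding home runs and sac bunts.
    balls_in_play = gb + fb + ld + bu - hr - sh
    return expected_hits / balls_in_play


# Hypothetical batting line, not a real player's data.
example = xbabip(gb=200, fb=150, ld=90, bu=5, ifh=12, buh=2,
                 iffb=15, hr=20, sh=3, season="avg")
print(f"xBABIP: {example:.3f}")
```

Passing `season="2012"` swaps in the single-season constants instead of the multi-year average.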

I have re-created a spreadsheet that people can use to quickly calculate xBABIP for themselves (Appendix).

Using the new xBABIP formula, I re-ran the analysis. In addition to the new formula, I added three new players (Jose Bautista, Josh Hamilton and Adrian Gonzalez) to the data group. The group’s average BABIP, weighted by PA, is 13 points lower than the group’s xBABIP. That difference is significantly less than the 41 point difference I previously calculated.
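
The PA weighting can be sketched as follows. The player rows are hypothetical placeholders, not the data behind the 13-point figure, and the function name is made up for illustration.

```python
def weighted_gap(players):
    """PA-weighted mean of (BABIP - xBABIP) across a group of players."""
    total_pa = sum(p["pa"] for p in players)
    weighted_sum = sum(p["pa"] * (p["babip"] - p["xbabip"]) for p in players)
    return weighted_sum / total_pa


# Hypothetical values for illustration only; not the article's data.
sample = [
    {"name": "Player A", "pa": 400, "babip": 0.280, "xbabip": 0.300},
    {"name": "Player B", "pa": 200, "babip": 0.310, "xbabip": 0.305},
]

gap = weighted_gap(sample)
print(f"Group BABIP minus xBABIP: {gap * 1000:+.1f} points")
```

Because the weights are the same for both statistics, this gives the same answer as subtracting the PA-weighted xBABIP from the PA-weighted BABIP.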

Here is an example player to show how much of an effect a shift may have on a player’s AVG:

Consider the following player:
600 AB
90 K (15% K%)
20 HR
10 SF
0.320 BABIP

The previous decline in BABIP by 0.041 dropped the player’s AVG to 0.266 from 0.300. Using the new value of 0.013 for the BABIP decline, the player’s AVG drops to 0.289.
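
The arithmetic behind these averages can be sketched in a few lines, assuming the standard definition BABIP = (H - HR) / (AB - K - HR + SF) and solving it for hits. The function name is made up for illustration.

```python
def avg_from_babip(ab, k, hr, sf, babip):
    """Back out batting average from BABIP = (H - HR) / (AB - K - HR + SF)."""
    balls_in_play = ab - k - hr + sf
    hits = babip * balls_in_play + hr
    return hits / ab


# The example player from the text.
AB, K, HR, SF, BABIP = 600, 90, 20, 10, 0.320

for drop in (0.0, 0.013, 0.041):
    avg = avg_from_babip(AB, K, HR, SF, BABIP - drop)
    print(f"BABIP drop {drop:.3f}: AVG {avg:.3f}")
```

With no drop, the player has 500 balls in play, 160 non-home run hits, and 180 hits overall, which is the 0.300 starting average. The 13-point drop removes 6.5 of those hits and gives 0.289.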

With the recent drop in league-wide BABIP, the previous xBABIP formula I used was dated. When I used it to calculate the difference between xBABIP and BABIP for players who are getting shifted, I found more of a difference than I would have found with an updated formula. Using the new formula, I still found a drop in BABIP, just not as much of one.

Appendix

The following is a procedure for downloading and using the xBABIP spreadsheet. First, download the spreadsheet from Google Docs by going to File, then Download As, and selecting the desired format (don’t select .csv). Open the spreadsheet in Excel or OpenOffice (they are the only two programs I verified). Next, go to a hitter’s Standard data (like for Dustin Pedroia). The Minor League data needs to be hidden by selecting the “Minor Leagues” link (red box in image). Select and copy all the yearly data (some funkiness happens with the career data).

Next, open the downloaded spreadsheet and paste the copied data into it (select/highlight the yellow box that designates the location to paste this data). Some of the columns are hidden in order to show only the data being used for the calculations.

Finally, the More Batted Ball data needs to be copied and pasted in the same way as the Standard data. Paste the More Batted Ball data after selecting/highlighting the blue box.

The xBABIP values will be automatically generated in 5 different columns. You will need to match up the correct year from the raw data to find the corresponding xBABIP value. Besides the xBABIP value that is generated, the BABIP value is also calculated. I hope you find the information useful; let me know if you have any questions.


Jeff writes for FanGraphs, The Hardball Times and Royals Review, as well as his own website, Baseball Heat Maps with his brother Darrell. In tandem with Bill Petti, he won the 2013 SABR Analytics Research Award for Contemporary Analysis. Follow him on Twitter @jeffwzimmerman.


7 Responses to “Eating Crow: xBABIP and the Shift”

  1. phoenix2042 says:

    This is awesome. Thanks so much for doing this. I think this should actually be a regular fangraphs post, because it concerns more than just fantasy implications, but a leaguewide strategy of increased shifting. Some commentators have been calling this the “year of the shift,” and I think you have shown here that it makes a clear difference in BABIP. The next thing I wonder is if it affects extra bases: I know some players talk about taking the ball the other way to beat the shift for infield singles that roll down the line where no one is playing. But that takes away the possibility of the extra base hit or HR down the pull side.

  2. mcbrown says:

    Two quick reactions:

    1. The league-wide BABIP already reflects the overall amount of shifting being done, so comparing BABIP to xBABIP for an individual player where his xBABIP has been calculated from a regression over the entire league (i.e. one which includes both shifted and non-shifted defenses) may not actually tell us how the shift is affecting that player unless we know exactly how many more shifts he is facing than league average (and this is before we even consider the small sample size issues with an individual BABIP over even an entire season).

    2. The year-to-year volatility in the calculated xBABIP constants makes me rather uneasy. There doesn’t seem to be a good real-world explanation for why e.g. OFFB should have led to 60% more hits in 2010 than 2009, which leads me to believe the regressions may be over-fit.

    3. If 2012 is the year of the shift as many are saying, shouldn’t we expect the league-wide BABIP to be lower than last year? I find it puzzling that it isn’t, unless (drum roll) the general effectiveness of the shift has been overstated.

    • mcbrown says:

      Er, 3 quick reactions.

    • slash12 says:

      In response to your #2, I wonder if batted ball classification bias has anything to do with it. Perhaps in 2009 more “flyball/liner in-betweeners” were classified as line drives, and maybe in 2010 more of the same hits were classified as fly balls. Similarly, GBs look out of whack in 2009 as well (indicating more “ground ball/line drive in-betweeners” went the ground ball route). Do we know how accurate and consistent the batted ball classification systems are? That would be another interesting thing to research.

  3. etrain says:

    Great tool and study – I think Ichiro is an interesting case. Is BABIP a declining skill (waning speed, less solid contact)? If I did this right (and I think I did), Ichiro’s xBABIP for 2012 is .320 (a bit off from his career norms) and his BABIP is .263 leaving a -0.074 differential.

    So would one expect regression in Ichiro’s BABIP (in the best possible meaning of the word) or is this the sign of a rapidly declining skillset (or both)?

  4. R M says:

    Interesting…

    On a side note, I really hate the expression “eating crow”. And I only see it on this site.
