A few days ago, I looked at the effects the shift may be having on players using the difference between their BABIP and xBABIP. The observed drop in a player’s BABIP, compared to their xBABIP, was 41 points. As reader phoenix2042 pointed out, I was using a dated formula for xBABIP. By using an updated xBABIP formula, I still found a difference, but not as much of one.
The main problem with using the old xBABIP formula is that the league-wide BABIP has dropped over the last couple of years. Here are the BABIP values for all seasons for which batted ball data is available here on Fangraphs:
The original published xBABIP formula used 2007 to 2009 data, which had some of the highest BABIP values of the years being examined. I got hold of Robert Boden (aka slash12) and asked him to re-run the xBABIP formula with recent data. He gladly re-ran it, and here are his comments on the new formula:
I know more now than I did when I originally developed my first xBABIP equation. So I decided to go back to the drawing board and do something from scratch. I think the resulting equation should be significantly more accurate at calculating xBABIP. One big improvement is that it now incorporates bunt hits. I also re-worked the regression to use individual batted balls instead of batted ball percentages. This will result in a better, more accurate equation.
The logic of the new equation is a little different. You earn your bunt hits and your infield hits, so you get 100% credit for these in your xBABIP. Likewise, if you hit an infield fly ball you get 0% credit for that. What remains are line drives, outfield fly balls (non-home run), and ground balls (that weren’t infield hits). The equation assigns an expected BABIP to each of these remaining batted ball types.
He was able to keep the same basic formula and just change the year-to-year constants. Here is the formula and constants:
xBABIP = ((GB – IFH) * (GB-IFH Constant) + (FB – HR – IFFB) * (OFFB Constant) + LD * (LD Constant) + IFH + BUH) / (GB + FB + LD + BU – HR – SH)
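For anyone who would rather script the calculation than use a spreadsheet, here is a minimal Python sketch of the formula above. It reads the denominator as GB + FB + LD + BU – HR – SH. The function and argument names are my own, and the three constants passed in the example are placeholders chosen only to exercise the code. They are not the actual year-to-year constants.

```python
def xbabip(gb, fb, ld, bu, ifh, buh, iffb, hr, sh,
           gb_const, offb_const, ld_const):
    """Expected BABIP from raw batted-ball counts.

    gb_const, offb_const and ld_const are the year-specific constants for
    ground balls (excluding infield hits), outfield fly balls (excluding
    home runs) and line drives. They must be supplied by the caller.
    """
    expected_hits = (
        (gb - ifh) * gb_const            # ground balls that were not infield hits
        + (fb - hr - iffb) * offb_const  # outfield fly balls that stayed in the park
        + ld * ld_const                  # line drives
        + ifh                            # infield hits get full credit
        + buh                            # bunt hits get full credit
    )
    balls_in_play = gb + fb + ld + bu - hr - sh
    if balls_in_play <= 0:
        raise ValueError("no balls in play; xBABIP is undefined")
    return expected_hits / balls_in_play


# Hypothetical season line and placeholder constants (illustration only).
example = xbabip(gb=200, fb=150, ld=90, bu=5, ifh=12, buh=2, iffb=15,
                 hr=20, sh=3,
                 gb_const=0.22, offb_const=0.14, ld_const=0.70)
print(round(example, 3))
```

Passing the constants as arguments keeps the function usable for any season once the correct values are plugged in.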
I have re-created a spreadsheet that people can use to quickly calculate xBABIP for themselves (Appendix).
Using the new xBABIP formula, I re-ran the analysis. In addition to the new formula, I added three new players (Jose Bautista, Josh Hamilton and Adrian Gonzalez) to the data group. The group’s average BABIP, weighted by PA, is 13 points lower than the group’s xBABIP. This is significantly less than the 41-point difference I previously calculated.
Here is an example player to show how much of an effect a shift may have on a player’s AVG:
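The PA weighting can be sketched in a few lines of Python. The rows below are invented numbers, not the actual players in the data group, and the function name is hypothetical. The point is only to show that hitters with more plate appearances count for more in the group average.

```python
def pa_weighted_gap(players):
    """PA-weighted mean of (BABIP - xBABIP).

    players is a list of (pa, babip, xbabip) tuples.
    """
    total_pa = sum(pa for pa, _, _ in players)
    if total_pa == 0:
        raise ValueError("total PA is zero")
    weighted = sum(pa * (babip - xbabip) for pa, babip, xbabip in players)
    return weighted / total_pa


# Made-up rows for illustration: (PA, BABIP, xBABIP)
sample = [
    (600, 0.290, 0.300),
    (400, 0.310, 0.315),
    (200, 0.280, 0.300),
]
gap = pa_weighted_gap(sample)
print(f"{gap * 1000:.0f} points")
```

A negative result means the group's actual BABIP sits below its xBABIP.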
Consider the following player:
90 K (15% K%)
The previous decline in BABIP by 0.041 dropped the player’s AVG to 0.267 from 0.300. Using the new value of 0.013 for the BABIP decline, the player’s AVG drops to 0.289.
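To see how a BABIP drop carries through to AVG, here is a rough Python sketch based on the standard definition BABIP = (H – HR) / (AB – K – HR + SF). The 90 strikeouts come from the example player (at a 15% K%, that implies 600 PA). The AB, HR and SF totals below are my own assumptions for illustration, so the output is not meant to reproduce the figures above exactly.

```python
def avg_from_babip(babip, ab, k, hr, sf=0):
    """Batting average implied by a given BABIP and season line."""
    balls_in_play = ab - k - hr + sf
    hits = babip * balls_in_play + hr  # home runs are hits but not balls in play
    return hits / ab


def avg_drop(babip_drop, ab, k, hr, sf=0):
    """Points of AVG lost for a given loss in BABIP, other stats unchanged."""
    return babip_drop * (ab - k - hr + sf) / ab


# K = 90 is from the example player; AB, HR and SF are assumed values.
AB, K, HR, SF = 540, 90, 20, 5
for drop in (0.041, 0.013):
    print(f"BABIP drop {drop:.3f} -> AVG drop {avg_drop(drop, AB, K, HR, SF):.3f}")
```

Only the share of at-bats that end with a ball in play is affected, so the AVG drop is always somewhat smaller than the BABIP drop.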
With the recent drop in league-wide BABIP, the previous xBABIP formula I used was dated. When I used it to calculate the difference between the xBABIP and BABIP of players who are getting shifted, I found a larger difference than the updated formula supports. Using the new formula, I still found a drop in BABIP, just not as large a one.
The following is a procedure for downloading and using the xBABIP spreadsheet. First, download the spreadsheet from Google Docs by going to File, then Download As, and selecting the desired format (don’t select .csv). Open the spreadsheet in Excel or OpenOffice (these are the only two programs I verified). Next, go to a hitter’s Standard data (like for Dustin Pedroia). The Minor League data needs to be hidden by selecting the “Minor Leagues” link (red box in image). Select and copy all the yearly data (some funkiness happens with the career data).
Next, open the downloaded spreadsheet and paste the copied data into it (select/highlight the yellow box that designates the location to paste this data). Some of the columns are hidden so that only the data used in the calculations is shown.
Finally, the More Batted Ball data needs to be copied and pasted like the Standard data. Paste the More Batted Ball data after selecting/highlighting the blue box.
The xBABIP values will be automatically generated in 5 different columns. You will need to match up the correct year from the raw data to find the corresponding xBABIP value. Besides the xBABIP value, the actual BABIP value is also calculated. I hope you find the information useful. Let me know if you have any questions.