For a few years, it’s struck me as unusual that pitching and hitting metrics are asymmetric. If the metrics we use to evaluate one group (FIP or wRC+) are so good, why don’t we use them for the other?
One issue is that we’re not used to evaluating pitchers on an OPS-type basis, and similarly we’re not used to evaluating hitters on an ERA basis. Fine. But there’s a bigger issue: Why do pitching metrics put so much more emphasis on the removal of luck?
Most sabermetricians are aware of BABIP and recognize the pervasive impact it can have on a batting line, yet attempts to adjust hitter stats precisely for BABIP are surprisingly uncommon. A few xBABIP calculators do exist, but they haven’t caught on en masse the way FIP has. And xBABIP doesn’t appear on player pages at either FanGraphs or Baseball Prospectus.
xBABIP itself isn’t even the end goal. What you probably really want is xAVG/xOBP/xSLG, etc. Obtaining these is a bit cumbersome when you need to do the conversions yourself.
Moreover, it strikes me that xBABIP cannot be converted to xSLG without some ad hoc assumptions. Let’s say you conclude a player would have gained or lost 4 hits under neutral BABIP luck. What type of hits are those? All singles? 2 singles and 2 doubles? 1 single, 2 doubles, 1 triple? The exact composition of hits gained/lost affects SLG. Or maybe you assume ISO is unaffected by BABIP, but this too is ad hoc.
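To make the ambiguity concrete, here is a small sketch with made-up numbers: a hypothetical hitter with 500 at-bats and 200 total bases (neither figure comes from real data). It shows how the same 4 extra hits move SLG by different amounts depending on what kind of hits they are.

```python
# Total bases contributed by each type of ball-in-play hit.
BASES = {"1B": 1, "2B": 2, "3B": 3}


def slg_after_extra_hits(at_bats, total_bases, extra_hits):
    """SLG after adding hits (dict of hit type -> count), holding at-bats fixed."""
    added_bases = sum(BASES[kind] * n for kind, n in extra_hits.items())
    return (total_bases + added_bases) / at_bats


# Hypothetical baseline: 500 AB and 200 total bases (SLG .400).
AT_BATS, TOTAL_BASES = 500, 200

scenarios = {
    "4 singles": {"1B": 4},
    "2 singles, 2 doubles": {"1B": 2, "2B": 2},
    "1 single, 2 doubles, 1 triple": {"1B": 1, "2B": 2, "3B": 1},
}

for label, hits in scenarios.items():
    slg = slg_after_extra_hits(AT_BATS, TOTAL_BASES, hits)
    print(f"{label}: SLG = {slg:.3f}")
```

Under these assumptions the three scenarios give SLG of .408, .412, and .416, so the choice of composition alone is worth several points of slugging.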
At least to me, whenever a hitter performs better/worse than expected, we really care to know two things:
1. Is it driven by BABIP?
2. If so, what is the luck-neutral level of performance?
As I’ve attempted to illustrate, answering #2 is not so easy under existing methods. (Nor do people always even attempt to answer it, really.) Even answering #1 correctly takes a little bit of effort. (“True talent” BABIP changes with hitting style, so it isn’t always enough just to compare current vs. career BABIP. And then there are players with insufficient track record for career BABIP to be taken at face value.)
Compare this to pitchers. When a pitcher posts a surprisingly good/bad ERA, we readily consult FIP/xFIP/SIERA. Specific values, readily provided on the site. So why not for hitters?
Here I attempt to help fill this gap. The approach is to map a hitter’s peripheral performance to an entire distribution of hit outcomes. These “expected” values of singles, doubles, triples, home runs, and outs can then be used to compute “expected” versions of AVG, OBP, SLG, OPS, wOBA, etc.
Recovering xAVG and xOBP isn’t that different from current xBABIP-based approaches. The main extension is that, unlike xBABIP, this provides an empirical basis to recover xSLG, and also xWOBA.
- Calculate players’ rates of singles, doubles, triples, home runs, and outs among balls in play. (Unlike some other BABIP settings, I count home runs as “balls in play” to estimate an expected number.)
- Regress each rate separately on a common set of peripherals. You’ll now have predicted rates of each for each player. (Keeping the explanatory variables common throughout ensures the rates sum to 100%.)
- Multiply by the number of balls in play (again counting home runs) to get expected counts of singles, doubles, triples, home runs, and outs.
- Use these to compute expected versions of your preferred statistics.
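The steps above can be sketched roughly as follows. This is a minimal illustration on synthetic data, assuming ordinary least squares with a constant term; the three stand-in peripherals and all of the numbers are placeholders, not the actual regressors or results. It also checks the claim that a common set of explanatory variables makes the predicted rates sum to 100%.

```python
import numpy as np

rng = np.random.default_rng(0)
n_players = 200
outcomes = ["1B", "2B", "3B", "HR", "Out"]

# Synthetic peripherals (stand-ins for LD%, GB%, Spd, ...) plus a constant column.
peripherals = rng.normal(size=(n_players, 3))
X = np.column_stack([peripherals, np.ones(n_players)])

# Step 1: observed outcome rates among balls in play; each row sums to 1.
rates = rng.dirichlet([16, 5, 1, 4, 74], size=n_players)

# Step 2: regress every rate on the same X (lstsq fits all five columns at once).
coefs, *_ = np.linalg.lstsq(X, rates, rcond=None)
predicted_rates = X @ coefs

# The observed rates sum to 1 and X includes a constant, so the fitted rates do too.
assert np.allclose(predicted_rates.sum(axis=1), 1.0)

# Step 3: multiply by balls in play (home runs included) to get expected counts.
balls_in_play = rng.integers(50, 500, size=n_players)
expected_counts = predicted_rates * balls_in_play[:, None]

print(dict(zip(outcomes, expected_counts[0].round(1))))
```

One caveat with this linear setup: nothing stops an individual predicted rate from falling slightly below zero for an extreme player, even though the five rates still sum to one.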
What explanatory peripherals are appropriate? Initially I’ve used:
- Line drive rate, ground ball rate, flyball rate, popup rate
- Speed score
- Flyball distance (from BaseballHeatMaps.com), to approximate power
- Speed * ground ball rate
- Flyball distance * flyball rate
These explanatory variables differ somewhat from those in the xBABIP formula linked earlier. The main distinctions are adding flyball distance (think Miguel Cabrera vs. Ben Revere) and using Speed score instead of IFH%. (IFH% already embeds whether the ball went for a hit. Certainly in-sample this will improve model fit, but it might not be good for out-of-sample use.)
[Regression coefficient table; columns: Spd | FB Dist/1000 | FB Dist missing | (Spd*GB%)/1000 | (FB Dist*FB%)/10000 | LD% | GB% | FB% | IFFB%/100 | Pitcher dummy | Constant]
- These are rates among balls in play (including home runs)
- Each observation is a player-year (e.g. 2012 Mike Trout)
- I’ve used 2010-2012 data for these regressions
- Currently I’ve only grabbed flyball distance for players on the leaderboard at BaseballHeatMaps. This is usually about 300 players per year, or most of the “everyday regulars.” (Fear not, Ben Revere/Juan Pierre/etc. are included.) The remaining cases get an indicator for ‘FB Dist missing.’
- LD%, GB%, FB%, and IFFB% are coded so that 50% = 50, not 0.50.
- Pitcher dummy = 1 if LD% + GB% + FB% = 0 (a proxy for pitchers hitting). Initially I haven’t thrown out cases of pitcher hitting, nor other instances of limited PA.
- Notice the interaction terms. The full impact of GB% depends both on GB% and Speed; the full impact of FB% depends on both FB% and FB distance; etc. So don’t just look at Speed, GB%, FB%, or FB Distance in isolation.
- Don’t worry that the coefficients on the pitcher dummy “look” a bit funny for HR rate and Outs rate. (Remember that these cases also have LD%=0, GB%=0, and FB%=0.) In total the average predicted HR rate for pitchers is 0.01% and their predicted outs rate is 94%.
- Strictly speaking, these are backwards-looking estimators (as are FIP and its variants), but they might well prove useful in forecasting.
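The variable scalings and indicator flags described in the notes above could be assembled along these lines. This is only a sketch: the function names are hypothetical, flyball distance is presumed to be in feet, and zero-filling a missing flyball distance is my assumption about how the “FB Dist missing” indicator is paired with the distance terms. The second helper spells out why GB% should not be read in isolation.

```python
def build_features(spd, ld_pct, gb_pct, fb_pct, iffb_pct, fb_dist=None):
    """Return one regression row for a player-season.

    Percentages use a 0-100 scale (50 means 50%). fb_dist is the average
    flyball distance, or None when it is unavailable.
    """
    missing = fb_dist is None
    dist = 0.0 if missing else fb_dist  # assumption: zero-fill when missing
    return {
        "Spd": spd,
        "FB Dist/1000": dist / 1000,
        "FB Dist missing": 1.0 if missing else 0.0,
        "(Spd*GB%)/1000": spd * gb_pct / 1000,
        "(FB Dist*FB%)/10000": dist * fb_pct / 10000,
        "LD%": ld_pct,
        "GB%": gb_pct,
        "FB%": fb_pct,
        "IFFB%/100": iffb_pct / 100,
        "Pitcher dummy": 1.0 if ld_pct + gb_pct + fb_pct == 0 else 0.0,
        "Constant": 1.0,
    }


def predict_rate(coefs, features):
    """Predicted rate for one outcome: sum of coefficient * feature value."""
    return sum(coefs[name] * value for name, value in features.items())


def gb_marginal_effect(coefs, spd):
    """Change in a predicted rate per one-point rise in GB%, at a given Speed score."""
    return coefs["GB%"] + coefs["(Spd*GB%)/1000"] * spd / 1000


# Hypothetical speedy ground-ball hitter (values are illustrative only).
row = build_features(spd=7.0, ld_pct=20, gb_pct=55, fb_pct=25, iffb_pct=5, fb_dist=260)
print(row)
```

Because of the interaction term, the effect of an extra point of GB% is the GB% coefficient plus the interaction coefficient scaled by Speed, so the same ground-ball rate can help a fast hitter and hurt a slow one.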
I next calculate xAVG, xOBP, xSLG, xOPS, and xWOBA. For now, I’ve simply taken BB and K rates as given. (xBABIP-based approaches seem to do the same, often.)
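As a rough sketch of this step, the expected counts can be combined with actual strikeouts and walks as below. Several things here are my simplifications rather than part of the method: at-bats are approximated as balls in play (home runs included) plus strikeouts, sacrifices and intentional walks are ignored, and the wOBA weights are illustrative round numbers, since the real linear weights change every season. The class and function names are hypothetical.

```python
from dataclasses import dataclass

# Illustrative wOBA weights (an assumption): real linear weights vary by season.
WOBA_WEIGHTS = {"BB": 0.69, "HBP": 0.72, "1B": 0.88, "2B": 1.26, "3B": 1.59, "HR": 2.06}


@dataclass
class ExpectedLine:
    """Expected batted-ball counts from the model, plus actual K, BB, and HBP."""
    singles: float
    doubles: float
    triples: float
    home_runs: float
    outs_in_play: float
    strikeouts: int
    walks: int
    hit_by_pitch: int = 0


def expected_stats(line, weights=WOBA_WEIGHTS):
    hits = line.singles + line.doubles + line.triples + line.home_runs
    # Simplification: AB = balls in play (HR included) + strikeouts; sacrifices ignored.
    at_bats = hits + line.outs_in_play + line.strikeouts
    plate_appearances = at_bats + line.walks + line.hit_by_pitch
    total_bases = (line.singles + 2 * line.doubles
                   + 3 * line.triples + 4 * line.home_runs)
    woba_numerator = (weights["BB"] * line.walks + weights["HBP"] * line.hit_by_pitch
                      + weights["1B"] * line.singles + weights["2B"] * line.doubles
                      + weights["3B"] * line.triples + weights["HR"] * line.home_runs)
    xobp = (hits + line.walks + line.hit_by_pitch) / plate_appearances
    xslg = total_bases / at_bats
    return {
        "xAVG": hits / at_bats,
        "xOBP": xobp,
        "xSLG": xslg,
        "xOPS": xobp + xslg,
        "xWOBA": woba_numerator / plate_appearances,
    }


# Hypothetical player-season, not real data.
example = ExpectedLine(singles=100, doubles=30, triples=5, home_runs=20,
                       outs_in_play=295, strikeouts=100, walks=50)
for name, value in expected_stats(example).items():
    print(f"{name}: {value:.3f}")
```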
Early results are promising, as “expected” versions of AVG, OBP, SLG, OPS, and wOBA all outperform their unadjusted versions in predicting next-year performance. (At least for the years currently covered.)
Which players deviated most from their xWOBA? Here are the leaders/laggards for 2012, along with their 2013 performance:
| Leader | 2012 wOBA | 2012 xWOBA | Difference | 2013 wOBA | Laggard | 2012 wOBA | 2012 xWOBA | Difference | 2013 wOBA |
|---|---|---|---|---|---|---|---|---|---|
| Brandon Moss | 0.402 | 0.311 | 0.091 | 0.369 | Josh Harrison | 0.274 | 0.355 | -0.081 | 0.307 |
| Giancarlo Stanton | 0.405 | 0.332 | 0.073 | 0.368 | Ryan Raburn | 0.216 | 0.290 | -0.074 | 0.389 |
| Will Middlebrooks | 0.357 | 0.285 | 0.072 | 0.300 | Nick Hundley | 0.205 | 0.265 | -0.060 | 0.295 |
| Chris Carter | 0.369 | 0.298 | 0.071 | 0.337 | Jason Bay | 0.240 | 0.299 | -0.059 | 0.306 |
| John Mayberry | 0.303 | 0.238 | 0.065 | 0.298 | Eric Hosmer | 0.291 | 0.349 | -0.058 | 0.350 |
| Torii Hunter | 0.356 | 0.293 | 0.063 | 0.346 | Gerardo Parra | 0.317 | 0.369 | -0.052 | 0.326 |
| Jamey Carroll | 0.299 | 0.244 | 0.055 | 0.237 | Daniel Descalso | 0.278 | 0.328 | -0.050 | 0.284 |
| Cody Ross | 0.345 | 0.291 | 0.054 | 0.326 | Jason Kipnis | 0.315 | 0.365 | -0.050 | 0.357 |
| Melky Cabrera | 0.387 | 0.333 | 0.054 | 0.303 | Rod Barajas | 0.272 | 0.322 | -0.050 | – |
| Kendrys Morales | 0.339 | 0.286 | 0.053 | 0.342 | Cameron Maybin | 0.290 | 0.339 | -0.049 | 0.209 |
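As a quick spot check on the table above, the following sketch compares how far 2012 wOBA and 2012 xWOBA each landed from 2013 wOBA, using mean absolute error. The choice of error metric is mine, and Rod Barajas is left out because he has no 2013 figure.

```python
# (name, 2012 wOBA, 2012 xWOBA, 2013 wOBA), copied from the table above.
ROWS = [
    ("Brandon Moss", 0.402, 0.311, 0.369),
    ("Giancarlo Stanton", 0.405, 0.332, 0.368),
    ("Will Middlebrooks", 0.357, 0.285, 0.300),
    ("Chris Carter", 0.369, 0.298, 0.337),
    ("John Mayberry", 0.303, 0.238, 0.298),
    ("Torii Hunter", 0.356, 0.293, 0.346),
    ("Jamey Carroll", 0.299, 0.244, 0.237),
    ("Cody Ross", 0.345, 0.291, 0.326),
    ("Melky Cabrera", 0.387, 0.333, 0.303),
    ("Kendrys Morales", 0.339, 0.286, 0.342),
    ("Josh Harrison", 0.274, 0.355, 0.307),
    ("Ryan Raburn", 0.216, 0.290, 0.389),
    ("Nick Hundley", 0.205, 0.265, 0.295),
    ("Jason Bay", 0.240, 0.299, 0.306),
    ("Eric Hosmer", 0.291, 0.349, 0.350),
    ("Gerardo Parra", 0.317, 0.369, 0.326),
    ("Daniel Descalso", 0.278, 0.328, 0.284),
    ("Jason Kipnis", 0.315, 0.365, 0.357),
    ("Cameron Maybin", 0.290, 0.339, 0.209),
]


def mean_abs_error(predictions, actuals):
    """Average absolute gap between predicted and actual values."""
    return sum(abs(p - a) for p, a in zip(predictions, actuals)) / len(actuals)


next_year = [row[3] for row in ROWS]
mae_woba = mean_abs_error([row[1] for row in ROWS], next_year)
mae_xwoba = mean_abs_error([row[2] for row in ROWS], next_year)

print(f"2012 wOBA  -> 2013 wOBA, MAE: {mae_woba:.4f}")
print(f"2012 xWOBA -> 2013 wOBA, MAE: {mae_xwoba:.4f}")
```

On these 19 player-seasons xWOBA comes out with the smaller error (about 0.042 versus 0.047). Since these players were picked precisely because their gaps were the largest, this is only a spot check and not a substitute for testing on the full sample.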
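Is performance perfect? Obviously not. The model does quite well for some players (Will Middlebrooks and Eric Hosmer both landed within .015 of their 2012 xWOBA in 2013), moderately well for others, and poorly for a few (Kendrys Morales essentially repeated his 2012 wOBA, and Cameron Maybin fell well below both figures). This is not the end-all solution for xHitting.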
Some future work that I have in mind:
- A still more complete set of hitting peripherals. I’m thinking of park factors, batted ball direction, and possibly others.
- Testing partial-season performance
- Comparing results against projection systems like ZiPS and Steamer
Otherwise, my main hope from this piece is to stimulate greater discussion of evaluating hitters on a luck-neutral basis. Simply identifying certain players’ stats as being driven by BABIP is not enough; we really should give precise estimates of the underlying level of performance based on peripherals. We do this for pitchers, after all, with good success.
Above I’ve contributed my two cents for a concrete method to do this. A major extension to xBABIP-based approaches is that this offers an empirical basis to recover xSLG and xWOBA. While the model is far from perfect, even in its current form it generates “expected” versions of AVG, OBP, SLG, OPS, and wOBA that outperform their unadjusted versions in predicting subsequent-year performance. (Not just for leaders/laggards.)
Comments and suggestions are obviously welcome!