Over the last few years, Alex Chamberlain has published a series of posts exploring the concept of xISO. Like the better-known xFIP, this metric is meant to be an “expected” ISO, based on batted-ball metrics. Nobly, Alex kept his model quite simple, using only statistics available on the FanGraphs player pages: Hard%, FB%, and Pull%.
I have very little formal training in statistics; most of it is self-taught to help me in my day job, so I’m also going to keep things simple. Inspired by Alex’s work, I began to experiment with improving the xISO model. I started building linear models with more predictors, and even introduced higher-order and interaction terms. While these all improved the model slightly, I didn’t feel the added complexity was worth the small gain. Along the way, I noticed that, although Chamberlain mentions the correlation between first-half xISO and end-of-season ISO, when I calculated first-half xISO and compared it to second-half ISO, the original xISO model turned out to be a worse predictor of second-half ISO than actual first-half ISO was.
As I was running these calculations, I also became acquainted with the publicly available Statcast data through Daren Willman’s Baseball Savant site. Although gathering the input data becomes a bit more tedious, surely some combination of exit velocity and launch angle information would improve an xISO model, and perhaps yield a better correlation between first and second halves. Let us see!
First things first: since Statcast is so new, we only have one full season of data. Ideally, we could use multiple years of data to build the model, but for now we’ll train it on the full 2015 season. As it turns out, the Statcast parameter that correlates best with ISO is the average exit velocity on line drives and fly balls (LDFBEV). This makes sense, right? It also makes sense that we can exclude ground-ball exit velocity from an ISO predictor. Launch angle seems to have some relationship with ISO, but it’s relatively weak.
So, we’ll hang our predictive hats on LDFBEV and see what else can help. After constructing various models, we can pretty quickly see that Pull%, Center%, and Oppo% don’t add much additional explained variance between model and data, nor do Soft%, Med%, and Hard%. This isn’t surprising, since we already have an objective hard contact measure. Ultimately, the one traditional batted ball statistic that helps is GB%. In fact, in the final regression, adding GB% nets us about 18% more explained variance between model and data. This also makes sense. It’s pretty hard to hit a ground ball double or triple, and really hard to hit a home run.
So we’re down to two predictors, GB% and LDFBEV. If we ran a regression with only these two, we would undersell the players who hit the ball really hard. To fix that, we’ll include one more term in the regression: the square of the exit velocity. Throw in a constant term, and we’re ready to run the regression using all 2015 qualified hitters (141 of them). Here’s what comes out:
Right away, we see an R-squared value of 0.75. This is pretty decent; it means our really simple model explains 75% of the variance of the ISO data. The regression coefficients are as follows.
xISO = -0.358973*(GB) - 0.108255*(EV) + 0.00066305*(EV)^2 + 4.66285
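The equation above is easy to evaluate in a few lines. The snippet below is a sketch, not the author’s actual code: the 40% ground-ball rate and 93 mph exit velocity are made-up inputs for illustration, and GB is assumed to be entered as a fraction (the only scale on which these coefficients produce sensible ISO values). The `fit_xiso` helper shows how the regression itself could be refit by ordinary least squares if you gather your own data.

```python
import numpy as np

# Published regression coefficients (GB as a fraction, EV = LDFBEV in mph).
B_GB, B_EV, B_EV2, INTERCEPT = -0.358973, -0.108255, 0.00066305, 4.66285

def xiso(gb, ldfb_ev):
    """Expected ISO from ground-ball rate (fraction) and avg LD/FB exit velocity (mph)."""
    return B_GB * gb + B_EV * ldfb_ev + B_EV2 * ldfb_ev ** 2 + INTERCEPT

# Hypothetical hitter: 40% ground balls, 93 mph average on line drives/fly balls.
print(round(xiso(0.40, 93.0), 3))  # → 0.186

def fit_xiso(gb, ev, iso):
    """Refit the model: solve [GB, EV, EV^2, 1] * coefs = ISO by least squares."""
    X = np.column_stack([gb, ev, ev ** 2, np.ones_like(gb)])
    coefs, *_ = np.linalg.lstsq(X, iso, rcond=None)
    return coefs
```

The quadratic term is just another column in the design matrix, so the fit remains an ordinary linear regression.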
With this equation, one can look up the relevant data on FanGraphs and Baseball Savant and calculate the current xISO for any given player. We’ll get to that, but first, I think it’s important to check whether the new xISO model does a better job of predicting future performance than a player’s current ISO. One could also check how quickly xISO stabilizes compared to ISO, but I won’t attempt that here. What I will do is pull the necessary splits for GB%, LDFBEV, and ISO from FanGraphs and Baseball Savant, calculate 2015 first-half xISO for all qualified hitters, and compare it to second-half ISO. Unfortunately, the number of qualifying players common to both halves of 2015 was only 109, but this is what we have:
It’s hard to see from the plot, but the R-squared values tell the story: first half xISO does a better job than actual first half ISO at predicting second half ISO. Interestingly, it seems that several players significantly increased second half ISO compared to first half xISO or ISO, and relatively fewer saw a large decrease. I don’t know why this is, but perhaps it is related to the phenomenon detailed by Rob Arthur and Ben Lindbergh on the sudden power spike in 2015.
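The half-to-half comparison boils down to squaring the Pearson correlation between each first-half predictor and second-half ISO. Here is a minimal sketch; the five-player arrays are made-up numbers for illustration, not the real 109-player sample.

```python
import numpy as np

def r_squared(predictor, outcome):
    """Squared Pearson correlation between a predictor and an outcome."""
    r = np.corrcoef(predictor, outcome)[0, 1]
    return r ** 2

# Hypothetical illustration only (not the actual 109 qualified hitters).
first_half_iso  = np.array([0.150, 0.210, 0.180, 0.250, 0.120])
first_half_xiso = np.array([0.165, 0.195, 0.190, 0.230, 0.135])
second_half_iso = np.array([0.170, 0.200, 0.195, 0.225, 0.140])

print(r_squared(first_half_iso,  second_half_iso))
print(r_squared(first_half_xiso, second_half_iso))
```

Whichever predictor posts the higher R-squared against second-half ISO is the better forward-looking measure on that sample.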
Having roughly demonstrated the predictive power of our new xISO, let’s show its utility by looking at a few interesting 2016 performers, as of May 22nd:
Trevor Story: ISO = .327, xISO = .272
Domingo Santana: ISO = .142, xISO = .238
Troy Tulowitzki: ISO = .190, xISO = .182
Chris Carter: ISO = .349, xISO = .355
Christian Yelich: ISO = .205, xISO = .201
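A quick way to read that list is to rank the players by the gap between xISO and ISO; a positive gap flags a hitter whose batted-ball profile suggests improvement. A small sketch using the values above:

```python
# ISO and xISO values from the list above (through May 22, 2016).
players = {
    "Trevor Story":     (0.327, 0.272),
    "Domingo Santana":  (0.142, 0.238),
    "Troy Tulowitzki":  (0.190, 0.182),
    "Chris Carter":     (0.349, 0.355),
    "Christian Yelich": (0.205, 0.201),
}

# Positive gap = xISO above ISO, i.e. a candidate to improve.
gaps = sorted(((xiso - iso, name) for name, (iso, xiso) in players.items()),
              reverse=True)
for gap, name in gaps:
    print(f"{name:18s} {gap:+.3f}")
```

Santana comes out as the biggest underperformer and Story as the biggest overperformer, which matches the discussion below.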
One of the first half’s great surprises, Trevor Story has a slightly inflated ISO, but he does hit the ball pretty hard and does not hit many ground balls. While he probably won’t sustain an ISO north of .300, he’s a good bet to beat his Steamer ROS projected ISO of .191. Santana and Yelich are two guys who hit the ball hard but are held back by their ground-ball tendencies. Chris Carter currently leads the pack in LDFBEV and is a deserved second in ISO. Troy Tulowitzki fans: sorry, but it appears his days of .250 ISOs are a thing of the past.
So that’s it! We’ve got a cool new tool to use. Perhaps not surprisingly, I’ll be mostly using it for fantasy. Dedicated FanGraphs readers will also note that Andrew Perpetua has been doing work with Statcast data on “these electronic pages” recently as well. His use of launch angles introduces more sophistication into the models, but also more complication. My intent here is to present something which can be evaluated by anyone with a few clicks and a calculator. Please reach out with any qualms, criticisms, or suggestions for improvement!