2016 has been a garbage year. At least, that’s what everyone seems to be talking about right now as the year draws to a close. But here in the baseball world, it’s been a banner year for many reasons, not the least of which is the new era of analysis that has arrived thanks to publicly available Statcast data. I, and I’m sure every other FG reader, have enjoyed following the quality Statcast analysis being developed in these electronic pages, particularly Andrew Perpetua’s “xStats”. In fact, I’m going to go ahead and stake the claim that I may have ‘coined’ (or at least influenced the creation of) the term xStats in the comments section of Andrew’s first xBABIP post. Inspired by the work of Perpetua, along with Alex Chamberlain (BIS-based xBABIP and xISO), and frequent leaguemate and Trevor-Story-lover Andrew Dominijanni (statcast xISO), I’ve decided to spend the offseason digging into xStats a bit deeper.
Perpetua has developed a great set of data using his binning strategy, most recently explained and updated this week, producing xBABIP, xBACON, and xOBA numbers based on Statcast’s exit velocity/launch angle data, along with the resulting ‘expected’ versions of the typical slash-line stats, xAVG/xOBP/xSLG. Throughout the year, I followed these stats fairly closely, often using ‘xStats’ to influence my fantasy baseball decisions. Given the opaque nature of translating a slash-line to actual fantasy stats, I generally went to the spreadsheet with the simple question “over- or under-performing?”, but that was about as far as I got. I found myself coming to probably-wrong conclusions such as “hey, maybe Sandy Leon isn’t actually that bad.” I was frustrated at my inability to turn a seemingly useful tool into actionable numbers for fantasy purposes.
This post serves as a starting point for that translation process. Way back in 2011, Jeff Zimmerman explained a basic approach for projecting R and RBI using only AVG, BB%, and HR% as inputs. I’ll similarly start here by coming up with simple models that translate rate stats (AVG, OBP, ISO) into fantasy-relevant ones, and then finally sub in the ‘x’ versions of those stats to come up with an ‘xFantasy’ line. I’ll stress that these are meant to be simple — I train the models based on all players that reached at least 300 PA in 2016, and I introduce a few team-related factors and shortcuts to improve fits, but I’m not looking to create a new Steamer or ZiPS here, just easy translations.
Starting with the surprisingly easy model, HR per PA is modeled well by ISO alone, with an R2 of .902 (excuse my simpleton’s application of statistics here; if you’re hoping for RMSE, p-values, etc., this will be a very disappointing post for you).
HR/PA = 0.2814*ISO – 0.01553
Runs and Runs Batted In
R and RBI per PA are interesting given their strong dependence on lineup position. To de-convolute that a bit, I’ve combined R+RBI into a single category (we can always separate them later). ‘R+RBI’ could be modeled using SLG alone, with an R2 of .758, but we can do better by separating SLG into AVG and ISO, and including terms for ‘team R+RBI total’ (player R/RBI totals are influenced by the team’s overall run production) and ‘average batting order position.’ Tanner Bell’s preseason post from this year explains and tabulates the influence of team offense and lineup position on R+RBI production. After doing some work to combine and normalize the data from Tanner’s tables, you can see the dependence of R+RBI/PA on lineup position can be roughly modeled as quadratic:
Average batting order position doesn’t appear to be easily accessible within the FanGraphs leaderboards, but thanks to the new ‘splits leaderboards’, it is possible to calculate with some elbow grease. Integrating all these factors to modify the original SLG model, R+RBI/PA is modeled by ‘SLGmod’ with an R2 of .807.
R+RBI/PA = 0.3292*SLGmod – 0.04751
SLGmod = AVG + 1.800*ISO + 2.061e-4*TeamR+RBI – 2.023e-3*ABO2 + 1.227e-2*ABO
TeamR+RBI = season total R+RBI for player’s team
ABO = average position of player in batting order
I mentioned that R+RBI could be separated later. Rather than demand the model predict the breakdown of R vs. RBI for each player, and introduce more sources of variation, I’m taking a shortcut here. The model calculates a value of x(R+RBI), and that is decomposed into R and RBI according to the actual proportion of R vs. RBI accumulated by the player in 2016. For instance, Mike Trout had 123 R and 100 RBI (223 R+RBI), and the model predicts 214.3 R+RBI, so we’ll give him (123/223)*214.3 = 118.2 R, and (100/223)*214.3 = 96.1 RBI.
SB per PA is a strange beast, a stat that’s much more dependent upon the whims and opportunities of the player and team than it is on the physical speed of the player. It can be tough to model given the large number of players that never run, or very rarely run. Much like SLG and R+RBI, I found that the SPD metric alone predicts SB/PA well, with an R2 of .662 when using a third-order polynomial fit. Is SPD cheating a bit? Maybe. For the uninitiated, it uses SB%, SB attempt frequency, triples percentage, and runs-scored percentage as inputs. You can see how SB/PA would fall directly out of that calculation, especially given the fact that teams tend to only turn runners loose on the basepaths if they are above a certain SB%. In any case, I’ll continue by modifying SPD to improve the fit, though the contribution of xStats to SB/PA will be much smaller than for the other stats.
Two rate stats serve to improve the fit, and they make intuitive sense: OBP, as players need to be on base in order to steal bases, and ISO, as players that hit for too much power tend not to spend as much time standing on first base, trying to steal second. I’ll again include a team factor, ‘team SB/PA,’ to quantify teams’ (or managers’) willingness to send runners, as well as ‘average batting order position,’ as players near the middle of the order tend not to steal as often. In this case I may have failed my initial criteria of a simple model, but it’s nevertheless a nice fit. Integrating it all into ‘SPDmod’, we can model SB/PA with an R2 of .834.
SB/PA = 0.2200*SPDmod3 – 0.3524*SPDmod2 + 0.2132*SPDmod – .04170
SPDmod = SPD/10 + 0.8206*OBP – 0.4670*ISO + 9.180*TeamSB – 9.192e-4*ABO2
TeamSB = average steals per plate appearance for player’s team
Does batting average need its own section? I’m just going to use xAVG.
Now that I’ve reinvented the wheel and created a sort-of-okay way to calculate a 5×5 line based on rate stats, it’s a simple matter of substituting in the Perpetua xStats versions of AVG, OBP, and ISO to arrive at an ‘xFantasy’ line. I’ve also done a quick calculation of 2016 $ values using my normal z-score method, along with x$ values to allow easy comparison (no positional adjustments to either of them, though). The full sheet with 429 players’ 2016 xFantasy stats is found here, and I’ll include below the top-10 and bottom-10 players* whose lines improved/declined most when using xStats:
As one might hope, the top of the list is populated by several of the players that were identified as xStats’ undervalued darlings in 2016, like Mauer and Morales. In Belt, we might be seeing a place where park factors could improve xStats, though the disparity between his 17 HR and 29 xHR is still hard to ignore. Meanwhile, at the bottom of the list, it seems likely that the xSB model fails to adequately predict the SB totals for MLB’s most prolific runners, with Villar, Hamilton, and Nunez all getting hammered in the xSB category. But, it’s also possible that this is a knock-on effect from speedy players getting an unfair shake in xOBP. With Blackmon, it’s certainly possible that this is the other end of the park-factor spectrum, with his 20 xHR flagging way behind the 29 HR he put up.
Finally, one might ask how we solve the ‘Gary Sanchez problem,’ and it’d be quite useful to see what xStats project for players that only played partial seasons, to get an idea of what they ‘should’ have done over a full complement of PAs. Much like the ‘Steamer600’ projections hosted here at FanGraphs, I’ve calculated xFantasy600 values, where each player’s xFantasy line is normalized to 600 plate appearances. Or in other words, in this case, we’re evaluating players on a per-PA basis. Below, we have the top 20 players by xFantasy600 (x$600) in 2016:
Some new names rise to the top here, with Trea Turner, Gary Sanchez, and Trevor Story checking in as the third- (!!!), eighth- (!!), and 16th- (!) best players by xStats in 2016. On the one hand, they all appear to have over-performed in 2016 (check their wOBA vs. xOBA scores), but even regressing back to xStats in 2017 would comfortably land them among the best players in fantasy. The rest of this list is generally a who’s who of the best players in baseball, outside of Rickie Weeks, who was apparently highly effective as a platoon player last year. It’s fun to see that Big Papi went out on top, as the king of xFantasy. Miggy comes in at a very close No. 2, and I’ve seen him kicking around as a second-rounder on some early 2017 rankings – he might be the biggest bargain in drafts this year if that holds up. Overall, I’m very satisfied with this list’s ability to peg the best fantasy players, outside of the potential issue of underrating SBs.
The next step in this process is to evaluate xStats and xFantasy as a predictive tool. Throughout 2016, I pondered the fact that xStats might tell you more about “what happened” rather than “what will happen.” However, it’s hard to resist the allure of using them to project forward in-season, as they should stabilize faster than their standard statistical counterparts. One thing I have theorized is that xStats might be most helpful in evaluating ‘new swing’ guys, ‘new pitch’ guys, or new call-ups, as we wouldn’t expect traditional projection systems to capture these sorts of things. Craig Edwards has actually released an exceedingly timely look at “Did Exit Velocity Predict Second-Half Slumps, Rebounds?” I’ve now started work on the next chapter of the xFantasy story, comparing first-half and second-half numbers for 2015/2016 (the ‘Statcast era’) using traditional stats, xStats, and Steamer projections (h/t to Andrew Perpetua for updating his sheet to include first/second-half xStats splits).
This first look at xFantasy was a fun exploration of rudimentary projections and xStats. Hopefully others find it interesting; hit me up in the comments and let me know anything you might have noticed, or if you have any suggestions.