Altuve vs. Judge: Should Awards Be Based on Skill or Results?

Last week, before the awards were announced, Bill James argued on Twitter that Altuve should be the MVP. His reasoning was that Judge had the highest run value, but Altuve had the highest win value, because Judge lost a couple of wins by being un-clutch. Tom Tango partly backed him in that argument.

We know that clutch performance exists and that it does affect real-world wins, and stats like WPA try to capture it. However, we also know that the year-to-year correlation of clutch performance is weak, at least for hitters, so it can be considered largely random. For that reason, many people in the Twitter discussion rejected Bill’s argument and said we should go with run-based values.

I thought so too, but then again, we often do use results-based metrics.

For example, many people still judge pitchers by ERA or ERA+, even though both are heavily affected by factors like luck and defense. Because of that, people have started to use context-independent stats like FIP for pitchers. The writers are still split on this, although they are starting to lean more toward FIP, as the Scherzer vs. Kershaw vote showed. We know that a small part of a pitcher’s under- or over-performance relative to his FIP is skill, but most of it is fairly random.

With hitters, things are very different. While people see batting average as a stat that can be affected by luck, more advanced hitting stats like OBP or wRC+ are usually treated as measures of pure skill, even though they are affected by randomness too. One factor is genuine over-performance that is not sustainable: a guy with a 27% line-drive rate likely won’t repeat it, even though he really did hit those liners and thus earned his BABIP. But there is also random BABIP luck, as well as HR/FB luck. Still, unlike with pitching, the existing context-independent stats like xwOBA are not yet used for hitters. There are some good reasons for that: ERA is also affected by the pitcher’s own team defense, while hitters face a mix of defenses, so hitting results are a little less random. Even so, randomness can make a big difference. For example, xwOBA expected Marwin Gonzalez and Chris Taylor to be only a bit above league average, but their actual results were great. This affects not only their wRC+ but also their supposedly objective WAR.

I feel we don’t have a consistent stance here. In some cases we use context-dependent stats (like WAR for hitters or RA9 WAR for pitchers), and in other cases we use context-independent stats like FIP (although even FIP isn’t fully context-independent, since it depends on HR/FB luck to some degree; that could be corrected by using xStats-based HR rates). If we want to base our awards on results, then Bill James is correct, and we should probably include context in our value metrics (WPA, for example). But if we want to go by skill, we need to ignore that context, and we probably also need to change how we evaluate WAR, wRC+, and other supposedly objective stats.
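To make the HR/FB point concrete, here is a minimal sketch of the standard FIP formula, (13×HR + 3×(BB+HBP) − 2×K) / IP plus a league constant, with the option of plugging in an expected home-run count instead of actual home runs. The constant of 3.10, the function names, and the sample pitcher line are hypothetical placeholders. The real constant is recalculated every season so that league FIP matches league ERA, and the expected-HR input assumes you already have an xStats-based estimate from somewhere.

```python
def fip(hr, bb, hbp, k, ip, fip_constant=3.10):
    """Fielding Independent Pitching on the ERA scale.

    ip is true innings (outs / 3), not the 200.1-style box-score notation.
    fip_constant is a hypothetical placeholder; the real value changes each season.
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + fip_constant


def fip_with_expected_hr(expected_hr, bb, hbp, k, ip, fip_constant=3.10):
    """Same formula, but with actual home runs swapped for an expected count.

    expected_hr is assumed to come from some xStats-based HR model (hypothetical),
    which strips HR/FB luck out of the pitcher's line.
    """
    return fip(expected_hr, bb, hbp, k, ip, fip_constant)


if __name__ == "__main__":
    # Hypothetical pitcher: 25 HR allowed, but only 18 expected from contact quality.
    actual = fip(hr=25, bb=50, hbp=5, k=200, ip=200)
    luck_neutral = fip_with_expected_hr(expected_hr=18, bb=50, hbp=5, k=200, ip=200)
    print(f"FIP: {actual:.3f}, HR-luck-neutral FIP: {luck_neutral:.3f}")
```

With these made-up numbers, the pitcher’s FIP drops from about 3.55 to about 3.10 once the extra home runs are treated as bad luck rather than skill.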

Of course, there is also a third way: we can accept that there is no single objective way to judge awards, and that everyone has their own set of criteria. That is a perfectly valid solution, although it doesn’t really appeal to me personally, because if we argue like that, why not go back to the triple slash line? (That was a rhetorical question; I would hate that.)