A Discussion About Improving WAR

Jeff Passan is one of the most aggressive advocates for FanGraphs in the mainstream media, regularly citing data and concepts from our leaderboards and helping to educate the masses about different ways of viewing baseball. He’s certainly not an old-school guy who wants to be left alone with his pitcher wins and RBIs, and he’s more than happy to embrace new ideas supported by data. But he still has some problems with WAR, and specifically with the defensive component, which can allow lesser hitters to be listed among the most valuable players in the game alongside some of baseball’s greatest sluggers. To get a full sense of his argument, read the whole piece, but here’s a selection that sums it up:

Defense does have its place in WAR. Just not in its present incarnation, not until we know more. Not until we can account for positioning on the field. Not until we can find out the exact speed a ball leaves a bat and how quickly the fielder gets a jump and the angle on the ball and the efficiency with which he reaches it. Not until we understand more about fielding, which will allow us to understand how to properly mete out value on a defensive play, which may take years, yes, but look how long it took us to get to this point, where we know more about hitting and pitching than anyone ever thought possible.

The hackneyed Luddites who bleat “WAR, what is it good for, absolutely nothing” should not see this as a sympathetic view. On the contrary, WAR is an incredible idea, an effort to democratize arguments over who was best. Bringing any form of objectivity to such singularly subjective statements is extremely challenging and worthwhile work.

Which is why this at very least warrants more of a conversation among those who are in charge of it. They’ve changed WAR formulas before. They’ll change them again. And when they do, hopefully the reach of defensive metrics will be minimized.

I don’t agree with everything Passan wrote in the piece, but his criticisms of the metric aren’t entirely off base. It is easier to evaluate run scoring than run prevention. WAR is a flawed and imperfect model. Some of the assumptions in the construction of the model may be entirely incorrect, and as we get more information, we may very well find that some of the conclusions that WAR suggested were incorrect, and maybe not by a small amount. Just as the statistical community is quick to highlight the problems with pitcher wins and RBIs, it is fair for Passan to highlight the problems with WAR, especially if the purpose of that discussion is to help improve the model.

So let’s talk about Passan’s suggestion to improve WAR. Primarily, he suggests lowering the value of defense in the calculation, perhaps by regressing a player’s calculated value by some degree. This isn’t the first time this has been suggested, and there are plenty of people who I respect who hold a similar opinion. It’s not a crazy suggestion, and it might even be a better alternative. But let’s work through the implication of that change so that we can evaluate the two methods side by side.

Right now, we hand out 1,000 WAR per 2,430 games (30 teams each playing 162 games) and split it so that 570 of those 1,000 WAR go to position players, with the remaining 430 credited to the pitchers. This 57/43 split accounts for the fact that hitters are responsible for the entirety of the half of the game that is scoring runs, and also some portion of the half of the game that is preventing runs. The fact that we’re giving position players 57% of the pie implies that we think run prevention is 86% pitching and 14% defense, but those numbers weren’t handed down on stone tablets and reasonable people could argue for a different proportioning between pitchers and fielders.

If, for instance, the defensive component of WAR was simply halved — suggesting that pitchers are 93% responsible for run prevention, with defenders making up just 7% of the pie — we’d have to take the credit for the runs prevented and move them from the position players to the pitchers, so instead of a 57/43 split, we’d have a 53.5/46.5 overall split between position players and pitchers in WAR. Perhaps that’s preferable, but is there evidence for a smaller gap between position players and pitchers than what we are currently using in WAR?
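The arithmetic behind these splits can be sketched in a few lines of Python. This is only an illustration, assuming (as the paragraphs above do) that run scoring and run prevention are each half of the game and that position players receive all of the offensive half. The function names are hypothetical and are not part of any actual WAR calculation.

```python
def split_from_defense_share(defense_share):
    """Return (position_player_pct, pitcher_pct) of total WAR.

    defense_share is the fraction of run prevention credited to fielders.
    Assumption: offense and run prevention are each 50% of the game.
    """
    offense = 50.0
    fielding = 50.0 * defense_share
    pitching = 50.0 - fielding
    return offense + fielding, pitching


def defense_share_from_split(position_player_pct):
    """Invert the calculation: the fielding share implied by a given split."""
    return (position_player_pct - 50.0) / 50.0


print(split_from_defense_share(0.14))  # the current 14% fielding share
print(split_from_defense_share(0.07))  # the fielding share cut in half
print(defense_share_from_split(57.0))  # share implied by a 57/43 split
```

Plugging in other fielding shares (10%, 18%, and so on) shows how quickly the overall position player/pitcher balance moves as that one assumption changes.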

One way of testing this is to look at MLB’s actual payroll allocations. Back in February, Wendy Thurm helpfully broke down each team’s payroll, noting the totals and percentages that went to position players, starting pitchers, and relief pitchers. If we combine the totals for starters and relievers, and then combine line-up and bench, we can see what MLB teams have settled on as the position player/pitcher split, at least in terms of pay.

Overall, her numbers added up to just over $3.3 billion in total league expenditure on 2014 player salaries. Of that $3.3 billion, $1.9 billion was allocated to hitters and $1.4 billion was allocated to pitchers. The payroll split that teams have decided upon this year? 57/43, the same proportion we are currently using in WAR. While this is not any kind of definitive answer, we do not find evidence that the teams themselves are spending their money in a way that suggests that pitchers are closer in value to position players, which would be a necessary conclusion of constraining the defensive calculations.

And it’s not like the idea that position players are significantly more valuable than pitchers is a novel sabermetric concept. Beat writers have long argued that even the most elite pitchers are not strong MVP candidates because they don’t play every day, and thus aren’t as valuable as a position player who both hits and fields every day for six months. If we think that constraining defensive value would improve WAR, we have to simultaneously argue that pitchers have been dramatically underrated (and underpaid) for quite some time, as the historical batter/pitcher split in payroll has persisted over the years.

That may be the correct position, but it is worth understanding that diminishing the value of defense in WAR means that we have to explain why teams are overvaluing position players and undervaluing pitchers when it comes to spending. Maybe they are, but I think it’s worth considering that we don’t have much in the way of evidence that teams buy into a smaller position player/pitcher split than what is currently modeled. The fact that the division of WAR matches the division of payroll isn’t a smoking gun, but it is at least a point that should make us pause before we consider whether or not the imperfections of WAR could be improved by moving value from the position players’ side of the ledger to the pitchers’ side.

In fact, there appears to be as much evidence that the 57/43 split underweights defensive value as there is that it overweights it. Robert Arthur noted the following a few weeks ago when he modeled the value of defensive metrics in a linear regression attempting to encapsulate ERA.

Just as with BP’s FRAA, the UZR-based dWAR of FanGraphs contributes some accuracy to our model of ERA. And, as with BP’s defensive metric, if any error is being committed, it’s that we are not weighting defense enough. For optimal accuracy, we should be accentuating the differences between players’ defensive statistics, not regressing them.

These results shouldn’t be entirely surprising. Defensive WAR is not a truth revealed from on high; it was designed (by very capable sabermetricians) with full knowledge of the fact that it improved our understanding of runs allowed. The coefficients which translate defensive play into runs weren’t chosen arbitrarily from a hat or a random number generator, but rather calibrated with at least some attention given to the resulting models’ ability to fit things like ERA. For this reason, we shouldn’t be surprised to find that our defensive metrics are well-suited to predicting ERA. Indeed, I would bet that the small error observed in both models (FG and BP), in which defensive metrics are perhaps slightly underutilized, is by design.

Considering this experiment, I don’t think that there exists any particular issue with the weighting of defensive WAR as a whole, despite Passan’s argument. There might be a problem with Alex Gordon’s dWAR in particular (or Adeiny Hechavarria’s, or whoever’s). Yet, the overall weighting of dWAR is reasonably accurate, or it would have been discarded for something different.

It is entirely correct to state that we lack the confidence in our defensive estimates that we have in our offensive estimates. However, that statement remains true even if we regress defensive metrics, and constraining the defensive component of WAR might very well make the metric less correct, not more so. Any total value metric, like WAR, will have to make some assumption about the value of a position player’s defensive contribution. A smaller range of contributions might be more palatable, but may in fact be less correct.

Uncertainty goes both ways. There is not currently strong evidence that defensive metrics themselves are too aggressive in assigning 14% of the value of run prevention to position players. Just as the correct number might be 10% or 12%, it might also be 16% or 18%. I would suggest that there is more evidence in favor of something between 10-15% than there is for something between 5-10% (or 15-20%, for that matter), so we should at least be aware of the possibility that constraining defensive value would make WAR a worse model of player value, not a better one.

That said, this is not any kind of declaration that the 57/43 split is gospel and cannot be changed, or that we are not open to adjusting the defensive component of WAR in any way. We want the model to reflect the best possible use of the data we have, and of the public understanding of how baseball players should be valued. Certainly, there are flaws with the model that we would like to rectify, and we have had numerous discussions about how to implement changes that could address some of these issues.

Catcher framing, for instance, is something that will be a significant addition to WAR, and we have spent a lot of time thinking about the proper way to implement the very high values suggested by the framing metrics into WAR. We have not yet implemented it because we are not yet convinced that we have the best solution, so we have left the model incomplete and wildly incorrect for some players, while attempting to acknowledge that limitation along the way. The addition of framing — which probably isn’t too terribly far away at this point — will also have ramifications for how we think about pitcher WAR, as the acceptance of framing runs saved necessarily requires pitchers to receive altered credit for their contributions to walks and strikeouts at the least.

The reality is that disentangling run prevention is difficult and there is no magical solution that erases all of the model’s problems. Regressing the defensive component might be a worthwhile endeavor, and it’s something we have considered and will continue to consider. We understand that a lot of people prefer a runs allowed basis for pitching WAR, and have had many conversations about whether to alter the calculations that we use for pitchers. Passan’s critiques of the model might go too far, but he’s not wrong that the model has issues, and that there are areas to be improved upon.

We are not attempting to stand as gatekeepers of the current calculation, keeping better calculations out because this is what we have now. We’ve made a great number of changes to WAR over the years, from adding things like baserunning on non-stolen base plays for position players to crediting pitchers for their infield flies. Last year, we settled on a unified replacement level with Baseball Reference. The model is constantly being reviewed and updated, and our hope is that it continues to improve over time.

So I will put this question to our readers. Passan has suggested that WAR would be better if the defensive component was minimized, and that some of the credit for run prevention was moved from position players to pitchers. Do you agree? There’s a simple yes/no poll below, but I’m interested in hearing more in-depth responses as well. If you agree that a 57/43 split puts too much emphasis on position players’ contributions to run prevention, what is the batter/pitcher split that you would prefer? If you would prefer that we used a regressed version of a defensive component in WAR, how much would you regress the number, and to what mean would you regress it?
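To make the regression question concrete, here is a minimal Python sketch of what "regressing" a set of fielding values could look like. The fielding runs are made-up numbers, and the `regress` helper, its `amount` parameter, and the choice of mean are all assumptions for illustration; they are exactly the open choices the question above asks about, not a description of how WAR is computed.

```python
def regress(values, amount, mean=None):
    """Shrink each value toward `mean` by `amount`.

    amount = 0 leaves values unchanged; amount = 1 moves them all to the mean.
    If no mean is given, the average of the values themselves is used.
    """
    if mean is None:
        mean = sum(values) / len(values)
    return [mean + (1.0 - amount) * (v - mean) for v in values]


# Hypothetical fielding runs for four players (invented numbers, average of zero)
fielding_runs = [15.0, 5.0, -5.0, -15.0]
halved = regress(fielding_runs, amount=0.5)

print(halved)                           # each value is half as far from the mean
print(sum(fielding_runs), sum(halved))  # the totals handed out are the same
```

In this sketch, regressing toward the group's own average narrows the spread between the best and worst fielders without changing the total, while passing a different `mean` would shift the total as well.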

Our hope is that WAR is always as good a model as it can reasonably be. How would you make it better, considering the ramifications of the suggested changes? And can you show that changing the model would indeed make it better, and not just more palatable to our current perception of player value? If improvements to the model can be shown to be reasonable, they will be made. We are not ignorant of WAR’s imperfections, nor do we want them to continue any longer than need be. Our goal is to push the model forward, and we are open to suggestions on how to do just that.

We hoped you liked reading A Discussion About Improving WAR by Dave Cameron!

Dave is the Managing Editor of FanGraphs.

Carl
Guest

I don’t understand why regressing the defensive component of WAR necessarily involves changing the hitter/pitcher WAR balance. Couldn’t you regress defensive performances to the average while leaving the total amount of WAR unchanged?

TKDC
Guest

This is exactly what I was going to say. However, I’d only be in favor of this if it were done due to the belief that the metric is likely wrong for outliers. How exactly should it be regressed? Dan Uggla graded out as an above average defensive player in 2013 despite three consecutive years as below average, and getting way past his prime. Is that 6.5 number regressed the same as Leonys Martin’s 6.9 number this year? Are we just minimizing the differences between the best and worst outcomes? Aside from aesthetics, is there a good reason to do so?

Tangotiger
Guest

If you regress ALL fielding performance to the average, then you are evaluating those players ONLY on their offense.

How can you then allocate the 57/43 split to nonpitcher/pitchers, if the nonpitchers absorb all the offense and the pitchers absorb all the defense?

You can decide on using only the positional adjustment (so all SS get the same +7.5 run credit). While that will move you away from a 50/50 split, it still won’t move you enough.

Alternatively, you can do a “team fielding”, so instead of 57/43, you do 50/7/43, and just not allocate any fielding (other than positional adjustment).

There’s a lot to think about here.

Roger
Guest

Perhaps the positional adjustment should be a larger part of the 14% position players’ contribution to run prevention, and UZR less so. This would reduce the role of defensive measurement in WAR until it becomes more accurate, without reducing the overall value of defense.

Jianadaren
Guest

That’s exactly what regressing UZR to the mean would do. UZR would approach 0 and the positional adjustment would approach 100% of the 14%.

ReuschelCakes
Guest

One of Dave’s points that is lost is that there is no evidence that these defensive values should be regressed – it is only our *suspicion* of the data that makes us want to…

Said differently, no one questions an outlier offensive year like Chris Davis’ 52 batting runs / 6.8 WAR in 2013. Because we can observe his discrete outcomes, his 0.348 ISO, his 0.370 OBP, his 29.6% HR/FB, etc… we KNOW that this season is an outlier… but we also KNOW that he really did hit 53 HRs and 42 2Bs…

For dWAR we do not as explicitly/intuitively KNOW the latter and therefore ASSUME that the outliers are measurement-based rather than outcome-based…

munchtime
Guest

If you take a defensive metric (UZR, for example) and regress it to the mean before incorporating it into WAR, you certainly do not “evaluate players only on their offense”. It would create a smaller range of values, but there would still be a range of values.

What you are describing is replacing defensive metrics with a constant value for everyone. I haven’t seen anyone suggest doing that.

Jianadaren
Guest

What he meant is that as you regress a defensive metric to the mean (0 WAR) you approach “evaluat[ing] players only on their [oWAR]” – i.e. offense and positional adjustment only – because the defensive metric component will approach zero.

The 7% defense component would be made more and more out of pure positional adjustment and less and less out of the metric.

Cool Lester Smooth
Guest

But what if you give +7.5 to a guy who gets a +15 UZR, and a -7.5 to a guy who gets a -15 UZR? Wouldn’t that eliminate the problem you’re describing?

Cool Lester Smooth
Guest

(I’m not saying that’s the correct amount of regression, to be clear)

Catoblepas
Guest

Nope! Consider a hypothetical four-player league, with two pitchers and two hitters. 100 WAR is earned, split 57/43, with 50 going to hitters for hitting, 43 going to pitchers, and 7 going to hitters for fielding. Hitter A gets 28 offensive wins and 0 defensive wins, while Hitter B gets 22 offensive wins and 7 defensive wins. Hitter B is just barely worth more than Hitter A, 29 to 28 overall.
We decide that that’s way too wide a range of defensive value, and it needs to be regressed. Both hitters are regressed toward the mean of 3.5, so that the new defensive values are 2 and 5, respectively. Now, we didn’t change the amount of fielding value given out, but without any change in performance, we now see Hitter A as more valuable, at 30 wins, vs. Hitter B at 27. Offense has become more important, since the range of defensive values has narrowed and, consequently, so has its impact on our evaluation of the players.
This is a super simplified example obviously, but as far as I understand it the same forces are at play in the league, so hopefully it helps.
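The numbers in this hypothetical league can be checked with a short Python sketch. All values come straight from the example above; the dictionary and function names are just illustrative.

```python
# Offensive and defensive wins for the two hitters in the hypothetical league
offense = {"Hitter A": 28, "Hitter B": 22}
defense_raw = {"Hitter A": 0, "Hitter B": 7}
defense_regressed = {"Hitter A": 2, "Hitter B": 5}


def totals(offense, defense):
    """Add each hitter's offensive and defensive wins."""
    return {name: offense[name] + defense[name] for name in offense}


before = totals(offense, defense_raw)
after = totals(offense, defense_regressed)

print(before)  # Hitter B leads before regression
print(after)   # Hitter A leads after regression
# The total fielding value handed out is the same in both cases
print(sum(defense_raw.values()), sum(defense_regressed.values()))
```

The same 7 fielding wins are distributed either way, yet the ranking of the two hitters flips once the spread of defensive values is narrowed.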

Yirmiyahu
Member

Agreed. I don’t think the primary argument here is that WAR overvalues defense; it’s that UZR does a pretty clumsy job of measuring defense. It’s not unreasonable to halve the fielding runs that go into WAR, but that’s not because defense is less of a component; it’s because the numbers themselves are untrustworthy.

I think the more logical (and maybe more common) suggestion is to use multi-year and/or regressed fielding runs in the defense portion of things. I know Dave will counter with the argument that then we’re describing true talent rather than what actually *happened*, but I’ve never cared for that argument. And if you understand how UZR works, it’s not measuring what actually happened anyway (many plays are entirely thrown out of the data).

Cool Lester Smooth
Guest

Yeah, I don’t have a problem with single-year WAR totals. The defensive numbers just have to be vocally taken with a massive grain of salt when we only have one year of data.

So, when talking Gordon and Trout (for instance) you say “WAR grades them out similarly, but seeing as how the defensive numbers for each player are massive outliers, we should probably err on the side of Trout,” rather than “well, this just shows that the mainstream isn’t valuing defense highly enough.”

We don’t necessarily do that right now, and it’s a problem.

Blue Wonder
Guest

It’s Brian McCann Guy. Right on.

Cool Lester Smooth
Guest

Huh, it would be nice if there were also losers with way too much time on their hands who obsessively cataloged every time I was right, you know?

Besides being wrong about McCann, I’ve also been yelled at for suggesting that Martin Prado and Randall Delgado was an absurd underpay for a player of Upton’s caliber. And I was a huge booster of Denard Span and Jonathan Lucroy heading into this season, and Matt Carpenter heading into last year, but no one likes to talk about that.

EthanB
Member

Yeah, this was one of my thoughts. For every Alex Gordon who might lose 5 runs of value, wouldn’t there be a Matt Kemp who picks up 5, thus leaving the total position player WAR unchanged?

Juicy-Bones Phil
Member

Instead of devaluing defense, couldn’t you increase the value of offensive actions? I’m sure there is a logarithm that would keep the values distinct but still close enough to not inflate pitching numbers.

Andy
Guest

It’s more likely to work the other way. If you increase the value of offensive events, you increase the value of defense. E.g., if that double is worth 1.5 net runs instead of 1.0, the player who prevents it gets credit for 1.35 runs instead of 0.9 runs.

But the larger problem is that you can’t arbitrarily change the value of offensive events. Their value is determined by their relationship to runs scored, the latter, of course, being a known, measurable quantity.