A Discussion About Improving WAR

Jeff Passan is one of the most aggressive advocates for FanGraphs in the mainstream media, regularly citing data and concepts from our leaderboards and helping to educate the masses about different ways of viewing baseball. He’s certainly not an old-school guy who wants to be left alone with his pitcher wins and RBIs, and he’s more than happy to embrace new ideas supported by data. But he still has some problems with WAR, and specifically, the defensive component that can allow lesser hitters to be listed among the most valuable players in the game alongside some of baseball’s greatest sluggers. To get a full sense of his argument, read the whole piece, but here’s a selection that sums up his position:

Defense does have its place in WAR. Just not in its present incarnation, not until we know more. Not until we can account for positioning on the field. Not until we can find out the exact speed a ball leaves a bat and how quickly the fielder gets a jump and the angle on the ball and the efficiency with which he reaches it. Not until we understand more about fielding, which will allow us to understand how to properly mete out value on a defensive play, which may take years, yes, but look how long it took us to get to this point, where we know more about hitting and pitching than anyone ever thought possible.

The hackneyed Luddites who bleat “WAR, what is it good for, absolutely nothing” should not see this as a sympathetic view. On the contrary, WAR is an incredible idea, an effort to democratize arguments over who was best. Bringing any form of objectivity to such singularly subjective statements is extremely challenging and worthwhile work.

Which is why this at very least warrants more of a conversation among those who are in charge of it. They’ve changed WAR formulas before. They’ll change them again. And when they do, hopefully the reach of defensive metrics will be minimized.

I don’t agree with everything Passan wrote in the piece, but his criticisms of the metric aren’t entirely off base. It is easier to evaluate run scoring than run prevention. WAR is a flawed and imperfect model. Some of the assumptions in the construction of the model may be entirely incorrect, and as we get more information, we may very well find that some of the conclusions WAR suggested were incorrect, and maybe not by a small amount. Just as the statistical community is quick to highlight the problems with pitcher wins and RBIs, it is fair for Passan to highlight the problems with WAR, especially if the purpose of that discussion is to help improve the model.

So let’s talk about Passan’s suggestion to improve WAR. Primarily, he suggests lowering the value of defense in the calculation, perhaps by regressing a player’s calculated defensive value by some degree. This isn’t the first time this has been suggested, and there are plenty of people who I respect who hold a similar opinion. It’s not a crazy suggestion, and it might even be a better alternative. But let’s work through the implications of that change so that we can evaluate the two methods side by side.

Right now, we hand out 1,000 WAR per 2,430 games — 30 teams each playing 162 games — and split it so that 570 of those 1,000 WAR go to position players, with the remaining 430 credited to the pitchers. This 57/43 split accounts for the fact that hitters are responsible for the entirety of the half of the game that is scoring runs, and also some portion of the half of the game that is preventing runs. In other words, position players get all 500 WAR for run scoring plus 70 of the 500 WAR for run prevention, so giving them 57% of the pie implies that we think run prevention is 86% pitching and 14% defense. Those numbers weren’t handed down on stone tablets, though, and reasonable people could argue for a different split between pitchers and fielders.

If, for instance, the defensive component of WAR were simply halved — suggesting that pitchers are 93% responsible for run prevention, with defenders making up just 7% of the pie — we’d have to take the credit for those prevented runs and move it from the position players to the pitchers, so instead of a 57/43 split, we’d have a 53.5/46.5 overall split between position players and pitchers in WAR. Perhaps that’s preferable, but is there evidence for a smaller gap between position players and pitchers than what we are currently using in WAR?
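To make the arithmetic in the last two paragraphs concrete, here is a minimal sketch; the 1,000 WAR pool, the 14% fielding share, and the halved-defense scenario come straight from the text, and the function itself is just bookkeeping.

```python
# Split a fixed pool of WAR between position players and pitchers, given how
# much of the run-prevention half of the game we credit to fielders.
TOTAL_WAR = 1000.0               # WAR handed out across a 2,430-game season
OFFENSE_POOL = TOTAL_WAR / 2     # run scoring: all of it goes to position players
PREVENTION_POOL = TOTAL_WAR / 2  # run prevention: shared by pitchers and fielders

def war_split(fielding_share):
    """Return (position player WAR, pitcher WAR) for a given fielders' share of run prevention."""
    fielding_war = PREVENTION_POOL * fielding_share
    pitching_war = PREVENTION_POOL - fielding_war
    return OFFENSE_POOL + fielding_war, pitching_war

print(war_split(0.14))  # 14% fielding share -> 570 / 430, the current 57/43 split
print(war_split(0.07))  # defense halved -> 535 / 465, a 53.5/46.5 split
```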

One way of testing this is to look at MLB’s actual payroll allocations. Back in February, Wendy Thurm helpfully broke down each team’s payroll, noting the totals and percentages that went to position players, starting pitchers, and relief pitchers. If we combine the totals for starters and relievers, and then combine line-up and bench, we can see what MLB teams have settled on as the position player/pitcher split, at least in terms of pay.

Overall, her numbers added up to just over $3.3 billion in total league expenditure on 2014 player salaries. Of that $3.3 billion, $1.9 billion was allocated to hitters and $1.4 billion was allocated to pitchers. The payroll split that teams have decided upon this year? 57/43, the same proportion we are currently using in WAR. While this is not any kind of definitive answer, we do not find evidence that the teams themselves are spending their money in a way that suggests that pitchers are closer in value to position players, which would be a necessary conclusion of constraining the defensive calculations.

And it’s not like the idea that position players are significantly more valuable than pitchers is a novel sabermetric concept. Beat writers have long argued that even the most elite pitchers are not strong MVP candidates because they don’t play every day, and thus aren’t as valuable as a position player who both hits and fields every day for six months. If we think that constraining defensive value would improve WAR, we have to simultaneously argue that pitchers have been dramatically underrated (and underpaid) for quite some time, as the historical batter/pitcher split in payroll has persisted over the years.

That may be the correct position, but it is worth understanding that diminishing the value of defense in WAR means that we have to explain why teams are overvaluing position players and undervaluing pitchers when it comes to spending. Maybe they are, but I think it’s worth considering that we don’t have much in the way of evidence that teams buy into a smaller position player/pitcher split than what is currently modeled. The fact that the division of WAR matches the division of payroll isn’t a smoking gun, but it is at least a point that should give us pause when we consider whether the imperfections of WAR could be reduced by moving value from the position players’ side of the ledger to the pitchers’ side.

In fact, there appears to be as much evidence that the 57/43 split underweights defensive value as there is that it overweights it. Robert Arthur noted the following a few weeks ago when he modeled the value of defensive metrics in a linear regression attempting to explain ERA.

Just as with BP’s FRAA, the UZR-based dWAR of FanGraphs contributes some accuracy to our model of ERA. And, as with BP’s defensive metric, if any error is being committed, it’s that we are not weighting defense enough. For optimal accuracy, we should be accentuating the differences between players’ defensive statistics, not regressing them.

These results shouldn’t be entirely surprising. Defensive WAR is not a truth revealed from on high; it was designed (by very capable sabermetricians) with full knowledge of the fact that it improved our understanding of runs allowed. The coefficients which translate defensive play into runs weren’t chosen arbitrarily from a hat or a random number generator, but rather calibrated with at least some attention given to the resulting models’ ability to fit things like ERA. For this reason, we shouldn’t be surprised to find that our defensive metrics are well-suited to predicting ERA. Indeed, I would bet that the small error observed in both models (FG and BP), in which defensive metrics are perhaps slightly underutilized, is by design.

Considering this experiment, I don’t think that there exists any particular issue with the weighting of defensive WAR as a whole, despite Passan’s argument. There might be a problem with Alex Gordon’s dWAR in particular (or Adeiny Hechavarria’s, or whoever’s). Yet, the overall weighting of dWAR is reasonably accurate, or it would have been discarded for something different.

It is entirely correct to state that we lack the confidence in our defensive estimates that we have in our offensive estimates. However, that statement remains true even if we regress defensive metrics, and the reality may very well be that constraining the defensive component of WAR would make the metric less correct, not more so. Any total value metric, like WAR, has to make some assumption about the value of a position player’s defensive contribution. A smaller range of contributions might be more palatable, but may in fact be less correct.

Uncertainty goes both ways. There is not currently strong evidence that defensive metrics themselves are too aggressive in assigning 14% of the value of run prevention to position players. Just as the correct number might be 10% or 12%, it might also be 16% or 18%. I would suggest that there is more evidence in favor of something between 10-15% than there is for something between 5-10% (or 15-20%, for that matter), so we should at least be aware of the possibility that constraining defensive value would make WAR a worse model of player value, not a better one.

That said, this is not any kind of declaration that the 57/43 split is gospel and cannot be changed, or that we are not open to adjusting the defensive component of WAR in any way. We want the model to reflect the best possible use of the data we have, and of the public understanding of how baseball players should be valued. Certainly, there are flaws with the model that we would like to rectify, and we have had numerous discussions about how to implement changes that could address some of these issues.

Catcher framing, for instance, is something that will be a significant addition to WAR, and we have spent a lot of time thinking about the proper way to implement the very high values suggested by the framing metrics. We have not yet implemented it because we are not yet convinced that we have the best solution, so we have left the model incomplete and wildly incorrect for some players, while attempting to acknowledge that limitation along the way. The addition of framing — which probably isn’t too terribly far away at this point — will also have ramifications for how we think about pitcher WAR, as accepting framing runs saved necessarily requires pitchers to receive altered credit for, at the very least, their contributions to walks and strikeouts.

The reality is that disentangling run prevention is difficult, and there is no magical solution that erases all of the model’s problems. Regressing the defensive component might be a worthwhile endeavor, and it’s something we have considered and will continue to consider. We understand that a lot of people prefer a runs-allowed basis for pitching WAR, and we have had many conversations about whether to alter the calculations that we use for pitchers. Passan’s critiques of the model might go too far, but he’s not wrong that the model has issues, and that there are areas to be improved upon.

We are not attempting to stand as gatekeepers of the current calculation, keeping better calculations out because this is what we have now. We’ve made a great number of changes to WAR over the years, from adding things like baserunning on non-stolen base plays for position players to crediting pitchers for their infield flies. Last year, we settled on a unified replacement level with Baseball Reference. The model is constantly being reviewed and updated, and our hope is that it continues to improve over time.

So I will put this question to our readers. Passan has suggested that WAR would be better if the defensive component was minimized, and that some of the credit for run prevention was moved from position players to pitchers. Do you agree? There’s a simple yes/no poll below, but I’m interested in hearing more in-depth responses as well. If you agree that a 57/43 split puts too much emphasis on position players’ contributions to run prevention, what batter/pitcher split would you prefer? If you would prefer that we used a regressed version of the defensive component in WAR, how much would you regress the number, and to what mean would you regress it?

Our hope is that WAR is always as good a model as it can reasonably be. How would you make it better, considering the ramifications of the suggested changes? And can you show that changing the model would indeed make it better, and not just more palatable to our current perception of player value? If improvements to the model can be shown to be reasonable, they will be made. We are not ignorant of WAR’s imperfections, nor do we want them to persist any longer than need be. Our goal is to push the model forward, and we are open to suggestions on how to do just that.






Dave is the Managing Editor of FanGraphs.


Carl
Guest
Carl
1 year 8 months ago

I don’t understand why regressing the defensive component of WAR necessarily involves changing the hitter/pitcher WAR balance. Couldn’t you regress defensive performances to the average while leaving the total amount of WAR unchanged?

TKDC
Guest
TKDC
1 year 8 months ago

This is exactly what I was going to say. However, I’d only be in favor of this if it were done due to the belief that the metric is likely wrong for outliers. How exactly should it be regressed? Dan Uggla graded out as an above average defensive player in 2013 despite three consecutive years as below average, and getting way past his prime. Is that 6.5 number regressed the same as Leonys Martin’s 6.9 number this year? Are we just minimizing the differences between the best and worst outcomes? Aside from aesthetics, is there a good reason to do so?

Tangotiger
Guest
Tangotiger
1 year 8 months ago

If you regress ALL fielding performance to the average, then you are evaluating those players ONLY on their offense.

How can you then allocate the 57/43 split to nonpitcher/pitchers, if the nonpitchers absorb all the offense and the pitchers absorb all the defense?

You can decide on using only the positional adjustment (so all SS get the same +7.5 run credit). While that will move you away from a 50/50 split, it still won’t move you enough.

Alternatively, you can do a “team fielding”, so instead of 57/43, you do 50/7/43, and just not allocate any fielding (other than positional adjustment).

There’s a lot to think about here.

Roger
Guest
Roger
1 year 8 months ago

Perhaps the positional adjustment should be a larger part of the 14% position players’ contribution to run prevention, and UZR less so. This would reduce the role of defensive measurement in WAR until it becomes more accurate, without reducing the overall value of defense.

Jianadaren
Guest
Jianadaren
1 year 8 months ago

That’s exactly what regressing UZR to the mean would do. UZR would approach 0 and the positional adjustment would approach 100% of the 14%.

ReuschelCakes
Guest
ReuschelCakes
1 year 8 months ago

One of Dave’s points that is lost is that there is no evidence that these defensive values should be regressed – it is only our *suspicion* of the data that makes us want to…

Said differently, no one questions an outlier offensive year like Chris Davis’ 52 batting runs / 6.8 WAR in 2013. Because we can observe his discrete outcomes, his 0.348 ISO, his 0.370 OBP, his 29.6% HR/FB, etc… we KNOW that this season is an outlier… but we also KNOW that he really did hit 53 HRs and 42 2Bs…

For dWAR we do not as explicitly/intuitively KNOW the latter and therefore ASSUME that the outliers are measurement-based rather than outcome-based…

munchtime
Guest
munchtime
1 year 8 months ago

If you take a defensive metric (UZR, for example) and regress it to the mean before incorporating it into WAR, you certainly do not “evaluate players only on their offense”. It would create a smaller range of values, but there would still be a range of values.

What you are describing is replacing defensive metrics with a constant value for everyone. I haven’t seen anyone suggest doing that.

Jianadaren
Guest
Jianadaren
1 year 8 months ago

What he meant is that as you regress a defensive metric to the mean (0 WAR) you approach “evaluat[ing] players only on their [oWAR]” – i.e. offense and positional adjustment only – because the defensive metric component will approach zero.

The 7% defense component would be made more and more out of pure positional adjustment and less and less out of the metric.

Cool Lester Smooth
Guest
Cool Lester Smooth
1 year 8 months ago

But what if you give +7.5 to a guy who gets a +15 UZR, and a -7.5 to a guy who gets a -15 UZR. Wouldn’t that eliminate the problem you’re describing?

Cool Lester Smooth
Guest
Cool Lester Smooth
1 year 8 months ago

(I’m not saying that’s the correct amount of regression, to be clear)

Catoblepas
Guest
Catoblepas
1 year 8 months ago

Nope! Consider a hypothetical four-player league, with two pitchers and two hitters. 100 WAR is earned, split 57/43, with 50 going to hitters for hitting, 43 going to pitchers, and 7 going to hitters for fielding. Hitter A gets 28 offensive wins and 0 defensive wins, while Hitter B gets 22 offensive wins and 7 defensive wins. Hitter B is just barely worth more than Hitter A, 29 to 28 overall.
We decide that that’s way too wide a range of defensive value, and it needs to be regressed. Hitter A is regressed up toward the mean and Hitter B down toward it, so that the new defensive values are 2 and 5, respectively. Now, we didn’t change the amount of fielding value given out, but without any change in performance, we now see Hitter A as more valuable, at 30 wins, vs. Hitter B at 27. Offense has become more important, since the range of defensive values has narrowed, and consequently so has its impact on our evaluation of the players.
This is a super simplified example obviously, but as far as I understand it the same forces are at play in the league, so hopefully it helps.
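Here is a minimal sketch of that toy example in code. The 28/0 and 22/7 offense/defense splits come from the comment above; the 50% regression factor is an arbitrary illustrative choice, not a recommendation.

```python
# Toy two-hitter league: regress defensive value toward the group mean while
# keeping the total amount of defensive credit (7 wins) unchanged.
hitters = {"A": {"off": 28.0, "def": 0.0},
           "B": {"off": 22.0, "def": 7.0}}

def regress_defense(players, k):
    """Pull each defensive value a fraction k of the way toward the group mean."""
    mean_def = sum(p["def"] for p in players.values()) / len(players)
    return {name: {"off": p["off"], "def": p["def"] + k * (mean_def - p["def"])}
            for name, p in players.items()}

for k in (0.0, 0.5):
    totals = {name: p["off"] + p["def"]
              for name, p in regress_defense(hitters, k).items()}
    print(k, totals)

# k=0.0 -> A: 28.0, B: 29.0   (B ahead, as in the unregressed example)
# k=0.5 -> A: 29.75, B: 27.25 (same total defensive credit, but the ranking flips)
```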

Yirmiyahu
Member
1 year 8 months ago

Agreed. I don’t think the primary argument here is that WAR overvalues defense; it’s that UZR does a pretty clumsy job of measuring defense. It’s not unreasonable to halve the fielding runs that go into WAR, but that’s not because defense is less of a component; it’s because the numbers themselves are untrustworthy.

I think the more logical (and maybe more common) suggestion is to use multi-year and/or regressed fielding runs in the defense portion of things. I know Dave will counter with the argument that then we’re describing true talent rather than what actually *happened*, but I’ve never cared for that argument. And if you understand how UZR works, it’s not measuring what actually happened anyway (many plays are entirely thrown out of the data).

Cool Lester Smooth
Guest
Cool Lester Smooth
1 year 8 months ago

Yeah, I don’t have a problem with single-year WAR totals. The defensive numbers just have to be vocally taken with a massive grain of salt when we only have one year of data.

So, when talking Gordon and Trout (for instance) you say “WAR grades them out similarly, but seeing as how the defensive numbers for each player are massive outliers, we should probably err on the side of Trout,” rather than “well, this just shows that the mainstream isn’t valuing defense highly enough.”

We don’t necessarily do that right now, and it’s a problem.

Blue Wonder
Guest
Blue Wonder
1 year 8 months ago

It’s Brian McCann Guy. Right on.

Cool Lester Smooth
Guest
Cool Lester Smooth
1 year 8 months ago

Huh, it would be nice if there were also losers with way too much time on their hands who obsessively cataloged every time I was right, you know?

Besides being wrong about McCann, I’ve also been yelled at for suggesting that Martin Prado and Randall Delgado was an absurd underpay for a player of Upton’s caliber. And I was a huge booster of Denard Span and Jonathan Lucroy heading into this season, and Matt Carpenter heading into last year, but no one likes to talk about that.

EthanB
Member
EthanB
1 year 8 months ago

Yeah, this was one of my thoughts. For every Alex Gordon who might lose 5 runs of value, wouldn’t there be a Matt Kemp who picks up 5, thus leaving the total position player WAR unchanged?

Juicy-Bones Phil
Member
Juicy-Bones Phil
1 year 8 months ago

Instead of devaluing defense, couldn’t you increase the value of offensive actions? I’m sure there is an algorithm that would keep the values distinct but still close enough to not inflate pitching numbers.

Andy
Guest
Andy
1 year 8 months ago

It’s more likely to work the other way. If you increase the value of offensive events, you increase the value of defense. E.g., if that double is worth 1.5 net runs instead of 1.0, the player who prevents it gets credit for 1.35 runs instead of 0.9 runs.

But the larger problem is that you can’t arbitrarily change the value of offensive events. Their value is determined by their relationship to runs scored, the latter, of course, being a known, measurable quantity.

Jesse
Guest
Jesse
1 year 8 months ago

You should also stop using FIP for pitcher WAR. It’s pretty clear that not all pitchers can be captured accurately by FIP. We use wOBA for hitters, not FB, LD, GB rates, etc…

Brian
Guest
1 year 8 months ago

I could not possibly AGREE MORE.

Basing pitcher WAR on FIP is like basing hitter WAR on xBABIP.

WAR should be awarded on the basis of what happens, not on predictive/evaluative metrics. I love FIP and xFIP and SIERA more than ERA, but awarding wins based on them strikes me as strange.

NotTango
Guest
NotTango
1 year 8 months ago

Raw FIP isn’t really meant to be predictive. That’s not the point. It’s an accounting of events in the past that we know the pitcher has the most control of. That’s it. Any added predictive ability it has when it comes to future run prevention is just a fringe benefit over ERA, RA9 etc.

If you want to predict future run prevention, use a projection system. If you want to predict future, defense-independent pitcher performance, use what that projection system spits out for that pitcher’s FIP.

tz
Guest
tz
1 year 8 months ago

Exactly. FIP is a descriptive stat, period. The reason for xFIP, SIERA, etc. is to build a predictive stat by tweaking the noisiest components (e.g. HR/FB rate for xFIP).

noseeum
Guest
noseeum
1 year 8 months ago

This topic comes up at least once a year. Every time it does I ask the same question and get no answer. So pretty please someone address this!!!

Those of us who take issue with FIP often end up drifting into the predictive vs. descriptive debate and FIP based WAR supporters seem to always respond with essentially “FIP is descriptive. Be gone with you!!”

But to me this is a cop out. ERA is simple. ERA is the number of earned runs given up divided by the number of innings pitched, then multiplied by 9 to get the earned runs given up per 9 IP.

FIP on the other hand is a series of calculations based on certain events, K/BB/HR (all good so far), but then adjusted by a constant whose purpose is “solely to bring FIP onto an ERA scale”. So to me FIP is not just a description of events that happened. Applying the constant is essentially an attempt to say “this is the ERA this pitcher should have gotten”. That to me is a judgement call that is then treated as an actual result. Maybe I’m completely misinterpreting, but why does FIP have to match ERA? Why can’t FIP be on a different scale? I understand, of course, that it helps for comparison, so there’s value, but when using it as a data source for a calculation, how can adjusting FIP to the ERA scale not corrupt it as an input?

I’m certainly open to the possibility that I’m misunderstanding something and worrying about something I don’t need to, but I would love for someone to explain to me why that’s the case.

Thank you!
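For reference, the commonly published FIP formula (the one documented on the FanGraphs library page linked further down this thread) uses an additive league constant rather than a multiplier; the constant is set each season so that league-average FIP sits on the same scale as league-average ERA. A small sketch, with invented season totals:

```python
def fip(hr, bb, hbp, k, ip, fip_constant=3.10):
    """Fielding Independent Pitching with the standard published weights.

    fip_constant is the league constant (typically somewhere around 3.10),
    chosen so that league-average FIP matches league-average ERA; it is
    added at the end rather than multiplied.
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + fip_constant

# Invented example line: 200 IP, 200 K, 45 BB, 5 HBP, 15 HR
print(round(fip(hr=15, bb=45, hbp=5, k=200, ip=200.0), 2))
```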

Mark L
Guest
Mark L
1 year 8 months ago

Noseeum, it’s multiplied to look like ERA purely so it’s in a format that non-hardcore baseball fans can understand. If you print in the paper “this guy’s FIP is 1591561587”, no-one other than Fangraphs readers will have the foggiest idea what you’re on about.

noseeum
Guest
1 year 8 months ago

Mark L, thanks for replying. I know that, but as you imply, it is simply to make FIP comfortable for human consumption. But when using FIP as an input for another statistic, it seems to me that constant makes it so you can’t just say “FIP describes what happened”. You have to say “FIP describes what would have happened if the pitcher had league average BABIP and league average timing. FIP is the ERA we think this pitcher should have earned.”

Perhaps this is much ado about nothing but my concern is that there is a lot of judgement built into FIP. It is not just a descriptive stat. It’s got an opinion, so to speak, which seems to corrupt it as a data point. I’d feel more comfortable if adjustments of that sort were done at the WAR level than at the input source, but perhaps it’s mathematically equivalent.

EthanB
Member
EthanB
1 year 8 months ago

FIP is not a predictive metric. It may be (and is) more predictive of future ERA than ERA is, but it is not fundamentally a predictive metric. All of the inputs (K, BB, HBP, HR, IFFB) actually did happen. It certainly does not include *everything* that happened, using ERA or RA9 would give the pitcher credit for things the defense did. The true answer is somewhere in the middle, but it involves a nearly impossible to answer, almost philosophical, question of who gets credit for things like random variation.

Jason
Guest
Jason
1 year 8 months ago

But isn’t that the point? That we know that not all variation is random?

Brian
Guest
1 year 8 months ago

But using the batter’s actual batting line treats hitters and pitchers differently.

Pitchers get WAR presently based on FIP, the outcome that should have been determined based on their merit, stripped of chance, stripped of the outcomes determined by defense.

Batters do not. Random chance and the performance of the defense are still weighed in batter WAR. Why?

Cliff
Guest
Cliff
1 year 8 months ago

We should at least use ERA instead of FIP when evaluating career WAR because we know ERA is a more accurate record of what happened than FIP over a large enough sample. Why use second-best data?

Cool Lester Smooth
Guest
Cool Lester Smooth
1 year 8 months ago

Brian, there’s an argument to be made that using FIP instead of RA9 for pitcher WAR is the same thing as using wOBA instead of RE24 for position player WAR, in that it ignores sequencing and provides a context-independent evaluation of the pitcher’s performance.

I don’t agree, personally, but that is an argument.

Breezy
Guest
Breezy
1 year 8 months ago

But couldn’t you attribute differences in ERA and FIP to bad defense? So you would double count the bad defense to the player and the pitcher WAR?

A chopper groundball up the middle that every shortstop makes a play on outside of Derek Jeter shouldn’t penalize a Yankees’ pitcher because he’s out there.

Jackie T.
Member
Jackie T.
1 year 8 months ago

I don’t think you’re understanding what FIP is.

http://www.fangraphs.com/library/pitching/fip/

Bats Left, Throws Right
Member
Bats Left, Throws Right
1 year 8 months ago

The polls are misguided. The important question is the second one, and the only reasonable answer–“I don’t know”–isn’t given. From this piece it sounds like the 57/43 split was decided basically by starting from 50/50 and adding in an amount that seemed right for fielding. Was it empirically derived? Wasn’t there some evidence in The Book (I might be wrong) suggesting that great pitchers influence plate appearances more than great batters do, suggesting that the starting point should not be 50/50?

Really though, why do we insist on giving out 100% of the wins (minus replacement level)? We all acknowledge that there are things that aren’t measured by WAR as well as things that WAR claims to measure but may do so inaccurately. Why not have, for example, 54% wins for position players, 43% wins for pitchers, and 3% unassigned because they’re indeterminable?

Tangotiger
Guest
Tangotiger
1 year 8 months ago

I think it’s perfectly acceptable to have some “other” bucket.

Are most people ok with that?

Crumpled Stiltskin
Guest
Crumpled Stiltskin
1 year 8 months ago

I think this makes sense, especially because we already know almost for certain that not all pitchers rely on the defense to quite the same extent, i.e. those that induce weak contact.

tz
Guest
tz
1 year 8 months ago

A bucket for “other” would be just fine.

Or, maybe someone could come up with sensible error bars for the different parts of the model. Something like, small error bars for official scorer rulings (hit vs. error), bigger ones for UZR ratings on plays, and an error bar for the split of run prevention between the pitcher and the fielders.

Matthew
Member
1 year 8 months ago

I agree. I think it would be an almost scientific admission that we have some sort of experimental error. Instead of assigning credit where we could be wrong, just admit we aren’t sure what to do with it.

Error bars would be a very interesting concept. We could say, with a stated degree of confidence, that a player is worth ___ WAR with a degree of error of +/- _____. As of right now, WAR is very precise, but not entirely accurate. If we try to make it more accurate, it will end up less precise. Off the top of my head, error bars would be a good way to do that.
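As a rough sketch of what that could look like: every component value and uncertainty below is invented purely for illustration, and the errors are combined in quadrature, which assumes the components are independent.

```python
import math

# Hypothetical single-season components for one player, in runs, each paired
# with a guessed standard error. The fielding estimate is deliberately given
# a much wider error bar than the batting estimate.
components = {
    "batting":     (25.0, 3.0),
    "baserunning": ( 3.0, 1.5),
    "fielding":    (12.0, 8.0),
    "positional":  ( 2.5, 0.5),
    "replacement": (20.0, 1.0),
}

RUNS_PER_WIN = 9.5  # rough runs-to-wins conversion; varies by run environment

total_runs = sum(value for value, _ in components.values())
total_err = math.sqrt(sum(err ** 2 for _, err in components.values()))

print(f"WAR ~ {total_runs / RUNS_PER_WIN:.1f} +/- {total_err / RUNS_PER_WIN:.1f}")
```

With numbers like these, a player whose value comes mostly from offense ends up with a much tighter interval than one whose value comes mostly from fielding, which is essentially the point being made in this thread.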

ralph
Guest
ralph
1 year 8 months ago

It’s not even (all) experimental error, necessarily.

How much credit should a fielder get for being well-positioned by the manager or defensive coordinator? How much credit should the manager/defensive coordinator get?

DavidKB
Guest
DavidKB
1 year 8 months ago

Error bars is exactly what I was going to mention. That’s the only way to solve this without additional information (e.g. ball trajectories). Reducing the weight of an uncertain part of a model is absolutely the wrong approach. You just put on the correct error bar. A player whose value is derived mainly from defense will just have a much larger error bar than a DH.

Tim
Guest
Tim
1 year 8 months ago

I’ll chime in in favor of the idea that if you listed your error bars, none of this would be a problem.

John Wright
Member
1 year 8 months ago

I’m an accountant who likes the double entry system as much as the next guy, so I understand the idea of giving out 100% of the credit, but I don’t understand why it all has to go to pitchers and position players. I completely agree with the idea of an “other” bucket.

In addition to position players and pitchers, teams pay coaches and scouts and stat guys, and presumably they all have an effect on talent identification and tactics (like fielder positioning) which have a non-zero win value.

Nathan
Guest
Nathan
1 year 8 months ago

Ha…that 3% can go to the managers for whenever we get manager WAR.

Chris from Bothell
Guest
Chris from Bothell
1 year 8 months ago

I may be able to say I was present at the moment that Fangraphs developed the sabermetric equivalent of dark matter.

Zach
Guest
Zach
1 year 8 months ago

Two possible explanations for the 57/43 payroll split exist. The first is that most MLB teams have 13 position players and 12 pitchers on their roster (or even 14 and 11).

The second is that most MLB teams know and realize that pitching is more volatile than hitting. That’s due both to the increased injury risk and the simple fact that pitching (or at least runs-based results) is less predictable. Given that, it would be foolish for teams to invest as heavily in pitchers (league-wide) as in hitters, even if the true split were 50/50 in terms of runs/wins created/prevented.

Yirmiyahu
Member
1 year 8 months ago

Excellent points. And I would add that the decision to have more pitchers on the roster than hitters is based on workload/health rather than a determination that pitchers are more valuable.

Robby
Guest
Robby
1 year 8 months ago

I still think the special cases that get brought up to say that WAR is too defense-heavy, are mostly due to people’s lack of appreciation for defense as a whole, even in a subconscious way. The built-in biases we all have are hard to counter and correct. If some of those lesser players Passan highlighted were making the same plays but were wearing a Trout jersey, hard to say many wouldn’t give those defensive plays more credence/value

Breezy
Guest
Breezy
1 year 8 months ago

Couldn’t agree more. Preconceived notions are a bitch.

grant
Guest
grant
1 year 8 months ago

Not sure if it’s lack of appreciation for defense as a whole so much as lack of confidence in the defensive metrics. The defensive metrics clearly have some flaws; I think that’s more of an issue than undervaluing defense as a whole.

Pirates Hurdles
Guest
Pirates Hurdles
1 year 8 months ago

Flaws, sure, but it seems like many people completely dismiss things like UZR, which is really ridiculous. It’s not like UZR is rating guys highly/lowly in a manner that does not line up with traditional defense evaluations and scouting. We have exceptions, but most of the time it’s an issue of people not understanding what a 0 UZR looks like at SS or CF (all MLB guys that play those spots are very good defensively). I know I struggle with that watching McCutchen play CF nightly and seeing a negative UZR get posted.

Crumpled Stiltskin
Guest
Crumpled Stiltskin
1 year 8 months ago

Doesn’t the percentage impact of the pitcher on run prevention depend on the individual pitcher? (i.e. Matt Cain and others that induce weak contact.) And not only on the individual pitcher, but possibly the individual defense? (Since run scoring is based on sequencing, I could imagine a piling-on effect were the defense truly bad enough, though that could be completely wrong.)

And isn’t it possible that the level of credit a pitcher deserves depends on how good they are?

Clayton Kershaw this year, or Pedro Martinez in the past (or a few other pitchers here or there) are so many standard deviations better than the average pitcher that even the best defensive performance behind them is unlikely to contribute to a significantly higher percentage of positive outcomes: That is, team wins. Shouldn’t these types of pitchers perhaps also receive a larger percentage of WAR credit than an average pitcher that relies more on defense to contribute to his own and therefore his team’s success?

(I know that FIP already partially addresses these factors by paying attention to K, BB and BIP, but with truly great pitchers, I could see the metric still understating their contribution.)

Pale Hose
Guest
Pale Hose
1 year 8 months ago

Dave – in articles that introduce new metrics to Fangraphs, and in articles like this, you do a great job of identifying and explaining the merits of a metric. On the other hand, the flaws and drawbacks of the metrics only get a throwaway line, such as “the model isn’t perfect and there are flaws, but this is the best we’ve got.” It would be helpful to have a more open discussion of the flaws and drawbacks of the metrics so that we can better understand the entirety of the model.

ralph
Guest
ralph
1 year 8 months ago

I can’t help but wonder if it isn’t a huge problem that defense is not measured against a replacement level.

Of course replacement value for defense is a tough thing to nail down, since there’s a huge supply of fielders who could do well in LF, for instance.

What I’d prefer to see instead is to rate fielders against a minimally-acceptable level of defense that MLB has historically shown it can live with at each position.

ralph
Guest
ralph
1 year 8 months ago

I think all kinds of neat things could be done/accrue with this approach.

For one, it eliminates the issue of the baseline average performance being overly influenced by outliers. I think a reasonable guiding principle should be consistent crediting of balls fielded based on batted ball profile.

That is, that if a fielder makes a play and gets credit for preventing 0.7 doubles versus a minimally-acceptable fielder (MAF) in the year 2005, a fielder should receive that same credit for preventing 0.7 doubles on the same ball hit in 2015, unless we have very strong reason to believe that something fundamental has changed in how plays are made. Naturally, the value of preventing 0.7 doubles will change based on the offensive environment, and that’s how this approach would scale to different offensive eras.

I could imagine this also being a great way to illustrate the value of defense. For any given batted-ball profile that a fielder fields, one could generate an AVG/SLG line that the minimally-acceptable fielder (MAF) would have allowed, as well as the AVG/SLG line given up by a particular fielder in question.

Additionally (and I know this introduces weirdness), that info could be presented as a raw number of singles, doubles, HR, and maybe triples prevented. That would provide a very intuitive feel for how much value a defender added.
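A rough sketch of the bookkeeping this proposal implies; the catch probabilities, hit types, and linear-weight run values below are all invented placeholders, and a real version would need actual batted-ball data and season-specific run values.

```python
# Each ball in play a fielder was responsible for: the chance a minimally-
# acceptable fielder (MAF) converts it, the hit type prevented if it drops,
# and whether our fielder actually made the play.
plays = [
    {"maf_catch_prob": 0.95, "hit_if_missed": "1B", "made_play": True},
    {"maf_catch_prob": 0.30, "hit_if_missed": "2B", "made_play": True},
    {"maf_catch_prob": 0.60, "hit_if_missed": "2B", "made_play": False},
    {"maf_catch_prob": 0.10, "hit_if_missed": "3B", "made_play": False},
]

# Placeholder linear-weight run values for the season in question; this is the
# piece that ties a fixed "hits prevented" credit to the offensive environment.
run_value = {"1B": 0.45, "2B": 0.75, "3B": 1.05}

hits_prevented = {}
runs_saved = 0.0
for play in plays:
    made = 1.0 if play["made_play"] else 0.0
    credit = made - play["maf_catch_prob"]   # hits prevented above the MAF baseline
    hit = play["hit_if_missed"]
    hits_prevented[hit] = hits_prevented.get(hit, 0.0) + credit
    runs_saved += credit * run_value[hit]

print({hit: round(n, 2) for hit, n in hits_prevented.items()})
print(round(runs_saved, 3))  # slightly negative on these invented plays
```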

tz
Guest
tz
1 year 8 months ago

You would need a real good body of batted ball data to do this, and most likely in the form that MLB has just introduced for its teams (with flight paths of balls, etc.)

Some club’s analytical team should be all over this idea. Even though it may take a while to build the reliable data, the insights it could provide would be worth discovering before the rest of the pack.

ralph
Guest
ralph
1 year 8 months ago

Agreed, tz. I think/hope something like this will be a byproduct of the new data coming online soon.

Though theoretically UZR data on a season of Mike Morse in left or Shin-Soo Choo in CF would seem to provide starting points that we could run with right now.

ralph
Guest
ralph
1 year 8 months ago

This approach also seems like it would provide a good place from which to launch a re-examination of how credit should be allocated for run prevention between pitchers, fielders, and perhaps coaching staff.

It does raise the interesting question about balls that are pretty much always fielded by MAFs. Should anyone get any credit for those?

ralph
Guest
ralph
1 year 8 months ago

And now I see I’m in good company as Sean Forman is touching on some similar themes here: http://www.sports-reference.com/blog/2014/09/the-real-problem-with-baseballs-defensive-stats/

ralph
Guest
ralph
1 year 8 months ago

Sorry to keep replying myself, but now I’m wondering just how much WAR acceptance/understanding problems stem from using the term “replacement” instead of something self-defining/intuitive like “minimally acceptable.”


Hawk Harrelson
Guest
Hawk Harrelson
1 year 8 months ago

The definition of “minimally acceptable” changes based on how well a player hits. Wouldn’t you put up with Jose Canseco defense if you got PED Barry’s offense? That’s why one replacement level is defined, not as a sum of component replacement levels, but as a single number of runs.

ralph
Guest
ralph
1 year 8 months ago

But that’s the problem, right now replacement value is only defined for hitting, despite the fact that teams might theoretically be willing to go below replacement value hitting-wise for a particularly excellent fielder.

Right now WAR is RAR + DRAA … which really seems to create problems since different units are being added together.

Would you put up with Jose Canseco’s defense at shortstop in that scenario? If no, then minimally-acceptable SS defense is above that line.

But really, the point would be to empirically define minimally acceptable. Maybe take the 1 or 2 (near-)full-season fielding performances at a given position every year and make that the baseline?

ralph
Guest
ralph
1 year 8 months ago

Oops, I guess that’s not quite right. I think I was overly-influenced by a possibly-faulty recollection of WARP being based on VORP + defense.

But I still feel like if I were designing a system from scratch, breaking down hitting and fielding components into minimally-acceptable levels would make the most sense to me. And if you need a fudge factor to re-balance things to an overall minimally-acceptable player, so be it.

Nathaniel Dawson
Guest
Nathaniel Dawson
1 year 8 months ago

There’s no such thing as “replacement” level defense or “replacement” level offense. A player is judged by their overall ability relative to a replacement level player. No matter where you set the bar for replacement level defense or replacement level offense, there would conceivably be players that could be below one or the other and still have enough overall talent to be an above replacement level player. How low would you have to estimate replacement level defense before Barry Bonds’ offense didn’t make him at least a replacement level player?

steverino
Guest
steverino
1 year 8 months ago

I strongly disagree with the notion that quantifying defense relative to offense and pitching is limited by lack of actionable data on things like positioning, vector, etc. A pitcher’s ERA is not reflective of hard-hit outs or fluke or bloop hits any more than is defensive WAR. A batter’s OPS is not reflective of which singles were hammered off the top of the Green Monster, which doubles were slow rollers down the line, which bloops were hits and which blasts were outs any more than defensive WAR captures vector. While all of these metrics have flaws, and while defense may have the furthest yet to go, we have to be careful we don’t single out flaws that are not unique to defense.

Bip
Member
Bip
1 year 8 months ago

I think part of the difference between measuring offensive value and defensive value is that while both are influenced by luck, only one is influenced by measurement error.

Sure a guy may have a good wOBA one year because of a bunch of seeing-eye grounders and some very borderline home runs, but we know that those events happened and we know what the results were. This presents a problem when trying to assess the player’s talent, because there is a disconnect between how well he did his basic job as a hitter and the results he got. However, there is no question about the value contributed as a result of his actions.

With defense, not only can luck and player variation call into question the measurement of talent, but we also have to question the value assigned to each event. If a player misses a play, we want to count it against him if it was makeable, but if it was not makeable, we don’t. There is no clear way to distinguish what is makeable and what the chances are. This leads to error even when trying to assess value itself.

Jeff
Guest
Jeff
1 year 8 months ago

Those flaws for ERA and OPS tend to cancel out over large samples, around a season’s worth. Defensive metrics have much, MUCH smaller sample sizes. The most common minimum reliable sample size I’ve seen listed for defensive metrics is three seasons. So the flaw with defensive WAR is that the component values are more noisy, and much less reliable.

I’m surprised this article didn’t really bother discussing this issue with UZR again (though I know Dave has mentioned it in the past).

John Wright
Member
1 year 8 months ago

If Inside Edge is indicative of UZR in any way, the Heywards and Gordons of the world are amassing most of their defensive value on the outcomes of just 50–60 non-routine, non-impossible chances this year. Don’t mess up, guys!

Also, outfielder run prevention values in particular would seem to be subject to large opportunity-based variances. What if a particular outfielder gets 70 questionable fielding chances and another gets 30? It could be partly due to playing behind a ground ball–heavy staff, or partly due to random variation. The results may represent “value” of some kind but they may not be telling us which player is better.

Eric R
Guest
Eric R
1 year 8 months ago

“Defensive metrics have much, MUCH smaller sample sizes. ”

Easy fix. We need to triple the balls in play to roughly even them out. Let’s see. Decrease strikeouts — IDK, six strikes and you’re out?

But, walk rates will probably go way up, so let’s double the number of balls as well.

But, HR rates will probably go way up, so can we move the mound up like 6 feet and make it a foot taller?

But, pitcher injuries will probably increase [being closer to batted balls, and games probably averaging 300 pitches on each side], so let’s give pitchers some carbon fiber armor and expand full-season rosters to 35 [but teams *must* carry at least 20 pitchers].

Hmmm… games are probably going to be pretty long, so let’s get rid of commercials.

emdash
Guest
emdash
1 year 8 months ago

It would help the discussion if there were more clarity on the run value particular actions or plays are assigned. Without knowing that, we as consumers of the data have to take the calculated defensive WAR on faith, because there’s no way for us to independently verify it. That makes it hard to accept that, say, Jason Heyward has provided *so* much value on defense that he’s one of the best players in the NL despite his unimpressive offense.

Tim
Guest
Tim
1 year 8 months ago

I think it probably makes sense to regress the impact of defense on WAR, specifically because we don’t have as good an understanding of how to measure defense correctly, and as a result the data we have is suspect and perhaps should not have as big an impact on player WAR. As we get better tools and a better understanding of how to measure defense, it would then be appropriate to increase defensive metrics’ impact on WAR.

everdiso
Member
everdiso
1 year 8 months ago

I agree regarding either regressing the D component more, or in using a 2-3yr sample of some kind instead.

Of course, we could always just have different types of war – nothing wrong with more options.

This is especially true for pitching.

Instead of just FIP-WAR, why don’t we also have xFIP-WAR and SIERA-WAR as options to look at and evaluate as well? Especially since I think even the most ardent FIP users realize that it’s not an overwhelmingly better stat than the others.

And how about this fancy new stat? I give it the highly technical name of “AVG”. To calculate it, I add up ERA, FIP, xFIP, and SIERA and then divide by four. This gives me a number which includes all four of these useful stats, reinforces the qualities seen by all the metrics, and reduces the impact of qualities seen by only one metric. I’m sure weighting them more appropriately is better, but I’ve been having fun calculating my own AVG stat, because it has fewer weird anomalies that don’t pass the smell test, and has so far gotten rid of pretty much every weird outlier found by looking at just one of the four metrics.

Anyways, back to WAR, I’d love to have access to more and different variations on WAR, given that at this point it’s hard to even use any single WAR amount in a discussion without listing a series of disclaimers first anyways.

telejeff
Member
telejeff
1 year 8 months ago

Robert Arthur’s response, in my opinion, sets forth the better argument. It would be helpful to the discussion to see if, as he suggests, the predictive accuracy of WAR really would be improved by increasing rather than decreasing the weighting of defense.

It would be particularly ironic to reduce the importance assigned to defense in 2014, as defense (and bullpen) failings appear to be what explains why the Detroit Tigers are not performing better than the KC Royals, as explained in a nice article on FanGraphs last week.

Robert Arthur
Guest
Robert Arthur
1 year 8 months ago

“It would be helpful to the discussion to see if, as he suggests, the predictive accuracy of WAR really would be improved by increasing rather than decreasing the weighting of defense.”

Well, actually, I did do that in the linked article. I made a linear model:
team ERA = (team pitching WAR) + (team defensive WAR)*(some regression factor)

Then I checked to see how well the model fit team ERA, while I varied the regression factor. Low values of the regression factor, i.e. close to zero, tended to push extreme defensive performances back to the mean. I found that the model was best fit when defensive WAR was actually slightly overemphasized, such that extreme defensive performances were accentuated.

I think a lot of people are suggesting this idea–of downweighting the extreme values of dWAR. But when I did that, it was less predictive of runs allowed, suggesting that on balance, those extreme values are more signal than noise. Which is not to say that particular extreme values are not potentially inaccurate, but overall, dWAR is best served by not regressing out the extremes (apparently, and according to the method I used).
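For what it’s worth, here is a minimal, synthetic-data sketch of the kind of check being described; the real analysis in the linked article is more careful, and the fake pitching/defense/ERA numbers below exist only to show the mechanics of scanning a regression factor.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fake 30-team league where defense "really" matters at full weight.
n_teams = 30
pitch_war = rng.normal(12.0, 3.0, n_teams)
def_war = rng.normal(0.0, 2.0, n_teams)
team_era = 6.0 - 0.20 * (pitch_war + def_war) + rng.normal(0.0, 0.10, n_teams)

def fit_error(k):
    """Sum of squared residuals of a line fit of team ERA on (pitching WAR + k * defensive WAR)."""
    x = pitch_war + k * def_war
    design = np.column_stack([np.ones(n_teams), x])
    coefs, *_ = np.linalg.lstsq(design, team_era, rcond=None)
    resid = team_era - design @ coefs
    return float(resid @ resid)

for k in (0.0, 0.25, 0.5, 0.75, 1.0, 1.25):
    print(f"regression factor {k:.2f}: SSE {fit_error(k):.3f}")

# In this synthetic setup the fit is worst with defense zeroed out and best near
# k = 1.0, i.e. when defensive WAR keeps its full weight rather than being shrunk.
```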

Mike K
Member
Mike K
1 year 8 months ago

I think a useful change could be to do like you did with pitcher WAR. That is, leave the “official” version of WAR unchanged (until you find good reason to change it). But – like with using WAR, RA9-WAR, or 50/50 for pitchers – give the ability to compare a player’s WAR with full credit for defense, regressed (regress to mean for the position, perhaps?), or no defense (still include position adjustments).

So now we’d have the ability to look at leaderboards differently, based on our own preference for how much defense to include. There is a downside in that if we picked zero or regressed defense, we wouldn’t see how pitchers would get more (or less) value. But that’s currently a downside if we pick RA9-WAR too, isn’t it?

JJ
Guest
JJ
1 year 8 months ago

Yeah, I don’t see a problem with having more information like that available. You run the risk of adding more subjectivity in debates, with one guy citing regular WAR and the other saying he prefers regressed WAR, but that may be worth it to have more information. And we shouldn’t be looking to appeal to the minority, “be-all, end-all” group who use the numbers wrong. In other words, don’t be afraid to make this information available because some may misuse it, or because it may add subjectivity. Just as with WAR now, you have to know how to use it correctly (it isn’t perfect, tenths of wins aren’t enough to declare one player more valuable than another, etc.).

Sandy
Guest
Sandy
1 year 8 months ago

“Overall, her numbers added up to just over $3.3 billion in total league expenditure on 2014 player salaries. Of that $3.3 billion, $1.9 billion was allocated to hitters and $1.4 billion was allocated to pitchers. The payroll split that teams have decided upon this year? 57/43, the same proportion we are currently using in WAR.”

It’s worth noting here that if a team is carrying 14 hitters and 11 pitchers (which is pretty standard), position players are then 56% of the roster. Even if we account for those teams that carry 12 pitchers, hitters’ total value is only 2-3% higher than their proportion of total roster spots. So this is a weaker argument than it first appears.

Ray Lankford's moustache
Guest
Ray Lankford's moustache
1 year 8 months ago

Furthermore, given what we know about risk factors for pitchers, the inherent risk coefficient would serve to depress pitcher salaries beneath their yearly expected contribution.

Matthew
Member
1 year 8 months ago

Hopefully MLBAM loves us as much as we love MLB.TV and this debate will be moot in a few months.

jmoultz
Member
jmoultz
1 year 8 months ago

This. To answer Dave’s question near the end of his post, I don’t think it’s worth anyone’s time developing another inaccurate model to “better” measure the components of runs scored and prevented until the new MLBAM system is up and running. Once access is granted to that data and a reliable analysis tool is created, this debate will be a short one, and all facets of the game on the field will be able to be measured with an extreme degree of accuracy.

Improving WAR may be a worthy endeavor for improving evaluation of seasons pre-2015, but I, for one, have zero interest in sitting around and arguing over historical defensive performance. The tools we have right now do a good enough job in that regard.

Aaron Reese
Guest
1 year 8 months ago

These are two distinct issues.

WAR’s ratios are not the problem. Defensive metric inconsistency is. Zone stats are fundamentally flawed. It may be a short-term fix to limit the influence that D-stats have on WAR, but in the end, you’d be corrupting a good formula to accommodate flaws found in D-stats.

You/we should focus on getting WAR to portray the distributive value of runs and run prevention as accurately as possible. If the evidence suggests that defense is worth 10-15% of overall value, then we should work to refine that number to its most precise (which seems to be exactly what everyone is doing).

Separately, we should figure out how to accurately evaluate defensive contributions for D-stats such as DRS, UZR and +/-.

Matt P
Guest
Matt P
1 year 8 months ago

I think the problem is that UZR and DRS aren’t 100% accepted. As long as that’s the case then the defensive component of WAR won’t be fully trusted.

This is hardly unique to defense. But the difference between defense and offense is that we have a number of basic offensive stats that quantify the impacts of a play. Even people that don’t believe in advanced stats agree that a double is better than a single, or that a home run is better than a triple. As a result, it’s easier to understand offensive stats than defensive stats because it’s easier to quantify the value of a hit. There’s an understanding that some hits are better than others and that it’s easier to quantify them.

The only basic statistics for defense are errors and chances, or basically fielding percentage. Fielding percentage doesn’t differentiate between a hard play and an easy play. This means that people have no frame of reference when it comes to defensive stats. How do they define how difficult a specific play is? There’s no double for defense.

Since basic stats can’t fix this problem the only answer is more disclosure. A play-by-play UZR would allow people to see how players get their defensive values and therefore quantify the impacts of a play. Inside Edge tries to do something similar but isn’t really successful.

Until there’s further disclosure, people will be skeptical of UZR/DRS results because they won’t understand how they were created and won’t have a frame of reference to define them.

Scrapper
Guest
Scrapper
1 year 8 months ago

First, I thank both Passan and Cameron for having a thoughtful dialogue about this and for “disagreeing without being disagreeable,” which doesn’t happen nearly enough. I do believe, however, that Dave only addressed part of Jeff’s argument in his rebuttal. My reading of Jeff’s article is that he is dismayed not just by some of the extreme defensive ratings but also by the extremely wide variance in the different attempts to measure defense, with Baseball Reference and Fangraphs being at the forefront of that “dispute.” For avid readers that use and support both sites, we see one site say a guy is average defensively and another site suggests that he is a defensive star and that variance just feels unacceptable.

One thing that could be interesting would be to take an outfielder whose defensive ratings differ wildly between the two sites and to then show a videotape of all of the plays that the player was involved in during the season. Would probably be about 300-400 plays during the course of the season. Running under the video would be the numbers showing how each site treated that play for purposes of its rating metric (i.e. 60% chance of catch, CATCH, +0.40 runs added). This of course would require a lot of work and also the cooperation between the two sites but it would certainly give readers and analysts tremendous insight into how and why UZR and DRS may differ in a particular case.

ralph
Guest
ralph
1 year 8 months ago

Yes, absolutely. And I kinda think that Passan should have done something like this so he could have included his take in the article.

Sandy
Guest
Sandy
1 year 8 months ago

It’s also worth noting that our sample sizes with fielding are smaller than they are with batting, especially for outfielders. For instance, Robinson Cano (picked because I wanted a guy who played a MI position every day) has 585 PAs this year, and (per Inside Edge) 426 defensive play opportunities. Mike Trout has 630 PAs, and 360 defensive opportunities. So our single-season fielding metrics are more likely to be subject to the vagaries of small sample sizes. [The exceptions to this are catchers and first basemen.]

JJ
Guest
JJ
1 year 8 months ago

Would it make sense/be possible to use more than one year of defensive data for the player? For example, instead of taking a guy’s six-month UZR (which we all know to be incredibly noisy), could we use some kind of average of the player’s career defense? Or maybe his last two seasons?

Obviously, this can’t be done with rookies, and that is a problem. Likewise, it may fail to account for year-to-year changes in skill level, since we would be using what happened in the past, and the player may have slowed down, gotten injured, etc., making him worse in the field. But I think that may be a reasonable trade-off, considering the single-year numbers we are getting right now probably aren’t all that accurate either, so who knows whether they are even seeing the change in skill level.

I don’t know really. I liked the idea of making a change to the weight of defense, but after reading this, that isn’t going to be an easy task if we want it done correctly. And I’m certainly no authority on this. I defer to Dave, Tang0, etc. considering they are the reason I know anything at all about this kind of stuff.

Steven
Guest
Steven
1 year 8 months ago

The issue here is Passan thinks statistics savvy people are looking at WAR as a holy grail. Anybody that thinks WAR is the complete story of what a player is, is really no different than people who thought BA/HR/RBI were the complete story for a hitter. It’s a tool that simply measures what happened and should not define a player.

Don’t like that an over-the-wall catch is worth as much as an easy flyout? Well, I wouldn’t like it if two identical flyballs were worth different values because one outfielder was positioned well and the other was athletic enough to make up for his poor positioning. WAR and defensive metrics just take events and the value of their results to measure what happened. The idea isn’t to measure a player’s jump on a flyball or ability to scale a wall.

I understand the frustration people have in seeing Jason Heyward as a top-10 player and Matt Kemp rated as a replacement-level player, but their performances on the year placed them where they are. Anybody with some mild understanding of statistics understands what a small sample size is, and defense is just that. There aren’t enough plays for defensive metrics to normalize as fast as they do for hitting or pitching, but the plays happen, and they have a value.

Mike Pozar
Member
Mike Pozar
1 year 8 months ago

These discussions are really enjoyable and I love consuming them. Sometimes though I think the divergence of opinion is based on a fundamental disagreement on what the goals of a metric like WAR should be.

Is WAR about trying to strip away circumstance/context to get a picture of a player’s “true skill set,” or is it about trying to quantify the amount of value he added to his team in a given season?

In some ways, it is the former. For hitting we remove sequencing, for example, by treating all outcomes the same regardless of what game situation they came in. Likewise we normalize for park factors to get a better picture of how a player’s offensive production is the result of his own abilities rather than his home ballpark.

But in other ways, we don’t take it all the way. With park factors, the assumption is that each park impacts all hitters the same way, and we just divide by a constant (the run factor for that park). But this isn’t true – a left-handed power pull hitter playing in a park with a short porch in right field will have his numbers inflated dramatically, whereas a guy who just walks and hits infield singles won’t have his numbers inflated nearly as much. Yet both get their run values (wRAA) normalized by the same factor.

Likewise if a player maintains an unsustainably high BABIP, his WAR will be higher. It’s not reflecting his “true skill set” and we can expect him to regress next year (and can even model and quantify the extent to which he’ll regress). But one way or another he did get those hits and add value (runs, wins) to his team, so WAR gives him that credit, even if the high BABIP was luck-driven.

A similar case to BABIP could be made for defensive metrics. A large part of the fluctuation in defensive value from year to year is the result of a relatively small sample of plays that could break either way. One year a defender might get more opportunities to make plays that highlight his range and accumulate lots of value; the next year he doesn’t get those same opportunities (and/or there are more plays just outside his range) and doesn’t accumulate as much value. The metric doesn’t necessarily represent his true skill set, but it does reflect value added (he made more plays / saved more runs in one year than the other), even if a large part of that is luck-driven.

So what is WAR intended to be? Value added, or a measurement of “true skill”?

ralph
Guest
ralph
1 year 8 months ago

Yep. And that’s not to mention other things like where one bats in a batting order. Rates being equal, a leadoff man will accumulate more WAR than a cleanup hitter.

Pirates Hurdles
Guest
Pirates Hurdles
1 year 8 months ago

WAR by definition is a counting stat, not a measure of skill. It’s runs generated, which is an attempt at a real-life measure of value. It is a combined measure of seasonal performance, which in no way is measuring true talent.

Mike Pozar
Member
Mike Pozar
1 year 8 months ago

I would tend to agree that WAR is primarily a counting stat centered on measuring seasonal performance, more so than being about representing true talent.

But if that’s the case, why do we remove sequencing? Why do we say that a single in a high-leverage situation has the same value as a single in a low-leverage situation?

Tim
Guest
Tim
1 year 8 months ago

Because if you want that then WPA is pretty much perfect?

Mike Pozar
Member
Mike Pozar
1 year 8 months ago

Totally disagree that WPA would be the right metric if we want to keep sequencing. WPA is the right idea – look at each player, look at what *actually happened*, and how that contribution impacts his team’s chances of winning the game. But WPA credits the batter and pitcher with *all* the difference in Win Probability between the start and end of the AB. This is wrong.

Once a batter puts the ball in play, there are many ways other players can add value. If a great baserunner takes an extra base, the right thing to do would be to assign that value to the baserunner, not the batter. If a great fielder makes a great play to make an out or throw someone out, the right thing to do would be to assign that value to the fielder, not the pitcher (or certainly at least partial value).

Furthermore, WPA is based on the very rough, table-based win probabilities that are just based on score, outs, and baserunners; it doesn’t take into account the quality of the teams, pitching matchups, etc. If it’s the bottom of the ninth in a tie game with runners on first and third with one out, the WP will be the same whether it’s Craig Kimbrel on the mound or Jokey McMinors; or if it’s Slowy McFat on third or Speedy McHamilton; or if it’s Barry Bonds at the plate or Endy Chavez at the plate. It doesn’t actually capture the true WP.

Hollinger
Guest
Hollinger
1 year 8 months ago

As someone who doesn’t have the deepest understanding of the math involved in getting there, the issue to me has never appeared to be the percentage of WAR given over to defensive metrics, but the metrics used to allocate that percentage. My instinct has always been that it seems to skew too heavily for the very best and very worst defensive players. I think that’s what Passan is getting at with the regression comments. It feels like the best and worst defensive ratings are too far apart. I’m sure there’s a better way to state that, but y’all are smarter than me, so I’m sure you understand what I’m getting at.

Phantom Stranger
Guest
Phantom Stranger
1 year 8 months ago

About dWar, the one elephant in the room is poorly handled park and team adjustments. There are 30 different parks MLB plays in and it’s much tougher figuring out park effects for defense than offense on an individual. Another big thing is positioning and how teams set their defense, which once again operates at a team level but somehow gets assigned to the individual. I don’t think it’s a coincidence we are seeing some massive individual defensive seasons being put up now that sabermetric shifting has taken over for some teams. Basically, some players are getting defensive credit for their stats department’s excellent positioning data.

Defense should really be treated as a concept of interrelated positions, like a Venn diagram, since each position interacts with several other positions at the edges. All infielders get impacted by the quality of the first baseman, and so forth, while a left fielder and a first baseman have essentially no relationship on the field. This is a radical concept, but a better defensive system probably splits up the value of an individual out among the players interacting on a play. Three players interact on a 6-4-3 double play but only gain two outs. Figure out what each player’s contribution on that double play was for the two outs. Most would probably guess the second baseman’s turn is the most important part of that double play; he might deserve more of the two outs than the other fielders.

Will
Guest
Will
1 year 8 months ago

It seems that there is value both in quantity and reliability of WAR for a given player. What I mean is that the value of Mike Trout isn’t only in the fact that he could put up a 10-12 WAR year, but that his minimum is probably 7. Inherently there is more risk with a pitcher, even someone like Clayton Kershaw, and so that might depress spending to below what his true value on a single year basis reflects.

Through this lens, it would be interesting to look at spending per WAR between starting pitchers and relief pitchers, as relievers notoriously have high variance in their performances and there has been a marked decline in Papelbon-like contracts recently for high end relievers.

LARDO CALORIESIAN
Guest
LARDO CALORIESIAN
1 year 8 months ago

COMMENT FARMING!

Plucky
Guest
Plucky
1 year 8 months ago

A major problem with integrating defense (and catcher framing for that matter) into WAR is reconciling them with what the concept of “replacement level” is supposed to mean. The concept of replacement level is at heart an empirical rather than philosophical one. Looking at hitting talent and pitching talent, the idea that the talent distribution is such that there is a more or less uniform level that is so ubiquitous as to be a floor for ML contribution has a pretty solid empirical footing. Perhaps it’s been done and I haven’t seen it, but as far as I know there has not been an equivalent analysis of the distribution of fielding or framing talent.

Thus, the defensive contribution to WAR is more an adjustment to an offensive statistic than it is a true contribution. Combining this with the positional adjustment built into WAR makes the result something of a kludge. It’s not at all obvious that “average” ought to be the baseline for defensive value, especially considering that 1) the basic assumption behind WAR as a statistic rejects average as the basis for comparison for both hitting and pitching, and 2) from a fielding perspective, “average” fielding is something you observe from strongly truncated distributions with a lot of selection bias. When looking at third basemen, for example, you are looking at a sample of players that on their current team are likely defensively superior to the first baseman and corner outfielders and inferior to the SS, 2B, and CF. But if you look at the list of starting players for any particular position in MLB, it’s usually easy to find 3 or 4 that would be playing a different position on many or most other teams.

A tangible example is Manny Machado last year- he would be the shortstop on all but about 5 teams in all MLB, and yet because he actually happens to be on one of those 5 teams we observe him as a 3B. Should we be calibrating 3B defensive data from a sample that includes Machado? It’s not a trivial question because in 2013 he singlehandedly warped 3B statistics to the point that only 15 of 37 3B’s with >=350 PAs rated as being above average defensively. If Machado had been a SS, then basically every other 3B’s WAR would have appeared about 0.2 wins higher. Given that WAR is a zero-sum (or ahem, a 1000-sum) game, it’s mathematically necessary to do something like that, but it obviously makes zero logical sense given what “replacement level” is supposed to mean- How valuable a particular player is in relation to replacement level should not depend noticeably on how good Manny Machado’s defense is, because Manny Machado is pretty much the opposite of replacement level. No player’s WAR is significantly affected by how well Giancarlo Stanton hits, because his PA’s represent a tiny fraction of the MLB aggregate.
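Just to illustrate the re-centering mechanism with made-up numbers (nothing here is actual UZR or DRS data), a quick Python sketch of how one extreme season drags down everyone else who is measured against the same positional average:

# Hypothetical pool of 37 third basemen: 36 ordinary seasons spread around zero,
# plus one Machado-like outlier at +35 runs on the raw scale.
def center_on_average(runs):
    """Re-express each fielder's runs relative to the positional average."""
    avg = sum(runs) / len(runs)
    return [r - avg for r in runs]

ordinary = [i - 17.5 for i in range(36)]          # symmetric around zero
centered = center_on_average(ordinary + [35.0])
shift = -sum(centered[:-1]) / 36
print(f"Each ordinary 3B rates about {shift:.2f} runs worse with the outlier in the pool")
# At roughly 10 runs per win that's about 0.1 wins here; weighting by innings
# (the outlier plays every day) and a larger gap between him and the pool push
# it toward the ~0.2 wins estimated above.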

To truly integrate defense into WAR, there needs to be 1) an empirical study of defensive value to see if average is an appropriate baseline, or if another should be used, 2) an empirical alternate-position model to estimate a player’s potential value at other positions, and 3) a re-calibration of the built-in positional adjustments in the WAR methodology based on that model.

ralph
Guest
ralph
1 year 8 months ago

Thanks for providing those numbers on Machado and 3B — it reinforces my thoughts above that defensive WAR should accumulate against a “minimally-acceptable fielder” baseline.

One other note — even if Stanton somehow magically had a much greater number of PAs in the game, that still wouldn’t affect the replacement hitting baseline one bit, since how Stanton hits has no relation to the pool of injury-replacement players out there.

Plucky
Guest
Plucky
1 year 8 months ago

Conceptually, you are 100% correct about Stanton’s hitting. In the mathematical specification of WAR, however, replacement-level hitting is defined in reference to average, -20 runs per 600 PA iirc. This level was chosen for essentially empirical reasons: it matched the actual distribution of hitters, and in aggregate got you to the place where a team playing at replacement level will win 48 games in expectation, so it’s an acceptably good way to define replacement-level hitting even if it’s not perfectly compatible with the concept. Thus, while conceptually Stanton’s offense should in no way relate to replacement level, in the actual specification it has some effect because he is a (very small) part of the average that is the calculation base.

While a logical purist might want a different mathematical specification of offensive WAR, you will bump up against the problem that, in order to match reality and the zero-sum nature of actual wins, replacement level in the free agent era equates at a team-wide level to a high-40s win total, and it has been pretty stable since the collapse of the reserve clause system given the generally high mobility of marginal players.
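For what it’s worth, here is the rough arithmetic in Python, using the -20 runs per 600 PA figure above, about 10 runs per win, and the 570/430 split of the 1,000-WAR pool (all approximate, illustrative only):

runs_per_600pa = 20                # replacement-level hitting gap, per the comment above
team_pa = 6100                     # approximate position-player PA per team-season
runs_per_win = 10                  # rough rule of thumb

hitting_gap = team_pa / 600 * runs_per_600pa / runs_per_win      # ~20 wins below average
pitching_gap = 430 / 30                                          # ~14 wins per team
print(f"Replacement-level team: ~{81 - hitting_gap - pitching_gap:.0f} wins")
# Lands in the mid-to-high 40s, the same ballpark as the ~48-win figure above.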

Joe
Guest
Joe
1 year 8 months ago

Hitting goes from replacement level, far below average. Average players get about 2 WAR.

Fielding is measured against average, though, not against the replacement-level player.

ralph
Guest
ralph
1 year 8 months ago

Interesting, I think I may have had some old forms of WAR(P) on my mind? I guess I’d still prefer a system like what I described versus the reality, as it does seem the more logical way to approach things. But at least the way it is now has a much smaller impact on offense valuation than defense valuation.

noseeum
Guest
1 year 8 months ago

In addition to the replacement vs. average issue, Stanton’s hitting is compared to all hitters, and a positional adjustment is added. That is a lot of players. Defense, OTOH, is only compared to players of the same position. As the previous commenter pointed out, one outlier defensive player can skew the stats so much that a good deal more than half of the third basemen are considered “below average”. Stanton just can’t have that much impact on offensive average or replacement level. I don’t even think Bonds could have!

I’m essentially repeating the point above but just hoping to add some clarity. Great point @Plucky!

Jason
Guest
Jason
1 year 8 months ago

In 2004, Barry Bonds raised the NL OPS by 5 points.

Jason B
Guest
Jason B
1 year 8 months ago

…and average hat size by 3/16.

Jim Price
Guest
Jim Price
1 year 8 months ago

Unless I missed something, my take from Passan’s article was not that defense is weighted too heavily but that there is a lot more variability in the defensive ratings even from the same data set. The inexactitude of measuring defense creates some large “error bars” within WAR when a player derives most of his value from defense.

Pirates Hurdles
Guest
Pirates Hurdles
1 year 8 months ago

What is the evidence that there is “more variability” in UZR? Compared to what? I’m sure someone has studied UZR variation on a runs basis compared with wRC+. It is quite possible that UZR variation is fully explained by limited attempts each season and thus a small fraction of plays heavily altering the outcome. This is fine when measuring seasonal value and may explain year to year variation quite well. Kind of like looking at seasonal triples for hitters, rare events (like great defensive plays) will inherently have greater variance year to year.

Jason
Guest
Jason
1 year 8 months ago

UZR discards plays when there is a shift. What happens to the missing WAR for those thousands of plays?

jruby
Member
jruby
1 year 8 months ago

It goes directly to Alex Gordon

Jimmer
Guest
Jimmer
1 year 8 months ago

that was funny!

Bip
Member
Bip
1 year 8 months ago

There are so many comments here and so many interesting points to consider. I wish I could dedicate my day today to reading all of them lol. Fangraphs comments are some of the best I’ve encountered on the internet, just wanted to say that.

The thing I want to reiterate, which I have mentioned earlier, is the difference between value and talent. This is somewhat analogous to the difference between stats that measure past events and those that are meant to predict future value. This discussion often gets played out when talking about RA WAR vs. FIP WAR for pitchers. The basic question is, do we ever want to measure just what happened and ignore how much of a role a player played in the event? If all we ever want to measure is what a player did and nothing else, then we may have to consider what others are saying about offensive stats, and how they don’t actually award players for what they are doing. If this is what we want, then a player getting a hit on a weak ground ball, or hitting a deep fly ball in the one park where it is not a homer, presents a serious problem.

Justin
Guest
Justin
1 year 8 months ago

I just thought of an interesting implication of the way we measure defense. I think it’s generally agreed that there is some measurement error involved. Take Dave’s example upthread:

– Flyball to the gap, will be a double or an out
– 10% chance of a catch
– .7 run value for a double
– -.3 run value for a catch

Under current methodology, if a fielder makes the catch, they are credited with .9 runs. If they don’t make the catch, they are credited with -.1 runs.

Now consider a very simple example of the measurement error. Let’s say the 10% chance of a catch has some uncertainty such that in actuality it is a 5% chance half the time and a 15% chance half the time.

When an average fielder makes the catch, Bayes’ theorem says that now the catch actually had a 12.5% chance of being made. Alternatively, when an average fielder doesn’t make the catch, Bayes’ theorem says that now the catch actually had a 9.7% chance of being made.

The new run values would then be .875 runs for the catch and -.097 runs for the no catch.

So essentially, if we admit that there is measurement error in the determination of what % of fielders would have made the play, then the run values assigned under the current methodology are incorrect. This is because the actual result of the play should influence our beliefs about how likely the catch was to be made.
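Here is a minimal Python sketch of that update, using the made-up numbers from this example (this is not any site’s actual methodology):

def expected_catch_prob(prior_probs, weights, caught):
    """Bayesian update: expected true catch probability given the result of the play."""
    likelihoods = [p if caught else (1 - p) for p in prior_probs]
    joint = [w * l for w, l in zip(weights, likelihoods)]
    total = sum(joint)
    return sum(p * j / total for p, j in zip(prior_probs, joint))

priors, weights = [0.05, 0.15], [0.5, 0.5]    # the "10%" play is really 5% or 15%
run_gap = 0.7 - (-0.3)                        # double vs. caught ball, in runs

p_catch = expected_catch_prob(priors, weights, caught=True)     # 0.125
p_miss = expected_catch_prob(priors, weights, caught=False)     # ~0.097
print(f"Credit for the catch: {(1 - p_catch) * run_gap:+.3f} runs")   # +0.875
print(f"Debit for the miss:   {-p_miss * run_gap:+.3f} runs")         # -0.097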

Aaron (UK)
Guest
Aaron (UK)
1 year 8 months ago

This would be an excellent adjustment. And it would automatically help to regress above- or below- average players to their long-term means [their priors, in Bayesian terminology].

e.g. if Morse makes a 5% play, chances are it was actually a 35% play. And if Simmons fluffs a 99% play, maybe it was only a 90% play.

This serves to discount positioning somewhat, which may not be to everyone’s taste.

Jim
Guest
Jim
1 year 8 months ago

When you start out asserting that defense accounts for 14% of run prevention, and thus that set portion of WAR, aren’t you setting yourself up for wacky numbers when you are throwing out all the shift plays?

Also, because of that fixed number, and the fact that most shift-play outs are made in the infield, does that portion of defensive WAR then transfer from infielders to outfielders? Could that explain the weirdness of Alex Gordon?

Just spitballing.

Jay
Guest
Jay
1 year 8 months ago

One interesting thing I haven’t seen mentioned is allowing the position player/pitcher WAR allocations to change depending on the specific pitching and hitting environment of the given year. Strikeouts have been on the rise dramatically over the last few years. In 2013 there were 5,909 more strikeouts than there were in 2003, an increase of almost 20%. That’s 5,909 fewer outs recorded by fielders. To me that certainly seems to imply that pitchers deserve a larger slice of the pie today than they did a decade ago, because they are recording far more outs without the aid of their defense.
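A rough back-of-the-envelope version of that in Python, using the 5,909-strikeout gap and an approximate league-wide out total:

extra_strikeouts = 5_909                 # 2013 vs. 2003, per the figures above
total_outs = 2430 * 2 * 26               # ~26 outs per team-game (home team often
                                         # skips the bottom of the ninth)
shift = extra_strikeouts / total_outs
print(f"Share of all outs that moved from fielders to pitchers: {shift:.1%}")   # ~4.7%
# Strikeouts themselves rose almost 20%, but as a share of every out recorded it
# is a 4-5 point shift, which is the case for nudging the pitchers' slice upward.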

Miffleball
Guest
Miffleball
1 year 8 months ago

I have to admit that I’ve found this entire conversation fascinating as to how to better integrate WAR components, and I think it is critical to address two points in Passan’s article that Dave skimmed over – positioning and stopwatches.
In regard to the first, by not accounting for the placement of a defender on the field based on scouting and hit patterns, it makes what were otherwise difficult catches appear easy and vice versa.
As for the second, Passan described his interview with the people behind either UZR or DRS regarding how they determine degree of difficulty (basically the eye test).
Between the two, Passan suggested that there is a sufficient degree of imprecise valuation going into dWAR that it is potentially worth regressing to the mean until we have the data capabilities to more effectively quantify what is actually happening (perhaps much in the way that PITCHf/x has changed our ability to evaluate pitchers).
Given that the samples are too small to suggest accuracy, in combination with the imprecise gathering, it is a valid suggestion that Heyward or Gordon are being overly rewarded and Kemp overly penalized. Unfortunately, I think there probably isn’t a better approach, but given the size of the error bars on dWAR, some sort of regression among fielders is probably reasonable until the error bars can be narrowed.

Miffleball
Guest
Miffleball
1 year 8 months ago

The other point that might be worth addressing is that WAR is treated as a zero-sum stat on the presumption that it actually equates to wins. However, if you add up a team’s total WAR and compare it to its win total, you get widely variable suggestions of what replacement level is (low 30s to high 40s). Given that it doesn’t actually correlate, perhaps there is room for rethinking the need for a zero-sum system.

Andy
Guest
Andy
1 year 8 months ago

I think that’s basically a sequencing problem. Individual WAR is determined from the run value of individual events, but depending on luck, a) the sum of the run values may not equal the actual runs scored; and b) the wins implied by the pythagorean record may not equal the actual wins.

It remains the case that the total WAR in any season is fixed, and it gives a fairly consistent estimate of replacement value.
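A tiny Python sketch of both gaps, using the classic Pythagorean expectation and a hypothetical team:

def pythag_wins(runs_scored, runs_allowed, games=162, exponent=2.0):
    """Expected wins from run totals via the Pythagorean record."""
    pct = runs_scored**exponent / (runs_scored**exponent + runs_allowed**exponent)
    return games * pct

rs, ra, actual_wins = 700, 650, 90       # made-up season totals
print(f"Pythagorean wins: {pythag_wins(rs, ra):.1f}, actual wins: {actual_wins}")
# The ~3-win difference is the sequencing/luck residue that keeps the sum of
# individual WAR from lining up with a team's actual win total.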

BenRevereDoesSteroids
Guest
BenRevereDoesSteroids
1 year 8 months ago

I can’t wait for MLBAM to save us from DRS and UZR. The thing is, as it stands now, people can’t help but be skeptical of the two. There is little to no transparency for either, so what are we supposed to think? I mean, right now there are guys with a 10+ run discrepancy between their DRS and UZR (like Andrelton Simmons). That is a full win. What are we even supposed to think? It isn’t like we can just go break down those numbers. We basically have to guess which one is closer.

Marco
Guest
Marco
1 year 8 months ago

Others have said it, but I’ll pile on:

The issue is not the size of the defensive contribution relative to other parts of the game, it’s the volatility of the defensive contribution which leads to funky results on a year to year basis.

Zimmerman’s article on this recently (http://www.fangraphs.com/blogs/alex-gordon-uzr-and-bad-left-field-defense/) is superb.

Joe
Guest
Joe
1 year 8 months ago

Suggestion – stop whining?

Passan is the worst writer on yahoo. 95% of his expert predictions are incorrect.

I disagree that defense being harder to measure makes it less significant. True, it brings in some possibility of error, but diminishing it in WAR would make WAR wrong on purpose.

Also, he doesn’t argue that defense is overvalued, just that it shouldn’t count as much because it’s harder to measure.

Whether a fielder got the ball due to superior positioning or chasing it down a long way is also irrelevant. The value of the out is equal. That’s like letting a pitcher off the hook for a homer because a catcher called for a stupid pitch.

byron
Member
byron
1 year 8 months ago

With strikeouts on the rise, it only makes sense that pitchers would be responsible for more run prevention than they used to be. There are fewer balls in play than there used to be, and there are fewer baserunners, so a hit dropping in is less damaging than it used to be.

Scrapper
Guest
Scrapper
1 year 8 months ago

The comments could probably have used a moderator, since there are so many issues being thrown around and it’s almost impossible to direct the dialogue into something useful. Hopefully Dave and Jeff can address some of these points in the upcoming podcast. I’ve not listened to these podcasts in the past, but I’m planning on making an exception for this one.

Andy
Guest
Andy
1 year 8 months ago

Yes, I would like to see a forum where we could comment on these articles. The way it works now, there is a surge of comments the first few hours, and it’s very hard to keep up. Then within a day or two, people move on to the next article. On a forum, where the topic remains on the front page for an extended period of time, continuing responses and debate I think are encouraged.

I think it would also be easier to follow if comments were listed directly in order of being made, rather than having responses to one comment being listed right after that comment and before the following comment. If the comment being responded to is quoted–another feature we need to be enabled–then it isn’t necessary to see the original posting of that comment to follow the discussion. In this way, one can jump into the discussion a little late and not have to go through the entire list of comments. Sometimes that isn’t enough and one has to scroll back a little, but the point is one doesn’t have to read through the entire thread.

And an edit function, please.

ralph
Guest
ralph
1 year 8 months ago

Agreed, if you want to discuss anything, you better get in early.

One thing that might be helpful, though not perfect — the link to the RSS 2.0 feed at the top of the comments gives you a strictly chronological view.

Kevin
Guest
Kevin
1 year 8 months ago

Isn’t this all going to change once we get the info Passan thinks we need, i.e. the exact speed a ball leaves the bat, how quickly the fielder gets a jump, the angle on the ball, the efficiency with which he reaches it? Shouldn’t we have that in a few years? In the interim the current model is better than anything else. I mean, why invest in/stress out about perfecting the typewriter when the computer is about to be invented?

But that whole payroll argument was a little off. General managers weighing big bats (not defense) more than WAR accounts for why payroll is 57/43.

Stats 101
Guest
Stats 101
1 year 8 months ago

If MLB currently pays hitters/pitchers at a 57/43 ratio, and pitchers are known to be more volatile commodities (for performance/repeatability/injury/etc.), then finance would dictate a lower price relative to a hitter with lower volatility. This would imply run prevention from defense is paid for at a rate of strictly less than 14%.

Jackie T.
Member
Jackie T.
1 year 8 months ago

Should we also regress Chris Davis’ 53 home runs last year? We know that was fluky, and not representative of his true talent level, even though it did actually happen. Is it really that different than Alex Gordon getting and converting an inordinate number of chances in LF this season? We can all agree neither last year’s Davis production on offense or this year’s Gordon production on defense is representative of their true talent, but the events still happened.

Jamie
Guest
Jamie
1 year 8 months ago

I think a lot of this comes down to how we once understood defense and how we are beginning to understand defense. It used to be thought that good defense could be captured by highlight-reel plays. Think Derek Jeter and his jump throws, or Hechavarria and the diving stops he makes. Torii Hunter fits here as well. These guys all grade out as not particularly great defenders by the metrics, having partly gotten their former reputations from making the easy ones look hard.
Now we know that guys like Gordon and Lagares are phenomenal defenders for the bases not taken because of their great arms, or because they can get to a ball many defenders don’t get to but make the play look easy. There is tremendous value in that, but there are no highlights in acquiring that value. It is in watching these players daily that you gain the appreciation.

Richie
Member
Richie
1 year 8 months ago

Don’t feel like wading through 144+ posts, so sorry (not really) if someone mentioned this before. But the 43% to pitchers also includes their fielding and (half) their batting performance. I’d suggest that shifts the pitching:defense ratio from 43:7 to more like 41:9.

Luke
Guest
Luke
1 year 8 months ago

I believe that someday, WAR will be essentially perfect. Don’t know when, maybe 10 years, maybe 30, but we’ll get there. We have the technology.

Someday, we will know the expected run value of every single ball in play. Pitchers and hitters will be rewarded/punished based on that expected run value, regardless of whether the fielder makes the play. The fielder will be credited by how much he changes the run value of the ball in play by making or not making the play.

I don’t think we’ll ever have a consistent way of knowing whether to credit an individual fielder or a fielding coach for good (or lucky) defensive positioning. I’m fine with that. WAR is not predictive, and should not attempt to answer the question, “Would player X have fielded so well if he had a different fielding coach?” GMs will want to answer that question before paying a player for his defense, but WAR does not need to do so.

Park effects will always be an issue. Should a hitter be given full credit (and a pitcher full punishment) for hitting a home run that barely clears the right field fence in Yankee Stadium, when we KNOW that that same batted ball would likely be an out or double in other parks? I’m not sure. But, I do think someday we will know the league average value of all such batted balls, and will be able to calculate both a park-independent and park-dependent WAR. That’ll be cool.

Because I think someday our batted ball tools will be much better, and WAR will be essentially perfect, I really can’t get that interested in these arguments about what we should do in the meantime while we have imperfect tools. Every proposed solution I have seen has been arbitrary (e.g. 50/50 split between RA9-WAR and FIP-WAR, regression of fielding metrics by some arbitrary amount, etc.). The current method may be inaccurate, but it is not arbitrary. It’s just using the best guess we have based on the best tools we have.

Relax, peanut gallery. When the tools get better, WAR will get better.

Green Mountain Boy
Guest
Green Mountain Boy
1 year 8 months ago

I’ve always felt that defensive ability plays too large a role, not only in calculating WAR, but in constructing an MLB roster. In an ideal world, would any team rather have a .280/.340/.450 shortstop as opposed to a .240/.300/.380, but better defensive, shortstop? Of course! What I’ve always wondered about, though, and have yet to see addressed, is at what point there is an even balance between offensive and defensive skill. For example, is the .280/.340/.450 offensive-based player equaled by the .240/.300/.380 defensive genius, or is that point more like .270/.330/.425? My eyes tell me the latter, although I’ve never seen the stats to back it up. I guess I’m saying that a 10% increase in offensive capability more than outweighs a 10% increase in defensive capability. After all, how many balls does ANY elite defender at any position handle more than an average defender? Maybe one every three to five games? For me, give me the offensively superior player any time.

EXCEPT for catchers. The ability to call pitches and have a feel for a pitcher and the game, along with the ability to block/frame pitches effectively, HAS to be more of a factor than hitting alone, but only because the catcher is involved in every pitch, as opposed to the position player who may only be involved in 5 PO/A per game.

I think the proof of what my eyes tell me is Todd Walker back when he was playing 2B. Did he not get to a ball every 2 or 3 games that a defensive guru would have? Sure. But that was more than offset by the fact that he was batting 4-5 times per game and could hit not only poor/average pitchers, but elite pitchers as well.

For that matter, why hasn’t there been a study of the “tipping point” for hitters? Who’s more valuable, the guy who can hit Kershaw et al as well as the replacement level guys, or the guy who can only hit replacement level guys, given they both end up with the same slash line at the end of the year? I’ll go for the guy who can hit the best every time. You see this to its extreme in HS baseball. Tons of kids rake against average pitchers, but who continues to rake against the best? I’ll take THAT kid and his future.

Cool Lester Smooth
Guest
Cool Lester Smooth
1 year 8 months ago

So would you say that the “average” from which wRAA should be calculated is the average (weighted by plate appearances) wOBA against of the pitchers a batter faces, rather than adjusted league wOBA?

That’s a good idea, but I’m not sure whether the additional accuracy would be enough to justify how much more complicated it would be to calculate.

Dead Serious
Guest
Dead Serious
1 year 8 months ago

Also, I don’t think that the WAR split needs to be changed. I think the most common issues people are coming up with are in the calculation of the Def statistic, and specifically the calculations used in the smaller defensive statistics inside it. I think Andrew McCutchen is a prime example. One could argue that he is being hurt considerably because he has had two players who are excellent defenders in their own right, in Starling Marte and Gregory Polanco, playing next to him for much of the season. This, combined with the Pirates’ pitching philosophy and positioning requirements, has greatly reduced the number of plays that McCutchen has had the opportunity to make. RZR isn’t a perfect statistic, but it does show that when McCutchen has needed to make an out, he’s been “Excellent”, which is anything .940 or above. So because of the way his defensive runs saved are calculated, McCutchen is penalized greatly because he’s 1) not required to venture into another fielder’s zone to make outs, due to having mostly excellent fielders around him, and 2) not getting nearly as many opportunities as other CFs to make “great” or “difficult” plays, due to pitching philosophy. Now I’m not saying that McCutchen is a well above average fielder, or that he’s being robbed of credit somehow. But what I am saying is that he’s certainly not a -10-runs-below-average CF. But he’s being treated as one due to things out of his control that feed into the Def component that goes into his WAR. Basically he’s being penalized for not having the opportunity to make plays that other CFs do have.
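(For anyone unfamiliar with it, RZR is simple enough to write out in Python; the numbers below are made up, not McCutchen’s actual line.)

def rzr(plays_made, balls_in_zone):
    """Revised Zone Rating: share of balls hit into a fielder's zone turned into outs."""
    return plays_made / balls_in_zone

print(f"RZR: {rzr(188, 200):.3f}")   # 0.940, the "Excellent" cutoff mentioned above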

pft
Guest
pft
1 year 8 months ago

The biggest problem with WAR is the positional adjustments, and not necessarily the defensive weighting, although using one-year fielding metrics does seem to suggest regression is in order. MGL suggested 50%.

Park adjustments also seem rather crude if they do not take into account a batter’s handedness or batted-ball profile.

To me WAR has an accuracy of +/- 50% across positions and +/- 30% within the same position. I like the concept but won’t take it as gospel, since the execution is too crude.

Brandon
Guest
Brandon
1 year 8 months ago

I like it the way it is.

BDF
Guest
BDF
1 year 8 months ago

Love this stuff. WAR really brings out people’s true colors. Love the guy arguing that WAR should be based on xBABIP, not wRC+. It only sounds insane. At some point we’ll develop pitch-level analytics that will enable us to develop stats based on individual pitches – incorporating, presumably, at some point, sequencing in addition to velocity, movement, and deception – and not on any single result or string of results.

Doug Lampert
Guest
Doug Lampert
1 year 8 months ago

Adding framing will increase the fraction of pitching outcomes assigned to position players by some amount.

But if we want to remain at 50% hitting, 7% defense, 43% pitching, then it follows that UZR or other current defensive metrics must decline in contribution to keep the current total division of credit, since framing moves some credit from pitchers to defense. To the extent that things like payroll and pitcher effects when changing teams provide some validation for a 57/43 ratio, that same data supports reducing the UZR component rather than the pitcher component when including framing.

RMR
Guest
1 year 8 months ago

WAR is an attempt to measure performance holistically. If regressing defensive performance measurement gets us more accurate measures of defensive performance, by all means, do it. But I’m not seeing complaints that the defensive measures are systematically biased in some way that should be corrected. And if they are systematically biased, it’s not clear to me why regressing to the mean is the right solution — doesn’t that just “punish” everybody, leaving the bias unaddressed and creating a different source of error?

Bastardizing the conceptual logic of the metric so that it conforms to our pre-existing belief system is precisely what sabermetrics is not about. Fiddling with a metric’s weights does not strike me as the right way to go about fixing its accuracy problem. We should not seek some arbitrary breakdown of contribution, but continue to measure actual production to the best of our ability and let the chips fall where they may.

Kyle
Member
Kyle
1 year 8 months ago

The argument for regression is partially about systematic bias, and it’s partially about randomness and volatility that occurs from a weak evaluation metric.

As has been pointed out in this thread, measuring a fielder’s performance requires a model that estimates the likelihood that a fielder will make a given play, compares that estimate to the actual result, and then translates the difference into a hypothetical impact on runs.

If your model for looking at a batted ball and determining the probability of each outcome has a lot of variance and unpredictability, you are going to wind up with inaccuracies. Imagine if it was a completely random assignment, or if it was a constant value. Those would be terrible models, but they might not actually be biased in any particular direction. They would just be incorrect, and would require a lot more time to regress to a more “true” value than one that did a better evaluation.

The creator of UZR said that you need 3 years of UZR data to evaluate a fielder. Shouldn’t we consider the possibility that randomness is playing too big a role in our model of player value?
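To make that concrete, here is a quick simulation in Python (purely illustrative, not any real metric): a fielder whose true skill never changes, scored against catch-probability models of varying noisiness.

import random

def season_runs(n_plays=350, model_noise=0.0, run_gap=0.8):
    """One season of plays for an exactly average fielder, credited against a noisy model."""
    total = 0.0
    for _ in range(n_plays):
        p_true = random.uniform(0.05, 0.95)                                   # true catch probability
        p_model = min(max(p_true + random.gauss(0, model_noise), 0.01), 0.99)  # model's estimate
        made = random.random() < p_true
        total += (1 - p_model) * run_gap if made else -p_model * run_gap
    return total

random.seed(1)
for noise in (0.0, 0.1, 0.2):
    seasons = [season_runs(model_noise=noise) for _ in range(500)]
    mean = sum(seasons) / len(seasons)
    sd = (sum((s - mean) ** 2 for s in seasons) / len(seasons)) ** 0.5
    print(f"model noise {noise:.1f}: mean {mean:+.1f} runs, season-to-season SD {sd:.1f}")
# The mean stays near zero (no systematic bias), but the spread of single-season
# values widens as the model gets worse, which is the volatility argument above.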

Mark L
Guest
Mark L
1 year 8 months ago

How about someone with a lot of time on their hands goes through every play that has affected Alex Gordon’s WAR this year and see how closely it matches to the scores he’s received for them?

Plucky
Guest
Plucky
1 year 8 months ago

What you have described is exactly what DRS is: http://www.fieldingbible.com/summary.asp

Dr. Obvious
Guest
Dr. Obvious
1 year 8 months ago

I modified the WAR system for my DMB league.
#1 – I first assumed a team of replacement players would go 50-112 and based total WAR available off of that

#2 – Batting gets 55% of the available WAR, pitching 45%, and fielding totals 0 (my theory is that a replacement-level player can make an average number of fielding plays, since most players are in the majors due to their offensive skills), adjusting for positional value on fielders, with DHs getting a negative fielding WAR for their time in the lineup.
I use stolen-base runs to add/subtract for runners and for what catchers allow (nets to zero).

#3 – I do other adjustments within the calculations, but the bottom line is that the hitters’ contribution equals 17.6 WAR for each 162 games played and the pitchers’ contribution equals 14.4 WAR per 162 games. Fielders net 0 WAR (before the adjustment for steals).

What I end up with is that elite outfielders seem to max out around 2 fielding WAR, infielders 3 fielding WAR, and catchers around 3 fielding WAR.

Positional Adjustments
Guest
Positional Adjustments
1 year 8 months ago

I love WAR, but I have mixed feelings about positional adjustments, particularly with regard to designated hitters. Some teams happen to have players who are better fielders at, say, first base than players whose bats might otherwise have kept them playing first, so those bats end up at DH. It seems strange to punish DHs to the degree that they currently are for not playing the field.

noseeum
Guest
noseeum
1 year 8 months ago

Shoot I screwed up that link. If only we had preview. Try that again.

Daniel Kopf
Guest
Daniel Kopf
1 year 8 months ago

I have also been curious about WAR’s defensive calculation, and about the year-to-year consistency of defensive WAR and baserunning WAR versus offensive WAR.

To my surprise, a player’s defensive z-score in the previous year predicts his z-score in the following year no worse than the same exercise does for offense. It’s actually a bit of the opposite.

I am happy to share the datasets and code if anyone is interested. I did this on 2012 to 2013 data. It was not different if we looked at players who changed teams.

This would suggest that it is not variance that would be the cause for concern about defensive metrics, but bias.

Using only 2012 as a predictor on 2013, the r-squared for defense was .47 for players who remained on the same team and .24 for players that changed teams.

For the same calculation for offense, the r-squared was .29 and .17 respectively.

Only players with a substantial number of games played were included in the analysis.
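The calculation itself is simple enough to sketch in Python; the column names below are placeholders for the shape of the data, not my actual files:

import pandas as pd
from scipy import stats

def year_to_year_r2(df, value_col, min_games=100):
    """R-squared of 2012 positional z-scores against 2013 z-scores."""
    df = df[df["games"] >= min_games]
    grouped = df.groupby(["season", "position"])[value_col]
    z = (df[value_col] - grouped.transform("mean")) / grouped.transform("std")
    wide = df.assign(z=z).pivot(index="player_id", columns="season", values="z").dropna()
    r, _ = stats.pearsonr(wide[2012], wide[2013])
    return r ** 2

# e.g. compare year_to_year_r2(fielding_df, "def_runs") with year_to_year_r2(batting_df, "off_runs")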

Paul E
Guest
Paul E
1 year 8 months ago

How about the difference between an earned run and an unearned run as the basis for the ratio?

They bat for nine innings and they pitch for nine innings. What’s the standard deviation on that? 3%? 53/47 or 54/46 – that’s it. Nothing more. Does anyone find it absurd that by some WAR metrics (Baseball-Reference) Jason Heyward is supposed to be having as good a season as the best hitter in the NL (fellow RF G. Stanton)?

Dave, I believe the payroll numbers and ratios are mere coincidence – no shit.

telejeff
Member
telejeff
1 year 8 months ago

This was a great discussion. Many days later, I am still thinking about it. Because so many days have passed, I do not know if anybody will read this post, but I still wanted to put it out there.

First, much of the argument to reduce the value of defense in WAR strikes me as antithetical to the essence of statistical analysis. One of the biggest objections is that we don’t like what WAR is telling us, because we don’t believe subjectively that defense could be so important. However, as Robert Arthur demonstrated, the model is actually more accurate (descriptive, or predictive, whichever description works best) if defense is given MORE weight. So the problem would appear to be with our subjective beliefs, and not WAR.

Second, there are very clearly problems with the measurements we use for evaluating defense. That is a great argument for improving the measures of defensive effectiveness. It is not, however, a logical argument for devaluing the importance of defense in WAR.

Defense isn’t made less important just because we aren’t very good at measuring it. By analogy, it clearly would not have made sense years ago to de-value pitching in our attempt to model baseball simply because ERA (and Pitcher Wins) were (and are) flawed.

I don’t have any great suggestions for improving the measurement of fielding. As to the best way to derive the appropriate weighting for fielding, I think Robert Arthur is on to something. Comparing WAR predictions with different weighting for defense to large bodies of results should provide good insight into the appropriate weighting for defense. And, it likely is higher rather than lower compared to current weighting.

At the most basic level, fielding is obviously extremely important. If there were no fielders, every ball in play would be a home run. Many of the comments correctly note, however, that the right way to think about weighting defense is the difference between major league fielders and replacement-level fielders.

The comparison with replacement-level fielders must recognize, however, that many current major leaguers are worse than replacement-level fielders (but play in MLB because their negative fielding impact is perceived to be outweighed by their hitting impact).

Measuring fielding, it seems to me, should also take into account SLG in addition to BABIP (which doesn’t differentiate between singles, doubles, and triples). Maybe we could create something like SLGBIP (slugging percentage on balls in play). All fielders are taught from an early age to minimize the big hits at the expense of singles (e.g., an outfielder letting the ball drop in front of him rather than diving to make the play when he’s not very likely to catch it).
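A quick sketch of what SLGBIP could look like in Python (a made-up stat, so this is just one plausible definition; whether to fold in-park home runs into the denominator is a design choice):

def slgbip(singles, doubles, triples, balls_in_play):
    """Total bases allowed on balls in play, divided by balls in play."""
    return (singles + 2 * doubles + 3 * triples) / balls_in_play

# e.g. 40 singles, 12 doubles, and 2 triples allowed on 300 balls in play:
print(f"SLGBIP: {slgbip(40, 12, 2, 300):.3f}")   # 0.233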

Armour T. Unrue
Guest
Armour T. Unrue
1 year 8 months ago

I don’t think you should change WAR, but you should stop misusing the word “defense”. You wrote, “We think run prevention is 86% pitching and 14% defense.” In fact, run prevention is 100% defense, since the definition of defense (see Official Rule 2.00) is “the team in the field”. Defense is pitching and fielding.

Richard Bergstrom
Guest
1 year 8 months ago

The question that I have is if there is 570 WAR that goes to position players, what percentage of that 570 WAR goes for offensive contributions and what percentage goes to defensive contributions?

Also, what are the +/- extremes on offensive WAR and defensive WAR and the general distribution curves? Maybe it’s just me but it seems fielding WAR has more extremes on the positive side and a vast majority of players seem to have fielding WARs between -1.0 and 0.
