I think the problem that people have with WAR is that its scope and use make it impossible for them to find statistical ways to back up conclusions they’ve already made based on emotions. I think that’s what these guys like — to be able to never be told they are “wrong.” When you have a hundred data points and none of them answer the overall question, you can “arguably” be right no matter which side you choose, and that’s how these writers make their livings.
Comment by Eminor3rd — February 4, 2013 @ 10:18 am
Great article Dave, but a comment: don’t you think it’s a bit unfair of Caple, and other writers, to criticize WAR users for being slaves to a single stat, when there are so many WAR users that this doesn’t describe? It seemed to me that Caple had a very good handle on the pros and cons of using WAR, and that his two major problems were the confusing different WAR numbers available on B-ref, here, and elsewhere, and the view that WAR users use WAR and only WAR. So I felt that WAR itself didn’t need a defense so much as the many fans and writers who use WAR correctly needed one. While Caple was relatively fair in addressing some of the weaknesses of WAR while also explaining its usefulness, I thought the real problem with his article was not realizing (or maybe more correctly, not including in his article) the fact that there *are* many fans and writers using WAR in a helpful and enlightening way, both here on Fangraphs and on many other sites. Stereotyping them as fans who just stick to a single WAR number for a player not only doesn’t describe them, it’s practically the opposite of what they would tell you (as you just did in this article) WAR is for. I feel like WAR doesn’t need another article lecturing fans not to use it to end arguments; it needs someone to explain to writers like Caple (and from what you say, and the general fairness of his article, he’s obviously not the biggest problem out there, just the writer in question today) that WAR’s biggest and most knowledgeable supporters *don’t* use it to end arguments, and not only that, continually preach that mentality to amateur saber-fans in order to help WAR be used correctly more widely.
I have to admit I was surprised you were so gracious with Mr. Caple. My first reading of his piece was that it was another example of the strawman argument. Does any reputable sabr-oriented writer ever say that we should only use WAR? Or that other stats are totally useless? I just get sick of writers finding one headline that is probably written by someone who doesn’t understand the subject matter and making that the position that needs to be defeated.
P.S. Jim Caple has also suggested, numerous times, that runs are the best way to judge a player’s value. Just saying.
There is simply no way to say this better. Thank you.
Comment by novaether — February 4, 2013 @ 10:21 am
What is the deal with Caple saying that no one knows how to calculate WAR, and therefore it is not particularly useful? (1) Someone clearly knows how to calculate it – he mentioned the B-Ref explanation specifically. (2) Just because you can’t calculate it easily doesn’t mean it isn’t calculable. I can’t calculate in my head if a train leaves Cleveland at 2:30 and another leaves Chicago at 2:50 and the first is travelling 55 m.p.h. and the second is travelling 45 m.p.h. where and what time they will meet, but that doesn’t mean it isn’t calculable. “Hard” does not equal impossible.
The issue with WAR is it has no relation to reality. Adding up the WAR doesn’t equal wins for any given team. Trout didn’t win an extra 10 games for his team, he might have won only 2-3 or 25 through his contributions. After all, Trout was a replacement player since he was a rookie. That would seem to make his contribution even greater than 10 wins? Who knows? It’s nice to have an overall statistic or measure to judge how good a season the player had, but really, wins above replacement?? What does that mean??
Comment by Hurtlockertwo — February 4, 2013 @ 10:24 am
No methodological flaw necessarily, but there are certainly improvements to be made. If the point is to measure value, and it doesn’t accurately measure value, then it’s not perfect, and if it’s not perfect, then you could say it’s flawed. I understand your point of course, but we shouldn’t use that as a reason to stop talking about ways to improve WAR.
WAR means “Wins Above Replacement”, so if we’re not sure that it precisely and accurately measures the wins above a replacement player, can’t we say that it is flawed? I’d guess neither WAR formula correctly calculates the actual number of wins a player provides over a theoretical replacement. There is probably some refining that can be done, so I think it’s fair to call it flawed.
That’s not exactly true. The very subjective defensive component of WAR is a flaw. In other words, if one uses a hammer for a screw, it’s the user’s fault, but if the hammer’s design causes it to break the nail on occasion, the tool, no matter how useful, does have a flaw.
Comment by Matty Brown — February 4, 2013 @ 10:30 am
It’s based on linear weights, which means it’s a context neutral statistic. Basically, if you assign a value to all the things he did (doubles, homers, stolen bases, etc.) that is based on the AVERAGE run value of that event, you come up with ten wins. You’re right — the ACTUAL effect of his contributions is entirely based on context (how many runners are on base, leverage index, etc.), but if you are comparing the play of two guys, you don’t want to give one credit for the situations he found himself in, which is why RBI, Runs, and Wins are bad stats.
Comment by Eminor3rd — February 4, 2013 @ 10:31 am
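The linear-weights idea described above can be sketched in a few lines of Python. This is only an illustration: the run values and the 10-runs-per-win conversion are rounded, assumed numbers, not the weights FanGraphs or Baseball-Reference actually publish, and the function names are hypothetical.

```python
# Illustrative linear-weights sketch.
# ASSUMPTION: these run values are rounded, made-up-for-demonstration numbers;
# real implementations use season-specific weights.
RUN_VALUES = {
    "1B": 0.47, "2B": 0.77, "3B": 1.04, "HR": 1.40,
    "BB": 0.31, "SB": 0.20, "CS": -0.40,
}
RUNS_PER_WIN = 10.0  # ASSUMPTION: common rule-of-thumb conversion


def context_neutral_runs(events):
    """Credit each event with its AVERAGE run value, ignoring game situation."""
    return sum(RUN_VALUES[name] * count for name, count in events.items())


def runs_to_wins(runs):
    """Convert a run total to wins using the assumed conversion."""
    return runs / RUNS_PER_WIN


# Every homer gets the same credit whether the bases were empty or loaded,
# so two players are compared on what they did, not on who batted ahead of them.
extra_steals = context_neutral_runs({"SB": 9})
extra_homers = context_neutral_runs({"HR": 4})
print(extra_steals < extra_homers)
```

Under these assumed weights, putting everything in runs is what lets unlike events (steals, homers, walks) be added together and compared at all.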
If there’s a flaw in how it’s used, it’s that people don’t adequately represent the uncertainty surrounding the stat. If you think that one year of defensive stats has a +/- 5 run level of uncertainty, then you should at least represent a 4-WAR player (for example) as a 3.5-4.5 WAR player.
Dave called it “the perfect tool to provide a general imprecise answer” to the question of how good a player is. So people should just keep that imprecision in mind when they talk about WAR.
Comment by Well-Beered Englishman — February 4, 2013 @ 10:37 am
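The range suggested above follows from simple arithmetic, sketched here in Python. The 10-runs-per-win conversion is an assumption (it is what makes +/- 5 runs come out to +/- 0.5 WAR), and `war_range` is a hypothetical helper, not anything the site provides.

```python
RUNS_PER_WIN = 10.0  # ASSUMPTION: rule-of-thumb conversion from runs to wins


def war_range(war, run_uncertainty, runs_per_win=RUNS_PER_WIN):
    """Turn a point estimate plus a +/- run uncertainty into a (low, high) WAR range."""
    half_width = run_uncertainty / runs_per_win
    return (war - half_width, war + half_width)


# The example from the comment: a 4-WAR player with +/- 5 runs of defensive uncertainty.
low, high = war_range(4.0, 5.0)
print(f"{low:.1f} to {high:.1f} WAR")
```

This only carries the defensive uncertainty through; any uncertainty in the other components would widen the range further.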
It adds up to wins ABOVE REPLACEMENT. A replacement level team will win somewhere in the neighborhood of 46 games. Adding that to a team’s total WAR will get you extremely close to their actual win total. The Braves had 29 offensive WAR and 18.5 pitching WAR. Adding that to 46 replacement wins gives you 93.5 wins, i.e. extremely close to their actual win total of 94. So yes, WAR does have a very close relation to reality.
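The arithmetic in that Braves example is easy to check. A minimal Python sketch, using only the numbers quoted in the comment (the 46-win baseline is the commenter's approximate figure, and `estimated_wins` is a hypothetical name):

```python
REPLACEMENT_TEAM_WINS = 46  # approximate baseline quoted in the comment


def estimated_wins(offensive_war, pitching_war, baseline=REPLACEMENT_TEAM_WINS):
    """Replacement-level baseline plus the team's total WAR."""
    return baseline + offensive_war + pitching_war


braves_estimate = estimated_wins(29.0, 18.5)
braves_actual = 94  # actual win total quoted in the comment
print(braves_estimate, braves_actual - braves_estimate)
```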
It’s how many wins he contributed to the team over someone like Jayson Nix or Willie Bloomquist would have contributed. So if the Angels started one of those guys in left or center field every day, and Mike Trout never existed, they would have won about 10 fewer games.
Well, the defensive methodology is completely wrong, not in the sense that it doesn’t approximate value, but that it approximates that some positions are worth negative wins. But how can a position that is necessary to play the game be worth negative wins?
Such methodology should at least give anyone pause about what the model is modeling. Of course, some of these models that don’t correspond to truth still happen to be incredibly accurate in terms of their predictive capacities. Like Ptolemy’s model for the movement of planets.
Comment by Crumpled Stiltskin — February 4, 2013 @ 10:44 am
Yeah Jim’s arguments broke down into
1) WAR isn’t a simple X/Y calculation, if it isn’t easy enough for anyone to do then it’s not a good stat. “If we can’t figure a stat out on our own, then how do we verify whether it is accurate?”
2) fWar and bWar differ, so neither is valid. This would be like the world rejecting electricity until Tesla and Edison could agree on whether we should use DC or AC. Just because two groups are aiming towards the same goal, a stat that helps describe a player’s overall contribution, and are going about it slightly differently, that doesn’t invalidate both.
3) Defense is hard to measure. Again Jim goes back to the H/AB argument and yearns for more stats which an 8 year old can figure out. Just because something is hard to measure doesn’t mean you shouldn’t try.
This ESPN article was so infuriating to read. I’m glad Dave did a follow up, though I’d prefer he come out stronger against such nonsense.
So would you rather have as a starting shortstop: Ryan or Jeter? When I saw that question I said Ryan based on price, but the more I think of it, I still prefer Ryan even without worrying about the budget.
What’s worse in my opinion is his argument that stats like AVG “are based entirely on indisputable math calculations” while “WAR has an element of theory and assumption to it.”
As Dave notes above, AVG is based on simple math calculations because it answers a very narrow question — how often did that player get a hit (excluding a bunch of plate appearances for various reasons). It does not tell you whether a player with a high AVG is strictly better than a player with a lower AVG. In order to make that judgment just based on AVG, you need to add a bunch of (flawed) theories and assumptions about the importance of getting hits, ignoring walks, extra-base hits, etc.
If you want to answer an important question, like is Player A better than Player B, then you have to add theories and assumptions. The strength of WAR is that it uses theories and assumptions that are supported by years and years of data.
Some positions have negative defensive value because of the relative skill needed to play the position. I guess you could just shift the defensive runs added to players’ totals but that wouldn’t change the end result.
I think you’re misinterpreting how the defensive adjustments work. WAR builds in value just for playing time, so the defensive adjustments subtract from that baseline value. A league-average fielder and hitter would never have negative value, regardless of what position he plays.
I’d like to point out that WAR should not be used to answer “How good is he?” but “How good was he?” Even if you accept that UBR, UZR, etc. are accurate measures of what happened, those numbers are less projectable (i.e., they aren’t as good at predicting the future) than HR/PA, K%, BB%. To answer the question “How good is he?” we want a true talent metric that deals with uncertainty about all stats to the degree that they are uncertain measures of talent.
Comment by philosofool — February 4, 2013 @ 11:00 am
Comment by Eminor3rd — February 4, 2013 @ 11:02 am
Comment by commenter #1 — February 4, 2013 @ 11:04 am
Just for fun to end the MVP debate. Cabrera shall play center field next year to validate his award.
“After all, Morris +56.9 WAR is nearly a dead on tie for Sandy Koufax‘s +57.3 WAR, and no one’s trying to get Koufax evicted from Cooperstown. Even the staunchest defender of advanced metrics would agree with the notion that a player’s Hall of Fame case deserves more consideration than simply looking at a player’s WAR and calling it a day.”
But that applies to every stat.
Ozzie Smith got into the HoF on the first ballot @ 92%. Buddy Bell, Steve Finley, Steve Garvey, Julio Franco, etc all had more hits and none got in on the first ballot let alone have a good shot of ever getting in…
Ralph Kiner is in the HoF with 369 HRs– but not Norm Cash, Matt Williams or Frank Howard, each with more.
Every stat has the same ‘one stat isn’t enough’ issue and I think you could make a pretty strong case that WAR has less of that issue.
Yeah, I don’t know of anybody who claims WAR is THE definitive answer and uses it as such. Sure, a lot of people cited WAR to support a Trout-over-Cabrera case, but then they would give the reason why WAR liked Trout better (defense and speed) and this was the real backbone of their argument. Presumably, it’s the same reason why Caple liked Trout.
You have to look at a lot of things. View count, word length, comment total, etc. And then you really need to consider intangibles like style guide.
Comment by Eminor3rd — February 4, 2013 @ 11:30 am
I’ll never understand why they make a point of saying UZR requires 2-and-change seasons of data to be fully accurate, but then they go ahead and use single season UZR in a WAR score anyway.
So Fangraphs, why do you do this? Why not keep a 2-and-change rolling average as the player’s definitive UZR? Sure we wouldn’t know rookies like Trout’s WAR for 3 years, but it would be totally worth it so we wouldn’t see things like a 5.9 WAR for Martin Prado in 2012, which was totally fueled by an inaccurate 17.8 UZR, because his previous two UZRs were 6.1 and -3.6. So no, Prado was never actually a 6 win player at any point last year, yet people who don’t understand the UZR flaw in fWAR will scream bloody murder if you try and tell them this. Would love an answer to this…
Comment by Forrest Gumption — February 4, 2013 @ 11:31 am
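For what it's worth, the rolling-average proposal above can be sketched in Python using the three Prado UZR figures quoted in the comment. The plain unweighted mean and the three-season window are assumptions made for illustration (the comment only says "2-and-change"), the order of the two earlier seasons is assumed, and `rolling_mean` is a hypothetical helper.

```python
def rolling_mean(values, window=3):
    """Unweighted mean of the most recent `window` values (fewer if history is short)."""
    recent = values[-window:]
    return sum(recent) / len(recent)


# UZR figures quoted in the comment; order of the first two seasons is ASSUMED.
prado_uzr = [6.1, -3.6, 17.8]

single_season = prado_uzr[-1]
smoothed = rolling_mean(prado_uzr)
print(f"single-season UZR: {single_season:.1f}, three-season mean: {smoothed:.1f}")
```

The smoothed figure is a better guess at true talent, but it no longer describes what happened in 2012 alone, which is the trade-off the replies in this thread go on to argue about.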
Jeter, because then my team would be surrounding him with excellent players the way the Yankees do — that team would BE the Yankees, after all — rather than the duds Ryan takes the field with.
This is a perfect example of using WAR to answer the wrong question. It does a good job (although IMO one still needing improvement) of measuring the individual contribution of players in an independent, stand-alone context. Many questions of interest to baseball fans are subject to answers obtained in that isolable kind of analysis. But many are not. Baseball, like any team sport, is a non-linear system where overall results are not the mere superposition of isolated individual achievements. Consider a simple play: catcher calls for a slider low and away; pitcher throws it about an inch from where it’s called; batter makes contact and grounds weakly to short, but manages to beat the throw to first and gets an infield single. How many different players’ skills went into creating the outcome of that play? Answer: no fewer than seven(!), since the named four all have obvious roles, but so does the first baseman (who may or may not have been able to stretch enough to record the out if he was better defensively) and the 2B and 3B (whose defensive strengths or shortcomings contribute significantly to where the SS was positioned). WAR does not, cannot, look at each of these contributions to the play. It isn’t designed to, and is hardly to be blamed for that.
Caple’s article misses the key point. WAR helps in understanding the forest, if not all the trees. Jack Morris was probably a significantly lesser pitcher than Chuck Finley or Bret Saberhagen and definitely was a much lesser pitcher than Curt Schilling. Mike Trout was clearly the best player in the American League last year. Whichever version of WAR you look at it will help in seeing how and why.
Those of us who use it regularly are aware that it is imperfect, even on its own terms. fWAR (for instance) suggests that Jack Morris was much more valuable over his career than Mariano Rivera or Mickey Cochrane; we know that WAR has a particular difficulty in measuring the contributions of ace relievers and catchers, and so we take this with a large grain of salt.
Comment by Mike Green — February 4, 2013 @ 11:44 am
Just a general comment. WAR is a framework, you can put whatever you want in it. And as it is constructed at Fangraphs, the offensive component is linear weights of a few stats, so it isn’t just one stat. It is the combination of many weighted stats.
It’s not that *UZR* requires 2+ years to be accurate. It’s that *any* defensive metric does.
Imagine a world in which we have a 100% perfect defensive metric. We know, without any doubt whatsoever, who was best in a season at playing defense. We know exactly where the balls went, how hard they were hit, who got the best jumps, etc. Everything. Anything you can think of that would accurately measure what happened.
You’d *still* need 2+ years. The problem is not with UZR; the problem is with the nature of defense (and the number of chances people get and the number needed to have a good idea of true talent).
The rest of your comment is fair though. Using a single season of a defensive metric in WAR doesn’t make a lot of sense.
Oh I totally understand why it takes 2 years, I just don’t understand why they use the single-season one when adding up the single-season WAR total. It ends up with an inaccurate final WAR total.
Comment by Forrest Gumption — February 4, 2013 @ 11:50 am
It’s the equivalent of using Wikipedia for an academic paper.
Comment by KCDaveInLA — February 4, 2013 @ 11:52 am
It’s not _subjective_.
It would be subjective if it were based on an individual judgment. What scouts tell us is subjective. Likewise, what know-nothing observers tell us is subjective.
It is _objective_ because it is based on a set of criteria that don’t differ from person to person.
It is also relative, since it is based on a hypothetical league average performance. But that hypothetical player is based on objective criteria in the sense that each of us is using the same hypothetical player as a basis for comparison.
Whether comparison to such a hypothetical player is _accurate_ is a further question. This matter is _complex_ and _difficult to assess_.
Complex, difficult to assess, relative comparisons of objective observations may or may not be accurate. They’re not subjective.
Comment by philosofool — February 4, 2013 @ 11:53 am
Exactly. It’s called a straw-man argument. I could write two thousand words about how there just isn’t a correlation between games interrupted by squirrels in the outfield and gap doubles in said games–contrary to what others might say.
I think anyone who actually understands WAR knows its limitations and never uses it as a definitive “be-all-end-all” stat. Of course there are some that don’t and spew it ad nauseam, and those people give the rest of us a bad rap.
It’s also very understandable for Jurassic baseball writers and baseball people to want to defend the type of stats they’ve been using for the last 100 years and be very resistant to change, and therefore create strawman arguments about “all those people who just use WAR.”
It’s never a problem with the stats themselves, it’s what people understand and don’t understand about the stats that makes arguments about WAR, RBI, pitcher wins etc utterly frustrating.
Comment by Johnhavok — February 4, 2013 @ 11:54 am
WAR is not a perfect statistic, but NO STATISTIC IS. The question is, is WAR a better metric for evaluation of a player than, say, batting average and RBIs and stolen bases put together — the more common base statistics we are more familiar with.
Comment by Dan Rozenson — February 4, 2013 @ 11:55 am
WAR is a useful (very useful) stat, but flawed (like all stats). I think that one aspect that bothers people is a weakness similar to that of OPS: it’s a combination of numbers that aren’t really commensurate. It’s less obvious than in the case of OPS, though. Here’s the basic issue: the best hitter in baseball is a major leaguer, with or without the DH; the best fielder may not be. Sure Chet Lemon’s fielding was as good as Jim Rice’s hitting – but it could have been replaced by trading for Gary Pettis and bringing him up from A ball, whereas there was nobody available who could hit like Rice.
All-Star week would be much more fun if there was an exhibition where stars played all the wrong positions. Prince Fielder in right field, Miggy in left, and the National League infield would be studded with wild throwing arms so first baseman Jose Altuve would be constantly leaping off the bag.
Comment by Well-Beered Englishman — February 4, 2013 @ 12:08 pm
Why not include error bars in the graphs or at least some sort of range on expected outcomes? Are we talking +/- 1 WAR, such that a 5 WAR player could have been anywhere from a 4 to a 6 WAR player? Is it more precise? Less? A solid rule of thumb can really go a long way here for developing tiers of players instead of just relying on the unknown uncertainty.
Comment by Sandy Kazmir — February 4, 2013 @ 12:09 pm
Maybe a stat that does show the “real” contribution may be more accurate?? When we played baseball (I assume you did) there were players that had good stats but did not really make a difference in the team winning. Eminor, I understand what you are saying, but doesn’t context mean anything?
That was my point.
Comment by Hurtlockertwo — February 4, 2013 @ 12:11 pm
and home field advantage in the world series decided by whichever right fielder throws out mike trout going from first to third
Comment by commenter #1 — February 4, 2013 @ 12:15 pm
Well yeah, but the whole point is if you have a whole slew of simply-calculated hit stats you can really effectively see how good a hitter is. Line up his avg, obp, slugging percentage, home runs, doubles, steals, caught steals and BABIP and you can pretty effectively see how good the person is. WAR or wOBA simplifies all these stats into one number, but it also masks where the player’s strengths and weaknesses lie. I think the main reason WAR has taken off is because we don’t have similar ways to measure defense and baserunning – if they could be measured as effectively as hitting can be, there would be less of a need for a value stat like WAR.
Comment by Dan the Mets Fan — February 4, 2013 @ 12:24 pm
I agree on RBIs and Runs, but you don’t need WAR to make a context neutral judgement on offense stats – just look at avg, obp, slugging percentage, home runs, doubles, steals, caught steals and BABIP all in a line and you will have a pretty good idea of the player’s offensive capabilities.
Comment by Dan the Mets Fan — February 4, 2013 @ 12:26 pm
Those are positional adjustments. For example, the bar for a DH is pretty high as all they have to do is hit. The bar for a SS is much lower, as it is harder to find someone who can play the position AND hit.
Every time someone suggests that WAR shouldn’t be the only “stat” at which we look, I ask myself, “Who are the people who only look at WAR?” I spend a lot of time on Fangraphs and sabr-friendly sites and I don’t know a single person who would advocate for WAR like that. I’d like to see someone point to an example of someone openly using that method before I have to read another one of these anti-WAR pieces.
When presented with a metric that makes little sense to me in my work environment, I have often asked the question, “what question are you trying to answer?” It serves as a very good way to get to the heart of the matter and suggesting the metric someone should be using or what I need to do to create the metric they need. I often use the hammer/screwdriver analogy, at the end of the discussion. “Right tool for the right job.”
It’s always nice to see my thoughts mirrored by other people I respect. The world would be a better place if more people knew this. Great article, Dave.
It was a pretty terrible article by Caple when you look at it. He spent a bit of time criticizing saber people for using WAR too much, and points out the ESPNLA link where they say Trout > Miggy because of WAR. And he complains that in the Trout argument people basically just used WAR and ignored everything else. So towards the end he makes his big point about how you have to look at other stats.
And it’s just hilarious, because if you look at the Trout article he’s criticizing, they use the following stats to prove their case in addition to WAR:
BA, OBP, ISO, RBI (explaining why hitting fourth provides more opportunities than leading off), Defence, base running, GIDP, running 1st–>3rd and 2nd to home. Pretty much everything Caple asked that people do in order to have a meaningful discussion.
What the article comes off as is Caple doesn’t like WAR and is tired of people using it in arguments. If it was really about WAR being used too much, then his evidence (the LA ESPN Trout article) was an extremely poor choice, given that they merely introduce the topic through WAR, and expanded on it to make a pretty well rounded and compelling case for Trout.
You’re confusing some arbitrary level of “accuracy” for whatever fully accurate means. It doesn’t mean what you think, nor is past performance ever completely indicative of the actual performance. (How do you determine if hitters in the AL East are suffering due to the Rays’ innovations in fielder alignment and pitch framing?)
Every single statistic (in anything ever) suffers from MoE. Don’t just randomly say it applies to fielding only.
Those trains won’t meet, because the one that left earlier is also going faster. Assuming you meant the first one was going 45 mph and the second 55 mph, the first train will travel 15 miles in the 20 minutes before the second leaves. The second train then closes that gap by 10 miles per hour, so it will take 1.5 hours to catch up. They will meet at 4:20, 82.5 miles from the station.
All of this is to expand on your argument against Caple: namely, just because something is hard for one person to calculate in their head, that doesn’t mean it’s hard for everyone.
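The catch-up arithmetic above is quick to verify in Python. The sketch keeps the same simplifying assumptions as the comment (both trains start from the same station and head the same way, with the speeds swapped to 45 and 55 mph); `catch_up` is a hypothetical helper name.

```python
def catch_up(head_start_hours, slow_mph, fast_mph):
    """Return (hours after the faster train departs, miles from the shared station).

    ASSUMPTION: both trains leave the same station in the same direction.
    """
    if fast_mph <= slow_mph:
        raise ValueError("the later train never catches up")
    gap_miles = slow_mph * head_start_hours       # lead built before the second train leaves
    hours = gap_miles / (fast_mph - slow_mph)     # gap closes at the speed difference
    return hours, fast_mph * hours


# First train leaves at 2:30 at 45 mph; second leaves at 2:50 (20 minutes later) at 55 mph.
hours, miles = catch_up(20 / 60, 45, 55)
print(f"{hours:.1f} hours after 2:50, {miles:.1f} miles out")
```

With the original speeds (55 mph first, 45 mph second) the function raises an error, which matches the comment's point that those trains never meet.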
Excellent write-up, Dave. I always got the impression that the FG crew thought WAR was the end-all stat. Thanks for clearing that up for me.
Comment by Chicago Mark — February 4, 2013 @ 1:14 pm
Eh, that “luck” aspect is also skewed with slap hitters like Ichiro. It’s the skewing of the single-season UZR that makes WAR so imperfect.
*Yes, I know it’s rather lame to compare anyone with Ichiro, or Ichiro to a slap hitter. But if you’re not trying to hit HR, your BABIP will always be a lot higher than average, and it has nothing to do with luck. If it’s your goal to pepper the area just outside of the INF and in the shallow OF, you’re going to have a higher average of those dropping in than line drive/power hitters have.
Comment by Forrest Gumption — February 4, 2013 @ 1:15 pm
The argument about WAR, particularly as it relates to the MVP debate, seems to miss the point. Someone who argued for Trout on the basis of WAR could’ve very easily stated that Trout was about as good (or perhaps better) offensively than Cabrera b/c of things like wOBA, wRC+, the number of DPs and other factors. But even if Cabrera was slightly better, Trout’s base running and defensive advantages far outweigh Cabrera’s. That argument is simply WAR broken down into its individual components. If you explain Trout’s advantages that way, it’s fairly clear. WAR, then, simply becomes a shortcut.
The argument against using WAR then simply begs WAR’s defender to break WAR down into its separate components and explain “who is better” more thoroughly. It simply asks the person to explain WHY player X is better than player Y. WAR gets you to the same end point. It just does so without explanation. The explanation is found within the statistic itself. So when you break it down and explain it using multiple statistics, it sounds better to the listener and it becomes a more palatable argument to those who don’t thoroughly understand WAR.
No, I mean that when you say “we need 2+ years of data to get a correct number” but then you use 1 year of data for it, then it will be inaccurate. Martin Prado, as an example (but there are plenty of others), had a 5.9 WAR, but that was made inaccurate by a skewed single-season UZR. Fangraphs needs to create a rolling 2+ year average UZR and use that instead, but the issue with that is it won’t make rookie WARs available for 2+ years and players who don’t play that often won’t get a WAR number either. Instead they go with this flawed WAR because every player can get a WAR number. No idea why FanGraphs doesn’t use the other method. Still waiting for an explanation from someone at the site.
Comment by Forrest Gumption — February 4, 2013 @ 1:21 pm
Your point has nothing to do with the weaknesses of WAR, but rather the overvaluing of certain components of roster construction. If a talent that is readily available can create the same value as one that is scarce, that points to an advantage that can be exploited.
Actually raw stats are perfect. HR/BA/AVG/OPS/SLG/SB, hits are hits, it’s as perfect as it can get.
Comment by Forrest Gumption — February 4, 2013 @ 1:24 pm
Koufax retired at 30, what a really lame example.
Comment by Forrest Gumption — February 4, 2013 @ 1:25 pm
WAR is a counting stat; there is no uncertainty. Similarly, we don’t give a range of how many HR’s a player hit. The components of WAR may be more abstract, but we’re still just counting things and adding them up. Now, if we want to use a stat–WAR, HR’s, whatever–to assess a player’s true talent level, THEN we have to incorporate uncertainty. If a player hits 35 HR’s in a season, we don’t just assume they will hit that many the next season. Regression, age, performance over their career, etc. will all be factors in our projection. So it is with WAR.
I think you’re undervaluing how good at baseball the ‘replacement level’ player that WAR is anchored to really is. A replacement level MLB player is very good indeed, so we can expect that any player like that can effectively play corner outfield or first base at a reasonable level.
We’re talking about a 2.3 difference in WAR driven entirely by defense (his speed and baserunning were actually far better in 2011).
WAR is the best summary measure we have but no other WAR-based statistic is as variable as the defense component. What other statistic can see a change from 19.4 to -6.4 (a 133% decrease) to 22.4 over the course of 3 years? Was his fielding really that much worse in 2011?
I would love someone to describe the average standard deviation to better prove my point but if the defensive metrics were a bit more stable then one could place greater faith in WAR as the be-all summary statistic.
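As a rough start on that request, here is a Python sketch that computes the mean and sample standard deviation of the three fielding values quoted above, using only the standard library. Three seasons is a tiny sample, so treat the result as an illustration of the calculation rather than evidence either way; the variable names are hypothetical.

```python
import statistics

# The three single-season fielding values quoted in the comment, in runs.
fielding_runs = [19.4, -6.4, 22.4]

mean_runs = statistics.mean(fielding_runs)
# Sample standard deviation (n - 1 in the denominator), appropriate for a small sample.
spread_runs = statistics.stdev(fielding_runs)

print(f"mean: {mean_runs:.1f} runs, sample st. dev.: {spread_runs:.1f} runs")
```

Running the same calculation on a player's hitting values would give the apples-to-apples comparison the argument needs.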
You say that you can line up those stats and “pretty effectively” see how good the person is? Really? When comparing multiple players by those stats, do you rank by doubles first and by CS second? No? Then how do you figure out which of those stats are more important than others? Sure, you have the knowledge to be able to make sense of the different stats, but most people don’t. I agree with you that it’s better to look at the split-apart WAR to really understand where a player’s value comes from, but for a quick comparison of different players’ total value, WAR is the best tool available.
Except that the trains don’t leave from the same location. Zing!
Comment by The Corrector — February 4, 2013 @ 2:14 pm
But then you’re still left subjectively deciding whether or not a player’s 9 extra steals are better than the other player’s 4 extra homers. Using a linear weights-based measure allows you to have a common denominator on run value.
Yeah, for sure, I mean they lead to the most important number of all: team wins. But that’s just it, if you keep context in, you’re really measuring a team effort, and it’s difficult (or impossible) to know where to draw the line. How much credit do you award a player for an RBI? Do you penalize a guy like Giancarlo Stanton next year for hitting a bunch of homers with the bases empty? Do you reward a guy like Mark Trumbo for coming up to bat a lot with runners on? Guys with good numbers that don’t help the team win are still contributing to everything that they CAN control. And when it comes to roster construction, you want to be able to place guys in roles based on what the numbers say about their talent (or what they WILL do), not what happened when the chips fell as they did.
Context is everything when it comes to actually playing or watching the game, but it is noise when it comes to player evaluation through the lens of roster construction and skill comparison.
I think others have made this argument before, but why should defensive performance be less variable than offensive performance? Still just looking at Bourn, his offensive value went from -24.5 runs in 2008 to +2.0 runs in 2009, for a one-year swing of 26.5 runs. Compare that to the 28.8-run swing in fielding value from -6.4 runs in 2011 to +22.4 runs in 2012.
Comment by The Corrector — February 4, 2013 @ 2:38 pm
Or here’s another example, Ichiro:
12-year average hitting value: 11.9 runs
12-year average fielding value: 11.1 runs
12-year st. dev. of hitting value: 13.65 runs
12-year st. dev. of fielding value: 8.0 runs
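One way to compare those spreads is to divide each standard deviation by its mean. This is only a sketch using the four numbers listed above, and the helper name `relative_spread` is made up here:

```python
# Ichiro's 12-year figures as quoted above, in runs.
hitting_mean, hitting_sd = 11.9, 13.65
fielding_mean, fielding_sd = 11.1, 8.0


def relative_spread(mean, sd):
    """Standard deviation expressed as a multiple of the mean."""
    return sd / mean


hitting_cv = relative_spread(hitting_mean, hitting_sd)
fielding_cv = relative_spread(fielding_mean, fielding_sd)
print(f"hitting: {hitting_cv:.2f}, fielding: {fielding_cv:.2f}")
```

On these figures his hitting value bounced around more from year to year than his fielding value did, both in raw runs and relative to the average.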
Comment by The Corrector — February 4, 2013 @ 2:42 pm
Actually raw stats are not perfect. According to the raw stats practically every single player in baseball became a MUCH better hitter over the 1986-87 offseason and then lost all that newfound talent over the 1987-88 offseason. Raw stats need context or they lead you to draw weird conclusions.
Defense in WAR really punishes sluggers far too much.
Think of fantasy football: the top QBs will put up just as many fantasy points as the top RBs, yet the first few rounds of a draft are mostly RBs. Why is that? Because getting a QB in later rounds is much easier than finding a productive RB.
Well, the same happens here: you can go to the minors and find excellent defensive OFs, but finding one with Josh Hamilton’s power is close to impossible. Hamilton shows up as the 16th best OF in 2012, behind Angel Pagan and Josh Reddick. Those rankings are clearly punishing Hamilton for defense and speed, but in reality, he would never be taken in a single-season draft behind those two players, because his power (which is rare) far offsets his defensive liability, which isn’t nearly as rare.
Players such as M Cabrera, P Fielder and M Holliday all suffer the same unfair fate with WAR. You can go into the minors and find good defensive OFs or 3Bs. But finding minor leaguers who can hit 40 HRs is much more difficult.
There were other aspects to the Cabrera v. Trout thing regarding the reliance on WAR I think.
The biggest reason is that their WARs were not particularly close. If two guys are within 1 WAR of each other it’s absolutely necessary to look at many more factors. Trout led by 2.9 in fWAR and 3.8 in bWAR (so Caple’s criticism that there are multiple versions of WAR didn’t hold in this case either). These are generally not the kind of margins where you can make a credible case that WAR got it wrong – it’s a fairly blunt instrument, but it really was all you needed to separate Trout and Cabrera last year.
The other reason I think a lot of people relied on the WAR of the two players in that case is that the most criticized aspect of WAR, the defensive component, pretty definitively passed the smell test for these two players. All you need to do is (anti-saber cliche alert) *watch the games with your own eyes* and you’d know that Trout was a great defensive outfielder and Cabrera was a poor defensive third baseman.
It’s important to understand the shortcomings of WAR and not apply it where it’s not appropriate. I think it just happened to be very, very appropriate in the MVP debate, and that angered a lot of writers who had crafted a narrative saying something different.
There is no flaw and subjectivity involved? No, I think that is going too far. It wasn’t given from on high from a neutral context. Someone has to “decide” what is objective and since that person is simply making a judgment call (however sound) on the parameters of the stat, there is always the chance that it is horribly flawed. Today’s ironclad truth is often tomorrow’s outdated dogma that no longer carries the day. A little intellectual humility never hurts. :P
when we played baseball (i use the royal we here) my “real” contribution was mostly what I thought was comic relief from the bench and interesting patterns i would draw with my toe in the dirt of right field. there were players that had good stats but did not really make a difference in the team winning because i and several other guys like me were on the team because little league has to let everyone play. Context meant very little there, thankfully, or else the dixie youth baseball league allstar team would be limited only to the dudes on the stacked winning teams.
I agree with the point that Will is making. The subjective part of WAR comes in when deciding how to weight certain events in baseball, like Ks, BBs, HR, etc. The hard part of WAR is figuring out how to weight those things, and it takes subjectivity to decide what to settle on. We do not yet have an objective vision of how baseball works.
This is what I have never understood about the “single-season defense is meaningless” crowd. Are you saying that the dWAR does not reflect reality or that it just doesn’t say much about how good a player actually is? If you think it does not reflect reality, then more seasons don’t help. If you think it doesn’t do a great job explaining how good a player actually is, well then that also applies to the offensive component in cases where the player has an unsustainable BABIP or HR/FB ratio. Does anyone think Torii Hunter is really as good offensively as he showed last year? No, but according to the WAR metric, he was a very good offensive player.
So the question is do you think Prado had a really good defensive season, but his true talent is not that great, or do you think the metric is broken and it said he had a good season when he didn’t? If you think the second, then why would you ever care what dWAR had to say about a player?
i will add less sarcastically that there are context-dependent stats you might use to answer a question like “who was the best baseball player?” instead of WAR. If we were arguing relief pitchers, for example, and I used WAR as my stat to back my belief that Carlos Marmol is a better pitcher than Jose Valverde, you could point to Marmol’s higher rate of ‘meltdowns’ than Valverde as a context-specific stat against my argument.
I think there could be room for context to become baked into WAR a bit more, as better context-specific stats emerge. But as it stands the context-neutral stats for across-the-board player evaluation are pretty dang sufficient.
I completely disagree with the idea that WAR is just one stat among many.
The whole point of WAR is to create a super stat — to combine all the information we have into a single number that assesses a player’s contribution to winning.
If someone says, “we should consider WAR, but also things like batting average”, they’re being nonsensical. What they’re really saying is that we should place greater weight on batting average when we compute WAR. If WAR is calculated correctly, it should screen out all other stats.
WAR isn’t a unified statistic at all — it’s more of a concept, and different people compute it differently. Baseball reference computes WAR differently than fangraphs. When traditional baseball writers vote for Cabrera for MVP, their brains are doing some sort of WAR calculation, combining RBIs and HRs and batting average and so on to tell them whether he was more “valuable” than Trout or some other player. If they tried to write down their mental process as an equation, they would probably come up with an inconsistent mess, but that’s beside the point.
So by all means, think critically about HOW we compute WAR, but if someone says we should “consider WAR, but also consider things like batting average and RBIs”, then they don’t understand what WAR is.
Calm down, you couldn’t be more off-base in your post.
“Are you saying that the dWAR does not reflect reality or that it just doesn’t say much about how good a player actually is?”
What. No. Not even close. I think UZR requires 2-3 seasons of data because that’s what Fangraphs tells me. When you use 1 year of data, you aren’t getting a large enough sample, and that leads to outliers and inaccuracies.
“If you think it does not reflect reality, then more seasons don’t help.”
No one thinks that. At all.
“If you think it doesn’t do a great job explaining how good a player actually is, well then that also applies to the offensive component in cases where the player has an unsustainable BABIP or HR/FB ratio. Does anyone think Torii Hunter is really as good offensively as he showed last year? No, but according to the WAR metric, he was a very good offensive player.”
Uh, you mean the same Torii Hunter whose WAR was big because it was inflated by a -possibly- inaccurate UZR? Offense is much different than defense. Hits and runs are hits and runs, and the only job of the batter is to not make an out, so offensive metrics are a lot more tangible than defensive ones. Therein lies the issue: we want them both to be black and white, and defense isn’t always that way because it relies more on the team aspect of the game than on the simple hitter-vs-pitcher matchup. Defense relies on the pitcher, plus bad defense can be negated by run production. For example, say Ibanez does his fumble-and-fall-over routine and turns an easy fly out into a runner on second, but hey look, it’s Felix on the mound and he gets the next 2 hitters out, end of inning. Ibanez then hits a 3-run HR the next inning, putting the team on top, and they win 3-0. That bad defense affects the game how? We don’t quite know yet, but we do know how to gauge it in a vacuum.
“So the question is do you think Prado had a really good defensive season, but his true talent is not that great, or do you think the metric is broken and it said he had a good season when he didn’t?”
I think Prado had a good defensive season in a small sample. UZR requires more data. If he had more chances or worse pitchers, who knows how he’d fare? This is why we need more data.
“If you think the second, then why would you ever care what dWAR had to say about a player?”
Not broken, incomplete. Incomplete.
Comment by Forrest Gumption — February 4, 2013 @ 4:24 pm
Caple is doing nothing more than justifying East Coast bias. I bet he’d be on the WAR bandwagon if Cabrera played for the Angels and Trout played for the Tigers. Nothing against Cabrera, he had a great season and deserved consideration, but Trout turned the Angels’ season around and was an all-around player.
I’m guessing batting average has just as much variation as defensive metrics.
But the question is what is better? Using UZR or assuming that all players are average defensively? I’m guessing it’s the former, which makes WAR much more useful than wOBA (not to mention positional, playing time and baserunning adjustments).
Obviously the best approach is to regress UZR ratings a little bit, this is essentially what Total Zone does.
Comment by vivaelpujols — February 4, 2013 @ 4:52 pm
dumbest comment of all time.
Comment by vivaelpujols — February 4, 2013 @ 4:53 pm
I completely disagree with the idea that WAR is just one stat among many.
The whole point of WAR is to create a super stat — to combine all the information we have into a single number that assesses a player’s contribution to winning.
I totally agree.
One can’t on one hand tout how great and all-encompassing WAR is, and how superior it is to other valuation systems (like Win Shares, etc), and essentially do everything but refer to WAR as the “discussion ender” … and then on the other hand claim it isn’t everything it’s been advertised and boasted to be.
As a sabermetric site, I think Fangraphs has the responsibility to continue to tout WAR as THE metric … and I’m generally NOT a fan of the “for dummies” things, but in terms of baseball production and contributions, WAR is pretty much IT. That it’s not perfect does not mean that it’s not IT. That there’s fluctuations in some of the metrics, does not mean that it’s not IT. As far as metrics go, it IS it.
If someone wants to use something other than WAR, then THEY need to demonstrate how their “something” is superior, and how it applies to all players without bias.
Generally, when you come up against someone that doesn’t want to use WAR as IT and wants something else to count “as much”, what they are wanting to do is view their opinion, hunch, gut, etc as being as valid as objective data … and it’s not … not even close.
What I find is that people like to sprinkle around their “points” as they see fit and not have an “applies to all players” type of system. They want to award some “Jeter points” here or deduct some “Belle Points” there, not based on anything but their opinion based on their limited observations and perceptions.
From the OP …
But I just found it tiresome to keep reading all the references to it, as if WAR was the only stat that should be considered, and leading a league in batting average and home runs and RBIs — as Cabrera did in becoming the game’s first Triple Crown winner since 1967 — was somehow a mere accounting trick.
The author really doesn’t seem to have any idea how many “stats” are encompassed in WAR.
WAR is a stat like SAT and GDP are a “stat” or a “single number”.
I have no problem with voters giving someone a “bonus” for doing something really rare, like winning a Triple Crown or pitching a 10-inning shutout in game 7 of the world series … but that “bonus” has to be kept in check or at least in perspective with everything else.
Comment by CircleChange11 — February 4, 2013 @ 4:55 pm
Thank you. There’s no “arguable” way Cabrera is the MVP if you’re looking at hitting + defense + baserunning + leadership/whatever. It’s only arguable if you cherry pick individual aspects of a player’s overall value.
Actually it’s sort of arguable if you think the MVP has to come from a playoff team, but that’s not a statistical argument.
Comment by vivaelpujols — February 4, 2013 @ 4:59 pm
Comment by vivaelpujols — February 4, 2013 @ 5:00 pm
Expect Jeff Sullivan to prove otherwise on Wednesday.
From the ESPN article: “… and leading a league in batting average and home runs and RBIs… was somehow a mere accounting trick.”
But that’s EXACTLY what the Triple Crown is. Somebody arbitrarily decided that those three stats were more important than any others without really thinking about what those stats actually mean. While home runs are fairly straightforward, batting average and RBI are stats that either paint an incomplete picture or are highly context-driven. The Triple Crown also completely ignores a player’s contributions with the glove, leaving out a very important part of the game that WAR at least tries to account for.
And the only reason people understand the formula for batting average is because At-Bats have been on the back of baseball cards for the past 100+ years instead of the more-accurate Plate Appearances. Statisticians are doing a major part of the calculation for the fan, which actually puts batting average in a similar calculation as WAR.
I agree that WAR has its flaws, but it still does a better job of painting a complete picture than the Triple Crown categories.
You just said Torii Hunter’s UZR is possibly inaccurate, but you said that nobody believes dWAR is not accurate. Please explain. My question, which you completely sidestepped, is whether you think UZR is not good because it just takes a larger sample to get a feel for how good a player is or whether you think it isn’t accurate. You seem to be saying that you believe it takes a larger sample to get a better idea of how good a player is.
Given that, I see little difference in how WAR is calculated for defense and offense and no reason to not use data from one year for single season WAR. There is no rule that you have to care about single season WAR, but the defensive accomplishments that go into UZR for a single season are the only things that happened that season. Judging a player’s single season using data from other seasons is stupid.
Thank you. I can’t believe I had to scroll down this far to find that comment!
Comment by Joe Morgan — February 4, 2013 @ 5:57 pm
These points are mostly correct. WAR is expected to be a better stat than others, we just have to recognize that it is a blunt stat and that an 8.0 WAR player isn’t necessarily better than a 7.7 WAR player. Also, there are reasons to question the methodology of WAR. I think F-WAR for pitchers is kind of stupid because it looks at batted ball stats when we don’t look at those stats for hitters. Nevertheless, as a descriptive stat, it does more to answer who is better than any particular metric (except b-WAR).
To say that this shows a gross misunderstanding of what is meant by “uncertainty” is a massive understatement. The uncertainties aren’t in the statistical part of the calculation of WAR; the number of HRs, BBs, fielding assists, etc., generated by a player are not subject to uncertainty, they’re simply what happened. But the multipliers used to turn those into WAR are subject to HUGE uncertainties of a systematic, rather than statistical, nature. This is the point that overzealous advocates of WAR miss.
Will is probably right about how the various BIP are bucketed for UZR. There are vague cases that could be placed in either bucket and the BIS recorder, or whomever, has to pick one. I wouldn’t call this a very subjective component, but it’s not like everyone agrees about line drives and fly balls.
I disagree with you, Brad, however. Ignoring the non-spherical shape of a cannonball in calculating its trajectory isn’t _subjective_, it’s just an abstraction (an “idealization” as the physicists, not ethicists, use that term.) The way in which things are weighted is by using average run values and then converting runs to wins. It’s true that you can generate different WAR models, e.g. by using RA9 or FIP as a base metric for pitchers. But _zero_ people would tell you that either model is absolutely superior, and the differences between those models are perfectly objective; we know what the inputs are and how to weight the run value of those inputs. Calculating pitcher WAR with an abstraction, like calculating cannonball trajectories, does not make it subjective.
People can argue about how to use the different models, and such arguments might appeal to subjective, non-evidential considerations, but Tango’s point is exactly that: how we use different models can be flawed, but the models are just equations.
Comment by philosofool — February 4, 2013 @ 6:57 pm
Disbelief can be just as irrational as belief. One’s extremely well-supported belief in Newtonian Physics yesterday can be undermined by new accounts of the hydrogen atom. In which case, one should change their belief, but it doesn’t somehow alter the fact that it was a rational belief before the new evidence came to light.
Comment by philosofool — February 4, 2013 @ 7:02 pm
you don’t understand my post. take a stats 101 class then come back.
… over a 16 year career. (Fangraphs no longer publishes split data on UZR)
There is a subjective component to the input data – determination of the zone the ball is in, the classification of the speed of the ball (I think there are 3). Then you also have the errR components of UZR – the impact is modeled, but the determination that an error occurred is completely subjective.
If you think about it, the definition of an “at bat” doesn’t really make any sense. What would it be like if you explained it to a kid?
No, I know they say he’s up to bat, but it sometimes doesn’t count as an “at-bat”. Those are called plate appearances. It doesn’t count as an at-bat if you get a walk. Or hit by pitch. In fact, it’s only if you get a hit or get an out. Well, ok, if you get an out but a runner scores, then that’s not an at bat either. Or if you bunt the ball and the runner advances a base. Why? Because that’s just the way it is. Don’t ask stupid questions.
Blah,blah,blah…the average fan doesn’t know or care what WAR is. WAR will never flash on the screen when a player is up at bat. WAR is hot right now only to be replaced and forgotten by the next generation of people who waste their intelligence on baseball stats when they could be doing more productive things with their time.
WAR is problematic because it fuses offensive and defensive numbers into a single output, which is kind of like making an apple and an orange into an orpple. The volatility and relative subjectivity of UZR and similar defensive metrics make them seem like a poison pill in the WAR formula, sometimes, and it’s hard to get the right amount of guarded imprecision attached to such a fundamentally useful statistic. Perhaps they should call it WAR?, just to remind us of how much mystery gets baked into it.
1. I have always preferred to average bWAR and fWAR for pitchers, for reasons I’ve stated too many times to repeat. In short, I agree with Tango’s quote of something like “better to be half right than all wrong” (re: batted ball influence by pitchers).
2. I’m also not saying that it should just be the “WAR Award”, as if it were like the batting title where a player could win by 1/10th or 1/100th of a point.
3. I guess my main point was that sabermetricians should be standing UP for WAR in situations like this, not backing away to appease.
If they/we feel, and for solid evidence reasons, that WAR IS the all-encompassing stat, and feel as though they/we can demonstrate it … then you stand your ground on that.
Now, as I said before, if a person wants to give some “bonus points” to a player for being on a playoff team, for demonstrating leadership, for having the IT factor, for playing with style, or whatever … that’s fine (I really believe that) … but if a player has 2 more (or so) WAR than another, then the comparison as to who had the better season really isn’t that close … and you’d need to do some statistical gymnastics to make it so.
The key with WAR is always making sure the data inputs are valid. Most times this means UZR. If they/we are confident that the inputs are correct, the replacement levels are correct, etc … then we should defend WAR as the stat that encompasses all of a player’s measurable contributions on the field to the best of our ability.
If that means that some folks don’t like it because it results in them not being able to wax their opinion as if it carries great weight, then so be it. Some folks like their opinion more than they like being accurate.
Comment by CircleChange11 — February 4, 2013 @ 10:02 pm
“people who waste their intelligence on baseball stats when they could be doing more productive things with their time.”
I think the internet is rife with examples of many people doing something a lot less interesting and useful with their minds and time than discussing baseball stats.
Some of us view these discussions as taking a break from the far more strenuous ways we use our intelligence and mind during the work day.
Sometimes it’s just fun to discuss the finer details of something that doesn’t really matter too much in real life, y’know.
WAR will likely NOT ever be accepted to the level of detail we like to discuss it, but I can easily envision people being able to see it on a TV screen and recognize that 8 WAR is a great deal better than 3 WAR. That’s essentially the purpose … to give value to things (like defense and baserunning) that are not necessarily obvious nor whose value is readily apparent (like say, 40 HR).
I think fans that appreciate the finer things of the game, such as defense and smart baserunning, will appreciate WAR even more as it quantifies items that in the past were just chalked up to general statements such as “he’s a really good defender”. Now, we can compare players and say “He’s a +12 defender while the other guy is a -2” and understand the on field impact of their fielding without having to guess it.
Comment by CircleChange11 — February 4, 2013 @ 10:10 pm
“The issue with WAR is it has no relation to reality.”
Let me translate for you …
“The issue with WAR is that I don’t understand it.”
Comment by CircleChange11 — February 4, 2013 @ 10:16 pm
To the point about Bourn, consider he’s +54.1 for his career on UZR. Usually you can gauge glove value closer to the three season mark. In regards to the 2011 discrepancy, I assume his WAR was reduced due to the inflated BABIP mark that was clearly not sustainable.
At the end of the day, for single-season value, you have to count what a player actually accomplished. Are Bourn or Prado going to duplicate those UZRs? Not likely, but they made those plays that many of their peers did not, and they deserve the credit. Each of those plays had value as well. You can say it was a fluke to identify future regression candidates, but you cannot take away from the accomplished value.
This is the culture of Fangraphs ganging up in a superior way against the concept of Bleacher Report. Basically, it’s people in their 20s and 30s ganging up on like-minded teenage baseball fans.
But okay, you were led to your sneering platitudes, and dull plus marks, so I just blame it on Dave.
Comment by rubesandbabes — February 5, 2013 @ 2:16 am
Tom Tango (if it is indeed Tom Tango the baseball stat guy),
in defending the WAR stat,
simply calls people stupid who are not fully onboard with the WAR stat.
Just right out.
Sucks. He speaks in the way-too-common voice of the baseball stat insider fan, presuming more understanding of the game, and by implication more understanding of baseball defense, than those fans who ooh at a top play and sigh when the second baseman drops the relay.
Dear Tom Tango,
FIP is a beautiful stat. I think I could explain it well, and have fun doing it, to a friend while watching a baseball game together without being immediately shut down and told to go get popcorn. A lot of the other new stats don’t pass this litmus. And there are too many other new stats.
I think you should trademark the FIP name, and not let the Stat Separatists or the new stat proliferators tack on any of those silly lower case letters in front of the stat name. The name FIP is good enough. Stat Separatism is really not a good idea and is a discouragement to actual peer criticism, especially for something that is really not even a science, but more a way to talk about baseball, as you well understand, I’m sure.
Also, too often baseball fans are using baseball stats to dump on other baseball fans. Way too much. Okay, this is fine, and has gone on down through time, but the problem is that now they are doing it in the name of Bill James, and they are not doing it well. I don’t agree with this practice.
Thanks for reading. Have a nice day. Oh, one more. If relievers are not all that valuable, not good enough to earn their WAR like starters, why does known stat-minded A’s GM Billy Beane make sure to get a reliever in most every trade he makes, going back like 30 trades?
ps. I still don’t get how Brandon Moss was able to earn positive WAR in 2012, a year in which he was an actual replacement player. Was there an initiation ceremony that Moss secretly passed through?
Comment by rubesandbabes — February 5, 2013 @ 2:50 am
Suggesting Dave Cameron’s eulogy for the Win stat may be somewhat hopeful.
Wins, much more than Saves, are big in how all the MLB teams think about player roles.
And could everyone please stop pretending a big dramatic RBI is something to be looked askance at? Yes, please remind yourselves of this next time you jump up off the couch for a big play, which surely will include a needed RBI knock.
Comment by rubesandbabes — February 5, 2013 @ 3:13 am
Yes, I agree with this.
But the criticism for me goes beyond breaking out the defensive stats:
my criticism is that WAR is so complex that it does not really encourage the study of the individual components, which is the best way to understand baseball stats, each on their own. In contrast, the FIP stat, which has very few components, to me does encourage attention/study on the K stat, for example, and that is good in using stats to understand what makes a better pitcher. WAR does not give this back to the fan.
Comment by rubesandbabes — February 5, 2013 @ 3:24 am
Miguel Cabrera was 3rd in the AL in WAR last year. Matt Holliday has had 6 consecutive seasons of at least 5 WAR. Power is plenty well represented in WAR.
Comment by suicide squeeze — February 5, 2013 @ 6:28 am
Fangraphs DOES do that, separating it into each of its components. Everyone is free to dismiss any of the components if they wish.
Comment by Tangotiger — February 5, 2013 @ 9:16 am
Don’t attribute to me something that I did not say.
Comment by Tangotiger — February 5, 2013 @ 9:27 am
“A lot of the other new stats don’t pass this litmus.”
I am so glad that in industry and science the “litmus test” is NOT whether someone’s buddy can understand it or is willing to listen to the explanation.
ALL that matters is whether WAR works in regard to its designed purpose.
If it accurately compiles and represents a player’s measurable contributions and relates them to a fair standard, then it works and whether anyone’s friend understands it or is willing to listen is beside the point.
Sometimes, it’s the “smart people’s role” to do all of the “smart work” and present it in a simplified form for those that are not as interested in the details.
If I limited myself to “baseball knowledge” that fans can easily understand and explain, I’d set myself backwards 30 years in terms of being accurate. Seriously, think about some of the things that the common fan readily believes about baseball performance, strategy, etc.
Comment by CircleChange11 — February 5, 2013 @ 9:54 am
Here’s my follow-up question:
How is WAR determined? Sure it means Wins Above Replacement, like how many more Wins player X provides than an average player. But how in the world do you quantify that? How can you decide what an average player would accomplish in any run-scoring or run-saving opportunity? I just don’t see how you can even come up with the parameters to define WAR without doing so arbitrarily.
Considering that Cabrera actually WON the MVP in a landslide, why doesn’t Caple write an article talking about the over-reliance on BA, HR, and RBI?
The only place where WAR might be overrated or overused is among people who are already sabermetrically inclined. 99% of ESPN’s audience or casual MLB fans don’t fall into that category. If he really wants to make a difference, like I said, write an article downplaying RBI and BA. Not that we really need another one of those either.
Yeah “raw stats are perfect” betrays a hopelessly basic, very rudimentary understanding of things. “Hits are hits”, but not all hits are created equal. If they were equal, then you would always take player A’s 32 hits over player B’s 26 hits. But A’s 32 may have come in 275 AB’s whereas B’s came in just 160. Or A’s may have included 30 singles, 1 2B, and 1 HR, whereas B’s included 14 singles, 7 doubles, and 5 HR. Hits are not “just hits”, each the same as the next.
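Here is a rough sketch of the same point in code. It merges the two hypothetical scenarios above (the at-bat totals and the hit breakdowns) into one pair of made-up players purely for illustration, and the function names are my own:

```python
def batting_average(hits, at_bats):
    """Hits per at-bat."""
    return hits / at_bats


def slugging(singles, doubles, triples, homers, at_bats):
    """Total bases per at-bat."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * homers
    return total_bases / at_bats


# Player A: 32 hits (30 singles, 1 double, 1 HR) in 275 AB.
a_avg = batting_average(32, 275)
a_slg = slugging(30, 1, 0, 1, 275)

# Player B: 26 hits (14 singles, 7 doubles, 5 HR) in 160 AB.
b_avg = batting_average(26, 160)
b_slg = slugging(14, 7, 0, 5, 160)

print(f"A: AVG {a_avg:.3f}, SLG {a_slg:.3f}")
print(f"B: AVG {b_avg:.3f}, SLG {b_slg:.3f}")
```

Player A has more raw hits, but Player B is ahead on both rates, and by a lot on slugging.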
Likewise HR – 7 HR are not always necessarily better than 5. I would take 5 HR which created 10 runs in 90 AB’s over 7 HR which created 8 runs in 325 AB’s, for instance.
And batting average. I would take a .275 BA over 525 AB’s to go along with a .380 OBP and .515 SLG over a .320 hitter over 175 AB’s with a .345 OBP and .395 SLG.
And so on, and so on. Raw stats never, ever tell a complete story. Nor does WAR, as Dan says, but it can summarize a lot of disparate things and at least be a starting point of a discussion of value, or a tool in the “value assessment” toolbox.
“What other statistic can see a change from 19.4 to -6.4 (132% decrease) to 22.4 over the course of 3 years?”
Comment by acerimusdux — February 5, 2013 @ 1:33 pm
The baseline for the defensive component is average. So anything below average is negative. The same is true of offense (batting runs above and below average are used for example).
The baseline is then adjusted to replacement level being roughly 2 wins below average.
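As a very rough sketch of that baseline shift: the 10 runs per win below is an assumption (it matches the "3.8 wins (or 38 runs)" shorthand used elsewhere in this thread), the 2-win gap is the "roughly 2 wins" figure for a full season, and positional and baserunning adjustments are left out entirely. The function and constant names are hypothetical.

```python
RUNS_PER_WIN = 10.0          # assumed rule-of-thumb conversion
REPLACEMENT_GAP_WINS = 2.0   # replacement is roughly 2 wins below average


def wins_above_replacement(batting_runs, fielding_runs, season_share=1.0):
    """Runs above average -> wins above average -> wins above replacement.

    season_share is the fraction of a full season played (hypothetical knob).
    """
    wins_above_average = (batting_runs + fielding_runs) / RUNS_PER_WIN
    return wins_above_average + REPLACEMENT_GAP_WINS * season_share


average_player = wins_above_replacement(0.0, 0.0)
good_bat_poor_glove = wins_above_replacement(30.0, -10.0)
print(average_player, good_bat_poor_glove)
```

So a dead-average full-time player comes out around 2 WAR in this sketch, and a negative fielding number just subtracts from the same total.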
Comment by acerimusdux — February 5, 2013 @ 1:37 pm
You are thinking too black and white. No stat is perfectly accurate. That doesn’t mean we should throw them all out.
The issue here is that there is a strong correlation between the stat and what actually happened, so that the more data you have the more reliable the measure will be. But you need three times as many games of data for the best defensive stats to be as reliable a measure as the best offensive stats.
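One way to picture the "three times as many games" point is the common regression-to-the-mean form, reliability = n / (n + k), where k is the sample size at which a stat is half signal and half noise. That form is an assumption on my part, and the k values below are hypothetical, chosen only so that defense needs three times the sample.

```python
def reliability(games, half_point):
    """Share of an observed stat that reflects real performance level.

    half_point is the number of games at which reliability equals 0.5.
    """
    return games / (games + half_point)


OFFENSE_HALF_POINT = 50                       # hypothetical
DEFENSE_HALF_POINT = 3 * OFFENSE_HALF_POINT   # "three times as many games"

one_season_offense = reliability(150, OFFENSE_HALF_POINT)
one_season_defense = reliability(150, DEFENSE_HALF_POINT)
three_season_defense = reliability(450, DEFENSE_HALF_POINT)
print(one_season_offense, one_season_defense, three_season_defense)
```

With these made-up constants, three seasons of defense are exactly as reliable as one season of offense, while one season of defense is noticeably noisier.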
And since most people are more used to offensive stats, they make this adjustment themselves more naturally with offensive stats. Most people, for example, understand that if a player gets hot offensively for a month that doesn’t mean he has become a much better player.
The biggest flaw with defensive stats is that people too often want them to work exactly like offensive stats, instead of adjusting to how they do work.
Comment by acerimusdux — February 5, 2013 @ 1:54 pm
It’s not precise, but it’s very accurate.
Comment by acerimusdux — February 5, 2013 @ 2:16 pm
Yes, a lot of the criticism of WAR is from people who don’t like that it IS accurate. This is especially true of the criticism of the defensive component.
Some people simply don’t like that Michael Bourn has been as valuable a player as Prince Fielder. It doesn’t fit their preconceptions. But that’s the kind of conclusion we can be pretty certain about, as there are no small sample size flukes there.
Comment by acerimusdux — February 5, 2013 @ 2:38 pm
Well that’s the real question: is actual defensive performance more variable than offense or is the statistic we use to measure it imperfect and prone to more variation than reality?
and you did well to choose a team where that works…
if you actually calculate out all 30 teams, only 2 teams were within 1 win of their actual wins, 5 more were within 2 wins, 10 more were within 5 wins, 7 were off by between 5 and 10 wins, and 6 were off by more than 10, the worst being the Orioles, whose wins WAR underpredicted by 15.
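Tallying those buckets makes the split easier to see. This is only a sketch using the counts given above; the bucket labels are mine.

```python
# Counts of teams by how far team WAR missed actual wins (from the comment).
buckets = {
    "within 1": 2,
    "1 to 2": 5,
    "2 to 5": 10,
    "5 to 10": 7,
    "more than 10": 6,
}

total_teams = sum(buckets.values())
within_five = buckets["within 1"] + buckets["1 to 2"] + buckets["2 to 5"]
off_by_more_than_five = total_teams - within_five

print(f"teams counted: {total_teams}")
print(f"within 5 wins: {within_five} ({within_five / total_teams:.0%})")
print(f"off by more than 5: {off_by_more_than_five} "
      f"({off_by_more_than_five / total_teams:.0%})")
```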
if you prefer to look at it in runs (as WAR is based off the number of runs gained/saved), the Angels led the league in oWAR by 3.8 wins (or 38 runs) but they were actually fourth in runs scored, trailing the Rangers by 41 runs for the lead.
likewise, the Tigers had the most pitching WAR with a lead of 0.9 wins (or 9 runs) but they were actually 11th in runs allowed, trailing the Rays (the leaders) by 93 runs.
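The wins-to-runs conversion above is the usual 10-runs-per-win rule of thumb. A small sketch of the arithmetic, using only the numbers quoted here. The `swing` figure (implied lead plus actual deficit) is my own framing, not something WAR defines.

```python
RUNS_PER_WIN = 10.0  # assumption: the rule of thumb used in the comment


def wins_to_runs(wins):
    """Convert a WAR lead into the run lead it implies."""
    return wins * RUNS_PER_WIN


# (team, WAR lead in wins, actual runs behind the league leader)
cases = [
    ("Angels, oWAR vs runs scored", 3.8, 41),
    ("Tigers, pitching WAR vs runs allowed", 0.9, 93),
]

swings = {}
for label, war_lead, actual_deficit in cases:
    implied_lead = wins_to_runs(war_lead)
    # Gap between the lead WAR implies and the deficit that actually happened.
    swings[label] = implied_lead + actual_deficit
    print(f"{label}: implied lead {implied_lead:.0f} runs, "
          f"actual deficit {actual_deficit} runs, swing {swings[label]:.0f}")
```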
Using either version, there will be those who say fWAR is not actually representative of what it claims to show, because, well, as an aggregate, it isn’t. And if it’s not meant to be aggregated, then there is no way to verify that what the formula says converts to gained victories actually does so, and if that’s the case, it’s really just a catchy name more than anything else.
Comment by miffleball — February 5, 2013 @ 4:50 pm
The objection I’ve always had to UZR is that it assesses defense the same way for every position: range. While this might evaluate a middle infielder or outfielder well, it ignores a first baseman’s ability to field a bad throw, it ignores the arm of an outfielder, it doesn’t address catcher at all where the guy basically shouldn’t have to move if the pitcher doesn’t screw up, etc.
I’m not sure that there’s a way to do it, but while positional adjustments address difficulty of positions, they do nothing for the disparate skills needed for each. After all, based on fielding value, Mike Piazza should have gone from a not so good catcher (with the highest positional adjustment) to a phenomenal first baseman (with the lowest), but it turned out he wasn’t remotely able to play the position.
Comment by miffleball — February 5, 2013 @ 5:08 pm
@acerimusdux well said.
@TKDC “My question, which you completely sidestepped, is whether you think UZR is not good because it just takes a larger sample to get a feel for how good a player is or whether you think it isn’t accurate. You seem to be saying that you believe it takes a larger sample to get a better idea of how good a player is.”
I didn’t sidestep anything, what’s with the aggressiveness? Do I think UZR is a good stat because it takes a sample of more than 1 season to get a feel for how good a player is? Yes. Do I think using a single-season UZR in a WAR score is accurate? No. Let’s look at Hunter’s last 3 years:
See that spike last year? I’d say his UZR over the last 3 years averages 8.5/3 = 2.8, and I would adjust the WAR scores to reflect that. I don’t know how to do that, but it looks like Hunter has pretty much been a 3.5-4.5 WAR guy the whole time. My point is, he wasn’t a 5.3 last year, just like Prado wasn’t a 5.9. If a player is prone to these outlier UZR seasons, hey, they don’t affect everyone, but I think it’s something Fangraphs needs to address.
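For what it’s worth, one way that adjustment might be sketched. This is not anything Fangraphs does; it just swaps the single-season UZR for the multi-year average and converts the difference at an assumed 10 runs per win. The function names and the example inputs in the last call are made up for illustration.

```python
RUNS_PER_WIN = 10.0  # assumption: common rule of thumb


def average_uzr(total_uzr, seasons):
    """Per-season UZR over a multi-year window."""
    return total_uzr / seasons


def smoothed_war(war, season_uzr, smoothed_uzr):
    """Replace the single-season UZR inside a WAR figure with a smoothed one.

    Hypothetical adjustment: subtract the season's UZR surplus over the
    multi-year average, converted from runs to wins.
    """
    return war - (season_uzr - smoothed_uzr) / RUNS_PER_WIN


# The 3-year figure from the comment: 8.5 total runs over 3 seasons.
three_year_avg = average_uzr(8.5, 3)
print(f"3-year average UZR: {three_year_avg:.1f}")

# Made-up inputs, only to show the mechanics: a 5.0 WAR season with a
# +12 UZR spike, for a player whose multi-year average is +2.
print(f"smoothed WAR: {smoothed_war(5.0, 12.0, 2.0):.1f}")
```

A 10-run UZR spike is worth about a win under that assumption, which is roughly the size of the gap being argued about here.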
“Judging a player’s single season using data from other seasons is stupid.”
Then you must hate UZR and most defensive stats. You do know that defensive opportunities aren’t as guaranteed as plate appearances, right? Some players can go games without touching the ball. No defensive player gets to make X amount of plays a game, y’know.
Comment by Forrest Gumption — February 5, 2013 @ 7:23 pm
Yep, hits are hits.
Except when someone reaches on an error. Then whether it was a hit or not depends on what the scorer decides to do that day.