FanGraphs Baseball


  1. Out of curiosity, would you expect to get a better correlation by using offensive WAR and pitcher WAR based on RA9 instead of FIP? All defensive performance would be bundled into the pitcher stat so you wouldn’t have to worry about the imperfections of UZR and DRS. Obviously this wouldn’t be super useful for evaluating individual performance, but it would be interesting to see on the team level.

    Comment by Eric — April 29, 2013 @ 4:59 pm

  2. Taking all the joy out of baseball…blergh…

    Comment by adrian reynolds — April 29, 2013 @ 4:59 pm

  3. Great Post

    Comment by Thomas Grantham — April 29, 2013 @ 5:03 pm

  4. Yes, if you use RA9 instead of FIP and some fielding metric to account for defense, you’ll come up with a higher correlation. But you’ll also then be including context into the metric, because runs allowed include sequencing. If you did that, you’d have a metric that included sequencing for pitching but not for hitting, unless you also swapped out batting runs for RE24. In other words, you wouldn’t be measuring pitchers and hitters the same way anymore.

    What inputs you use all depend on what questions you’re asking. There are times when you want to include sequencing and times you don’t.

    Comment by Dave Cameron — April 29, 2013 @ 5:08 pm

  5. Even over just 20 games, by using data from all 30 major league teams, aren’t you in fact looking at a fairly large sample here, such that you would expect the strong correlation that you found?

    Heyman’s point related to comparing 2 players and one month’s data. Yours involved 500+ players and one month’s data.

    Or is that not the correct way to look at this?

    Comment by taprat — April 29, 2013 @ 5:11 pm

  6. Issue: Heyman’s tweet is about individual WAR, Cameron’s defense about team WAR. Defensive metrics may well improve WAR when looked at in the aggregate and still be potentially inaccurate when looked at for individual players.

    Comment by O's Fan — April 29, 2013 @ 5:15 pm

  7. Interesting. Thanks!

    Comment by Eric — April 29, 2013 @ 5:18 pm

  8. Interesting article but I’m not sure this addresses the initial question of whether WAR is accurate for Marte and Harper. By using team WAR aren’t you increasing the sample size by like 25 players?

    Comment by Chris H — April 29, 2013 @ 5:20 pm

  9. This was my first thought. I’m trying to argue myself over why we’re wrong but it seems to put a wrench in the premise of this article.

    Comment by La Flama Blanca — April 29, 2013 @ 5:30 pm

  10. The point is that using the inputs that go into WAR this early in the season is still better than not using them. Using UZR is better than ignoring it.

    Comment by Justin — April 29, 2013 @ 5:31 pm

  11. Agreed. I loved baseball before I read this post. Now I hate it.

    Comment by Jake — April 29, 2013 @ 5:37 pm

  12. The issue with Heyman’s tweet is that he’s coming from a position of assuming that Harper has been better than Marte, and then criticizing WAR because it disagrees. Sure, UZR and DRS are pretty close to useless on an individual level after one month. But their values are certainly possible. That is, it’s very conceivable that Marte’s defense has been good enough to make up the ground between him and Harper offensively.

    The problem is that we have no idea if that’s true or not. But it could be true, and for Heyman to just assume that it isn’t is incorrect just as assuming that it is true is incorrect.

    Here’s another way to think about it. Harper has about 14 runs from offense+baserunning and Marte has about 7. Say we don’t know the rest of their value, but we know it’s some random number between -5 and 5 runs.

    Now, the odds are not in Marte’s favor. If we have to choose one, we’ll say that Harper is better. But we shouldn’t have to choose one. Because for all we know, Marte’s defense – that random number of runs – makes up the gap between the two offensively. It’s unlikely, but possible. We can’t come from a position of assuming that they’re equal defensively just because we have no way of knowing their defensive value. We can say that Harper is likely better, but we still need to admit to the fact that he might not be.

    Does that make sense or did I just spout a lot of nonsense?

    Comment by Matt Hunter — April 29, 2013 @ 5:38 pm
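
    An illustrative aside on comment 12: the thought experiment can be made concrete. This is only a sketch under added assumptions that the comment does not state, namely that each player's unknown remaining value is drawn independently and uniformly from -5 to +5 runs. The function name `prob_gap_closed` is made up for this example.

```python
from fractions import Fraction

def prob_gap_closed(gap, half_width):
    """P(D_marte - D_harper > gap), with each D independent Uniform(-half_width, half_width)."""
    # The difference of two such uniforms is triangular on [-2w, 2w].
    w2 = 2 * half_width
    if gap >= w2:
        return Fraction(0)
    if gap <= -w2:
        return Fraction(1)
    if gap >= 0:
        # Area of the upper tail of the triangular density.
        return Fraction((w2 - gap) ** 2, 2 * w2 ** 2)
    # For a negative gap, use the complement of the lower tail.
    return 1 - Fraction((w2 + gap) ** 2, 2 * w2 ** 2)

# Offensive gap from the comment: about 14 runs versus about 7 runs.
p = prob_gap_closed(gap=14 - 7, half_width=5)
print(p, float(p))  # 9/200 0.045
```

    Under those assumptions Marte closes the gap roughly 4.5% of the time: unlikely, but possible, which is the comment's point. A different assumed distribution for the unknown runs would give a different number.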

  13. I’m not a hater of the HR stat, but explain why John Buck has more than Ryan Braun.

    Comment by That Guy — April 29, 2013 @ 5:40 pm

  14. Aside from that, it’s easy to believe that Harper’s offensive production is real, whereas we still have to suss out whether or not Marte’s defense is real.

    Put it this way, if Andrelton Simmons was already +8 runs defensively, we’d all pretty go along with the idea that he can be a 5WAR player on defense alone. It matches what we already know and what we’d like to believe, ie, we’ll see defense-only 5WAR players the way that we see 5WAR offense players.

    Comment by That Guy — April 29, 2013 @ 5:44 pm

  15. I don’t think this is a good comparison. HR just measures number of home runs, so it’s been completely accurate thus far. WAR measures a player’s contribution to the team, and because of the unreliability of defensive metrics, it probably hasn’t been accurate thus far.

    Comment by Matt Hunter — April 29, 2013 @ 5:45 pm

  16. I really don’t think your approach here makes any sense, at least as regarding the defensive WAR totals of Harper and Marte. WAR is based on what happens on the field, and what happens on the field affects winning and losing. If a bunch of balls are dropping around Harper in the outfield, his defensive WAR rating will be low, and his team will suffer. What this metric can’t tell us – especially in a tiny sample – is whether Harper “should” have caught any of those balls, or whether Marte would have, or whether Harper might have equalled Marte if only they’d exchanged positions in their respective outfields.

    You might be able to guess an American’s income and a Norwegian’s income by evaluating all economic exchanges of those countries, and then checking against GNP to make sure you were accurate, with the result being that you’ll think you just proved you were.

    Comment by Jon L. — April 29, 2013 @ 5:45 pm

  17. The real gem here (as Dave alludes to) is that WAR is pointing out the terrific (albeit SSS) start that Marte has had. In the absence of advanced metrics, the notion of Marte=Harper is crazy. In fact, I was so shocked that I went and looked up Marte’s numbers. Surprisingly, they are quite comparable in aggregate. This isn’t a weakness of WAR — this is what it does so well.

    Comment by Jeff — April 29, 2013 @ 5:53 pm

  18. Harper has 1.5 wins. Marte has 1.2 wins. That’s 2 wins of difference over the course of a season, or the difference between Ryan Braun and Alex Gordon last season. Is that really implausible?

    Comment by Andy Olds — April 29, 2013 @ 6:19 pm

  19. I had set aside my evening to devote to Harvey v. Fernandez myself. Now, I think I’d rather go jump off a bridge after reading this post.

    Comment by l1ay — April 29, 2013 @ 6:23 pm

  20. Yeah. I just burned my baseball gear after reading this.

    Comment by akalhar — April 29, 2013 @ 6:40 pm

  21. I actually think that the lesson is that Heyman, Joe Morgan, Hawk, and the like are closed-minded idiots and shouldn’t be paid attention to. Hopefully, they lose their jobs and competent people will get hired. Alas, one can wish…

    Comment by Incompetent peeps — April 29, 2013 @ 6:40 pm

  22. That made a lot of sense. The point of advanced statistics (and old-school ones) is to tell us things about the game. If these things were intuitively obvious, there wouldn’t be a lot of value to the statistics. Heyman appears to be disregarding WAR just because it tells him something that isn’t intuitively obvious – Starling Marte is (probably) having a pretty good season so far.

    Comment by The Stranger — April 29, 2013 @ 6:43 pm

  23. Too much frequentism. Basic Bayesian logic tells us that if defensive systems are measuring anything worthwhile at all, having them will be better than not having them, over any sample.

    Comment by Tim — April 29, 2013 @ 6:49 pm

  24. Now use RE24 instead of wRAA and a game of “Cow Pie Bingo” instead of UZR. R-squared=1.001.

    Comment by JuanPierreDoesSteroids — April 29, 2013 @ 6:56 pm

  25. Pretty much OT but Westbrook got a 2-inning 4-run performance rained out, so we really shouldn’t take that ERA number at face value.

    Comment by matt w — April 29, 2013 @ 7:15 pm

  26. Absolutely. It’s just much harder to make margin of error adjustments to the gut than to the spreadsheet.

    Comment by Tom Rigid — April 29, 2013 @ 7:19 pm

  27. It isn’t that surprising that Marte’s defense grades out well either, at least not to a Pirate fan. Marte’s been getting rave reviews about his defense in the minors for a while. ( didn’t see fit to preserve the video of him throwing out Paul Goldschmidt trying to take an extra base, but here’s a gif:

    Comment by matt w — April 29, 2013 @ 7:27 pm

  28. The real question is: how can we better evaluate defensive performance? Our rudimentary evaluations do account for some defensive value, as this article attests, but there is certainly much room for improvement.

    It would be interesting to see if any major league teams adopt the technology created by SportVU, which is currently being used in the NBA to track player and ball movement for evaluative purposes. I think this information could certainly improve defensive performance evaluation (and as a side note I think it would be extremely useful in evaluating American football players).

    My guess is that Andrew Friedman and others are already exploring this, and my hope is that defensive metrics will take a giant step forward as a result.

    Great article by the way. Improvement is all about recognizing and correcting deficiencies.

    Comment by Ian — April 29, 2013 @ 7:49 pm

  29. Is the correlation statistically significant?

    Comment by Corey — April 29, 2013 @ 8:14 pm

  30. On an aggregate and team level, we’re probably to the point where we have large enough sample sizes to make defensive metrics somewhat meaningful and start seeing positive correlation. This does *not* mean that there is any statistical relevance to an individual’s stats.

    Comment by Ronaldo — April 29, 2013 @ 8:20 pm

  31. That’s all well and good, but couldn’t you fit all that in a tweet?

    Comment by TKDC — April 29, 2013 @ 8:53 pm

  32. Hell, I burned my baseball gear in anticipation of reading this.

    Comment by I Agree Guy — April 29, 2013 @ 9:34 pm

  33. Hit f/x and Field f/x

    Comment by Jason — April 29, 2013 @ 10:50 pm

  34. He understood the point.

    Comment by Pitnick — April 29, 2013 @ 10:51 pm

  35. Sarchasm. I think you fell in.

    Comment by Pitnick — April 29, 2013 @ 10:53 pm

  36. I dunno, I don’t need to be a statistics guy to appreciate that this type of analysis shows that there are certain things which can make a player worthy. WAR isn’t infallible, but it’s one useful tool amongst many.

    Comment by Kevin — April 29, 2013 @ 11:17 pm

  37. I decided not to buy baseball gear and then burned someone else’s in anticipation of this article

    Comment by The Humber Games — April 29, 2013 @ 11:57 pm

  38. It was for general goof factor, of course, but we’re also having to look at this from a sample range perspective as well. Why can’t John Buck hit as many HR as Ryan Braun given arbitrary endpoints? Why can’t Starling Marte have as much WAR as Bryce Harper given arbitrary endpoints?

    Comment by That Guy — April 29, 2013 @ 11:57 pm

  39. The A’s have something in house that told them that Cabrera indeed was more valuable than Trout last year and I understood that to mean that they placed a very different value on defense than UZR or any other metric that we know did. It would be nice to see something like SportVU freely available for baseball.

    Comment by That Guy — April 30, 2013 @ 12:03 am

  40. Nice play by Marte. He definitely has played well in the outfield according to the naked eye.

    What bothers me about that gif is, even while recognizing it’s slo-mo, how long it takes Neil Walker to place the glove down for the tag after he first catches it.

    Why does he catch it so far out there, and then undertake a sloppy and awkward twist/pivot/jump just to get in front of the bag? I would have positioned myself on the RF side of the bag, with my body pointing towards 3B (as if preparing to make a quick relay throw to 1B), ready to reach out and catch the ball and swipe it back toward the bag in an instant.

    I know MLB infielders have to be wary of giant, fast runners sliding into them, but still. I see this a lot, and as a one-time fundamentally-sound kid 2B (read: couldn’t hit), I don’t understand why they’re not doing it better.

    Comment by Jay29 — April 30, 2013 @ 12:22 am

  41. “WAR is not imperfect, nor is it precise.”

    For the kajillionth time, you guys need an editor. Cripes.

    Comment by Justin Bailey — April 30, 2013 @ 12:40 am

  42. I burned down a baseball field at the mere rumor of this article.

    Comment by DCN — April 30, 2013 @ 1:12 am

  43. And if we swap the pitches that each saw, maybe Marte would do better and Harper worse. That’s the batting equivalent of the argument you’re making.

    Both hitting and defensive metrics are measured in a manner to compare them to an average player.

    In any event, you should probably read up on UZR and DRS in the FG library to better understand them. No one is going to go through the tedious process of switching each individual play for two players to see if A would do better than B or vice versa. It’s also way more subjective than deciding what the average player would do.

    Comment by Burrito — April 30, 2013 @ 2:23 am

  44. This is a good analysis, but for the love of god could you put some error bars on WAR please?

    Comment by Stathead — April 30, 2013 @ 2:26 am

  45. Why are people calling this “good analysis”? If the question was: “How quickly does team WAR stabilize and correlate to team level performance?” then the analysis is OK; but what does this have to do with individual WAR after 1 month?

    It shouldn’t be all that hard to do…. just go back to last year’s (or a few years of) data and correlate individual player April WAR totals to full year WAR totals (can do this on a rate basis or extrapolate 1 month out to a full year?). Not sure why in the world you would look at aggregate team-level data – you end up with a much larger sample size than 1 year of player WAR; of course that is going to show a decent correlation – it should be much more solid than 1 year of an individual’s #’s.

    Comment by Hank — April 30, 2013 @ 3:20 am
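
    An illustrative aside on comment 45: the check described there amounts to a correlation between two columns, one of April WAR and one of full-season WAR. A minimal sketch follows. The function name `pearson` and both lists are placeholders invented for this example; they are not real player data, so the printed value means nothing beyond showing the mechanics.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares around the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Placeholder values only (hypothetical), one entry per player.
april_war = [1.4, 0.9, 0.4, -0.1, 0.7]
full_season_war = [5.0, 3.1, 2.6, 0.3, 1.9]

print(round(pearson(april_war, full_season_war), 3))
```

    With real data, the two lists would be replaced by each player's April total and full-year total, or by per-game rates if the comparison is done on a rate basis.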

  46. Your suggestion that I read up on the metrics may be well-taken, but you’re missing my point. Defensive plays don’t always involve one defender who clearly has a chance to make a play. For batting statistics, in contrast, we know who was batting. We don’t just take pitches thrown somewhere in the vicinity of Harper, LaRoche, or Werth, and then attribute 1/3 of that set of results to Harper. If we did, we’d have very little idea of how good a hitter Harper was. Nonetheless, if we did it for the whole Nationals team, we’d end up with a perfectly accurate image of the team’s offense, regardless of how inaccurate we were for individuals. The reason for this, as in the article, is that we’re ultimately comparing a summary measure of on-field results to the on-field results.

    Comment by Jon L. — April 30, 2013 @ 6:00 am
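
    An illustrative aside on comment 46: the argument is that credit misassigned among teammates cancels out in the team total. A toy sketch of that idea, using three hypothetical hitters labeled A, B, and C with invented run values (not real data):

```python
# Hypothetical true run values for three hitters on one team (invented numbers).
true_runs = {"A": 12.0, "B": 3.0, "C": -6.0}

team_total = sum(true_runs.values())

# Deliberately crude attribution: give every hitter an equal share of the team total.
shared = {name: team_total / len(true_runs) for name in true_runs}

# The shares add back up to the team total, so the team-level error is zero.
team_error = abs(sum(shared.values()) - team_total)

# Each player's assigned share can still be far from that player's true value.
player_errors = {name: abs(shared[name] - true_runs[name]) for name in true_runs}

print(team_error)
print(player_errors)
```

    The team-level figure matches by construction while two of the three individual figures are off by 9 runs, so agreement at the team level does not by itself validate the individual numbers.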

  47. Using a pitcher to demonstrate the fallacy of low sample size among hitters is deceptive. While Westbrook’s % of season complete is the same, he has appeared in 4 games and faced relatively weak hitting teams…. The hitters have 20-30 games against a mix of different pitchers so a much bigger sample size even if the % of the year is the same.

    Also, your article basically says: “WAR says Marte and Harper are about the same players, but we all believe that Harper has been a lot better…. so small sample is probably the reason, but overall WAR is very reflective of performance even in small samples.” Seems like you don’t want WAR to say Marte = Harper, but you do believe WAR to be a good reflection even in small samples…. seems contradictory and not logical.

    Comment by Rob Heaton — April 30, 2013 @ 7:44 am

  48. The error bars are roughly equal to the league-wide range of defensive values.

    Comment by Jason H — April 30, 2013 @ 8:35 am

  49. “i am not a hater of WAR stat, but if someone can explain to me how starling marte & bryce harper are both 1.7, please do”

    Had he looked at the player pages he would have seen the exact reason since both BB-ref and Fangraphs provide the components that add up to WAR. So, he is lazy or a moron or he in fact is a hater just looking to bad-mouth the stat for no good reason.

    Any of the three [let alone more than one of them] is plenty for me to ignore anything he writes…

    Comment by Eric R — April 30, 2013 @ 8:59 am

  50. The analysis is good, but there are some holes here. He uses team measurement of WAR to show that WAR needs defense to be as accurate as possible. While this might be true, it doesn’t address how Harper might have just had an unusually bad month of defense, while Marte could have had an unusually good month of offense. This small sample size with these two players is the same case as Matt Harvey having his ridiculous month pitching. He could be the next R.A. Dickey, or the next Ubaldo Jimenez. Normal stats like ERA can be inaccurate in measuring a player’s value in small samples, and the same can be said for UZR. This isn’t true when measuring team-WAR correlation, and that is why Cameron sees UZR as so accurate.

    Comment by sbnovafan — April 30, 2013 @ 9:05 am

  51. Doesn’t Harper have a lot of mishaps in the field? And I know his base running aggressiveness has gotten the better of him in the past. Going back to the Trout/Miggy debate, it seems odd that the old school writers start to not care about doing the little things when there is an honest attempt to measure them in a stat that people actually care about. It makes you wonder what their beef actually is (it actually doesn’t make me wonder – I feel confident I know exactly what it is and so do you).

    Comment by TKDC — April 30, 2013 @ 9:56 am

  52. Looks to me like he took his time because Goldschmidt was still ~15 feet away when he caught the ball. :)

    Not taking anything away from Marte’s hose of an arm, but there was a poor baserunning decision involved there as well.

    Comment by Trent Phloog — April 30, 2013 @ 10:46 am

  53. “we’d all pretty go along with the idea that he can be a 5WAR player on defense alone”

    No, we all wouldn’t.

    However Braves fans DEFINITELY would!

    Comment by Jason B — April 30, 2013 @ 10:56 am

  54. “Hopefully, they lose their jobs”

    C’mon now. Fire people because they disagree, or don’t buy into this statistic or that one? Man we are sooooooo quick to fire people for their shortcomings as long as it’s not us! (We have plenty of excuses and explanations for our own!)

    Comment by Jason B — April 30, 2013 @ 10:58 am

  55. Not cool, bro. You can burn your friends, and you can burn your baseball gear, but you can’t burn your friends’ baseball gear.

    Comment by Bigmouth — April 30, 2013 @ 11:17 am

  56. Player A is having a bad month in the field according to everybody, but once you put that in numbers everybody starts complaining about sample sizes and unreliable defensive metrics.

    Comment by Mark — April 30, 2013 @ 11:36 am

  57. The evidence for the baserunning edge for Marte is just not there.

    Marte has grounded into more double plays 3-2.

    (Yes, I know, but still this is one of the speed profiles of the game. Lead the league in GIDP – that means you are not one of the fastest guys. Why should Marte get a pass on this? Plus, Marte’s a leadoff hitter in the National League. Edge to Harper.)

    Marte and Harper have been caught twice each on the basepaths.

    Doubles and triples: Harper is right there with fewer at-bats. Harper has a triple, too.

    So, while Harper takes a whole bunch more trots around the bases, Marte’s 6 stolen bases and one extra base on extra base hits is being projected to 3 runs. Add in the GIDP to Marte’s baserunning, and it becomes only the 6 stolen bases.

    So, 6 stolen bases = 3 runs better in baserunning? Not in this case.

    Comment by rubesandbabes — April 30, 2013 @ 11:40 am

  58. That correlation would be irrelevant because WAR is a counting stat. Performance will vary greatly from month to month so comparing season WAR totals to April WAR totals and noticing a poor correlation would be the same thing as playing the extrapolation game that says a player has 8 home runs through 20 games, therefore I predict he will finish the season with 64. One month of WAR was never intended to be that predictive.

    Comment by agam22 — April 30, 2013 @ 11:55 am

  59. So Adam Dunn should keep getting $15 million a year as long as he likes, because firing him would be mean? These guys get paid a lot of money to be idiots and there are much more qualified people out there (as in, just about anyone in a few of these cases).

    Comment by TKDC — April 30, 2013 @ 12:10 pm

  60. I don’t think you understand Bsr (known otherwise as ubr).

    Try Here:

    Comment by Terence — April 30, 2013 @ 12:11 pm

  61. Their job is not to provide useful information; it’s to provide additional entertainment while people are watching the game, fill in the natural lulls of the game, and get excited in order to get you to focus your attention when there is meaningful action happening.

    If it improved ratings they would play a sound track of water buffaloes mating during the game.

    Comment by Kazinski — April 30, 2013 @ 12:42 pm

  62. My one issue with Dave’s logic here is the Jake Westbrook reference. It’s a red herring. After answering Heyman’s question by dissecting the intricacies of Marte’s running and fielding success, as well as the slight fallibility of WAR; he goes on to chalk something up to small sample size?

    The question was never; why are these two players playing as well as each other. It was, why does WAR SAY they are when they SEEM not to be?

    SSS has nothing to do with that.

    Comment by Daaaan — April 30, 2013 @ 2:33 pm

  63. There’s a service called TrackMan which is used by 17 MLB teams which does something to this effect

    Comment by Mcneildon — April 30, 2013 @ 9:05 pm

  64. It’s awesome that you’ve never screwed up at your job! What’s that like? (Good thing it didn’t happen in public so total strangers could call you an idiot on a message board!)

    And Adam Dunn should keep getting $15 million a year as long as some owner will pay him $15 million a year, methinks. Heck, $45 million if he can. $9 million if he has to settle.

    Comment by Jason B — May 1, 2013 @ 10:27 am

  65. “It was, why does WAR SAY they are when they SEEM not to be?”

    No, it wasn’t; I think if that was the question he could head over to BBRef or FG and look at the individual components of WAR. I think it was more “I have an axe to grind against this stat I don’t really like or understand; here’s an example where I can try to make it look foolish!”

    Comment by Jason B — May 1, 2013 @ 10:30 am

  66. Walker could have ordered and eaten a pizza waiting for Goldschmidt to reach second base!

    Comment by Steve Z — May 1, 2013 @ 10:51 am

  67. In the absence of advanced metrics, the notion of Marte=Harper is crazy.

    Lacking advanced metrics, much of the perceived difference between the two players can be attributed to the hype which surrounds each of them. Marte is fast and has a plus-plus throwing arm (an Altoona pitcher with a radar gun recorded 100 mph on a Marte throw from the outfield). He has always hit for average and has added power as he progressed through the minors and recovered from a hamate injury. The only major warning sign: He has swing and miss issues. Harper seems to be improving in this regard. But, can we rationally expect Harper to hit like Ted Williams for the remainder of this season?

    Comment by Steve Z — May 1, 2013 @ 11:12 am

  68. Heyman: Let us play an We will both trust one another while you try to convince me that Marte and Harper provide equal or nearly equal baseball value to their teams.

    The problem, of course, is that we, the observers, may doubt Heyman’s willingness to believe the evidence provided to him in this regard. After all, he could have worked through the data himself! But he didn’t. It wasn’t worth the effort, it appears. Instead, Heyman made a demand: namely, that a stathead provide him with an explanation he could accept, an explanation that would undermine what he knows to be true — Harper > Marte.

    In other words, we should not trust Heyman to change his beliefs when presented with evidence that undermines those beliefs. He appears to be a disbeliever hiding behind the face of a skeptic.

    Comment by Steve Z — May 1, 2013 @ 11:33 am

  69. This.


    Comment by Steve Z — May 1, 2013 @ 11:36 am

  70. I managed to bungle a link. The first sentence should read.

    Let us play an assurance game.

    The link:

    Comment by Steve Z — May 1, 2013 @ 11:38 am

  71. Cameron addressed the two cases that Heyman cherry-picked, explained them fully and then made a much broader point.
    I don’t know what more you could expect.
    This was a great article.

    Comment by Baltar — May 1, 2013 @ 1:37 pm

  72. Yes, these tools have tremendous potential. I have a feeling (and am hoping) that in just a few years we’ll have metrics that will outclass our current ones in the same way that surgery outclasses bleeding.

    Comment by Baltar — May 1, 2013 @ 1:52 pm

  73. I have a question. How many runs would a replacement level line-up average per 162 games in a neutral park? I’m getting 3.2 runs per game; also, how many runs per game would a replacement level pitching staff (starters plus relievers) give up in a neutral park? I got about 5.5 runs per game, but I’m not so certain about this.

    Thank you.

    Comment by Ty — May 1, 2013 @ 5:19 pm
