## WAR: Imperfect but Useful Even in Small Samples

This morning, Jon Heyman noted an odd thing on Twitter:

He was quoting Baseball-Reference’s WAR calculation, and the two are indeed tied at +1.7 WAR on B-R. Here, we have Bryce Harper (+1.5 WAR) ahead of Starling Marte (+1.2 WAR), but the point still basically stands; WAR thinks Harper (1.200 OPS) and Marte (.835 OPS) have both been pretty great this year, with just a small (or no) difference between them. What Harper has done with the bat, WAR believes that Marte has mostly made up with his legs in baserunning (+3 run advantage) and defense (+3 run advantage), as well as a slight bump from getting 12 extra plate appearances.

There’s no question that Harper has been a better offensive player, but there are questions about the defensive valuations, because defensive metrics aren’t as refined at this point as offensive metrics are. It is much easier to prove that Harper has been +10 runs better with the bat this year than it is to prove that Marte has been +3 runs better defensively by UZR, or +7 runs better defensively by DRS. There are more sources for error in the defensive metrics, and Heyman’s tweet led to a discussion on Twitter about the usefulness of including small sample defensive metrics in WAR.

I’ve written before about the strong correlation between team WAR and team winning percentage, and others have followed up with similar analysis more recently. However, all those articles have focused on full season or multi-season data samples, and since the question was raised and I hadn’t yet seen it answered, I became curious about whether WAR would actually correlate better at this point in the year if we just assumed every player in baseball was an average defender.

Essentially, if we just removed defensive metrics from the equation, and evaluated teams solely on their hitting and pitching, how would our WAR calculation compare to team winning percentage? And how does WAR correlate to team winning percentage based on just April 2013 data, when we’re dealing with much smaller sample sizes?

To answer that question, I turned to the numbers. To convert team WAR into an expected winning percentage, I just added position player and pitcher WAR to get total team WAR, divided by games played, multiplied WAR per game by 162 to get a full season total, and then added +47.7 wins — the replacement level assumption — to extrapolated team WAR. I then divided that total by 162 games to get a team’s expected winning percentage based solely on their WAR total, and compared that winning percentage to their actual current winning percentage. Here’s the table showing that comparison.
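The conversion described above is simple arithmetic, and a minimal sketch may make the steps easier to follow. The function and constant names here are illustrative rather than anything FanGraphs publishes, and the sample inputs are made up; only the 47.7-win replacement level and the 162-game season come from the text.

```python
# Sketch of the WAR-to-winning-percentage conversion described in the text.
# Names are hypothetical; only 47.7 and 162 come from the article.

REPLACEMENT_WINS = 47.7  # wins a replacement-level team is assumed to get over 162 games
SEASON_GAMES = 162


def war_win_pct(position_war, pitching_war, games_played):
    """Expected winning percentage implied by a team's WAR to date."""
    total_war = position_war + pitching_war           # total team WAR so far
    war_per_game = total_war / games_played           # rate per game played
    projected_war = war_per_game * SEASON_GAMES       # extrapolate to a full season
    projected_wins = projected_war + REPLACEMENT_WINS  # add the replacement-level baseline
    return projected_wins / SEASON_GAMES


if __name__ == "__main__":
    # Made-up team: 3.0 position player WAR and 2.0 pitcher WAR through 25 games.
    print(f"{war_win_pct(3.0, 2.0, 25):.3f}")
```

A team with exactly zero WAR comes out at 47.7 / 162, which is the replacement-level floor the method builds in.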

| Team | Winning% | WARWin% |
|---|---|---|
| Red Sox | 0.720 | 0.706 |
| Rangers | 0.640 | 0.630 |
| Braves | 0.625 | 0.578 |
| Yankees | 0.625 | 0.561 |
| Diamondbacks | 0.600 | 0.574 |
| Orioles | 0.600 | 0.550 |
| Pirates | 0.600 | 0.490 |
| Rockies | 0.600 | 0.610 |
| Royals | 0.591 | 0.608 |
| Cardinals | 0.583 | 0.494 |
| Tigers | 0.565 | 0.660 |
| Athletics | 0.538 | 0.552 |
| Reds | 0.538 | 0.591 |
| Twins | 0.524 | 0.461 |
| Brewers | 0.522 | 0.438 |
| Giants | 0.520 | 0.570 |
| Nationals | 0.520 | 0.486 |
| Dodgers | 0.500 | 0.515 |
| Rays | 0.480 | 0.530 |
| Phillies | 0.462 | 0.437 |
| Mets | 0.435 | 0.477 |
| White Sox | 0.417 | 0.428 |
| Indians | 0.409 | 0.517 |
| Mariners | 0.407 | 0.413 |
| Angels | 0.375 | 0.386 |
| Cubs | 0.375 | 0.432 |
| Blue Jays | 0.346 | 0.391 |
| Astros | 0.280 | 0.350 |
| Marlins | 0.240 | 0.242 |

Correlation: 0.880. R squared: 0.775.

The correlation between actual team winning percentage and expected team winning percentage based on WAR is .88, which is almost exactly what Glenn DuPaul found when testing the correlation between full season WAR and team winning percentage last summer. It’s higher than what I got when I compared WAR to team winning percentage back in 2009, before we added things like baserunning to improve the formula. With about 15% of the season completed, current team WAR explains 78% of the variance in current team winning percentage.
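For anyone who wants to check the arithmetic, here is a rough sketch that computes the Pearson correlation and r squared from the table above. The values are the rounded figures as printed for the teams listed, so the result may differ slightly from the published .88; the helper name `pearson_r` is just an illustrative choice.

```python
import math

# (actual winning %, WAR-based winning %) copied from the first table in the article.
TEAMS = {
    "Red Sox": (0.720, 0.706), "Rangers": (0.640, 0.630),
    "Braves": (0.625, 0.578), "Yankees": (0.625, 0.561),
    "Diamondbacks": (0.600, 0.574), "Orioles": (0.600, 0.550),
    "Pirates": (0.600, 0.490), "Rockies": (0.600, 0.610),
    "Royals": (0.591, 0.608), "Cardinals": (0.583, 0.494),
    "Tigers": (0.565, 0.660), "Athletics": (0.538, 0.552),
    "Reds": (0.538, 0.591), "Twins": (0.524, 0.461),
    "Brewers": (0.522, 0.438), "Giants": (0.520, 0.570),
    "Nationals": (0.520, 0.486), "Dodgers": (0.500, 0.515),
    "Rays": (0.480, 0.530), "Phillies": (0.462, 0.437),
    "Mets": (0.435, 0.477), "White Sox": (0.417, 0.428),
    "Indians": (0.409, 0.517), "Mariners": (0.407, 0.413),
    "Angels": (0.375, 0.386), "Cubs": (0.375, 0.432),
    "Blue Jays": (0.346, 0.391), "Astros": (0.280, 0.350),
    "Marlins": (0.240, 0.242),
}


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


actual = [a for a, _ in TEAMS.values()]
war_based = [w for _, w in TEAMS.values()]
r = pearson_r(actual, war_based)
r_squared = r ** 2  # share of variance in winning % associated with WAR-based winning %

if __name__ == "__main__":
    print(f"r = {r:.3f}, r squared = {r_squared:.3f}")
```

Swapping in the second column from either of the later tables gives the no-fielding and DRS versions of the same check.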

Considering that WAR doesn’t include any kind of situational context, and we know the sequencing of hits and runs can have a major impact on a team’s win-loss record, that’s still a very robust correlation. That correlation suggests that WAR is doing a lot of things right in terms of measuring the results that lead to wins and losses.

It is almost certainly doing some things wrong as well, and it is theoretically possible that what WAR is getting right is hitting and pitching, and the defensive component is weakening what would be an even stronger correlation if the fielding metrics weren’t included. So, let’s check that out. Here’s a table of team winning percentage compared to a WAR-based winning percentage that assumes every player in baseball has played average defense this season. This is WAR with UZR removed, essentially.

| Team | Winning% | NoFldWARWin% |
|---|---|---|
| Red Sox | 0.720 | 0.666 |
| Rangers | 0.640 | 0.608 |
| Braves | 0.625 | 0.545 |
| Yankees | 0.625 | 0.559 |
| Diamondbacks | 0.600 | 0.524 |
| Orioles | 0.600 | 0.510 |
| Pirates | 0.600 | 0.476 |
| Rockies | 0.600 | 0.605 |
| Royals | 0.591 | 0.557 |
| Cardinals | 0.583 | 0.515 |
| Tigers | 0.565 | 0.709 |
| Athletics | 0.538 | 0.608 |
| Reds | 0.538 | 0.562 |
| Twins | 0.524 | 0.570 |
| Brewers | 0.522 | 0.431 |
| Giants | 0.520 | 0.512 |
| Nationals | 0.520 | 0.475 |
| Dodgers | 0.500 | 0.508 |
| Rays | 0.480 | 0.485 |
| Phillies | 0.462 | 0.463 |
| Mets | 0.435 | 0.510 |
| White Sox | 0.417 | 0.445 |
| Indians | 0.409 | 0.482 |
| Mariners | 0.407 | 0.419 |
| Angels | 0.375 | 0.398 |
| Cubs | 0.375 | 0.440 |
| Blue Jays | 0.346 | 0.428 |
| Astros | 0.280 | 0.366 |
| Marlins | 0.240 | 0.297 |

Correlation: 0.824. R squared: 0.679.

Get rid of those crappy small sample useless defensive metrics that are full of errors and bias and you end up with a lower correlation to team wins and losses. The r squared drops to .68, meaning this version explains just 68% of the variance in team winning percentage. A month into the 2013 season, WAR explains less about team performance without UZR than it does with it.

The original source of Heyman’s tweet, though, wasn’t UZR. He was quoting B-R’s WAR calculation, which uses Defensive Runs Saved as its fielding metric. UZR isn’t nearly as bullish on Starling Marte’s defensive performance as DRS, so maybe it’s BIS’ fielding metric that’s the problem here? To check, I swapped out UZR for DRS in our WAR calculation and re-ran the numbers. One more table.

| Team | Winning% | DRSWARWin% |
|---|---|---|
| Red Sox | 0.720 | 0.687 |
| Rangers | 0.640 | 0.663 |
| Braves | 0.625 | 0.567 |
| Yankees | 0.625 | 0.572 |
| Diamondbacks | 0.600 | 0.588 |
| Orioles | 0.600 | 0.544 |
| Pirates | 0.600 | 0.562 |
| Rockies | 0.600 | 0.631 |
| Royals | 0.591 | 0.553 |
| Cardinals | 0.583 | 0.489 |
| Tigers | 0.565 | 0.653 |
| Athletics | 0.538 | 0.496 |
| Reds | 0.538 | 0.607 |
| Twins | 0.524 | 0.488 |
| Brewers | 0.522 | 0.500 |
| Giants | 0.520 | 0.495 |
| Nationals | 0.520 | 0.488 |
| Dodgers | 0.500 | 0.556 |
| Rays | 0.480 | 0.519 |
| Phillies | 0.462 | 0.459 |
| Mets | 0.435 | 0.491 |
| White Sox | 0.417 | 0.410 |
| Indians | 0.409 | 0.482 |
| Mariners | 0.407 | 0.395 |
| Angels | 0.375 | 0.328 |
| Cubs | 0.375 | 0.418 |
| Blue Jays | 0.346 | 0.393 |
| Astros | 0.280 | 0.375 |
| Marlins | 0.240 | 0.224 |

Correlation: 0.898. R squared: 0.806.

Well, that’s not it. WAR with DRS comes up with basically the same correlation to team winning percentage as WAR with UZR, and both do better than WAR without any defensive component.

Now, maximizing correlation to team winning percentage should not be the goal of WAR. If it was, we’d just make the inputs RBIs and RBIs allowed, and the correlation would be something like .99. It wouldn’t be a better metric simply because it was more highly correlated with winning percentage. This test is essentially a sanity check to make sure that WAR is actually measuring things that impact team wins and losses. The inputs of WAR were chosen to try and identify context-neutral individual player performance, and it’s a good sign that things chosen for those reasons end up correlating well to team wins and losses. It tells us that WAR is working pretty well, even in small samples. Even with imperfect inputs. Even with defensive inputs that are best used in the largest sample you can possibly get.

WAR is not perfect, nor is it precise. It is best used in whole numbers, with any fractional difference being seen as marginal gaps at best, especially if that difference is based mostly on the defensive components. I wouldn’t say that Starling Marte has been Bryce Harper’s equal so far, because I doubt that’s true. You shouldn’t take two dozen games worth of WAR at face value.

But you shouldn’t take two dozen games worth of anything at face value. The major league leader in ERA is currently Jake Westbrook, at 0.98. If you took ERA at face value, you’d have to argue that Jake Westbrook has been the best pitcher in baseball, and is on pace to have the best pitching season in the history of the game. No one actually believes that, and no one is arguing that, because everyone knows that in a month’s worth of games, you’re going to see some funky results. Funky results in 24 games do not invalidate a metric.

Just like every statistic under the sun, WAR is better when used in large samples. But, despite the beatings it takes on a regular basis, WAR actually does its job pretty well — not perfectly, because it is not a perfect model — even just based on April data alone.

And, getting back to the original point, I’ll note that WAR is very good at spotlighting players like Starling Marte, who deserve recognition but probably aren’t getting it due to the continuing focus on the triple crown statistics in the mainstream media. Today, because of WAR, a lot of people learned that Starling Marte is having a pretty great April. I’ll call that a win, even if the calculation might be off by a few runs here or there.

Dave is a co-founder of USSMariner.com and contributes to the Wall Street Journal.

### 73 Responses to “WAR: Imperfect but Useful Even in Small Samples”

1. Eric says:

Out of curiosity, would you expect to get a better correlation by using offensive WAR and pitcher WAR based on RA9 instead of FIP? All defensive performance would be bundled into the pitcher stat so you wouldn’t have to worry about the imperfections of UZR and DRS. Obviously this wouldn’t be super useful for evaluating individual performance, but it would be interesting to see on the team level.

• Dave Cameron says:

Yes, if you use RA9 instead of FIP and some fielding metric to account for defense, you’ll come up with a higher correlation. But you’ll also then be including context into the metric, because runs allowed include sequencing. If you did that, you’d have a metric that included sequencing for pitching but not for hitting, unless you also swapped out batting runs for RE24. In other words, you wouldn’t be measuring pitchers and hitters the same way anymore.

What inputs you use all depend on what questions you’re asking. There are times when you want to include sequencing and times you don’t.

• Eric says:

Interesting. Thanks!

Taking all the joy out of baseball…blergh…

#### -79

• Jake says:

Agreed. I loved baseball before I read this post. Now I hate it.

#### +96

• l1ay says:

I had set aside my evening to devote to Harvey v. Fernandez myself. Now, I think I’d rather go jump off a bridge after reading this post.

#### +7

• akalhar says:

Yeah. I just burned my baseball gear after reading this.

#### +12

• I Agree Guy says:

Hell, I burned my baseball gear in anticipation of reading this.

#### +24

• The Humber Games says:

I decided not to buy baseball gear and then burned someone else’s in anticipation of this article

#### +21

• DCN says:

I burned down a baseball field at the mere rumor of this article.

#### +10

• Bigmouth says:

Not cool, bro. You can burn your friends, and you can burn your baseball gear, but you can’t burn your friends’ baseball gear.

• Kevin says:

I dunno, I don’t need to be a statistics guy to appreciate that this type of analysis shows that there are certain things which can make a player worthy. WAR isn’t infallible, but it’s one useful tool amongst many.

3. Thomas Grantham says:

Great Post

4. taprat says:

Even over just 20 games, by using data from all 30 major league teams, aren’t you in fact looking at a fairly large sample here, such that you would expect the strong correlation that you found?

Heyman’s point related to comparing 2 players and one month’s data. Yours involved 500+ players and one month’s data.

Or is that not the correct way to look at this?

#### +15

• La Flama Blanca says:

This was my first thought. I’m trying to argue myself over why we’re wrong but it seems to put a wrench in the premise of this article.

• Baltar says:

Cameron addressed the two cases that Heyman cherry-picked, explained them fully and then made a much broader point.
I don’t know what more you could expect.
This was a great article.

5. O's Fan says:

Issue: Heyman’s tweet is about individual WAR, Cameron’s defense about team WAR. Defensive metrics may well improve WAR when looked at in the aggregate and still be potentially inaccurate when looked at for individual players.

#### +6

• Justin says:

The point is that using the inputs that go into WAR this early in the season are still better than not using them. Using UZR is better than ignoring it.

• Pitnick says:

He understood the point.

6. Chris H says:

Interesting article but I’m not sure this addresses the initial question of whether WAR is accurate for Marte and Harper. By using team WAR aren’t you increasing the sample size by like 25 players?

7. Matt Hunter says:

The issue with Heyman’s tweet is that he’s coming from a position of assuming that Harper has been better than Marte, and then criticizing WAR because it disagrees. Sure, UZR and DRS are pretty close to useless on an individual level after one month. But their values are certainly possible. That is, it’s very conceivable that Marte’s defense has been good enough to make up the ground between him and Harper offensively.

The problem is that we have no idea if that’s true or not. But it could be true, and for Heyman to just assume that it isn’t is incorrect just as assuming that it is true is incorrect.

Here’s another way to think about it. Harper has about 14 runs from offense+baserunning and Marte has about 7. Say we don’t know the rest of their value, but we know it’s some random number between -5 and 5 runs.

Now, the odds are not in Marte’s favor. If we have to choose one, we’ll say that Harper is better. But we shouldn’t have to choose one. Because for all we know, Marte’s defense – that random number of runs – makes up the gap between the two offensively. It’s unlikely, but possible. We can’t come from a position of assuming that they’re equal defensively just because we have no way of knowing their defensive value. We can say that Harper is likely better, but we still need to admit to the fact that he might not be.

Does that make sense or did I just spout a lot of nonsense?
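The odds described in this comment can be put into numbers. This is only a sketch under the comment's own illustrative setup: a 7-run gap (14 versus 7), with each player's remaining value drawn independently and uniformly from -5 to +5 runs. Independence and the uniform shape are assumptions, and the function names are hypothetical.

```python
import random


def prob_gap_closed(gap, half_width):
    """P(U1 - U2 > gap) for independent U1, U2 ~ Uniform(-half_width, half_width).

    The difference of two such uniforms has a triangular distribution on
    [-2 * half_width, 2 * half_width], so the tail area has a closed form.
    """
    span = 2 * half_width
    if gap >= span:
        return 0.0
    if gap <= -span:
        return 1.0
    if gap >= 0:
        return (span - gap) ** 2 / (2 * span ** 2)
    return 1 - (span + gap) ** 2 / (2 * span ** 2)


def simulate(gap, half_width, trials=200_000, seed=0):
    """Monte Carlo check of the same probability."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        trailing = rng.uniform(-half_width, half_width)  # unknown value, trailing player
        leading = rng.uniform(-half_width, half_width)   # unknown value, leading player
        if trailing - leading > gap:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(prob_gap_closed(7, 5))  # closed-form tail probability
    print(simulate(7, 5))         # simulated estimate of the same quantity
```

Under these assumptions the trailing player comes out ahead a bit less than one time in twenty, which fits "unlikely, but possible."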

• That Guy says:

Aside from that, it’s easy to believe that Harper’s offensive production is real, whereas we still have to suss out whether or not Marte’s defense is real.

Put it this way, if Andrelton Simmons was already +8 runs defensively, we’d all pretty go along with the idea that he can be a 5WAR player on defense alone. It matches what we already know and what we’d like to believe, ie, we’ll see a defense only 5WAR players the way that we see 5WAR offense players.

• matt w says:

It isn’t that surprising that Marte’s defense grades out well either, at least not to a Pirate fan. Marte’s been getting rave reviews about his defense in the minors for a while. (MLB.com didn’t see fit to preserve the video of him throwing out Paul Goldschmidt trying to take an extra base, but here’s a gif: http://assets.sbnation.com/assets/1283678/marte.gif.)

• Jay29 says:

Nice play by Marte. He definitely has played well in the outfield according to the naked eye.

What bothers me about that gif is, even while recognizing it’s slo-mo, how long it takes Neil Walker to place the glove down for the tag after he first catches it.

Why does he catch it so far out there, and then undertake a sloppy and awkward twist/pivot/jump just to get in front of the bag? I would have positioned myself on the RF side of the bag, with my body pointing towards 3B (as if preparing to make a quick relay throw to 1B), ready to reach out and catch the ball and swipe it back toward the bag in an instant.

I know MLB infielders have to be wary of giant, fast runners sliding into them, but still. I see this a lot, and as a one-time fundamentally-sound kid 2B (read: couldn’t hit), I don’t understand why they’re not doing it better.

• Trent Phloog says:

Looks to me like he took his time because Goldschmidt was still ~15 feet away when he caught the ball. :)

Not taking anything away from Marte’s hose of an arm, but there was a poor baserunning decision involved there as well.

• Steve Z says:

Walker could have ordered and eaten a pizza waiting for Goldschmidt to reach second base!

• Jason B says:

“we’d all pretty go along with the idea that he can be a 5WAR player on defense alone”

No, we all wouldn’t.

However Braves fans DEFINITELY would!

• Incompetent peeps says:

I actually think that the lesson is that Heyman, Joe Morgan, Hawk, and the like are closed-minded idiots and shouldn’t be paid attention to. Hopefully, they lose their jobs and competent people will get hired. Alas, one can wish…

• Jason B says:

“Hopefully, they lose their jobs”

C’mon now. Fire people because they disagree, or don’t buy into this statistic or that one? Man we are sooooooo quick to fire people for their shortcomings as long as it’s not us! (We have plenty of excuses and explanations for our own!)

• TKDC says:

So Adam Dunn should keep getting \$15 million a year as long as he likes, because firing him would be mean? These guys get paid a lot of money to be idiots and there are much more qualified people out there (as in, just about anyone in a few of these cases).

• Jason B says:

It’s awesome that you’ve never screwed up at your job! What’s that like? (Good thing it didn’t happen in public so total strangers could call you an idiot on a message board!)

And Adam Dunn should keep getting \$15 million a year as long as some owner will pay him \$15 million a year, methinks. Heck, \$45 million if he can. \$9 million if he has to settle.

• Kazinski says:

Their job is not to provide useful information; it’s to provide additional entertainment while people are watching the game, fill in the natural lulls of the game, and get excited in order to get you to focus your attention when there is meaningful action happening.

If it improved ratings they would play a sound track of water buffaloes mating during the game.

• The Stranger says:

That made a lot of sense. The point of advanced statistics (and old-school ones) is to tell us things about the game. If these things were intuitively obvious, there wouldn’t be a lot of value to the statistics. Heyman appears to be disregarding WAR just because it tells him something that isn’t intuitively obvious – Starling Marte is (probably) having a pretty good season so far.

8. That Guy says:

I’m not a hater of the HR stat, but explain why John Buck has more than Ryan Braun.

#### +36

• Matt Hunter says:

I don’t think this is a good comparison. HR just measures number of home runs, so it’s been completely accurate thus far. WAR measures a player’s contribution to the team, and because of the unreliability of defensive metrics, it probably hasn’t been accurate thus far.

• Pitnick says:

Sarchasm. I think you fell in.

#### +13

• That Guy says:

It was for general goof factor, of course, but we’re also having to look at this from a sample range perspective as well. Why can’t John Buck hit as many HR as Ryan Braun given arbitrary endpoints? Why can’t Starling Marte have as much WAR as Bryce Harper given arbitrary endpoints?

9. Jon L. says:

I really don’t think your approach here makes any sense, at least as regarding the defensive WAR totals of Harper and Marte. WAR is based on what happens on the field, and what happens on the field affects winning and losing. If a bunch of balls are dropping around Harper in the outfield, his defensive WAR rating will be low, and his team will suffer. What this metric can’t tell us – especially in a tiny sample – is whether Harper “should” have caught any of those balls, or whether Marte would have, or whether Harper might have equalled Marte if only they’d exchanged positions in their respective outfields.

You might be able to guess an American’s income and a Norwegian’s income by evaluating all economic exchanges of those countries, and then checking against GNP to make sure you were accurate, with the result being that you’ll think you just proved you were.

• Burrito says:

And if we swap the pitches that each saw, maybe Marte would do better and Harper worse. That’s the batting equivalent of the argument you’re making.

Both hitting and defensive metrics are measured in a manner to compare them to an average player.

In any event, you should probably read up on UZR and DRS in the FG library to better understand it. No one is going to go through tedious process of switching each individual play for two players to see if A would do better than B or vice versa. It’s also way more subjective than deciding what the average player would do.

• Jon L. says:

Your suggestion that I read up on the metrics may be well-taken, but you’re missing my point. Defensive plays don’t always involve one defender who clearly has a chance to make a play. For batting statistics, in contrast, we know who was batting. We don’t just take pitches thrown somewhere in the vicinity of Harper, LaRoche, or Werth, and then attribute 1/3 of that set of results to Harper. If we did, we’d have very little idea of how good a hitter Harper was. Nonetheless, if we did it for the whole Nationals team, we’d end up with a perfectly accurate image of the team’s offense, regardless of how inaccurate we were for individuals. The reason for this, as in the article, is that we’re ultimately comparing a summary measure of on-field results to the on-field results.

10. Jeff says:

The real gem here (as Dave alludes to) is that WAR is pointing out the terrific (albeit SSS) start that Marte has had. In the absence of advanced metrics, the notion of Marte=Harper is crazy. In fact, I was so shocked that I went and looked up Marte’s numbers. Surprisingly, they are quite comparable in aggregate. This isn’t a weakness of WAR — this is what it does so well.

#### +5

• Steve Z says:

In the absence of advanced metrics, the notion of Marte=Harper is crazy.

Lacking advanced metrics, much of the perceived difference between the two players can be attributed to the hype which surrounds each of them. Marte is fast and has a plus-plus throwing arm (an Altoona pitcher with a radar gun recorded 100 mph on a Marte throw from the outfield). He has always hit for average and has added power as he progressed through the minors and recovered from a hamate injury. The only major warning sign: He has swing and miss issues. Harper seems to be improving in this regard. But, can we rationally expect Harper to hit like Ted Williams for the remainder of this season?

11. Andy Olds says:

Harper has 1.5 wins. Marte has 1.2 wins. That’s 2 wins of difference over the course of a season, or the difference between Ryan Braun and Alex Gordon last season. Is that really implausible?

#### +11

12. Tim says:

Too much frequentism. Basic Bayesian logic tells us that if defensive systems are measuring anything worthwhile at all, having them will be better than not having them, over any sample.

• Tom Rigid says:

Absolutely. It’s just much harder to make margin of error adjustments to the gut than to the spreadsheet.

13. JuanPierreDoesSteroids says:

Now use RE24 instead of wRAA and a game of “Cow Pie Bingo” instead of UZR. R-squared=1.001.

14. matt w says:

Pretty much OT but Westbrook got a 2-inning 4-run performance rained out, so we really shouldn’t take that ERA number at face value.

15. Ian says:

The real question is: how can we better evaluate defensive performance? Our rudimentary evaluations do account for some defensive value, as this article attests, but there is certainly much room for improvement.

It would be interesting to see if any major league teams adopt the technology created by SportVU, which is currently being used in the NBA to track player and ball movement for evaluative purposes. I think this information could certainly improve defensive performance evaluation (and as a side note I think it would be extremely useful in evaluating American football players).

My guess is that Andrew Friedman and others are already exploring this, and my hope is that defensive metrics will take a giant step forward as a result.

Great article by the way. Improvement is all about recognizing and correcting deficiencies.

• Jason says:

Hit f/x and Field f/x

• Baltar says:

Yes, these tools have tremendous potential. I have a feeling (and am hoping) that in just a few years we’ll have metrics that will outclass our current ones in the same way that surgery outclasses bleeding.

• That Guy says:

The A’s have something in house that told them that Cabrera indeed was more valuable than Trout last year and I understood that to mean that they placed a very different value on defense than UZR or any other metric that we know did. It would be nice to see something like SportVU freely available for baseball.

• Mcneildon says:

There’s a service called TrackMan which is used by 17 MLB teams which does something to this effect

16. Corey says:

Is the correlation statistically significant?

17. Ronaldo says:

On an aggregate and team level, we’re probably to the point where we have large enough sample sizes to make defensive metrics somewhat meaningful and start seeing positive correlation. This does *not* mean that there is any statistical relevance to an individual’s stats.

18. TKDC says:

That’s all well and good, but couldn’t you fit all that in a tweet?

19. Justin Bailey says:

“WAR is not imperfect, nor is it precise.”

For the kajillionth time, you guys need an editor. Cripes.

This is a good analysis, but for the love of god could you put some error bars on WAR please?

• Jason H says:

The error bars are roughly equal to the league-wide range of defensive values.

21. Hank says:

Why are people calling this “good analysis”. If the question was: “How quickly does team WAR stabilize and correlate to team level performance?” then the analysis is OK; but what does this have to do with individual WAR after 1 month?

It shouldn’t be all that hard to do…. just go back to last year’s (or a few years of) data and correlate individual player April WAR totals to full year WAR totals (can do this on a rate basis or extrapolate 1 month out to a full year?). Not sure why in the world you would look at an aggregate team level data – you end up with a much larger sample size than 1 year of player WAR; of course that is going to show a decent correlation – it should be much more solid than 1 year of an individual’s #’s.

#### +5

• agam22 says:

That correlation would be irrelevant because WAR is a counting stat. Performance will vary greatly from month to month so comparing season WAR totals to April WAR totals and noticing a poor correlation would be the same thing as playing the extrapolation game that says a player has 8 home runs through 20 games, therefore I predict he will finish the season with 64. One month of WAR was never intended to be that predictive

22. Rob Heaton says:

Using a pitcher to demonstrate the fallacy of low sample size among hitters is deceptive. While Westbrook’s % of season complete is the same, he has appeared in 4 games and faced relatively weak hitting teams…. The hitters have 20-30 games against a mix of different pitchers so a much bigger sample size even if the % of the year is the same.

Also, your article basically says: “WAR says Marte and Harper are about the same players, but we all believe that Harper has been a lot better…. so small sample is probably the reason, but overall WAR is very reflective of performance even in small samples.” Seems like you don’t want WAR to say Marte = Harper, but you do believe WAR to be a good reflection even in small samples…. seems contradictory and not logical.

23. Eric R says:

“i am not a hater of WAR stat, but if someone can explain to me how starling marte & bryce harper are both 1.7, please do”

Had he looked at the player pages he would have seen the exact reason since both BB-ref and Fangraphs provide the components that add up to WAR. So, he is lazy or a moron or he in fact is a hater just looking to bad-mouth the stat for no good reason.

Any of the three [let alone more than one of them] is plenty for me to ignore anything he writes…

24. sbnovafan says:

The analysis is good, but there are some holes here. He uses a team measurement of WAR to show that WAR needs defense to be as accurate as possible. While this might be true, it doesn’t account for how Harper might have just had an unusually bad month of defense, while Marte could have had an unusually good month of offense. This small sample size with these two players is the same case as Matt Harvey having his ridiculous month pitching. He could be the next R.A. Dickey, or the next Ubaldo Jimenez. Normal stats like ERA can be inaccurate in measuring a player’s value in small samples, and the same can be said for UZR. This isn’t true when measuring team-WAR correlation, and that is why Cameron sees UZR as so accurate.

25. TKDC says:

Doesn’t Harper have a lot of mishaps in the field? And I know his base running aggressiveness has gotten the better of him in the past. Going back to the Trout/Miggy debate, it seems odd that the old school writers start to not care about doing the little things when there is an honest attempt to measure them in a stat that people actually care about. It makes you wonder what their beef actually is (it actually doesn’t make me wonder – I feel confident I know exactly what it is and so do you).

26. Mark says:

Player A is having a bad month in the field according to everybody, but once you put that in numbers everybody starts complaining about sample sizes and unreliable defensive metrics.

27. rubesandbabes says:

The evidence for the baserunning edge for Marte is just not there.

Marte has grounded into more double plays 3-2.

(Yes, I know, but still this is one of the speed profiles of the game. Lead the league in GIDP – that means you are not one of the fastest guys. Why should Marte get a pass on this? Plus, Marte’s a leadoff hitter in the National League. Edge to Harper.)

Marte and Harper have been caught twice each on the basepaths.

Doubles and triples: Harper is right there with fewer at-bats. Harper has a triple, too.

So, while Harper takes a whole bunch more trots around the bases, Marte’s 6 stolen bases and one extra base on extra base hits is being projected to 3 runs. Add in the GIDP to Marte’s baserunning, and it becomes only the 6 stolen bases.

So, 6 stolen bases = 3 runs better in baserunning? Not in this case.

• Terence says:

I don’t think you understand Bsr (known otherwise as ubr).

28. Daaaan says:

My one issue with Dave’s logic here is the Jake Westbrook reference. It’s a red herring. After answering Heyman’s question by dissecting the intricacies of Marte’s running and fielding success, as well as the slight fallibility of WAR; he goes on to chalk something up to small sample size?

The question was never; why are these two players playing as well as each other. It was, why does WAR SAY they are when they SEEM not to be?

SSS has nothing to do with that.

• Jason B says:

“It was, why does WAR SAY they are when they SEEM not to be?”

No, it wasn’t; I think if that was the question he could head over to BBRef or FG and look at the individual components of WAR. I think it was more “I have an axe to grind against this stat I don’t really like or understand; here’s an example where I can try to make it look foolish!”

29. Ty says:

I have a question. How many runs would a replacement level line-up average per 162 games in a neutral park? I’m getting 3.2 runs per game; also, how many runs per game would a replacement level pitching staff (starters plus relievers) give up in a neutral park? I got about 5.5 runs per game, but I’m not so certain about this.

Thank you.