Unifying Replacement Level

On Christmas Eve of 2008, David Appelman gave the world a present – “win values” on the pages of FanGraphs. It wasn’t labeled WAR for a little while longer, though it was an implementation of the model Tom Tango laid out at The Book Blog a few months prior. Over these last four years, the model has become quite popular, and even those who are not fans of analytics know what WAR stands for. In 2010, Baseball-Reference added it to their collection of statistics. Because WAR is essentially a model of player value, there are decisions that have to be made about the way it is constructed that don’t have an obviously correct answer. In places where we had made one decision, Sean Forman (and Sean Smith, who assisted with their original implementation) made a different one, and the calculations differ in some significant ways.

We know that this is a source of frustration for some folks, having two sites both publicly display different calculations for a statistic of the same name. Often, the differences between the two have been used to discredit the entire model. For instance, Jim Caple wrote this on ESPN.com a few months back:

Actually, we know it isn’t always accurate because depending on your source — FanGraphs or Baseball-reference.com — you can get wildly different WAR scores… For example:

Does (Jack) Morris, in fact, belong in the Hall of Fame? No, he doesn’t, according to baseball-reference.com, which gives him a WAR score of 39.3, tied for 145th all time among pitchers. Maybe he does, according to FanGraphs, which gives him a 56.9 WAR, 75th all time.

When Caple wrote it, I wasn’t exactly sure why Morris’ value differed so much, but since we measure pitching in very different ways, I assumed that the 17.6 win gap was due to some differences between Morris’ FIP and his runs allowed. But, then, I looked it up, and Morris’ career ERA (3.90) was almost an exact match for his FIP (3.94). Adjusted for park, Morris’ career FIP- was 97, while his ratio of RA9 to league average on Baseball-Reference is 96. Even with very different inputs, both models came to the same conclusion about Morris – he was a slightly above average pitcher who had a very long career. So, why did we give him credit for an additional 17.6 wins?

The answer, quite simply, lies with replacement level. Our model used a lower baseline than Baseball-Reference did, so the same performance would result in a higher WAR in our model than in theirs. Over very long careers — like Morris’, for instance, or many of the old time pitchers who threw forever — this could really begin to add up, and give the appearance of large disagreements when the two systems didn’t actually see things all that differently. In the case of guys with substantial careers, many of the large discrepancies were simply driven by the fact that the two sites had a different definition of replacement level.

After reading Caple’s article, David Appelman and I began discussing the idea of reaching out to Sean Forman and seeing if he was interested in agreeing to a unified replacement level. Before we could actually even send that email, Sean reached out to us with the exact same idea. And so, today, we’re pleased to announce that Baseball-Reference and FanGraphs have adopted that unified replacement level, allowing our two models to now measure players on the same scale.

As David noted a few minutes ago, this new unified replacement level is now set at 1,000 WAR per 2,430 Major League games, which is the number of wins available in a 162 game season played by 30 teams. Or, an easier way to put it is that our new replacement level is now equal to a .294 winning percentage, which works out to 47.7 wins over a full season. Conveniently, this number is almost exactly halfway in between our previous replacement level (.265) and Baseball-Reference’s previous replacement level (.320), though the number wasn’t chosen solely as an equal compromise.
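For anyone who wants to check that arithmetic, here is a minimal sketch of the conversion from a league-wide WAR total to a replacement-level winning percentage. The function and variable names are illustrative only and are not taken from either site's implementation.

```python
# Sketch: convert a league-wide WAR total into a replacement-level
# winning percentage. All names here are hypothetical, for illustration.

TEAMS = 30
GAMES_PER_TEAM = 162

def replacement_level(total_war, teams=TEAMS, games_per_team=GAMES_PER_TEAM):
    """Return (winning percentage, wins per team-season) for a replacement-level team."""
    team_games = teams * games_per_team          # 4,860 team-games in a season
    league_wins = team_games // 2                # 2,430 wins, one per game played
    replacement_wins = league_wins - total_war   # wins left once WAR is stripped out
    win_pct = replacement_wins / team_games
    return win_pct, win_pct * games_per_team

win_pct, wins = replacement_level(1000)
print(f"{win_pct:.3f} winning percentage, {wins:.1f} wins per 162 games")
```

Passing a different league-wide total, such as 1,009, shows how sensitive the baseline is to that one number.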

In Tango’s original methodology post back in 2008, the model he laid out used a replacement level equal to 1,009 WAR per 2,430 games, or a .292 winning percentage, so this is in essence a return to WAR’s roots. In that post, Tango notes:

“Replacement is defined very specifically for my purposes: it’s the talent level for which you would pay the minimum salary on the open market, or for which you can obtain at minimal cost in a trade.”

There are a variety of ways you can measure what kind of expected performance you might get from a replacement level player. A few months ago, I looked at the performance of position players who were acquired via minor league contract or waiver claim this winter, and over the last two seasons, those 24 players had accumulated almost exactly zero WAR in over 10,000 plate appearances. So, that suggests that the baseline has always been in the right neighborhood, at least.

That’s not the only way to figure out where replacement level should be, however. We can also look at the worst performances of players who have long Major League careers, and see what the minimum level of production teams have required in order to keep a player in the league for an extended number of years rather than simply swapping them out for someone else. Major League teams don’t always evaluate talent perfectly, but if they were continually employing players that were below our established replacement level for 10 to 15 years, it would be a pretty good sign that our replacement level was too high, and that they couldn’t simply replace these guys with someone better with minimal effort.

That’s not what we see, however. If you use .294 as the replacement level, 627 of the 628 players with at least 6,000 Major League plate appearances — that is, the equivalent of 10 full seasons of regular playing time — have a career WAR north of 0.0. The only player who falls below replacement level with this baseline is Alfredo Griffin, coming in at -1.0 WAR in 7,331 plate appearances, which works out to -0.08 WAR per full season. For all intents and purposes, that’s zero.
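The per-season figure for Griffin can be reproduced with a short sketch. It assumes a full season is about 600 plate appearances, which is what the "6,000 plate appearances equals 10 full seasons" framing above implies; the names are illustrative.

```python
# Sketch: scale a career WAR total to a per-season rate.
# PA_PER_SEASON = 600 is an assumption drawn from the
# "6,000 PA = 10 full seasons" framing, not an official constant.
PA_PER_SEASON = 600

def war_per_season(career_war, career_pa, pa_per_season=PA_PER_SEASON):
    """Return career WAR divided by the number of full-season equivalents."""
    seasons = career_pa / pa_per_season
    return career_war / seasons

griffin = war_per_season(-1.0, 7331)
print(f"{griffin:.2f} WAR per full season")
```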

You can calculate replacement level a number of different ways, but in the end, it always leads back to a number in this vicinity. Baseball-Reference arrived at a number a little higher than what Tango had used, while we came up with one a little lower. Because they were at opposite ends of the defensible spectrum, the different baselines gave a false sense of difference in the actual calculations. Now, with an agreed upon replacement level, those differences that are solely due to scale will go away.

The net effect of this change is that players will get a little less WAR per season in our method (and a little more in B-R’s) than they used to. On an individual season level, you’re barely going to notice the shifts. For instance, Mike Trout’s career +10.8 WAR in 774 plate appearances under our old calculation will become +10.7 WAR with the new changes. However, at the completed career level, you’re going to see some bigger drops. Luis Aparicio, with his 11,230 career plate appearances, drops 14.2 WAR, going from +63.5 down to +49.3. Likewise, Hank Aaron, Brooks Robinson, and Carl Yastrzemski all lose 14 WAR off their career totals. Long career players take the largest hit, as you would expect.

The higher baseline brings our scale down slightly, but we think that change is worth making, as a unified replacement level will allow for comparisons of our apples versus their apples, and will eliminate confusion in an area that never needed to cause any. These changes weren’t made lightly, and we know that there is always some resistance to any sort of change, but we hope that you see the unification of replacement level between the two sites as a positive overall.

While there will never be one single agreed upon WAR calculation — I’d call that a feature and not a bug, but that’s another post — the common baseline will give us a better opportunity to explore where the real differences are, rather than being tricked into seeing big gaps where none actually exist.

So, that’s the short version of the story behind this change. We’ll have more on this going forward, including a post coming later this afternoon on why we need replacement level to begin with, but for now, we hope you guys see this as a step forward for WAR as a metric.







Dave is a co-founder of USSMariner.com and contributes to the Wall Street Journal.


138 Responses to “Unifying Replacement Level”

  1. Tomcat says:

    Does this improve the case of someone like say Larry Walker?


  2. If replacement level is set at approx 48 wins a season, and the Astros won 55 games last year…

    Oh my.

    At least they’d make for a semi-competitive AAA team.

    +8

  3. lonewolf says:

    I feel like this change makes being a baseball nerd a lot easier.

    +9

    • lonewolf says:

      And when can we expect these changes to be made on the website(s)?


      • Dave Cameron says:

        The numbers are live now on FanGraphs. Sean will announce the changes at B-R when they’re live there, I’d assume.


        • MonkeyEpoxy says:

          How much longer until the new FIP formula gets uploaded into the FG glossary?

          And I’m super curious to see which pitchers’ numbers have changed for the better or the worse.


  4. Blue says:

    It’s definitely a step forward–but the statistic still has a whole lot of false precision which mere standardization of the baseline won’t fix. Mixing the poorly measured defense value (to say nothing of the baserunning hack) and giving players credit for positional adjustments that are, at best, heroic assumptions, in with the well-described offense variables leads to a measure that is full of a whole lot of mush.


    • mickeyg13 says:

      The false precision is a problem with the *users* of the statistic, not the statistic itself. It’s not the fault of WAR if some writer mistakenly draws a strong conclusion from a 4.2 WAR over a 4.1.


    • Tomcat says:

      That is like saying that since a hammer can’t cut plywood it isn’t a good tool, or that since you saw a guy using a hammer to put drywall screws in there is a flaw with hammers. WAR has flaws and should be used as a conversation starter, not ender.

      +16

      • Cguudgyrdycjvhkj says:

        You are saying WAR is the equivalent of saying, “pretty nice weather today,” which is not much different than Blue’s comment.

        A hammer works perfectly to drive nails or claw them out; what does WAR do (besides signal that you use imprecise aggregates to start conversations)?


        • Eric R says:

          What is your preferred method for evaluating MLB talent? Using that, let’s build what would be considered the best possible roster from MLB talent [let’s say based on 2012 stats].

          Then let’s build another team using fWAR.

          I’m sure the two teams will have a lot in common [unless the stat you choose is (H-HR+SB)/AB or some other bizarre metric]. If the two teams end up very similar, then it would certainly seem that WAR isn’t quite as terrible as you seem to think. Comparing the players who are not common would at least be very interesting.


        • Frank's Wild Years says:

          It establishes a baseline to measure player value; you can choose to trust things like park factors and defensive metrics more or less than the model does.


        • That Guy says:

          The hammer would be a terrible way to install drywall screws, and you certainly wouldn’t want to ‘claw them out’ once installed.


        • TKDC says:

          OPS has been cited for years (decades?), but what the hell does it mean? Should we have stuck with batting average until something like wOBA came along? Is wOBA good enough? Should we revert back to batting average until we are 100% positive about the linear weights in wOBA?


    • Baltar says:

      It’s easy to point out flaws in WAR, but what’s the point? We know it’s a rough indication of a player’s value, but it’s a pretty good one and it’s objective.
      If you have a better stat to do that job, please reveal it to us. If you have proven ways to improve WAR, please reveal them to us.
      Otherwise, shut up.

      +8

    • commenter #1 says:

      We’ll be eagerly awaiting your more correct system’s publication on your website.


  5. Jeff T says:

    Glad to see this open collaboration between these two giants in the Sabermetric field. . . .


  6. Bob says:

    Dave – while you’re collaborating with Sean, it might be helpful to list out the other differences between the two versions of WAR, maybe on the glossary page? I know FIP v. ERA is probably the biggest difference.


    • Sky says:

      B-Ref has a great summary of differences between many WAR systems:

      http://www.baseball-reference.com/about/war_explained_comparison.shtml


    • zenbitz says:

      I posted on BBTF as well, but I think this would be a good opportunity for both BBREF and FG to report multiple WARs based on the different partitioning of credit between fielders and pitchers:

      FIP-WAR (fWAR) – batted balls are 100% fielder
      RA-WAR – batted balls are 100% pitcher
      ERA-WAR – batted balls are 98% pitcher (mlb fielding % is about .98)
      bWAR – whatever partitioning bbref uses to get the “middling” number.

      This would show that, in fact, the two sites are using the exact same numbers for RV AND would illustrate the kind of assumptions that go into creating a WAR stat and demystify the stat a bit.


      • We do have RA9-Wins (which is RA-WAR) on the site and have for a while. We also have BIP-Wins (portion of wins due to balls in play) and LOB-Wins (portion of wins because of stranded runners & misc other stuff), and finally FDP-Wins, which is the difference between RA9-Wins and WAR.


  7. Hurtlockertwo says:

    Are the total WAR for players on both sites now updated??


  8. Dave S says:

    bravo.


  9. James says:

    So are you guys using the same defensive metrics now as well?


    • Anon21 says:

      “So are you guys using the same defensive metrics now as well?”

      No. That’s one of the things that Dave’s getting at when he says “there will never be one single agreed upon WAR calculation” and “the common baseline will give us a better opportunity to explore where the real differences are.” Nothing about the way either site calculates WAR is changing, they’re just now starting from a common baseline.


  10. Manifunk says:

    Oh good, this makes the shallow “just add up the WARs!” style of analysis which has sadly become the norm around here that much easier

    -27

    • Anon21 says:

      Jesus, you’re really worked up about this nonexistent problem, huh? Try reading some of the 98% of Fangraphs articles that aren’t positional power rankings, you dumb whiner.


    • tomdog says:

      Or try a different site. All they did was create a common baseline for a statistic to make that particular aspect of the site better. If you don’t like it then you know what you can do.


  11. Brian says:

    I’m a bit confused on the math. How are we getting a .294 win percentage on 1000 WAR per 2430 available wins? I must be missing something.


    • Brian says:

      never mind. I got it now.


    • FJ says:

      I agree it’s not very clear about the explanation.

      2430 is the number of wins in a full season amongst all teams you need to be .500. (2430-2430).

      1000 WAR is what’s needed to get to that hypothetical .500 season. So, replacement team level is at 1430 (2430 – 1000) wins.

      1430/4860 = .294

      Hence a replacement level team has a .294 win percentage.


      • Urban Shocker says:

        Just to make sure I have this right: so a team needs 33.3 WAR in the aggregate to reach .500? (1000 WAR divided by 30 teams).


        • siggian says:

          Well, 33.3 WAR + 47.7 replacement level wins = 81

          So, yeah, it seems that way.


        • Urban Shocker says:

          Fancy. Thanks for the explanation Siggian, that clears up a lot.

          That’s not quite what you see on the positional power rankings though, where it looks like an aggregate 38-39 WAR is what it takes to get to 80 wins. any guesses?


        • Baltar says:

          I do have guesses on that Urban.
          To begin with, that referred to the previous version of WAR, with a lower replacement level.
          I’m guessing the remaining adjustment had to do with injuries and other unknowns, which FanGraphs rightly did not attempt to predict in its rankings.
          The total number of team wins had to come out correct, so an adjustment was made.


    • chasfh says:

      Why is replacement level set at 1,000 WAR per 2,430 games? What is the genesis of that number?


  12. Darren says:

    BRAVO! I have been waiting for this for a long time, and the decision to make it a clean 1000 will make it simpler for the casual reader while still keeping it accurate and reasonable. Thanks David, Dave and Sean.

    I heard that BPro was also considering unifying their replacement level with your sites as well. Is that happening?


  13. tz says:

    I guess we now can use the “Griffin line” as the career equivalent of the “Mendoza line”

    +19

  14. MarinersFan000 says:

    Just out of curiosity does anyone know what version of WAR espn uses on their site? Just wondering if the numbers there would be making the adjustment as well or if those numbers would still reflect a different baseline.


  15. JeffD says:

    Playing the d’s advocate here: isn’t this simply a case of the two WAR peeps getting together so that there can’t be any more, “But the two WAR peeps can’t even come up with the same WAR?” or something similar?


    • agam22 says:

      But there are still differences in the calculations that, as Dave says, should be considered a feature as they can tell you different things about different players. This is just putting both stats on the same scale to make comparisons easier


    • chuckb says:

      They’re only getting together on the value of replacement level. They’re not creating 1 unified WAR that will become THE WAR calculation. That was my initial concern but Dave’s explanation here alleviated that.


    • jfree says:

      JAW JAW is always better than WAR WAR. Now that there’s only one WAR, the saber rattlers can focus on JAW JAW.


  16. Caveman Jones says:

    Is there any way we can get a list of the players who lost the most WAR due to the change in baseline?


  17. kdm628496 says:

    does this mean that the cohort of replacement-level players you investigated will now produce a larger negative WAR?


  18. Jason says:

    Hey Dave, can you go back and fix all the articles written in the last five years? Thanks.

    +6

    • gouis says:

      Hey Jason,

      Nothing changes because everyone goes down by the same amount. So in reality nothing changes except the exact numbers.


      • chuckb says:

        Everyone doesn’t change by the same amount. The replacement baseline changes for everyone but that doesn’t affect everyone’s WAR calculation equally.

        +6

    • Baltar says:

      LOL!
      That thought occurred to me, not just humorously but seriously. I was thinking of the recent rankings series and whether they would correct it, then realized that if they did that, why not everything? Then the enormity of the task knocked the silly out of me.


  19. Tom H. says:

    I think it would be great to add (statistical) uncertainties into baseball stats. For example, if a player had a .400 OBP in 600 plate appearances, the rough statistical error would be 1/sqrt(N) ~ 0.041, so you could quote his OBP as .400 +/- 0.041. The same could be done for any rate stat (or counting stat, if you’re careful). Propagating these errors through the WAR calculation could clear up some of these issues.

    For instance, are we sure that a player has exactly 2.1 WAR in a season, or is it more like 2.1 +/- 0.3 WAR? If one wanted to go further (beyond just statistical uncertainties), you could use the varying WAR definitions on the web (fWAR, rWAR, etc.) as measures of the systematic uncertainties in the WAR calculation.

    This seems like relatively simple statistical analysis to me – maybe it’s been suggested before?


    • Blue says:

      OBP in a season as NO error because it is a full and complete description of the events of that season. There is no need for error bands around it because there is no statistical uncertainty to describe.


      • Blue says:

        “has” no error


      • X says:

        I think you’ve made a fundamental error of statistics. The sample OBP may well be known exactly, but we are interested in the underlying “true” OBP, which is known imprecisely due to the limited sample size from which we derive the sample OBP. Thus, the “true” OBP has an uncertainty, which we can estimate using Poisson statistics, as pointed out by the OP.


        • X says:

          Err, I should say our estimate of the true OBP has an uncertainty, not the true OBP itself.


        • Anon21 says:

          “we are interested in the underlying ‘true’ OBP”

          No, we are not. Not when constructing a statistic like WAR, which is simply supposed to serve as a descriptive record of what happened.


        • Blue says:

          A population is not a sample, X. When you describe populations, error terms are not appropriate.


    • Bryce says:

      I like this idea, but I don’t think it’s as trivial as you imply. What does it mean to put error bars on the number of doubles a player hit? He hit them; error is zero. You could put error bars on the value of a double, or on your prediction of the talent underlying the number of doubles, but those have very different meanings.


      • Anon21 says:

        Chalk another one up to people who don’t understand the difference between description and prediction, I guess.


      • Tom H. says:

        Hitting singles (or doubles, or triples, or homeruns) is essentially a Poisson process – discrete, countable events which happen at random intervals, but at some average rate. Thus, when we try to estimate, for example, HR/FB rate, we’re really trying to estimate the Poisson parameter λ of this process. Even if the observed HR/FB rate has no error (i.e., there’s no chance of misclassification), the estimation of the true HR/FB rate certainly has statistical uncertainties.


        • Blue says:

          Again, the rate has no error and no statistical uncertainty because it is a descriptive statistic that is a full and complete accounting of the entire population of events.


    • Anon21 says:

      Well, wait. Why would you want error bars on OBP? So far as I’m aware, there is virtually no measurement error associated with OBP, at least when it comes to people who played in the modern era of baseball.


      • Naveed says:

        There would be no reason to have an error bar for OBP insofar as it’s a descriptive statistic, but when using it to predict future OBP, it might be useful to have error bars in order to make it clear how much predictive value the sample has.


        • Anon21 says:

          That is never what WAR has been, from Tango’s earliest conceptualization to any of the implementations. You’re positing some different stat.


        • Baltar says:

          You may be right, but that would be extremely cumbersome. You wouldn’t really want to read an article that showed those extra numbers on every stat that is being used predictively.


      • Tom H. says:

        OBP as a measurement of what happened certainly has no (significant) error; however, if we’re trying to get an estimator of his true OBP, error bars are certainly appropriate. The 1/sqrt(N) error bars are approximately correct for large sample sizes, but binomial error bars are most appropriate for a rate stat.

        For example: we want to know the true OBP talent level of a certain player. He has had 4 plate appearances, reaching in two of them. In reality, his OBP has been exactly .500, but we know that this is a flawed measure of his true talent level. The 68% (1 σ) confidence interval for his true talent level is (.186, .814). This means we’re 68% confident that his true OBP lies in that interval, based only on the knowledge we have (his 4 PAs) – there’s just not much information. If he, however, had 400 plate appearances and reached in 200 of them, our 68% CI would be (.474, .526) – we would be much more confident.

        In standard baseball notation, both players have a .500 OBP, but we obviously believe the second one (+/- .026 uncertainty) much more than the first one (+/- .314 uncertainty). This helps to quantify this. It’s only meaningful as a predictor, though, not as a measurement. (You also have to assume that his true talent level is fixed, and not varying over time, which is probably roughly true for most players, at least over the course of a season.)
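The intervals quoted in this comment can be approximately reproduced with an exact (Clopper-Pearson) binomial interval. That choice of method is an assumption, since the comment does not name one, and the helper names below are hypothetical. This is a rough sketch using only the standard library.

```python
# Sketch: exact (Clopper-Pearson) binomial interval found by bisection.
# Assumption: this is the kind of interval the comment has in mind; the
# comment itself does not say which method produced its numbers.
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def solve(f, target, increasing):
    """Bisection on [0, 1] for f(p) == target, where f is monotone in p."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(k, n, level=0.68):
    """Return (lower, upper) bounds on the true rate after k successes in n trials."""
    tail = (1 - level) / 2
    # Lower bound: the p at which P(X >= k) equals the tail (increasing in p).
    lower = 0.0 if k == 0 else solve(lambda p: 1 - binom_cdf(k - 1, n, p), tail, True)
    # Upper bound: the p at which P(X <= k) equals the tail (decreasing in p).
    upper = 1.0 if k == n else solve(lambda p: binom_cdf(k, n, p), tail, False)
    return lower, upper

for reached, pa in [(2, 4), (200, 400)]:
    low, high = clopper_pearson(reached, pa)
    print(f"{reached}/{pa}: ({low:.3f}, {high:.3f})")
```

Other interval methods, such as the normal approximation, give noticeably narrower bounds for the 4 PA case, so the exact numbers depend on that choice.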


        • Blue says:

          You’re mixing a couple of very distinct concepts. His “true OBP” is no different than measured OBP–what occurred in the season, assuming no measurement errors. That’s very different from creating an estimate of “true talent OBP” that would be expected over a large number of PAs.


        • Tom H. says:

          I guess we fundamentally disagree then – I indeed believe that true OBP is a distinct quantity from observed OBP. For one thing, true OBP must be able to take any value on the continuum of 0.000 to 1.000, but observed OBP can only take a certain number of discrete values. For example, if a player gets 700 PA in a season, his OBP can only take 701 discrete values – 0/700, 1/700, 2/700, …, 699/700, or 700/700. A player’s true OBP can be defined as the limit of his measured OBP as we approach an infinite number of observations. The fact that we have only a finite number of observations to estimate this true OBP is the origin of the statistical uncertainty we’re trying to measure.

          To summarize: true talent level OBP (or whatever stat: SLG, BA, etc.) is a quantity we can only estimate but never measure with perfect precision; measured OBP is a well-defined, exactly measured quantity, but it only describes a finite number of observed events, not the nature of the underlying distribution which generated those events.


        • Blue says:

          How is a descriptive statistic of a population not “true”? It is an exact description of what occurred!


        • Anon21 says:

          Tom: What you are talking about is just not OBP. “True talent” is a useful concept in baseball, but mostly it’s useful for predicting future performance. When we look at historical OBP, all we want to know is what it was; the question of whether it was composed of a bunch of dying quails or hard line drives is just totally irrelevant to measuring its impact on the outcome of the games that have been played.


    • Tom H. says:

      I guess this whole subthread is caught up on whether we want to describe what happened or to estimate the true talent of a player. I come from a more statistical background, so I prefer the latter.

      Here’s an example: if a rookie comes up and gets 3 hits in 10 at-bats, he is hitting .300 – that’s the descriptive rate, and it has no error bars. However, it will take a lot more than 10 at-bats before I’m ready to say that he’s a .300 hitter – the error bars on his true talent level (for batting average) are too big for me to confidently say that. They’re complementary interpretations, not competing or exclusive ones.

      We do, subconsciously, apply error bars to most stats we see, however. We cut the triple-slash stats off at 3 digits because that’s roughly the level at which those rate stats fluctuate during a full season. We cut WAR off after one decimal place because we realize there is estimation in the calculation, and it would be disingenuous to say “Mike Trout had 10.03857 WAR in 2012” because we just don’t know it that precisely. I’m just proposing that these uncertainties be quantified a little better.


      • Blue says:

        You’re making a huge assumption that some of us don’t “come from a more statistical background.”


        • Tom H. says:

          I apologize – I didn’t mean any offense. I only meant that in my real life, I’m a scientist who performs statistical analysis for a living, so I have perhaps a more “scientific” or rigorous view of what statistical analysis actually means.


        • Blue says:

          And in my real life I have a copy of SAS on my work machine and many, many statistical programs I’ve written to tease out information from huge data sets of various populations.


    • Tom H. says:

      We also implicitly include uncertainties by requiring a minimum number of plate appearances in season-long awards like batting titles. We require a player to have more than about 500 plate appearances to qualify because we know that, for a small sample size, statistical fluctuations are much more important and can inflate rate statistics beyond sustainable levels (which is roughly what I mean by “true” talent levels).


      • TKDC says:

        Tom, do you not care about J D’s 56 game hitting streak because it was statistically implausible given his true talent level? Would a modern day .400 hitter not matter to you if he had an inflated babip and therefore his achievement involved substantial luck? If your answers to these questions are yes, I wonder why you like baseball. People honestly really do care about statistics that measure what actually happened. In fact, estimates of true talent are almost always used to project future performance, not to inflate or deflate previous performance.


        • Tom H. says:

          I don’t begrudge anyone the ability to enjoy the game in their own way. And yes, I would be excited for any of those records to be broken, but probably more for the history of it than for the pure statistical improbability.

          I’m not saying that descriptive statistics are wrong, or bad, or meaningless – just that there’s another way to look at these things that I think would be fun and interesting.

          (I did not expect to be having to write an argument like that, on FanGraphs of all places. Is this 2013 or 2003?)

        • YanksFanInBeantown says:

          Well, graphs are descriptive, are they not?

  20. Bryce says:

    This is a change for the better. Thanks.

  21. GWR says:

    This is great. Are you going to update the glossary pages so they will be consistent with the new replacement level? Also, when I was looking at the glossary pages, I noticed that you use a different replacement level for starters vs. relievers, and I was just wondering what the new replacement level for relievers is. On a related note, Dave Cameron might not want to use RA Dickey as the “walking example” of a replacement level pitcher in any updated explanations of replacement level. http://www.fangraphs.com/blogs/index.php/pitcher-win-values-explained-part-three/

  22. Stan says:

    Sean Smith clearly had a tough time in the minors. Poor guy never made it to the bigs.

  23. Big Jgke says:

    Replacement level is dead! Long live replacement level!

  24. brad says:

    I’d love an article on players whose relative rankings are most affected. How much of the difference in Jack Morris’s all-time rank is resolved? Old rankings 145b/75f vs…?

  25. Steve Jeltz says:

    Poor Jamie Moyer. 269 Wins just ain’t what it used to be.

  26. chuckb says:

    Great work! My gut reaction to David Appelman’s post was concern but, after reading this, I really like what you and the Seans have done.

  27. db says:

    Now maybe Dave can admit that FIP for pitchers is a dumber metric than ERA (or RA). Or maybe we can use batted ball profiles to do hitter war.

    • Baltar says:

      Your comment is sort of double-dumb. There are good reasons for using FIP rather than ERA, which I won’t go into.
      And I would love to have some sort of analysis of a player’s batted balls to use in place of whether they happened to fall in for hits or not. That day may not be far off.

    • YanksFanInBeantown says:

      It’s nice to have both. Even if I do prefer bWAR for pitchers.

  28. Joe Peta says:

    I am so happy that this is being done, and even if Caple’s article earlier this year is getting credit as the catalyst that started the discussion, I still have to call attention to this article that I wrote in December 2012, which, as cited in the piece, was inspired by Sam Miller at BP: http://tradingbases.squarespace.com/blog/2012/12/17/lets-level-the-replacement-level-playing-field.html

  29. Kyle says:

    WAR was so last year. I like that someone ripping on WAR had to point out to you nerds that your model was flawed. FG says themselves it’s a general stat or a big hammer or something. And hey, you can add or subtract a win depending on what you think of the defensive value. So why try to make it exact? Keep tinkering with it and it will never have any credibility. I’ll always know what a RBI is even if it doesn’t tell me anything.

    • hossenfefer says:

      I feel like you’re missing out on what’s going on here. Everyone knows what an RBI is. These sites are just trying to help us understand the game better. You don’t HAVE to use WAR when you’re having a conversation with your buddy about who the better baseball player is. I wouldn’t say “You know, Andrew McCutchen had a better year than Josh Willingham because McCutchen had 3 more WAR.” That’s a lame conversation. But that’s better than saying “Yeah, I think Willingham was better because he had 14 more RBI.”
      My point is, why would you ever talk about RBI when there are so many better things to bring up? All RBI are good for is for me to get excited when the Twins get one and to get bummed out when it happens against the Twins.
      Why would you NOT want a statistic to tell you something, to help you understand the game better?
      And, as far as tinkering with WAR, what’s wrong with improving something? I don’t understand why you’d slag on something that is trying to get better. That’s just weird.

    • Paul says:

      This is nearly perfect satire, except I believe the first sentence should be, “WAR *is* so last year.” Nice job.

  30. Mike Green says:

    Bravo. Whenever you compromise and arrive where Tom Tango is, it’s a good indication that you have done something right.

    For those too young to remember Alfredo Griffin, he was a co-winner of the AL Rookie of the Year award after hitting .287 with (a career-high) 40 walks and 21 steals (with 16 CS) as a 21 year old shortstop. If you look where he and Ozzie Smith were at age 21 and see where they ended up, you might think that smarts matter. You would be right.

  31. Kevin says:

    Can someone tell me how they calculate WAR for retired players? I thought they needed to use the data from the Sportvision technology to determine the distribution of balls for the UZR and UBR calculations?

  32. blindbuddysirraf says:

    I may be mistaken, but if the problem with Jack Morris’ WAR differential was only a baseline replacement level issue, wouldn’t his career WAR ranking be closer to the other site’s? This says he jumped up 70 pitchers because of a difference in baseline. Wouldn’t all 70 of those pitchers’ career WAR jump as well, if the baseline was the only problem?

    • Joe Peta says:

      Remember “bbs”, WAR is a cumulative stat. So if the baseline is moved (say lowered) everyone’s WAR does increase but a player who has played more seasons than others will have his career WAR jump more.
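
To make the cumulative point concrete, here is a minimal sketch that assumes a baseline change is worth a fixed number of wins per 200 innings pitched. The 0.5 figure, the innings totals, and the function name are all hypothetical and are not taken from either site's actual calculation:

```python
def career_war_shift(innings: float, shift_per_200_ip: float) -> float:
    """Career WAR change from moving the baseline, scaled by playing time.

    Assumes the baseline change is worth a constant number of wins per
    200 innings, which is an illustrative simplification.
    """
    return innings / 200.0 * shift_per_200_ip

if __name__ == "__main__":
    # Hypothetical baseline change: 0.5 wins per 200 innings.
    long_career = career_war_shift(3800, 0.5)   # long-career pitcher
    short_career = career_war_shift(1500, 0.5)  # shorter-career pitcher
    print(f"long career:  +{long_career:.2f} WAR")
    print(f"short career: +{short_career:.2f} WAR")
```

The same per-inning change moves the long career by 9.5 wins and the short one by 3.75, so a pitcher with a very long career can pass many pitchers who threw fewer innings even though every total went up.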

  33. Clave says:

    The primary takeaway for me was that Tom Tango is pretty much always right.

  34. Ray A. says:

    I believe this marks the first time that anything Jim Caple wrote contributed positively to the game of baseball.

  35. adohaj says:

    So players like Rickey Henderson and Pete Rose lost 14-19 WAR, and a player like Joe DiMaggio lost about 7-8 WAR? And the inverse for Bbref?

  36. Bip says:

    I always thought fangraphs WAR was a little inflated compared to baseball-reference. Or of course that B-ref was a little deflated compared to fangraphs, as neither was clearly right or wrong about replacement level.

  37. Forrest Gumption says:

    Man, Alfredo Griffin was pretty bad for a long time.

  38. Choo says:

    The question now is not if, but when the Unification of Replacement Level will be commemorated on a US postal stamp or massive oil painting displayed at The Metropolitan Museum of Art. This is some big-time forefathers shit right here.

  39. Neil says:

    I like this, but it’s freaking me out that I woke up today and everyone’s WAR is slightly different.

  40. Joshy says:

    Cool, that explains Ricky Romero’s huge difference for his 2011 season: FanGraphs 2.4, Baseball Reference 6.3! Are there any larger differences?

    • Patrick says:

      But it doesn’t, because his B-R WAR went up and his FanGraphs WAR went down from this change. This difference is due to the different calculations.

  41. So to summarize Mike Trout > combined Astros roster.

    According to Cot’s, the Astros 2012 opening payroll was around $60.8M. Given that Trout was 10.2/7.3 = 1.4 times better than the Astros, Boras should ask for a contract with an AAV of $60.8M * 1.4 = $85.1M. I’m thinking 10/850 is a good starting point.

  42. rubesandbabes says:

    Peter repays Paul?

    One underlying problem with all the silly Stat Separatism is that the very best few understanderers of all this are getting paid for it,

    and keeping quiet.

    But okay, at least some good news here.

  43. TheSinators says:

    Has this change already taken place? Or does Randy Johnson really have a career WAR of 110?

  44. dolbear65 says:

    There is still a huge difference between fangraphs WAR and Baseball Reference WAR in some cases. Why?
