Unifying Replacement Level

On Christmas Eve of 2008, David Appelman gave the world a present – “win values” on the pages of FanGraphs. It wasn’t labeled WAR for a little while longer, though it was an implementation of the model Tom Tango laid out at The Book Blog a few months prior. Over these last four years, the model has become quite popular, and even those who are not fans of analytics know what WAR stands for. In 2010, Baseball-Reference added it to their collection of statistics. Because WAR is essentially a model of player value, there are decisions that have to be made about the way it is constructed that don’t have an obviously correct answer. In places where we had made one decision, Sean Forman (and Sean Smith, who assisted with their original implementation) made some other decisions, and the calculations differ in some significant ways.

We know that this is a source of frustration for some folks, having two sites both publicly display different calculations for a statistic of the same name. Often, the differences between the two have been used to discredit the entire model. For instance, Jim Caple wrote this on ESPN.com a few months back:

Actually, we know it isn’t always accurate because depending on your source — FanGraphs or Baseball-reference.com — you can get wildly different WAR scores… For example:

Does (Jack) Morris, in fact, belong in the Hall of Fame? No, he doesn’t, according to baseball-reference.com, which gives him a WAR score of 39.3, tied for 145th all time among pitchers. Maybe he does, according to FanGraphs, which gives him a 56.9 WAR, 75th all time.

When Caple wrote it, I wasn’t exactly sure why Morris’ value differed so much, but since we measure pitching in very different ways, I assumed that the 17.6 win gap was due to some differences between Morris’ FIP and his runs allowed. But, then, I looked it up, and Morris’ career ERA (3.90) was almost an exact match for his FIP (3.94). Adjusted for park, Morris’ career FIP- was 97, while his ratio of RA9 to league average on Baseball-Reference is 96. Even with very different inputs, both models came to the same conclusion about Morris – he was a slightly above average pitcher who had a very long career. So, why did we give him credit for an additional 17.6 wins?

The answer, quite simply, lies with replacement level. Our model used a lower baseline than Baseball-Reference did, so the same performance would result in a higher WAR in our model than in theirs. Over very long careers — like Morris’, for instance, or many of the old time pitchers who threw forever — this could really begin to add up, and give the appearance of large disagreements when the two systems didn’t actually see things all that differently. In the case of guys with substantial careers, many of the large discrepancies were simply driven by the fact that the two sites had a different definition of replacement level.

After reading Caple’s article, David Appelman and I began discussing the idea of reaching out to Sean Forman and seeing if he was interested in agreeing to a unified replacement level. Before we could actually even send that email, Sean reached out to us with the exact same idea. And so, today, we’re pleased to announce that Baseball-Reference and FanGraphs have adopted that unified replacement level, allowing our two models to now measure players on the same scale.

As David noted a few minutes ago, this new unified replacement level is now set at 1,000 WAR per 2,430 Major League games, which is the number of wins available in a 162 game season played by 30 teams. Or, an easier way to put it is that our new replacement level is now equal to a .294 winning percentage, which works out to 47.7 wins over a full season. Conveniently, this number is almost exactly halfway in between our previous replacement level (.265) and Baseball-Reference’s previous replacement level (.320), though the number wasn’t chosen solely as an equal compromise.
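The arithmetic behind these figures can be checked with a short Python sketch. The function name and its defaults are illustrative only and are not taken from either site's implementation; the sketch assumes a 30-team league playing 162 games each, so that 2,430 wins are available, and subtracts the league-wide WAR pool from those wins to get the replacement level winning percentage.

```python
TEAMS = 30
GAMES_PER_TEAM = 162


def replacement_level(total_war, teams=TEAMS, games_per_team=GAMES_PER_TEAM):
    """Return (winning percentage, wins per team) for a replacement level club.

    total_war is the league-wide pool of wins above replacement per season.
    """
    team_games = teams * games_per_team          # 4,860 team-games
    league_wins = team_games // 2                # 2,430 wins available
    replacement_wins = league_wins - total_war   # wins left for replacement level
    win_pct = replacement_wins / team_games
    wins_per_team = win_pct * games_per_team
    return win_pct, wins_per_team


# Unified baseline: 1,000 WAR per 2,430 games.
win_pct, wins = replacement_level(1000)
print(f"unified: {win_pct:.3f} winning percentage, {wins:.1f} wins per team")

# Midpoint of the two previous baselines quoted in the text (.265 and .320).
midpoint = (0.265 + 0.320) / 2
print(f"midpoint of old baselines: {midpoint:.4f}")
```

Passing 1009 instead of 1000 gives the baseline from Tango's original post. Dividing the 1,000 WAR pool by 30 teams gives roughly 33.3 WAR per team, which is the gap between a replacement level club and an 81-win one.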

In Tango’s original methodology post back in 2008, the model he laid out distributed 1,009 wins above replacement across the league each season, which equates to a .292 winning percentage for a replacement level team, so this is in essence a return to WAR’s roots. In that post, Tango notes:

“Replacement is defined very specifically for my purposes: it’s the talent level for which you would pay the minimum salary on the open market, or for which you can obtain at minimal cost in a trade.”

There are a variety of ways you can measure what kind of expected performance you might get from a replacement level player. A few months ago, I looked at the performance of position players who were acquired via minor league contract or waiver claim this winter, and over the last two seasons, those 24 players had accumulated almost exactly zero WAR in over 10,000 plate appearances. So, that suggests that the baseline has always been in the right neighborhood, at least.

That’s not the only way to figure out where replacement level should be, however. We can also look at the worst performances of players who have long Major League careers, and see what the minimum level of production teams have required in order to keep a player in the league for an extended number of years rather than simply swapping them out for someone else. Major League teams don’t always evaluate talent perfectly, but if they were continually employing players that were below our established replacement level for 10 to 15 years, it would be a pretty good sign that our replacement level was too high, and that they couldn’t simply replace these guys with someone better with minimal effort.

That’s not what we see, however. If you use .294 as the replacement level, 627 of the 628 players with at least 6,000 Major League plate appearances — that is, the equivalent of 10 full seasons of regular playing time — have a career WAR north of 0.0. The only player who falls below replacement level with this baseline is Alfredo Griffin, coming in at -1.0 WAR in 7,331 plate appearances, which works out to -0.08 WAR per full season. For all intents and purposes, that’s zero.

You can calculate replacement level a number of different ways, but in the end, it always leads back to a number in this vicinity. Baseball-Reference arrived at a number a little higher than what Tango had used, while we came up with one a little lower. Because they were at opposite ends of the defensible spectrum, the different baselines gave a false sense of difference in the actual calculations. Now, with an agreed upon replacement level, those differences that are solely due to scale will go away.

The net effect of this change is that players will get a little less WAR per season in our method (and a little more in B-R’s) than they used to. On an individual season level, you’re barely going to notice the shifts. For instance, Mike Trout’s career +10.8 WAR in 774 plate appearances under our old calculation will become +10.7 WAR with the new changes. However, at the completed career level, you’re going to see some bigger drops. Luis Aparicio, with his 11,230 career plate appearances, drops 14.2 WAR, going from +63.5 down to +49.3. Likewise, Hank Aaron, Brooks Robinson, and Carl Yastrzemski all lose 14 WAR off their career totals. Long career players take the largest hit, as you would expect.

The higher baseline brings our scale down slightly, but we think that change is worth making, as a unified replacement level will allow for comparisons of our apples versus their apples and will eliminate needless confusion. These changes weren’t made lightly, and we know that there is always some resistance to any sort of change, but we hope that you see the unification of replacement level between the two sites as a positive overall.

While there will never be one single agreed upon WAR calculation — I’d call that a feature and not a bug, but that’s another post — the common baseline will give us a better opportunity to explore where the real differences are, rather than being tricked into seeing big gaps where none actually exist.

So, that’s the short version of the story behind this change. We’ll have more on this going forward, including a post coming later this afternoon on why we need replacement level to begin with, but for now, we hope you guys see this as a step forward for WAR as a metric.




Dave is the Managing Editor of FanGraphs.


Tomcat
Guest
Tomcat
3 years 1 month ago

Does this improve the case of someone like say Larry Walker?

Bookbook
Guest
Bookbook
3 years 1 month ago

Probably not much, since the folks keeping him out aren’t relying on fWAR.

Hamba
Member
Hamba
3 years 1 month ago

It still may improve the case, while not improving his chances.

Hamba
Member
Hamba
3 years 1 month ago

I think I need to change my name…

Frank's Wild Years
Guest
Frank's Wild Years
3 years 1 month ago

better?

tomplatypus
Guest
tomplatypus
3 years 1 month ago

how about aaron sele?

Rays' 7th Infielder
Guest
3 years 1 month ago

If replacement level is set at approx 48 wins a season, and the Astros won 55 games last year…

Oh my.

At least they’d make for a semi-competitive AAA team.

Samuel
Member
Samuel
3 years 1 month ago

I feel like this change makes being a baseball nerd a lot easier.

Samuel
Member
Samuel
3 years 1 month ago

And when can we expect these changes to be made on the website(s)?

Blue
Guest
Blue
3 years 1 month ago

It’s definitely a step forward–but the statistic still has a whole lot of false precision which mere standardization of the baseline won’t fix. Mixing the poorly measured defense value (to say nothing of the baserunning hack) and giving players credit for positional adjustments that are, at best, heroic assumptions, in with the well-described offense variables leads to a measure that is full of a whole lot of mush.

mickeyg13
Member
3 years 1 month ago

The false precision is a problem with the *users* of the statistic, not the statistic itself. It’s not the fault of WAR if some writer mistakenly draws a strong conclusion from a 4.2 WAR over a 4.1.

Tomcat
Guest
Tomcat
3 years 1 month ago

That is like saying that since a hammer can’t cut plywood it isn’t a good tool, or that since you saw a guy using a hammer to put drywall screws in, there is a flaw with hammers. WAR has flaws and should be used as a conversation starter, not ender.

Cguudgyrdycjvhkj
Guest
Cguudgyrdycjvhkj
3 years 1 month ago

You are saying WAR is the equivalent of saying, “pretty nice weather today,” which is not much different than Blue’s comment.

A hammer works perfectly to drive nails or claw them out; what does WAR do (besides signal that you use imprecise aggregates to start conversations)?

Eric R
Guest
Eric R
3 years 1 month ago

What is your preferred method for evaluating MLB talent? Using that, let’s build what would be considered the best possible roster from MLB talent [let’s say based on 2012 stats].

Then let’s build another team using fWAR.

I’m sure the two teams will have a lot in common [unless the stat you choose is (H-HR+SB)/AB or some other bizarre metric]. If the two teams end up very similar, then it would certainly seem that WAR isn’t quite as terrible as you seem to think. Comparing the players who are not common would at least be very interesting.

Frank's Wild Years
Guest
Frank's Wild Years
3 years 1 month ago

It establishes a baseline to measure player value; you can choose to trust things like park factors and defensive metrics more or less than the model does.

That Guy
Guest
That Guy
3 years 1 month ago

The hammer would be a terrible way to install drywall screws, and you certainly wouldn’t want to ‘claw them out’ once installed.

TKDC
Guest
TKDC
3 years 1 month ago

OPS has been cited for years (decades?), but what the hell does it mean? Should we have stuck with batting average until something like wOBA came along? Is wOBA good enough? Should we revert back to batting average until we are 100% positive about the linear weights in wOBA?

Baltar
Guest
Baltar
3 years 1 month ago

It’s easy to point out flaws in WAR, but what’s the point? We know it’s a rough indication of a player’s value, but it’s a pretty good one and it’s objective.
If you have a better stat to do that job, please reveal it to us. If you have proven ways to improve WAR, please reveal them to us.
Otherwise, shut up.

Price enforcer
Guest
Price enforcer
3 years 1 month ago

“Shut up” is an awesome suggestion to someone who wrote something on the internet.

Oh, Beepy
Guest
Oh, Beepy
3 years 1 month ago

They could be instructing you to close up your laptop and go watch some baseball.

Bill but not Ted
Member
3 years 1 month ago

Again it is important to understand the flaws in the way we measure things.

commenter #1
Guest
commenter #1
3 years 1 month ago

We’ll be eagerly awaiting your more correct system’s publication on your website.

Jeff T
Guest
Jeff T
3 years 1 month ago

Glad to see this open collaboration between these two giants in the Sabermetric field. . . .

Bob
Guest
Bob
3 years 1 month ago

Dave – while you’re collaborating with Sean, it might be helpful to list out the other differences between the two versions of WAR, maybe on the glossary page? I know FIP v. ERA is probably the biggest difference.

Sky
Guest
3 years 1 month ago

B-Ref has a great summary of differences between many WAR systems:

http://www.baseball-reference.com/about/war_explained_comparison.shtml

zenbitz
Guest
zenbitz
3 years 1 month ago

I posted on BBTF as well, but I think this would be a good opportunity for both BBREF and FG to report multiple WARs based on the different partitioning of credit between fielders and pitchers:

FIP-WAR (fWAR) – batted balls are 100% fielder
RA-WAR – batted balls are 100% pitcher
ERA-WAR – batted balls are 98% pitcher (mlb fielding % is about .98)
bWAR – whatever partitioning bbref uses to get the “middling” number.

This would show that, in fact, the two sites are using the exact same numbers for RV AND would illustrate the kind of assumptions that go into creating a WAR stat and demystify the stat a bit.

David Appelman
Admin
Member
3 years 1 month ago

We do have RA9-Wins (which is RA-WAR) on the site and have for a while. We also have BIP-Wins (portion of wins due to balls in play) and LOB-Wins (portion of wins because of stranded runners & misc other stuff), and finally FDP-Wins, which is the difference between RA9-Wins and WAR.

Hurtlockertwo
Guest
Hurtlockertwo
3 years 1 month ago

Are the total WAR for players on both sites now updated??

Dave S
Guest
Dave S
3 years 1 month ago

bravo.

James
Guest
James
3 years 1 month ago

So are you guys using the same defensive metrics now as well?

Anon21
Guest
Anon21
3 years 1 month ago

So are you guys using the same defensive metrics now as well?

No. That’s one of the things that Dave’s getting at when he says “there will never be one single agreed upon WAR calculation” and “the common baseline will give us a better opportunity to explore where the real differences are.” Nothing about the way either site calculates WAR is changing, they’re just now starting from a common baseline.

Manifunk
Guest
Manifunk
3 years 1 month ago

Oh good, this makes the shallow “just add up the WARs!” style of analysis which has sadly become the norm around here that much easier

Anon21
Guest
Anon21
3 years 1 month ago

Jesus, you’re really worked up about this nonexistent problem, huh? Try reading some of the 98% of Fangraphs articles that aren’t positional power rankings, you dumb whiner.

Hamba
Member
Hamba
3 years 1 month ago

Or try a different site. All they did was create a common baseline for a statistic to make that particular aspect of the site better. If you don’t like it then you know what you can do.

Brian
Guest
Brian
3 years 1 month ago

I’m a bit confused on the math. How are we getting a .294 win percentage on 1000 WAR per 2430 available wins? I must be missing something.

Brian
Guest
Brian
3 years 1 month ago

never mind. I got it now.

FJ
Guest
FJ
3 years 1 month ago

I agree it’s not very clear about the explanation.

2430 is the total number of wins among all teams in a full season, which by definition makes the league .500 (2430-2430).

1000 WAR is what’s needed to get to that hypothetical .500 season. So, replacement team level is at 1430 (2430 – 1000) wins.

1430/4860 = .294

Hence a replacement level team has a .294 win percentage.

Urban Shocker
Guest
Urban Shocker
3 years 1 month ago

Just to make sure I have this right: so a team needs 33.3 WAR in the aggregate to reach .500? (1000 WAR divided by 30 teams).

siggian
Guest
siggian
3 years 1 month ago

Well, 33.3 WAR + 47.7 replacement level wins = 81

So, yeah, it seems that way.

Urban Shocker
Guest
Urban Shocker
3 years 1 month ago

Fancy. Thanks for the explanation Siggian, that clears up a lot.

That’s not quite what you see on the positional power rankings though, where it looks like an aggregate 38-39 WAR is what it takes to get to 80 wins. any guesses?

Baltar
Guest
Baltar
3 years 1 month ago

I do have guesses on that Urban.
To begin with, that referred to the previous version of WAR, with a lower replacement level.
I’m guessing the remaining adjustment had to do with injuries and other unknowns, which FanGraphs rightly did not attempt to predict in its rankings.
The total number of team wins had to come out correct, so an adjustment was made.

chasfh
Guest
chasfh
3 years 1 month ago

Why is replacement level set at 1,000 WAR per 2,430 games? What is the genesis of that number?

Pinstripe Wizard
Member
3 years 1 month ago

I would assume it is because it is very close to Tom Tango’s original number of 1009 wins and it is a round number. Who doesn’t like round numbers?

Darren
Guest
Darren
3 years 1 month ago

BRAVO! I have been waiting for this for a long time, and the decision to make it a clean 1000 will make it simpler for the casual reader while still keeping it accurate and reasonable. Thanks David, Dave and Sean.

I heard that BPro was also considering unifying their replacement level with your sites. Is that happening?

Clutch Narrative
Member
Clutch Narrative
3 years 1 month ago

Baseball Prospectus differs not only on replacement level, but also on runs per win.

tz
Guest
tz
3 years 1 month ago

I guess we now can use the “Griffin line” as the career equivalent of the “Mendoza line”

tz
Guest
tz
3 years 1 month ago

and btw, love the change here and at baseball reference!

Baltar
Guest
Baltar
3 years 1 month ago

LOL!

Kiss my Go Nats
Guest
Kiss my Go Nats
3 years 1 month ago

poor Alfredo, he will forever be known as the player behind the Alfred Line.

MarinersFan000
Member
MarinersFan000
3 years 1 month ago

Just out of curiosity does anyone know what version of WAR espn uses on their site? Just wondering if the numbers there would be making the adjustment as well or if those numbers would still reflect a different baseline.

momomoses7
Member
momomoses7
3 years 1 month ago

I believe ESPN uses bWAR

Clutch Narrative
Member
Clutch Narrative
3 years 1 month ago

ESPN uses Baseball Reference metrics, including WAR.

JeffD
Guest
JeffD
3 years 1 month ago

Playing the d’s advocate here: isn’t this simply a case of the two WAR peeps getting together so that there can’t be any more, “But the two WAR peeps can’t even come up with the same WAR?” or something similar?

agam22
Guest
agam22
3 years 1 month ago

But there are still differences in the calculations that, as Dave says, should be considered a feature as they can tell you different things about different players. This is just putting both stats on the same scale to make comparisons easier

chuckb
Guest
chuckb
3 years 1 month ago

They’re only getting together on the value of replacement level. They’re not creating 1 unified WAR that will become THE WAR calculation. That was my initial concern but Dave’s explanation here alleviated that.

jfree
Member
jfree
3 years 1 month ago

JAW JAW is always better than WAR WAR. Now that there’s only one WAR, the saber rattlers can focus on JAW JAW.

jfree
Member
jfree
3 years 1 month ago

Yeesh. Meant to say — Now that the WAR War’s over, the saber rattlers can focus on JAW JAW.

Caveman Jones
Guest
Caveman Jones
3 years 1 month ago

Is there any way we can get a list of the players who lost the most WAR due to the change in baseline?

Bryce
Member
Bryce
3 years 1 month ago

Just sort by career PA or IP.

kdm628496
Member
kdm628496
3 years 1 month ago

does this mean that the cohort of replacement-level players you investigated will now produce a larger negative WAR?

Baltar
Guest
Baltar
3 years 1 month ago

Yes, they will.

Jason
Guest
Jason
3 years 1 month ago

Hey Dave, can you go back and fix all the articles written in the last five years? Thanks.

gouis
Guest
gouis
3 years 1 month ago

Hey Jason,

Nothing changes because everyone goes down by the same amount. So in reality nothing changes except the exact numbers.

chuckb
Guest
chuckb
3 years 1 month ago

Everyone doesn’t change by the same amount. The replacement baseline changes for everyone but that doesn’t affect everyone’s WAR calculation equally.

Baltar
Guest
Baltar
3 years 1 month ago

LOL!
That thought occurred to me, not just humorously but seriously. I was thinking of the recent rankings series and whether they would correct it, then realized that if they did that, why not everything? Then the enormity of the task knocked the silly out of me.

Tom H.
Guest
Tom H.
3 years 1 month ago

I think it would be great to add (statistical) uncertainties into baseball stats. For example, if a player had a .400 OBP in 600 plate appearances, the rough statistical error would be 1/sqrt(N) ~ 0.041, so you could quote his OBP as .400 +/- 0.041. The same could be done for any rate stat (or counting stat, if you’re careful). Propagating these errors through the WAR calculation could clear up some of these issues.

For instance, are we sure that a player has exactly 2.1 WAR in a season, or is it more like 2.1 +/- 0.3 WAR? If one wanted to go further (beyond just statistical uncertainties), you could use the varying WAR definitions on the web (fWAR, rWAR, etc.) as measures of the systematic uncertainties in the WAR calculation.

This seems like relatively simple statistical analysis to me – maybe it’s been suggested before?

Blue
Guest
Blue
3 years 1 month ago

OBP in a season as NO error because it is a full and complete description of the events of that season. There is no need for error bands around it because there is no statistical uncertainty to describe.

Blue
Guest
Blue
3 years 1 month ago

“has” no error

X
Guest
X
3 years 1 month ago

I think you’ve made a fundamental error of statistics. The sample OBP may well be known exactly, but we are interested in the underlying “true” OBP, which is known imprecisely due to the limited sample size from which we derive the sample OBP. Thus, the “true” OBP has an uncertainty, which we can estimate using Poisson statistics, as pointed out by the OP.

X
Guest
X
3 years 1 month ago

Err, I should say our estimate of the true OBP has an uncertainty, not the true OBP itself.

Anon21
Guest
Anon21
3 years 1 month ago

we are interested in the underlying “true” OBP

No, we are not. Not when constructing a statistic like WAR, which is simply supposed to serve as a descriptive record of what happened.

Blue
Guest
Blue
3 years 1 month ago

A population is not a sample, X. When you describe populations, error terms are not appropriate.

Bryce
Member
Bryce
3 years 1 month ago

I like this idea, but I don’t think it’s as trivial as you imply. What does it mean to put error bars on the number of doubles a player hit? He hit them; error is zero. You could put error bars on the value of a double, or on your prediction of the talent underlying the number of doubles, but those have very different meanings.

Anon21
Guest
Anon21
3 years 1 month ago

Chalk another one up to people who don’t understand the difference between description and prediction, I guess.

Tom H.
Guest
Tom H.
3 years 1 month ago

Hitting singles (or doubles, or triples, or homeruns) is essentially a Poisson process – discrete, countable events which happen at random intervals, but at some average rate. Thus, when we try to estimate, for example, HR/FB rate, we’re really trying to estimate the Poisson parameter λ of this process. Even if the observed HR/FB rate has no error (i.e., there’s no chance of misclassification), the estimation of the true HR/FB rate certainly has statistical uncertainties.

Blue
Guest
Blue
3 years 1 month ago

Again, the rate has no error and no statistical uncertainty because it is a descriptive statistic that is a full and complete accounting of the entire population of events.

Anon21
Guest
Anon21
3 years 1 month ago

Well, wait. Why would you want error bars on OBP? So far as I’m aware, there is virtually no measurement error associated with OBP, at least when it comes to people who played in the modern era of baseball.

Naveed
Guest
Naveed
3 years 1 month ago

There would be no reason to have an error bar for OBP insofar as it’s a descriptive statistic, but when using it to predict future OBP, it might be useful to have error bars in order to make it clear how much predictive value the sample has.

Anon21
Guest
Anon21
3 years 1 month ago

That is never what WAR has been, from Tango’s earliest conceptualization to any of the implementations. You’re positing some different stat.

Baltar
Guest
Baltar
3 years 1 month ago

You may be right, but that would be extremely cumbersome. You wouldn’t really want to read an article that showed those extra numbers on every stat that is being used predictively.

Tom H.
Guest
Tom H.
3 years 1 month ago

OBP as a measurement of what happened certainly has no (significant) error; however, if we’re trying to get an estimator of his true OBP, error bars are certainly appropriate. The 1/sqrt(N) error bars are approximately correct for large sample sizes, but binomial error bars are most appropriate for a rate stat.

For example: we want to know the true OBP talent level of a certain player. He has had 4 plate appearances, reaching in two of them. In reality, his OBP has been exactly .500, but we know that this is a flawed measure of his true talent level. The 68% (1 σ) confidence interval for his true talent level is (.186, .814). This means we’re 68% confident that his true OBP lies in that interval, based only on the knowledge we have (his 4 PAs) – there’s just not much information. If he, however, had 400 plate appearances and reached in 200 of them, our 68% CI would be (.474, .526) – we would be much more confident.

In standard baseball notation, both players have a .500 OBP, but we obviously believe the second one (+/- .026 uncertainty) much more than the first one (+/- .314 uncertainty). This helps to quantify this. It’s only meaningful as a predictor, though, not as a measurement. (You also have to assume that his true talent level is fixed, and not varying over time, which is probably roughly true for most players, at least over the course of a season.)
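The two intervals quoted above are consistent with an exact binomial (Clopper-Pearson) interval at 68% confidence. The comment does not name its method, so that choice is an assumption, and the function names below are hypothetical. A minimal standard-library Python sketch of that calculation:

```python
from math import comb


def binom_tail_ge(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def bisect_root(f, lo=0.0, hi=1.0, iters=60):
    """Find p in [lo, hi] where an increasing function f crosses zero."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, conf=0.68):
    """Exact binomial interval for k successes in n trials (assumed method)."""
    tail = (1 - conf) / 2
    # Lower bound: the p at which k or more successes has probability `tail`.
    lower = 0.0 if k == 0 else bisect_root(
        lambda p: binom_tail_ge(k, n, p) - tail
    )
    # Upper bound: the p at which k or fewer successes has probability `tail`,
    # i.e. P(X >= k + 1) = 1 - tail.
    upper = 1.0 if k == n else bisect_root(
        lambda p: binom_tail_ge(k + 1, n, p) - (1 - tail)
    )
    return lower, upper


for reached, pa in [(2, 4), (200, 400)]:
    low, high = clopper_pearson(reached, pa)
    print(f"{reached}/{pa}: ({low:.3f}, {high:.3f})")
```

Both players have an observed .500 OBP, but the interval narrows sharply as plate appearances grow, which is the point the example makes.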

Blue
Guest
Blue
3 years 1 month ago

You’re mixing a couple of very distinct concepts. His “true OBP” is no different than measured OBP–what occurred in the season, assuming no measurement errors. That’s very different from creating an estimate of “true talent OBP” that would be expected over a large number of PAs.

Tom H.
Guest
Tom H.
3 years 1 month ago

I guess we fundamentally disagree then – I indeed believe that true OBP is a distinct quantity from observed OBP. For one thing, true OBP must be able to take any value on the continuum of 0.000 to 1.000, but observed OBP can only take a certain number of discrete values. For example, if a player gets 700 PA in a season, his OBP can only take 701 discrete values – 0/700, 1/700, 2/700, …, 699/700, or 700/700. A player’s true OBP can be defined as the limit of his measured OBP as we approach an infinite number of observations. The fact that we have only a finite number of observations to estimate this true OBP is the origin of the statistical uncertainty we’re trying to measure.

To summarize: true talent level OBP (or whatever stat: SLG, BA, etc.) is a quantity we can only estimate but never measure with perfect precision; measured OBP is a well-defined, exactly measured quantity, but it only describes a finite number of observed events, not the nature of the underlying distribution which generated those events.

Blue
Guest
Blue
3 years 1 month ago

How is a descriptive statistic of a population not “true”? It is an exact description of what occurred!

Anon21
Guest
Anon21
3 years 1 month ago

Tom: What you are talking about is just not OBP. “True talent” is a useful concept in baseball, but mostly it’s useful for predicting future performance. When we look at historical OBP, all we want to know is what it was; the question of whether it was composed of a bunch of dying quails or hard line drives is just totally irrelevant to measuring its impact on the outcome of the games that have been played.

Tom H.
Guest
Tom H.
3 years 1 month ago

I guess this whole subthread is caught up on whether we want to describe what happened or to estimate the true talent of a player. I come from a more statistical background, so I prefer the latter.

Here’s an example: if a rookie comes up and gets 3 hits in 10 at-bats, he is hitting .300 – that’s the descriptive rate, and it has no error bars. However, it will take a lot more than 10 at-bats before I’m ready to say that he’s a .300 hitter – the error bars on his true talent level (for batting average) are too big for me to confidently say that. They’re complementary interpretations, not competing or exclusive ones.

We do, subconsciously, apply error bars to most stats we see, however. We cut the triple-slash stats off at 3 digits because that’s roughly the level at which those rate stats fluctuate during a full season. We cut WAR off after one decimal place because we realize there is estimation in the calculation, and it would be disingenuous to say “Mike Trout had 10.03857 WAR in 2012” because we just don’t know it that precisely. I’m just proposing that these uncertainties be quantified a little better.

Blue
Guest
Blue
3 years 1 month ago

You’re making a huge assumption that some of us don’t “come from a more statistical background.”

Tom H.
Guest
Tom H.
3 years 1 month ago

I apologize – I didn’t mean any offense. I only meant that in my real life, I’m a scientist who performs statistical analysis for a living, so I have perhaps a more “scientific” or rigorous view of what statistical analysis actually means.

Blue
Guest
Blue
3 years 1 month ago

And in my real life I have a copy of SAS on my work machine and many, many statistical programs I’ve written to tease out information from huge data sets of various populations.

Tom H.
Guest
Tom H.
3 years 1 month ago

We also implicitly include uncertainties by requiring a minimum number of plate appearances in season-long awards like batting titles. We require a player to have more than about 500 plate appearances to qualify because we know that, for a small sample size, statistical fluctuations are much more important and can inflate rate statistics beyond sustainable levels (which is roughly what I mean by “true” talent levels).

TKDC
Guest
TKDC
3 years 1 month ago

Tom, do you not care about J D’s 56 game hitting streak because it was statistically implausible given his true talent level? Would a modern day .400 hitter not matter to you if he had an inflated babip and therefore his achievement involved substantial luck? If your answers to these questions are yes, I wonder why you like baseball. People honestly really do care about statistics that measure what actually happened. In fact, estimates of true talent are almost always used to project future performance, not to inflate or deflate previous performance.

Tom H.
Guest
3 years 1 month ago

I don’t begrudge anyone the ability to enjoy the game in their own way. And yes, I would be excited for any of those records to be broken, but probably more for the history of it than for the pure statistical improbability.

I’m not saying that descriptive statistics are wrong, or bad, or meaningless – just that there’s another way to look at these things that I think would be fun and interesting.

(I did not expect to be having to write an argument like that, on FanGraphs of all places. Is this 2013 or 2003?)

YanksFanInBeantown
Guest
3 years 1 month ago

Well, graphs are descriptive, are they not?

Bryce
Member
3 years 1 month ago

This is a change for the better. Thanks.

George Resor
Member
3 years 1 month ago

This is great. Are you going to update the glossary pages so they will be consistent with the new replacement level? Also, when I was looking at the glossary pages I noticed that you use a different replacement level for starters vs. relievers, and I was just wondering what the new replacement level for relievers is. On a related note, Dave Cameron might not want to use RA Dickey as the “walking example” of a replacement level pitcher in any updated explanations of replacement level. http://www.fangraphs.com/blogs/index.php/pitcher-win-values-explained-part-three/

Blue
Guest
3 years 1 month ago

That is comedy gold!

Stan
Guest
3 years 1 month ago

Sean Smith clearly had a tough time in the minors. Poor guy never made it to the bigs.

Big Jgke
Member
3 years 1 month ago

Replacement level is dead! Long live replacement level!

brad
Guest
3 years 1 month ago

I’d love an article on players whose relative rankings are most affected. How much of the difference in Jack Morris’s all-time rank is resolved? Old rankings 145b/75f vs…?

brad
Guest
3 years 1 month ago

81, now, here. Unless he jumps up a lot over at b-r the mystery remains.

Steve Jeltz
Guest
3 years 1 month ago

Poor Jamie Moyer. 269 Wins just ain’t what it used to be.

chuckb
Guest
3 years 1 month ago

Great work! My gut reaction to David Appelman’s post was concern but, after reading this, I really like what you and the Seans have done.

db
Guest
3 years 1 month ago

Now maybe Dave can admit that FIP for pitchers is a dumber metric than ERA (or RA). Or maybe we can use batted ball profiles to do hitter WAR.

Baltar
Guest
3 years 1 month ago

Your comment is sort of double-dumb. There are good reasons for using FIP rather than ERA, which I won’t go into.
And I would love to have some sort of analysis of a player’s batted balls to use in place of whether they happened to fall in for hits or not. That day may not be far off.

YanksFanInBeantown
Guest
3 years 1 month ago

It’s nice to have both. Even if I do prefer bWAR for pitchers.

Joe Peta
Member
3 years 1 month ago

I am so happy that this is being done, and even if Caple’s article earlier this year is getting credit for the catalyst and starting the discussion, I still have to call attention to this article that I wrote in December, 2012 — which as cited in the piece, was inspired by Sam Miller at BP: http://tradingbases.squarespace.com/blog/2012/12/17/lets-level-the-replacement-level-playing-field.html

Kyle
Guest
3 years 1 month ago

WAR was so last year. I like that someone ripping on WAR had to point out to you nerds that your model was flawed. FG says themselves it’s a general stat or a big hammer or something. And hey, you can add or subtract a win depending on what you think of the defensive value. So why try to make it exact? Keep tinkering with it and it will never have any credibility. I’ll always know what a RBI is even if it doesn’t tell me anything.

hossenfefer
Member
3 years 1 month ago

I feel like you’re missing out on what’s going on here. Everyone knows what an RBI is. These sites are just trying to help us understand the game better. You don’t HAVE to use WAR when you’re having a conversation with your buddy about who the better baseball player is. I wouldn’t say “You know, Andrew McCutchen had a better year than Josh Willingham because McCutchen had 3 more WAR.” That’s a lame conversation. But that’s better than saying “Yeah, I think Willingham was better because he had 14 more RBI.”
My point is, why would you ever talk about RBI when there are so many better things to bring up? All RBI are good for is for me to get excited when the Twins get one and to get bummed out when it happens against the Twins.
Why would you NOT want a statistic to tell you something, to help you understand the game better?
And, as far as tinkering with WAR. What’s wrong with improving something? I don’t understand why you’d slag on something that is trying to get better? That’s just weird.

Paul
Guest
3 years 1 month ago

This is nearly perfect satire, except I believe the first sentence should be, “WAR *is* so last year.” Nice job.

Mike Green
Guest
3 years 1 month ago

Bravo. Whenever you compromise and arrive where Tom Tango is, it’s a good indication that you have done something right.

For those too young to remember Alfredo Griffin, he was a co-winner of the AL Rookie of the Year award after hitting .287 with (a career-high) 40 walks and 21 steals (with 16 CS) as a 21 year old shortstop. If you look where he and Ozzie Smith were at age 21 and see where they ended up, you might think that smarts matter. You would be right.

Kevin
Guest
3 years 1 month ago

Can someone tell me how they calculate WAR for retired players? I thought they needed to use the data from the Sportvision technology to determine the distribution of balls for the UZR and UBR calculations?

Ben Hall
Member
3 years 1 month ago

They use Total Zone for pre-2002 (I think that’s the year) defense, which uses play-by-play data from Retrosheet.

http://www.fangraphs.com/library/index.php/defense/tz-tzl/

blindbuddysirraf
Member
3 years 1 month ago

I may be mistaken, but if the problem with Jack Morris’ WAR differential was only a baseline replacement level issue, wouldn’t his career WAR ranking be closer to the other site’s? This says he jumped up 70 pitchers because of a difference in baseline. Wouldn’t all 70 of those pitchers’ career WAR jump as well, if the baseline was the only problem?

Joe Peta
Member
3 years 1 month ago

Remember “bbs”, WAR is a cumulative stat. So if the baseline is moved (say lowered) everyone’s WAR does increase but a player who has played more seasons than others will have his career WAR jump more.
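A quick sketch of that point, using made-up numbers: the 0.3 wins per 200 innings shift and the career lengths below are hypothetical and are not the actual adjustment either site made. The idea is only that a per-inning baseline change adds up in proportion to innings pitched.

```python
def career_war_shift(seasons: int, innings_per_season: float,
                     shift_per_200_ip: float) -> float:
    """Total career WAR change from moving the replacement baseline.

    shift_per_200_ip is a hypothetical number of wins added per
    200 innings when the baseline is lowered.
    """
    total_innings = seasons * innings_per_season
    return total_innings / 200.0 * shift_per_200_ip


# Two hypothetical pitchers with identical per-inning quality.
short_career = career_war_shift(seasons=8, innings_per_season=200, shift_per_200_ip=0.3)
long_career = career_war_shift(seasons=18, innings_per_season=200, shift_per_200_ip=0.3)

print(f"8-season pitcher gains  {short_career:.1f} WAR")
print(f"18-season pitcher gains {long_career:.1f} WAR")
```

Both pitchers gain WAR, but the longer career gains more, so a pitcher with a very long career can pass pitchers who had better peaks in fewer innings.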

Clave
Guest
3 years 1 month ago

The primary takeaway for me was that Tom Tango is pretty much always right.

Ray A.
Guest
3 years 1 month ago

I believe this marks the first time that anything Jim Caple wrote contributed positively to the game of baseball.

adohaj
Guest
3 years 1 month ago

So players like Rickey Henderson and Pete Rose lost 14-19 WAR, and a player like Joe DiMaggio lost about 7-8 WAR? And the inverse for Bbref?

Bip
Guest
3 years 1 month ago

I always thought fangraphs WAR was a little inflated compared to baseball-reference. Or of course that B-ref was a little deflated compared to fangraphs, as neither was clearly right or wrong about replacement level.

Forrest Gumption
Member
3 years 1 month ago

Man, Alfredo Griffin was pretty bad for a long time.

Choo
Member
3 years 1 month ago

The question now is not if, but when the Unification of Replacement Level will be commemorated on a US postal stamp or massive oil painting displayed at The Metropolitan Museum of Art. This is some big-time forefathers shit right here.

Neil
Guest
3 years 1 month ago

I like this, but it’s freaking me out that I woke up today and everyone’s WAR is slightly different.

Joshy
Guest
3 years 1 month ago

Cool, that explains Ricky Romero’s huge difference for his 2011 season. Fangraphs 2.4, Baseball Reference 6.3! Are there any larger differences?

Patrick
Guest
3 years 1 month ago

But it doesn’t, because his B-R WAR went up and his FanGraphs WAR went down from this change. This difference is due to the different calculations.

Pinstripe Wizard
Member
3 years 1 month ago

So to summarize Mike Trout > combined Astros roster.

According to Cot’s, the Astros 2012 opening payroll was around $60.8M. Given that Trout was 10.2/7.3 = 1.4 times better than the Astros, Boras should ask for a contract with an AAV of $60.8M * 1.4 = $85.1M. I’m thinking 10/850 is a good starting point.

Pinstripe Wizard
Member
3 years 1 month ago

And yes I know Trout isn’t a Boras client, but Boras would probably be the only agent that would ask for 10/850.

rubesandbabes
Guest
3 years 1 month ago

Peter repays Paul?

One underlying problem with all the silly Stat Separatism is that the very best few understanderers of all this are getting paid for it,

and keeping quiet.

But okay, at least some good news here.

TheSinators
Member
3 years 28 days ago

Has this change already taken place? Or does Randy Johnson really have a career WAR of 110?

dolbear65
Member
3 years 16 days ago

There is still a huge difference between fangraphs WAR and Baseball Reference WAR in some cases. Why?
