Pitcher WAR and the Concept of Value

Whenever one draws a conclusion from data, a host of underlying assumptions gets shepherded into that high-level conclusion. That's a didactic opening sentence, but it has a point, because statistics are full of underlying assumptions. Statistics are also, perhaps not coincidentally, full of high-level conclusions. And those conclusions can be pretty wrong. By about five hundred runs each and every season, in this case.

Relative player value is probably the most important area of sports analysis, but it's not always easy to measure. It's relatively easy to get a decent idea of value in baseball, for example, while it's quite hard to do the same in football. No one really knows the value of a Pro Bowl linebacker compared to a Pro Bowl left guard. People have rough ideas, but those ideas are based more on tradition and ego than on rigorous analysis. Which is why football is still kind of in the dark ages, and baseball isn't. But just because baseball is out of the dark ages doesn't mean the sport is figured out. It doesn't mean it's even close to figured out.

Because this question still exists: what's the value of a starting pitcher compared to a relief pitcher? At first glance this is a question we have a pretty good grasp on. We have WAR, which isn't perfect, but many of its imperfections get filtered out when talking about a position as a whole. You can compare the average WAR of starters to the average WAR of relievers and get a decent answer. If you want to compare the top guys, take the top quartile of each and compare those. Except, well, no, because underlying assumptions are nasty.

FanGraphs uses FIP-WAR as its primary value measure for pitchers. It's built on the theory that pitchers only really control walks, strikeouts, and home runs, and that everything else is largely randomness rather than easily measurable skill. RA9-WAR isn't a good measure of individual skill because much of it depends on factors like defense and where batted balls happen to land. That's correct, of course. But when comparing the relative value of entire positions against each other, RA9-WAR is the way to go. When you add up all the players on all of the teams and average them, factors like defense and batted-ball luck get averaged together too. We get inherently league-average defense and luck, and so RA9-WAR loses its bias. It becomes (almost) as exact as possible.
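For reference, the FIP calculation itself uses only those three outcomes (plus hit batters) and a league constant. A minimal sketch, assuming the standard published formula and a constant of roughly 3.10 (in practice it's re-set each season so that league FIP matches league ERA):

```python
def fip(hr, bb, hbp, k, ip, constant=3.10):
    """Fielding Independent Pitching: only homers, walks, hit batters,
    and strikeouts count; every ball in play is treated as league average.
    The constant re-centers the result onto the ERA scale."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant

# Illustrative (made-up) season line: 20 HR, 50 BB, 5 HBP, 180 K in 200 IP
print(round(fip(20, 50, 5, 180, 200), 2))
```

Notice that nothing about balls in play appears anywhere in that formula, which is exactly the assumption the rest of this post pokes at.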

Is this really a big deal, though? If all of the confounding factors of RA9-WAR average out, wouldn't the confounding factors of FIP-WAR average out too? What's so bad about using FIP-WAR to judge value? Well, there's this: from 1995 onward, starting pitchers have never outperformed their peripherals. Relievers? They've outperformed theirs each and every year. And it's not like the opposite happened in 1994; I just had to pick some date to start the analysis. Here's a table comparing RA9-WAR to FIP-WAR for starters over the last 19 seasons, followed by the same table for relievers.

Starter RA9-WAR/FIP-WAR Comparisons

Year RA9 WAR FIP WAR Difference
1995 277.7 305.0 -27.3
1996 323.2 337.1 -13.9
1997 302.5 336.6 -34.1
1998 326.8 357.8 -31.0
1999 328.7 359.7 -31.0
2000 323.0 348.6 -25.6
2001 324.9 353.9 -29.0
2002 331.4 348.6 -17.2
2003 315.0 346.7 -31.7
2004 311.9 343.0 -31.1
2005 314.8 333.0 -18.2
2006 317.0 345.7 -28.7
2007 343.3 361.6 -18.3
2008 325.7 351.9 -26.2
2009 325.1 351.8 -26.7
2010 317.8 353.6 -35.8
2011 337.3 355.6 -18.3
2012 311.1 337.6 -26.5
2013 304.0 332.4 -28.4

Reliever RA9-WAR/FIP-WAR Comparisons

Year RA9 WAR FIP WAR Difference
1995 78.4 50.3 28.1
1996 73.9 61.8 12.1
1997 98.0 65.4 32.6
1998 101.6 70.4 31.2
1999 99.8 68.9 30.9
2000 106.9 80.2 26.7
2001 103.3 77.6 25.7
2002 91.1 76.6 14.5
2003 112.5 83.4 29.1
2004 117.7 85.1 32.6
2005 115.7 96.7 19.0
2006 112.7 84.0 28.7
2007 86.8 68.2 18.6
2008 104.1 79.7 24.4
2009 103.7 77.7 26.0
2010 109.0 74.9 34.1
2011 91.0 73.6 17.4
2012 116.3 91.3 25.0
2013 126.6 98.5 28.1

OK, so that's a lot of numbers. The upshot, though, is that FIP thinks starters are better than they actually are, and thinks relievers are worse than they actually are. That's true year after year, by margins well above negligible. Starters allow roughly 250 more runs per season than FIP says they should, while relievers allow roughly 250 fewer, in far fewer innings. In percentage terms, that means FIP-WAR over-values starters by a bit over 8% as a whole, and consistently under-values relievers by about 25%. Now, this isn't a completely new idea. We've known for a while that relievers tend to outperform their peripherals, but the truth is stronger than that: relievers outperform their peripherals pretty much all the time.
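As a sanity check, those percentages can be recomputed straight from the two tables. A minimal Python sketch, using the 1995-2013 season totals exactly as printed above:

```python
# League WAR totals by season, 1995-2013, copied from the two tables above.
starter_ra9 = [277.7, 323.2, 302.5, 326.8, 328.7, 323.0, 324.9, 331.4, 315.0,
               311.9, 314.8, 317.0, 343.3, 325.7, 325.1, 317.8, 337.3, 311.1, 304.0]
starter_fip = [305.0, 337.1, 336.6, 357.8, 359.7, 348.6, 353.9, 348.6, 346.7,
               343.0, 333.0, 345.7, 361.6, 351.9, 351.8, 353.6, 355.6, 337.6, 332.4]
reliever_ra9 = [78.4, 73.9, 98.0, 101.6, 99.8, 106.9, 103.3, 91.1, 112.5, 117.7,
                115.7, 112.7, 86.8, 104.1, 103.7, 109.0, 91.0, 116.3, 126.6]
reliever_fip = [50.3, 61.8, 65.4, 70.4, 68.9, 80.2, 77.6, 76.6, 83.4, 85.1,
                96.7, 84.0, 68.2, 79.7, 77.7, 74.9, 73.6, 91.3, 98.5]

def pct_gap(fip_war, ra9_war):
    """How far FIP-WAR sits from RA9-WAR, as a fraction of total RA9-WAR."""
    return (sum(fip_war) - sum(ra9_war)) / sum(ra9_war)

starter_gap = pct_gap(starter_fip, starter_ra9)    # positive: FIP overrates starters
reliever_gap = pct_gap(reliever_fip, reliever_ra9) # negative: FIP underrates relievers
print(f"starters: {starter_gap:+.1%}, relievers: {reliever_gap:+.1%}")
```

By these totals, FIP-WAR runs roughly 8% high for starters and roughly 25% low for relievers, every single season pointing the same direction.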

Relievers almost get to play a different game than starters. They don't have to face a lineup twice, they don't have to throw their third- or fourth-best pitches, they don't have to conserve energy, etc. There are probably more reasons relievers hold this edge, too, and these advantages can't be written off as randomness, because they show up pretty much every year. Not necessarily on an individual-by-individual basis, but when trying to find the relative value between positions, the advantages of being a reliever are too big to ignore.

How much better are relievers than starters at getting “lucky”? Two stats that have long been treated as luck stats (especially for pitchers) are BABIP and LOB%. FIP assumes that starters and relievers are on even ground as far as these two numbers are concerned. But are they? Here are a few tables for comparison, using the same range of years as before.

BABIP Comparisons

Year Starter BABIP Reliever BABIP Difference
1995 0.293 0.290 0.003
1996 0.294 0.299 -0.005
1997 0.298 0.293 0.005
1998 0.298 0.292 0.006
1999 0.297 0.288 0.009
2000 0.289 0.284 0.005
2001 0.290 0.286 0.004
2002 0.295 0.293 0.002
2003 0.294 0.285 0.009
2004 0.298 0.292 0.005
2005 0.300 0.292 0.009
2006 0.293 0.289 0.003
2007 0.291 0.288 0.003
2008 0.297 0.290 0.007
2009 0.296 0.288 0.008
2010 0.292 0.283 0.008
2011 0.292 0.290 0.002
2012 0.294 0.288 0.006
2013 0.293 0.287 0.006

LOB Comparisons

Year Starter LOB% Reliever LOB% Difference
1995 69.9% 73.4% -3.5%
1996 70.9% 73.2% -2.4%
1997 69.5% 72.7% -3.2%
1998 69.9% 73.1% -3.2%
1999 70.6% 73.2% -2.7%
2000 71.4% 74.3% -2.8%
2001 70.9% 74.0% -3.1%
2002 70.2% 72.3% -2.0%
2003 70.7% 73.8% -3.1%
2004 70.4% 74.0% -3.6%
2005 70.6% 72.9% -2.3%
2006 70.9% 74.2% -3.3%
2007 71.5% 74.0% -2.4%
2008 71.3% 73.9% -2.6%
2009 71.7% 74.3% -2.6%
2010 72.0% 75.3% -3.3%
2011 72.0% 74.6% -2.6%
2012 73.1% 76.2% -3.1%
2013 71.9% 75.5% -3.6%

With the exception of BABIP in ’96, relievers always had better luck than starters. Batters simply don’t reach base as often, upon putting the ball in play fairly between the two white lines, when they’re facing guys who didn’t throw the first pitch of the game. And when batters do reach base, they don’t come around to score as often. Relievers mean bad news, if good news means scoring runs.
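For reference, both of these “luck” stats fall straight out of counting stats. A minimal sketch, assuming the standard BABIP definition and FanGraphs' published LOB% estimate (the 1.4 × HR term keeps home runs, which can never be stranded, out of the baserunner pool):

```python
def babip(h, hr, ab, k, sf):
    """Batting average on balls in play: strikeouts and home runs excluded."""
    return (h - hr) / (ab - k - hr + sf)

def lob_pct(h, bb, hbp, r, hr):
    """Left-on-base percentage, estimated from season totals:
    share of baserunners allowed who never came around to score."""
    return (h + bb + hbp - r) / (h + bb + hbp - 1.4 * hr)

# Illustrative (made-up) line: 150 H, 15 HR, 600 AB, 120 K, 5 SF
print(round(babip(150, 15, 600, 120, 5), 3))   # → 0.287
# Illustrative (made-up) line: 180 H, 60 BB, 8 HBP, 90 R, 20 HR
print(round(lob_pct(180, 60, 8, 90, 20), 3))   # → 0.718
```

Neither formula knows anything about who is pitching, which is the point: the tables show the role itself shifts both numbers in the reliever's favor.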

Which is why we have to be careful when we issue exemptions to the assumptions of our favorite tools. A lot of solid methodology goes into the formulation of FIP, but FIP is handicapped by the forced assumption that everyone is the same at the things they supposedly can't control. Value is the big idea, the biggest idea, probably, and it's entirely influenced by how one chooses to look at something. In this case that something is pitching, and what it means to be a guy who only pitches roughly one inning at a time. Or perhaps it's this: what it means to be a guy who looks at the guy who pitches roughly one inning at a time, and then decides the worth of the guy who pitches said innings, assuming one wishes to win baseball games.

The A’s and Rays just spent a bunch of money on relievers, after all. And we’re pretty sure they’re not dumb, probably.

Brandon Reppert is a computer "scientist" who finds talking about himself in the third-person peculiar.

9 Responses to “Pitcher WAR and the Concept of Value”

  1. Anon says:

    FIP is handicapped by the forced assumption that everyone is the same at the things that they supposedly can’t control

    All the data you presented here only reinforces my dislike for using the word ‘luck’ to describe almost anything about baseball.

    Also, I do not have the calculations to determine how statistically significant this data is, but it seems very clear that FIP is ignoring some skill(s), which skews the data it presents away from reality.


  2. Jason says:

    Excellent article. One question I have in the whole starters vs. relievers RA9/FIP comparison is the status of inherited runners.


    As explained in the above article, starting pitchers are kind of unfairly treated when it comes to inherited runners and I wonder how league-wide cRA9 (from the article, context-RA9 or RE24 based RA9) tracks by year for starters vs relievers.


    • Jason says:

      The more I try to wrap my sleep-deprived head around it, the more I keep going back and forth. Over large numbers of innings, runs allowed would approach the run-expectancy tables, so maybe there would be no effect. On the other hand, relievers brought in to face a specific batter usually gain a platoon advantage and probably beat the run-expectancy table, which would widen the relievers’ advantage over FIP even further.

      So, in conclusion, taking inherited runners into account and assigning ‘blame’ for them scoring according to run expectancy tables would either help starters, help relievers, or have no effect, and I’m just not smart enough to work it out.


  3. Matthew Cornwell says:

    Tom Tango showed recently that RA is a better indicator of skill than FIP after 6-7 full seasons. It is all about sample size and regression.

    It’s no longer a question of whether pitchers have some impact on BABIP, LOB%, HR/FB, etc. The issue is, and always has been, how much data is needed to find it among all of the randomness.


    • And then, of course, there’s the notion that nothing is actually random–but that our data is imprecise or blunt–or that we simply don’t understand the data we do have well enough.

      Even a coin flip isn’t random; it’s an elegant physics equation. Baseball is quite a bit more complicated, especially because it’s a game of relative skill. Obtaining and then understanding granularity is the only way we can get better without having to rely on huge sample sizes. There are so many factors in baseball that we’ll never understand all of them, but we can keep getting better at them. We’re really just getting started on even the simple vagaries of BABIP. Which is all too many words to say that we need more data, and we need to use the data we do have better.


  4. Ben Markham says:

    This is why I like SIERA. While SIERA isn’t perfect (it doesn’t predict Kershaw’s dominance, for instance), it at least goes further than Ks, BBs, and HRs. SIERA assumes relievers can sustain a lower BABIP than starters because of the advantages you mentioned (fewer than one time through the order, no need to hold back).

    Because of the advantage in BABIP, relievers can also sustain a higher LOB% than starters. That’s because the more base runners you allow, the lower an LOB% you can expect. Let one batter reach base in an inning and, as long as it’s not via a HR, you probably don’t give up a run. Allow four base runners and you’re giving up at least one run, and more likely two or three. SIERA also takes this into account, using BABIP along with BB rate to estimate LOB%.


    • Matthew Murphy says:

      Just for clarification, SIERA doesn’t actually use BABIP or LOB%; it simply accounts for the fact that high-strikeout pitchers have added benefits that aren’t captured in FIP (such as inducing weaker contact, driving a lower BABIP) by making the effect of strikeouts on SIERA non-linear (among other adjustments).
      Your point about SIERA being better for comparing starters to relievers is right on. Over the past four years:

      ——– ERA / FIP / xFIP / SIERA
      Starters – 4.10 / 4.05 / 4.01 / 4.07
      Relievers – 3.72 / 3.83 / 3.92 / 3.58

      xFIP does the worst job of comparing the two, because it assumes that pitchers have no control over HR/FB rate. FIP is a little better because it does include homers (HR/FB of 9.5% for relievers vs. 10.6% for starters). SIERA, which places greater weight on high strikeout rates (21% for relievers vs. 18% for starters), might actually favor relievers a bit too much given these numbers.


  5. jdm says:

    Starters have lineups configured against their handedness, while relievers are able to take advantage of platoon splits. Managers control the game flow around this, which is why we see a reliever come in for one batter in the middle of an inning, only to be replaced for the next batter. Good managers play the match-ups. Additionally, batters’ performance against the same starter improves as the game goes on, as they come to understand his pitch sequencing and see all of his pitches, so a reliever who faces a batter only once gets the advantage of mystique. I do believe that WAR probably undervalues relief pitchers, which is why we see relievers get exorbitant contracts in terms of $/WAR, but I think that also has to do with the binary result of a closer in a game.
