
WAR and the Relief Pitcher, Part II


Back on 2016-Nov-11 I posted WAR and Eating Innings.

Basically, I was looking at reliever WAR and concluded that giving relievers a different replacement level than starters isn’t quite correct. Inning for inning, a replacement reliever needs to be better than a replacement starter, because eating innings has real value. But reliever/starter status doesn’t actually capture the ability to eat innings, and I gave several examples where it fails historically.

I don’t have roster-usage numbers and don’t want to penalize a pitcher for sitting on the bench, but outs per appearance makes a nice proxy for the ability to eat innings. In a linear formula that attempts to duplicate the current distribution of wins between relievers and starters, this gives roughly 0.367 win% as the pitcher replacement level (as opposed to the current 0.38 for starters and 0.47 for relievers), and then penalizes the pitcher roughly 1/100th of a win per appearance.

The LOOGY needs to be pretty good against his one guy to make up for that penalty, but for a starter it will make almost no difference.
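To put rough numbers on that, here is a small Python sketch comparing the replacement credit a pitcher gets per appearance under the current levels and under the proposed one. The 27-outs-per-game conversion and the three appearance lengths are assumptions chosen for illustration, not figures from the original analysis.

```python
# Replacement levels quoted above (win%).
CURRENT_STARTER = 0.38
CURRENT_RELIEVER = 0.47
PROPOSED = 0.367
PENALTY_PER_APPEARANCE = 0.01  # wins, proposed scheme only

OUTS_PER_GAME = 27  # assumption: credit is spread evenly over a 27-out game


def credit_per_out(replacement_win_pct: float) -> float:
    """Wins of replacement credit earned per out recorded."""
    return (0.5 - replacement_win_pct) / OUTS_PER_GAME


def current_credit(outs: float, is_starter: bool) -> float:
    """Replacement credit for one appearance under the current split levels."""
    level = CURRENT_STARTER if is_starter else CURRENT_RELIEVER
    return credit_per_out(level) * outs


def proposed_credit(outs: float) -> float:
    """Replacement credit for one appearance, minus the appearance penalty."""
    return credit_per_out(PROPOSED) * outs - PENALTY_PER_APPEARANCE


def change_per_appearance(outs: float, is_starter: bool) -> float:
    """How much the proposal moves a pitcher's WAR per appearance."""
    return proposed_credit(outs) - current_credit(outs, is_starter)


# Hypothetical appearance lengths, chosen only for illustration.
examples = [
    ("LOOGY, 1 out", 1, False),
    ("long reliever, 9 outs", 9, False),
    ("starter, 18 outs", 18, True),
]
for label, outs, is_starter in examples:
    change = change_per_appearance(outs, is_starter)
    print(f"{label}: {change:+.4f} wins per appearance")
```

Under these assumptions the one-out reliever loses a little per appearance, the nine-out reliever gains, and the starter barely moves.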

That’s pretty much the entire article summarized in three paragraphs. By design, this doesn’t change much about 2016 WAR — it will give long relievers a modest boost, and very short relievers (LOOGYs and the like) a very modest penalty, and have an even smaller effect on starters.

So why did I bother?

Well, first, there are historical cases where it does matter; but more to the point, I was thinking that relievers are being undervalued by current WAR, and to examine this I needed a method to evaluate a reliever’s value compared to a starter’s value, and different replacement levels complicate that.

Why Do I Think Relievers Are Undervalued?

You could just go to this and read it; it shows that MLB general managers thought relievers were undervalued as of a few years ago. But that’s not what convinced me. What convinces me is the 2016 Reds pitching staff. 32 men pitched at least once for the Cincinnati Reds in 2016. Their total net WAR was negative.

Given that the Reds did spend resources (money and draft picks) on pitching, if replacement level is freely available, then that net negative WAR is either spectacularly bad luck, or spectacularly bad talent evaluation.

32 Reds pitchers were used; sort by innings pitched, and the top seven are all positive WAR, accounting for 5.6 of the Reds’ total of 6.7 positive WAR. Of their other 25 pitchers, only three had positive WAR: Michael Lorenzen (reliever, 50 innings, part of the Reds’ closer plans for the coming year), Homer Bailey (starter, coming off Tommy John and then injured again, only six appearances), and Daniel Wright (traded away mid-season, after which he turned back into a pumpkin and accumulated negative WAR for the season).

It sure sounds like the Reds coaches knew who their best pitchers were and used them. Their talent evaluation was not spectacularly bad. But they had 17 relievers with fewer than 50 innings, and not one of them managed to accumulate positive WAR for the year.

Based on results, we can list the possible mistakes in who they gave innings to: Maybe they could have used Lorenzen a bit more. That’s it; otherwise it’s hard to improve on who they gave the innings to. They also usually gave the high-leverage innings to their best relievers.

So, if replacement level is freely available, why did the Reds coaches give a total of 574.2 innings to 22 pitchers who managed between them to accumulate no positive WAR and 7.1 negative WAR?

If that’s just bad luck, it is spectacularly bad luck; and spectacularly consistent, as the Reds seem to have known in advance exactly who was going to have all this bad luck.

I don’t really believe it is bad luck. Thus, I don’t really believe that the Reds pitchers were below replacement, and the alternative is that replacement (at least for relievers) is too high.

GMs Still Agree: Relievers Are Undervalued by WAR

The article I referenced above was from the 2011-2012 off season; maybe something has changed.

As I write this (2017-Feb-24), FanGraphs’ Free Agent Tracker shows 112 free agents signed over the 2016-2017 off-season. 10 got qualifying offers and thus aren’t truly representative of their free-market value. 22 have no 2017 projection listed, and most of those went for minor-league deals (Sean Rodriguez and Peter Bourjos are the exceptions, and they aren’t pitchers). I’m going to throw those 32 out.

That leaves a sample of 80 players, 28 of them relievers or SP/RP. A fairly simple minded chart is below:

(Hmm, no chart. There was supposed to be a chart. Don’t see an option that will change this. Relief pitcher Average $/Year=5.7105*projected 2017 WAR with an R² of 0.585; everyone else Average $/Year=4.6028+1.401*projected 2017 WAR with an R² of 0.5917. Note that the “everyone else” line, if you could see it, is below the relief pitcher line at 0 WAR, and then slopes up faster from there.)

R² values aren’t great, and overall values per WAR are low because most of the big paydays are on multiyear contracts where value can be assumed likely to collapse by the end of the contract (I’m not including any fall-off). But the trend continues: MLB general managers think relievers are worth more than FanGraphs thinks they are.

The formula I give above (replacement of 0.367 win% with a −0.01 wins/appearance) is based on trying to reproduce the FanGraphs results. But if the FanGraphs results are wrong, then so is my formula.

Why the Current Values Might Be Wrong

I’ve shown why I think the current values are wrong, but what could cause such an error?

Roster spots change in value over time. That’s all it takes; the reliever is held to a higher (per-inning) standard because historical analysis indicated that he should be. But if roster spots were free, then it would be absurd to evaluate starters and relievers at all differently. The difference in value depends on the value of a roster spot; or, if using my method, the “cost” imposed per appearance needs to be based on the value of a roster spot.

Prior to 1915, clubs had 21 players, and no DL at all. In 1941, the DL restrictions were substantially loosened, and a team could have two players on the DL at the same time (60-day DL only at that time). In 1984, they finally removed the limits to the number of players on a DL at a time; in 2011, a seven-day concussion DL was added, and a 26th roster spot for doubleheader days; in 2017, the normal DL will be shortened to 10 days.

21 players and no DL makes roster spots golden. You simply could not have modern pitcher usage in such a period.

Not to mention the fact that, in 1913, you’d never have been able to get a competent replacement on short notice. Jets and minor-league development contracts both also dropped the value of a roster spot.

25-26 roster spots, September call-ups to 40, and starting this year you can DL as many players as you want for periods short enough that it’s worth thinking about DLing your fifth starter any time you have an off day near one of his scheduled starts. Roster spots are worth a lot less today; it’s not surprising that reliever WAR seems off when it was based on historical data and the very reason for having a different reliever replacement level is the value of a roster spot.


When I started this, I was hoping to produce a brilliant result about what relief-pitcher replacement should be. I have failed to do so; there’s simply too little data, as shown by the low R² values on the chart I tried to include above, to make a serious try at figuring out what general managers are actually doing in terms of their concept of reliever replacement level.

But the formula I suggested back in November has an explicit term acting as a proxy for the value of a roster spot, and that term can be adjusted for era. If you drop the cost of an appearance from 0.01 WAR to some lower value, raising replacement a bit to compensate, you’ll represent the fact that roster spots have changed in value over time.
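One way to read "raising replacement a bit to compensate" is to hold the total WAR pool fixed while lowering the per-appearance cost. The sketch below does that with the 2016 AL totals and coefficients from the November post; the fixed-pool rule, the 27-out conversion, and the halved cost of 0.005 are all assumptions made for illustration.

```python
# 2016 AL totals and fitted coefficients, as given in the November post.
TOTAL_OUTS = 64_833
TOTAL_APPEARANCES = 9_729
BASE_C1 = 0.004906  # replacement credit per out
BASE_C2 = 0.01135   # penalty per appearance

OUTS_PER_GAME = 27  # assumption: win% gap is expressed per 27-out game


def rebalance_c1(new_c2: float) -> float:
    """Pick the C1 that keeps C1*outs - C2*appearances unchanged.

    Assumption: 'compensating' means the league-wide replacement
    adjustment stays the same size when the appearance cost changes.
    """
    pool = BASE_C1 * TOTAL_OUTS - BASE_C2 * TOTAL_APPEARANCES
    return (pool + new_c2 * TOTAL_APPEARANCES) / TOTAL_OUTS


def replacement_win_pct(c1: float) -> float:
    """Convert a per-out credit into a replacement-level win%."""
    return 0.5 - OUTS_PER_GAME * c1


hypothetical_c2 = 0.005  # purely illustrative lower cost per appearance
new_c1 = rebalance_c1(hypothetical_c2)
print(f"C1 moves from {BASE_C1:.6f} to {new_c1:.6f}")
print(f"replacement win% moves from {replacement_win_pct(BASE_C1):.4f} "
      f"to {replacement_win_pct(new_c1):.4f}")
```

Lowering the appearance cost lowers C1, which is the same thing as a higher replacement win%, so the direction matches the text; the size of the shift depends entirely on the cost chosen for the era.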

Given any reasonable attempt to estimate the cost per appearance based on era, I don’t see how this could be worse than the current methods.

WAR and Eating Innings

A WAR Carol

Winter has come, baseball season is over, and Ebenezer finishes his analysis and goes home to his cold bed and DVD of Game 4 of the 2004 ALCS since there are no longer any current games to watch.

The Ghost of Pitcher Wins appears and informs him that he will be visited by the Ghosts of Relievers Past, Present, and Future, who will explain to him the errors of his ways.

Mike Marshall appears. “In 1974 I pitched 208.1 innings in relief during the regular season and 12 more in the post season. I accumulated 4.4 WAR. It would have been about 2 wins higher, but I was penalized for being a relief pitcher. It seems that giving my manager over 200 innings of good pitching becomes less valuable if I do it out of the bullpen when and where he needs it rather than as a starter on a schedule. I’m not alone — in MLB history there have been 393 pitcher seasons with over 100 regular season innings and no starts. I don’t even hold the record for most relief innings in a season. Why must I suffer for being a reliever when I carried a starter’s load?”

Next is the Ghost of Relievers Present. Tony Watson appears. “In 2016 I went 67.2 innings as a lefty with a large platoon split (.049 difference in wOBA between lefties and righties). But because I wasn’t totally hopeless against righties I faced well over twice as many righties as lefties (195 to 77) and ended the season with –0.1 WAR. Had I been a worse pitcher so Clint Hurdle used me less I’d have had a positive WAR. Would any manager have actually preferred a LOOGY who faced fewer batters and was inferior against both righties and lefties? Why am I penalized for being too good to be used only as a LOOGY?”

Next comes a group of six — it’s the Ghost of Relievers Future, and they say, “In the distant future a team attempts starting by committee. They have nine pitchers who typically go 18 batters each on a three-day rotation so as to avoid the third time through the order penalty. We come in in the middle of a game, at an unpredictable time, and do the same job for the same length of time as the starters. Occasionally we’re asked to cover additional outs if an earlier pitcher melts down or is injured so we enter early and go long. The starters have a lower replacement level than we do despite having the easier job with greater certainty about both when they will enter and leave a game. How is this fair?”

They fade from view, and The Ghost of Pitcher Wins reappears and says, “Seriously; the reason relievers have a higher replacement level is because their usage is different than that of a starter in ways that affect their value. But different relievers can have drastically different usage, and that also affects their value. Fix this, Ebenezer, or in the long run WAR for relievers will suffer my fate and be superseded by a better tool for reliever evaluation.”

Why the Problem Exists

Why does the replacement level differ between starters and relievers? That’s easy — replacement level is different because it’s easier to find a reliever with a given xFIP, wOBA, RA/9, ERA, or pretty much any other rate stat than it is to find a starter that good. Starters improve when sent to the bullpen; relief is an easier job, so it has a higher replacement level.

But if that were all there was to it then pretty much everyone would do nothing but have bullpen games. Relievers are better and the goal is to win games. So why employ starters at all, much less pay them lots of money?

I’m going to assume that a team has seven roster spots for relievers and five for starters. I’m going to exclude September and October from this analysis as the limit to a 25-man roster doesn’t apply in those months.

In 2016, prior to September, a starter roster spot averaged 151.8 innings (decimal fraction rather than outs obviously). A reliever roster spot averaged 60.8 innings. A team averaged 1184.6 innings.

With those utilization numbers a team would need 20 roster spots of typical relief pitching to get to September. This is not viable. A reliever is less valuable than a starter because eating innings has real value. Getting lots of outs has value beyond simple run prevention, because the team not only needs to prevent runs per at-bat for one or two lefties a game, but someone also needs to get through a large number of innings, and most relievers provide far less of this value than a starter.
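The arithmetic behind that figure is short enough to check directly. This is a minimal Python sketch using only the innings numbers quoted above; the variable names are mine.

```python
import math

# Pre-September 2016 averages quoted above.
STARTER_SPOTS = 5
RELIEVER_SPOTS = 7
IP_PER_STARTER_SPOT = 151.8
IP_PER_RELIEVER_SPOT = 60.8
TEAM_IP = 1184.6

# Innings covered by the assumed 5 starter + 7 reliever staff.
covered = (STARTER_SPOTS * IP_PER_STARTER_SPOT
           + RELIEVER_SPOTS * IP_PER_RELIEVER_SPOT)

# Roster spots needed if every spot ate innings like a typical reliever.
all_reliever_spots = math.ceil(TEAM_IP / IP_PER_RELIEVER_SPOT)

print(f"5 starters + 7 relievers cover {covered:.1f} innings")
print(f"reliever-only staff needs {all_reliever_spots} roster spots")
```

The 5 + 7 staff reproduces the team total, while a staff of typical relievers would need about 19.5 spots, which rounds up to the 20 mentioned above.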

The problems with reliever WAR in the fable above all come from the fact that we’re using reliever or starter status as a proxy for the ability to eat innings and changing replacement level to reflect this, rather than giving an explicit adjustment for being able to eat innings as a thing of value in its own right and otherwise evaluating pitcher results on a common basis.

Not all starters are equally good at eating innings, not all relievers are equally bad at it, and the ability to eat innings per roster spot used on the pitching staff has value.

When Steve Carlton went 346.1 innings of 11.1 WAR ball in ’72, he not only pitched quite well on the batters he faced, but he also gave his managers a lot of added flexibility by eating far more than his share of the innings. This is a source of substantial value not captured in the current methods. When Mike Marshall ate over 208 innings in relief that was again a source of substantial value not captured in the current methods. Marshall is in fact penalized on the assumption that he is failing to do exactly the thing that he clearly did.


One problem with what I’ve been saying is that the value added depends on innings/roster spot over time, and I don’t have good information about roster usage. Even if I did have good information about exactly how long each pitcher spent on a roster, I don’t want to give a pitcher a negative WAR for being called up and never used. For that matter I also don’t want to have to change the formula in September when roster spots drop in value.

I’m going to use appearances as a proxy for roster-spot usage. Appearances are readily available and this doesn’t penalize a pitcher just for sitting on the bench. Outs/appearance gives an indication of how good a pitcher is at eating innings, or at least of how good his manager thinks he is. Once a pitcher is in, he typically stays in until the manager has a reason to take him out or the game ends. Closers who come in only at the end see such short appearances because the manager doesn’t want to use them for longer ones.

Note that this is all extremely preliminary; I’m mostly hoping someone else will come up with a better solution than I have below.

Cut to the Chase

I had a bunch of stuff typed up, and reading it puts me to sleep.

I ended up convincing myself that I wasn’t going to do better than a simple linear approach. I’m using outs/appearance as a proxy for efficiency at eating innings; I split this into two terms.

The proposed replacement formula for pitcher WAR is:

WAR = (Runs above League Average)/(R/W) + (C1 × total outs recorded) – (C2 × total appearances)

That’s familiar enough: the first term is wins above (or below) average, the C1 × total outs recorded term simply adds in a replacement level, and the C2 × total appearances term is a penalty to represent eating a roster spot.
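As a function, the formula is a one-liner. The sketch below is a direct transcription in Python; the argument names are mine, and the sample pitcher and the coefficient values (1 win per 200 outs, 1 win per 100 appearances, the rounded first-cut values this post ends on) are used only to show the call.

```python
def proposed_war(runs_above_average: float,
                 runs_per_win: float,
                 outs: int,
                 appearances: int,
                 c1: float,
                 c2: float) -> float:
    """WAR = wins above average + replacement credit - appearance penalty."""
    wins_above_average = runs_above_average / runs_per_win
    replacement_credit = c1 * outs
    appearance_penalty = c2 * appearances
    return wins_above_average + replacement_credit - appearance_penalty


# Hypothetical league-average reliever: 200 outs over 70 appearances.
example = proposed_war(
    runs_above_average=0.0,
    runs_per_win=9.778,
    outs=200,
    appearances=70,
    c1=1 / 200,   # rounded first-cut value, an assumption at this point
    c2=1 / 100,   # rounded first-cut value, an assumption at this point
)
print(f"{example:.2f} WAR")
```

With exactly average run prevention, this pitcher earns 1.0 wins of replacement credit and gives back 0.7 in appearance penalties.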

What I’m actually doing is reducing the replacement level and adding a small penalty based on number of appearances. Unless you think there is something magical about being the “starter,” the different replacement levels for starters and relievers already add such a penalty. They simply do so in an ad-hoc way by adjusting the replacement level and assuming that relievers are the ones with short appearances.

The elephant in the room is that relievers and starters record different numbers of appearances over the same amount of time and I’m using appearances as a proxy for roster usage. This is where the math I removed comes in. On June 17, 1915, George Washington “Zip” Zabel came in for 18 1/3 innings in relief in a single game. I don’t think he needed less rest than a starter. Outs/appearance is being used as a stand-in for the ability to eat innings, and rest requirements would also be reasonably modeled as a term dependent on Outs/appearance. I don’t need a separate term for the things already being accounted for.

Let’s run the numbers for 2016. I’m going to assume that the total WAR given to starters and relievers in each league is at least approximately correct, and that all I’m doing is redistributing that WAR slightly.

Player Type    WAR     xFIP   Total Outs   Total Appearances
AL starter     155.3   4.34   41,450       2,428
AL reliever    58.8    3.94   23,383       7,301
AL pitcher     214.1   4.22   64,833       9,729
NL starter     171.6   4.14   40,788       2,428
NL reliever    43.8    4.18   24,298       8,002
NL pitcher     215.4   4.16   65,086       10,430

I don’t have league-specific runs/win handy; 9.778 is the combined value, so I’ll use that. I also don’t have a good way to correct for the fact that some fraction of reliever WAR is due to leverage concerns and won’t apply to the average values I’m using here.

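Plugging the AL starter and AL reliever rows into the formula gives two equations in C1 and C2:

155.3 = (41,450/27) × (4.22 − 4.34)/0.92/9.778 + 41,450 × C1 − 2,428 × C2

58.8 = (23,383/27) × (4.22 − 3.94)/0.92/9.778 + 23,383 × C1 − 7,301 × C2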

And it follows that for the AL the value of C1 is 0.004906 and C2 is 0.01135.
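The AL solve can be reproduced with a few lines of Python. This sketch takes each group's own outs as the "outs" in its equation and solves the two-by-two system by Cramer's rule; the 0.92 factor is used exactly as it appears in the equations, and the function and variable names are mine.

```python
RUNS_PER_WIN = 9.778  # combined-league value quoted above
XFIP_SCALE = 0.92     # factor as written in the equations; meaning not stated there
LEAGUE_XFIP = 4.22    # AL pitcher row


def runs_term(outs: int, xfip: float) -> float:
    """Wins above average implied by a group's xFIP over its outs."""
    return outs / 27 * (LEAGUE_XFIP - xfip) / XFIP_SCALE / RUNS_PER_WIN


def solve(starter, reliever):
    """Solve outs*C1 - appearances*C2 = WAR - runs_term for both rows."""
    (war_s, xfip_s, outs_s, apps_s) = starter
    (war_r, xfip_r, outs_r, apps_r) = reliever
    rhs_s = war_s - runs_term(outs_s, xfip_s)
    rhs_r = war_r - runs_term(outs_r, xfip_r)
    # Coefficient matrix is [[outs_s, -apps_s], [outs_r, -apps_r]].
    det = outs_s * (-apps_r) - (-apps_s) * outs_r
    c1 = (rhs_s * (-apps_r) - (-apps_s) * rhs_r) / det
    c2 = (outs_s * rhs_r - outs_r * rhs_s) / det
    return c1, c2


# (WAR, xFIP, total outs, total appearances) from the AL rows of the table.
al_starter = (155.3, 4.34, 41_450, 2_428)
al_reliever = (58.8, 3.94, 23_383, 7_301)

c1, c2 = solve(al_starter, al_reliever)
print(f"AL C1 = {c1:.6f}, C2 = {c2:.5f}")
```

Run on the AL rows, this lands on the C1 and C2 quoted above to the precision given.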

The AL C1 value gives a replacement level of 0.1325 below league average (27 outs × 0.004906), or replacement of 0.3675, slightly less than the .38 currently used for starters. Then the AL C2 value penalizes a pitcher an 88th of a win for each time he comes into a game.

The same calculation for the NL comes out with a C1 of 0.004856 and C2 of 0.009520; or replacement of 0.3689, and a penalty of one 105th of a win per appearance.

Call it a replacement level of 1 win less than average per 200 outs recorded and a penalty of 1 win per 100 appearances and you’d be close enough for a first cut.
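For reference, here is how the per-out and per-appearance coefficients translate into the more familiar units, sketched in Python. The 27-out conversion matches the NL figures above; the break-even helper is an extra illustration of mine, not part of the original analysis.

```python
OUTS_PER_GAME = 27  # assumption: replacement gap is expressed per 27-out game


def replacement_win_pct(c1: float) -> float:
    """Replacement-level win% implied by a per-out credit."""
    return 0.5 - OUTS_PER_GAME * c1


def appearances_per_win(c2: float) -> float:
    """Number of appearances that cost one full win."""
    return 1 / c2


def break_even_outs(c1: float, c2: float) -> float:
    """Outs per appearance at which credit and penalty cancel."""
    return c2 / c1


NL_C1, NL_C2 = 0.004856, 0.009520   # NL values quoted above
ROUND_C1, ROUND_C2 = 1 / 200, 1 / 100  # the rounded first cut

print(f"NL: replacement {replacement_win_pct(NL_C1):.4f}, "
      f"one win per {appearances_per_win(NL_C2):.0f} appearances")
print(f"Rounded: replacement {replacement_win_pct(ROUND_C1):.3f}, "
      f"break-even at {break_even_outs(ROUND_C1, ROUND_C2):.1f} outs per appearance")
```

With the rounded values, a pitcher averaging fewer than two outs per appearance starts from a net negative replacement adjustment and has to make it up with better-than-average run prevention.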

I strongly suspect that more detailed analysis with better starting numbers and taking leverage effects into account would work better, but the basic method will give long relievers some credit for what they’re doing, and give exceptionally long or short starters a small amount of credit for their ability (or inability) to eat innings also.