# Win Probability Added Above Replacement

### Short Version

WPA, used as a value metric, is incomplete. You have to build in replacement level, including bullpen chaining, to get the full story. These adjustments, which are commonly accepted parts of WAR, shift value to starting pitchers relative to high-leverage relievers, while keeping the win expectancy framework of WPA at the heart of the value metric. With readily available data we can use some basic algebra to convert WPA to WPA-Above-Replacement, or WPAAR.

### Much Longer Version

Thanks to Zach Britton’s near-perfect season, a reliever received real Cy Young support. In the saber community, his candidacy was supported by WPA (Win Probability Added), where he led the majors with +6.1 wins (Andrew Miller finished second with +4.8 wins). WPA-backed arguments mimic those that tout Mike Trout as the obvious MVP: “whoever helped their team win the most games is the most valuable player.” The problem with WPA-based MVP arguments, however, lies in the assumption that the WPA leader is the player who helped his team win the most games. Even after deciding that the win probability framework is the one you want to use, WPA is just *a* win probability metric, not *the* win probability metric, and, as I’ll lay out below, it’s an *incomplete* win probability metric.

#### Background: Win Probability and Leverage

Win probability (or win expectancy) is the baseball version of the little percentages next to the cards in televised poker. Given the state of the game, and an expectation of “all” the possibilities that could occur the rest of the game, how likely is each team to win? Win probability *added* is the change in win probability after an event occurs. When Jose Bautista hits a home run, the Blue Jays are more likely to win, and that change in expectancy is credited to Bautista. Add up all these little changes over a full season, and you have a player’s WPA.
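That bookkeeping can be sketched in a few lines. This is only an illustration: the function name is made up, and the win expectancy values are hypothetical, not real play-by-play data. It assumes each play is stored as a (win expectancy before, win expectancy after) pair from the player's team's point of view.

```python
def season_wpa(plays):
    """Sum the change in win expectancy over every play credited to a player.

    plays: iterable of (we_before, we_after) pairs, each between 0 and 1,
    measured from the perspective of the player's own team.
    """
    return sum(we_after - we_before for we_before, we_after in plays)


# Hypothetical plays: a go-ahead hit, a strikeout, and a late home run.
example_plays = [(0.50, 0.62), (0.62, 0.58), (0.58, 0.71)]
example_total = season_wpa(example_plays)
print(round(example_total, 2))
```

Negative plays simply subtract from the running total, which is why a player's season WPA can end up below zero.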

WPA is very similar to WAA, Wins Above Average, except for how the wins are tallied. WPA uses win probabilities, WAA uses linear weights. In the middle is REW (Run Expectancy Wins). Run expectancy, like win expectancy, uses the game situation to calculate the value of each play. The difference is that run expectancy only takes into account runners on base and number of outs, while win expectancy also accounts for inning and score. Linear weights doesn’t care about the context of an event at all, using the average value across all possible contexts. REW, like linear weights, uses a runs-per-win converter to translate runs into wins. Win probability starts with wins as the unit.

To summarize:

| | Linear weights | Run expectancy | Win probability | Championship probability |
|---|---|---|---|---|
| Context/leverage | None | Runners on base, outs | Inning, score, runners, outs | Standings, inning, score, runners, outs |
| Question answered | On average, across all situations a PA might occur in, how many runs does a single add? | How many more runs do we expect to score this inning because of this single? | How much more likely are we to win this game because of this single? | How much more likely are we to win the World Series because of this single? |
| Common stats | wRAA, RAA, WAA (converted to wins) | RE24, REW (converted to wins) | WPA | cWPA |

A: http://www.hardballtimes.com/postseason-probability-added/

B: http://baseballanalysts.com/archives/2009/04/championship_wp.php

C: http://www.hardballtimes.com/the-top-10-plays-of-2016-according-to-championship-wpa/

In all of these expectancy metrics, there is an inherent assumption that some situations are more important than others. For example, an at-bat in a tied game in the ninth inning matters more than in a six-run game in the fifth. It matters more because the outcome of the at-bat has a bigger influence on the outcome of the game. Mathematically, the average change in win expectancy is larger in the first example – there are wider swings. The difference between a strikeout and a home run is quite wide in a tied game in the ninth, while the difference is negligible in a six-run game in the fifth. And you know that intuitively, because your heart is racing. This “average change in win expectancy” is known as leverage. Every situation can be assigned a leverage value using similar math to expectancy metrics. Each expectancy metric has its own version of leverage, according to the context it cares about.
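To make the “wider swings” idea concrete, here is a rough sketch. The outcome probabilities, the win expectancy changes, and the league-average swing are all hypothetical numbers chosen for illustration, and dividing by the average swing so that a typical situation comes out at 1.0 is an assumption about how the index is scaled, not a reproduction of any published leverage table.

```python
def expected_swing(outcomes):
    """Probability-weighted average absolute change in win expectancy.

    outcomes: list of (probability, win_expectancy_change) pairs
    describing what might happen in one plate appearance.
    """
    return sum(prob * abs(change) for prob, change in outcomes)


def leverage_index(outcomes, average_swing):
    """Scale a situation's expected swing so an average situation is 1.0."""
    return expected_swing(outcomes) / average_swing


# Hypothetical outcome mix: (out, single-ish event, home run).
tied_ninth = [(0.68, -0.06), (0.29, 0.10), (0.03, 0.40)]
six_run_fifth = [(0.68, -0.005), (0.29, 0.01), (0.03, 0.03)]
AVERAGE_SWING = 0.035  # hypothetical league-average swing

print(leverage_index(tied_ninth, AVERAGE_SWING))
print(leverage_index(six_run_fifth, AVERAGE_SWING))
```

The tied ninth inning comes out well above 1 and the blowout well below it, which is all the sketch is meant to show.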

If you’ve heard of leverage, it’s most likely the one associated with win expectancy, but there’s also base-out leverage, championship leverage, etc. (Linear weights does not have an associated leverage, since outcomes have no context in linear weights.) FanGraphs reports a few aggregated stats measuring win expectancy leverage. pLI is a pitcher’s average leverage across all plate appearances. inLI averages the leverage of the first pitch of each inning a pitcher started. gmLI averages the leverage of the first pitch a pitcher throws in each game. exLI captures the leverage when a pitcher exits. When calculating reliever WAR, wins above average based on linear weights (or FIP or ERA) is multiplied by LI to give relievers who pitch more important innings more credit for their runs prevented.

#### Background: Bullpen Leverage Chaining

Finally, while closers pitch high-leverage innings and deserve a lot of credit for doing so, their replacements aren’t replacement-level relievers, but instead are setup guys. When a closer goes down, the guy added from Triple-A is given mop-up duty, not the closer role, while everyone else moves one step higher on the ladder. The closer is replaced by the setup guy, the setup guy is replaced by the 7th inning guy, all the way down the line. All those little changes add up to yield the actual value of the closer. To account for this, we give half credit for the higher leverage innings of good relievers. Why half? Because that’s what makes the math work out – there’s a longer explanation and an example calculation in “Bullpen Chaining and Reliever WAR” (listed in the references) if you are interested in said math. Closers usually deserve to close because they’re excellent relievers, but replacing them with setup guys doesn’t hurt the team as much as their raw leverage and WPA numbers suggest.

#### Background: Replacement Level

Again, what all these probability/expectancy stats have in common is that they are relative to average. You can interpret that as the league summing to zero net wins, or that each player is compared to an average player. But we don’t use wins-above-average very often, because it’s incomplete. It doesn’t account for the value that an average player provides over a replacement level player. It says that a 0 WAA player over 10 plate appearances was just as valuable as a 0 WAA player over 600 plate appearances. But you’d rather have the second player, because the first requires you to find another 590 plate appearances at league-average rate. That’s not easy, and not cheap. That’s the reason why we usually use WAR (Wins Above Replacement), building in the value of an average player above and beyond that of a replacement level player. This can be more than a two-win difference for full-time players.

Relative to above-average stats, above-replacement stats reward additional playing time. This shifts value from relievers to starters, because starters pitch more innings. Additionally, the replacement level for relievers is better, because performance improves when a pitcher moves from starting to relieving (and declines when he moves the other way). This adjustment isn’t too dissimilar from park adjustments, accounting for the difficulty of the job each player does. Relieving is easier than starting. The advantages of pitching in relief include throwing harder, using only your best pitches, and facing hitters only once per game. Most relievers are failed starters. Justin Verlander has a career 3.47 ERA as a starter, but can you imagine what his ERA would be as a reliever, going just one inning at a time? Research has shown that the typical pitcher would have an ERA almost a full run lower in a relief role than as a starter. Strikeouts increase about 17 percent, home runs per batted ball decrease about 17 percent, and BABIP decreases by about 17 points. Replacement level for relievers is about the league average ERA, while replacement level for starters is about a full run higher. One run of ERA over 180 innings is a difference of 20 runs, or about two wins. That, not coincidentally, is the value of a league average starter: two wins.
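The arithmetic in that last step is easy to check. In this sketch the function name is hypothetical, and the default of 10 runs per win is an assumption implied by “20 runs, or about two wins” rather than a figure fitted to any particular season.

```python
def wins_from_era_gap(era_gap, innings, runs_per_win=10.0):
    """Convert an ERA difference over a number of innings into wins.

    era_gap: runs per nine innings separating two pitchers.
    runs_per_win: assumed conversion rate (roughly 10 runs per win).
    """
    runs = era_gap * innings / 9.0
    return runs / runs_per_win


# One run of ERA over 180 innings: 20 runs, or about two wins.
print(wins_from_era_gap(1.0, 180))  # 2.0
```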

As you can probably guess, these adjustments comparing an average player to a replacement level player significantly decrease the value of high-leverage relievers when judged solely by WPA. But these are all adjustments that we already make in WAR and are commonly accepted. By using win probability above replacement, we’re still giving bullpen aces lots of credit for their higher-leverage performances, just not as much as raw WPA claims.

### The New Stuff: Converting WPA to WPAAR

So, what’s the solution? I’m going to call it Win Probability Added Above Replacement, and calculate it using the 2016 versions of Zach Britton and Jon Lester (the top starter by WPA) as examples. The two main adjustments are for bullpen chaining and differing replacement levels of starters and relievers.

- Start with WPA. For Britton, this is +6.1. For Lester, this is +4.6. Because the former is a higher number than the latter, many people make claims like “WPA says Zach Britton was more valuable than Jon Lester.” The purpose of this article has been to highlight the context missing in that interpretation of the two numbers. I’d go so far as to say it’s plain wrong.
- Adjust pLI (leverage index) halfway toward 1. This is the bullpen chaining adjustment. For Britton, a pLI of 1.8 becomes 1.4. For Lester, .94 stays at .94, since he’s a starter. WPA is giving Britton full credit for the situations he pitched in, when he really only deserves half.
- Move WPA toward average (zero) by the ratio of LI_adj/LI. For Britton, that ratio is 1.4/1.8 = 78%, and 78% * 6.1 = 4.8. For Jon Lester, no change from +4.6. Because Britton only deserves half credit for the high-leverage situations he finds himself in, his WPA is adjusted down.
- Credit the player for the value of league-average performance over replacement level. For starters, that’s about 2 wins per 180 innings. So Jon Lester gains 2*202/180 = 2.2, for a total of 6.8 WPAAR. But since reliever replacement level is approximately league average, there’s no extra credit for Britton. He stays at +4.8.
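The four steps above can be sketched as a single function. Treat this as a rough sketch under the article's stated assumptions (half credit for leverage, 2 wins per 180 innings for starters, nothing extra for relievers); the function and parameter names are mine, and the inputs are the rounded figures quoted in the text. With a pLI rounded to 1.8, Britton lands near 4.7 rather than the 4.8 shown above, presumably because the published inputs are rounded.

```python
def wpaar(wpa, pli, innings, is_starter, starter_wins_per_180=2.0):
    """Convert WPA to WPA Above Replacement following the steps in the list.

    Relievers: pLI moves halfway toward 1 (bullpen chaining) and WPA is
    scaled by adjusted pLI / pLI; no replacement-level credit is added.
    Starters: WPA is left alone and league-average-over-replacement value
    is credited in proportion to innings pitched.
    """
    if is_starter:
        chained_wpa = wpa
        replacement_credit = starter_wins_per_180 * innings / 180.0
    else:
        adjusted_pli = (pli + 1.0) / 2.0
        chained_wpa = wpa * adjusted_pli / pli
        replacement_credit = 0.0
    return chained_wpa + replacement_credit


lester = wpaar(wpa=4.6, pli=0.94, innings=202, is_starter=True)
britton = wpaar(wpa=6.1, pli=1.8, innings=67, is_starter=False)
print(round(lester, 1), round(britton, 1))
```

Note that a reliever with a pLI below 1 gets scaled up rather than down, since moving his leverage halfway toward 1 raises it.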

In total, Jon Lester gains 2.2 wins due to replacement level, while Britton loses 1.4 wins due to replacement level and chaining. Britton’s 1.5 win lead in WPA over Lester becomes a 2.0 win deficit in WPAAR. Here’s the top 25 leaderboard from 2016.

Name | Team | IP | WPA | WPAAR | delta |
---|---|---|---|---|---|
Jon Lester | CHC | 202 | 4.6 | 6.8 | 2.2 |
Johnny Cueto | SF | 219 | 3.8 | 6.2 | 2.4 |
Max Scherzer | WAS | 228 | 3.6 | 6.1 | 2.5 |
Kyle Hendricks | CHC | 190 | 3.9 | 6.0 | 2.1 |
Justin Verlander | DET | 227 | 3.5 | 6.0 | 2.5 |
Clayton Kershaw | LAD | 149 | 4.2 | 5.8 | 1.7 |
Jose Fernandez | MIA | 182 | 3.2 | 5.2 | 2.0 |
Tanner Roark | WAS | 210 | 2.9 | 5.2 | 2.3 |
Aaron Sanchez | TOR | 192 | 2.9 | 5.1 | 2.1 |
Masahiro Tanaka | NYY | 199 | 2.6 | 4.8 | 2.2 |
Chris Sale | CHW | 226 | 2.3 | 4.8 | 2.5 |
Zach Britton | BAL | 67 | 6.1 | 4.8 | -1.4 |
Jose Quintana | CHW | 208 | 2.4 | 4.7 | 2.3 |
Madison Bumgarner | SF | 226 | 1.9 | 4.4 | 2.5 |
J.A. Happ | TOR | 195 | 2.2 | 4.4 | 2.2 |
Rick Porcello | BOS | 223 | 1.8 | 4.3 | 2.5 |
Noah Syndergaard | NYM | 183 | 2.1 | 4.1 | 2.0 |
Corey Kluber | CLE | 215 | 1.7 | 4.0 | 2.4 |
Cole Hamels | TEX | 200 | 1.8 | 4.0 | 2.2 |
Julio Teheran | ATL | 188 | 1.9 | 3.9 | 2.1 |
Andrew Miller | – – – | 74 | 4.8 | 3.8 | -1.0 |
Marco Estrada | TOR | 176 | 1.8 | 3.8 | 2.0 |
Jake Arrieta | CHC | 197 | 1.6 | 3.8 | 2.2 |
Carlos Martinez | STL | 195 | 1.6 | 3.7 | 2.2 |
Rich Hill | – – – | 110 | 2.4 | 3.6 | 1.2 |

If you want to see the whole list, which displays more of the data, see the 2016 WPAAR Final Results listed in the references.

### Additional Notes

Now, I don’t actually suggest using WPA for starting pitchers, as their leverage is heavily dependent on run support and timing of the runs scored in the game, which are clearly not pitching skills (for more on not using WPA for starting pitchers, read these three pieces at *The Book* blog). A better approach is to use a different, more traditional WAR metric for starting pitchers, even if you want to compare them to the WPAAR numbers of relievers. If we remove starting pitchers from the WPAAR leaderboard above, here’s how relievers stack up:

Name | Team | IP | WPA | WPAAR | delta |
---|---|---|---|---|---|
Zach Britton | BAL | 67 | 6.1 | 4.8 | -1.4 |
Andrew Miller | – – – | 74 | 4.8 | 3.8 | -1.0 |
Sam Dyson | TEX | 70 | 3.6 | 2.6 | -0.9 |
Dan Otero | CLE | 70 | 2.1 | 2.5 | 0.4 |
Mark Melancon | – – – | 71 | 3.1 | 2.4 | -0.7 |
Jeremy Jeffress | – – – | 58 | 2.9 | 2.3 | -0.6 |
Roberto Osuna | TOR | 74 | 2.8 | 2.2 | -0.5 |
Aroldis Chapman | – – – | 58 | 2.7 | 2.2 | -0.5 |
Robbie Ross Jr. | BOS | 55 | 1.8 | 2.0 | 0.2 |
Will Harris | HOU | 64 | 2.3 | 1.9 | -0.4 |
Mychal Givens | BAL | 74 | 1.9 | 1.9 | -0.1 |
Seung Hwan Oh | STL | 79 | 2.2 | 1.8 | -0.4 |
Joe Blanton | LAD | 80 | 1.9 | 1.7 | -0.1 |
Blake Treinen | WAS | 67 | 1.7 | 1.7 | 0.0 |
Cody Allen | CLE | 68 | 2.1 | 1.7 | -0.4 |
A.J. Ramos | MIA | 64 | 2.1 | 1.6 | -0.5 |
Mauricio Cabrera | ATL | 38 | 1.9 | 1.6 | -0.3 |
Ryan Buchter | SD | 63 | 1.7 | 1.6 | 0.0 |
Addison Reed | NYM | 77 | 1.8 | 1.5 | -0.2 |
Brad Hand | SD | 89 | 1.7 | 1.5 | -0.2 |
Tyler Lyons | STL | 48 | 1.1 | 1.5 | 0.3 |
Kenley Jansen | LAD | 68 | 1.8 | 1.5 | -0.3 |
Kelvin Herrera | KC | 72 | 1.7 | 1.5 | -0.3 |
Nate Jones | CHW | 70 | 1.9 | 1.5 | -0.4 |
Tyler Thornburg | MIL | 67 | 1.9 | 1.4 | -0.4 |
Matt Bush | TEX | 61 | 1.6 | 1.4 | -0.2 |
Peter Moylan | KC | 44 | 1.1 | 1.4 | 0.3 |
Jeurys Familia | NYM | 77 | 1.8 | 1.4 | -0.5 |

Additionally, WPA does a poor job of parsing defensive credit between pitching and fielding (as in, it doesn’t do it). A fielder making a great play is credited to the pitcher under WPA, when really the pitcher should be held accountable for the quality of the batted balls he gave up, while the fielder is credited or debited value from that point depending on whether he makes the play. With the growing popularity and availability of Statcast data, this splitting of WPA credit between pitchers and fielders might be possible.

### Conclusion

After adjusting WPA to account for replacement level and bullpen chaining, Zach Britton remains one of the top five most valuable pitchers in the American League in 2016, and only Justin Verlander is significantly ahead of him. But the lead he held in WPA has disappeared. WPA is a fine metric, but it’s incomplete. You can’t forget replacement level and all of its repercussions. With WPAAR, I think we have a metric that is more closely aligned with a pitcher’s true value.

### References & Resources

- 2016 WPAAR Final Results
- Brandon Heipp, Walk Like A Sabermetrician, “Runs Per Win From Pythagenpat”
- Dave Studeman, The Hardball Times, “Postseason probability added”
- Sky Andrecheck, The Baseball Analysts, “Championship WPA: What Portion of a Title Did A Player Contribute?”
- Dan Hirsch, The Hardball Times, “The Top 10 Plays of 2016 According to Championship WPA”
- Sky Kalkman, Beyond the Box Score, “Bullpen Chaining and Reliever WAR”
- Tom Tango, *The Book* Blog, “Understanding WPA for starting pitchers”
- Tom Tango, *The Book* Blog, “Goodbye to you, WPA for starting pitchers”
- Tom Tango, *The Book* Blog, “Death knell of WPA for starting pitchers”


Right. Number of outs left is another way of saying that leverage increases during the game, as measured by leverage index. I think the author is assuming people understand what leverage index is and how it works.

In point #2, you’re right that it’s the inverse; I misread what you wrote. But that just means we’re in agreement, I think. Do you think I implied the other way somewhere?

And there’s leverage considerations for hitters, too! Middle-of-the-order lineup spots tend to hit in slightly higher leverage situations, because there are likely to be more runners on base in front of them. WAR doesn’t give them this leverage credit. Should it? I’d argue yes, because just like a dominant reliever deserves to pitch in higher leverage situations, a top hitter deserves to hit in the middle of the lineup. Should it give them all of the leverage credit? Probably not. How much credit, then? Uhh, err, I’ll get back to you.

Just use RE24 for batters! That’s my preference, anyway.

Now we’re starting down a slippery slope. Players on teams with great offense also hit with runners on base more often, on average, than players on lesser offensive teams [think Josh Donaldson vs Mike Trout, 2015]. That is probably going to translate into higher average leverage, though it depends on the average runs allowed by the team, too. In this case, higher leverage production clearly correlates with Runs and RBI, context-dependent stats that I thought no saber would be caught dead praising.

Studes, for your catcher chaining example, I assume we should be using the value of the average backup catcher in the league for the replacement-level calculation, correct?

Peter, I’m multiplying WPA by [(LI+1)/2]/LI. If LI is 1.8, that’s [(1.8+1)/2]/1.8 = 1.4/1.8 = .78. This ratio is less than 1 whenever LI is above 1, shifting WPA towards zero. The reasoning is that while WPA correctly values the change in WP for any event, relievers don’t deserve full leverage when determining their value to the team, since their replacements would also be “given” those leverage opportunities. (It’s the same logic as the LI adjustment in reliever WAR.) Think of it as WPA/LI, but instead of removing all the leverage of each PA, just a bit is removed.

Thanks for the explanation Sky. I knew you were multiplying by (LI+1)/2, but I had missed that you were then dividing the result by LI.

Thanks Peter. To me, WAR is a descriptive stat, which is why I’m all in on RE24. On Twitter, Sky asked me if I like RE24/boLI. That’s one way of correcting for variations. Unfortunately, my memory doesn’t work so well. I remember that I wasn’t a fan of boLI, but I can’t remember why. Getting old sucks.

I haven’t fully digested the theory, and I tend to doubt there can be one metric that lets RPs be gauged against both RPs and SPs. But I admire the effort, and the clear explanation. Thanks, Sky!
