# 10 Lessons I Have Learned about Win Probability Added

*Editor’s Note: This is the second post of “10 Lessons Week!” For more info, click here.*

Ten years ago, I wrote an article called The One About Win Probability (*Friends* was a big hit on TV at the time), and it has turned out to be the most-read article in THT’s history. I’ve written about Win Probability many times since, most notably every year in the *Hardball Times Annual*, and I’ve learned a few things about WPA along the way. Allow me to share.

### Lesson #1: Win Probability Graphs are Cool

*Win Probability graph: Angels vs. Nationals, April 23. Source: FanGraphs*

The box score is perhaps the most beautiful display of data in the known universe. It contains so much terrific information in a compact space. But it lacks one thing: the dynamic of the game. Box score information is static, but games happen in real time.

Exhibiting runs scored by inning helps capture the dynamic of individual innings, and Game-Winning RBIs helped capture the dynamic of individual hits (which, I think, was the primary appeal of that otherwise irrelevant stat). But WPA graphs are much better at capturing the dynamics of the total game. They are a natural accompaniment to the box score.

The above graph is from a game played on April 23 this year, when the Angels had a 4-1 lead on the Nats in the ninth inning. Their biggest hit had been Albert Pujols’ double in the top of the sixth. But thanks primarily to Jayson Werth’s double and then Adam LaRoche’s single, the Nats staged a statistically improbable comeback in the ninth to win the game, 5-4.

Everything I just told you came from looking at the graph. If you hold your mouse over any point on the graph, you’ll see much more: exactly what happened on each play and how much it affected the game. If you had perused just the box score, you might have missed what made this game special.

### Lesson #2: Win Probability is the Story Stat

Win Probability is the ultimate quantification of a game. It captures the specifics of each situation—the score, inning, base situation and number of outs—and reduces them all into one number, the likelihood that one team or the other is going to win the game.

It’s important to remember that Win Probability isn’t a forecast. It’s a measure by which to judge the outcome of each play and the game. WP assumes that every play after the one in question will be average. Each player will be average, and each team will be average. In that way, we don’t prejudge a play’s impact.

When you look at the change in Win Probability from one play to another—which we call Win Probability Added, or WPA—you quantify how important that particular play was in the story of the game. This accounts for the swings on the graph.

If you sum up a player’s WPA in a single game, you get a sense of how involved he was in the key plays of the game. And if you want to rank games by sheer excitement, you simply add up the absolute values of all the swings in Win Probability throughout the game.
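Both sums are easy to compute once you have the Win Probability after each play. Here is a minimal Python sketch, assuming a hypothetical list of home-team Win Probabilities, one per play, starting from the pregame 0.50; the function names and numbers are illustrative, not FanGraphs’ code or data.

```python
def play_wpa(wp_series):
    """WPA of each play from the home team's view: WP after the play minus WP before it."""
    return [after - before for before, after in zip(wp_series, wp_series[1:])]


def game_excitement(wp_series):
    """Sum of the absolute Win Probability swings across the whole game."""
    return sum(abs(delta) for delta in play_wpa(wp_series))


# Hypothetical home-team Win Probability after each play (0.50 before the first pitch).
wp = [0.50, 0.55, 0.40, 0.35, 0.70, 1.00]

print(play_wpa(wp))         # per-play WPA; the visiting team's WPA is the negative
print(game_excitement(wp))  # total excitement score for the game
```

A player’s game WPA is then just the sum of `play_wpa` over the plays he was involved in, with the sign flipped for the visiting team.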

For the past two years, I’ve published a list of the most watchable games of the season by combining WPA and LI (read more about LI below) into a ranking system. WPA makes it easy to decide how to spend your time watching MLB.tv in the offseason.

Shane Tourtellotte has developed his own take on the system, called Win Percentage Sum. Shane uses WPA to emphasize the top plays as well as the last play of the game. There are a lot of ways you can go at the task. The key thing is that WPA gives us a tool to measure the excitement, to measure the story, of the game.

### Lesson #3: Win Probability is Just an Extension of the Run Expectancy Chart

Most baseball analysts have no problem using run expectancy charts to measure run contributions, but many seem to have a problem with the Win Probability approach. Let’s discuss.

I recently wrote about wOBA and how linear weights have become the fundamental methodology behind many, if not most, of our most advanced stats. As you may know, linear weights are an extension of the run expectancy chart, as Tom Ruane most memorably recorded at Retrosheet.

For reference, I’m putting Tom’s run expectancy chart for the 1992 American League here. These are the number of runs the average 1992 team scored in an inning after reaching each specific situation:

Tom Ruane’s 1992 American League Run Expectancy Chart

| Men On (1st, 2nd, 3rd) | 0 Outs | 1 Out | 2 Outs |
|---|---|---|---|
| `---` | 0.482 | 0.258 | 0.096 |
| `x--` | 0.853 | 0.510 | 0.211 |
| `-x-` | 1.095 | 0.646 | 0.293 |
| `xx-` | 1.494 | 0.907 | 0.423 |
| `--x` | 1.356 | 0.940 | 0.377 |
| `x-x` | 1.804 | 1.151 | 0.470 |
| `-xx` | 2.169 | 1.418 | 0.598 |
| `xxx` | 2.429 | 1.549 | 0.745 |

There are two elements to the run expectancy chart, the runners on base and the number of outs that have been recorded. You can think of the number of outs as an inning timer—when there are two outs in an inning, time is running out and the impact of events is heightened.

For instance, a single that scores a runner from second is worth 0.758 runs with none out (1.000 plus 0.853 minus 1.095) and 0.918 runs with two outs (1.000 plus 0.211 minus 0.293). The fact that time is running out in the inning makes the event worth 21 percent more.
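The arithmetic above is runs scored, plus the run expectancy after the play, minus the run expectancy before it. A small Python sketch using the 1992 AL values from Tom’s chart; the base-state strings (`x` = runner, `-` = empty, in first-second-third order) are an encoding chosen for this example, and the example assumes the batter stops at first while the runner from second scores.

```python
# Tom Ruane's 1992 AL run expectancy, indexed by base state, then by outs (0, 1, 2).
RE_1992_AL = {
    "---": (0.482, 0.258, 0.096),
    "x--": (0.853, 0.510, 0.211),
    "-x-": (1.095, 0.646, 0.293),
    "xx-": (1.494, 0.907, 0.423),
    "--x": (1.356, 0.940, 0.377),
    "x-x": (1.804, 1.151, 0.470),
    "-xx": (2.169, 1.418, 0.598),
    "xxx": (2.429, 1.549, 0.745),
}


def run_value(bases_before, outs_before, bases_after, outs_after, runs_scored):
    """Runs scored plus the change in run expectancy caused by one play."""
    re_before = RE_1992_AL[bases_before][outs_before]
    # Once the third out is recorded the inning is over, so no further runs are expected.
    re_after = 0.0 if outs_after >= 3 else RE_1992_AL[bases_after][outs_after]
    return runs_scored + re_after - re_before


none_out = run_value("-x-", 0, "x--", 0, 1)  # 1.000 + 0.853 - 1.095
two_out = run_value("-x-", 2, "x--", 2, 1)   # 1.000 + 0.211 - 0.293
print(round(none_out, 3), round(two_out, 3), round(two_out / none_out - 1, 2))
```

This reproduces the 0.758 and 0.918 figures and the roughly 21 percent difference between them.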

Win Probability tables add two more elements to this logic: the score of the game (more precisely, the run differential between the two teams) and the inning (the game timer). In fact, it’s relatively easy to construct your own Win Probability tables from Run Expectancy data. I did so in an Excel spreadsheet that I published eight years ago. (Man, I feel old.)
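To see how the inning and the score enter the picture, here is a deliberately simplified Python sketch, not the spreadsheet itself. It estimates the home team’s Win Probability only at the start of a half-inning (ignoring bases and outs) and assumes both teams are average, sharing one hypothetical distribution of runs per half-inning. Under that symmetry a game tied after nine innings is a coin flip, and walk-offs or a skipped bottom of the ninth don’t change who wins, so neither needs special handling.

```python
from collections import defaultdict

# Hypothetical chance that an average team scores 0, 1, 2, ... runs in a half-inning.
# Illustrative numbers only (they sum to 1), not real league data.
RUNS_DIST = [0.73, 0.15, 0.07, 0.03, 0.01, 0.01]


def _add_half_inning(dist, sign):
    """Convolve a run-differential distribution with one more half-inning of scoring.

    sign=+1 when the home team bats, -1 when the visitors bat.
    """
    new = defaultdict(float)
    for diff, p in dist.items():
        for runs, q in enumerate(RUNS_DIST):
            new[diff + sign * runs] += p * q
    return new


def win_prob_home(inning, top, diff, innings=9):
    """Home team's Win Probability at the start of a half-inning.

    inning: 1..innings; top: True if the visitors are about to bat;
    diff: home runs minus visiting runs at that moment.
    """
    home_left = innings - inning + 1
    away_left = home_left if top else home_left - 1
    dist = {diff: 1.0}
    for _ in range(home_left):
        dist = _add_half_inning(dist, +1)
    for _ in range(away_left):
        dist = _add_half_inning(dist, -1)
    lead = sum(p for d, p in dist.items() if d > 0)
    # A tie after regulation is treated as 50/50, exact when both teams share RUNS_DIST.
    return lead + 0.5 * dist.get(0, 0.0)


# The game timer at work: a one-run lead is worth far more late than early.
early = win_prob_home(2, True, 1) - win_prob_home(2, True, 0)
late = win_prob_home(9, True, 1) - win_prob_home(9, True, 0)
print(round(early, 3), round(late, 3))
```

A full table would repeat this for every base-out state, which requires the distribution of runs scored from each state rather than a single per-inning distribution.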

A single that scores a runner on second with two out in the bottom of the first of a tie game is worth 0.09 WPA points, roughly a tenth of the difference between a win and a loss. The same hit in the bottom of the ninth is worth 0.39 WPA, an increase of 333 percent over the first-inning hit. When you talk about WPA, the game timer has a big impact.

A lot of people are uncomfortable with this aspect of WPA. A 333-percent jump is just too big a number to be reasonable; it takes more than four hits in the first inning to equal the impact of the same type of hit in the ninth. Even though the logic is similar to that of Run Expectancy, the Win Probability impact just feels too extreme.

I understand why people feel this way. Because of the extreme nature of WPA and the fact that opportunities aren’t evenly divided among players, WPA is not the best stat for rating players. However, it seems to me that ranking games by WPA, or using it to quantify key plays or even players within a game, is a legitimate use of the stat. If you remember that WPA measures the story and not the value, you’ll be on solid ground.

### Lesson #4: The Difference in Impact Based on Game Run Differential

The most common complaint about WPA is that, once the game is over, it doesn’t matter when the run scored. A single run scored in the first inning of a 1-0 game yields the same result as a single run in the ninth inning of a 1-0 game.

Okay, but what about the difference between a 1-0 game and a 10-0 game? Are you willing to say that a home run in a 1-0 game is worth more than a home run in a 10-0 game? After all, WPA captures this difference, too. Perhaps you want to capture this situational aspect of a player’s contribution, but you feel that WPA has too much other baggage. If so, I have a solution for you.

I first introduced this concept in an article I called Long Live Baseball Analysis. As you might expect, WPA shows that events in close games have a bigger impact than those in not-so-close games. Here is a table of how the impact of an event varies by the final score of the game. (The data are from 2006, but other years provide similar results.)

Relative WPA Impact of Events by Final Margin of Game

| Margin (runs) | Impact |
|---|---|
| 1 | 1.38 |
| 2 | 1.13 |
| 3 | 0.97 |
| 4 | 0.86 |
| 5 | 0.76 |
| 6 | 0.66 |
| 7 | 0.63 |
| 8 | 0.57 |
| 9 | 0.51 |
| 10 | 0.47 |

The numbers aren’t as round as I would like, but you can see that a home run in a 1-0 game (or any event in a 1-0 game) is worth nearly three times as much as a home run in a 10-0 game (or any event in a 10-0 game). The ratio of 1.38 to 0.47 is a relative weight you can use to judge such a thing.

I’ve wanted to invent a stat—let’s call it Game-Adjusted wRC—that takes this into account. I haven’t had the time, but the lesson I’ve learned is that there is something here worth capturing. A stat that captures the average impact of a player’s hits in both runs scored and games won can be a viable and useful way of saying that Player A “contributed” more than Player B. And there is a way to develop such a stat that avoids the extreme impacts of WPA.
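For illustration only, here is one hypothetical way such a stat could start: weight each event’s run value by the relative impact for the game’s final margin, using the 2006 figures from the table above. This Python sketch is not a finished stat and not the author’s design. The function name is invented, the 1.4-run home run value is a placeholder, and treating margins above 10 as 10 is an assumption because the table stops there.

```python
# Relative WPA impact of events by final margin of game (2006 data from the table above).
MARGIN_WEIGHT = {1: 1.38, 2: 1.13, 3: 0.97, 4: 0.86, 5: 0.76,
                 6: 0.66, 7: 0.63, 8: 0.57, 9: 0.51, 10: 0.47}


def game_adjusted_runs(events):
    """Hypothetical 'game-adjusted' run total.

    events: iterable of (run_value, final_margin) pairs for one player.
    Margins above 10 are treated as 10 (an assumption; the table stops there).
    """
    total = 0.0
    for run_value, margin in events:
        weight = MARGIN_WEIGHT[min(max(margin, 1), 10)]
        total += run_value * weight
    return total


# The same (placeholder) 1.4-run home run, once in a 1-0 game and once in a 10-0 game.
print(game_adjusted_runs([(1.4, 1)]), game_adjusted_runs([(1.4, 10)]))
print(MARGIN_WEIGHT[1] / MARGIN_WEIGHT[10])  # the "nearly three times" ratio
```

Because the weights come from final margins rather than the inning of each event, this kind of adjustment avoids the large late-inning multipliers built into WPA.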

### Lesson #5: It Defines Critical Situations

Perhaps the most useful thing to come out of Win Probability is Leverage Index. LI is a scale developed by Tom Tango that quantifies the importance of a situation based on the range of potential outcomes from that situation.

Tom introduced his concept in a three-part series here at THT:

The key to Leverage Index is to measure the range of potential outcomes from any single plate appearance. The wider the range of potential outcomes, the more critical the plate appearance is. It’s a straightforward concept, but the math is tricky. Most of us rely on Tom’s numbers to look up the Leverage Index of a situation.

However, I can tell you a shortcut. You can mimic the relative scale of Leverage Index by calculating the difference in Win Probability between the current situation and the situation that would result after a strikeout. The only exception is with a runner on third and fewer than two outs, for some fairly obvious reasons (it’s the same reason we have the sacrifice fly).
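A minimal Python sketch of that shortcut, assuming you already have Win Probability values: take the drop a strikeout would cause, then divide by the average drop so the scale centers on 1.0, as Leverage Index does. The situations and probabilities are made-up examples. Averaging over situations with equal weight is a simplification; a real index would weight each situation by how often it occurs.

```python
def li_proxy(situations):
    """Approximate relative leverage from strikeout-induced Win Probability drops.

    situations: mapping of label -> (wp_now, wp_after_strikeout), both from the
    batting team's perspective. Per the shortcut's stated exception, leave out
    runner-on-third, fewer-than-two-out situations.
    """
    drops = {label: abs(now - after_k) for label, (now, after_k) in situations.items()}
    mean_drop = sum(drops.values()) / len(drops)
    return {label: drop / mean_drop for label, drop in drops.items()}


# Hypothetical Win Probabilities, for illustration only.
example = {
    "top 1st, bases empty, 0 out, tie": (0.47, 0.45),
    "bottom 9th, runner on 2nd, 1 out, tie": (0.68, 0.56),
    "top 5th, runner on 1st, 1 out, up 3": (0.80, 0.78),
}
for label, index in li_proxy(example).items():
    print(f"{label}: {index:.2f}")
```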

I sometimes hear people say that WPA doesn’t “work,” that Leverage Index, for instance, doesn’t really measure the most critical situations. I find this hard to believe, so I decided to pull together some data points from the 2012 season. Consider the following…

- If a team got an out in a situation with an LI between 1.5 and 2.0, it eventually won the game 60 percent of the time.
- If a team got an out with an LI between 2.0 and 2.5, it won the game 62 percent of the time.
- If it got an out with an LI between 2.5 and 3.0, it won 68 percent of the time.
- If it got an out with an LI between 3.0 and 4.0, it won 69 percent of the time.
- If it got an out with an LI between 4.0 and 5.0, it won 71 percent of the time.
- If it got an out with an LI over 5.0, it won 76 percent of the time.
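If you have play-by-play data with the Leverage Index of each out and the eventual winner, the grouping in the list above takes only a few lines. A minimal Python sketch; the record format is an assumption, and because the list doesn’t say how boundary values were assigned, each range here includes its lower bound.

```python
import math

# LI ranges used in the list above; each range includes its lower bound.
LI_BINS = [(1.5, 2.0), (2.0, 2.5), (2.5, 3.0), (3.0, 4.0), (4.0, 5.0), (5.0, math.inf)]


def win_rate_by_li(records):
    """records: iterable of (li, fielding_team_won) for plays that ended in an out.

    Returns {bin: share of those plays whose fielding team went on to win}.
    Plays with LI below 1.5 are ignored, as in the list above.
    """
    tallies = {b: [0, 0] for b in LI_BINS}  # bin -> [wins, plays]
    for li, won in records:
        for low, high in LI_BINS:
            if low <= li < high:
                tallies[(low, high)][0] += int(won)
                tallies[(low, high)][1] += 1
                break
    return {b: wins / plays for b, (wins, plays) in tallies.items() if plays}


# Hypothetical play records, for illustration only.
sample = [(1.6, True), (1.8, False), (2.2, True), (6.3, True)]
print(win_rate_by_li(sample))
```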

The more critical the situation, the more impact its outcome had on the outcome of the game. I don’t know how to make the case more strongly: Leverage Index does a very good job of measuring the criticality of a situation. There may be better stats for this purpose, but I haven’t seen one yet.

One more thing: Leverage Index is a very good tool for determining optimal bullpen usage. In general, you want your best pitcher on the mound when the game is most critical. Now, LI doesn’t measure everything related to bullpen usage. The strength of the opposing batters (and of those coming later) is key, as is the lefty/righty match-up and the status of the pitcher. (Is he rested? Has he warmed up sufficiently?) But it’s a natural starting place, a foundation for a decision.

### Lesson #6: It’s Great for Rating Bullpens

A natural use of WPA and Leverage Index, and probably the most accepted use of WPA in general, is evaluating bullpens. WPA, with its heightened focus on close games and late innings, is a valid way to measure the contribution of a reliever. We may shy away from ranking batters based on WPA, and there is a definite difference between starting pitchers and relief pitchers that negates direct WPA comparisons. But WPA gets to the nub of the reliever’s job: perform well in high-leverage situations.

The 2012 Orioles bullpen was the best of all time, as measured by WPA. Their relievers’ total WPA was 13.9, far ahead of the No. 2 team, the Detroit Tigers of 1984 (which featured AL MVP Willie Hernandez). I wrote about the Orioles’ bullpen in detail in the 2013 *Annual*, and every aspect of its performance was tremendous. Baltimore’s relief men stood out in most of the ways a bullpen can rack up high WPA points:

- The Orioles’ relievers pitched more innings than most other teams’ in 2012 (their innings pitched was the fourth-highest figure in the majors).
- They pitched very well in general. Their WPA/LI (more about WPA/LI below) was second-highest in the majors.
- Their overall Leverage Index was higher than average, though not outrageously so.
- Most importantly, they “pitched to the situation.” They were at their best when Leverage Index was highest.

It’s funny how things change. In 2013, the Orioles’ bullpen WPA was 0.42, 21st in the major leagues. Why? Well, take a look at their performance in their 10 most critical appearances of the year. In seven of the 10 appearances, the Oriole reliever posted a negative WPA total.

Top 10 Most Critical 2013 Orioles Bullpen Appearances

| Game Date | Pitcher | Max LI | WPA |
|---|---|---|---|
| 2013-07-05 | Jim Johnson | 9.0 | -0.805 |
| 2013-05-26 | Tommy Hunter | 7.8 | 0.043 |
| 2013-05-18 | Jim Johnson | 7.5 | -0.820 |
| 2013-05-26 | Jim Johnson | 7.4 | -0.955 |
| 2013-08-14 | Jim Johnson | 7.1 | -0.313 |
| 2013-04-24 | Jim Johnson | 6.9 | -0.279 |
| 2013-08-09 | Jim Johnson | 6.4 | -0.329 |
| 2013-09-20 | T.J. McFarland | 6.4 | 0.249 |
| 2013-09-29 | Jim Johnson | 5.8 | 0.077 |
| 2013-06-10 | Tommy Hunter | 5.5 | -0.026 |

The Orioles had the guy they thought was their best reliever (Jim Johnson) on the mound in most of their high-leverage situations last year. Unfortunately, Johnson was not as brilliant as he had been in 2012. Truth be told, most of the bullpen wasn’t.

By the way, WPA provides ways to measure a reliever’s performance and contribution that are far better than saves and holds. In addition to WPA and LI, FanGraphs carries the number of each pitcher’s shutdowns (any time a pitcher adds at least .06 WPA points to his team) and meltdowns (any time a pitcher loses at least .06 WPA points for his team). I’m looking forward to the day when we quote shutdowns instead of saves.
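Shutdowns and meltdowns are simple threshold tests on each appearance’s WPA. A short Python sketch applying the 0.06 threshold described above to the ten Orioles appearances in the table; the function and variable names are illustrative.

```python
from collections import Counter

# (date, pitcher, max LI, WPA) for the ten appearances in the table above.
APPEARANCES = [
    ("2013-07-05", "Jim Johnson", 9.0, -0.805),
    ("2013-05-26", "Tommy Hunter", 7.8, 0.043),
    ("2013-05-18", "Jim Johnson", 7.5, -0.820),
    ("2013-05-26", "Jim Johnson", 7.4, -0.955),
    ("2013-08-14", "Jim Johnson", 7.1, -0.313),
    ("2013-04-24", "Jim Johnson", 6.9, -0.279),
    ("2013-08-09", "Jim Johnson", 6.4, -0.329),
    ("2013-09-20", "T.J. McFarland", 6.4, 0.249),
    ("2013-09-29", "Jim Johnson", 5.8, 0.077),
    ("2013-06-10", "Tommy Hunter", 5.5, -0.026),
]


def classify(wpa, threshold=0.06):
    """Shutdown if the reliever added at least `threshold` WPA, meltdown if he lost at least that much."""
    if wpa >= threshold:
        return "shutdown"
    if wpa <= -threshold:
        return "meltdown"
    return "neither"


tally = Counter(classify(row[3]) for row in APPEARANCES)
print(tally)
```

Six of those ten critical appearances were meltdowns and only two were shutdowns.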

### Lesson #7: It Identifies Clutch Performances

Does clutch hitting exist? Well, that is a question sure to make any sabermetrician spin in his computer desk chair. But at least WPA gives us a way to measure clutch performance. By comparing a player’s overall WPA, scaled by his average Leverage Index, to his WPA/LI (each play’s WPA divided by that play’s LI), we can measure how well he batted in high-leverage situations compared to how well we would have expected him to bat.
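A minimal Python sketch of that comparison, following FanGraphs’ Clutch definition: WPA divided by pLI (the batter’s average Leverage Index), minus WPA/LI. Treat the exact formula as an assumption to check against the FanGraphs glossary. The play lists are hypothetical.

```python
def clutch(plays):
    """Clutch score for one batter.

    plays: list of (wpa, li) pairs, one per plate appearance.
    WPA/pLI scales his total WPA by his average leverage; WPA/LI scales each
    play by its own leverage. The gap is positive when his best results came
    in his highest-leverage plate appearances.
    """
    total_wpa = sum(wpa for wpa, _ in plays)
    p_li = sum(li for _, li in plays) / len(plays)
    wpa_li = sum(wpa / li for wpa, li in plays)
    return total_wpa / p_li - wpa_li


# Hypothetical batters: one who did his best work in high leverage, one with flat leverage.
print(clutch([(0.10, 2.0), (-0.01, 0.5)]))
print(clutch([(0.05, 1.5), (-0.02, 1.5)]))
```

When every plate appearance has the same leverage, the two terms are identical and Clutch is zero, which is why the stat isolates the timing of a batter’s success rather than its amount.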

The Clutchiest batters of the past 10 years have been:

Clutch Leaders, 2004-2013

| Batter | Clutch |
|---|---|
| Willie Bloomquist | 6.2 |
| Yadier Molina | 5.6 |
| Jimmy Rollins | 5.2 |
| Marcus Giles | 4.7 |
| Ryan Howard | 4.5 |

Yeah, Willie Bloomquist has risen to the occasion the most in the past 10 years. Now, this doesn’t mean that Bloomquist is the No. 1 guy you want at the plate in a clutch situation. Even clutch Bloomquist is worse than a lot of other batters. You would probably opt for Ryan Howard, Jimmy Rollins or just about anybody else on this list. It does mean that, somehow, Bloomquist has managed to exceed expectations when the game is in critical condition.

Said differently, Bloomquist has managed to positively insert himself into the game story more often than you’d expect based on his nominal stats. He may be overhyped in some other ways, but in this way Bloomquist deserves our respect.

### Lesson #8: You Can Apply WPA Logic to a Season

Having called Win Probability a story stat, I should say that it is really more of a method (methodology?) than a stat. What’s more, it’s a method that you can apply to other facets of a baseball season.

For instance, if we say that plays towards the end of an inning are more important than those at the beginning, and innings at the end of a close game are more important than those at the beginning, how about late games in a season for a contending team?

Independent of each other, Sky Andrecheck and I asked ourselves this very question a few years ago. I developed a system called Drama Index, and he called his system Championship Leverage Index, but they amount to the same thing.

We both looked at each team’s probability of making the postseason at each point of the season based on the standings before each game, as well as the number of games left on each team’s schedule. We used these two figures to calculate a Season “Leverage Index,” in which each game is rated in terms of its criticality.
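Here is a deliberately simplified Python sketch of the idea, not the actual Drama Index or Championship Leverage Index code. It assumes a team reaches the postseason by winning a fixed number of its remaining games, each a coin flip, and measures what rides on the next game as the postseason probability after a win minus the postseason probability after a loss. Dividing that swing by the swing of an average game would turn it into a leverage-style index. The races in the example are hypothetical.

```python
from math import comb


def p_reach(wins_needed, games_left, p=0.5):
    """Chance of winning at least `wins_needed` of the remaining `games_left` games."""
    if wins_needed <= 0:
        return 1.0
    if wins_needed > games_left:
        return 0.0
    return sum(comb(games_left, k) * p**k * (1 - p) ** (games_left - k)
               for k in range(wins_needed, games_left + 1))


def game_stakes(wins_needed, games_left, p=0.5):
    """Postseason probability riding on the next game: P(in | win) - P(in | lose)."""
    if_win = p_reach(wins_needed - 1, games_left - 1, p)
    if_lose = p_reach(wins_needed, games_left - 1, p)
    return if_win - if_lose


# Hypothetical races: needing 10 of the last 20, versus needing to win the final game.
print(round(game_stakes(10, 20), 3), game_stakes(1, 1))
```

Clinched and eliminated teams both get stakes of zero, while a must-win final game carries the whole postseason berth, which mirrors the late-season spikes described here.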

There probably has been no better season for Championship LI in recent memory than 2011, when four critical games were played on the last day of the season … and three of those games were WPA humdingers.

Tampa Bay played in one of those games. The Rays had been in the pennant race most of the year, with a CLI between 1.0 and 2.0 for most of the season, until they seemingly dropped out of contention in August. They managed to scramble back into things in September, however, and posted a CLI of 9.7 on the last day. That game was nearly 10 times more critical than an average major league game.

Tampa Bay wound up with 192 CLI points in 2011, the third-highest total in the American League. The Rangers (who made the postseason) and the Angels (who didn’t) were the only two teams with higher scores.

Calculating Championship Leverage Index is a challenge, and MLB’s recent changes to Wild Card qualification have made our old code obsolete. As a result, we haven’t published any results in a couple of years. It would be fun to make this available again.

### Lesson #9: You Can Apply WPA Logic Even to the Postseason

I have a lot of fun applying WPA logic to the postseason. It seems to me that the postseason is more about the story than the stats, which makes WPA a natural fit. In fact, one of the early uses of WPA was an article by Jay Bennett, who used it to make a case that Joe Jackson didn’t throw the 1919 World Series.

What’s more, the postseason offers a pretty easy way to describe how the WPA method works. It all comes down to the range of possible outcomes of a game (i.e. a win or loss). When two teams play a seventh game of a World Series, there is one full world championship at stake. Think of the possible outcomes: One team wins and is 1-0 in championships; the other team loses and is 0-1 in championships. One minus zero is one, so we give the seventh game a criticality value of one.

When two teams play the sixth game of a World Series, there are two possible outcomes. The team that is ahead in the series might win, which would give it a championship. Or it could lose, which would result in a seventh game … which the team has a 50 percent chance of winning. The difference is one minus 0.5, or 0.5.

Conversely, the trailing team could win, which would give it a 50 percent probability of winning the seventh game, or could lose and lose the championship overall. The difference is 0.5 minus zero, or 0.5, same as the leading team.

The sixth game is half as critical as the seventh.

In all cases, a postseason game is equally critical to the leading and trailing team.

You can use this approach for all games in a series. You can even apply it to previous series. For instance, the final game of a League Championship Series will be worth 0.5 world championships, because the winner has a 50 percent probability of winning the World Series while the loser gets zero world championships. So you see that the final game of a League Championship Series is as critical as the sixth game of the World Series.

For your reference, here’s a table of the Championship Leverage Index for all potential games in the postseason.

Championship Leverage Index of each Postseason Game

| Series | 0-0 | 1-0 | 1-1 | 2-0 | 2-1 | 2-2 | 3-0 | 3-1 | 3-2 | 3-3 |
|---|---|---|---|---|---|---|---|---|---|---|
| Wild Card | 0.125 | | | | | | | | | |
| Division Series | 0.094 | 0.094 | 0.125 | 0.063 | 0.125 | 0.250 | | | | |
| Championship Series | 0.156 | 0.156 | 0.188 | 0.125 | 0.188 | 0.250 | 0.063 | 0.125 | 0.250 | 0.500 |
| World Series | 0.313 | 0.313 | 0.375 | 0.250 | 0.375 | 0.500 | 0.125 | 0.250 | 0.500 | 1.000 |
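Every value in this table follows from treating each game, and each later series, as a 50/50 proposition. A minimal Python sketch that reproduces them under that assumption; the function names are illustrative. The series values (0.125, 0.25, 0.5 and 1.0 world championships) come from chaining coin-flip series together as described above.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def p_series(need_a, need_b, p=0.5):
    """Chance that team A wins a series when A needs need_a more wins and B needs need_b."""
    if need_a == 0:
        return 1.0
    if need_b == 0:
        return 0.0
    return p * p_series(need_a - 1, need_b, p) + (1 - p) * p_series(need_a, need_b - 1, p)


# Wins needed to take each series, and what taking it is worth in world
# championships when every later series is a coin flip.
SERIES = {
    "Wild Card": (1, 0.125),
    "Division Series": (3, 0.25),
    "Championship Series": (4, 0.5),
    "World Series": (4, 1.0),
}


def game_cli(to_clinch, lead_wins, trail_wins, series_value):
    """Championship Leverage Index of the next game at a given series score."""
    need_lead, need_trail = to_clinch - lead_wins, to_clinch - trail_wins
    if_win = p_series(need_lead - 1, need_trail)
    if_lose = p_series(need_lead, need_trail - 1)
    return series_value * (if_win - if_lose)


def championship_added(to_clinch, lead_wins, trail_wins, series_value):
    """Championships added when the leading team wins the next game."""
    need_lead, need_trail = to_clinch - lead_wins, to_clinch - trail_wins
    return series_value * (p_series(need_lead - 1, need_trail) - p_series(need_lead, need_trail))


for name, (to_clinch, value) in SERIES.items():
    row = [f"{a}-{b}: {game_cli(to_clinch, a, b, value):g}"
           for a in range(to_clinch) for b in range(a + 1)]
    print(name, " | ".join(row))
```

The `:g` format prints the exact values (0.3125, 0.09375 and so on), which the table shows rounded to three places. With even odds, `championship_added` is always exactly half of `game_cli`.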

Since we’re dealing with just two results here, win or lose (as opposed to WPA, which has to deal with home runs, stolen bases, outs, etc.), it’s fairly straightforward to show the relationship between the Championship Added and the Championship Leverage Index. In fact, they’re (almost) the same thing.

At the beginning of a World Series, we assume that each team has a 50 percent chance of winning the Series. If a team wins the first game, it has a 65.6 percent chance of winning at least three of the next six. But if it loses, it has a 34.4 percent chance of winning at least four of the next six games. This is based on simple probability.

The difference—.656 and .344—is what’s at stake in the first game. .656 minus .344 is .313 (okay, I’m rounding here), which is the championship value of game one. The Championship Added of winning that first game is .656 minus .500, or half of .313: .156 (rounding!). In every case, the Championship Added of winning a game is exactly half of its Championship Leverage Index.

When you multiply each game’s championship value by each play’s WPA, you find some really interesting things. For instance, Hal Smith, not Bill Mazeroski, swatted the biggest home run in Pittsburgh Pirate history; for that matter, in major league history. And Willie McCovey faced Ralph Terry in the most critical at-bat of all time.

There is almost no end to the things you can research and write about using WPA and its offshoots. Baseball history is rich, and there are many stories to tell.

### Lesson #10: WPA/LI Might Be the Ultimate Stat

Think about it for a second. The value of winning a postseason game is always exactly half of its Championship Leverage Index. So when you divide one by the other, you get an equal value for all wins. Dividing the “Added” part of the system by its LI averages things out and makes each win equal in importance.

This is how you make a run in the first inning equal to a run in the final inning of a close game: divide by Leverage Index. Leverage Index is the great equalizer in the WPA fabric.

This is kind of esoteric, but when you divide the WPA of a specific play by its Leverage Index, you get a number that indicates who “won” the at-bat, and by how much. WPA/LI, which is sometimes called situational wins, is a funky stat. It is impossible to describe and its units are unclear. Let’s try to get at this with an example.
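Computing it is simple: divide each play’s WPA by the Leverage Index when the play began, then sum. In this hypothetical Python example, two hits produce very different WPA in a low- and a high-leverage spot but the same WPA/LI, which is the equalizing effect described above. The numbers are invented for illustration.

```python
def wpa_over_li(plays):
    """Situational wins: sum of each play's WPA divided by its Leverage Index."""
    return sum(wpa / li for wpa, li in plays)


early_hit = [(0.05, 0.5)]  # hypothetical: modest WPA in a low-leverage spot
late_hit = [(0.20, 2.0)]   # hypothetical: big WPA in a high-leverage spot
print(wpa_over_li(early_hit), wpa_over_li(late_hit))
```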

In 2013, Mike Trout led the majors in WPA/LI despite being ninth in WPA. According to WPA, his biggest hit of the season was a run-scoring single in the bottom of the eighth of a tie game. Given what we know about WPA, this makes sense, right? But WPA/LI says that his biggest hit of the year was a home run in the top of the first with two out. This also makes sense. Why, you ask?

Well, first of all, home runs are the biggest hits of all, so a home run should be at the top of most “best hit” lists. Secondly, Trout hit his home run at a most opportune time because there were two out and no one was on base in a tie game. Isn’t a home run the best “response” to that situation? When you lead off an inning, a single, walk, double, etc. are all reasonable alternatives to a home run. But with two out and no one on, a home run is far and away better than all other alternatives. WPA/LI measures how well the player “won” the situation.

I don’t know how to better describe what WPA/LI does, but I hope you can see how useful it can be. A few years ago, I used it to assess the impact of all those unintentional walks to Barry Bonds. I have a feeling that I have just scratched the surface of its usefulness.

So WPA/LI has a lot to recommend it. It evens out opportunities between all players so that no one benefits from having more high-leverage situations. It quantifies how well each player performed within the context of the situation. WPA/LI might just prove to be the “ultimate stat.”

Try to keep an open mind.

### References and Resources

Tango’s blog and website have many excellent sources for learning more about WPA, Leverage Index and WPA/LI. Here’s a link to his website and here’s a link to the run/win expectancy posts on his old blog.

Dave, I met you briefly at the analytics conference in Phoenix. Not only do you have a terrific baseball mind, you’re a good guy, too.

Thanks, Jim! Nicest comment ever.

I have a counter-example for you, courtesy of Tango. Bases loaded, two out, bottom of the ninth, tie game. A walk and a home run have the exact same impact in that situation. wOBA and WAR give more credit to the home run. WPA and WPA/LI are the only stats that treat home runs and walks the same.

See my comment below. I love these stats for these reasons, though I’m not a big fan of the “clutch” stat….

I thought WPA/LI stripped out inning and score before reading Tango’s comment as well. It removes some of the effect but not entirely, according to Tango. If you want the version of this stat with no consideration of inning and score it is BRAA.

Accuracy is obviously the best criterion, but accuracy of what? For me the ultimate stat is accurately predictive, but for you it is accurately descriptive—is that a fair summary?

Simon, you’re taking my “ultimate stat” line out of context. Read the sentence before it: “It quantifies how well each player performed within the context of the situation.” So, of course, accurately descriptive is more important to me. Everything I wrote in this article pointed to that. I don’t care how well a stat like this predicts the future at all, and I see no reason to be concerned about it.

BTW, I would guess that WPA is about as predictive as WAR, that is, about as consistent for a specific player from season to season.

Thanks, Simon. I agree that WPA and WPA/LI aren’t designed to be predictive, but my guess is that they actually are as predictive as WAR, at least. Might be a fun study, though I’m guessing someone has already done it.

I agree that WPA at its core is as predictive as WAR, though high-leverage events might make WPA a bit “noisy” if you’re interested in developing a forecast tool. WPA/LI probably would be just as noisy as the offensive portion of WAR because it blunts the impact of high-leverage events.

Two suggestions for the WPA/LI graphs. When you hover your cursor, it shows the score and play. Why not also show outs and runners on? It always shows the biggest hits of the game, but how about the biggest outs?

It would be nice for FG to make it clearer on those charts, but you can easily determine the base/out state by looking at the last few plays. As far as showing the biggest outs, I *believe* FG will actually highlight (with a red dot) all the “key” events regardless. It’s just that, since an out is always the more likely outcome, getting an out doesn’t cause as big of shifts in WE as hits can.

Tango, is the correlation between WPA/LI from year T to year T+1 lower than the correlation between wOBA in year T to WPA/LI in year T+1 because of the year-to-year variation in the “situational” component of WPA/LI?

Just trying to take the point you just made and put it to bed. Rather focus on the descriptive role of these stats myself.

Right, there’s extra noise in WPA/LI.

I don’t know if the correlation is lower, but even if it is, it’s not going to be noticeably lower, and certainly not that you can put a line between that and wOBA.

Great article! I have one question: in number 4 above, you mention that a single that scores a runner from second is 21% more valuable with 2 outs than 1 out. However, it would seem that some events are worth the same regardless of the number of outs, for instance a double that scores a runner from second (worth 1 run regardless of the number of outs, since the change in run expectancy is 0). How does Win Expectancy take this into account?

Hi John,

You’re right that some hits are worth exactly one run (or X number of runs) regardless of the number of outs…specifically, hits that result in the same situation after the hit as before. But the Win Expectancy of the hit will differ if there are differences in the score or inning.