Analysis and WPA – WE Velocity

One of the coolest things about Win Expectancy, in my opinion, is the way that it can be graphically represented. Of course, most people reading this post will know this from the classic Win Probability graphs we have here at FanGraphs. With this graphical representation, we can investigate beyond the simple line on the screen using the branch of mathematics known as analysis.

Calculus, the gateway to the branch of mathematics called analysis, is built around two problems. One is finding the area under a curve; the other, which this post uses, is finding the slope of a curve. In physics, if a graph has time on the horizontal axis and position on the vertical axis, such as this graph (LINK) from HitTracker Online, the slope of the curve represents the velocity. Similarly, the slope of a Win Probability graph represents, in a sense, the “velocity” of a baseball game.

A Win Probability graph is made of connected line segments, each representing the WPA of one play in the game. Because every segment spans exactly one play on the horizontal axis, the slope of each segment equals the WPA of the corresponding play. Unfortunately, we can’t find the slope at the points where the segments meet, because the slope is different on each side of those points. To get around this, we need to model the Win Probability graph with a smooth curve. I chose a cubic spline, because it’s both accurate and relatively easy to compute with mathematical software. Taking the derivative of the spline then gives a slope at every point, including the points where one play ends and the next begins.
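The post doesn’t say which kind of cubic spline or which software was used. As a minimal sketch, assuming a natural cubic spline (second derivative zero at both ends), one knot per play at unit spacing, and win probability given as a fraction for the home team, the fit and its slope at each play can be computed in plain Python. Both function names are hypothetical.

```python
# Sketch only: a natural cubic spline through Win Probability points,
# one knot per play (x = 0, 1, 2, ... so the spacing h is 1).
# Assumptions (not from the post): natural end conditions and
# win probability given as a fraction for the home team.


def natural_spline_second_derivs(y):
    """Second derivatives M_i of the natural cubic spline at unit-spaced knots."""
    n = len(y)
    if n < 3:
        return [0.0] * n  # two points: the spline is a straight line
    m = n - 2  # unknowns M_1 .. M_{n-2}; M_0 = M_{n-1} = 0
    # Row j (for M_{j+1}): M_j + 4 M_{j+1} + M_{j+2} = 6 (y_j - 2 y_{j+1} + y_{j+2})
    sub = [1.0] * m
    diag = [4.0] * m
    sup = [1.0] * m
    rhs = [6.0 * (y[j] - 2.0 * y[j + 1] + y[j + 2]) for j in range(m)]
    # Thomas algorithm: eliminate the sub-diagonal ...
    for j in range(1, m):
        w = sub[j] / diag[j - 1]
        diag[j] -= w * sup[j - 1]
        rhs[j] -= w * rhs[j - 1]
    # ... then back-substitute.
    inner = [0.0] * m
    inner[-1] = rhs[-1] / diag[-1]
    for j in range(m - 2, -1, -1):
        inner[j] = (rhs[j] - sup[j] * inner[j + 1]) / diag[j]
    return [0.0] + inner + [0.0]


def we_velocity(wp):
    """Slope of the spline at every play, in wins per play."""
    if len(wp) < 2:
        raise ValueError("need at least two win-probability points")
    M = natural_spline_second_derivs(wp)
    # Left end of segment [i, i+1]: S'(i) = (y_{i+1} - y_i) - (2 M_i + M_{i+1}) / 6
    vel = [(wp[i + 1] - wp[i]) - (2.0 * M[i] + M[i + 1]) / 6.0 for i in range(len(wp) - 1)]
    # Right end of the last segment: S'(n-1) = (y_{n-1} - y_{n-2}) + (M_{n-2} + 2 M_{n-1}) / 6
    vel.append((wp[-1] - wp[-2]) + (M[-2] + 2.0 * M[-1]) / 6.0)
    return vel
```

For a toy three-play swing, we_velocity([0.5, 0.7, 0.5]) gives roughly [0.3, 0.0, -0.3]: the slope is zero right at the peak, a value the broken-line graph can’t provide. Subtracting 0.5 from every point, as the post does to put both curves on one axis, leaves these slopes unchanged.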

Let’s show this with Game 163 between the Twins and Tigers. Below is the graph for that game.

Here is the Win Probability graph plotted together with its derivative, or “WE Velocity,” on the same scale. Win probability is shifted so that 0 corresponds to 50% for both teams, which lets both curves share the same axis.

So you can see that even when the game appears stagnant in the early innings, there are still quite a few minor twists and turns. What really stands out, however, are the massive peaks and valleys in the later innings, where each individual play has a huge impact on the outcome, as simple WPA also shows. Near the end, the game really takes off, with WE Velocity reaching as high as 23% of a win per play.

Other interesting points are those where the derivative crosses the horizontal axis. These are turning points: each marks a local maximum of win probability for one of the teams (and so a local minimum for the other). The major turning points typically come right after extreme values of WE Velocity. For example, between play 99, in which Don Kelly singled and advanced on the throw to third to put runners on first and second in the top of the 12th, and play 102, in which Gerald Laird struck out to end the inning, WE Velocity went from 13.8% of a win per play in favor of the Tigers to 18.7% of a win per play in favor of the Twins. The Twins went on to win the game in the bottom of the inning, and one might even surmise that the unfortunate end of the inning took all of the air out of the Tigers’ sails.
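Finding turning points amounts to looking for sign changes in WE Velocity. A minimal sketch, assuming velocities are measured for the home team (the Twins in Game 163), so a play that favors the Tigers has negative velocity; the function name is hypothetical.

```python
def turning_points(velocity):
    """Return (index, kind) pairs where WE Velocity changes sign.

    velocity[i] is the slope at play i, positive when the home team's win
    probability is rising. "peak" marks a local maximum for the home team,
    "trough" a local minimum (that is, a local maximum for the visitors).
    Exact zeros are skipped, so each crossing is reported at the first play
    whose velocity has the new sign.
    """
    points = []
    prev_sign = 0
    for i, v in enumerate(velocity):
        sign = (v > 0) - (v < 0)  # +1, -1, or 0
        if sign == 0:
            continue
        if prev_sign != 0 and sign != prev_sign:
            points.append((i, "peak" if prev_sign > 0 else "trough"))
        prev_sign = sign
    return points
```

With only the two values quoted for plays 99 and 102, turning_points([-0.138, 0.187]) returns [(1, "trough")]: a low point for the Twins, which is the same thing as a high point for the Tigers.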

I don’t know if this is particularly useful at all, but I do think it is an interesting thing to see the peaks and valleys of a baseball game quantified. If there are any games in particular you would like to see this analysis with, please leave a message in the comments section.






Jack Moore's work can be seen at VICE Sports and anywhere else you're willing to pay him to write. Buy his e-book.


Bill
Guest
6 years 2 months ago

Analysis*

Sky Kalkman
Member
6 years 2 months ago

How is this different from plotting WPA?

Preston
Guest
6 years 2 months ago

Not sure if the graphs go back that far (or if they’re easily generated), but if so, I’d be curious about the Dodgers-Padres game on September 18, 2006 – known to Dodger fans as the “4+1 game,” when the Dodgers hit 4 consecutive homers in the 9th to tie it, only to see the Padres score in the 10th, setting up Nomar’s game-winning 2-run homer in the 10th.

Zac
Guest
6 years 2 months ago

Preston, I think Baseball-Reference has them for every game for which Retrosheet has play-by-play data.

The game you’ve mentioned is at http://www.baseball-reference.com/boxes/LAN/LAN200609180.shtml#wpa . You can see that San Diego had a 98% chance of winning before those home runs. It fell to 95%, then 90%, then 78%, then 35% after each of those home runs. Even so, they were back up to a 65% chance of winning before Nomar hit that final home run.

Apparently you can use the Play Index to find top WPA performances (according to http://www.baseball-reference.com/blog/archives/4718 ), but I must be doing it wrong because I can’t get it to work.

Zac
Guest
6 years 2 months ago

Alright, I figured out one thing I was doing wrong. Barry Bonds is the all-time leader among players we have information for. Bonds’s greatest game of all time according to WPA was August 12, 1991, http://www.baseball-reference.com/boxes/PIT/PIT199108120.shtml#wpa , when the Pirates beat the Cardinals 4-3. Bonds had all 4 RBI on 2 HRs. His second HR, in the 11th, won it when the Cardinals’ win expectancy was 80%. His first home run was worth another 38%. All in all, Bonds is credited with 1.119 WPA for that game.

The Grammarian
Guest
6 years 2 months ago

“Air out of their sales” or “air out of their sails”?

SagehenMcGyver47
Guest
6 years 2 months ago

maybe they were selling blimps

bluejaysstatsgeek
Member
6 years 2 months ago

Interesting. Last year, I started working on something I was tentatively calling an “Excitement Index”, a simple way to determine how exciting a game was. I developed three measures, which are all highly correlated, based on a small sample of 7 games that I tried them on.

The simplest was to compute the mean Leverage Index. The second was the Root Mean Square(WPA), or RMS(WPA), which, as implied, squares each play’s WPA, computes the average and takes the square root. The third takes the mean of the absolute values of the individual plays’ WPA. In all three cases, I use 54 as the divisor rather than the actual number of plays, as 54 (51 with a home team victory) is the minimum number of plays. This rewards extra-inning games. Maybe “Excitement Index” would be a misnomer, and “Value Index”, to determine how much baseball entertainment the fan gets for the price of a ticket, would be better.
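The three measures are straightforward to reproduce. A minimal sketch, assuming one Leverage Index value and one WPA value per play; the function name and the divisor parameter are my own, with 54 as the default to match the description.

```python
import math


def excitement_measures(li, wpa, divisor=54):
    """Mean LI, RMS(WPA) and mean |WPA| for one game.

    li[i] and wpa[i] belong to play i. Each total is divided by `divisor`
    (54 by default) instead of len(wpa), so games with more plays,
    such as extra-inning games, score higher.
    """
    if len(li) != len(wpa):
        raise ValueError("expected one LI and one WPA value per play")
    mean_li = sum(li) / divisor
    rms_wpa = math.sqrt(sum(w * w for w in wpa) / divisor)
    mean_abs_wpa = sum(abs(w) for w in wpa) / divisor
    return mean_li, rms_wpa, mean_abs_wpa
```

A hypothetical 54-play game in which every play has an LI of 1.0 and a WPA of plus or minus 0.1 scores (1.0, 0.1, 0.1). Stretching it to 108 identical plays doubles the mean LI to 2.0 and raises the other two measures to about 0.141 and 0.2, which is the extra-inning reward described above.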

For Game 163, these values are: 3.796, 0.1428 and 0.1618, respectively. Compare these to the game two days earlier between Washington and Atlanta, where the values are 4.484, 0.1366 and 0.1202.

Game 163 had more large swings in WP, but the WN/ATL game had more back-and-forth excitement.

AndyS
Member
6 years 2 months ago

Could you add a plot of the magnitude of the velocity?

Jon
Guest
6 years 2 months ago

I’m not sure why you need the cubic splines. Couldn’t you just measure the slope of the line that connects two points on a WPA graph to determine the “WE Velocity” of each play? But wouldn’t that figure just be what we already know about how much a play contributes to a team’s likelihood of winning?

Splines allow us to “smooth” curves and make them differentiable (and integrable), but I’m not sure why we would want to eliminate the discrete set of inputs in WPA graphs (plate appearances) with the continuous model proposed by fitting a curve with splines.

Interesting, but not quite sure what to do with it…

danmerqury
Member
6 years 2 months ago

This was my first thought as well. WPA is already disjointed, discrete data. You could differentiate each line between each open point (i.e., each play). Any kind of polynomial modeling would necessarily distort the numbers a little (not that it really matters all that much).

danmerqury
Member
6 years 2 months ago

Jack, the mathematician in me thinks this is pretty sweet, but there’s one big problem with it: WPA is not a continuous function. It’s composed of discrete points, all evenly spaced. That even spacing is key. There’s nothing you can get from a WE velocity derivative graph that you can’t get from simply measuring the WPA difference between two points. Slope is rise/run, but if every line’s run is 1 play, then the slope (i.e., the WE velocity) is just the rise.
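The even-spacing argument is easy to check: with a run of one play, rise over run is just the change in win probability, which is the play’s WPA. A minimal sketch (the function name is hypothetical), applied to San Diego’s win probability before the first Dodger homer and after each of the four, using Zac’s figures above:

```python
def discrete_we_velocity(wp):
    """Forward differences of a win-probability series, one entry per play.

    Every segment spans exactly one play, so run = 1 and the slope
    (rise / run) equals the rise: the play's WPA.
    """
    run = 1
    return [(after - before) / run for before, after in zip(wp, wp[1:])]


# San Diego: 98%, then 95%, 90%, 78% and 35% after each home run.
padres = [0.98, 0.95, 0.90, 0.78, 0.35]
print([round(v, 2) for v in discrete_we_velocity(padres)])  # prints [-0.03, -0.05, -0.12, -0.43]
```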

tangotiger
Guest
6 years 2 months ago

After much discussion with us a few years ago, David already computes average LI and the average win expectancy of each game. (It’s on each game’s scoreboard.)

The higher the LI, the higher the “drama”. The closer to .500 the average win expectancy is, the “tighter” the game.

I would say that if someone wants to introduce something new, use those as the baseline benchmarks, and then tell us what your method shows that these two metrics don’t.

bluejaysstatsgeek
Member
6 years 2 months ago

If I might address your comments in the context of what I had posted:

I think there are two types of drama in a game, as exemplified by the two games I discussed. There is the tight, close game, typically a pitching/defense dominated game, like the WN/ATL affair. In these games the tension or drama is high the entire time, and well displayed graphically by the LI graph below the main WP graph. In this case, the LI is probably the best indicator of tension.

The other type of game is the rollercoaster exemplified by Game 163. In that game the LI graph has more ups and downs, as there were 6 win/loss/tie changes, so a fan of either the Twins or the Tigers had more emotional ups and downs, which is a different type of excitement. Either the mean absolute WPA [MA(WPA)?] or the RMS(WPA) does a better job of capturing this type of tension. I’m still trying to decide which of these I like better.

I guess I’m simply trying to derive a single number measure that captures each of the two pictures: aLI or my adjusted version, captures the story on the LI graph, while RMS(WPA) or MA(WPA) captures the story of the Win Probability graph.

Finally, my measures differ by dividing by 54, which normalizes them to a minimum-length 9-inning game, on the reasoning that the more plays there are, the more entertainment value. That is probably biased against a game in which both pitchers are working on a no-no that gets decided in the last at bat.

bluejaysstatsgeek
Member
6 years 2 months ago

Tom: I have tried to replicate the aWE numbers without success, and I cannot find a link to its computation. Maybe I’m missing the obvious, but can you point me in the right direction, please?

bluejaysstatsgeek
Member
6 years 2 months ago

Never mind. Delete or ignore the second comment. I am able to replicate the aWE now. I had an error in an anchored value in my spreadsheet, which was giving bogus results.

(Gee, if one of my students had done that…)

giemer
Guest
6 years 2 months ago

I don’t understand the use of cubic splines.

You aren’t operating in continuous space, so a discrete model for the derivatives is appropriate, and easy. Otherwise you are arbitrarily adding bias based on the method of curve fitting… no?

J Bravo
Guest
6 years 2 months ago

The curve fitting doesn’t add a lot of bias, but it is unnecessary when you could just use a discrete model: difference equations instead of differential equations.

However, this is less interesting when you consider that the “differences,” that is, the slope between each discrete time, is simply the WPA of that event.

A way to measure how this aggregates over the game would be to take the absolute values of the WPAs (that is, add both teams’ WPA positively). The higher the sum, the more “back-and-forth” there was in the game. If the sum is low (say, close to 1), then the game was simply a steady march to a seemingly inevitable conclusion.
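Read literally, adding both teams’ WPA as positive numbers counts each play’s swing twice, once for the team that gained and once for the team that lost, which is why a game with no reversals lands at 1 rather than 0.5. A minimal sketch under that reading (the function name is hypothetical), taking the home team’s win probability before the first play and after every play:

```python
def back_and_forth(wp_home):
    """Sum of |WPA| over both teams for one game.

    Each play moves one team's win probability up by x and the other's down
    by x, so counting both teams gives 2 * sum(|x|). A game whose win
    probability moves in only one direction from 0.5 to 1.0 (or to 0.0)
    scores 1.0; every reversal pushes the total higher.
    """
    one_team = sum(abs(after - before) for before, after in zip(wp_home, wp_home[1:]))
    return 2.0 * one_team
```

So back_and_forth([0.5, 0.6, 0.8, 1.0]) comes out at 1.0 (up to floating-point rounding), while the seesaw [0.5, 0.9, 0.1, 1.0] scores about 4.2.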

Zac
Guest
6 years 2 months ago

I need to stop posting, but this is so much fun! The greatest game by a pitcher in terms of WPA (that we have a record of, of course) was Jack Harshman’s August 13, 1954 performance. He and Al Aber both pitched complete games, with Harshman’s White Sox winning 1-0 when Minnie Miñoso tripled in the bottom of the 16th! Harshman had a WPA of 1.612, while Aber had a WPA of 1.112 in a losing effort.

And here we are: the Bonds game I mentioned earlier is only 18th all time on the hitting list. The greatest game was by Art Shamsky, on August 12, 1966. It’s even more amazing because Shamsky only came into the game at the top of the 8th, pinch hitting for the pitcher and then moving to left field. In the 8th he hit a 2-run home run to give the Reds a 1-run lead. Pittsburgh tied it in the 9th, and then took the lead in the top of the 10th. In the bottom of the 10th Shamsky hit another home run to retie the game at 9 all. Once again the Reds pitcher failed to deliver, and the Reds were down 2 in the bottom of the 11th. ONCE AGAIN Shamsky hits a home run, another two-run shot, and the game is again tied!

Sadly the game did not have a happy ending for Shamsky, as 6 of the next 7 Reds failed to get a hit, Pittsburgh scored the winning runs in the top of the 13th, and the game ended with the #7 batter hitting into a double play (Shamsky was batting 9th).

So 3 ABs for a WPA of 1.50. And in a losing effort, no less.
