Offensive Volatility and Beating Win Expectancy

Armed with a new measure for offensive volatility (VOL), I wanted to revisit research I conducted last year about the value of a consistent offense.

In general, the literature has suggested that if you’re comparing two similar offenses, the more consistent offense is preferable over the course of a season. The reason has to do with the advantage a team can gain when it doesn’t “waste runs” in blowout victories. The more evenly a team can distribute its runs, the better its chances of winning more games.

I decided to take my new volatility (VOL) metric and apply it to team-level offense to see if it conformed to this general consensus*.

To determine a team’s overall offensive VOL, I used the same approach as I did with individual hitters — with two slight tweaks:

VOL = STD(RS/G) / (Yearly RS/G)^0.67

VOL = volatility

STD(RS/G) = the standard deviation of a team’s runs scored per game

Yearly RS/G = a team’s seasonal runs scored per game (raised to the 0.67 power in the denominator)
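As a minimal sketch, the VOL formula can be computed in Python from one team-season’s game-by-game runs scored. The function name `team_vol` is hypothetical, and using the population standard deviation (`statistics.pstdev`) rather than the sample version is an assumption, since the formula doesn’t say which was used.

```python
import statistics


def team_vol(runs_per_game, exponent=0.67):
    """Offensive volatility for one team-season (hypothetical helper).

    VOL = SD of game-by-game runs scored / (season runs per game) ** exponent.
    Assumption: population SD; the original formula does not specify.
    """
    if not runs_per_game:
        raise ValueError("need at least one game")
    sd = statistics.pstdev(runs_per_game)
    mean_rpg = statistics.fmean(runs_per_game)
    return sd / mean_rpg ** exponent


# A perfectly consistent offense has VOL = 0; spreading the same runs unevenly raises it.
print(team_vol([5, 5, 5, 5]))
print(team_vol([2, 8, 2, 8]))
```

Two teams with the same runs-per-game average share the same denominator, so between them VOL differs only through the standard deviation.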

The correlation between team VOL and the number of wins above or below their expected wins from 2002 to 2012** was -.34.
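The post doesn’t state which correlation coefficient produced the -.34. Assuming it is the usual Pearson coefficient, a dependency-free sketch looks like this, where `xs` would hold each team-season’s VOL and `ys` its wins above or below expectation (the function name is hypothetical).

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    # Covariance and variances (the 1/n factors cancel in the ratio).
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)
```

A negative value, as in the post, means higher-VOL (less consistent) offenses tended to fall short of their expected wins.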

To get a better sense of the overall impact, I grouped teams into four buckets based on their VOL scores’ rankings (relative to other teams in each season) and then calculated the average wins above/below expected for each bucket. Here are the results:

VOL Rank Bucket         Avg. Wins +/- Expected
Top (1-8)               +2
Upper-Mid (9-16)        0
Lower-Mid (17-24)       -1
Bottom (25-30)          -2

Teams that ranked first through eighth in VOL for a given season (where lower VOL equates to a more consistent offense) beat their expected win total by an average of two wins and were 1.6 times as likely to beat their expected wins as teams that finished outside the top eight in VOL for that season (64% vs. 40%). Compare that to teams ranked 25th through 30th and you have an overall difference of four wins.
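A sketch of the bucketing and the “times as likely” comparison, assuming each team-season is a (VOL rank, wins minus expected wins) pair with rank 1 as the most consistent offense. The helper names are hypothetical, and counting only strictly positive differences as “beating” expectations is an assumption.

```python
from collections import defaultdict


def bucket_label(vol_rank):
    """Map a within-season VOL rank (1 = most consistent) to the four buckets."""
    if not 1 <= vol_rank <= 30:
        raise ValueError(f"unexpected rank: {vol_rank}")
    if vol_rank <= 8:
        return "Top"
    if vol_rank <= 16:
        return "Upper-Mid"
    if vol_rank <= 24:
        return "Lower-Mid"
    return "Bottom"


def avg_diff_by_bucket(team_seasons):
    """Average (actual - expected) wins per bucket from (vol_rank, diff) pairs."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for rank, diff in team_seasons:
        label = bucket_label(rank)
        totals[label] += diff
        counts[label] += 1
    return {label: totals[label] / counts[label] for label in totals}


def beat_rate(diffs):
    """Share of team-seasons that finished above their expected wins."""
    return sum(1 for d in diffs if d > 0) / len(diffs)


def relative_likelihood(group_diffs, other_diffs):
    """How many times as likely one group was to beat expectations as another."""
    return beat_rate(group_diffs) / beat_rate(other_diffs)
```

With the rates reported above, 0.64 / 0.40 works out to the 1.6 figure.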

If we look at the top- and bottom-20 teams since 2002 — in terms of wins over/under expectations — the relationship is even clearer:

Year  Team  Actual Wins  W-L%   Over/Under Expected Wins  VOL   VOL Rank
2012  BAL   93           0.574  12                        1.06  5
2008  LAA   100          0.617  12                        1.02  3
2005  ARI   77           0.475  12                        1.09  12
2004  NYY   101          0.623  12                        1.18  25
2007  ARI   90           0.556  11                        1.08  10
2004  CIN   76           0.469  10                        1.04  5
2009  SEA   85           0.525  10                        1.01  1
2007  SEA   88           0.543  9                         1.08  11
2008  HOU   86           0.534  9                         0.98  2
2010  HOU   76           0.469  8                         1.09  8
2009  SDP   75           0.463  8                         1.04  5
2002  MIN   94           0.584  8                         1.15  20
2005  CHW   99           0.611  8                         1.06  9
2006  OAK   93           0.574  7                         1.04  2
2007  STL   78           0.481  7                         1.23  28
2003  SFG   100          0.621  7                         0.96  1
2002  OAK   103          0.636  7                         1.05  4
2009  NYY   103          0.636  7                         1.04  4
2003  CIN   69           0.426  7                         1.11  11
2012  CIN   97           0.599  7                         0.98  3
2007  BOS   96           0.593  -6                        1.22  26
2011  HOU   56           0.346  -6                        1.05  6
2006  TEX   80           0.494  -6                        1.13  16
2009  WSN   59           0.364  -6                        1.11  11
2007  SFG   71           0.438  -6                        1.10  14
2005  NYM   83           0.512  -6                        1.20  28
2006  ATL   79           0.488  -6                        1.10  11
2008  ATL   72           0.444  -6                        1.17  21
2005  SEA   69           0.426  -7                        1.07  10
2008  TOR   86           0.531  -7                        1.16  18
2011  KCR   71           0.438  -7                        1.16  21
2003  HOU   87           0.537  -7                        1.15  20
2009  CLE   65           0.401  -7                        1.24  30
2002  BOS   93           0.574  -7                        1.25  29
2004  DET   72           0.444  -7                        1.15  21
2011  SDP   71           0.438  -8                        1.30  29
2005  TOR   80           0.494  -8                        1.09  13
2002  CHC   67           0.414  -8                        1.21  27
2009  TOR   75           0.463  -9                        1.13  15
2006  CLE   78           0.481  -11                       1.19  26

The average VOL rank of the top-20 teams since 2002 was 8.5, with 14 of the 20 finishing in the top 10 for VOL. The bottom 20 came in at 19.6, with only two teams ranking in the top 10.
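Both summary figures can be checked directly from the VOL Rank column of the table. The list names and the `summarize` helper below are hypothetical, but the ranks are copied from the table in order.

```python
# VOL ranks copied from the table above, in table order.
TOP_20_RANKS = [5, 3, 12, 25, 10, 5, 1, 11, 2, 8, 5, 20, 9, 2, 28, 1, 4, 4, 11, 3]
BOTTOM_20_RANKS = [26, 6, 16, 11, 14, 28, 11, 21, 10, 18, 21, 20, 30, 29, 21, 29, 13, 27, 15, 26]


def summarize(ranks, cutoff=10):
    """Return (average VOL rank, number of teams ranked inside the top `cutoff`)."""
    return sum(ranks) / len(ranks), sum(1 for r in ranks if r <= cutoff)


print(summarize(TOP_20_RANKS))     # (8.45, 14)
print(summarize(BOTTOM_20_RANKS))  # (19.6, 2)
```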

But how did teams fare in 2012 when it came to beating win expectations and the volatility of their lineups?

Here are the top- and bottom-five teams from this past season:

Team  Actual Wins  W-L%   Expected Wins  Over/Under Expected Wins  VOL   VOL Rank
BAL   93           0.574  82             12                        1.06  5
CIN   97           0.599  91             7                         0.98  3
SFG   94           0.580  88             6                         1.09  10
CLE   68           0.420  64             5                         1.16  19
WSN   98           0.605  96             3                         1.07  7
TBR   90           0.556  95             -4                        1.11  13
STL   88           0.543  93             -5                        1.13  16
BOS   69           0.426  74             -4                        1.17  23
ARI   81           0.500  86             -5                        1.17  24
COL   64           0.395  69             -5                        1.20  27

Baltimore, Cincinnati, San Francisco and Washington each ranked in the top 10 in offensive consistency, while none of the five teams that fell furthest short of expectations cracked the top 10. The bottom three teams posted the 23rd-, 24th- and 27th-ranked offenses by VOL.

As has been stated previously, VOL isn’t a silver bullet. At the end of the day, a team’s success is mostly determined by its run differential. Building a highly consistent team at the expense of scoring more runs doesn’t make sense.

To illustrate this point, I looked at teams with high-ranked (1-8) and low-ranked (24-30) offenses in runs scored and compared their average win totals based on whether those offenses were high-volatility (a VOL ranking of 24-30) or low-volatility (1-8):

Average Wins          Runs Scored: High   Runs Scored: Low
VOL: High             85                  70
VOL: Low              93                  72

Poor offenses won about the same number of games, regardless of the volatility. But elite offenses won eight more games on average when they were also elite in terms of their consistency (93), compared to their highly inconsistent counterparts (85). (The results were similar when I compared poor and elite run-prevention teams.)
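A sketch of how this two-by-two comparison could be tabulated, assuming each team-season is given as (runs scored rank, VOL rank, wins), where runs-scored rank 1 is the most runs and VOL rank 1 is the least volatile. Helper names are hypothetical; teams outside both extreme groups are simply dropped.

```python
from collections import defaultdict
from statistics import fmean


def extreme_group(rank):
    """Return 'top' for ranks 1-8, 'bottom' for ranks 24-30, else None."""
    if 1 <= rank <= 8:
        return "top"
    if 24 <= rank <= 30:
        return "bottom"
    return None


def avg_wins_grid(team_seasons):
    """Average wins for each (runs-scored group, volatility group) cell.

    A VOL rank in the top group means LOW volatility, since VOL rank 1
    is the most consistent offense.
    """
    cells = defaultdict(list)
    for rs_rank, vol_rank, wins in team_seasons:
        rs_group = extreme_group(rs_rank)
        vol_group = extreme_group(vol_rank)
        if rs_group is None or vol_group is None:
            continue  # middle-of-the-pack teams are left out of the grid
        rs_label = "high RS" if rs_group == "top" else "low RS"
        vol_label = "low VOL" if vol_group == "top" else "high VOL"
        cells[(rs_label, vol_label)].append(wins)
    return {cell: fmean(wins) for cell, wins in cells.items()}
```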

So what have we learned so far?

First, it appears there’s a difference in how players distribute their offensive performance throughout a season, and that tendency shows some consistency from year to year. Second, it seems the degree to which a team’s offensive production is consistent can affect whether it beats its expected record.

Both findings are still preliminary, but they suggest the next question: How does hitter volatility combine to determine overall run-scoring volatility? That’s a much trickier question, but I will hopefully have something on it in the near future.


*I am still looking at the great feedback from colleagues and commenters about the new metric. For now, I decided to run this quick test with the existing metric.

**I derived expected wins using the Pythagenpat approach.
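For reference, a sketch of the Pythagenpat calculation. The post doesn’t give the constant it used; the 0.287 below is one commonly cited value (slightly different constants also appear), so treat it as an assumption.

```python
def pythagenpat_wins(runs_scored, runs_allowed, games, exponent_power=0.287):
    """Expected wins via Pythagenpat.

    The Pythagorean exponent is ((RS + RA) / G) ** exponent_power, so it
    rises with the run environment instead of staying fixed at 2.
    Assumption: exponent_power = 0.287.
    """
    x = ((runs_scored + runs_allowed) / games) ** exponent_power
    win_pct = runs_scored ** x / (runs_scored ** x + runs_allowed ** x)
    return win_pct * games
```

Wins over/under expected is then simply actual wins minus `pythagenpat_wins(...)`.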


Bill works as a consultant by day. In his free time, he writes for The Hardball Times, speaks about baseball research and analytics, consults for a Major League Baseball team, and has appeared on MLB Network's Clubhouse Confidential as well as several MLB-produced documentaries. Along with Jeff Zimmerman, he won the 2013 SABR Analytics Research Award for Contemporary Analysis. Follow him on Tumblr or Twitter @BillPetti.

Albert C.
3 years 7 months ago

Channelclemente will loooooooooooooooooove this!

3 years 7 months ago

My guess here is that “blowout” games say more about the losing team than the winner. I.e., you have a blowout because the starting pitcher was terrible that day, left early, and you spent 8 innings hitting the last 3 guys in the bullpen (not because everyone on your offense was great that day).

So, at a particular level of runs scored, a team with fewer blowouts is probably a better team. Essentially, if a team has lower volatility, more of their runs were talent based, versus opportunity based.

Matthew Murphy
3 years 7 months ago

The next question regarding the effect of individual hitter volatility on team volatility will be very interesting to see. If it is, and hitter consistency/volatility is a repeatable skill, then you will be able to project team volatility in the future. Just a gut feeling, but I think that most of team volatility will probably come from randomness in sequencing, and possibly injuries, that will be impossible to predict.

Matthew Murphy
3 years 7 months ago

By “If it is”, I really meant, “If hitter volatility does have a significant impact on team volatility”. Poor wording on my part.

Pizza Cutter
3 years 7 months ago

Bill, fascinating. A couple of questions.

1) Where did .67 come from?
2) Perhaps this calls for a rank order correlation (Wilcoxon? Spearman?)
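One rank-order option is Spearman’s rho, which is the Pearson correlation of the two variables’ ranks (tied values get their average rank). A dependency-free sketch with hypothetical helper names, which could be run on team VOL and wins above expectation:

```python
import math


def average_ranks(values):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)
```

Because only ranks enter the calculation, one extreme season cannot pull the coefficient around the way it can with Pearson’s r.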

3 years 7 months ago

I think there is a subtle flaw here, which perhaps Pizza Cutter’s suggestion of rank ordering might help.

Embedded in all of this is the assumption that two teams with equal run totals had equal offensive performances overall, just with some variation in the distribution of runs. This is generally fine for first order analysis and similar assumptions underpin much of our analytic knowledge of baseball. However you are looking at a (definitionally) second order question. And for this purpose the equivalency assumption may very well break down, not least because runs per game are non-normally distributed.

Consider this extreme example (in a world of only two teams where fractional runs are allowed):

Through 161 games, Team A has scored exactly 5 runs per game, every game.

Through 161 games, Team B has scored exactly 4.907 runs per game, every game.

In game 162, Team B wins 20 to 5.

Team A finishes with a record of 161-1, averaging 5 runs per game with a zero standard deviation.

Team B finishes with a record of 1-161, averaging 5 runs per game with a non-zero standard deviation.
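The arithmetic of this hypothetical can be checked with a short sketch that reuses the post’s VOL formula (population standard deviation assumed, since the post doesn’t specify it; the `vol` helper is hypothetical):

```python
import statistics


def vol(runs, exponent=0.67):
    """VOL as in the post: SD of runs per game / (mean runs per game) ** 0.67."""
    return statistics.pstdev(runs) / statistics.fmean(runs) ** exponent


team_a = [5.0] * 162             # 5 runs every game
team_b = [4.907] * 161 + [20.0]  # 4.907 every game, then one 20-run game

# Team A outscores Team B in every game except game 162.
a_wins = sum(1 for a, b in zip(team_a, team_b) if a > b)

for name, runs in (("Team A", team_a), ("Team B", team_b)):
    print(f"{name}: mean {statistics.fmean(runs):.3f}, VOL {vol(runs):.3f}")
print(f"Team A record: {a_wins}-{162 - a_wins}")
```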

Now, the question: Is Team B an equal offensive team that suffered for having more volatility, or is Team B an inferior offensive team whose inherent inferiority was masked by volatility? We don’t know for sure without filling in more details of game 162, but it is plausible that the 20 run game, like all extreme run outputs, might reflect lack of effort on Team A’s part (i.e. the game was out of hand early and they put in their worst pitcher) as much as skill on Team B’s part.

I know this is a silly example, but the underlying problem (that a single high-run game can raise a team’s total runs scored or allowed by a non-trivial amount, and the distribution of run-scoring is asymmetric) is serious. Is volatility an attribute that can differentiate between two otherwise equal teams, or is it a consequence of an inferior team getting lucky and beating up on mop-up relievers a bit more frequently than average?

You could perhaps address this by summing ln(runs) over each game or something like that. But without addressing this in some way or another I think high volatility, rather than being an attribute which explains sub-par performance, is a consequence of an underlying process which causes us to overestimate the team’s win expectancy.
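One way to sketch that suggestion: average ln(1 + runs) per game instead of raw runs. The `1 +` offset is an assumption added here so shutouts don’t produce ln(0), and the function names are hypothetical.

```python
import math


def mean_log_runs(runs_per_game):
    """Average ln(1 + runs) per game; the +1 keeps 0-run games finite (assumption)."""
    return sum(math.log1p(r) for r in runs_per_game) / len(runs_per_game)


def mean_runs(runs_per_game):
    return sum(runs_per_game) / len(runs_per_game)


steady = [4] * 10
with_blowout = [4] * 9 + [20]

# Relative increase caused by replacing one 4-run game with a 20-run game.
raw_gain = mean_runs(with_blowout) / mean_runs(steady) - 1
log_gain = mean_log_runs(with_blowout) / mean_log_runs(steady) - 1
print(f"raw mean +{raw_gain:.0%}, log mean +{log_gain:.0%}")
```

The log scale compresses the single blowout, which is the effect the comment is after.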

3 years 7 months ago

Very interesting. I would have guessed that greater volatility was better and that the difference was nearly negligible. Now I am much smarter in two ways.

3 years 7 months ago

There’s another obvious next step: updating the Pythagenpat formula with a volatility adjustment.

Take some dataset of team wins and expected wins. Calculate the volatility metric for each team-season, then model the relationship between volatility and (actual wins – expected wins). That relationship becomes an adjustment to the original Pythagenpat formula: New Exp Wins = Old Exp Wins + F(Vol). Then, use some metric like AVERAGE((exp-act)^2) to see how much better the adjusted estimate is than the original.
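A sketch of that procedure, assuming a simple linear F(VOL) = a + b × VOL fit by ordinary least squares to (actual wins – expected wins), so that adding F to the old estimate moves it toward actual wins. Function names are hypothetical, and the errors below are in-sample; a fairer test would fit on some seasons and score on others.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def mse(predicted, actual):
    """Mean squared error, i.e. AVERAGE((exp - act)^2)."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def compare_adjustment(vols, expected_wins, actual_wins):
    """Return (MSE of original expected wins, MSE after adding F(VOL))."""
    residuals = [act - exp for exp, act in zip(expected_wins, actual_wins)]
    a, b = fit_line(vols, residuals)
    adjusted = [exp + a + b * v for exp, v in zip(expected_wins, vols)]
    return mse(expected_wins, actual_wins), mse(adjusted, actual_wins)
```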