Take some dataset of team wins and expected wins. Calculate the volatility metrics for each team-season, then correlate the volatility with (expected wins – actual wins). That correlation becomes an adjustment to the original Pythagenpat formula: New Exp Wins = Old Exp Wins + F(Vol). Then use a metric like AVERAGE((exp – act)^2) to see how much better the adjustment is than the original.
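The procedure above could be sketched like this. Everything here is invented for illustration: the synthetic data, the linear form chosen for F(Vol), and the variable names are my assumptions, not a real study.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 300  # hypothetical team-seasons

# Fake inputs: per-game run volatility and Pythagenpat expected wins.
vol = rng.normal(3.0, 0.5, n)        # SD of runs scored per game
exp_wins = rng.normal(81, 8, n)      # expected wins from the formula
# Actual wins built so volatility genuinely costs wins, plus noise.
act_wins = exp_wins - 0.8 * (vol - 3.0) + rng.normal(0, 3, n)

resid = exp_wins - act_wins
r = np.corrcoef(vol, resid)[0, 1]    # correlation: volatility vs. residual

# Fit F(Vol) as a simple linear adjustment, then apply it:
# New Exp Wins = Old Exp Wins + F(Vol), where F(Vol) = -(slope*vol + intercept)
# because the fit predicts the residual (exp - act) we want to remove.
slope, intercept = np.polyfit(vol, resid, 1)
new_exp = exp_wins - (slope * vol + intercept)

mse_old = np.mean((exp_wins - act_wins) ** 2)
mse_new = np.mean((new_exp - act_wins) ** 2)
print(r, mse_old, mse_new)
```

Because the adjustment is fit in-sample, the new MSE can't be worse than the old one here; a real test would want out-of-sample seasons.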

Embedded in all of this is the assumption that two teams with equal run totals had equal offensive performances overall, just with some variation in the distribution of runs. This is generally fine for first-order analysis, and similar assumptions underpin much of our analytic knowledge of baseball. However, you are looking at a (definitionally) second-order question, and for that purpose the equivalency assumption may very well break down, not least because runs per game are non-normally distributed.

Consider this extreme example (in a world of only two teams where fractional runs are allowed):

Through 161 games, Team A has scored exactly 5 runs per game, every game.

Through 161 games, Team B has scored exactly 4.907 runs per game, every game.

In game 162, Team B wins 20 to 5.

Team A finishes with a record of 161-1, averaging 5 runs per game with a zero standard deviation.

Team B finishes with a record of 1-161, averaging 5 runs per game (to within rounding) with a non-zero standard deviation.
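A quick check of the toy example's arithmetic, using the scores as stated above:

```python
# Verifying the two-team example; score lines are exactly as given in the text.
import statistics

team_a = [5.0] * 162                 # 5 runs in every game
team_b = [4.907] * 161 + [20.0]      # 4.907 per game, then the 20-run finale

avg_a = statistics.mean(team_a)      # 5.0 exactly
avg_b = statistics.mean(team_b)      # ~5.0002 -- 5 per game to rounding
sd_a = statistics.pstdev(team_a)     # 0.0: zero volatility
sd_b = statistics.pstdev(team_b)     # ~1.18: one blowout drives all the spread
```

So the two teams are nearly indistinguishable by season totals, and the standard deviation is the only summary statistic that separates them.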

Now, the question: Is Team B an equal offensive team that suffered for having more volatility, or is Team B an inferior offensive team whose inherent inferiority was masked by volatility? We don’t know for sure without filling in more details of game 162, but it is *plausible* that the 20 run game, like all extreme run outputs, might reflect lack of effort on Team A’s part (i.e. the game was out of hand early and they put in their worst pitcher) as much as skill on Team B’s part.

I know this is a silly example, but the underlying problem (that a single high-run game can raise a team’s total runs scored or allowed by a non-trivial amount, and the distribution of run-scoring is asymmetric) is serious. Is volatility an attribute that can differentiate between two otherwise equal teams, or is it a consequence of an inferior team getting lucky and beating up on mop-up relievers a bit more frequently than average?

You could perhaps address this by summing ln(runs) over each game or something like that. But without addressing it in some way, I think high volatility, rather than being an attribute that explains sub-par performance, is a consequence of an underlying process that causes us to overestimate the team's win expectancy.
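The log-sum idea can be prototyped directly. One caveat: ln(runs) is undefined for shutouts, so the sketch below uses ln(runs + 1) instead, which is my own workaround; the score lines are invented.

```python
# Two teams with identical run totals (25) but very different distributions.
import math

steady   = [5, 4, 5, 6, 5]     # low-volatility team
volatile = [1, 0, 2, 2, 20]    # high-volatility team, one blowout

def log_sum(games):
    # ln(runs + 1) per game: +1 handles 0-run games (an assumption, not
    # part of the original suggestion).
    return sum(math.log(r + 1) for r in games)

# The steady team scores higher on the log scale despite equal totals,
# matching the intuition that its runs were distributed more usefully.
print(log_sum(steady), log_sum(volatile))
```

A concave transform like the log automatically discounts the marginal runs in a blowout, which is exactly the effect being asked for.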

1) Where did .67 come from?

2) Perhaps this calls for a rank order correlation (Wilcoxon? Spearman?)
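If the rank-order route is taken, Spearman's rho is the rank correlation coefficient (the Wilcoxon signed-rank procedure is a related rank-based test rather than a correlation). A minimal tie-free implementation with made-up numbers:

```python
# Spearman rank correlation, pure Python, assuming no tied values.
def spearman(x, y):
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order):
            r[i] = rank + 1
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Monotone but nonlinear relationship: Spearman gives 1.0 even though a
# straight-line (Pearson) correlation would be below 1. Data is invented.
vol   = [1.0, 2.0, 3.0, 4.0, 5.0]
resid = [0.1, 0.2, 0.8, 3.0, 9.0]
print(spearman(vol, resid))  # 1.0
```

That robustness to nonlinearity is the point of using ranks here: the volatility-residual relationship need not be linear for the test to find it.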

So, at a particular level of runs scored, a team with fewer blowouts is probably a better team. Essentially, if a team has lower volatility, more of its runs were talent-based rather than opportunity-based.
