…trying to pick examples of similar wOBA but seemingly different “consistency.”

Comment by Mario Mendoza of commenters — March 7, 2012 @ 3:59 pm

Good idea. I also wonder if it might be useful to consider the inherent interaction between runs scored and runs prevented. For example, put teams into buckets that represent the quintiles of run production and see whether the relationship between run-prevention volatility and success is similar across the buckets. Then do the same with buckets of run prevention.

Basically, the gap between a team’s run prevention (or production) and the league average should affect how valuable volatility is. I speculate that volatility softens the impact of below-average performance and blunts the benefit of above-average performance.
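A rough Python sketch of the bucketing idea, for anyone who wants to try it. The field names (`rs`, `ra_cv`, `resid`) are hypothetical, and comparing the more volatile half of each bucket against the less volatile half is just one simple way to ask whether the relationship is similar across buckets (a per-bucket correlation would also work). Bucketing on runs allowed instead of `rs` gives the run-prevention version.

```python
from statistics import mean

def quintile_buckets(values):
    """Return a 0-4 quintile label for each value, by rank (ties broken by input order)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        labels[i] = rank * 5 // len(values)
    return labels

def volatility_effect_by_bucket(teams):
    """teams: list of dicts with hypothetical keys 'rs' (runs scored),
    'ra_cv' (runs-allowed coefficient of variation) and 'resid'
    (actual minus Pythagorean wins).
    For each run-production quintile, returns the mean residual of the more
    volatile half minus the mean residual of the less volatile half."""
    labels = quintile_buckets([t["rs"] for t in teams])
    out = {}
    for q in range(5):
        group = sorted((t for t, lab in zip(teams, labels) if lab == q),
                       key=lambda t: t["ra_cv"])
        if len(group) < 2:
            continue  # not enough teams in this bucket to compare halves
        half = len(group) // 2
        low, high = group[:half], group[-half:]
        out[q] = mean(t["resid"] for t in high) - mean(t["resid"] for t in low)
    return out
```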

Great stuff. Just wondering out loud whether it’s individual player volatility that *matters* or simply the volatility of the roster as a whole. At the individual level there’s presumably zero correlation between players, so Volatile Hitter A’s hot streak is going to be offset by Volatile Hitter B’s cold streak often enough that, net-net, the *whole* may not be particularly volatile.

But what about a “studs and scrubs” sort of team? Your work suggests that a team with a couple of aces, an awful back end of the rotation and a shallow bullpen is preferable to a team with a similar ERA and a more even distribution.

Runs-allowed volatility should be related to differences in umpiring, team defense, and a big gap in performance between the ace and the 5th (or lower) starter.
A paper needs to be written that looks at team defense and range and wins. I suspect teams that have truly exceptional defense win more than their run differential would imply.

Comment by kick me in the GO NATS — March 7, 2012 @ 4:23 pm

I think it’s Koufax, in his Historical Baseball Abstract or in his Hall of Fame book.

Random thought: the lower the run scoring (allowed or created), the higher the measured volatility will be, even if ‘true’ volatility is the same. In an extreme example, if a team averages 1.5 runs a game, the std dev will be a lot higher than that of a team that averages 10 runs a game, simply because you cannot score 1.3 or 1.7 runs in a given game, and not because the 1.5 rpg team is any less inherently consistent.

Not sure if this affects the goal of your analysis at all, since you may be willing to lump in this source of volatility with your assumed source (player inconsistency), but it’s worth noting.

The same is true for a team that averages 10.5 runs. They can’t score 10.3 or 10.7 runs in a game. It would in fact be astonishing if a team that scored 10.5 runs/game had a lower standard deviation than a team that scores 1.5 runs/game.

As the mean goes to 0, the standard deviation must go to 0 as well (since you always score at least 0 runs in a game), so it’s not really clear that there should be any scale effect like you’re describing.

Around the time of the World Series, IIRC, Tango demonstrated the same scenario: top-level pitchers are better off being consistent, while lesser guys do better being more erratic.

I believe the discussion was about Edwin Jackson and pitchers like him.

I found it interesting since inconsistency is usually a term used in a negative way.

When we talk about consistency, we’re usually asking why a certain player can’t just be really good all the time. The answer should be obvious: they don’t have poor performances just to make things interesting.

While this article is really about consistency, I just feel like mentioning that one other thing that helps teams outperform their Pythag values is the play of their bullpen. If you’ve got a really strong back end of the bullpen, you’ll perform better than expected in close games. They won’t have an effect either way in games that are blowouts, of course, but if the game is tied or within one or two runs, having better run prevention late in the game will keep a team in it.

You are correct. The point still holds, though: compare a team that scores 101 and allows 100 every game with one whose run totals have maximum variance. The first wins every game; the second wins roughly half. By recognizing the extremes, one would probably guess the gist of this article. Interesting that the effect is more pronounced for run prevention.

Some argue that consistency (streakiness) does not exist. It’s just random variance, or lack of variance, that occurs by chance.

I don’t actually buy that though.

I would like to see a study of the most consistent players game to game over a several-year period. One interesting thing I found in a limited study is that if you take any player’s 100 worst games in a given season, 90% of those players bat under the Mendoza line in those games, even some of the best players in the game.

Another way to look at it is to look at players’ records only in games where the team scores between 3 and 8 runs, to exclude the effect of dominant and awful pitching performances. I suspect a lot of offensive inconsistency comes from hitters padding their stats against sub-par pitching and being dominated by good pitching.
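A minimal sketch of that filter, assuming a hypothetical game log of (team runs, hits, at-bats) tuples and using batting average as a stand-in for a player’s “record”:

```python
def avg_in_middle_games(game_log, lo=3, hi=8):
    """Batting average over only the games in which the team scored lo..hi runs.
    game_log: list of (team_runs, hits, at_bats) tuples (hypothetical format)."""
    middle = [(h, ab) for runs, h, ab in game_log if lo <= runs <= hi]
    hits = sum(h for h, _ in middle)
    at_bats = sum(ab for _, ab in middle)
    return hits / at_bats if at_bats else float("nan")

log = [(1, 0, 4), (5, 2, 4), (12, 3, 5), (3, 1, 4)]
print(avg_in_middle_games(log))  # 0.375 (3 hits in 8 at-bats from the 5- and 3-run games)
```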

A few teams may get lucky (or unlucky) over the course of the season by missing the other team’s ace more often than not (or facing the other team’s ace more often). This is more of an issue in today’s game, with 5-man rotations and so many 2- and 3-game series, than in previous eras, when 4-man rotations were used and there were more 4-game series. It averages out over the season for most teams, but not for every team.

Pitcher consistency is a different story. Good pitchers can pitch well even against the best teams and regardless of the umpire’s strike zone. Bad pitchers are more dependent on the quality of the team they face, how hot or cold that team is, and the size of the umpire’s strike zone. So while they appear more inconsistent, their pitching is probably the same; the results differ for reasons they don’t control. The good games by bad pitchers are simply flukes due to bad or cold teams and/or a generous umpire.

I think the last paragraph is only true if the back end of the rotation has high volatility. (I can’t help but think of someone like Edwin Jackson. He seems like the quintessential 1-run, 6-run starter with a 4+ ERA.) Otherwise, you run the risk of not even giving yourself a chance in the vast majority of scrub starts…and you’ll still only perform “normally” for the ace starts, since you don’t get to choose which days the offense clicks, and which games it doesn’t.

Any thoughts on how team consistency in run production ties in to lineup construction?
I know optimal lineup construction has a relatively small impact on runs scored totals. But could optimal construction provide better run consistency? If so, this would mean optimal lineup construction has a bigger impact than gross totals imply, right?

C’mon. Realize that the difference between 10 and 10.3 is tiny, and that the difference between 1 and 1.5 is huge, in percent terms — which is how the author clearly states he has measured volatility:

“I calculated what is called the coefficient of variation–simply the standard deviation of a set of numbers divided by their mean.”

Any time you are measuring something where the unit of measurement is a large percentage of the total values being measured, you need to be aware of this issue.

Were the Braves better off last year for having Dan Uggla’s wild swings from month to month, going from a .160 average for a couple of months to putting up great numbers in July and August? Along with a hot-hitting Freddie Freeman (1.033 OPS in August), he certainly helped carry the Braves for a couple of months, and you have to credit Fredi Gonzalez for batting them 3-4 in the lineup while they were really hot together. They were 33-20 during that time, but two players alone are not enough to prove anything, of course. Fun to think about, though.

Fascinating stuff. I’ve thought about this in the past, and I would be interested to see if this has any impact on playoff performance. It certainly seems like high variance teams might have an advantage in the postseason, because they have the ability to string together a few great outlier performances. It seems like most of the wild card teams that have had success recently were teams that ran hot and cold during the regular season.

This is a good comment, and I greatly look forward to Petti’s further studies.
Remember, however, that when a manager, announcer, fan or other baseball type calls for a player to be consistent, he is really calling for the player to be consistently good, not for the player to perform closer to his average.

Bill, do you have any specific team examples of teams that outperform their pythags and are found to be runs-allowed volatile? I wonder if any of those recent overachieving Angels teams are on that list.

Bill, thanks for digging up those old THT articles on the Weibull distribution. One thing that occurs to me is that “standard deviation” is poorly defined for run distributions because the distribution is not normal. For a Weibull-distributed variable, I would look at the shape parameter (“gamma” in the second plot in the THT article you linked to).

The league shape parameter is typically around 1.85 or so, and is equivalent to the exponent in the Pythagorean (actually, Pythagenpat) win% equation. But individual teams have different shape parameters based on their actual run distributions. The difference between the team shape parameter and the league shape parameter is a better measure, in my opinion, of volatility. This would require fitting a Weibull distribution for every team, but that’s a relatively easy programming task (I would love to do this if I had the time).
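For anyone who wants to try the per-team fit, here is a minimal maximum-likelihood sketch in plain Python. It is not taken from the THT articles: the bisection solver, its bracket of [0.05, 50], and the +0.5 shift applied to integer runs (so that shutouts do not produce log(0)) are all assumptions. Fitting each team’s game-by-game runs and subtracting the league-wide shape would give the comparison described above.

```python
import math
import random

def fit_weibull(xs, lo=0.05, hi=50.0, iters=60):
    """Maximum-likelihood Weibull fit for positive data; returns (shape, scale).
    The shape k solves sum(x^k ln x)/sum(x^k) - 1/k - mean(ln x) = 0,
    whose left-hand side increases with k, so bisection on [lo, hi] works."""
    logs = [math.log(x) for x in xs]
    mean_log = sum(logs) / len(logs)

    def profile_score(k):
        w = [x ** k for x in xs]
        return sum(wi * li for wi, li in zip(w, logs)) / sum(w) - 1.0 / k - mean_log

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if profile_score(mid) < 0:
            lo = mid  # root lies above mid
        else:
            hi = mid  # root lies at or below mid
    k = 0.5 * (lo + hi)
    scale = (sum(x ** k for x in xs) / len(xs)) ** (1.0 / k)
    return k, scale

def shift_runs(runs_per_game, shift=0.5):
    """Hypothetical preprocessing: shift integer runs so shutouts have a finite log."""
    return [r + shift for r in runs_per_game]

# Demo on synthetic data with a known shape of 1.85 (weibullvariate(scale, shape)).
random.seed(1)
sample = [random.weibullvariate(4.5, 1.85) for _ in range(5000)]
print(fit_weibull(sample))
```

With real data you would call something like `fit_weibull(shift_runs(team_runs))` per team.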

My belief from having studied this is that you can’t *construct* a team to have greater or smaller volatility, but that it does happen and that it explains a great deal of the deviation from Pyth win%. If anyone finds a way to predict, a priori, the conditions required to control the volatility (or shape) of a team’s run distribution, I’ll buy you a beer.

In the meantime, it’s a heck of a lot easier to stock up on hitters that hit the snot out of the ball and pitchers with great breaking stuff than it is to squeeze blood out of the run distribution stone. It’s a really fun analytical thing to look at but I can’t see the practical consequence.

I first ran a regression with team winning percentage as the dependent variable and runs per game and opponents’ runs per game as the independent variables. Then I added two variables in a second regression which measured consistency. HITCON was the standard deviation (SD) of runs per game divided by runs per game (just the SD would not be right since high scoring teams will have a greater SD). PITCON does something similar on the pitching side.

I found that the more consistent hitting teams win more for a given average runs per game while the less consistent pitching teams win more.

I found it is much more important to score and prevent runs than become more consistent (or less, on the pitching side).
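A minimal sketch of that two-step regression in plain Python; it is not the commenter’s actual code. `con()` computes the HITCON/PITCON-style ratio (population SD over mean, an assumption since the comment does not say which SD was used), and `ols()` solves ordinary least squares with an intercept via the normal equations. The demo data are synthetic.

```python
import random
from statistics import mean, pstdev

def con(runs_per_game):
    """HITCON/PITCON-style consistency: SD of runs per game divided by mean runs per game."""
    return pstdev(runs_per_game) / mean(runs_per_game)

def ols(rows, y):
    """Least squares with an intercept via the normal equations (X'X) b = X'y.
    rows: list of feature lists; y: list of targets. Returns [b0, b1, ...]."""
    X = [[1.0] + list(r) for r in rows]
    n, p = len(X), len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)]
         + [sum(X[i][a] * y[i] for i in range(n))] for a in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]

# Synthetic team-seasons: [R/G, opponent R/G, HITCON, PITCON] -> winning percentage.
random.seed(0)
rows = [[random.uniform(3.5, 5.5), random.uniform(3.5, 5.5),
         random.uniform(0.5, 0.8), random.uniform(0.5, 0.8)] for _ in range(60)]
y = [0.5 + 0.1 * (rs - ra) + random.gauss(0, 0.02) for rs, ra, _, _ in rows]
base = ols([r[:2] for r in rows], y)  # step 1: runs only
full = ols(rows, y)                   # step 2: add the two consistency terms
print(base, full)
```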

In a comment, newsense wrote: “I think you may have overlooked the possibility that the coefficient of variation could still be correlated with runs scored or allowed.” I completely agree. The coefficient of variation is not independent of runs scored or runs allowed. I looked at runs scored and allowed data from 1998-2011 (420 team-seasons). The coefficient of variation (RS_Vol) clearly decreases with increasing runs scored. The same goes for runs against (RA_Vol). This correlation explains the below/above average table and suggests it may have nothing to do with consistency.

This is what I expected, since runs scored should follow a Poisson distribution. There are some reasons why the Poisson distribution is not exactly correct (runs come in bunches). If runs followed a Poisson distribution, the standard deviation of runs would scale as sqrt(runs), so that RS_Vol = sqrt(runs)/runs = 1/sqrt(runs). The actual data do not follow this form precisely, but the trend of decreasing coefficient of variation with increasing runs was certainly present. My toying with the data found that a revised measure of volatility of sigma(Runs^(1.5))/Runs gives a measure of volatility that is close to independent of runs.

Comparing actual wins with Pythagorean expected wins should remove the link between runs scored and volatility. However, since volatility is correlated with runs scored and runs against, I would worry that the measured effects mostly reflect trends in actual-minus-expected wins with runs scored and against. It would be interesting to see whether these trends remain with a modified measure of consistency that is independent of runs scored and allowed.

Given the time I will try the Weibull distribution analysis discussed above.

Bad paren placement above. Should be: My toying with the data found that a revised measure of volatility of sigma(Runs)^(1.5)/Runs gives a measure of volatility that is close to independent of runs.
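A quick way to see the scale effect discussed here is to compute the moments of a Poisson distribution directly. This sketch uses the Poisson assumption from the comment, not real run data. The CV comes out as 1/sqrt(mean), and the corrected measure sigma(Runs)^(1.5)/Runs works out to mean^(-1/4), so under a pure Poisson model it still declines slowly with scoring; if it is flat in real data, that would reflect how real run distributions depart from Poisson.

```python
import math

def poisson_moments(lam, max_k=200):
    """Mean and SD of Poisson(lam), computed from its pmf truncated at max_k."""
    pmf = [math.exp(-lam)]
    for k in range(1, max_k + 1):
        pmf.append(pmf[-1] * lam / k)  # recurrence avoids large factorials
    m = sum(k * p for k, p in enumerate(pmf))
    var = sum((k - m) ** 2 * p for k, p in enumerate(pmf))
    return m, math.sqrt(var)

for lam in (1.5, 4.5, 10.0):
    m, sd = poisson_moments(lam)
    cv = sd / m               # RS_Vol / RA_Vol style coefficient of variation
    revised = sd ** 1.5 / m   # sigma(Runs)^(1.5)/Runs, the corrected form
    print(f"mean {lam:>4}: CV {cv:.3f} (1/sqrt {1 / math.sqrt(lam):.3f}), revised {revised:.3f}")
```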

If the outliers in runs scored were removed from the sample using Chauvenet’s criterion, I conjecture that the results would fall more in line with the initial thinking: that more consistency is optimal for an offense, even relative to its expected wins.

Comment by Anthony D'Addeo — June 4, 2014 @ 10:35 pm
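For reference, Chauvenet’s criterion is simple to apply. Below is a single-pass Python sketch, offered as an illustration only: the criterion assumes roughly normal data, which game-level run totals (skewed and bounded at zero) are not.

```python
import math
from statistics import mean, stdev

def chauvenet_filter(values):
    """Single pass of Chauvenet's criterion: drop a point if the expected number of
    points at least that far from the mean, N * P(|Z| >= z), is below 0.5."""
    n = len(values)
    m, s = mean(values), stdev(values)
    if s == 0:
        return list(values)  # no spread, nothing to reject
    kept = []
    for v in values:
        z = abs(v - m) / s
        p_two_sided = math.erfc(z / math.sqrt(2))  # P(|Z| >= z) for a standard normal
        if n * p_two_sided >= 0.5:
            kept.append(v)
    return kept

print(chauvenet_filter([4, 5, 3, 4, 5, 4, 30]))  # [4, 5, 3, 4, 5, 4]
```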

I’m actually not sure I came to the right conclusion, but it could give a more accurate picture of consistency.

Comment by Anthony D'Addeo — June 4, 2014 @ 10:40 pm

It seems to me, as a Reds fan at least, that this usually comes up in the context of: Fan/Announcer complains that player X strikes out too much and insists that either he should be more consistent individually or that the team would be better off with a more consistent player. Of course, the obvious trade-off here is that the guy you have is more productive/talented overall, so you live with the inconsistency.

But if more consistent is better than less, where’s the break-even point? What’s the offsetting amount of overall production/talent you can give up and still be better off with a more consistent player? Is a consistent 2.5 WAR player actually helping your team more than an inconsistent 3.0 WAR?

And perhaps more fundamentally, how should consistency be defined at the player level?

Comment by RMR — March 7, 2012 @ 1:14 pm

It certainly applies to pitching. Imagine two pitchers who each pitch exactly 6 innings every game and both have a 4.50 ERA. Pitcher A gives up 3 runs every outing, while Pitcher B gives up 1 run in 4 of 5 outings and 11 runs in the fifth. You’ll win more games with Pitcher B.

I posted something about this on some blog during the 2009 season, but cannot remember which one.

Comment by bluejaysstatsgeek — March 7, 2012 @ 1:40 pm
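To sanity-check the Pitcher A vs. Pitcher B example, here is a rough Python sketch. It assumes, hypothetically, that the offense scores a Poisson-distributed number of runs averaging 4.5 per game, that the starter’s runs are the only runs allowed (the bullpen’s three innings are ignored), and that ties count as half a win. None of these assumptions come from the comment; under them, Pitcher B does come out ahead.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def win_prob_allowing(runs_allowed, lam):
    """Win probability when the opponent scores exactly `runs_allowed` and our
    offense scores Poisson(lam) runs. Ties count as half a win (assumption)."""
    p_at_most = sum(poisson_pmf(k, lam) for k in range(runs_allowed + 1))
    return (1.0 - p_at_most) + 0.5 * poisson_pmf(runs_allowed, lam)

def expected_win_pct(outings, lam):
    """Average win probability over a repeating cycle of starts."""
    return sum(win_prob_allowing(r, lam) for r in outings) / len(outings)

LAM = 4.5                     # hypothetical offense: 4.5 runs per game on average
pitcher_a = [3, 3, 3, 3, 3]   # 15 runs in 30 IP -> 4.50 ERA
pitcher_b = [1, 1, 1, 1, 11]  # also 15 runs in 30 IP -> 4.50 ERA

win_a = expected_win_pct(pitcher_a, LAM)
win_b = expected_win_pct(pitcher_b, LAM)
print(f"Pitcher A: {win_a:.3f}  Pitcher B: {win_b:.3f}")
```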

This is a great post. Glad to see this done with statistical analysis and the coefficient of variation. It’s intuitive that if a team has zero variance in their production, scoring 5 runs and allowing 4 every game, they win every game. As their variance approaches an arbitrary maximum, their win percentage approaches 1/2.

Comment by michael — March 7, 2012 @ 1:42 pm
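The zero-variance versus large-variance intuition can be made concrete with a continuous approximation. The assumption here, not from the comment, is that runs scored and allowed are independent normal variables with a shared SD, which real integer run totals are not. With a 5-to-4 average edge, the win probability is 1 at zero variance and falls toward .500 as the SD grows.

```python
import math

def win_pct_normal(mean_scored, mean_allowed, sd):
    """P(runs scored > runs allowed) when both are independent normals with the
    same standard deviation `sd` (a continuous stand-in for integer runs)."""
    if sd == 0:
        if mean_scored == mean_allowed:
            return 0.5
        return 1.0 if mean_scored > mean_allowed else 0.0
    # RS - RA is normal with mean (mean_scored - mean_allowed) and SD sd * sqrt(2).
    z = (mean_scored - mean_allowed) / (sd * math.sqrt(2))
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal CDF at z

for sd in (0, 0.5, 1, 3, 10, 100):
    print(sd, round(win_pct_normal(5, 4, sd), 3))
```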

It’s likely my reading comprehension is the issue here, but I’m not understanding how the sample size is arrived at, or why it’s not interfering with the results.

Given the amount of trades, free agents and other shuffling done, I’d think that a 25-man roster for a team would, on average, change rather significantly over a 5-year period.

If you want to look at it in the abstract, maybe they’re only swapping out the high and low ends of their spectrum (measured however you like: wOBA, WAR, etc.) and the core of their team is still the same basic approach…? But I’d think that you can’t smash together RS and RA from 5 different years to make your hypothesis. Getting better at reducing volatility in 2008 doesn’t let the team time-travel back to 2007 and pick up some extra wins.

So based on that, it’s about looking at just a single season. Perhaps frame the question as: “Do teams that consistently beat their Pythagorean expectation tend to be above or below average in RS or RA volatility, compared with the rest of the major leagues in that season?”

Comment by Snowblind — March 7, 2012 @ 1:43 pm
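Snowblind’s per-season framing could be sketched like this: within each season, split teams at that season’s league-average runs-allowed volatility and compare the average Pythagorean residual of each group. The input field names are hypothetical, and runs-allowed CV is only an example; runs-scored CV works the same way.

```python
from collections import defaultdict
from statistics import mean

def split_by_league_volatility(team_seasons):
    """team_seasons: list of dicts with hypothetical keys 'season', 'team',
    'ra_cv' (runs-allowed coefficient of variation) and 'pyth_resid'
    (actual wins minus Pythagorean wins).
    Returns {season: (mean residual of above-average-volatility teams,
                      mean residual of at-or-below-average teams)}."""
    by_season = defaultdict(list)
    for row in team_seasons:
        by_season[row["season"]].append(row)
    result = {}
    for season, rows in by_season.items():
        league_cv = mean(r["ra_cv"] for r in rows)  # that season's league average
        high = [r["pyth_resid"] for r in rows if r["ra_cv"] > league_cv]
        low = [r["pyth_resid"] for r in rows if r["ra_cv"] <= league_cv]
        result[season] = (mean(high) if high else float("nan"),
                          mean(low) if low else float("nan"))
    return result
```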

In one of the Bill James Abstracts he does this with pitchers’ careers; I think he was talking about Don Sutton. Can’t remember the year. He concluded that pitchers who have up-and-down careers are more helpful to teams winning pennants. I once had a link to a site that had an index of the Abstracts, but it doesn’t seem to work anymore.

Comment by Steve N — March 7, 2012 @ 1:52 pm

It’s not 5 different years. It’s volatility in runs scored and allowed in each individual season for each team over 6 years, where each team-season is a case (so 30 teams over 6 seasons gives you an N of 180).

Comment by Bill Petti — March 7, 2012 @ 1:52 pm

Seems like a perfect assignment for a simulator. BBTF?

Comment by Mario Mendoza of commenters — March 7, 2012 @ 1:52 pm

Fantastic work. Currently doing research on the profile of a consistent hitter by identifying what type of hitters perform well against good pitchers.

Comment by Andy — March 7, 2012 @ 1:53 pm

Excellent question, and that’s what I hope to expand on further this year. Assume my analysis above is right at a team level (and it’s an assumption right now), how does that translate to roster construction? It’s a big, interesting question I can hopefully start teasing out.

Comment by Bill Petti — March 7, 2012 @ 1:53 pm

So if pythag is independent of volatility, and high volatility maximizes your actual wins over pythag, logically you should want to maximize volatility as long as you do not sacrifice too much on quality.

Comment by Cliff — March 7, 2012 @ 1:57 pm

Yes, this is why fantasy managers value volatile youth over consistent veterans in later rounds, even when their projections are similar.

This is particularly true in fantasy where 2nd place = last place. Missing the playoffs in MLB is a “bad season” but a .525 team will still draw more than a .400 team.

Comment by Toffer Peak — March 7, 2012 @ 2:13 pm

An article published by Braunstein in 2010 in the Journal of Qualitative Analysis in Sports addressed this issue. Here is the abstract:

“Pythagorean win share has been one of the fundamental contributions to Sabermetrics. Several hundred articles, both academic and non-academic, have explored variations on Bill James’ original formula and its fit to empirical data. This paper considers a variation that is previously unexplored on any systematic level, consistency. After discussing several important contributions to the line of literature, we demonstrate the strong correlation between Pythagorean residuals and several notions of run distribution consistency. Finally, we select the “correct” form of consistency and use it to construct a simple regression estimator, which improves RMSE by 11%.”

Here is a link to the PDF: http://www.largedocument.com/2/56d6a4c1/pythagoras.pdf

Comment by Daniel — March 7, 2012 @ 2:14 pm

Typo, meant to say Journal of Quantitative Analysis

Comment by Daniel — March 7, 2012 @ 2:15 pm

Makes sense. Think of a team that is on the cusp of making the playoffs, i.e., a 90-win true-talent team. They can go with Pitcher A, who is always a 2 WAR player, or Pitcher B, who is a 4 WAR player half the time and a 0 WAR player the other half. While they will sometimes make the playoffs as a 90-win team, they will always make it as a 92-win team and sometimes even as an 88-win team.

Comment by Toffer Peak — March 7, 2012 @ 2:19 pm
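A rough way to put numbers on this scenario: treat actual wins as normally distributed around true talent and compare the steady and all-or-nothing pitchers. The spread (about 6.4 wins, roughly the binomial SD for a .500 team over 162 games) and the hard playoff cutoff are assumptions, not from the comment. Under this model, the volatile pitcher helps a team whose talent sits below the cutoff and hurts a team above it.

```python
import math

def p_reach(talent_wins, cutoff, sd=6.4):
    """P(actual wins >= cutoff) if actual wins ~ Normal(talent_wins, sd)."""
    z = (talent_wins - cutoff) / sd
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def compare(base_wins, cutoff):
    """Return (steady, volatile) playoff odds: Pitcher A adds a steady 2 wins,
    Pitcher B adds 4 or 0 wins with equal probability."""
    steady = p_reach(base_wins + 2, cutoff)
    volatile = 0.5 * p_reach(base_wins + 4, cutoff) + 0.5 * p_reach(base_wins, cutoff)
    return steady, volatile

for base in (84, 88, 92):
    print(base, [round(p, 3) for p in compare(base, cutoff=90)])
```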

Daniel, thanks for this. Definitely going to check it out.

Comment by Bill Petti — March 7, 2012 @ 2:23 pm

Ya, Braunstein publishes some neat stuff. Here is another one I really like. It shows what the best measures of effectiveness are for pitchers and how the best measures differ for starters and relievers. If you don’t know his work well, you should definitely read up on some of it. Here is the abstract for this one:

“A plethora of statistics have been proposed to measure the effectiveness of pitchers in Major League Baseball. While many of these are quite traditional (e.g., ERA, wins), some have gained currency only recently (e.g., WHIP, K/BB). Some of these metrics may have predictive power, but it is unclear which are the most reliable or consistent. We address this question by constructing a Bayesian random effects model that incorporates a point mass mixture and fitting it to data on twenty metrics spanning approximately 2,500 players and 35 years. Our model identifies FIP, HR/9, ERA, and BB/9 as the highest signal metrics for starters and GB%, FB%, and K/9 as the highest signal metrics for relievers. In general, the metrics identified by our model are independent of team defense. Our procedure also provides a relative ranking of metrics separately by starters and relievers and shows that these rankings differ quite substantially between them. Our methodology is compared to a Lasso-based procedure and is internally validated by detailed case studies.”

Here is the PDF: http://www.largedocument.com/2/56d6a4c1/A_Point%2DMass_Mixture_Random_Effects_Model_for_Pitching_Metrics.pdf

He also has one on the other side for hitters but the results aren’t as exciting: http://arxiv.org/pdf/0911.4503.pdf

Comment by Daniel — March 7, 2012 @ 2:39 pm

I might be wrong, but I remember the D-backs getting outscored on the season a few years back and still leading the NL in wins?

Comment by johnny 5 — March 7, 2012 @ 3:01 pm

Before we try to figure out anything at the player level, we need to accurately define what team consistency is.

A team with unlimited depth can simply swap struggling players for better ones, producing consistent play that creates the illusion of consistent players. Further, inconsistency could be due to injuries. Once depth and injuries have been controlled for, we can see whether there is any relationship.

Comment by guesswork — March 7, 2012 @ 3:30 pm

I think you may have overlooked the possibility that the coefficient of variation could still be correlated with runs scored or allowed. The best test would be to look at the correlation between the residuals (the difference between actual wins and Pythagorean wins) and the coefficient of variation.

Comment by newsense — March 7, 2012 @ 3:52 pm
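newsense’s suggested test can be sketched in a few lines: compute each team-season’s residual (actual wins minus Pythagorean wins) and correlate it with the coefficient of variation. This is an illustrative Python sketch with a hypothetical tuple layout and the classic Pythagorean exponent of 2, not the article’s code.

```python
import math
from statistics import mean

def pythag_wins(rs, ra, games=162, exponent=2.0):
    """Pythagorean expected wins (exponent 2 is the classic choice; ~1.83 is common too)."""
    return games * rs ** exponent / (rs ** exponent + ra ** exponent)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def resid_cv_correlation(team_seasons):
    """team_seasons: (wins, runs_scored, runs_allowed, cv) tuples (hypothetical layout).
    Correlates the Pythagorean residual with the coefficient of variation."""
    resid = [w - pythag_wins(rs, ra) for w, rs, ra, _ in team_seasons]
    cvs = [cv for *_, cv in team_seasons]
    return pearson(resid, cvs)
```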

If a given team constantly scores 25% more than their adversaries, wouldn’t arbitrarily large variance lead to a win% that reflects that advantage? Somewhere between the pythagorean (.609) and the run-ratio (.556).

Comment by Someanalyst — March 7, 2012 @ 3:52 pm

I have a couple of thoughts here. Of course, these are all narratives with no real stats behind them, so please take them for what they’re worth.

1) I would imagine that for pitching, quality and consistency are closely related. If a player is constantly below average, he doesn’t last long in MLB. The only reason below-average players can hang around is if they show flashes of brilliance. Of the consistent pitchers, I would imagine they would all be above average. Thus, the team with the most consistent pitchers would also have the best pitchers as a general rule.

2) Position players can make up for poor or inconsistent hitting by being valuable in the field or on the base paths. Defense and fielding are not accounted for in this study. Combined with #1 this might explain why low volatility in RA is more responsible for total wins than RA.

3) I think the wins above pythag expected comes down to avoiding outs. In order to score runs, a team usually needs to string together hits. This is why consistency would benefit. Four singles are much more valuable than one HR and three outs. Linear weights bear this out. Winning more than expected requires teams to have their hits strung together (consistency) which to the best of my knowledge is luck.

4) For pitching variance, I see this as getting good results from bad players (see #1, quality and consistency related). This, like #3 comes down to luck. To summarize #3 and #4, lucky hitting looks like consistency. Lucky pitching looks volatile.
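Point 3’s linear-weights claim can be checked with back-of-the-envelope arithmetic. The run values below are approximate, commonly cited league-average figures (roughly +0.47 runs for a single, +1.40 for a home run, -0.27 for an out); treat them as assumptions, since exact values shift with season and run environment.

```python
# Approximate run values above average per event (assumed, not from the article).
RUN_VALUE = {"single": 0.47, "home_run": 1.40, "out": -0.27}


def sequence_value(events):
    """Sum of linear-weight run values for a list of event names."""
    return sum(RUN_VALUE[event] for event in events)


four_singles = sequence_value(["single"] * 4)
homer_and_three_outs = sequence_value(["home_run", "out", "out", "out"])
print(f"{four_singles:.2f} vs {homer_and_three_outs:.2f}")  # 1.88 vs 0.59
```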

Comment by Steve the Pirate — March 7, 2012 @ 3:55 pm

Is there a way to measure “streakiness” of players, and simulate a thousand games played by nine 2011 Mark Reynolds vs nine 2011 Yadier Molinas?
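One standard way to put a number on streakiness (a swapped-in technique, not something proposed in the thread) is the Wald-Wolfowitz runs test: code each game as a success or failure (for example, reached base or not; that coding is an assumption) and compare the number of streaks with what independent games would produce. A negative z means fewer, longer streaks than chance.

```python
import math


def runs_test_z(outcomes):
    """Wald-Wolfowitz runs test z-score for a sequence of 0/1 game outcomes.

    Negative values mean fewer (longer) streaks than independent games would give.
    """
    n1 = sum(1 for o in outcomes if o)
    n2 = len(outcomes) - n1
    n = n1 + n2
    if n1 == 0 or n2 == 0:
        raise ValueError("need at least one success and one failure")
    streaks = 1 + sum(1 for a, b in zip(outcomes, outcomes[1:]) if a != b)
    expected = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    return (streaks - expected) / math.sqrt(variance)


streaky = [1] * 10 + [0] * 10  # one hot stretch, one cold stretch
alternating = [1, 0] * 10      # never two alike in a row
print(round(runs_test_z(streaky), 2), round(runs_test_z(alternating), 2))  # -4.14 4.14
```

This only scores one player’s sequence; simulating whole lineups of Reynolds-types versus Molina-types would need plate-appearance-level data on top of it.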

Comment by Mario Mendoza of commenters — March 7, 2012 @ 3:58 pm

That’s what the second set of correlations looks at–the relationship between runs scored/allowed consistency (i.e. the coefficient of variation) and the difference between actual and expected wins.

Comment by Bill Petti — March 7, 2012 @ 3:59 pm

…trying to pick examples of similar wOBA but seemingly different “consistency.”

Comment by Mario Mendoza of commenters — March 7, 2012 @ 3:59 pm

Good idea. I also wonder if it might be useful to consider the interaction between runs scored and prevented which is inherent. For example, put teams into buckets that represent the quintiles of run production and see if the relationship between run-prevention volatility and success is similar across the buckets. The same then for buckets of run-prevention.

Basically, the gap between a team’s run prevention (or production)and the league average should impact how valuable volatility is. I speculate that volatility softens the impact of below average performance and blunts the benefit of above-average performance.
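The bucketing idea above can be sketched in a few lines. This assumes each team-season is a dict with hypothetical keys `rs_per_game`, `ra_vol` (runs-allowed CV) and `win_pct`; it splits teams into runs-scored quintiles and reports, per quintile, the correlation between run-prevention volatility and winning percentage. The toy data are made up purely to exercise the code.

```python
import math
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation; raises ZeroDivisionError if either list is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def quintiles(items, key):
    """Sort by key and split into five groups of (nearly) equal size, lowest first."""
    ordered = sorted(items, key=key)
    n = len(ordered)
    return [ordered[i * n // 5:(i + 1) * n // 5] for i in range(5)]


def volatility_effect_by_rs_quintile(teams):
    """Correlation of RA volatility with win% inside each runs-scored quintile."""
    return [
        pearson([t["ra_vol"] for t in group], [t["win_pct"] for t in group])
        for group in quintiles(teams, key=lambda t: t["rs_per_game"])
    ]


# Ten made-up team-seasons, just to show the shape of the output.
toy = [
    {"rs_per_game": 3.5 + 0.2 * i, "ra_vol": 0.40 + 0.01 * i, "win_pct": 0.40 + 0.02 * i}
    for i in range(10)
]
print([round(r, 2) for r in volatility_effect_by_rs_quintile(toy)])  # [1.0, 1.0, 1.0, 1.0, 1.0]
```

With two teams per bucket every correlation is trivially ±1; real data would put dozens of team-seasons in each quintile. Swapping the key to runs allowed gives the run-prevention buckets.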

Comment by Someanalyst — March 7, 2012 @ 4:08 pm

Great stuff. Just wondering out loud if it’s individual player volatility that *matters* or simply the volatility of the roster as a whole. On an individual level, there’s presumably zero correlation, so Volatile Hitter A’s hot streak is going to get offset by Volatile Hitter B’s cold streak often enough that, net-net, the *whole* may not be particularly volatile.

But what about a “studs and scrubs” sort of team? Your work suggests a team with a couple aces and an awful back end of the rotation and shallow bullpen is preferable to a similar ERA team with a more even distribution.

Comment by Adam — March 7, 2012 @ 4:09 pm

Runs allowed volatility should be related to differences in umpiring, team defense, and a big gap in performance between the Ace and 5th or lower starters.

A paper needs to be written that looks at team defense and range and wins. I suspect teams that have truly exceptional defense win more than their run differential would imply.

Comment by kick me in the GO NATS — March 7, 2012 @ 4:23 pm

I think it’s Koufax in his Historical Baseball Abstract, or in his Hall of Fame book.

Comment by Kinanik — March 7, 2012 @ 5:02 pm

Random thought: The lower the run scoring (allowed or created), the higher the measured volatility will be, even if ‘true’ volatility is the same. In an extreme example, if a team averages 1.5 runs a game, the std dev will be a lot higher than that of a team which averages 10 runs a game, simply because you cannot score 1.3 runs or 1.7 runs in a given game, and not because the 1.5 rpg team is any less inherently consistent.

Not sure if this affects the goal of your analysis at all, since you may be willing to lump in this source of volatility with your assumed source (player inconsistency), but it’s worth noting.

Comment by evo34 — March 7, 2012 @ 5:28 pm

Oh, so it is looked at year by year, and then all of that is rolled up into overall trends and values? OK, these work for me then.

Comment by Snowblind — March 7, 2012 @ 7:10 pm

The same is true for a team that averages 10.5 runs. They can’t score 10.3 or 10.7 runs in a game. It would in fact be astonishing if a team that scored 10.5 runs/game had a lower standard deviation than a team that scores 1.5 runs/game.

As the mean goes to 0, the standard deviation must go to 0 as well (since you always score at least 0 runs in a game), so it’s not really clear that there should be any scale effect like you’re describing.

Comment by Jon — March 7, 2012 @ 7:43 pm

Around the time of the World Series, IIRC, Tango demonstrated the same scenario: top-level pitchers are better off being consistent, while lesser guys do better being more erratic.

I believe the discussion was about Edwin Jackson and pitchers like him.

I found it interesting since inconsistency is usually a term used in a negative way.

When we talk about consistency, we’re usually asking why a certain player can’t just be really good all the time. The answer should be obvious: they don’t have poor performances just to make things interesting.

Comment by CircleChange11 — March 7, 2012 @ 9:20 pm

While this article is really about consistency, I just feel like mentioning that one other thing that helps teams outperform their Pythag values is the play of their bullpen. If you’ve got a really strong back end of the bullpen, you’ll perform better than expected in close games. They won’t have an effect either way in games that are blowouts, of course, but if the game is tied or within one or two runs, having better run prevention late in the game will keep a team in it.

Comment by Bronnt — March 7, 2012 @ 10:13 pm

You are correct. The point still holds, though, if a team scores 101 and allows 100 every time versus maximum variance: winning every game versus winning roughly 1/2. By recognizing the extremes, one would probably guess the gist of this article. Interesting that it’s more pronounced for run prevention.
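The extreme case can be simulated. A rough sketch, assuming each side’s score is its mean plus independent normal noise (real scores are non-negative integers, so this is only a toy model): with no noise the 101-run team wins every game, and as the noise grows its winning percentage drifts toward one half.

```python
import random


def simulated_win_pct(mean_for, mean_against, sd, games=20_000, seed=1):
    """Share of games won when both scores get independent N(0, sd) noise; ties count half."""
    rng = random.Random(seed)
    wins = 0.0
    for _ in range(games):
        scored = mean_for + rng.gauss(0, sd)
        allowed = mean_against + rng.gauss(0, sd)
        if scored > allowed:
            wins += 1
        elif scored == allowed:
            wins += 0.5
    return wins / games


for sd in (0, 1, 10, 1000):
    print(sd, round(simulated_win_pct(101, 100, sd), 3))
```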

Comment by michael — March 7, 2012 @ 11:40 pm

Some argue that consistency (streakiness) does not exist. It’s just random variance, or lack of variance, that occurs by chance.

I don’t actually buy that though.

I would like to see a study on the most consistent players game to game over a several-year period. One interesting thing I found in a limited study is that if you take the 100 worst games of any player in a given season, 90% of players bat under the Mendoza line in those games, even some of the best players in the game.

Another way to look at it is to look at players’ records only in games where the team scores between 3 and 8 runs, to exclude the effect of dominant and awful pitching performances. I suspect a lot of offensive inconsistency is due to hitters padding their stats against subpar pitching and being dominated by good pitching.

A few teams may get lucky (or unlucky) over the course of the season by missing the other team’s ace more often than not (or facing the other team’s ace more often). This is more of an issue in today’s game, with 5-man rotations and so many 2- and 3-game series (compared to previous eras, when 4-man rotations were used and there were more 4-game series). It averages out over the season for most teams, but not for every team.

Pitcher consistency is a different story. Good pitchers can pitch well even against the best teams and regardless of an umpire’s strike zone. Bad pitchers are more dependent on the quality of the team they face, how hot or cold that team is, and the size of the umpire’s strike zone. So while they appear more inconsistent, their pitching is probably the same; the results differ for reasons they don’t control. The good games by bad pitchers are simply flukes due to bad or cold teams and/or a generous umpire.

Comment by pft — March 7, 2012 @ 11:49 pm

I think the last paragraph is only true if the back end of the rotation has high volatility. (I can’t help but think of someone like Edwin Jackson. He seems like the quintessential 1-run, 6-run starter with a 4+ ERA.) Otherwise, you run the risk of not even giving yourself a chance in the vast majority of scrub starts…and you’ll still only perform “normally” for the ace starts, since you don’t get to choose which days the offense clicks, and which games it doesn’t.

Comment by KDL — March 8, 2012 @ 1:17 am

This makes sense in my gut and experience, and I agree with it…but is there evidence that backs up this idea?

Comment by KDL — March 8, 2012 @ 1:19 am

Any thoughts on how team consistency in run production ties in to lineup construction?

I know optimal lineup construction has a relatively small impact on runs scored totals. But could optimal construction provide better run consistency? If so, this would mean optimal lineup construction has a bigger impact than gross totals imply, right?

Comment by KDL — March 8, 2012 @ 1:24 am

C’mon. Realize that the difference between 10 and 10.3 is tiny, and that the difference between 1 and 1.5 is huge, in percent terms — which is how the author clearly states he has measured volatility:

“I calculated what is called the coefficient of variation–simply the standard deviation of a set of numbers divided by their mean.”

Any time you are measuring something where the unit of measurement is a large percentage of the total values being measured, you need to be aware of this issue.
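A quick way to see the percent-terms point: the tightest spread a team with a fractional average can manage with integer scores is to alternate between the two neighbouring integers. A small sketch using the article’s definition (standard deviation over mean; using the population SD is an assumption here):

```python
from statistics import mean, pstdev


def coefficient_of_variation(runs_per_game):
    """Standard deviation of runs per game divided by the mean."""
    return pstdev(runs_per_game) / mean(runs_per_game)


# Most consistent possible 162-game seasons for these averages (hypothetical).
low_scoring = [1, 2] * 81     # averages 1.5 runs
high_scoring = [10, 11] * 81  # averages 10.5 runs
print(f"{coefficient_of_variation(low_scoring):.3f}")   # 0.333
print(f"{coefficient_of_variation(high_scoring):.3f}")  # 0.048
```

Both seasons have the same standard deviation (0.5 runs), which is Jon’s point; only the ratio to the mean differs, which is evo34’s.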

Comment by evo34 — March 8, 2012 @ 1:36 am

Were the Braves better off last year having Dan Uggla swing wildly from month to month, going from a .160 average for a couple of months to great numbers in July/August? He certainly, along with a hot-hitting Freddie Freeman (1.033 OPS in August), helped carry the Braves for a couple of months, and you have to credit Fredi Gonzalez for batting them 3-4 in the lineup while they were really hot together. They were 33-20 during that time, but two players alone are not enough to prove anything, of course. Fun to think about, though.

Comment by bstar — March 8, 2012 @ 3:33 am

I recall a few studies that showed bullpen performance was at least mildly correlated with over/under performing Pythag.

Comment by CJ — March 8, 2012 @ 8:45 am

Fascinating stuff. I’ve thought about this in the past, and I would be interested to see if this has any impact on playoff performance. It certainly seems like high variance teams might have an advantage in the postseason, because they have the ability to string together a few great outlier performances. It seems like most of the wild card teams that have had success recently were teams that ran hot and cold during the regular season.

Comment by Andrew — March 8, 2012 @ 10:46 am

That was the 2007 D’Backs… .66 RS_Vol and .69 RA_Vol…

Comment by Eric R — March 8, 2012 @ 1:13 pm

“Defense and fielding are not accounted for in this study. ”

aren’t they included since defense [and?] fielding are part of run prevention?

Comment by Eric R — March 8, 2012 @ 1:15 pm

This is a good comment, and I greatly look forward to Petti’s further studies.

Remember, however, when a manager, announcer, fan, or other baseball type calls for a player to be consistent, he really is calling for the player to be consistently good, not for the player to play closer to his average.

Comment by Baltar — March 8, 2012 @ 2:06 pm

Interesting, if true. Perhaps you could do a study on this. I would welcome it.

Comment by Baltar — March 8, 2012 @ 2:21 pm

Bill, do you have any specific team examples of teams that outperform their pythags and are found to be runs-allowed volatile? I wonder if any of those recent overachieving Angels teams are on that list.

Comment by bstar — March 8, 2012 @ 10:36 pm

Bill, thanks for digging up those old THT articles on the Weibull distribution. One thing that occurs to me is that “standard deviation” is poorly-defined for run distributions because the distribution is not normal. For a Weibull distributed variable, I would look at the shape parameter (“gamma” in the second plot in the THT article you linked to).

The league shape parameter is typically around 1.85 or so, and is equivalent to the exponent in the Pythagorean (actually, Pythagenpat) win% equation. But individual teams have different shape parameters based on their actual run distributions. The difference between the team shape parameter and the league shape parameter is a better measure, in my opinion, of volatility. This would require fitting a Weibull distribution for every team, but that’s a relatively easy programming task (I would love to do this if I had the time).
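The per-team Weibull fit described above really is a short programming job. A minimal sketch in pure Python, assuming a two-parameter Weibull fitted by maximum likelihood to runs plus 0.5 (a shift often used in Weibull treatments of run scoring so that shutouts are allowed; treat the shift, and the continuous treatment of integer runs, as assumptions). The shape is found by bisection on the standard likelihood equation for the Weibull shape.

```python
import math
import random


def fit_weibull(runs, shift=0.5, lo=0.05, hi=20.0, tol=1e-9):
    """Maximum-likelihood (shape, scale) of a two-parameter Weibull for runs + shift.

    Solves sum(x^k ln x)/sum(x^k) - 1/k - mean(ln x) = 0 for the shape k by
    bisection (the left side increases with k), then sets the scale from k.
    """
    xs = [r + shift for r in runs]
    logs = [math.log(x) for x in xs]
    mean_log = sum(logs) / len(logs)

    def likelihood_equation(k):
        powers = [x ** k for x in xs]
        return sum(p * l for p, l in zip(powers, logs)) / sum(powers) - 1.0 / k - mean_log

    while hi - lo > tol:
        mid = (lo + hi) / 2
        if likelihood_equation(mid) < 0:
            lo = mid
        else:
            hi = mid
    shape = (lo + hi) / 2
    scale = (sum(x ** shape for x in xs) / len(xs)) ** (1.0 / shape)
    return shape, scale


# Sanity check on synthetic draws with a known shape (no shift needed here).
rng = random.Random(7)
draws = [rng.weibullvariate(5.0, 1.85) for _ in range(5000)]
shape, scale = fit_weibull(draws, shift=0.0)
print(round(shape, 2), round(scale, 2))
```

For a real team you would pass its game-by-game runs with the default shift and compare the fitted shape with a league-wide fit; whether that gap is the better volatility measure is the commenter’s view, not something this code establishes.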

My belief from having studied this is that you can’t *construct* a team to have greater or smaller volatility, but that it does happen and that it explains a great deal of deviation from Pyth win%. If anyone finds a way to predict, a priori, the conditions required for controlling the volatility (or shape) of a team’s run distribution, I’ll buy you a beer.

In the meantime, it’s a heck of a lot easier to stock up on hitters that hit the snot out of the ball and pitchers with great breaking stuff than it is to squeeze blood out of the run distribution stone. It’s a really fun analytical thing to look at but I can’t see the practical consequence.

Thanks for a well thought out article.

Comment by the llama — March 9, 2012 @ 1:18 am

I looked at this issue a couple of years ago (and even earlier many years ago on the SABR list). See

http://cybermetric.blogspot.com/2010/04/how-much-does-team-consistency-matter.html

I first ran a regression with team winning percentage as the dependent variable and runs per game and opponents’ runs per game as the independent variables. Then I added two variables in a second regression which measured consistency. HITCON was the standard deviation (SD) of runs per game divided by runs per game (just the SD would not be right since high scoring teams will have a greater SD). PITCON does something similar on the pitching side.
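A minimal sketch of the two-step regression described here, using plain least squares (normal equations solved by Gauss-Jordan elimination) so it needs no libraries. The column layout, function name, coefficients and toy data are all hypothetical; the actual data and estimates are in the linked post.

```python
import random


def ols(rows, targets):
    """Least-squares coefficients [intercept, b1, b2, ...] for rows of predictors."""
    X = [[1.0] + list(row) for row in rows]
    p = len(X[0])
    # Augmented normal equations [X'X | X'y].
    A = []
    for a in range(p):
        lhs = [sum(x[a] * x[b] for x in X) for b in range(p)]
        rhs = sum(x[a] * y for x, y in zip(X, targets))
        A.append(lhs + [rhs])
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [v - factor * w for v, w in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Toy team-seasons built from made-up coefficients, to check the solver recovers them.
rng = random.Random(3)
true_coefs = [0.5, 0.1, -0.1, -0.05, 0.03]  # intercept, RS/G, RA/G, HITCON, PITCON
rows = [(rng.uniform(3.5, 5.5), rng.uniform(3.5, 5.5),
         rng.uniform(0.4, 0.8), rng.uniform(0.4, 0.8)) for _ in range(30)]
win_pct = [true_coefs[0] + sum(b * x for b, x in zip(true_coefs[1:], row)) for row in rows]

base_model = ols([row[:2] for row in rows], win_pct)  # RS/G and RA/G only
full_model = ols(rows, win_pct)                      # plus HITCON and PITCON
print([round(b, 3) for b in full_model])  # [0.5, 0.1, -0.1, -0.05, 0.03]
```

HITCON and PITCON would be computed per team as the SD of runs (scored or allowed) per game divided by the mean, as described above.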

I found that the more consistent hitting teams win more for a given average runs per game while the less consistent pitching teams win more.

I found it is much more important to score and prevent runs than become more consistent (or less, on the pitching side).

Comment by Cyril Morong — March 9, 2012 @ 4:57 pm

What exactly do the numbers in the results table mean, like the -.37?

Comment by Cyril Morong — March 9, 2012 @ 5:10 pm

In a comment, newsense wrote: “I think you may have overlooked the possibility that the coefficient of variation could still be correlated with runs scored or allowed.” I completely agree. The coefficient of variation is not independent of runs scored or runs allowed. I looked at runs scored and allowed data from 1998-2011 (420 data sets). The coefficient of variation (RS_Vol) clearly decreases with increasing runs scored. The same goes for runs against (RA_Vol). This correlation explains the below/above average table and suggests it may have nothing to do with consistency.

This is what I expected, since runs scored should roughly follow a Poisson distribution. There are some reasons why the Poisson distribution is not exactly correct (runs come in bunches). If runs followed a Poisson distribution, the standard deviation in runs would go as sqrt(runs), so that RS_Vol = sqrt(runs)/runs = 1/sqrt(runs). The actual data does not show this form precisely, but the trend of decreasing coefficient of variation with increasing runs was certainly present. My toying with the data found that a revised measure of volatility of sigma(Runs^(1.5))/Runs gives a measure of volatility that is close to independent of runs.
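The Poisson argument is easy to check by simulation. A sketch, assuming runs per game are exactly Poisson (which, as noted, they are not, since runs come in bunches): the simulated coefficient of variation tracks 1/sqrt(mean), so it falls as scoring rises even though nothing about “consistency” has changed. The sampler is Knuth’s multiplication method, which is fine for small means like runs per game.

```python
import math
import random
from statistics import mean, pstdev


def poisson_draw(lam, rng):
    """Knuth's multiplication method for a Poisson(lam) variate."""
    threshold = math.exp(-lam)
    k, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= threshold:
            return k
        k += 1


def simulated_cv(lam, games=20_000, seed=11):
    """Coefficient of variation of simulated Poisson runs per game."""
    rng = random.Random(seed)
    runs = [poisson_draw(lam, rng) for _ in range(games)]
    return pstdev(runs) / mean(runs)


for lam in (3.5, 4.5, 5.5):
    print(lam, round(simulated_cv(lam), 3), round(1 / math.sqrt(lam), 3))
```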

The comparisons of actual wins with pythag expected wins should remove the link between runs scored and volatility. However, since volatility is correlated with runs scored and runs against, I would worry that the measured effects mostly reflect trends in expected-minus-actual wins with runs scored and allowed. It would be interesting to see if these trends remain with a modified measure of consistency that is independent of runs scored and allowed.

Given the time, I will try the Weibull distribution analysis discussed above.

Comment by lucask7 — March 11, 2012 @ 10:40 pm

Bad paren placement above. Should be: My toying with the data found that a revised measure of volatility of sigma(Runs)^(1.5)/Runs gives a measure of volatility that is close to independent of runs.

Comment by lucask7 — March 11, 2012 @ 10:45 pm

If the outliers in runs scored were removed from the sample using Chauvenet’s criterion, I conjecture that the results would be more in line with the initial thinking: that more consistency is optimal for an offense, even relative to its expected wins.
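For reference, a single pass of Chauvenet’s criterion looks like this. It assumes values are roughly normal, which runs per game are not (they are discrete and right-skewed), so on run data it would mostly remove high-scoring games and essentially never shutouts; that asymmetry is worth keeping in mind before trying it.

```python
import math
from statistics import mean, stdev


def chauvenet_filter(values):
    """One pass of Chauvenet's criterion: drop a value when the expected number of
    observations at least that far from the mean (n times the two-sided normal
    tail probability) is below 0.5."""
    n = len(values)
    mu, sigma = mean(values), stdev(values)
    kept = []
    for v in values:
        z = abs(v - mu) / sigma
        tail = math.erfc(z / math.sqrt(2))  # P(|Z| >= z) for a standard normal
        if n * tail >= 0.5:
            kept.append(v)
    return kept


print(chauvenet_filter([4, 5, 3, 4, 5, 4, 3, 5, 4, 20]))  # [4, 5, 3, 4, 5, 4, 3, 5, 4]
```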

Comment by Anthony D'Addeo — June 4, 2014 @ 10:35 pm

I’m actually not sure I came to the right conclusion, but it could give a more accurate picture of consistency.

Comment by Anthony D'Addeo — June 4, 2014 @ 10:40 pm