## Does Consistent Play Help a Team Win?

One of the many insights to come from Bill James is that a team’s winning percentage can be estimated quite well simply from the difference between the runs it scores and the runs it allows. And while James’ Pythagorean Expectation cannot account for all variation in team performance, it does a fantastic job.
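For reference, the expectation in its classic form (James’ original exponent of 2; later refinements such as Pythagenpat adjust the exponent) can be sketched as:

```python
def pythagorean_win_pct(runs_scored: float, runs_allowed: float,
                        exponent: float = 2.0) -> float:
    """Estimated winning percentage from runs scored and allowed.
    James' original formula uses exponent 2; refinements vary it."""
    rs, ra = runs_scored ** exponent, runs_allowed ** exponent
    return rs / (rs + ra)

# A team that scores 800 runs and allows 700 projects to about a .566 team:
print(round(pythagorean_win_pct(800, 700), 3))  # 0.566
```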

One possibility that is not accounted for is that different teams may distribute their runs differently, game to game, throughout the season. It’s possible that two teams with identical run differentials could have significantly different records. Here’s a short example:

Assume two teams, A and B, both with a run differential of 0 (both score and allow 29 runs) over the course of a 10-game series against each other. The Pythagorean Expectation tells us that both teams should have a record of 5-5. However, in this scenario, team B wins 6 out of 10.

Game | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
---|---|---|---|---|---|---|---|---|---|---|
Team A | 10 | 1 | 1 | 1 | 1 | 2 | 2 | 3 | 3 | 5 |
Team B | 1 | 4 | 4 | 3 | 3 | 5 | 3 | 1 | 1 | 4 |

The difference here is in how both teams distributed their runs. Team B was more consistent in terms of their runs scored compared to Team A. Now, this example only looks at 10 games against the same opponent. Does a team’s consistency really impact their chances of winning over the course of 162 games?
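The arithmetic of the example is easy to verify; a quick sketch using the score lines from the table above:

```python
# Game-by-game scores from the table above.
team_a = [10, 1, 1, 1, 1, 2, 2, 3, 3, 5]
team_b = [1, 4, 4, 3, 3, 5, 3, 1, 1, 4]

# Both teams score and allow 29 runs: run differential of 0 for each.
assert sum(team_a) == sum(team_b) == 29

b_wins = sum(b > a for a, b in zip(team_a, team_b))
print(f"Team B wins {b_wins} of {len(team_b)}")  # Team B wins 6 of 10
```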

Looking at teams over the past six seasons, the short answer is that less volatility (particularly in runs allowed) leads to more wins, and the effect is much greater for run prevention than for run scoring. So the more consistent a team is in its game-by-game runs allowed, the more wins it should expect. However, when it comes to beating a team’s Pythagorean Expectation, the relationship flips: greater volatility, particularly in run prevention, is associated with more wins than expected. The less consistent teams were in their game-by-game run prevention, the more wins above their expected wins they realized.

The topic of team run distribution as well as player consistency or volatility has been dealt with a number of times in the past.

One of the earliest looks at the distribution of team runs was done by Keith Woolner at Baseball Prospectus (see here and here). Tom Tango followed up on this with his own model for predicting runs per game distributions.

Sal Baxamusa took this research one step further at The Hardball Times and began teasing out the implications of different run distributions, i.e. whether team performance changes depending on how runs are distributed. In a 2007 article, Baxamusa showed that teams did not gain a great advantage from scoring more than five runs per game. Therefore, the less variation teams had around their run scoring (i.e. the more often they scored between three and six runs per game), the greater their likely winning percentage.

Baxamusa suggested that while run scoring should be consistent, run prevention should be less so. The logic was based on some work by David Gassko: for top-line starters you want a more consistent pitcher, but with lesser arms you want more volatility. Why? Imagine two different pitchers who both average the same runs allowed (say, 5 per game). Over the course of 30 starts, Pitcher A gives up exactly 5 runs in every start, while Pitcher B gives up 10 runs in 15 starts and 0 runs in the other 15. Pitcher A is likely to go 15-15, while Pitcher B is likely to go 18-12.
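A rough simulation illustrates the intuition. This is my own sketch, not Gassko’s method: it assumes the pitcher’s offense scores Poisson-distributed runs averaging 4.5 per game, and decides ties with a coin flip as a stand-in for extra innings.

```python
import math
import random

random.seed(1)

def poisson(lam):
    # Knuth's method for drawing a Poisson random variate.
    limit, k, p = math.exp(-lam), 0, 1.0
    while p > limit:
        k += 1
        p *= random.random()
    return k - 1

def win_pct(allowed_pattern, starts=20_000, offense_mean=4.5):
    """Share of starts won given a repeating runs-allowed pattern,
    against a Poisson offense; ties are a coin flip."""
    wins = 0
    for i in range(starts):
        allowed = allowed_pattern[i % len(allowed_pattern)]
        scored = poisson(offense_mean)
        if scored > allowed or (scored == allowed and random.random() < 0.5):
            wins += 1
    return wins / starts

consistent = win_pct([5])     # Pitcher A: exactly 5 runs every start
volatile = win_pct([0, 10])   # Pitcher B: alternating shutouts and blowups
print(f"consistent: {consistent:.3f}  volatile: {volatile:.3f}")
```

Under these assumptions the volatile pitcher wins roughly half his starts (he takes nearly all of the shutout starts and loses nearly all of the 10-run starts), while the consistent 5-runs-per-start pitcher wins well under half, which is the Gassko logic for lesser arms.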

Gassko’s research began to bridge the gap between the utility of team consistency and what that means for roster construction based on individual consistency. Our own Eric Seidman looked at individual pitcher consistency in a pair of articles while at Baseball Prospectus back in 2010 and I looked at this issue for both hitters and pitchers at Beyond the Box Score last year.

For this article, I wanted to look at whether being more or less consistent gave teams an advantage in terms of their winning percentage.

Here are the details.

**Methodology:**

To start teasing this idea out, I calculated the game-by-game difference between each team’s runs scored and its seasonal average for every team from 2006 through 2011. I did the same for runs allowed. Because teams have different averages, I calculated what is called the coefficient of variation: simply the standard deviation of a set of numbers divided by their mean. This helps control for the impact that higher run scoring and lower run prevention can have on raw standard deviations.

**Runs Scored Volatility (RS_Vol)** = the Coefficient of Variation (Standard Deviation/Average) of the difference between individual game Runs Scored and the Seasonal Runs Scored Average

**Runs Allowed Volatility (RA_Vol)** = the Coefficient of Variation (Standard Deviation/Average) of the difference between individual game Runs Allowed and the Seasonal Runs Allowed Average
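As I read these definitions, a team’s RS_Vol boils down to the standard deviation of its game-by-game runs scored divided by its seasonal average. A minimal sketch with made-up score lines (the two six-game samples are hypothetical):

```python
import statistics

def volatility(game_runs):
    """Coefficient of variation: stdev of game-by-game runs / seasonal mean."""
    return statistics.pstdev(game_runs) / statistics.mean(game_runs)

steady  = [4, 5, 4, 5, 4, 5]   # hypothetical consistent offense
streaky = [0, 9, 1, 8, 0, 9]   # same 4.5 runs/game average, wild swings
print(round(volatility(steady), 2), round(volatility(streaky), 2))  # 0.11 0.93
```

Both lines average 4.5 runs per game, so the raw run totals are identical; only the volatility measure separates them.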

**Results:**

Correlation with | Team Wins | Actual Wins – Expected Wins |
---|---|---|
RS_Vol | -0.37 | 0.16 |
RA_Vol | -0.71 | 0.85 |

Overall, RS_Vol had a negative relationship to team wins. So the more consistent a team’s run scoring, game to game, the higher their win total. The relationship was the same for RA_Vol, just much stronger. This runs a bit counter to some earlier thoughts on team consistency, as some have suggested that you would rather have a more consistent offense, but more volatility in your run prevention.

I also looked at the average wins for teams with different combinations of volatility. Teams that were below average in terms of both their run scoring and run prevention volatility (i.e. less volatile, more consistent) won 93 games on average. Even teams that were below average only for RA_Vol averaged 84 wins, which further illustrates the advantage of consistency in that area (at least, that’s what the initial analysis suggests).

RS_Vol | RA_Vol | Average Wins |
---|---|---|
Below Average | Below Average | 93 |
Above Average | Below Average | 84 |
Below Average | Above Average | 79 |
Above Average | Above Average | 68 |

However, the relationship looks much different in terms of whether teams under- or over-performed their expected wins.

RS_Vol has a positive relationship with wins above expected (simply actual wins minus expected wins). So the more volatile a team’s offense, the higher its likely wins above expected. RA_Vol was also positively correlated, but to a much higher degree (.85). So for teams that really outperform their Pythagorean Expectation, it is most likely due to the fact that they are wildly inconsistent, game to game, when it comes to preventing runs.

What to make of all this? I think it leads to some interesting questions, most of which deal with roster construction. Given the larger impact on wins, should an above-average team focus on building a generally consistent offense (particularly at the top and heart of the line-up)? Should a below-average team look to bring in players, particularly pitchers, that are highly volatile in the run prevention game? What about the rotation? Should you build around consistent 1-3 starters and highly volatile 4-5 starters?

As I found in earlier research, volatility is generally not a characteristic that is stable year to year, so building on it is difficult. However, over time players do seem to display a “skill” for consistency. In this way, volatility is like Clutch: highly variable year to year, but identifiable after a large number of observations.

There is lots of work to be done in this area. I plan on looking into these questions more, as will other authors here at FanGraphs.

Peer review is always welcome, so please do send along your own thoughts, theories, or links to similar work.

*(Big hat tip to Jeff Zimmerman and Eric Seidman for help with this post and to Noah Isaacs for inspiring the research)*


It seems to me, as a Reds fan at least, that this usually comes up in the context of: Fan/Announcer complains that player X strikes out too much and insists that either he should be more consistent individually or that the team would be better off with a more consistent player. Of course, the obvious trade-off here is that the guy you have is more productive/talented overall, so you live with the inconsistency.

But if more consistent is better than less, where’s the break-even point? What’s the offsetting amount of overall production/talent you can give up and still be better off with a more consistent player? Is a consistent 2.5 WAR player actually helping your team more than an inconsistent 3.0 WAR player?

And perhaps more fundamentally, how should consistency be defined at the player level?

Excellent question, and that’s what I hope to expand on further this year. Assuming my analysis above is right at a team level (and it’s an assumption right now), how does that translate to roster construction? It’s a big, interesting question I can hopefully start teasing out.

This is a good comment, and I greatly look forward to Petti’s further studies.

Remember, however, when a manager, announcer, fan or other baseball type calls for a player to be consistent, he really is calling for the player to be consistently good, not that he wants the player to play closer to his average.

It certainly applies to pitching. Imagine two pitchers, with both pitching exactly 6 innings each game with ERAs of 4.5. Pitcher A gives up 3 runs every outing, while Pitcher B gives up 1 run in 4 of 5 outings and 11 runs in the fifth outing. You’ll win more games with Pitcher B.

I posted something about this on some blog during the 2009 season, but cannot remember which one.

This is a great post. Glad to see this done with statistical analysis and the coefficient of variation. It’s intuitive that if a team has zero variance in their production, scoring 5 runs and allowing 4 every game, they win every game. As their variance approaches an arbitrary maximum, their win percentage approaches 1/2.

If a given team constantly scores 25% more than their adversaries, wouldn’t arbitrarily large variance lead to a win% that reflects that advantage? Somewhere between the pythagorean (.609) and the run-ratio (.556).

You are correct. The point still holds, though, if a team scores 101 and allows 100 every time vs maximum variance. Winning every game versus winning roughly 1/2. By recognizing the extremes, one would probably guess the gist of this article. Interesting that it’s more considerable for run prevention.

It’s likely my reading comprehension is the issue here, but I’m not understanding how the sample size is arrived at, or why it’s not interfering with the results.

Given the amount of trades, free agents and other shuffling done, I’d think that a 25-man roster for a team would, on average, change rather significantly over a 5-year period.

If you want to look at it in the abstract, maybe they’re only swapping out the high and low ends of their spectrum (measured however you like: wOBA, WAR, etc.) and the core of their team is still the same basic approach…? But I’d think that you can’t smash together RS and RA from 5 different years to make your hypothesis. Getting better at reducing volatility in 2008 doesn’t let the team time-travel back to 2007 and pick up some extra wins.

So based on that, it’s about looking at just a single season. Perhaps framing the question as, “do teams that seem to consistently beat their Pythagorean Expectation, seem to have been above average or below average in volatility of RS or RA, when compared to the rest of the major leagues in that season?”.

It’s not 5 different years. It’s volatility in runs scored and runs against in each individual season for each team over 6 years, where each team’s individual season is a case (so 30 team seasons over 6 years gives you an N of 180).

Oh, so it *is* looked at year by year, and then all of that is rolled up into overall trends and values? Ok, these work for me then.

In one of Bill James’ abstracts he does this with pitchers’ careers, I think talking about Don Sutton. Can’t remember the year. He concluded that pitchers who have up-and-down careers are more helpful to teams winning pennants. I once had a link to a site that had an index of the abstracts. Doesn’t seem to work anymore.

Makes sense. Think of a team that is on the cusp of making the playoffs, i.e. a 90-win true-talent team. They can go with Pitcher A, who is always a 2 WAR player, or Pitcher B, who is a 4 WAR player half the time and a 0 WAR player the other half. While they will sometimes make the playoffs as a 90-win team, they will always make it as a 92-win team and sometimes even as an 88-win team.

I think it’s Koufax in his historical baseball abstract, or in his Hall of Fame book.

Seems like a perfect assignment for a simulator. BBTF?

Before we try to figure out anything on the player level, we need to accurately define what is team consistency.

A team with unlimited depth can simply swap out struggling players with better players, which means consistent play that creates the illusion of consistent players. Further, inconsistency could be due to injuries. Once depth and injuries have been controlled for, then we need to see if there is any relationship.

Fantastic work. Currently doing research on the profile of a consistent hitter by identifying what type of hitters perform well against good pitchers.

So if pythag is independent of volatility, and high volatility maximizes your actual wins over pythag, logically you should want to maximize volatility as long as you do not sacrifice too much on quality.

Yes, this is why fantasy managers value volatile youth over consistent veterans in later rounds, even when their projections are similar.

This is particularly true in fantasy where 2nd place = last place. Missing the playoffs in MLB is a “bad season” but a .525 team will still draw more than a .400 team.

An article published by Braunstein in 2010 in the Journal of Qualitative Analysis in Sports addressed this issue. Here is the abstract:

“Pythagorean win share has been one of the fundamental contributions to Sabermetrics. Several hundred articles, both academic and non-academic, have explored variations on Bill James’ original formula and its fit to empirical data. This paper considers a variation that is previously unexplored on any systematic level, consistency. After discussing several important contributions to the line of literature, we demonstrate the strong correlation between Pythagorean residuals and several notions of run distribution consistency. Finally, we select the “correct” form of consistency and use it to construct a simple regression estimator, which improves RMSE by 11%.”

Here is a link to the PDF: http://www.largedocument.com/2/56d6a4c1/pythagoras.pdf

Typo, meant to say Journal of Quantitative Analysis

Daniel, thanks for this. Definitely going to check it out.

Ya, Braunstein publishes some neat stuff. Here is another one I really like. It shows what the best measures of effectiveness are for pitchers and shows how the best measures differ between starters and relievers. If you don’t know him well, you should definitely read up on some of his stuff. Here is the abstract for this one:

“A plethora of statistics have been proposed to measure the effectiveness of pitchers in Major League Baseball. While many of these are quite traditional (e.g., ERA, wins), some have gained currency only recently (e.g., WHIP, K/BB). Some of these metrics may have predictive power, but it is unclear which are the most reliable or consistent. We address this question by constructing a Bayesian random effects model that incorporates a point mass mixture and fitting it to data on twenty metrics spanning approximately 2,500 players and 35 years. Our model identifies FIP, HR/9, ERA, and BB/9 as the highest signal metrics for starters and GB%, FB%, and K/9 as the highest signal metrics for relievers. In general, the metrics identified by our model are independent of team defense. Our procedure also provides a relative ranking of metrics separately by starters and relievers and shows that these rankings differ quite substantially between them. Our methodology is compared to a Lasso-based procedure and is internally validated by detailed case studies.”

Here is the PDF: http://www.largedocument.com/2/56d6a4c1/A_Point%2DMass_Mixture_Random_Effects_Model_for_Pitching_Metrics.pdf

He also has one on the other side for hitters but the results aren’t as exciting: http://arxiv.org/pdf/0911.4503.pdf

I might be wrong, but I remember the D’Backs got outscored on the season a few years back but still led the NL in wins?

That was the 2007 D’Backs… .66 RS_Vol and .69 RA_Vol…

I think you may have overlooked the possibility that the coefficient of variation could still be correlated with runs scored or allowed. The best test would be to look at the correlation between the residuals (the difference between actual wins and Pythagorean wins) and the coefficient of variation.

That’s what the second set of correlations looks at–the relationship between runs scored/allowed consistency (i.e. the coefficient of variation) and the difference between actual and expected wins.

Good idea. I also wonder if it might be useful to consider the inherent interaction between runs scored and runs prevented. For example, put teams into buckets that represent the quintiles of run production and see if the relationship between run-prevention volatility and success is similar across the buckets. The same then for buckets of run-prevention.

Basically, the gap between a team’s run prevention (or production) and the league average should impact how valuable volatility is. I speculate that volatility softens the impact of below-average performance and blunts the benefit of above-average performance.

I have a couple of thoughts here. Of course these are all narratives with no real stats behind them. Please take at face value.

1) I would imagine that for pitching, quality and consistency are closely related. If a player is constantly below average, he doesn’t last long in MLB. The only reason below-average players can hang around is if they show flashes of brilliance. Of the consistent pitchers, I would imagine they would all be above average. Thus, the team with the most consistent pitchers would also have the best pitchers as a general rule.

2) Position players can make up for poor or inconsistent hitting by being valuable in the field or on the base paths. Defense and fielding are not accounted for in this study. Combined with #1, this might explain why low volatility in RA is more responsible for total wins than low volatility in RS.

3) I think the wins above pythag expected comes down to avoiding outs. In order to score runs, a team usually needs to string together hits. This is why consistency would benefit. Four singles are much more valuable than one HR and three outs. Linear weights bear this out. Winning more than expected requires teams to have their hits strung together (consistency) which to the best of my knowledge is luck.

4) For pitching variance, I see this as getting good results from bad players (see #1, quality and consistency related). This, like #3 comes down to luck. To summarize #3 and #4, lucky hitting looks like consistency. Lucky pitching looks volatile.

“Defense and fielding are not accounted for in this study.”

aren’t they included since defense [and?] fielding are part of run prevention?

Is there a way to measure “streakiness” of players, and simulate a thousand games played by nine 2011 Mark Reynolds vs nine 2011 Yadier Molinas?

…trying to pick examples of similar wOBA but seemingly different “consistency.”

Great stuff. Just wondering out loud if it’s individual player volatility that *matters* or simply the volatility of the roster as a whole. On an individual level, there’s presumably zero correlation, so Volatile Hitter A’s hot streak is going to get offset by Volatile Hitter B’s cold streak often enough that, net-net, the *whole* may be not particularly volatile.

But what about a “studs and scrubs” sort of team? Your work suggests a team with a couple aces and an awful back end of the rotation and shallow bullpen is preferable to a similar ERA team with a more even distribution.

I think the last paragraph is only true if the back end of the rotation has high volatility. (I can’t help but think of someone like Edwin Jackson. He seems like the quintessential 1-run, 6-run starter with a 4+ ERA.) Otherwise, you run the risk of not even giving yourself a chance in the vast majority of scrub starts…and you’ll still only perform “normally” for the ace starts, since you don’t get to choose which days the offense clicks, and which games it doesn’t.

Runs allowed volatility should be related to differences in umpiring, team defense, and a big gap in performance between the Ace and 5th or lower starters.

A paper needs to be written that looks at team defense and range and wins. I suspect teams that have truly exceptional defense win more than their run differential would imply.

Random thought: The lower the run scoring (allowed or created), the higher the measured volatility will be — even if ‘true’ volatility is the same. In an extreme example, if a team averages scoring 1.5 runs, the std dev will be a lot higher than that of a team which averages 10 runs a game….simply because you cannot score 1.3 runs or 1.7 runs in a given game, and not because the 1.5 rpg team is any less inherently consistent.

Not sure if this affects the goal of your analysis at all, since you may be willing to lump in this source of volatility with your assumed source (player inconsistency), but it’s worth noting.

The same is true for a team that averages 10.5 runs. They can’t score 10.3 or 10.7 runs in a game. It would in fact be astonishing if a team that scored 10.5 runs/game had a lower standard deviation than a team that scores 1.5 runs/game.

As the mean goes to 0, the standard deviation must go to 0 as well (since you always score at least 0 runs in a game), so it’s not really clear that there should be any scale effect like you’re describing.

C’mon. Realize that the difference between 10 and 10.3 is tiny, and that the difference between 1 and 1.5 is huge, in percent terms — which is how the author clearly states he has measured volatility:

“I calculated what is called the coefficient of variation–simply the standard deviation of a set of numbers divided by their mean.”

Any time you are measuring something where the unit of measurement is a large percentage of the total values being measured, you need to be aware of this issue.

Around the time of the WS, IIRC, Tango demonstrated the same scenario: top-level pitchers are better off consistent, while lesser guys do better being more erratic.

I believe the discussion was about Edwin Jackson and pitchers like him.

I found it interesting since inconsistency is usually a term used in a negative way.

When we talk about consistency we’re usually asking why a certain player can’t just be really good all the time. The answer should be obvious: they don’t have poor performances just to make things interesting.

While this article is really about consistency, I just feel like mentioning that one other thing that helps teams outperform their Pythag values is the play of their bullpen. If you’ve got a really strong back end of the bullpen, you’ll perform better than expected in close games. They won’t have an effect either way in games that are blowouts, of course, but if the game is tied or within one or two runs, having better run prevention late in the game will keep a team in it.

This makes sense in my gut and experience, and I agree with it…but is there evidence that backs up this idea?

I recall a few studies that showed bullpen performance was at least mildly correlated with over/under performing Pythag.

Some argue that consistency (streakiness) does not exist. It’s just random variance, or lack of variance, that occurs by chance.

I don’t actually buy that though.

I would like to see a study on the most consistent players game to game over a several-year period. One thing I found interesting in a limited study is that if you take the 100 worst games of any player in a given season, 90% of them bat under the Mendoza line in those games, even some of the best players in the game.

Another way to look at it is to look at players’ records in games where the team scores between 3 and 8 runs only, to exclude the effect of dominant and awful pitching performances. I suspect a lot of offensive inconsistency is due to hitters padding their stats against sub-par pitching and being dominated by good pitching.

A few teams may get lucky (or unlucky) over the course of the season by missing the other team’s ace more often than not (or getting the other team’s ace more often). This is more of an issue in today’s game with 5-man rotations and so many 2-3 game series (compared to previous eras, where a 4-man rotation was used and there were more 4-game series). It averages out over the season for most teams, but not every team.

Pitcher consistency is a different story. Good pitchers can pitch well even against the best teams and regardless of an umpire’s strike zone. Bad pitchers are more dependent on the quality of the team they face, how hot/cold that team is, and the size of the umpire’s strike zone. So while they appear more inconsistent, their pitching is probably the same, but the results differ for reasons they don’t control. The good games by bad pitchers are simply flukes due to bad/cold teams and/or a generous umpire.

Any thoughts on how team consistency in run production ties in to lineup construction?

I know optimal lineup construction has a relatively small impact on runs scored totals. But could optimal construction provide better run consistency? If so, this would mean optimal lineup construction has a bigger impact than gross totals imply, right?

Were the Braves better off last year to have Dan Uggla have the wild swings from month to month, going from a .160 average for a couple months and then putting up great numbers in July/August? He certainly, along with hot-hitting Freddie Freeman(1.033 OPS in August) helped carry the Braves for a couple months, and you have to credit Fredi Gonzalez for batting them 3-4 in the lineup while they were really hot together. They were 33-20 during that time, but two players alone are not enough to prove anything, of course. Fun to think about, though.

Fascinating stuff. I’ve thought about this in the past, and I would be interested to see if this has any impact on playoff performance. It certainly seems like high variance teams might have an advantage in the postseason, because they have the ability to string together a few great outlier performances. It seems like most of the wild card teams that have had success recently were teams that ran hot and cold during the regular season.

Interesting, if true. Perhaps you could do a study on this. I would welcome it.

Bill, do you have any specific team examples of teams that outperform their pythags and are found to be runs-allowed volatile? I wonder if any of those recent overachieving Angels teams are on that list.

Bill, thanks for digging up those old THT articles on the Weibull distribution. One thing that occurs to me is that “standard deviation” is poorly-defined for run distributions because the distribution is not normal. For a Weibull distributed variable, I would look at the shape parameter (“gamma” in the second plot in the THT article you linked to).

The league shape parameter is typically around 1.85 or so, and is equivalent to the exponent in the Pythagorean (actually, Pythagenpat) win% equation. But individual teams have different shape parameters based on their actual run distributions. The difference between the team shape parameter and the league shape parameter is a better measure, in my opinion, of volatility. This would require fitting a Weibull distribution for every team, but that’s a relatively easy programming task (I would love to do this if I had the time).
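For what it’s worth, because the coefficient of variation of a Weibull distribution depends only on its shape parameter, you can translate between the article’s volatility measure and the shape parameter without a full fit. A stdlib-only sketch of that mapping (the 0.6 CV input is just an illustrative number, not fitted data):

```python
import math

def weibull_cv(k):
    """Coefficient of variation of a Weibull distribution with shape k
    (depends only on the shape, not the scale)."""
    g1 = math.gamma(1 + 1 / k)
    g2 = math.gamma(1 + 2 / k)
    return math.sqrt(g2 / g1 ** 2 - 1)

def shape_from_cv(cv, lo=0.5, hi=10.0):
    """Invert weibull_cv by bisection (it is strictly decreasing in k)."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if weibull_cv(mid) > cv:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# A run-distribution CV of 0.6 corresponds to a shape parameter a bit
# below the ~1.85 league value mentioned above:
print(round(shape_from_cv(0.6), 2))
```

This only sketches the correspondence; fitting actual team game logs would still require the per-team Weibull fit described in the comment.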

My belief from having studied this is that you can’t *construct* a team to have greater or smaller volatility, but that it does happen and that it explains a great deal of deviation from Pyth win%. If anyone finds a way to predict, a priori, the conditions required for controlling the volatility (or shape) of a team’s run distribution, I’ll buy you a beer.

In the meantime, it’s a heck of a lot easier to stock up on hitters that hit the snot out of the ball and pitchers with great breaking stuff than it is to squeeze blood out of the run distribution stone. It’s a really fun analytical thing to look at but I can’t see the practical consequence.

Thanks for a well thought out article.

I looked at this issue a couple of years ago (and even earlier many years ago on the SABR list). See

http://cybermetric.blogspot.com/2010/04/how-much-does-team-consistency-matter.html

I first ran a regression with team winning percentage as the dependent variable and runs per game and opponents’ runs per game as the independent variables. Then I added two variables in a second regression which measured consistency. HITCON was the standard deviation (SD) of runs per game divided by runs per game (just the SD would not be right since high scoring teams will have a greater SD). PITCON does something similar on the pitching side.

I found that, for a given average of runs per game, the more consistent hitting teams win more, while on the pitching side the less consistent teams win more.

I found it is much more important to score and prevent runs than become more consistent (or less, on the pitching side).

What exactly do the numbers in the results table mean, like the -.37?

In a comment, newsense wrote: “I think you may have overlooked the possibility that the coefficient of variation could still be correlated with runs scored or allowed.” I completely agree. The coefficient of variation is not independent of runs scored or runs allowed. I looked at runs scored and allowed data from 1998-2011 (420 data sets). The coefficient of variation (RS_Vol) clearly decreases with increasing runs scored. The same goes for runs against (RA_Vol). This correlation explains the below/above average table and suggests it may have nothing to do with consistency.

This is what I expected, as runs scored should follow a Poisson distribution. There are some reasons why the Poisson distribution is not exactly correct (runs come in bunches). If the Poisson distribution were followed, the standard deviation in runs should go as sqrt(runs), so that RS_Vol = sqrt(runs)/runs = 1/sqrt(runs). The actual data does not show this form precisely, but the trend of decreasing coefficient of variation with increasing runs was certainly present. My toying with the data found that a revised measure of volatility of sigma(Runs^(1.5))/Runs gives a measure of volatility that is close to independent of runs.
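The 1/sqrt(runs) scaling is easy to check by simulation, assuming strictly Poisson scoring (which, as noted in the comment, isn’t exactly right for baseball):

```python
import math
import random

random.seed(7)

def poisson(lam):
    # Knuth's method for drawing a Poisson random variate.
    limit, k, p = math.exp(-lam), 0, 1.0
    while p > limit:
        k += 1
        p *= random.random()
    return k - 1

for lam in (3.5, 4.5, 5.5):
    games = [poisson(lam) for _ in range(20_000)]
    mean = sum(games) / len(games)
    var = sum((g - mean) ** 2 for g in games) / len(games)
    cv = math.sqrt(var) / mean
    print(f"mean {lam}: simulated CV {cv:.3f}, 1/sqrt(mean) = {1/math.sqrt(lam):.3f}")
```

Under the Poisson assumption the simulated CV tracks 1/sqrt(mean) closely, so lower-scoring environments mechanically show higher measured volatility.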

The comparisons of actual wins with pythag expected wins should remove the link between runs scored and volatility. However, since the volatility is correlated to runs scored and runs against I would be worried that the actual measures are more expected-actual trends with runs scored and against. It would be interesting to see if these trends remain with a modified measure of consistency that is independent of runs scored and allowed.

Given the time I will try the Weibull distribution analysis discussed above.

Bad paren placement above. Should be: My toying with the data found that a revised measure of volatility of sigma(Runs)^(1.5)/Runs gives a measure of volatility that is close to independent of runs.

If the outliers of the runs scored were removed from the sample using Chauvenet’s criterion, I conjecture that the results would fall more in line with the initial thinking: that more consistency is optimal for an offense, even measured against expected wins.

I’m actually not sure I came to the right conclusion, but it could possibly give a more accurate picture of consistency.