Playing at the extremes

As the 2012 regular season wound to its close, eyes turned to the top teams in the leagues as they battled for playoff berths and seeding positions. Meanwhile, other baseball was being played. Losing teams faced losing teams, with nothing at stake but a ballgame, and still charging admission to watch them play.

Which games was I more interested in? The playoff stuff, naturally. But I was thinking about those other games.

What I was thinking about was, were those games between cellar-dwellers intrinsically different from other games? Were they necessarily boring in and of themselves, divorced from the standings? Would the scores be lower through the futility of their hitters, or higher due to the incapability of their pitchers and fielders?

Of course, similar questions can be asked about teams in the upper echelon when they play each other. Are the games necessarily exciting even when you forget the division title at stake? Will good pitching stop good hitting, or vice versa (as either Casey Stengel or Yogi Berra sorta said)?

If I’ve written this much about those questions, you know by now that I went and tried to answer them. I looked over teams and games from 2000 onward (except for this past season: The right data weren’t available yet to include 2012 easily) to see what’s different when the best play the best, or the worst play the worst.

Who’s good and who’s bad

The first step is choosing dividing lines for the very good and very bad teams we’re studying. My original thought was to put the cutoffs at .600 and .400 winning percentages: 98 and 64 wins respectively. However, while this might have worked a few generations ago, it doesn’t today. There is much greater parity in modern baseball; records don’t stray as far or as often from .500.

That parity is, interestingly, a bit lopsided. In the years 2000-2011, only seven teams finished a season above .600, but 15 had a year below .400. Only twice did two .600+ teams play in the same season (2001 and 2002), but it happened five times for sub-.400 teams, including 2002 when there were three in one year. (2002 was an extreme year, for whatever reason.) It is easier to stink than to soar, even when everyone is trying to soar.

So I had to lower my cutoff standards. I looked at a .575/.425 line (94 wins for winners, 68 for losers), but this brought in a few too many teams, so I decided on two .575/.425 lines. To make the survey, a team had to beat the mark both with its winning percentage and with its Pythagorean record. The Goldilocks method finally achieved success: Most years 2000-2011 had games between both very good teams and very bad teams, but an overload of such games was uncommon (2002, of course, being the worst).
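The two-part cutoff can be sketched in a few lines. This is only an illustration: the article does not say which Pythagorean exponent was used, so the classic exponent of 2 is assumed here, as is treating the cutoffs as inclusive. The function names and the sample teams are hypothetical.

```python
def pythagorean_pct(runs_scored, runs_allowed, exponent=2.0):
    """Pythagorean winning percentage; exponent=2 is an assumption."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def classify(wins, losses, runs_scored, runs_allowed, hi=0.575, lo=0.425):
    """Label a team 'good' or 'bad' only if BOTH its actual and its
    Pythagorean winning percentage clear the same cutoff."""
    actual = wins / (wins + losses)
    pyth = pythagorean_pct(runs_scored, runs_allowed)
    if actual >= hi and pyth >= hi:
        return "good"
    if actual <= lo and pyth <= lo:
        return "bad"
    return "neither"


# Hypothetical teams, not taken from the survey:
print(classify(95, 67, 800, 650))   # strong record, strong run differential
print(classify(95, 67, 720, 700))   # strong record, ordinary run differential
print(classify(60, 102, 600, 800))  # weak on both counts
```

The second hypothetical team shows what the double test screens out: a club whose record outruns its run differential does not make the survey.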

My data set includes only regular-season games. Postseason play is a natural time for very good teams to play each other, and there are nine playoff series (including two World Series) in my 12 surveyed years that would qualify. However, this brings in the confounding factor of weather. October baseball is cooler baseball than average, which suppresses run production. Since I want to see whether run production changes due to the quality of teams playing, that’s no good. Using only regular-season games gives me a spread across the months that I hope matches average weather conditions.

Where the runs come from

I compared the runs scored by each team to the average home and visiting runs scored in the home ballparks that season. I had to be a bit careful, as teams didn’t always play 81 games at home. (From the “bad” bucket, the 2011 Mariners played 84 home games, but that’s the outlier.)
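A rough sketch of that comparison follows, assuming each game is measured against per-game averages for its ballpark and season. Using per-game averages is one way to handle teams that did not play exactly 81 home games. The `Game` record, the `park_averages` mapping and the sample numbers are all hypothetical; the article does not describe its actual data layout.

```python
from dataclasses import dataclass


@dataclass
class Game:
    park: str
    season: int
    home_runs: int
    vis_runs: int


def run_differentials(games, park_averages):
    """Sum each side's runs relative to the park's seasonal average.

    park_averages maps (park, season) to a pair:
    (average home runs per game, average visiting runs per game).
    Returns (home diff, visitor diff, total diff, total diff per game).
    """
    home_diff = 0.0
    vis_diff = 0.0
    for g in games:
        avg_home, avg_vis = park_averages[(g.park, g.season)]
        home_diff += g.home_runs - avg_home
        vis_diff += g.vis_runs - avg_vis
    total = home_diff + vis_diff
    return home_diff, vis_diff, total, total / len(games)


# Made-up example: two games in one park and season.
averages = {("Park A", 2005): (4.8, 4.4)}
sample = [Game("Park A", 2005, 3, 5), Game("Park A", 2005, 6, 2)]
print(run_differentials(sample, averages))
```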

My sample includes 222 games played between very good teams and 230 played between very bad teams. This should be enough to give us a general idea of how the run scoring plays out. The general idea it produces is … intriguing.

                     HomeRunDiff  VisRunDiff  TtlRunDiff  RunDiff/Gm
Good vs. Good (222)    -147.9       +95.3       -52.6       -0.237
Bad vs. Bad (230)      +184.95     -102.91      +82.0       +0.357

It stands to reason that the differential trends run toward the middle: Good home teams will score less playing other good teams, and so forth. It’s the totals that call for our attention. Good teams lower their combined run totals, while bad teams raise theirs, and by a greater amount. The numbers aren’t wholly decisive with the moderate sample size, but they are definitely suggestive.
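The totals and per-game figures in the table can be re-derived from its home and visitor columns. The snippet below uses only numbers printed in the table; the `summarize` helper is a name chosen here for illustration.

```python
# (games, home run differential, visitor run differential) from the table
rows = {
    "Good vs. Good": (222, -147.9, 95.3),
    "Bad vs. Bad": (230, 184.95, -102.91),
}


def summarize(games, home_diff, vis_diff):
    """Return the combined differential and the differential per game."""
    total = home_diff + vis_diff
    return round(total, 2), round(total / games, 3)


for label, (games, home_diff, vis_diff) in rows.items():
    total, per_game = summarize(games, home_diff, vis_diff)
    print(f"{label}: total {total:+.2f}, per game {per_game:+.3f}")
```

The bad-versus-bad total works out to +82.04, which the table shows rounded to +82.0. The per-game shift is about a third of a run upward for bad teams against about a quarter of a run downward for good teams.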

As for why this happens, I have a hypothesis ready to hand. I propose that these numbers suggest an inherent primacy of baseball defense—pitching and fielding—over baseball offense.

I assume that very good teams will have superior offense and defense in roughly equal measure, and the reverse for very bad teams. Given that, the defense appears to have the upper hand in determining total scores. The good pitching of the good teams holds down the numbers, while the bad pitching of the bad teams lets even inferior bats run a little wild.

This idea falls in line with the work Bill James did in creating his Win Shares system. He found the numbers worked best when pitching plus fielding received a combined 52 percent of the credit for producing wins, with offense at 48 percent. His rationale involved the different bounds of futility for offense and defense. There is a floor to how badly you can bat—zero—but there is no ceiling to how many runs a hapless pitching staff can give up.

The survey numbers give this idea some support. It’s the bad versus bad games that shift run differential more: Bad pitching is creating more runs than bad hitting can give away.


So it may be that either Mr. Berra or Mr. Stengel was right, as their convoluted quotations so often were. Good pitching will stop good hitting, but the vice versa in this case is that bad pitching will get beaten by bad hitting. We can forgive Casey or Yogi for not making it clearer back in the 1950s: Neither man dealt much with bad pitching on his team.

On a tangent dealing with the deeper numbers, I briefly thought I had made a real find when I looked at the combined winning records and run differentials for teams in the survey. In the “good” section, home teams went 120-101 (with one tie) while outscoring the visitors only 1,017-990. The “bad” home teams went 130-100 with a run differential of 1,134-1,054. These Pythagorean theorem-busting numbers seemed to mean something big, a demonstration of the massive tactical advantage to batting last when teams are evenly matched, at least at the far ends of the spectrum.

Then I remembered it: Winning home teams don’t bat in the ninth. Home scoring numbers get depressed a little by the rules. Nothing to see here. I’ll just move along now, properly reminded to check all the influences on the numbers.
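For a sense of how large the apparent gap was, here is the home teams' actual winning percentage next to a Pythagorean estimate built from the run totals quoted above. The exponent of 2 is an assumption, and the one tie is left out of the good teams' decisions. The function names are illustrative only.

```python
def pythagorean_pct(runs_scored, runs_allowed, exponent=2.0):
    """Pythagorean winning percentage; exponent=2 is an assumption."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def home_gap(wins, losses, runs_for, runs_against):
    """Return (actual pct, Pythagorean pct, wins above the estimate)."""
    decisions = wins + losses
    actual = wins / decisions
    expected = pythagorean_pct(runs_for, runs_against)
    return actual, expected, wins - expected * decisions


# Home-team records and run totals as quoted in the article
for label, args in {
    "Good vs. Good": (120, 101, 1017, 990),
    "Bad vs. Bad": (130, 100, 1134, 1054),
}.items():
    actual, expected, extra = home_gap(*args)
    print(f"{label}: actual {actual:.3f}, Pythagorean {expected:.3f}, "
          f"{extra:+.1f} wins")
```

In both groups the home teams sit roughly 30 points of winning percentage above the estimate, or about six to seven wins. That is the gap the skipped bottom of the ninth helps explain.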

Where the excitement comes from

The offensive/defensive balance is all fine, but are the games more fun? Do the clashes of the titans make for great baseball, and are battles in the cellar worth using up good beer and chips* watching them?

* Beer and chips were chosen as examples, not requirements. You can eat and drink whatever you like watching baseball, within bounds of legality, sanity, and hygiene.

Anyone reading this site in the last month knows all too well about the WPS Index, my attempt to measure statistically the excitement of baseball games. I had thought I’d be putting it in mothballs for a while after the final out of the World Series, but here’s another question for it to answer. Stick with me just a little longer: There’s a nice surprise coming.

As my baseline, if a somewhat unusual one, I had available the WPS scores for virtually all 1,387 postseason games in major league history. (I’m missing only the 1946 NL playoff: Win Probability Added numbers are not available.) This is not far from being its own good teams versus good teams subset, though the playoffs of recent decades are rather less exclusive than my standards.

One small problem with the postseason subset is that virtually all the games are played in October, with a few September contests balancing the few in November. The cooler weather suppresses scoring a bit, as observed earlier, and probably also has a cooling effect on the WPS Index, as the system tilts a little toward high-scoring games as being more exciting. Awareness of the bias is its own remedy: We can be mindful of these effects when interpreting the data.

Also, I will make a further subdivision of postseason games played after the divisional system was instituted in 1969. This isn’t so much for the new playoff rounds as to set aside World Series played in the two deadball eras of 1903-1919 and the 1960s. Low scoring damps WPS values, and makes those series an iffy comparison to the 21st-century games I’m analyzing.

The average WPS numbers are listed below. Mean scores are higher than the medians for a reason mathematically related to things discussed above: There’s a floor to the ratings, but theoretically no ceiling.

                           WPS Median   WPS Mean
Postseason, 1903-2012       297.45      327.37
Postseason, 1969-2012       303.5       335.96
Good vs. good, 2001-2011    307.55      338.00
Bad vs. bad, 2001-2011      312.5       349.63

The battles of the best match up pretty well with post-deadball playoffs, just a few points ahead. Given the cooled-off bats of the postseason, weather could well account for the difference (though we’ve seen that good teams playing each other arguably lowers the run totals without weather as a factor). We can call it even: Regular-season match-ups of the top-level teams are as exciting as playoff games.

But then there are the battles of the basement. They beat the playoffs; they beat the contemporaneous games at the top of the standings, and by a wider margin than the good versus good games beat the postseason contests. The margin isn’t wide enough to make blanket statements, not with sample sizes in the 200s for the 21st-century groups, but again it is highly suggestive.
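Those margins can be read straight off the WPS table. The snippet below just subtracts one row from another, using the table's own figures; the `margin` helper is a name chosen here for illustration.

```python
# (median, mean) WPS values as listed in the table
wps = {
    "Postseason, 1969-2012": (303.5, 335.96),
    "Good vs. good": (307.55, 338.00),
    "Bad vs. bad": (312.5, 349.63),
}


def margin(a, b):
    """Median and mean of group a minus those of group b."""
    return tuple(round(x - y, 2) for x, y in zip(wps[a], wps[b]))


print("Good vs. good over postseason:", margin("Good vs. good", "Postseason, 1969-2012"))
print("Bad vs. bad over good vs. good:", margin("Bad vs. bad", "Good vs. good"))
print("Bad vs. bad over postseason:", margin("Bad vs. bad", "Postseason, 1969-2012"))
```

The good-versus-good edge over the post-1969 playoffs is about four points of median and two of mean. The bad-versus-bad edge over good-versus-good is about five points of median and more than eleven of mean.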

The suggestion is that, if you’re looking for a fun and exciting game, you don’t have to limit yourself to the playoff contenders. A couple of tail-enders can provide as much enjoyment, or even more, as a meeting of the powerhouses, as long as you aren’t drawing too much of that enjoyment from the postseason scenarios. So break out your preferred combination of food and drink: The Cubs and Rockies are playing today!

The basement games may be getting part of their boost from the uptick in combined scores that I found earlier, which tends to raise WPS numbers. For both sets, penthouse and outhouse, it may be the relative balance of the teams playing, rather than the excellence compared to the league, that makes a baseball game the most exciting. This makes some objective sense: A close game has the most potential for excitement, and that’s more likely between teams of roughly even quality. I did not survey games between very good and very bad clubs—perhaps I should have, and perhaps I will—so you may take this with however large a grain of salt you like.

That’s my quick look at baseball games at the ends of the bell curve: just different enough to be interesting, and just interesting enough to be different.

References & Resources
Game data came from Baseball-Reference and Retrosheet. Bill James’ Win Shares offered perspective on the offense-defense balance in the game.

A writer for The Hardball Times, Shane has been writing about baseball and science fiction since 1997. His stories have been translated into French, Russian and Japanese, and he was nominated for the 2002 Hugo Award.
1946 NL and all other games/series to break a tie for the pennant, division, or last wildcard spot are regular season games, not postseason. I wonder if the differences we see between good-good and bad-bad are statistically relevant. Historically there is evidence that teams that play in more extreme hitters parks tend to be bad because they don’t properly evaluate their players in light of the park factors. This may also have them keep trying to improve their pitching, when it is already better than their offense. I don’t believe that the developers of the various WAR systems available now…
Shane Tourtellotte

Absolutely right. (smacks forehead) How about I call them “post-schedule” games?

There probably has been some mis-evaluation of talent due to park factors, or at least mis-application. (And occasionally some insight.  There was talk at some point of trading Ted Williams for Joe DiMaggio, so both could enjoy parks better suited to their power.  Never really would have happened—too lopsided a trade—but it showed someone grasped something.)  I wonder whether modern analysis has mitigated the errors, or whether we could tell if they had.


Let me offer a few guesses at why the good/good games are lower scoring. Increased tendencies to play for one run, and also to use your best pitchers. Both of these would be because of the importance of needing to win against your main competition.

David P Stokes
A few points: First, I’m not at all surprised that games between 2 good teams tend to be lower scoring, and games between 2 bad teams tend to be higher scoring. Second, winning home teams do bat in the ninth sometimes, if they mount a late comeback (or if the game goes to extra innings). I would expect games between 2 good teams to have fewer late comebacks (given that they are generally lower scoring contests) but perhaps more extra-inning games. Conversely, I’d expect more late comebacks (but fewer extra inning games?) when 2 bad teams play each other. …

Don’t forget that there is a lower bound on scoring but not an upper bound.  My guess is that’s the entire reason for the offensive vs. defensive numbers you’re seeing.


Reading closer, I see you didn’t forget that!  But I’m not sure that supports the notion of defense over offense. It could, but there can be other factors at play, including the home park of winning teams and the natural strengths of winning teams.

BTW, Bill admitted that he used 52% because he wanted to make sure he didn’t underweight pitchers batting, or something like that. I’d have to look it up.

Love the WPS results. They make sense. BTW, in the THT Annual, I’ll find that the most “boring” game of the year was Matt Cain’s perfect game!