Quantifying the Market Size Advantage in MLB

Let’s face it: baseball is a business, run by men and women who want to make money. To me this is not a problem, because businesses provide many of the things that make me happy in life. But the fact that baseball is a business goes a long way toward explaining what type of baseball fans receive.

When consumers want steak, the grocer that fails to provide steak is going to bring in less income than if he had steak. Similarly, if fans want baseball with competitive balance, owners would be hurting themselves by not providing it.

I believe this is why MLB operates as it does despite the fact that larger cities may have a revenue-generating advantage over smaller markets. As long as the advantage is not too large, fans still receive the competitive balance they desire from baseball. And any prolonged under/over performance by specific clubs might be the result of factors other than market size.

Attempts to limit the big market advantage are not without risk. Revenue sharing, the most popular solution to the problem, gives low-revenue teams more cash but also creates a disincentive for winning. Thus, proposals to minimize competitive imbalance must be crafted with caution.

To determine whether there is a problem that needs correcting, we must know the magnitude of the imbalance and how much it matters to the current distribution of wins across teams of differing market sizes.

In this article I present a study on the effect of population differences inherent to the current geographic dispersal of teams on the variance of wins across teams. I find that market size did affect winning, but that the effect was too small to explain the gap between the good and bad teams of recent history.

It is no surprise that MLB has at least one team in all but two (Portland and Sacramento) of the top 26 Metropolitan Statistical Areas (MSAs) in the US. If you are going to generate fans by tying your team to a locality, locating in bigger localities will generate more fans and more revenue for owners.

Extending the logic, this means that bigger cities ought to produce more revenue for owners than smaller cities. In an open market for players, the best players gravitate towards the teams with higher salary offers. Because fans generally prefer to watch winning teams, owners seek the best players to bring in more revenue.

Players who generate wins command salaries commensurate with the extra revenues they generate for owners in terms of increasing fan interest. Teams in large cities have a greater pool of fans to enjoy wins; therefore, wins are more valuable to big market teams. While an extra win per season in Kansas City may increase yearly attendance by 10,000 fans, one more win for a New York team could generate a million more fans in a season.

It appears that big cities ought to have an inherent advantage over small cities in attracting good players; thus, big cities can translate this advantage into wins on the field by purchasing the better players. (Though this result is quite intuitive, I want to acknowledge that it is strongly supported in the academic economics literature. For example, see “An Economic Model of a Professional Sports League” by Mohamed El-Hodiri and James Quirk in Journal of Political Economy, 1971.)

For baseball fans this is a troubling situation. The joy of competition is watching players on the field exploit all of their abilities to win the game. The uncertainty of the outcome is part of the thrill of witnessing sports events. The indeterminacy of the game is what makes the game addictive and fun.

If certain teams have advantages over other teams solely due to the population of their fan bases, the indeterminacy of competition disappears. Big market teams will always have the best players, and small markets can rarely host winners due to the inequality of revenue sources for clubs. Baseball’s Blue Ribbon Panel in 2000 devoted a significant portion of its analysis to remedying this potential problem.

According to the panel, the owners feel that it is important for every team to have “at least periodic opportunities for success,” in order to keep maximum interest in the game. If some teams have inherent advantages due to their market, this “standard” may be in jeopardy. In order to determine the magnitude of the problem, I tried to measure how much city size influenced the ability of teams to win games. Big cities may have an advantage over small cities, but how many games is this advantage worth?

I used regression analysis to estimate the impact of population size on the average wins of teams from 1995-2002. The eight-year average ought to be sufficient to cancel out abnormally high or low win years for teams. I measured market size with two measures: 1) the population of all metropolitan statistical areas (MSAs) of MLB teams in 2000 and 2) the number of Nielsen rating households in the MSA.

Unfortunately, the sample size of 30 teams was smaller than I preferred, but there was not much I could do about it. City sizes are fairly stable so not much would have been gained by adding multiple observations of teams in different years. [I could quibble all day about my empirical methodology, but that would be distracting. I would be happy to handle any specific concerns via e-mail.]

For simplicity, I present the regression estimates graphically on two scatter-plots. The first shows the estimates for MSA population; the second shows those for Nielsen households.

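[Figures: two scatter-plots of each team's average wins per season, 1995-2002, against market size, each with a fitted regression line. The first uses MSA population; the second uses Nielsen households.]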

Each point in the figures plots one team's average wins over the sample against the size of its city. The regression lines estimate the relationship between wins and population size. The lines are upward sloping, indicating that larger cities were associated with winning more games than smaller cities. Both of these estimates are “statistically significant,” which means that it is very unlikely that these relationships were the result of random chance. I do not attempt to say why, other than that big cities may have had more revenue to spend on free agents, coaches, management, minor leagues, etc.

However, the story does not end here; the real question is “how much was the big city advantage over small cities?” The regression estimates were:

Wins = 76.43 + .0526 (MSA Population in 100,000s)
Wins = 75.68 + .168 (Nielsen Households in 100,000s)

For simplicity I translate these estimates into wins. The MSA population regression estimated that every 1.9 million residents generated one extra win per season. For illustration, the largest market (New York) was expected to win 10 more games than the smallest market (Milwaukee). The Nielsen regression estimated that every 600,000 households resulted in an extra win per season. Under this estimate, the difference between the smallest (Kansas City) and largest (New York) markets was approximately 11 games.
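The conversion from a regression coefficient to "market size per win" is just the reciprocal of the slope. A minimal sketch of that arithmetic, using only the two coefficients reported above (the variable names are mine, chosen for illustration):

```python
# Slopes from the two regressions: wins per 100,000 residents / households.
MSA_SLOPE = 0.0526
NIELSEN_SLOPE = 0.168
UNIT = 100_000  # both regressors are measured in units of 100,000

# One extra win requires 1 / slope units of market size.
residents_per_win = UNIT / MSA_SLOPE
households_per_win = UNIT / NIELSEN_SLOPE

print(f"Residents per extra win:  {residents_per_win:,.0f}")
print(f"Households per extra win: {households_per_win:,.0f}")
```

These come out to roughly 1.9 million residents and roughly 600,000 households, matching the figures quoted in the text.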

So what did this advantage mean? During the sample the Yankees won an average of 26.5 games more than the Kansas City Royals and 24.5 games more than the Milwaukee Brewers. The difference in market size explained 30 to 40 percent of the difference in wins between the top and bottom markets.

Thus, the other 60 to 70 percent of the difference was due to other things such as luck, the ineptitude of the Royals and Brewers, and the skill of Yankees owners, managers, coaches, and players. Additionally, the estimates had low R-squares of about .12 indicating that the estimated equation explains about 12% of the variance of wins.
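As a rough check on the share attributed to market size, the arithmetic can be written out. This sketch uses the Actual Wins for the three teams from the table in this article and the 10- and 11-game expected gaps quoted above; the variable names are illustrative.

```python
# Average Actual Wins per season, 1995-2002, from the article's table.
yankees, royals, brewers = 95.5, 69.0, 71.0

# Expected gaps attributed to market size by the two regressions.
msa_gap = 10.0      # New York vs. Milwaukee, MSA population
nielsen_gap = 11.0  # New York vs. Kansas City, Nielsen households

# Share of the actual gap in wins that the market-size gap accounts for.
share_msa = msa_gap / (yankees - brewers)
share_nielsen = nielsen_gap / (yankees - royals)

print(f"MSA share:     {share_msa:.0%}")
print(f"Nielsen share: {share_nielsen:.0%}")
```

Both ratios come out near 40 percent, at the upper end of the range given in the text.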

To further the analysis I subtracted the total number of wins due to MSA population from the Actual Wins total to create the Population-Adjusted Wins per season. Population-Adjusted Wins reflect the results of the team due to factors other than market size. The table below lists all of the teams in MLB with the mean Actual Wins per season from 1995-2002, along with several measures that modify wins per season by removing the influence of population. I did not make similar calculations using Nielsen households because its correlation with MSA population is high (r = .97) and would yield nearly identical estimates.

Team            Wins  Pop-Adj  Predicted  Residual  Post-Season
                        Wins     Wins      Wins     Appearances
Atlanta         97.5    95.3     78.6      18.9         8
Cleveland       90.7    89.2     77.9      12.7         6
Seattle         88.6    86.7     78.3      10.3         4
Arizona*        88.0    86.2     78.1       9.8         3
NY Yankees      95.5    84.3     87.5       7.9         8
Boston          86.8    83.8     79.4       7.3         3
Houston         86.2    83.7     78.8       7.3         4
St. Louis       83.2    81.8     77.8       5.4         4
San Francisco   85.2    81.5     80.1       5.1         3
Oakland         83.3    79.6     80.1       3.2         3
Cincinnati      80.5    79.4     77.4       3.0         1
San Diego       78.7    77.2     77.9       0.8         2
Texas           80.0    77.2     79.1       0.8         3
Los Angeles     85.0    76.3     85.0      -0.0         2
Colorado        77.5    76.1     77.7      -0.2         1
Chicago WS      80.8    76.0     81.2      -0.4         1
Toronto         77.3    74.9     78.9      -1.5         0
Baltimore       77.2    73.2     80.4      -3.1         2
NY Mets         83.0    71.8     87.5      -4.5         2
Florida         73.8    71.8     78.4      -4.6         1
Anaheim         80.3    71.7     85.0      -4.6         1
Minnesota       72.8    71.3     77.9      -5.1         1
Montreal        72.8    71.1     78.1      -5.3         0
Philadelphia    73.3    70.1     79.6      -6.3         0
Milwaukee       71.0    70.1     77.3      -6.3         0
Chicago Cubs    74.2    69.4     81.2      -7.0         1
Pittsburgh      70.0    68.7     77.6      -7.6         0
Kansas City     69.0    68.0     77.3      -8.3         0
Detroit         65.7    62.8     79.3     -13.5         0
Tampa Bay*      63.6    62.3     77.6     -14.0         0

*Totals for Arizona and Tampa Bay are for 1998-2002.
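For readers who want to see the mechanics of the regression, here is a minimal sketch in Python with NumPy (my choice of tool for illustration; the article does not say what software was used). The MSA populations are not listed in this article, so the sketch backs out an implied population for each team from its Predicted Wins and the reported equation. That is an assumption made for illustration, not the original data, and because the table is rounded the fit is only approximately recovered.

```python
import numpy as np

INTERCEPT, SLOPE = 76.43, 0.0526  # reported MSA population regression

# (team, actual wins, predicted wins) copied from the table above.
TABLE = [
    ("Atlanta", 97.5, 78.6), ("Cleveland", 90.7, 77.9),
    ("Seattle", 88.6, 78.3), ("Arizona", 88.0, 78.1),
    ("NY Yankees", 95.5, 87.5), ("Boston", 86.8, 79.4),
    ("Houston", 86.2, 78.8), ("St. Louis", 83.2, 77.8),
    ("San Francisco", 85.2, 80.1), ("Oakland", 83.3, 80.1),
    ("Cincinnati", 80.5, 77.4), ("San Diego", 78.7, 77.9),
    ("Texas", 80.0, 79.1), ("Los Angeles", 85.0, 85.0),
    ("Colorado", 77.5, 77.7), ("Chicago WS", 80.8, 81.2),
    ("Toronto", 77.3, 78.9), ("Baltimore", 77.2, 80.4),
    ("NY Mets", 83.0, 87.5), ("Florida", 73.8, 78.4),
    ("Anaheim", 80.3, 85.0), ("Minnesota", 72.8, 77.9),
    ("Montreal", 72.8, 78.1), ("Philadelphia", 73.3, 79.6),
    ("Milwaukee", 71.0, 77.3), ("Chicago Cubs", 74.2, 81.2),
    ("Pittsburgh", 70.0, 77.6), ("Kansas City", 69.0, 77.3),
    ("Detroit", 65.7, 79.3), ("Tampa Bay", 63.6, 77.6),
]

actual = np.array([row[1] for row in TABLE])
predicted = np.array([row[2] for row in TABLE])

# Assumption: implied population in 100,000s, backed out of Predicted Wins.
implied_pop = (predicted - INTERCEPT) / SLOPE

# Ordinary least squares: fit a straight line of wins on population.
# np.polyfit returns coefficients from the highest degree down.
slope_hat, intercept_hat = np.polyfit(implied_pop, actual, 1)

# R-squared: share of the variance in wins explained by the fitted line.
fitted = intercept_hat + slope_hat * implied_pop
ss_res = np.sum((actual - fitted) ** 2)
ss_tot = np.sum((actual - actual.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

print(f"Wins = {intercept_hat:.2f} + {slope_hat:.4f} * population")
print(f"R-squared = {r_squared:.2f}")
```

The dependent variable is each team's average wins and the single explanatory variable is market size in units of 100,000. With the real MSA populations in place of `implied_pop`, the same two calls would apply.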

The Population-Adjusted Wins are useful for analyzing how much of a team’s success was due to factors other than market size. It is as if each team played in a locality of equal size, and only chance and the skill of the owners, managers, coaches, and players influenced the outcome.

Under these metrics, the Yankees cannot outrank other teams simply because of a population-generated revenue advantage. Even without their big-market advantage the Yankees were fifth in Population-Adjusted Wins. While the Yankees were second in Actual Wins, it is clear that any big market advantage was only a small part of the success of this organization over the sample period.

Predicted Wins are the wins predicted for teams based solely on population size, irrespective of team success over the sample period. Viewed with Residual Wins (the difference between Actual Wins and Predicted Wins), this shows how much teams over- or underperformed given their population size. The Yankees operated in a big market, but they did many other good things to attain success, winning eight games more than predicted by the population advantage.
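The three derived columns are related by simple arithmetic. The sketch below is my reading of the definitions in the text, not the author's own code: the wins attributed to population are taken to be the part of Predicted Wins above the regression intercept, and the function name `adjust` is hypothetical.

```python
INTERCEPT = 76.43  # intercept of the MSA population regression

def adjust(actual_wins, predicted_wins):
    """Return (population-adjusted wins, residual wins) for one team."""
    # Wins attributed to population: the prediction minus the intercept.
    population_wins = predicted_wins - INTERCEPT
    pop_adjusted = actual_wins - population_wins
    residual = actual_wins - predicted_wins
    return pop_adjusted, residual

# (Actual Wins, Predicted Wins) for three teams from the table.
teams = {
    "NY Yankees": (95.5, 87.5),
    "Kansas City": (69.0, 77.3),
    "Atlanta": (97.5, 78.6),
}
for name, (actual, predicted) in teams.items():
    pop_adj, resid = adjust(actual, predicted)
    print(f"{name:12s} pop-adj {pop_adj:5.1f}  residual {resid:+5.1f}")
```

Because the table is rounded to one decimal place, values computed this way can differ from the published columns by about a tenth of a win.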

Similarly, the small market Royals did many bad things that contributed to their failure, leading the team to win eight games fewer than predicted. Seven of the eight clubs that never made the playoffs during the sample years were among the bottom eight teams in Population-Adjusted Wins and Residual Wins, with Toronto being the lone exception.

In summary, market size has played a role in winning in the recent history of MLB, but the market size effect was not the main reason for the dismal performances of several small market clubs. It was the “other stuff,” including poor management and bad luck, which explained more of these clubs’ losing ways. These teams would have been bad even without the influence of market size.

I would like to end by addressing a few possible objections to the study. There are a few questions I expect readers to ask, so I will try to answer them in advance. If you have further questions or find my answers unsatisfactory, please let me know.

• What about localities with multiple teams?

I did consider this and ran regressions using specifications that halved the population in New York, Los Angeles, Chicago, and San Francisco/Oakland. I also ran a specification leaving population as is and included a dummy variable equal to one for dual-team cities. The effect on the regression estimates was very small and the fit of the models was slightly worse.
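The two alternative specifications can be sketched as follows. As before, this is an illustration under stated assumptions rather than the original analysis: populations are backed out of the Predicted Wins column of the table earlier in the article, the helper `ols` is hypothetical, and adjusted R-squared is my choice of fit measure (plain R-squared can never fall when a variable is added).

```python
import numpy as np

INTERCEPT, SLOPE = 76.43, 0.0526  # reported MSA population regression

# (actual wins, predicted wins, 1 if the market hosts two teams else 0),
# in the same order as the article's table.
ROWS = [
    (97.5, 78.6, 0), (90.7, 77.9, 0), (88.6, 78.3, 0), (88.0, 78.1, 0),
    (95.5, 87.5, 1), (86.8, 79.4, 0), (86.2, 78.8, 0), (83.2, 77.8, 0),
    (85.2, 80.1, 1), (83.3, 80.1, 1), (80.5, 77.4, 0), (78.7, 77.9, 0),
    (80.0, 79.1, 0), (85.0, 85.0, 1), (77.5, 77.7, 0), (80.8, 81.2, 1),
    (77.3, 78.9, 0), (77.2, 80.4, 0), (83.0, 87.5, 1), (73.8, 78.4, 0),
    (80.3, 85.0, 1), (72.8, 77.9, 0), (72.8, 78.1, 0), (73.3, 79.6, 0),
    (71.0, 77.3, 0), (74.2, 81.2, 1), (70.0, 77.6, 0), (69.0, 77.3, 0),
    (65.7, 79.3, 0), (63.6, 77.6, 0),
]
data = np.array(ROWS, dtype=float)
wins, predicted, two_team = data[:, 0], data[:, 1], data[:, 2]

# Assumption: implied population in 100,000s, not the original MSA data.
pop = (predicted - INTERCEPT) / SLOPE

def ols(columns, y):
    """Fit y on an intercept plus the given columns.

    Returns (coefficients, adjusted R-squared).
    """
    X = np.column_stack([np.ones(len(y))] + columns)
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coefs
    n, k = X.shape  # k counts the intercept as a parameter
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k)
    return coefs, adj_r2

baseline = ols([pop], wins)
# Specification 1: halve the population of two-team markets.
halved = ols([np.where(two_team == 1, pop / 2, pop)], wins)
# Specification 2: keep population as is and add a two-team dummy.
dummy = ols([pop, two_team], wins)

for label, (coefs, adj_r2) in [("baseline", baseline),
                               ("halved", halved),
                               ("dummy", dummy)]:
    print(f"{label:9s} coefs={np.round(coefs, 4)} adj R2={adj_r2:.3f}")
```

Comparing the population coefficient and the adjusted R-squared across the three fits is one way to judge whether splitting shared markets changes the story.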

• Why not use team revenue instead of population size to measure market size?

Revenue is related not just to market size, but also to the quality of team management. The problem is determining whether the failure to generate revenue is predetermined by the geographic structure of the league or due to bad business decisions. Cleveland and Seattle bring in large amounts of revenue with small metropolitan areas, while the larger markets of Philadelphia and Detroit do not.

The quality of business decisions is the main cause of this disparity, which means the effect on competitive balance is not inherent to the structure of the league. New management, not revenue sharing, is the answer to these problems. Of course teams with higher revenues will spend more on winning than teams with low revenue. I think that is a good thing, because it encourages teams to seek out new revenue sources by satisfying consumer demand.

• What about using other control variables in addition to population?

I am not sure what I would gain, while giving up very precious degrees of freedom. I considered including a few other variables, but I ultimately decided they were not worth the effort of gathering. If you suggest and supply some data, I would be willing to run estimates on alternate specifications and discuss the results.

References & Resources
Thanks to Doug Drinen, Aaron Gleeman, Charles Israel, and Studes for their helpful comments and suggestions.

Charles Devlin

Hey JC, I was wondering how to run a regression. What would the variables be if I were going to mimic your results exactly? I need to do so for my economics class and I can’t figure it out. If you could help me out, that would be great. I’m reading The Baseball Economist now and I need to mimic the results from the Big City vs. Small City problem chapter. Your help would be greatly appreciated.

Bryan Bing

Hey J.C. I actually have the same identical question as Charles, regarding how to run a regression & what would the variables be. Would really appreciate your help, thank you.