Predicting Salary Inflation For 2011

This is the first post in a series about salary inflation in baseball.

If the gas station around the corner offered you a 10-year contract, wherein you could only buy gas from them, but the price would always be \$3.00 a gallon, would you take it?

It sounds like a steal today, but only because you assume that gas prices will continue to rise and therefore \$3.00 per gallon will be less than market value for most of the contract. But if gas prices unexpectedly fell, to say \$1.50 per gallon, you would be paying double market value, and you are locked into it for the next decade.

The point of this hypothetical scenario is that any multi-year contract can only be evaluated using some estimate of future inflation. Here at FanGraphs, we generally apply an average of five percent inflation when looking at multi-year deals. In an attempt to use a more sophisticated system, two salary inflation models were built, and they have conflicting predictions about which direction salaries are heading.
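As a rough illustration of why the inflation estimate matters, the gas station deal can be sketched in a few lines. This is only a sketch: the function names are hypothetical, the market price is assumed to start at the contract price, and inflation is assumed to compound at a constant annual rate, none of which comes from the models discussed here.

```python
def market_price_path(start_price, annual_inflation, years):
    """Market price in each contract year, compounding at a constant rate."""
    return [start_price * (1 + annual_inflation) ** year for year in range(years)]


def fixed_contract_surplus(fixed_price, start_price, annual_inflation, years):
    """Total market cost minus total fixed cost; positive means the buyer comes out ahead."""
    market_total = sum(market_price_path(start_price, annual_inflation, years))
    return market_total - fixed_price * years


# Hypothetical inputs: a 10-year deal at a flat 3.00 per gallon,
# with the market price also starting at 3.00.
surplus_at_5pct = fixed_contract_surplus(3.00, 3.00, 0.05, 10)
surplus_at_minus_5pct = fixed_contract_surplus(3.00, 3.00, -0.05, 10)

print(round(surplus_at_5pct, 2), round(surplus_at_minus_5pct, 2))
```

Under these assumptions the same flat price is a bargain with five percent inflation and an overpay with five percent deflation, which is the whole problem with judging a long deal on its face.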

Prices in the United States have increased by about three percent per year over the last 20 years. Salaries in baseball have increased an average of seven percent over the same time.
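To get a feel for how far apart three and seven percent end up, here is a small compounding sketch. It assumes both averages behave like constant annual rates that compound over the full 20 years, which is a simplification and not something the figures above guarantee.

```python
def cumulative_growth(annual_rate, years):
    """Multiple of the starting value after compounding a constant annual rate."""
    return (1 + annual_rate) ** years


# Assumed constant rates taken from the rounded averages quoted above.
prices_multiple = cumulative_growth(0.03, 20)
salaries_multiple = cumulative_growth(0.07, 20)

print(round(prices_multiple, 2), round(salaries_multiple, 2))
```

Under that assumption, prices end up a bit less than double their starting level while salaries end up close to four times theirs.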

As shown in the above graph, MLB salaries have increased at a higher rate than the stock market, the gross domestic product, or the consumer price index.

To build a model that can project salary inflation, the first step is to transform the series so that it is not constantly rising. This is called removing the unit root. The purpose is to strip away the trend and drill down to what really matters: the year-to-year change.

The dependent variable used for the model is the percent change in average salary, which removes the unit root and gives easy-to-interpret projections. For independent variables, a number of baseball and non-baseball statistics were tested. In addition to having their unit roots removed, all independent variables were lagged by one season. This is because salary changes almost always happen in the offseason, so the salaries of 2010 depend on what happened in 2009.
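The two preparation steps, taking percent changes and lagging the independent variable by one season, can be sketched as follows. The numbers are invented index values used only to show the mechanics, and the variable names are hypothetical; this is not the data set behind the models.

```python
def percent_change(series):
    """Year-over-year percent change; the first year has no prior value and is dropped."""
    return [(curr - prev) / prev * 100 for prev, curr in zip(series, series[1:])]


# Invented values, for illustration only.
years = [2007, 2008, 2009]
avg_salary = [100.0, 110.0, 121.0]
attendance = [50.0, 55.0, 53.0]

# Percent changes are keyed by the later year of each pair.
salary_change = dict(zip(years[1:], percent_change(avg_salary)))
attendance_change = dict(zip(years[1:], percent_change(attendance)))

# Lag by one season: pair each year's salary change with the prior year's attendance change.
lagged_pairs = [
    (year, attendance_change[year - 1], salary_change[year])
    for year in sorted(salary_change)
    if year - 1 in attendance_change
]

print(lagged_pairs)
```

Each step costs a year of data: three seasons of raw values leave two percent changes and only one lagged pair, which is worth keeping in mind with a 20-year sample.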

Many variables did not have a significant effect on salary inflation, including GDP, the CPI, the Dow Jones average, the number of new ballparks and World Series ratings. The two variables that had significant effects on salaries were ballpark attendance and the unemployment rate. That yields the following model:

This model revealed that for every 10 percent increase in attendance, salaries increase by 3.5 percent the next season, and for every 10 percent increase in the unemployment rate, salaries decrease by about one percent.
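Read as a linear relationship, the two effects described above can be written as a small function. The slopes are back-calculated from the rounded figures in the text (3.5 per 10 and roughly -1 per 10), and the `baseline` term is a hypothetical placeholder because the model's constant is not stated here, so treat the output as the drivers' contribution and not as a full forecast.

```python
# Slopes implied by the rounded effects quoted above (assumed linear).
ATTENDANCE_EFFECT = 0.35      # salary points per 1 percent change in attendance
UNEMPLOYMENT_EFFECT = -0.10   # salary points per 1 percent change in the unemployment rate


def salary_change_from_drivers(attendance_pct_change, unemployment_pct_change, baseline=0.0):
    """Next-season percent change in salary implied by last season's drivers.

    `baseline` stands in for the model's constant, which is not given in the text.
    """
    return (
        baseline
        + ATTENDANCE_EFFECT * attendance_pct_change
        + UNEMPLOYMENT_EFFECT * unemployment_pct_change
    )


print(salary_change_from_drivers(10, 0))
print(salary_change_from_drivers(0, 10))
```

Note that the unemployment input is the percent change in the rate itself (a move from 5.0 to 5.5 is a 10 percent increase), not a change in percentage points.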

The model is acceptable, but the unemployment rate does not contain a unit root; that is to say, it is not constantly increasing or decreasing. So there is no problem with including the unemployment rate as a “raw” variable, without taking the percentage change. Doing so produces a second model similar to the first, and neither is obviously better or worse than the other. Both models have similar R-squared values, acceptable p-values for all variables, and produce similar projections with similar forecast errors. It is the data modeling equivalent of the Salma Hayek vs. Penelope Cruz debate; you may have your preference, but they’re both great.

Two quick housekeeping notes: both models had lower forecast errors than assuming a constant five percent inflation. Also, a model that included both unemployment variables had worse forecast errors.
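For readers who want to run this kind of comparison themselves, here is one way to score a model against the flat five percent assumption. Mean absolute error is an assumed choice of metric, since the error measure used for the models is not specified, and every number below is invented purely to exercise the function.

```python
def mean_absolute_error(actual, forecast):
    """Average size of the miss, in percentage points."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must be the same length")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


# Invented values, not the article's data.
actual_changes = [8.0, 2.0, 6.0, -1.0]     # observed percent changes in salary
model_forecasts = [6.5, 3.0, 5.0, 1.0]     # what a hypothetical model projected
constant_forecasts = [5.0] * len(actual_changes)  # the flat five percent assumption

model_error = mean_absolute_error(actual_changes, model_forecasts)
baseline_error = mean_absolute_error(actual_changes, constant_forecasts)

print(model_error, baseline_error)
```

The same function can score any competing specification, so long as every forecast is made using only information available before the season it predicts.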

Now, here is where things really get interesting (assuming you’re still awake). The one year where the two models’ projections are farthest apart is the 2011 season. The percent change model projects six percent salary inflation, while the raw model projects six percent deflation.

Granted, the vast majority of contracts for next season are already in place. There was a good crop of free agents this offseason, and teams were rather loose with their money, so the average salary will surely rise next year. Through this prism, a projection of the 2011 average salary is kind of irrelevant, but the implications are not.

If you trust Model 1, which uses the percent change in unemployment, then the economy is slowly turning around, and we can continue to expect modest salary inflation. If you trust Model 2, which uses the raw unemployment rate, then the economy is still in bad shape, baseball will not continue to outperform the rest of the marketplace, and a consolidation is coming.

Projecting farther than next season is a topic all its own. Let’s just say that there are economists whose entire job is to project the unemployment rate, and even they would admit it is a fool’s errand.

With several high-dollar, multi-year deals being signed this offseason, salary inflation will be a huge factor in how we remember these deals 10 years from now. One model suggests that teams reasonably predicted the future salary landscape. The other model suggests that the contracts signed this offseason will be untradeable albatrosses in the near future.

These disagreeing models lead to curiosity about the calculations done in baseball front offices. Did the big spenders this offseason project that the market would continue to rise, while the teams that sat on their hands projected that it would soon fall?

Jesse has been writing for FanGraphs since 2010. He is the director of Consumer Insights at GroupM Next, the innovation unit of GroupM, the world’s largest global media investment management operation. Follow him on Twitter @jesseberger.

18 Responses to “Predicting Salary Inflation For 2011”

1. tangotiger says:

Can you include MLB revenue growth as a parameter?

Also, there could be lags, so rather than looking at same-year, can you also look at past two years or past three years?

2. jaywrong says:

Based on this assumption, and gas did drop to 1.50, wouldn’t I just be able to trade gas stations with the Angels?

• bcp33bosox says:

Lol…that made me laugh pretty hard. Nice.

3. The Ancient Mariner says:

“If you trust Model 2, which uses the raw unemployment rate, then the economy is still in bad shape, baseball will not continue to outperform the rest of the marketplace, and a consolation is coming.”

Did you mean a *consolidation* is coming, or am I missing something here?

4. Telo says:

I wanted to like this article, but I felt like it compromised some complexity or analysis for understand-ability (and even then, I had a little trouble grasping why we could treat unemployment so differently in both models that it would lead us to opposite conclusions – no unit root??? This is the crux of the article if it’s determining whether or not salaries are going up or down, and it was left pretty open ended), and then ended up delivering muddled results.

I think more qualitative points would have really helped me understand why there was a huge jump from 98-02 (I assume it wasn’t just the economy, or maybe it was), and what things about baseball, TV deals, CBAs and lack thereofs, expansion, etc – things that affect the salaries in the league. And really all it is is attendance and unemployment, then talk about why attendance was up, and what might move it in the future.

Basically, I don’t know any more about inflation in baseball salaries than I did before I started reading this, except that it might have something to do with people wanting to see the games (and having jobs=money to get themselves there.)

5. Luke in MN says:

What do the numbers on the left side of the first graph mean? Index?

6. Mario Mendoza says:

Sigh. If only this were the trend for all salaries.

• Mario Mendoza says:

disregard

7. sea of stories says:

What’s going on with the indexed growth rate graph? Shouldn’t all 4 of the data sets be 1 during some base year?

Why did you only consider a first-order time-series and not use an f-test or another metric to select the optimal model order? You may need to consider data from multiple time steps in the past in order to get the most appropriate model

9. Peter Jensen says:

When you say average salary do you mean the mean or median salary? Does it make a difference if you use one rather than the other?

10. Xeifrank says:

What is the average salary inflation in baseball over the past X years?

11. BillWallace says:

Great article.

I don’t know if you knew this, but your gas hypothetical is actually a real world deal that companies and even individuals can make with natural gas companies. When contracting for gas they give you an option to have your rate track with the market or to lock it in for a set contract. The set contract price is based on futures prices, but they add a little margin for themselves for taking on the risk. At the companies I’ve done it with I’ve always taken the market rate.

Very interesting to see how MLB salaries will go over the next few years.

12. Todd says:

I wholeheartedly approve of exploring these kinds of topics on Fangraphs. I’d like to see more stuff like this.

13. CJ says:

(1) What is the R-squared and t statistic for this forecast? I know you said it is acceptable, but that is a subjective judgement.

(2) The unemployment rate is a proxy for some other variable. Perhaps it’s a proxy for how the overall economy affects teams’ budgeting. If that’s the case, the team’s expectation about the future direction of the economy is probably most important. And in that case, the percentage change, rather than the raw unemployment rate, may be closer to what influences the team’s expectations. Teams probably review several national, regional, and/or local economic forecasts as part of their estimation of future revenues. You do not have a variable which reflects forecasted economic variables. But I think the annual percentage change in the unemployment rate probably is more closely related to short term forecasts.

14. Erik says:

I’d like to see the effect of inflation uncertainty relative to other sources of uncertainty in player valuation. My instinct is that it’s small, but I’m interested in how it compares.

15. Marver says:

I don’t think you’re using independent variables; it’s likely to confuse your model.