## Money Wins: Is There Enough Parity In Baseball?

Yesterday afternoon, Jayson Stark considered the question, “Is the MLB’s competitive balance a joke?” His answer was a rather blunt no:

MLB’s competitive balance is NOT a joke.

It beats the NFL.

It beats the league formerly known as the NBA.

And … I can prove it.

Stark’s method of proving it, plucking facts from the recent playoff series and comparing them generally to the NFL and other major leagues, was less than rigorous. In general, I agreed with his assertion: Parity exists naturally in the MLB far more than in any other sports league.

HOWEVER, if my foot has less gangrene than your foot, does that mean I don’t need a doctor? No. I probably still need a doctor, and I probably need to stop playing barefoot tag on Rusty Nails Pier.

Relative success does not necessitate absolute success. And frankly, I feel the “parity” in the MLB indeed has a gangrene of sorts, a disease that is causing only specific segments of the league to rot while the rest hum along uncaring.

Of course, it is one thing to suspect something and demand more research, but it is another to pull the sabermetrician stocking over your head and answer that suspicion with a Falcon Punch of data.

Let’s do just that.

Okay, first of all, we must gather together as much reliable payroll data as the Internet can possibly give us. I don’t always look up payrolls, but when I do, I use Cot’s Contracts, and then I thank them profusely and publicly.

Taking Cot’s Contracts data from 2000 through 2011, we can then pair it with the winning percentages of the teams in that era.

Stark focused a lot on the playoff successes of small market and low budget teams like the Tampa Bay Rays and Arizona Diamondbacks, but having success in the playoffs is like having success invading France — the hardest part is getting through the Ardennes, but anyone can do it once they arrive. Teams have much more control over their success in the regular season, so that’s what we need to examine here.

But we cannot look at just raw numbers — the steadily increasing payrolls of the past decade have rendered that a useless task:

Moreover, ordinal ranking (Stark’s tool of choice) does us little good when the gap between the 1st payroll (the New York Bankees, er, Yankees) and the 2nd payroll constitutes a considerable and uneven distance, a gap far greater than that of the 30th and 31st payrolls.

Instead, let’s use z-scores — standardized numbers we can adjust according to their period. So, $97M in 2000 is a z-score of 1.28 (Angels payroll), but $97M in 2008 (Blue Jays) is only 0.31 — or 0.31 standard deviations above the mean — because the league as a whole has begun to spend more.
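For the curious, here is a minimal sketch of that standardization, one season at a time. The payroll figures are invented for illustration (they are not the Cot’s Contracts numbers), the `z_scores` helper is a hypothetical name, and using the population standard deviation rather than the sample one is an assumption.

```python
from statistics import mean, pstdev

def z_scores(payrolls):
    """Standardize one season's payrolls: (payroll - season mean) / season SD."""
    mu = mean(payrolls.values())
    sigma = pstdev(payrolls.values())  # population SD; an assumption, sample SD also works
    return {team: (p - mu) / sigma for team, p in payrolls.items()}

# Hypothetical payrolls in $M for two seasons (made-up teams and figures).
early_season = {"A": 97, "B": 60, "C": 45, "D": 30, "E": 75}
later_season = {"A": 97, "B": 120, "C": 80, "D": 60, "E": 140}

z_early = z_scores(early_season)
z_later = z_scores(later_season)

# The same $97M lands in very different places once the league spends more.
print(round(z_early["A"], 2), round(z_later["A"], 2))
```

The point of the design is that each season is scaled against its own mean and spread, so a dollar figure is only ever compared with what the rest of the league spent that year.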

Using this method, we can more effectively plot payrolls against wins:

The relationship, as we might expect, is loose. An uninspiring R-squared in the 0.17 range tells us payrolls have accounted for 17% of the variation in winning percentages over the last 12 seasons.

Now we reach the philosophical portion of the program, asking ourselves: How much do we really *want* payroll to affect winning?

The NFL has decided it wants payroll to have essentially no impact on winning, so teams basically trot out the same amount of money every Sunday and hope their money was better-spent. Is that what the MLB wants?

There is some justice to the notion of rewarding owners who spend more, though. For all his despicable wealthiness, George Steinbrenner deserves praise for willingly pouring huge palettes of cash into his team. Few make the assertion that Brian Cashman or the Yankees GMs of the past orchestrated success on their own — no, George “The Real Cash Man” Steinbrenner got and gets his due.

So, yes, the Yankees have a great business model (see: YES Network, or *cash cow* entry) and a franchise predisposed towards financial success, but they also have a willingness to spend their money.

But what about the Milwaukee Brewers who, as Stark notes, play in the smallest media market? Or what about basically any team in the history of ever to play in Florida, where it is somehow impossible for my home state brethren to sell out any sporting event outside of college football?

These teams will never be able to spend at the same level — on the MLB payroll or on the developmental and drafting areas — regardless of their willingness to do so. Sustaining long-term success is next to impossible for these franchises. For them, 17% is far too large.

Consider this: In the above period, teams with a payroll less than $70M (pretty much the lower quartile) had an average winning percentage of .475. Teams who broke the $100M barrier had a winning percentage of .540.

Multiplied out, that’s a 77-win team and an 87-win team.
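The arithmetic behind those two win totals is just winning percentage times a 162-game schedule. A quick sketch (the function name is a hypothetical helper, not anything from the data source):

```python
GAMES_PER_SEASON = 162  # length of an MLB regular season

def expected_wins(win_pct, games=GAMES_PER_SEASON):
    """Convert a winning percentage into wins over a full season."""
    return win_pct * games

low_payroll_wins = expected_wins(0.475)   # teams under $70M
high_payroll_wins = expected_wins(0.540)  # teams over $100M

print(round(low_payroll_wins), round(high_payroll_wins))
```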

In other words, the Rays and the Diamondbacks are still very much the exception to the Money Rules rule, and the Diamondbacks benefit from their relatively winnable division. In fact, the current layout of divisions (sliced into the smallest pieces they have ever been in baseball’s long and racist history) allows sometimes-undeserving teams to slip into the playoffs.

For instance, in 2009, three AL West teams were at 85 wins or better, yet only *one* made the playoffs. Meanwhile, the 87-win Minnesota Twins, which would have been in 3rd place in the AL or NL West, won their division. In 2008, the 84-win Los Angeles Dodgers made the playoffs even though they would have been in a whopping 5th place in the NL Central.

In other words, there is hope for the low-in-coin franchise — whether it comes in the form of saving money for several years and then going crazy for a few years, a la the Florida/Miami Marlins, or slowly and meticulously building a world-class farm system and then hoodwinking players into team-friendly contracts, a la Rays du Tampa Bay.

So, yes, the MLB’s competitive balance is indeed “second to none”; it is not like the NFL of the 70s; it does not hand trophies to the biggest spenders; it does not blindly oppress the impoverished; it is not a joke by any stretch — but that does not make it perfect.

*TWO NOTES: 1) Allow me to amend the 17% number there. As Burke points out in the comments, I forgot the final step of that consideration (taking the square root of the R-squared, making it a sense-ful number again). This of course means payroll accounts for 41% of the variation in wins, not just 17%, which only further exacerbates my, and presumably your, ire. I’m not sure who to trust anymore, so I’m going to trust myself on this one. There is no first amendment.*

*2) No, do not fear, there are only 30 MLB teams. All this NFL talk got me thinking 32, but instead of adjusting my words above, I will defer to Steve’s observation below, that I was merely counting the Blargon Nebulons as team No. 31.*


31st Payroll?

Yes, the gap is so large because the Blargon Nebulons use their own planets’ currency, the Zolak. When converted to US$$, their payroll is actually negative.

The Zolak? Is that what Greece is calling the Drachma these days?

Great article. One error/need for clarification: re: the gap between the 30th and 31st payrolls, there are only 30 teams. 29 and 30?

Thanking Cot’s Contracts profusely and publicly is a great thing because it is a great website!

Please also spell ‘Cot’s’ correctly :)

I’LL SPELL COT’S HOWEVER I DAMN WELL PLEASE! Fortunately for you, I damn well please employing “correct” as my spelling of choice.

+1 for the Fall Gelb reference

Why should the teams have an equal chance of winning? Why shouldn’t the FANS have an equal chance? More fans = more revenue = higher payroll = better chance of winning. Democracy in action.

That’s an odd definition of democracy.

I think he means capitalism, not democracy.

What, whoever has the most votes wins?

More like, whoever has the most dollars wins. They’re not, despite what the conservatives on our Supreme Court might like to pretend, the same thing.

Seems like the Brewers are a glaring counterexample. Always good, sometimes well above average attendance, but the bad media market (as Bradley noted) keeps their payroll low. Hard to see what’s fair about assigning revenue based on accidents of American media economy.

No. No. No.

Payroll does not account for “17% of the variation in winning percentages over the last 12 seasons.”

It accounts for 17% of the *variance*, which is an abstract statistical concept. Variance is correlation squared. In this case, the units you’re reporting are “winning percentage squared.” Nobody cares about “winning percentage squared.” They care about winning percentage.

The real answer is sqrt(.17), the units of which are “winning percentage”. In other words, payroll accounts for 41% of the variation in winning percentage.

For every additional standard deviation of payroll a team spends, it can expect 0.41 standard deviations of additional wins. That’s enormous. It may be even worse than that because there are diminishing returns at the extremes–i.e. it’s not a linear correlation up through $200 million. No team can get 130 wins, no matter how much $ it spends, and 100 wins is just as good anyway.

Ah! Very correct!

I will amend it to reflect your correctness.

“The drawback to the coefficient of correlation is that except for the three values -1, 0 and +1, we cannot interpret the correlation. For example, suppose that we calculated the coefficient of correlation to be -0.4. What does this tell us? It tells us two things. The minus sign tell us the relationship is negative and since 0.4 is closer to 0 than 1, we judge that the linear relationship is weak.” (Keller, Statistics for Management and Economics, 8e, p. 124.)

“The coefficient of determination (R-squared) measures the amount of variation in the dependent variable that is explained by variation in the independent variable.” [Referring to an example] “…the coefficient of determination is r^2 = (0.8722)^2 = 0.7588. This tells us that 75.88% of the variation in electrical costs is explained by the number of tools.” (ibid. p. 135.)

So, yes, payroll explains or accounts for 17% of the variation in winning percentage over the past 12 seasons, a very weak relationship.

bluejaysstatsgeek is right and Brian Burke is incorrect. In particular, Brian Burke’s unit analysis is incorrect. The correlation coefficient r is defined as a quotient, and in this quotient the original units of measurement cancel out. Thus, r (and r^2) are unitless quantities (like a z-score). This is why you can compare correlation coefficients from different studies (and different original units of measurement). Thus, r^2 is not measured in “winning percentage squared”, but is a unitless quantity.

Yes, this is the way I’ve been using it in the past, but something Burke said seemed correct. Maybe I’m double wrong then. :-/

“Variance is correlation squared.” No, variance is standard deviation squared.

Also r-squared is unitless. It is the sum of squared deviations explained by the regression divided by the total sum of squared deviations. Any units (or actually, squared units) that would have been present would have been in numerator and denominator and therefore cancel out.
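The cancellation described above can be checked numerically. This is only a sketch on made-up numbers (the `payroll_z` and `win_pct` lists are hypothetical, not the article’s data): it computes r-squared as explained sum of squares over total sum of squares, first with the outcome in winning percentage and then in wins per 162 games.

```python
from statistics import mean

def r_squared(x, y):
    """Explained sum of squares / total sum of squares for a least-squares line."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    fitted = [intercept + slope * a for a in x]
    ss_explained = sum((f - my) ** 2 for f in fitted)
    ss_total = sum((b - my) ** 2 for b in y)
    return ss_explained / ss_total

# Hypothetical team-seasons: payroll z-score and winning percentage.
payroll_z = [-1.2, -0.8, -0.3, 0.0, 0.4, 0.9, 1.5, 2.8]
win_pct = [0.470, 0.520, 0.440, 0.510, 0.480, 0.560, 0.500, 0.590]
wins = [162 * w for w in win_pct]  # same outcome, different unit

r2_pct = r_squared(payroll_z, win_pct)
r2_wins = r_squared(payroll_z, wins)
print(r2_pct, r2_wins)  # the two agree up to floating-point error
```

Changing the unit multiplies the numerator and the denominator by the same factor (162 squared here), which is why the ratio does not move.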

When you take the square root you might explain well over 100% with independent predictors (yikes). I’d avoid using % in correlations, or interpreting them (except to say small, large, etc.), but they are great (as you point out) because you can square them and get the percent explained from them.

Sorry to drag this out, but I believe I’m correct. Phil Birnbaum has written extensively on this. “Variance” is not the same as “variation.” I think this is the primary source of this misunderstanding.

Here’s the short response: r-squared is 17%. But 17% of what? It’s 17% of the *total variance* in the dependent variable. The unit of total variance here is “win%^2.” 17% is very misleading because readers think that only 17% of win% is explained by payroll.

Here’s the long response:

BJSG is right about variance being SD squared. What I meant to write is that r-squared is correlation-squared. R-squared is the variance explained by the model divided by the total variance in the outcome variable. The units of variance are always “unit squared”. In this case it’s “win% squared.”

So r-squared is literally win%^2 explained/total win%^2. The units are win%^2/win%^2, which you can call unitless.

So although r-squared by itself is unitless, when you say 17% of something, that something is not unitless. In this case it’s 17% of the total variance of the dependent variable, whose unit is… “win%^2”. So if you want to know SD of win% per SD of salary, you need to take the square root of .17.

Here’s the underlying reason: The “r-squared” of a regression does not account for covariance. In reality, salary interacts with all sorts of different things along the way to eventually produce the effect it does on win%. The .17 number is the pure effect of salary all by itself, without any interaction effects. This is a completely abstract concept with no real meaning. The r of .41 includes the interactions.

Besides, the right answer is already in the post in the graph. If you want the best estimate of win% based on salary alone, you’d use win% = .0187*Z_salary + .5. That coefficient of .0187 is what’s important. It’s essentially the un-standardized version of the .41 correlation.
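For readers following along, the link between a regression slope and a correlation in a one-predictor fit is slope = r × SD(y) / SD(x). The sketch below checks that identity on made-up numbers; the data lists and the `slope_and_r` helper are hypothetical and are not the values behind the article’s graph.

```python
from statistics import mean, pstdev

def slope_and_r(x, y):
    """Least-squares slope and Pearson correlation for paired data."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sxx, sxy / (sxx * syy) ** 0.5

# Hypothetical team-seasons: payroll z-score and winning percentage.
payroll_z = [-1.2, -0.8, -0.3, 0.0, 0.4, 0.9, 1.5, 2.8]
win_pct = [0.470, 0.520, 0.440, 0.510, 0.480, 0.560, 0.500, 0.590]

slope, r = slope_and_r(payroll_z, win_pct)
rescaled_r = r * pstdev(win_pct) / pstdev(payroll_z)
print(slope, rescaled_r)  # the slope is r rescaled by the two standard deviations
```

So the slope carries units (win% per payroll SD) while r does not, and when the predictor’s SD is exactly 1 the slope is simply r times the SD of winning percentage.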

Hope that helps explain my position. Anyway, great post. This is just a pet peeve of mine.

Brian Burke, you appear to have two explanations for why you take the square-root. The first is that you are talking about variance, and the second is that 0.17 is a pure salary effect and that 0.41 “include … interactions.”

Both of these are just nonsense and even contradictory. For the first one, I challenge you to find a textbook that says this. I have no idea where the second one came from.

I think Brian and his detractors might be talking past one another. Forgive me for putting words in people’s mouths, but I think Brian is saying that, although 17% of the variance is explained (because R^2 = .17), nobody cares about variance. Variance has a horrible name and it’s not what it sounds like. Variance is measured in units squared, and nobody has any intuition about units squared. Since we don’t have any intuition about units squared, it’s not useful to speak of a percentage of units squared (as R^2 shows the percentage of variance explained).

You and I do have an intuition about standard deviation though, as that is in more familiar units. Brian is saying to focus on standard deviation and therefore R. However, one loses some of the nice summing properties of variance, so one must be careful when one speaks of percentages of standard deviations explained (I think I’ve made this mistake myself in the past).

Is everyone cool now?

mickeyg13, Nope. I understand the argument, it is not correct.

Try this. There are 12 donuts in a dozen box of donuts. If my dog ate 48% of my donuts, I do not need to divide by 12 to get the fraction of my boxes that my dog ate and then warn that, when I use the new 4% term, I have lost some of the nice summing properties.

Barkey, now I think we are talking past one another. When I was talking about summing properties, my point was that variance has the nice property (under certain conditions) that Var(X) = Var(Explained) + Var(Unexplained). So one can clearly speak of percentages of the variance.

However, one cannot so clearly say StdDev(X) = StdDev(Explained) + StdDev(Unexplained), so I was saying to be careful what you say there with regards to percentages. That’s what I was referring to when I mentioned the summing properties. I think I was largely agreeing with you, but I’m losing track of what everyone is talking about.
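The summing property can be seen on a toy least-squares fit. This is a sketch with invented data (the two lists are hypothetical): the variances of the fitted values and the residuals add up to the variance of the outcome, while the standard deviations do not.

```python
from statistics import mean, pstdev, pvariance

# Hypothetical team-seasons: payroll z-score and winning percentage.
payroll_z = [-1.2, -0.8, -0.3, 0.0, 0.4, 0.9, 1.5, 2.8]
win_pct = [0.470, 0.520, 0.440, 0.510, 0.480, 0.560, 0.500, 0.590]

# Ordinary least-squares line with an intercept.
mx, my = mean(payroll_z), mean(win_pct)
slope = (sum((a - mx) * (b - my) for a, b in zip(payroll_z, win_pct))
         / sum((a - mx) ** 2 for a in payroll_z))
intercept = my - slope * mx

explained = [intercept + slope * a for a in payroll_z]      # fitted values
unexplained = [b - f for b, f in zip(win_pct, explained)]   # residuals

var_total = pvariance(win_pct)
var_sum = pvariance(explained) + pvariance(unexplained)
sd_total = pstdev(win_pct)
sd_sum = pstdev(explained) + pstdev(unexplained)

print(var_total, var_sum)  # equal up to floating-point error
print(sd_total, sd_sum)    # the SDs sum to more than the total SD
```

That is the sense in which a percentage of variance is a well-defined share of a whole, and a percentage of standard deviation is not.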

You seem kinda confused on statistics. Can you explain why the percent of variation accounted for suddenly relates to its regression coefficient? Because they are two completely different things. How else could we have negative regression coefficients? Also, r is the abstract statistical concept in correlation, not r-squared. r’s values range from -1 to 1, not 0 to 1, as we would expect for a proportion. But hey, R-squared’s values range from 0 to 1, perfect! At least, that’s a decent way to remember it. But I think you should probably brush up on a few things. Also R-squared never has units. Try not to come off as so snotty about something you clearly are wrong about.

Bravo!

(from a stats prof)

Bonus points for Falcon Punch and French invasion references.

“In 2008, the 84-win Los Angeles Dodgers made the playoffs even though they would have been in a whopping 5th place in the NL Central.” Three NL East teams had better records as well. That places them 8th in the NL. The 2006 Cardinals were 5th in the NL that year. The Padres in 2005 (82-80) won the NL West with the 7th best NL record that year.

So, in other words, the arbitrary division alignment allows bad teams to slip into the playoffs, and thus low payrolls look successful.

Yes. Well worded.

Go back to East/West divisions with two wild card teams. Lessens this problem and might stop Selig from expanding playoffs.

The problem is we lack the numbers we need to actually understand how money affects baseball. We’ve got the payroll numbers, but we’re missing the operating budgets, the scouting budgets, the draft and international signing budgets, minor league budgets, etc. As we’ve come to understand over the years, spending on the MLB roster is just one area where teams can gain an advantage.

Very true.

That’d be tricky anyways since most of those things don’t apply year-to-year.

For 2011, probably the scouting budget from 2001-2005 is more important than the scouting budget in 2011. Similar for the rest of them…

Player development cost is just as expensive as mlb payrolls for many teams. Teams with low revenue streams are forced to choose between payroll at the major league level and development at the minor league level.

Screw it, let’s just invade France, seems easier.

history would say it usually is…

If the Miami Marlins are able to sign Jose Reyes away from the New York Mets, that says something about the effect of revenue sharing and competitive balance. Don’t know exactly what it says, but it says something!

It would say something, but this is not going to happen

It might say “Don’t get involved in pyramid schemes.” I’m not sure that the Mets functionally count as a high-revenue team right now, though surely they will again sometime.

another interesting question is also whether or not winning is sustainable at lower payrolls. the model for sucking–>garnering draft picks–>winning–>selling off can only last so long, and fleeting success can’t be relied on to fill seats and build a winning team.

basically, instead of looking at success on a year to year basis, i would have to assume that the R^2 of median payroll v. median win% only increases when the amount of years simultaneously taken into account increases. perhaps this could work to take into account the money a franchise has built up over time and can sink into scouting and non-payroll expenditures.

Ryan asks one of my questions…..can you sustain winning with a low (or even high) payroll?

And, why does Stark continue to assert that the playoffs are the only way to judge parity? Parity is about hope. I guess to judge parity, I’d look at the percent of teams that suck year after year, or dominate year after year, and see how those compare.

And, why is he so insecure about his sport that he brings it up all the time?

I think the problem with this analysis is that it is based upon Winning Percentage when Winning Percentage is not the determinant of whether someone reaches the playoffs.

The determinant for reaching the playoffs is whether your record was better than other teams in your division. If you compared where a team finishes in their division and you can compare the salaries of the teams in your division you would have a more reliable indicator on how much purchasing power can affect the final result.

For example, take the AL East: if the Baltimore Orioles were to increase spending by $20 million, they may move up in the standings 1 or 2 places because they are being outspent (Yankees and Red Sox) or out-coached (Rays and Blue Jays). Kansas City, in a weak division, could get more mileage out of that $20 million increase because there are fewer teams that would be able to increase their own payroll to counter.

So, I have a much longer reply planned going into the statistics employed here – I’m still wrapping my head around what linear regression of z-scores means. Before that however, I wanted to throw up the trite correlation does not imply causation warning.

Applied in this particular instance, one explanation for payroll and winning percentage being related is that winning teams (typically those with some great cost-controlled young talent) will bump payroll during winning years.

Still, the positive relationship is there. This article is testing the validity of the easy narrative that rich teams will win more. Fangraphs, and sabermetrics in general, always digs deeper to find out the truth of these narratives. Based on this study, it still seems likely that money positively impacts winning to some degree. I’d advise people to not dig too deeply into the 17% number. The exact number isn’t what linear regression is for. All it really says is what conventional wisdom already thinks – teams with more money will tend to win more often. That’s what I’m taking away from the article. No more, no less.

The problem is that linear regression also isn’t for data sets that have one huge clump plus a couple of outliers way out to the side in the independent variable.

Essentially all the result tells you is what the outliers are doing. So really all you should take away from the article is “the Yankees spend so much more than everyone else that they’re guaranteed to at least be successful.”

The numbers as presented tell you approximately nothing about whether payrolls relate to winning for teams spending in the broad band of payroll below the Yankees. I’d be interested to see what the numbers look like without the Yankees in there.

Good point John.

I am sure that all the data points with payroll z-scores above 4 are the Yankees, and I strongly suspect that all the observations with values above 3 are Yankees, as well. This cluster of data points is actually FLATTENING the relationship. Without them, the slope would be steeper and the correlation and r_squared would be higher.

As much as the Yankees have spent on payroll, they have had a rather poor return, when measured in wins.

You must be the most interesting fan in the world.

A statistical critique:

Z-scores should only be used on normally distributed data. I started poking around the data set and since 2000 there has definitely been a trend of right-skewed (non-normal) distribution in MLB payroll numbers.

Why exactly should we care about payroll vs. market size?

Imagine two markets that are identical in every way, except Market Northeast is filled with people who adore baseball. Market Southeast is filled with people who think baseball is boring, and prefer watching college football.

Market Northeast’s team draws more fans to the stadium, gets better TV ratings, and has more money to spend on salary, coaches, scouting etc. As a result, it wins more games, and has a better chance of winning it all, as compared to Southeast market’s team.

So why exactly should Southeast market’s team have the exact same chance to win as Northeast market’s team? It is not clear at all why it would be more fair to engineer parity between the two teams. From a utilitarian perspective, arguably it is preferable for the Northeast team to win more often, because each win and championship will create more total happiness.

I like the Rays. They win, they play exciting baseball. But apparently people don’t give a crap in Tampa. Why should I?

Well, for one thing, you’re ignoring the utility effects that balance and competitive games have for nonresidents of either area. Watching the same teams dominate year after year is trite and boring. I like soccer, but I have zero interest in the table of the Premier League, because I already knew long before the season started that Manchester United was going to win the league. MY utility is therefore diminished from the optimal because of their royally (no pun intended) effed-up league structure, even though I am not particularly a fan of any one Premier League team.

For another thing, even if it would make sense from a utilitarian standpoint to have the Yankees win more often, there are major diminishing returns there. It’s simply not the case that a team’s fourth championship of a decade is as treasured as the first title in 50 years. I could get behind a system where the Yankees would (based on payroll) win an average of once every 15 years and the A’s would (based on payroll) win an average of once every 45 years, but that’s not the system we have. The system we have is immensely more unbalanced than that.

It would be interesting to study what would increase parity more:

– a cap system with a ceiling of 150 and floor of 75 million

– a balanced schedule, achieved through abolishing the divisional system

As a Jays fan I am biased towards the second option for obvious reasons, since the Jays are capable of being a middle to high-end spender and struggle to stay above .500 playing 50 games against the Yankees, Red Sox and Rays. But I wonder what the league-wide consequences of both scenarios would be.

The latter system would eliminate parity outright. The Red Sox and Yankees would literally make the playoffs every single season.

When people talk about parity in MLB, what they really mean are two things. First, the casual fan resents the idea that a team like the Yankees can be the top spender year after year and basically guarantee themselves a playoff spot.

The second idea is that a smart team with little financial resources can’t keep a dynasty team they happen to build from scratch. The window for a small market team is very small when it comes to sustained dominance, because usually the top players start demanding big money when that happens or they become free agents.

That 2001 Seattle Mariners team looks absurd.

But in what sport can a team keep its best players through a dynasty? The window for a great team that is built from scratch is small in almost all sports. Football caps themselves out. Maybe the NBA pre-2011-2012 was the best at this? However this might be a result of top NBA teams being built around 1 or 2 stars and a bunch of expendable parts.

Just to take an example I’m personally familiar with, the Sharks have been winning a lot of hockey games behind Joe Thornton and Patrick Marleau for going on seven consecutive seasons. It’s not at all uncommon in the NHL, actually.

“Moreover, ordinal ranking (Stark’s tool of choice) does us little good when the gap between the 1st payroll (the New York Bankees, er, Yankees) and the 2nd payroll constitutes a considerable and uneven distance, a gap far greater than that of the 30th and 31st payrolls.”

True, but linear regression is also a *terrible* choice as well when the data consists of an enormous clump around a central value plus a small number of outliers far away in the independent variable (salaries).

Essentially the entire result will be determined by whatever the Yankees do. To give a physical analogy, the Yankees are far from the center of mass, so they exert more torque.
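The torque analogy has a standard numerical counterpart, the leverage of each point in a one-predictor regression: h_i = 1/n + (x_i − mean)² / Σ(x_j − mean)². Here is a sketch on invented z-scores (a clump plus one far-out value; the list is hypothetical, not the actual payroll data):

```python
from statistics import mean

def leverages(x):
    """Leverage of each observation in a simple regression with an intercept."""
    n = len(x)
    mx = mean(x)
    sxx = sum((xi - mx) ** 2 for xi in x)
    return [1 / n + (xi - mx) ** 2 / sxx for xi in x]

# Hypothetical payroll z-scores: a central clump plus one far-right outlier.
payroll_z = [-0.9, -0.6, -0.4, -0.2, 0.0, 0.1, 0.3, 0.5, 0.8, 4.2]
h = leverages(payroll_z)

for z, lev in zip(payroll_z, h):
    print(f"z = {z:5.1f}   leverage = {lev:.3f}")
```

Leverages in a fit like this always sum to 2 (one for the intercept, one for the slope), so a single point holding most of that total is the one steering the line.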

Your graph is a textbook example of “when linear regression isn’t that useful a tool.”

I’d like to see the numbers with the Yankees removed.

Baseball doesn’t really have a parity problem, it has a Yankees (and Red Sox) and everybody else problem.

The problem with NFL-style fairness is that baseball ratings and revenues are stronger with good big-market teams, especially the Yankees (1965-75 were awful years for MLB and especially the American League, the NFL zoomed past it). So they want big-market teams to do well, on the other hand, if it gets out of hand with revenue disparity and you have the 1949-64 AL where the Yankees win all but two pennants, the other teams in the league suffer. The ideal for revenue maximization might be a situation where every team has at least a 10 percent chance of making the playoffs but some teams have 30 percent and the Yankees even more. Revenue sharing and luxury taxes can be adjusted to roughly achieve this.

Well, the ratings in NY and Boston are good because of all the previous winning. If Oakland had extended year-on-year success and the Yankees sucked for 20 years, then they would swap in ratings.

Parity can be achieved through three steps:

1. Cap the mlb payroll

2. Cap other costs – player development costs in particular*

3. Create a fair system instead of divisions

*Player development costs are more expensive than mlb payrolls for many teams. Not only is it unfair for the yankees to spend way over anyone else’s mlb payroll, but it’s also unfair for the Pirates not to be able to draft a guy in the upper rounds because they can’t afford to pay his signing bonus.

If parity is to be achieved money will have to be sacrificed, there is no way around it. I am a Capitalist, however I do not believe in capitalism for entertainment. Let the revenue slide for the betterment of the game.

The linked article is stupid. (Even setting aside the apples-to-oranges comparison of the football and baseball playoff structures. Baseball playoffs are crapshoots; football playoffs are not, partly because there’s inherently less luck involved, but mostly because the structural incentives to finish higher– a bye and home field advantage– give a big leg up to the team with the better record.)

What that article actually indicates is that when you have fiscal parity, teams that are well-managed can consistently beat teams that are poorly managed. (Sounds like a plus to me!) When you don’t, teams that are well-managed frequently come up short because the deck is so heavily stacked against them.

E.g., Dave Dombrowski is an utterly terrible GM, and by talent, should pretty much never win a damn thing. On the other hand, by payroll, the Tigers should blow out the rest of the AL Central every season. Those two factors combine to produce illusory competitiveness in the AL Central. But it could vanish at any time if the Tigers were to actually hire someone with any competence.

Substantially equal payrolls is NOT a recipe for a different team winning every season. It might well result in MORE dynasties than currently exist. That is a GOOD THING, because it means that the ability to spot and employ baseball talent is being tangibly rewarded (which, in the long run, will improve the sport).

What’s the p-value of this regression? Is it even statistically significant at a reasonable alpha? An R-squared of 0.17 isn’t too convincing on its own. Try adding another independent variable to include the payroll of the other teams in a given team’s division.

To me, what creates the most parity in baseball is its random nature, both through the division structure and the playoffs. Let’s take all this out and look at team WAR against money, and I would think that the correlation would be higher. A second point, I think, is that things are changing. The fact that front offices are getting smarter will decrease parity. The fact that the Cubs and Mets now have smart front offices, and that maybe the Angels and the Dodgers will have them in the future, will rapidly decrease parity.

I hate to be anal, but, well, I can’t help myself. “Pallet” is a platform, usually wooden, used to move heavy loads by forklift. “Palette” is a small dish or board used to hold paint.

Jim, you make an excellent point in bringing up the regression’s p-value. If the model is not significant then the validity of this piece is next to nothing.

This represents 360 data points (12 seasons, 30 teams per season) and knowing the r-squared, we can compute the F-statistic to be 74.158 compared to critical F-value of 3.868. The p-value of the F-statistic is 0.000000000000000233 (2.33×10^-16). This could have also been computed by using the correlation coefficient and computing its t-statistic (8.61), which has the same p-value.
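The F and t figures above can be approximately reproduced from r-squared and the sample size alone, since for a one-predictor regression F = (R² / 1) / ((1 − R²) / (n − 2)) and t is the square root of F. A sketch follows; note the rounded 0.17 gives an F a little above 73, so the 74.158 presumably came from an unrounded R². The 0.1716 used below is back-solved from that F value and is an assumption, not a number from the article.

```python
import math

def f_statistic(r_squared, n, predictors=1):
    """Overall F-statistic for a least-squares regression with an intercept."""
    df_model = predictors
    df_resid = n - predictors - 1
    return (r_squared / df_model) / ((1 - r_squared) / df_resid)

n = 12 * 30  # 12 seasons x 30 teams = 360 team-seasons

f_rounded = f_statistic(0.17, n)       # using the rounded R-squared from the article
f_unrounded = f_statistic(0.1716, n)   # assumed unrounded value, back-solved from F = 74.158

# With a single predictor, the slope's t-statistic is the square root of F.
t_rounded = math.sqrt(f_rounded)
t_unrounded = math.sqrt(f_unrounded)

print(f_rounded, t_rounded)
print(f_unrounded, t_unrounded)
```

Turning either statistic into a p-value needs an F or t distribution function, which the standard library does not provide, so that step is left out here.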

When a weak relationship persists over this much data, it doesn’t take an enormous amount of correlation to have a great amount of significance.

I had my students look at similar data a few years ago, but they also had home attendance for each team. The correlation between attendance and payroll was stronger than winning and payroll. Maybe high payrolls are more about putting bottoms in seats. Or potentially more people come out to watch a successful team – yes, the correlation between winning percentage and attendance was stronger than winning and payroll, too.

But then again, correlation does not imply causation.

Are you drunk? You are confusing impact of payroll with structure of game. Baseball is structured such that each individual game will have more variance than football (and certainly basketball). That’s why the World Series is seven games and the Super Bowl is one game. Sample size. Get familiar with it. If NFL teams played 162 games, you certainly would see winning percentages much closer to .500, even though the teams would not be any different.

Baseball is not a sport. “Sport” implies a level playing field where athletic performance is measured. Baseball is driven by which teams squatted where, and how much money given owners care to spend. If I wanted to watch revenue contests instead of sporting events, I would watch the stock market.

Wins don’t necessarily follow payroll. Payroll sometimes follows wins. When teams are terrible, they ought to lower payroll and when they are better they raise payroll (i.e. The Phillies). This effect is exacerbated by payroll being the wrong number to follow. Teams have to decide what portion of their revenues to spend on the MLB roster versus the farm system. So a team in the fray will tend to sign lots of free agents while a team at the bottom end will shovel money into player development.

Payroll disproportionately goes to older, expensive veterans, who are signed in free agency to put a team over the edge. Wins disproportionately come from players in their first six years of MLB service.*

* To digress: How awful is the MLB union for younger players? And probably to the detriment of the talent pool, which would rather play football and get paid. And if they cave on slotting…

I found the answer to the r and r squared coefficient question.

“The coefficient of determination represents the percent of the data that is the closest. For example, if r = 0.922, then r^2 = 0.850, which means that 85% of the total variation in y can be explained by the linear relationship between x and y (as described by the regression equation). The other 15% of the total variation in y remains unexplained.”

– http://mathbits.com/mathbits/tisection/statistics2/correlation.htmI

oops that should read

“The coefficient of determination represents the percent of the data that is the closest to the line of best fit. For example, if r = 0.922, then r^2 = 0.850, which means that 85% of the total variation in y can be explained by the linear relationship between x and y (as described by the regression equation). The other 15% of the total variation in y remains unexplained.”