“The coefficient of determination represents the percent of the data that is the closest to the line of best fit. For example, if r = 0.922, then r² = 0.850, which means that 85% of the total variation in y can be explained by the linear relationship between x and y (as described by the regression equation). The other 15% of the total variation in y remains unexplained.” ]]>

– http://mathbits.com/mathbits/tisection/statistics2/correlation.htmI

]]>Payroll disproportionately goes to older, expensive veterans, who are signed in free agency to put a team over the edge. Wins disproportionately come from players in their first six years of MLB service.*

* To digress: How awful is the MLB union for younger players? It is probably to the detriment of the talent pool, too, since young athletes would rather play football and get paid. And if they cave on slotting…

]]>Baseball is not a sport. “Sport” implies a level playing field where athletic performance is measured. Baseball is driven by which teams squatted where, and how much money given owners care to spend. If I wanted to watch revenue contests instead of sporting events, I would watch the stock market.

]]>I am sure that all the data points with payroll z-scores above 4 are the Yankees, and I strongly suspect that all the observations with values above 3 are Yankees, as well. This cluster of data points is actually FLATTENING the relationship. Without them, the slope would be steeper and the correlation and r_squared would be higher.

As much as the Yankees have spent on payroll, they have had a rather poor return, when measured in wins.
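To illustrate the mechanism only: the sketch below uses made-up standardized values (not the data from the post) in which a handful of very-high-payroll seasons win less than the trend of the other points would predict. The names `main_payroll`, `big_payroll` and `slope_and_r` are hypothetical. Under those assumptions, dropping the high-payroll cluster gives a steeper slope and a higher r².

```python
from math import sqrt


def slope_and_r(xs, ys):
    """Return the least-squares slope and the Pearson correlation r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx, sxy / sqrt(sxx * syy)


# Made-up z-scores for illustration only, NOT the post's data.
main_payroll = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
main_wins = [-0.45, -0.80, 0.05, -0.30, 0.55, 0.20, 1.05]

# Hypothetical cluster: very high payroll, only modestly above-average wins.
big_payroll = [3.0, 3.5, 4.0, 4.5]
big_wins = [0.3, 0.5, 0.2, 0.4]

slope_without, r_without = slope_and_r(main_payroll, main_wins)
slope_with, r_with = slope_and_r(main_payroll + big_payroll,
                                 main_wins + big_wins)

print(f"without cluster: slope={slope_without:.3f}, r^2={r_without ** 2:.3f}")
print(f"with cluster:    slope={slope_with:.3f}, r^2={r_with ** 2:.3f}")
```

Whether the real data behave this way depends on where those high-payroll points actually sit relative to the trend of the rest.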

]]>When a weak relationship persists over this much data, even a modest correlation can be highly statistically significant.
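A rough sketch of why sample size matters here, using the standard test statistic for a correlation, t = r·sqrt(n − 2) / sqrt(1 − r²). The value r = 0.2 and the two sample sizes are hypothetical, chosen only for illustration, and the p-value uses a normal approximation to the t distribution, which is reasonable only when n is large.

```python
from math import erfc, sqrt


def t_statistic(r, n):
    """t statistic for testing that the true correlation is zero."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)


def approx_two_sided_p(t):
    """Two-sided p-value from a normal approximation (large n only)."""
    return erfc(abs(t) / sqrt(2))


r = 0.2  # hypothetical weak correlation
for n in (20, 500):  # hypothetical sample sizes
    t = t_statistic(r, n)
    print(f"n={n}: t={t:.2f}, approx p={approx_two_sided_p(t):.2g}")
```

The same weak r falls well short of the usual cutoff of about 1.96 with 20 observations and clears it comfortably with 500.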

I had my students look at similar data a few years ago, but they also had home attendance for each team. The correlation between attendance and payroll was stronger than the one between winning and payroll. Maybe high payrolls are more about putting bottoms in seats. Or potentially more people come out to watch a successful team: yes, the correlation between winning percentage and attendance was stronger than the one between winning and payroll, too.

But then again, correlation does not imply causation.

]]>However, one cannot so clearly say StdDev(X) = StdDev(Explained) + StdDev(Unexplained), because it is the variances that add, not the standard deviations, so I was saying to be careful what you say there with regard to percentages. That’s what I was referring to when I mentioned the summing properties. I think I was largely agreeing with you, but I’m losing track of what everyone is talking about.
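A small numeric check of that point, as a sketch on five made-up data points (the names `xs`, `ys` and `pvar` are hypothetical). For a least-squares line with an intercept, the variance of the response splits exactly into the variance of the fitted values plus the variance of the residuals, while the corresponding standard deviations add up to more than the total.

```python
from math import sqrt

# Made-up data for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 3.0, 2.0, 5.0, 4.0]


def pvar(values):
    """Population variance (divide by n)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)


# Ordinary least-squares fit with an intercept.
mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x

fitted = [intercept + slope * x for x in xs]
residuals = [y - f for y, f in zip(ys, fitted)]

var_total = pvar(ys)
var_explained = pvar(fitted)
var_unexplained = pvar(residuals)

print("variances:", var_total, var_explained + var_unexplained)
print("std devs: ", sqrt(var_total),
      sqrt(var_explained) + sqrt(var_unexplained))
print("r^2 as explained share of variance:", var_explained / var_total)
```

This is why "percent explained" statements are usually made about variance (r²) rather than about standard deviation.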

]]>(from a stats prof)

]]>