## How’d We Do a Year Ago?

I’m probably not being biased at all when I say we offer a lot of great features here at FanGraphs, but I’m personally a huge fan of our projected standings and playoff-odds pages. Now that we have ZiPS folded into the mix, things are pretty complete, and it’s exciting to be able to see the numbers whenever one wants to. The numbers are based on depth charts maintained by some of our own authors, and they’re living and breathing, so you can see the direct impact of, say, the Phillies signing A.J. Burnett. (It lifted their playoff odds about four percentage points.) FanGraphs is always improving, and these additions have been a big recent step forward.

Now, as discussed briefly yesterday, we never want the projections to be actually perfect. Thankfully, that’s never going to be a problem, on account of the damned human element. But we do want the projections to be *meaningful*, because otherwise, what’s the point? We want the data to be smart and more right than wrong. So that brings to mind the question: how did things go last year, in our first thorough experiment with depth charts and team projections?

This is a helpful page. We never really projected true standings, but over the course of the 2013 positional power rankings, we did project team-by-team WAR with author-maintained depth charts. So it would make sense to compare projected team WAR against end-of-season actual team WAR, which we can get easily from the leaderboards. Win/loss record isn’t as easy as just adding WAR to the replacement-level baseline, but WAR does more or less capture the core performance of a team, so this should be informative.

One complication: right after posting the team WAR projections, we changed the replacement level to agree with Baseball-Reference. So I can’t just do a straight comparison, because the projections add up to far more WAR than the end-of-season numbers. What I decided to do was calculate z-scores. The Astros, for example, projected to be 2.5 standard deviations worse than the mean. The Braves finished the year 0.8 standard deviations better than the mean. It is possible to do a straight comparison of the z-scores, or at least I think it is, and this should give us a sense of how well last year’s teams were projected, relative to the average.
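For the curious, the normalization described above can be sketched in a few lines of Python. The team names and WAR totals here are invented for illustration, not the actual 2013 figures:

```python
# Convert each column of team WAR to z-scores so that projections and
# actuals can be compared even though their replacement levels differ.
def z_scores(values):
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    sd = var ** 0.5
    return [(v - mean) / sd for v in values]

# Hypothetical team WAR totals, purely for illustration.
projected = {"Astros": 25.0, "Tigers": 55.0, "Braves": 48.0}
actual = {"Astros": 10.0, "Tigers": 52.0, "Braves": 46.0}

proj_z = dict(zip(projected, z_scores(list(projected.values()))))
act_z = dict(zip(actual, z_scores(list(actual.values()))))

for team in projected:
    print(team, round(proj_z[team], 2), round(act_z[team], 2))
```

Once both columns are on the same scale, comparing a team’s projected z-score to its actual z-score is apples to apples, regardless of where replacement level sits.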

So here’s a graph of z-scores against z-scores. I want to emphasize that I have never been a mathematician. Probably never will be. Maybe I did something stupid, but this shouldn’t tell us *nothing*.

At least to me, it feels like there’s pretty good agreement. There’s an obvious relationship in the direction we’d expect. The Tigers were projected to be 1.8 standard deviations better than the mean, and they came out 1.8 standard deviations better than the mean. The Mariners were projected at -1.0, and they came out at -1.0. The Cubs were projected at -0.5, and they came out at -0.5. Of the 30 teams, 16 had z-score differences no greater than half of a point.

For position-player WAR z scores, the r value came out to 0.7. For pitchers, 0.7. For teams overall, 0.7. Basically, the projections weren’t totally off base. But there were some bad misses, and I’ll highlight them below. Maybe I shouldn’t say “misses” — projections can’t really be wrong. But there were some teams that deviated rather significantly.

**RED SOX**

**Projection:** +0.5 standard deviations **Reality:** +2.0 **Difference:** +1.5

Initially, the Red Sox were ranked in the upper-middle tier. They finished as the best team in baseball, based on regular-season performance and subsequently postseason performance. Pitching projections missed by 0.8 standard deviations, but position-player projections missed by 1.5. They got more than what was expected from most players, especially Shane Victorino and Jacoby Ellsbury. Oh, also John Lackey and that Koji Uehara guy. It wouldn’t be fair or accurate to say the Red Sox came out of nowhere. They were, however, baseball’s biggest positive surprise.

**ORIOLES**

**Projection:** -0.9 standard deviations **Reality:** +0.4 **Difference:** +1.3

I mean, this isn’t hard to explain. Baltimore’s pitchers did what they were projected to do. Baltimore’s position players did not, by which I mean, Chris Davis and Manny Machado did not. Davis and Machado were projected for a combined WAR of about 4. They came out to a combined WAR of 13. Pretty easy to surprise when you field two unexpected superstars.

**INDIANS**

**Projection:** -0.8 standard deviations **Reality:** +0.3 **Difference:** +1.1

Some of this was as simple as big years from Jason Kipnis and Yan Gomes. But the bulk of this came from the pitching staff, which was projected to be a pretty big problem. The Indians were projected to have baseball’s fourth-worst staff. They came out ever-so-slightly above average, with Justin Masterson getting better, Ubaldo Jimenez getting better, and Corey Kluber and Scott Kazmir being outstanding. I’m never going to blame a projection system for not anticipating Scott Kazmir. That’s a thing. That’s a thing that happened in real life.

**BLUE JAYS**

**Projection:** +0.6 standard deviations **Reality:** -0.4 **Difference:** -1.0

This was shared equally between the hitters and the pitchers. The catchers were atrocious, and Melky Cabrera was atrocious, and Maicer Izturis was atrocious, and Jose Reyes got hurt. Then Josh Johnson didn’t do what he was supposed to do, and Ricky Romero was something out of the dark parts of the Bible, and Brandon Morrow got himself injured. Last year, for the Blue Jays, had a lot of bad luck, which is why they’re reasonably expected to regress upward in 2014. But by the same token, they’re not expected to regress up into a playoff spot.

**ANGELS**

**Projection:** +1.3 standard deviations **Reality:** +0.3 **Difference:** -1.1

This, despite Mike Trout over-performing. That’s what happens when you get way too little out of guys like Albert Pujols and Josh Hamilton. The Angels had every reason to expect those guys to be terrific, hence all the money, but Pujols had to play hurt and Hamilton’s just one of the more volatile players in the game. The pitching staff was its own kind of problematic, but it was the position players who were most responsible for this big negative deviation. Pujols and Hamilton should improve in the future, but then, 2013 counts.

**PHILLIES**

**Projection:** +0.3 standard deviations **Reality:** -1.4 **Difference:** -1.7

And here’s the biggest team deviation. The Phillies were projected to be a little better than average. In the end, they were a catastrophe, despite big statistical years from Cliff Lee and Cole Hamels. Of course, an enormous problem was that Roy Halladay was projected for almost four wins, and he was actually worth about negative one. A rotation can’t really recover from that. But Chase Utley was the only position player to be worth more than 1.6 WAR. Carlos Ruiz was a whole lot worse than expected. Delmon Young batted 300 times. The Phillies, like other bad teams from 2013, are expected to be better in 2014, and while they still won’t be *good*, they should be watchable more than two or three times a week.

—–

Based on our first depth chart and team-projection experience, the numbers seem to be worth paying close attention to. Due to the nature of sample sizes, projections aren’t going to nail the spread in wins that we observe in reality, but last year’s numbers did pretty well in projecting the end-of-season order. Obviously, there are things we can’t predict, and it’s better that way. But when it comes to talking about the season ahead, our projected numbers make for a pretty good foundation.


Interesting to me is the slope of the best-fit line in that graph. If a team was projected for +.7 SD, they (on average) ended up at +1, and if they were projected for -.7 SD, they (on average) ended up at -1. This might just be because of the way projections work — they regress to the mean, so the human element pushes them further away — but I think there should be just as much chance that the human element pushes them closer to the mean. That would indicate a potential source of bias, with Fangraphs projecting the bad teams not as bad as they really are and the good teams not as good. Would love to see some more data from past years to get a better sample.

I don’t think there are nearly enough data points or a high enough r value to make too much of the 0.7 slope. It could easily regress to 1.0.

Generally, a good metric for the accuracy of a projection is the mean squared difference between the projected and actual results. An optimal projection system that minimizes this amount will tend to have the property you’ve noted: the projected values are in a narrower range than the actual values. This is the same reason fewer individual players are projected to hit above .300 than we actually expect to see.

If you were to scale the projections up to match the expected distribution of actual results, you’d actually introduce more error.

Another way to say this is that we can’t project the noise part of what actually happens during a season, only the signal part. So projections have a variance that is related just to the signal part, whereas actual results have both variance from the signal AND from the noise. So the range of actual results is wider than the projected component.
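That signal-plus-noise point is easy to demonstrate with a quick simulation. This is a toy sketch, not anyone’s actual projection system: the “projection” is assumed to recover the signal perfectly, and all the numbers are made up.

```python
import random

random.seed(0)

# Toy model: every team's season is true talent ("signal") plus luck
# ("noise"). An ideal projection recovers only the signal, so the spread
# of projections is narrower than the spread of actual results.
# (Using 1,000 simulated teams, rather than 30, to keep the demo stable.)
n = 1000
signal = [random.gauss(0, 1.0) for _ in range(n)]   # projectable part
noise = [random.gauss(0, 0.7) for _ in range(n)]    # unprojectable part

projection = signal                     # best case: projection == signal
actual = [s + e for s, e in zip(signal, noise)]

def sd(xs):
    m = sum(xs) / len(xs)
    return (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5

print("spread of projections:", round(sd(projection), 2))
print("spread of actuals:    ", round(sd(actual), 2))
```

Even with a perfect projection of the signal, the actual results come out more spread out, which is the point being made above.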

That is an excellent explanation, and matches my intuitive expectations. You can expect some batter to hit .375 this season, but not one of your individual projections will have someone hitting .375. Thanks!

So if you used Monte Carlo projections across the player population and their range of potential outcomes, instead of projecting based on the mean player outcome, would the error rate on the Monte Carlo projections roll up into more variation than the actual results? We know some players are going to have outlier seasons, but trying to project which ones is more likely to mess up the overall projection at the team level than to capture the human variation?

Since these are standard deviations, the differences from the mean are normalized. You can’t reasonably expect a 1.0 slope for the best fit unless your projection is nearly dead-on correct.

You certainly can’t expect the slope to be more than 1.0.

Actual records are always going to be slightly more disparate than projected records because of in-season transactions. Good teams become buyers at the trade deadline and get better. Bad teams become sellers at the trade deadline and get worse. Projections can’t anticipate those trades, but we know they’re going to happen.

Slight addendum: Right, this analysis is based on WAR, not record. My bad. The same logic still holds, just replace ‘actual record’ and ‘projected record’ with ‘actual WAR’ and ‘projected WAR.’

Catoblepas, I think the relationship is actually opposite of what you indicated. The projections are the x-axis, and the actual are the y-axis. Just look at the Red Sox point (projected at +0.5, actual was +2.0). So the linear regression suggests that the Fangraphs projections were farther from the mean than the actual results, in this case.

Look at the fit residuals, that and the variance. You don’t really know anything about individual teams that a coin flip wouldn’t tell you.

You’ve got decent results versus choosing randomly, but that’s hardly surprising. What’s the correlation for using, say, previous season’s pythagorean record? Or ESPN author predictions? (OK, I guess you can’t get z-scores out of that. But something similarly unsophisticated.)

Intuitively I don’t think .5 is very good, but I don’t place very high confidence on my intuition in this case.

This. The interesting question to me (and I might actually be alone here) is how much better did you do than just plain intuition? I would consider projected standings made using pure intuition, AND NOT ANY SABERMETRICS AT ALL, to be the baseline. If you did significantly better than that (and we can’t test for significance here, because we would be using the whole population rather than a sample), then you didn’t waste your time.

Of course, what exactly is pure intuition? We couldn’t ask a FanGraphs writer to guess at end-of-year standings, because they would confound the whole thing by trying to add up WAR projections, which they probably memorized, in their head. Even unenlightened laypeople would probably use some sort of statistically based reasoning to project end-of-year standings. How far is that from sabermetrics?

Perhaps the way to get a really good comparison would be to ask someone who’s actively hostile to the concept of sabermetrics, like Tracy Ringolsby or Hawk Harrelson, to project what every team’s win-loss record will be at the end of the year. I figure this gives you a baseline of someone who knows baseball and active rosters, but who would actively suppress any intrusion of statistical reasoning into his thinking. There would be only one variable differentiating the projections: WAR vs. ‘my gut.’

The correlation between the z-scores and the correlation between the raw winning percentages will be the same thing. I ran the correlation for 2012 pythagorean W-L with 2013 results and got an r^2 = 0.32.
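A small sketch of why those two correlations match: Pearson’s r is unchanged by any positive linear rescaling of either variable. The win totals below are made up; only the invariance property matters here.

```python
# Pearson's r, computed directly from its definition.
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

wins_a = [94, 88, 69, 75, 97, 61, 85]   # hypothetical 2012 win totals
wins_b = [90, 81, 74, 66, 96, 70, 83]   # hypothetical 2013 win totals

raw_r = pearson(wins_a, wins_b)
# Rescaling wins to winning percentage is a positive linear transform,
# so the correlation is identical.
pct_r = pearson([w / 162 for w in wins_a], [w / 162 for w in wins_b])

assert abs(raw_r - pct_r) < 1e-12
print(round(raw_r, 3))
```

The same argument covers z-scoring, since subtracting a mean and dividing by a standard deviation is also a positive linear transform.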

In reference to Catoblepas’ comment above, the slope of the best fit line between 2012 pythag vs 2013 actuals is 0.5617. This isn’t a lot of data, but it implies to some degree that there is some regression to the mean from year to year. Since projections are based on historical results, I’d say you’re seeing part of this phenomenon to explain the slope of the line above.

Thanks.

Comparing to the early over/under numbers from here: http://www.baseballnation.com/2013/2/14/3986584/mlb-over-under-odds-2013-GAMBLOR

I got an r^2=0.477. So not much different.

If those are actual sportsbook lines, doing approximately as well as they do is pretty impressive. Good linesetters are about as far from unsophisticated as you can get, and they’re definitely aware of all the public projection systems.

I wouldn’t give sportsbooks too much credit. They are generally a step behind when it comes to original analysis. Right now, they basically average the projection systems and tweak a little for perceived public biases. And there is still usually massive movement in the lines from late Feb to Opening Day (not based on injury, just smart money). As recently as 6-7 years ago, they were fully eyeballing it and subsequently got torched by anyone using BP or anything similar in sophistication.

I’m also under the impression that, a lot of the time, lines are set to encourage bets as much as to reflect who will actually win. I know balancing the bets on both sides can be a big factor in NFL games. Not sure about MLB World Series futures props, though.

Indeed. I appreciate all Fangraphs does for my baseball habit, but I hope they do add more statistical savvy over the coming years. It might cut down on the people wandering around the internet saying things like, “FIP is predictive, that’s why it’s good,” and other such horrifying things.

Do you need a blacklight to read the dark parts of the bible?

The thing is, when you project WAR, you naturally regress to the mean. That’s what projection systems do. So obviously some teams will be far worse or far better than what they were “projected” to be. Mike Trout hasn’t put up a single season of less than 10 WAR, but his 2014 projection is 9.5 WAR. Players are expected to fall back to earth, or not to be as awful as they were in the past year. The only way for projection systems to work is to pull everyone back toward the average.

I think that’s the best explanation for the slope being 0.7. It’s clearly a demonstration of the projection system not being able to project any outliers in WAR (and, subsequently, why most teams are projected between 70 and 90 wins, when almost every year at least 3-4 teams are over the 90-win barrier).

Jeff, we would like to extend you an invitation to apply for a current opening in our division based on this article’s statistical analysis. While most statisticians would have tried a neural network with resilient backpropagation, you saw the inherent flaw in doing so and continued with a simple linear regression. Your innovation in statistical analysis is refreshing; our top students could learn a lot from you.

Clever.

It must be; I don’t get it.

Neither do I

I think it’s just a joke that many grad students overthink the problem and introduce more noise by searching for a “clever” solution, when an old established method would work quite well.

You can see this in any field, really, and I’m being charitable by saying it’s just grad students that do it…

I’m pretty sure you are all missing the snobby joke… The point is that no one on this site is doing anything particularly groundbreaking from a statistical perspective. The toolbox the authors use here is fairly rudimentary, probably nothing past a first course in statistics at MIT or Caltech.

This is all fine, since it isn’t a math/statistics site, but Jeff/Dave are not mathematicians by any stretch of the imagination.

It’s a good thing we have an MIT-trained uber-genius to point it out in a post so obtuse/unfunny that no one gets it, then.

I’m a college student who is currently in an intro statistics class. Are there any easily understood sabermetric principles I could use for a group project?

Play around with Steve Staude’s correlation tools and see if you can come up with anything interesting.

http://www.fangraphs.com/blogs/tool-version-2-pitching-correlations-with-improved-filtering/

http://www.hardballtimes.com/tool-basically-every-hitting-stat-correlation/

I don’t know if the axes are just labeled the opposite of what they should be labeled, but I don’t see the Red Sox point on the graph. They should have +.5 on the Y-axis and +2.0 on the X-Axis, no? The only team I see with +2.0 on the X-axis (Actual WAR) was projected around 2 WAR.

The X-axis is ‘Projected WAR’ and the Y-axis is ‘Actual WAR’.

The vertically typed label corresponds to the vertical axis (not the axis it sits next to), just as the horizontal text corresponds to the horizontal axis.

I thought the same thing, ryarriz. It’s super confusing that the axis labels aren’t labeling the axis they are right next to (at least to me, but I am, how you say, rather dense).

I think Kendall’s tau would be a great correlation coefficient for this type of analysis. Basically, it measures how well the projection predicted the order of finish rather than the values themselves. So you can get the order right, have the actual numbers off by a bit, and still arrive at a high correlation.

http://en.wikipedia.org/wiki/Kendall_tau_rank_correlation_coefficient
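For anyone who wants to try it, here’s a minimal version of Kendall’s tau (the tau-a variant, which ignores ties); the projected and actual team WAR values are hypothetical, and in practice `scipy.stats.kendalltau` would do the same job:

```python
from itertools import combinations

# Minimal Kendall tau-a: for every pair of teams, check whether the
# projection ordered them the same way the season did.
def kendall_tau(projected, actual):
    concordant = discordant = 0
    for (p1, a1), (p2, a2) in combinations(zip(projected, actual), 2):
        s = (p1 - p2) * (a1 - a2)
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(projected) * (len(projected) - 1) / 2
    return (concordant - discordant) / n_pairs

projected_war = [55, 48, 40, 33, 25]    # invented for illustration
actual_war = [60, 43, 45, 30, 15]       # order mostly right, magnitudes off
print(kendall_tau(projected_war, actual_war))   # 9 of 10 pairs in order
```

Because tau only looks at pairwise orderings, a projection that nails the order of finish but misses the magnitudes still scores near 1.0.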

FWIW, I went back and compared the 2013 converted projected wins to the Vegas totals found here.

On those occasions where the FanGraphs projected total differed from the Vegas total by more than two games, FanGraphs was “correct” on five of eleven. I guess I’ll cancel my plans to wager my life savings on 2014 projections…

That just means you’re due!