Testing Projections for 2011

Each year, baseball fans and commentators across the nation make bold predictions about what they expect in the coming season. They frequently make outlandish claims like “Adam Dunn is going to hit 50 home runs in Comerica Park!” or “This is the year that Joe Mauer finally hits .400!”, and such predictions are far more likely to land high than low. Sure, if you said Jose Bautista was going to summon greatness going into 2010, you looked pretty smart, but anyone who projects performance seriously knows that you need to hedge your bets. Projection systems, while frequently accused of being overly pessimistic about whoever your Home Nine are, on average land high about as often as they land low. The field of projection systems grows by the year, but there are significant differences between them. Today, I’ll evaluate their 2011 projections for hitters and pitchers.

First, let’s peek at the candidates:

MARCEL: Tom Tango’s free projection system, intentionally using a simple formula as a challenge to forecasters.
PECOTA: Baseball Prospectus’ projection system available by subscription, run by Colin Wyers.
OLIVER: The Hardball Times’ projection system available by subscription, run by Brian Cartwright.
ZIPS: Baseball Think Factory’s free projection system, run by Dan Szymborski.
CAIRO: Revenge of the RLYW’s free projection system, run by “SG.”
STEAMER: Free projection system, run by Jared Cross, and his former students, Dash Davidson and Peter Rosenbloom.
You can learn more about these projection systems here.


The projection systems differ significantly in their standard deviations of projected wOBA, with some hitting projection systems taking considerably more risk in estimating player performance. The riskier a projection system, the more likely it is to be wrong by a lot, which hurts its performance, particularly with respect to Root Mean Square Error (RMSE). Riskier projection systems may be right more often, but when they’re wrong, they’re very wrong. So, before we do anything, let’s rank the projection systems by how risky they are:

Projection StDev of wOBA
Oliver .0309
Steamer .0289
ZiPS .0287
Cairo .0283
PECOTA .0278
Marcel .0234
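The “riskiness” measure above is just the spread of a system’s projected wOBAs. A minimal sketch, with invented systems and numbers (not the table’s data):

```python
# Sketch: a projection system's "riskiness" as the standard deviation of its
# projected wOBAs. The two systems and their numbers are hypothetical.
import math

def stdev(values):
    """Population standard deviation."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

# Hypothetical projected wOBAs from two systems for the same five hitters.
safe_system  = [0.320, 0.330, 0.335, 0.340, 0.325]   # hugs the mean
risky_system = [0.290, 0.330, 0.360, 0.380, 0.300]   # bolder spread

print(stdev(safe_system) < stdev(risky_system))  # the risky system spreads wider
```

A bolder system separates players more aggressively, which is exactly what a higher StDev of wOBA captures.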

Marcel is going to have fewer “big misses” than Oliver will, so we’ll want to look at both RMSE (which will punish risky guesses) and Correlation (which will reward better player rankings), as well as average absolute error (which will fall somewhere in between in terms of punishing and ignoring risky projections).

Here is the RMSE table, weighted by PA and including only hitters with at least 200 PA in 2011. As you see, PECOTA, a relatively safe projection system, comes out ahead, even ahead of Marcel, which is safer still. I’ll also include a row for “last year’s stats” to see how predictive they are.

Projection RMSE
PECOTA .0317
ZiPS .0318
Oliver .0321
Steamer .0322
Marcel .0330
Cairo .0333
Last Year’s Stats .0388
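The PA-weighting mentioned above works by weighting each player’s squared wOBA error by his actual plate appearances. A sketch, with invented player data:

```python
# Sketch of a PA-weighted RMSE: each player's squared wOBA error is weighted
# by his actual plate appearances. The errors and PA totals are made up.
import math

def weighted_rmse(errors, weights):
    num = sum(w * e ** 2 for e, w in zip(errors, weights))
    return math.sqrt(num / sum(weights))

errors = [0.010, -0.030, 0.020]   # projected minus actual wOBA
pa     = [650, 250, 400]          # plate appearances (all past the 200-PA cutoff)

print(round(weighted_rmse(errors, pa), 4))  # -> 0.0186
```

Weighting by PA keeps a big miss on a part-time player from swamping a small miss on an everyday player.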

Oliver fared pretty well, despite its risky nature. It takes a step forward when you look at average absolute error.

Average absolute error and root mean square error differ in how much they punish big misses. Take System A, which misses on Player X by 20 points of wOBA and misses on Player Y by the same amount. Take System B, which nails Player X exactly but misses on Player Y by 30 points. Average absolute error will favor System B, but RMSE will favor System A.
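The System A / System B example works out like this in code:

```python
# The System A / System B example above: A misses both players by 20 points of
# wOBA; B nails one and misses the other by 30. AAE favors B, RMSE favors A.
import math

def aae(errors):
    return sum(abs(e) for e in errors) / len(errors)

def rmse(errors):
    return math.sqrt(sum(e ** 2 for e in errors) / len(errors))

system_a = [0.020, 0.020]   # misses, in points of wOBA
system_b = [0.000, 0.030]

print(aae(system_a), aae(system_b))    # 0.020 vs 0.015 -> B wins on AAE
print(rmse(system_a), rmse(system_b))  # 0.020 vs ~0.0212 -> A wins on RMSE
```

Squaring the errors is what makes RMSE punish the single 30-point miss more than two 20-point misses.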

Projection AAE
ZiPS .0244
Steamer .0247
Oliver .0247
PECOTA .0248
Marcel .0257
Cairo .0264
Last Year .0303

ZiPS is the champion of AAE, with its somewhat risky projections. They may be wrong by more when they’re wrong, but they’re right more often.

If we then jump forward and look at correlation, we get a whole new winner. Correlation is different because, for the most part, all it cares about is rankings. If you projected Ryan Braun to have a .530 wOBA and Adrian Beltre to have a .430 wOBA, you would have had a great projection year by correlation, despite the fact that Braun’s wOBA was closer to .430 and Beltre’s was closer to .380. Correlation just wants you to rank the guys well. Using correlation, we get the following rankings.
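The Braun/Beltre point is really about correlation’s shift- and scale-invariance: a system that is uniformly too high still scores perfectly if it orders the hitters correctly. A sketch, with made-up wOBAs:

```python
# Correlation is shift- and scale-invariant: projections that are uniformly
# too high still score perfectly if they rank the hitters correctly.
# The wOBA values here are invented for illustration.
import math

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

actual    = [0.430, 0.380, 0.350, 0.310]
projected = [0.530, 0.480, 0.450, 0.410]  # every guy 100 points high, order right

print(round(pearson(projected, actual), 3))  # -> 1.0
```

RMSE and AAE would hammer a system that ran 100 points hot; correlation gives it a perfect score.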

Projection Correl.
Oliver .6151
ZiPS .6139
PECOTA .6136
Steamer .6039
Cairo .5685
Marcel .5614
Last Year .4740

Oliver comes out in front if you use correlation. Despite its perhaps overly aggressive estimates of talent level, Oliver ranked hitters best; scaling back its projections might have been the best way to predict hitters.


What about pitchers? Well, the leaderboard looks quite different there. Following some of my previous work, I include several ERA estimators among the pitching projections. This time, I’ll convert them into projections by regressing 2011 ERA against the 2010 and 2009 versions of these ERA estimators. This produced the following formulas:

SIERA_proj = .59*SIERA(’10) + .26*SIERA(’09) + 0.47
xFIP_proj = .65*xFIP(’10) + .24*xFIP(’09) + 0.29
FIP_proj = .43*FIP(’10) + .30*FIP(’09) + 0.94
tERA_proj = .38*tERA(’10) + .29*tERA(’09) + 1.08
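Applying these formulas is straightforward. The coefficients below come from the article; the sample SIERA values for the example pitcher are made up:

```python
# Applying the article's regression-derived formulas: a pitcher's 2010 and
# 2009 estimator values produce a 2011 ERA "projection". Coefficients are
# from the article; the sample inputs are hypothetical.

def siera_proj(siera_2010, siera_2009):
    return 0.59 * siera_2010 + 0.26 * siera_2009 + 0.47

def xfip_proj(xfip_2010, xfip_2009):
    return 0.65 * xfip_2010 + 0.24 * xfip_2009 + 0.29

# A hypothetical pitcher with a 3.40 SIERA in 2010 and 3.60 in 2009:
print(round(siera_proj(3.40, 3.60), 2))  # -> 3.41
```

Note that the coefficients sum to well under 1, so these “projections” regress every pitcher heavily toward the mean, which is why their standard deviations below are so small.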

The projections now have the following standard deviations of ERA among all pitchers with 40 IP in 2011:

Projection StDev of ERA
ZiPS .7322
PECOTA .7238
Oliver .6356
Cairo .5314
Steamer .5207
Marcel .4453
SIERA_proj .4188
xFIP_proj .3854
FIP_proj .3829
tERA_proj .3807

Starting off with RMSE, which should punish the riskier projections, we see that it does exactly that:

Projection RMSE
Steamer .8324
Cairo .8736
SIERA_proj .8746
xFIP_proj .9014
FIP_proj .9033
tERA_proj .9050
Marcel .9066
PECOTA 1.024
ZiPS 1.030
Oliver 1.042
Last Year’s Stats 1.282

ZiPS, PECOTA, and Oliver had the riskiest projections, and all fared the worst. Interestingly, despite being riskier than the scaled-back ERA estimators, Steamer and Cairo outperformed them on RMSE.

What about average absolute error? The rankings look similar, though a few projections swap places.

Projection AAE
Steamer .7067
SIERA_proj .7281
Cairo .7331
FIP_proj .7333
xFIP_proj .7360
tERA_proj .7361
Marcel .7474
ZiPS .7749
PECOTA .7905
Oliver .8009
Last Year’s Stats .8766

Steamer again comes out ahead. Moving to correlation, we see the same type of thing, though surprisingly, Marcel does better and Oliver does worse with correlation, despite correlation’s tendency to punish conservative projections.

Projection Correl.
Steamer .4581
Cairo .4213
SIERA_proj .4089
xFIP_proj .3763
Marcel .3744
FIP_proj .3739
tERA_proj .3715
PECOTA .3705
ZiPS .3701
Last Year’s Stats .3265
Oliver .3163

But on all three metrics, Steamer comes out ahead. I asked Jared Cross what was making his projections so good, and he explained that he uses velocity (as well as handedness) in his pitcher projections, and that gives them a leg up. He wasn’t the only person to suggest something like this. I only started thinking seriously about it recently, but I think it really is the “next big thing” in pitcher projections. Unlike hitter projections, where the winner comes down to which metric you use to test them, pitcher projections come back Steamer on all three tests. Perhaps more interestingly, the better-known systems such as Oliver, PECOTA, and ZiPS, despite doing the best on hitters, fare the worst with pitchers. Perhaps a system that is good at projecting both pitchers and hitters is as rare as a player who is good at both.

Of course, these are all just one-year tests, so there is a lot of luck involved for any of these. However, as each of these systems moves forward to their next race, this is where they stand.


Matt writes for FanGraphs and The Hardball Times, and models arbitration salaries for MLB Trade Rumors. Follow him on Twitter @Matt_Swa.

47 Responses to “Testing Projections for 2011”

  1. cobradc23 says:

    Adam Dunn hitting 50 home runs in Comerica would be a feat indeed, since he will only play 9 games there this season.


  2. Jesse says:

    Matt, one thing that would be interesting but very difficult would be to adjust each of these projections to match the same total run environment. Or, alternatively, to look at something different: see if you can test their predictions of the run environment itself.

    I think it’s very hard to interpret the weight we should place on several of these tests in assessing our confidence in various systems’ projections of individual players. Correlation, I guess, is important because at the end of the day you want to be ranking players, for fantasy. But when a system makes a call that’s pretty surprising relative to others, that’s where I really want to know what’s going on. Any ideas?


    • Jesse says:

      also, really great work. Steamer really nailed it. I wonder if there isn’t some means of coming up with a similar big find for batters. The first that jumps to mind is, see how they respond to pitcher velocity? That would be an interesting investigation in and of itself.


    • Matt Swartz says:

      You’re right– I normalized all projection systems’ weighted average of wOBAs to .330 and ERAs to 3.82. Otherwise, predicting run environment might be more important than predicting players!


      • Jesse says:

        Oh interesting. How does that affect “last year”? Did you adjust it, or does this mean that major swings in run environment year to year will affect rmse of “last year”? And why did you pick those points?

        It’d be interesting to know what the “last year” stats were going back, because that’d be a good first order way to see if the models as a whole are getting better.


      • Matt Swartz says:

        I adjusted last year to .330 & 3.82 as well, just by adding/subtracting. I picked .330 because that was something Tom Tango suggested when he gave me the basic wOBA formula he’d suggest using (which gave me something like .325 or so), and 3.82 because that was the league-average ERA last year when you exclude pitchers below 40 IP.


  3. Hurtlockertwo says:

    Projections?? I scoff at your projections! Evidence? Zoilio Versalles, Darin Erstat, Chris Hoiles, Darren Daulton, Willie Magee, Sixto Lezcano, Darrell Porter, Bob Cerv……The list is huge. That’s why we love baseball so much.


  4. Urban Shocker says:

    How did they compare to 3 year Averages?


  5. Baltar says:

    Wow, this article had some really atrocious writing.
    “While frequently accused of being overly pessimistic about whoever your Home Nine are, on average, they land high about as often as they land low.”
    Who is accused? Who are “they”?
    “only including guys with at least 200 PA” actual, predicted? What year?
    Also, I’d like to point out for the 100th time that FIP is not an “ERA predictor,” it is an actual measure of fielding independent defense.


    • Matt H says:

      Dude, he created SIERA. I think he knows how FIP works.


    • Barkey Walker says:

      ERA and FIP are operational measures of the same thing–pitcher performance. They take different stands on how to do that, but it is valid to see how they do at predicting one another.


    • Matt Swartz says:

      Sorry if it wasn’t clear:
      they = projections
      200 PA = 200 PA in 2011 in real life

      Also, I called FIP an ERA Estimator, not an ERA Predictor. I try not to call things ERA Predictors unless they are predictions.


      • Baltar says:

        Thank you for the clarifications.
        I should have known you were talking about actual 2011 results from the title, and I apologize for my error in confusing estimator with predictor.


  6. How many pitchers were included in your sample? I ran a similar test of ZiPS and one other projection system for 2011 which included 301 pitchers with a similar innings constraint. I got different results for ZiPS, but I take it your sample probably had fewer pitchers because they had to have been included in all the projection systems’ forecasts? Let me know, thanks!


    • Matt Swartz says:

      I think it was 355 or 359 pitchers or something like that. It was everyone with 40 IP, and I filled in numbers for “blank” projections by doing a slightly below average wOBA for hitters (I think .310) and a slightly above average ERA for pitchers (I think 4.12, but I don’t remember), with the exception of Marcel, where I was told it’s supposed to be league averages. There weren’t many missing projections, though, IIRC.


      • evo34 says:

        I don’t think it’s reasonable to fill in numbers for “blank” projections with anything. By definition, these guys got more playing time than expected in 2011 and therefore probably performed closer to league average (better) than expected. So you are giving an advantage to systems that didn’t project these players.

        A fairer approach would be to simply drop from your analysis those players that did not have projections from every system (excluding Marcel). I hope you will consider re-running the analysis, as I don’t think it fairly judges the systems in its present form.


  7. Mark says:

    Getting to practicalities: if I’m in a very deep, NL-only, auction keeper league where you win predominantly based on buying low at auction on emerging talent that you can then lock up for years, I should use … what system for batters? Pitchers?


    • Matt Swartz says:

      I think if you averaged the best projection systems for batters, that does a lot better IIRC. A long time ago, I checked projections on subsets of batters and found major differences. I think ZiPS used to be very good with older players and PECOTA used to be very good with very young players, but I think both systems have changed a lot since then. Maybe I’ll check that one day (or you can do it first and use it in your keeper league before I publish anything!).


    • rainiers says:

      I’d just use your own baseball knowledge. That way you can take all the credit when you win your league.


  8. Matt Swartz says:

    Just had it pointed out that the way that I fitted SIERA_proj, xFIP_proj, FIP_proj, and tERA_proj, I’m going to get positively biased results since I used 2011 data to test 2011 data. I’ll redo this analysis in the next few days and make note of it here.


    • Matt Swartz says:

      The pitching projections don’t really change much with the fix I did, which is regressing 2010 ERA on 2009 & 2008 ERA estimators and re-centering:


      RMSE:
      Steamer: .8324
      Cairo: .8736
      SIERA_proj: .8809 (was .8746)
      xFIP_proj: .9028 (was .9014)
      Marcel: .9066
      FIP_proj: .9103 (was .9033)
      tERA_proj: .9190 (was .9050)
      PECOTA: 1.024
      ZiPS: 1.030
      Oliver: 1.042

      (so FIP_proj and tERA_proj move behind Marcel at RMSE)


      AAE:
      Steamer: .7067
      SIERA_proj: .7322 (was .7281)
      Cairo: .7331
      FIP_proj: .7377 (was .7333)
      xFIP_proj: .7385 (was .7360)
      tERA_proj: .7463 (was .7361)
      Marcel: .7474
      ZiPS: .7749
      PECOTA: .7905
      Oliver: .8009
      Last Year’s Stats: .8766

      (so the ranking doesn’t change)


      Correlation:
      Steamer: .4581
      Cairo: .4213
      SIERA_proj: .4026 (was .4089)
      xFIP_proj: .3746 (was .3763)
      Marcel: .3744
      PECOTA: .3705
      ZiPS: .3701
      FIP_proj: .3665 (was .3739)
      tERA_proj: .3587 (was .3715)
      Last Year’s Stats: .3265
      Oliver: .3163

      (So FIP_proj and tERA_proj move behind PECOTA in correlation)


  9. johnnycuff says:

    any idea if/when the 2012 version of steamer will be available? or marcel here on fangraphs for that matter? i know ZIPS is rolled out one team at a time before dan releases the entire series and that he’s almost done.


  10. John R. Mayne says:

    Great piece, Matt. Really helpful.

    I was really hoping someone would do something like this.

    The Steamer pitcher information strikes me as particularly heartening. I’m glad to hear my views on the subject are put to good use (my article is linked toward the end, and I’m calling one of my fantasy teams the Staggering Geniuses this year; if you hit both links there, you’ll get that.)

    I did talk to a few projections-related people about velocity-based projections (and I’ve been tinkering with my own pitcher projections for a bit.) Some dismissed the idea outright or close to it; Sean Smith (formerly of CHONE fame) explained to me prior to the article why he thought it wouldn’t help (and was exceedingly helpful and gracious when I explained why I thought it would.)

    Use of handedness is a critical piece of knowledge when using velocity. Much less critical, but still informative, is the use of percentage of fastballs.

    On an aside, I’m surprised PECOTA did as well as it did. They had some known breakages in their projections (the Kila-Bowker problem, and minor leaguers generally.) As I think about it, Kila and Bowker and other similarly situated folks didn’t meet the 200 PA cutoff – so they didn’t get factored in. So maybe I shouldn’t be as surprised.



    • John R. Mayne says:

      For clarity: Sean Smith was convinced that velocity had an effect when I showed him a draft of the article (and continued to be very helpful.)


  11. byron says:

    Buckets! How did they do in predicting the top 25% of players, etc.?


    • byron says:

      Should clarify: this is so awesome I want more. I’m not saying you did an incomplete job.


      • Jay says:

        I agree with this idea. I would imagine that a significant amount of the error in projections occurs in fringe players. Fantasy players want projections for the top 250 players or so, so it would make sense to judge projection systems based on (mostly) established players.


  12. Bobby Mueller says:

    Great article. I’d be interested in seeing how a combination of all the projection sources would do. I’ve been using a “wisdom of the crowds” approach for the last few years and it seems to work well.


  13. Vision says:

    Averaging projection systems for hitters has always been my approach, but this is great on pitchers. I’m going to use Steamer when it comes out.

    Great article…thanks.


  14. Vision says:

    PS- I fell for PECOTA on Kila, and got really burned when I missed out on undervalued Konerko and Ortiz to fill my DH spot.


  15. evo34 says:

    Awesome work. Please make this an annual study.

    1) Do you have access to historical projections? If so, any chance of running this same analysis on the 2009 and 2010 seasons?

    2) What do you think of the methodology employed here: http://www.fantasypros.com/about/faq/accuracy-methodology/ ? Would it be easy to run a similar analysis on the data you have? You’d have to assume some standard fantasy scoring system obviously to do so.


    • Matt Swartz says:

      I have some of the older projections too, though not for every system. I’ll try to test some of the older systems. I’ll need to look at the link you posted, though I don’t know too much about football data.


  16. evo34 says:

    Would you care to publish the AAE after adjusting for overall offense level? That is, if a system projected a run environment 0.10 runs too high, all of its projections would get reduced by 0.10 before being evaluated. Just trying to separate overall run-environment prediction skill from individual player differentiation skill. I get that the correlation analysis provides some of the latter, but would like to see a bias-adjusted AAE if possible. Thanks.


    • evo34 says:

      My bad… I failed to see your comment where you say that you did adjust all projections for environment:

      “You’re right– I normalized all projection systems’ weighted average of wOBAs to .330 and ERAs to 3.82. Otherwise, predicting run environment might be more important than predicting players!”


  17. Antonio Bananas says:

    Is there a way we can measure bat speed? If the velocity is a way to measure pitchers and predict, maybe we can look at bat speed similarly. The notion is that if you have a quick bat, you obviously hit the ball harder, but you also have longer to wait on a pitch and see it.


  18. batpig says:

    impressive that using a simple weighted SIERA estimate whoops nearly all pitching projection systems. Can you publish a spreadsheet of 2012 ERA estimates based on your SIERA formula?

    one thing I’m curious about is if an average of all the systems performs better overall? perhaps by “smoothing” out the errors of the wildest projections by any “outlier” system. Can you add that to the data? (especially for hitters)

    this “grading” of projection systems is really a fertile topic that deserves a lot more digging into. Please keep more coming! Which systems perform best at which “types” of players? old and established? young and with limited major league data? etc.


    • evo34 says:

      Unfortunately, I don’t think it’s really a fair test, because the projection systems attempt to predict the performance of nearly everyone who had any chance of reaching 40 IP in the coming season, whereas the ERA estimators got a free pass on those pitchers with no prior major league data. Why is it a free pass? Because they got to skip the very hardest players to project. And for those players who did do well enough to reach 40 IP the next year, the ERA estimator systems were able to use a league-average projection, essentially *after* the fact.

      So unless I am misinterpreting Swartz’s methodology, he really should re-do the analysis after dropping any players not projected by all of the systems being compared. I.e., do one comparison with estimators, Marcel, and all others (of major league vets only); do one comparison of only projection systems (that includes virtually all players).


      • Matt Swartz says:

        I actually included ERAs .20 higher than league average for any missing data, so it didn’t really give anyone a free pass in that case– it actually limited their projection numbers to have them all given the same ERA for those pitchers.


      • evo34 says:

        Even average +0.20 is a free pass. Most of these players were not expected to get playing time in 2011, so the true projection systems would likely project really bad ERAs for them as a group. Allowing the ERA estimators to insert an average+0.20 “projection” after knowing that the player earned significant playing time is an unfair advantage. The test should be broken into two and should use different universes of players, depending on whether the system is a true projection system or simply an estimator of past performance by MLB veterans.


    • Matt Swartz says:

      Sure, I’ll try to put together a spreadsheet of the SIERA formula. It was something like 70% SIERA last year, 20% SIERA two years ago, and 10% league average, but I’ll put that together soon.

      The average of Oliver, PECOTA, and Steamer worked a little better than any of them individually for batters. The correlation was .6230, the AAE was .0243, and the RMSE was .0314.

      If you average Marcel, Cairo, and Steamer for pitchers, you can do a little better too. The correlation was .4695, the AAE was .706, the MSE .820 (I actually think I published MSE instead of RMSE).


  19. dashd says:

    I am one of the creators of Steamer Projections and we are working hard trying to get our 2012 projections to you guys as soon as possible. We are hoping that the steamer pitcher projections (with hitters soon to follow) will be out by the end of the week.

    Our website and projection downloads can be found here: http://www.steamerprojections.com

    thanks for the great work on this article Matt.

