Projection vs Projection

It’s almost opening day, and it seems like everyone is talking about projections.

When considering a projection, there are really two questions to be answered – what is the player’s “True Talent Level” right now, and how will he perform next year? Between now and the end of next year, his talent level very well might change, as he’s a year older and might recover from or succumb to injuries. Even then, there’s still the random variance of a single season performance. In this article I’d like to explore how some of the major projection systems work when predicting different subgroups of players.

I tested the following projections: PECOTA (2006-2009), ZiPS (2006-2009), CHONE (2007-2009) and my own Oliver (2006-2009).


The first test was to group the yearly projections to the nearest .010 of wOBA, then see how each group of players actually performed. There were 468 players who had projections from all four systems and at least 350 plate appearances in the major leagues in the following season. As 2009 has yet to be played, and CHONE is not available for 2006, these projection-to-next-year comparisons cover the 2007 and 2008 seasons. All four systems were tested on the same 468 players. The observed results are unadjusted major league stats, so that the test would not be influenced by which park factors or MLE formulas I chose to normalize stats.
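The grouping procedure can be sketched as follows. This is a minimal illustration, not the code actually used for the study; the player triples are made up, and the 350 PA cutoff and .010 bin width are the ones described above.

```python
from collections import defaultdict

def bin_errors(players, min_pa=350, bin_width=0.010):
    """Group projected wOBA to the nearest bin_width, then compare each
    bin's average projection to the bin's average observed wOBA."""
    bins = defaultdict(list)
    for proj, actual, pa in players:
        if pa >= min_pa:                      # keep only players with enough PAs
            center = round(round(proj / bin_width) * bin_width, 3)
            bins[center].append((proj, actual))
    out = {}
    for center, pairs in sorted(bins.items()):
        mean_proj = sum(p for p, _ in pairs) / len(pairs)
        mean_act = sum(a for _, a in pairs) / len(pairs)
        out[center] = (len(pairs), round(mean_proj - mean_act, 3))
    return out

# Hypothetical (projected wOBA, observed wOBA, PA) triples
sample = [(0.338, 0.341, 600), (0.342, 0.320, 550),
          (0.381, 0.360, 500), (0.379, 0.366, 120)]
print(bin_errors(sample))
```

A positive error means the projection was high; the fourth player is dropped for falling short of 350 PAs.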

To read the results: CHONE projected 25 of the players to have a wOBA between .375 and .385, averaging .380; those 25 players had 350 or more PAs in MLB in the following season, and their actual average wOBA was .363, so at that level CHONE was .017 high. Oliver was .008 high on 21 projections, PECOTA .027 high on 26, and ZiPS .014 high on 26. The last line of the table shows the root mean square error, weighted by number of players. Oliver had the lowest mean error at .006, followed by CHONE at .011 and PECOTA and ZiPS at .012 each.

      CHONE           Oliver          PECOTA          ZiPS
Obs   Players Error   Players Error   Players Error   Players Error
0.250 0 0.000 0 0.000 0 0.000 1 -0.067
0.260 0 0.000 0 0.000 1 -0.041 1 -0.018
0.270 2 -0.057 1 0.001 3 -0.013 1 -0.043
0.280 2 -0.018 4 -0.036 2 -0.045 4 -0.022
0.290 8 -0.033 9 -0.017 11 -0.030 13 -0.020
0.300 14 -0.005 23 -0.010 20 -0.013 20 -0.012
0.310 29 -0.006 33 -0.002 31 -0.007 19 0.003
0.320 44 -0.005 53 -0.005 37 0.002 51 0.000
0.330 74 0.004 81 -0.002 58 0.003 56 0.000
0.340 91 0.000 87 -0.003 66 0.004 66 0.002
0.350 57 0.004 68 0.001 80 -0.004 74 0.001
0.360 50 0.009 48 -0.003 56 0.011 55 0.012
0.370 34 0.011 21 -0.004 33 0.012 36 0.012
0.380 25 0.017 21 0.008 26 0.027 26 0.014
0.390 9 0.003 10 -0.002 17 0.014 19 0.020
0.400 13 0.019 5 0.020 15 0.019 7 0.017
0.410 7 0.017 2 0.011 5 0.017 4 0.019
0.420 4 0.037 1 -0.049 5 0.027 6 0.029
0.430 2 0.047 1 0.001 1 -0.035 5 0.041
0.440 2 -0.009 0 0.000 1 0.018 3 0.023
0.450 1 0.025 0 0.000 0 0.000 1 0.026
rms 468 0.011 468 0.006 468 0.012 468 0.012
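The rms line can be reproduced directly from the bin counts and mean errors above. For example, taking Oliver's column from the table:

```python
from math import sqrt

# Oliver's (players, mean error) pairs, read down the table above
oliver = [(0, 0.000), (0, 0.000), (1, 0.001), (4, -0.036), (9, -0.017),
          (23, -0.010), (33, -0.002), (53, -0.005), (81, -0.002),
          (87, -0.003), (68, 0.001), (48, -0.003), (21, -0.004),
          (21, 0.008), (10, -0.002), (5, 0.020), (2, 0.011),
          (1, -0.049), (1, 0.001), (0, 0.000), (0, 0.000)]

n = sum(players for players, _ in oliver)            # 468 players
rms = sqrt(sum(p * e * e for p, e in oliver) / n)    # weight each bin by count
print(n, round(rms, 3))                              # 468 0.006
```

The same calculation on the other three columns yields the .011 and .012 figures.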

By Age

The same 468 players, same rules, but now grouped by age. The combined rms error is about the same for all, at .007 for Oliver and .008 for the other three. CHONE and ZiPS are a few points of wOBA high for most ages. Oliver underprojects younger (pre-peak) players by .005-.010 points of wOBA, and overprojects older players by about the same amount. PECOTA is the opposite, a little high for the younger players and a little low for the older ones. Oliver shows the lowest total error (bias) at -.002, but because its error correlates with age, Oliver shows the highest r2 correlation factor at .206 (for ages 21-35, which have 12 or more players each).

Age Players PA CHONE Oliver PECOTA ZiPS
19 1 411 -0.004 -0.007 0.016 -0.022
20 3 1485 0.017 0.022 0.026 0.014
21 12 6587 0.003 -0.002 0.005 0.006
22 23 12205 0.002 -0.006 0.011 0.004
23 38 21423 0.001 -0.009 0.001 0.001
24 36 20677 -0.002 -0.009 0.002 0.002
25 37 20538 0.000 -0.006 0.001 0.002
26 39 21891 0.011 0.005 0.014 0.016
27 44 23580 0.003 -0.004 0.007 0.007
28 35 19038 -0.010 -0.011 -0.008 -0.005
29 34 17434 0.010 0.001 0.006 0.008
30 32 18491 0.007 -0.006 0.003 0.004
31 37 19013 0.020 0.008 0.011 0.015
32 24 13975 0.002 -0.004 -0.004 0.000
33 18 9702 0.003 0.004 -0.001 0.003
34 17 8545 0.005 0.004 0.012 0.014
35 13 7063 -0.001 -0.003 -0.003 -0.002
36 7 3714 0.000 0.001 -0.004 0.001
37 5 2295 0.007 0.010 -0.011 0.005
38 5 2580 -0.010 0.009 0.008 0.000
39 6 2699 0.009 0.008 0.023 0.007
40 1 548 0.026 0.031 0.037 0.036
41 1 434 0.016 -0.061 -0.005 0.011
rms 468 254328 0.008 0.007 0.008 0.008
bias 468 254328 0.004 -0.002 0.004 0.005
r2 468 254328 0.031 0.206 0.037 0.033
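The r2 line can be recovered from the table itself. For example, correlating Oliver's mean error against age over the 21-35 range (assuming a plain, unweighted Pearson correlation over the bins with 12 or more players):

```python
ages = list(range(21, 36))
# Oliver's mean error by age, read from the table above (ages 21-35)
err = [-0.002, -0.006, -0.009, -0.009, -0.006, 0.005, -0.004, -0.011,
       0.001, -0.006, 0.008, -0.004, 0.004, 0.004, -0.003]

n = len(ages)
mx, my = sum(ages) / n, sum(err) / n
sxy = sum((x - mx) * (y - my) for x, y in zip(ages, err))
sxx = sum((x - mx) ** 2 for x in ages)
syy = sum((y - my) ** 2 for y in err)
r2 = sxy * sxy / (sxx * syy)
print(round(r2, 3))   # matches the .206 in the table
```

The positive slope confirms the pattern described above: Oliver's errors trend from under-projection at young ages to over-projection at older ones.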

In the final part of this series, I’ll look at how minor league performances are evaluated.

Brian got his start in amateur baseball, as the statistician for his local college summer league in Johnstown, Pa., which also hosts the annual All-American Amateur Baseball Association. A longtime APBA and Strat-O-Matic player, he still tends to look at everything as a simulation. He has also written for StatSpeak and SeamHeads. You can contact him at

19 Responses to “Projection vs Projection”

  1. Paul Scott says:

    Just a few suggestions:

    I think two other “projection” systems should always be added to these “projection surveys.” The first is just a mean of each player’s last three years. This would serve as a baseline test. Secondly, Marcels should be included, given its simplicity. My feeling on this one is that if you can’t do better than Marcels by a reasonable degree, then you really need to evaluate whether it is worth your time to do the projection.

    Finally, though I understand this is much harder, I like the look you are taking at your age correlation. Since all of these systems are very likely to be close in overall accuracy, the most interesting and meaningful factor is bias. You should consider breaking out more categories – a recent (if dense) study (linked from and discussed on The Book Blog) indicated PECOTA likely has a bias overvaluing speed. I’d like to see a lot more done in evaluating the biases of projection systems, as this sort of thing could lead to an understanding of some real effect in baseball.
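    A Marcels-style baseline along the lines Paul suggests is easy to sketch. The 5/4/3 season weights below are the ones Tango has published for Marcel batters; the 1200-PA regression ballast and the league mean of .330 should be treated as illustrative numbers, not Marcel's exact constants.

```python
def marcel_like_woba(seasons, league=0.330, ballast=1200):
    """Baseline projection: weight the last three seasons 5/4/3 (most
    recent first), then regress toward the league mean by adding
    `ballast` weighted PAs of league-average performance."""
    weights = (5, 4, 3)
    num = sum(w * woba * pa for w, (woba, pa) in zip(weights, seasons))
    den = sum(w * pa for w, (woba, pa) in zip(weights, seasons))
    return (num + league * ballast) / (den + ballast)

# (wOBA, PA) for the last three seasons, most recent first -- made-up player
print(round(marcel_like_woba([(0.360, 600), (0.345, 580), (0.350, 610)]), 3))
```

    Any system that can't beat something this simple by a meaningful margin arguably isn't earning its complexity, which is the point of using it as the benchmark.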

  2. Nacho says:

    The error by wOBA looks rather bell shaped to me, if we just look at the absolute value. The middle of the pack, ~.320-.350, has pretty low error across the board, and it generally gets worse as you move higher or lower. The lower sample size per category has something to do with that, but it seems somewhat intuitive: projections are going to be good at hitting the middle of the pack, but have a very hard time projecting dramatic drops or dramatic increases, which is what many of the sub-.300 or over-.380 seasons are likely to be. When you only have a category of ~10 players, one player falling off the map (say Howard) or making a huge jump (Quentin) is going to kill your error.

    • I agree with your observation, and I think that’s an effect of adding in regression to the league mean, causing all of these to resist projecting an extreme value. We definitely want to include regression when there is a shortage of data on a player, to avoid extreme results that would lead to improper conclusions, but when we have a full set of data on a player, do we need the same amount of regression, pulling these established players to the center?
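      The compression of extremes is straightforward to see in a shrinkage sketch. The league mean of .330 and the 300-PA ballast here are illustrative values, not any particular system's constants:

```python
def regress(observed, pa, league=0.330, ballast=300):
    """Shrink an observed wOBA toward the league mean; the fewer the PA,
    the stronger the pull toward league average."""
    return (observed * pa + league * ballast) / (pa + ballast)

# The same .420 observed wOBA, with progressively more data behind it
for pa in (150, 600, 2400):
    print(pa, round(regress(0.420, pa), 3))
```

      Even with 2,400 PA the projection never quite reaches the observed .420, which is exactly the resistance to extreme values described above; the open question is whether the pull should remain that strong for established players.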

  3. TangoTiger says:

    I agree that Marcels should be a part of any study. I told Brian as much, and I’m glad at least one other reader thinks so. The entire point of the Marcels is to serve as THE benchmark that all others need to beat. The first poster said it perfectly. More readers should speak up about this if they agree.

    And for any player that is not forecast, Marcel comes with this rider:

    FAQ: “But, what about a player who’s never played MLB? Where’s his forecast?” That’s simple. His forecast is the league mean over 200 PA, 60 IP (starter) or 25 IP (reliever). If you want to know what the league mean is, just take the average of anyone forecast with a reliability of 0.00. So, Marcel’s official forecast for anyone coming over from Japan is that.

    Obviously, it makes no sense for me to generate an extra 7000 records of identical forecasts, when I can just make the above statement.

  4. Peter Jensen says:

    “To read the results: CHONE projected 25 of the players to have a wOBA between .375 and .385, averaging .380; those 25 players had 350 or more PAs in MLB in the following season, and their actual average wOBA was .363, so at that level CHONE was .017 high.”

    I am not sure I understand what you are actually doing in your tables. Take CHONE’s predicted .340 wOBA players, for instance. If I understand correctly from the text quoted above, the 91 players in this category could have had actual wOBAs with 40 of them at .140, 1 at .340, and 40 of them at .540, and still have had an error of .000. Is that correct? If so, what you have done is meaningless. If not, please explain further, because I am missing something.

    • The table lists only the mean error. Your example would indeed also show a mean error of .000, but what I did calculate, though not publish, was that for all groupings with at least 9 players, CHONE had an rms of between .025 and .033, which shows the variance of the comparisons. Oliver’s rms on the wOBA chart was .024 to .033, PECOTA’s .025 to .040, and ZiPS’s .021 to .034. I probably should have explicitly stated that the variances in the subgroups were no different than the overall variances I found in the prior article, and they were pretty consistent between the projections.

      Once I had established that total (root mean square) error was never going to be less than about .030 because of variance in the sample the projections were being compared to, I wanted to find if there were any biases in subgroups of data.
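      The mean-error/rms distinction can be seen in code, using Peter's hypothetical .340 bin:

```python
from math import sqrt

# Peter's hypothetical bin: 40 players at .140, 1 at .340, 40 at .540,
# all projected at .340
actual = [0.140] * 40 + [0.340] + [0.540] * 40
errors = [0.340 - a for a in actual]

mean_err = sum(errors) / len(errors)                   # bias: cancels to zero
rms = sqrt(sum(e * e for e in errors) / len(errors))   # spread: does not
print(round(mean_err, 3), round(rms, 3))
```

      The mean error is .000 while the rms is near .199, so the bin-mean table measures bias and the rms measures accuracy; both are needed to characterize a projection.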

      • Peter Jensen says:

        Brian – Finding whether a projection system has any biases is an important check on that metric’s internal consistency, but is not really a good measure for comparing different projection systems with one another. What would be really interesting to know would be whether one projection system is better than the others at a) projecting players with less than a year’s MLB experience, b) projecting players 32 years of age or older, c) projecting players who change teams, and d) projecting players who do NOT get 350 PAs. You can’t use the average value as a measure of success, for the reason I stated above. RMS, average absolute value of the difference, or the percentage of players whose wOBA is within a fixed value of the projection would all be acceptable measures.

        I had trouble understanding what your tables were measuring in the first article of the series as well and had asked for further explanation from you which I never received. I wish you would spend more time defining exactly what questions you are trying to answer and what methodologies you are using.

  5. NadavT says:

    By looking at only one year of data here, are you worried that you might be ignoring any effect of overspecification of the model used to generate the projections?

    • There are two seasons – 2007 and 2008 projections for all four systems, compared to those seasons’ actual stats. I also ran 2006-2008 without CHONE, without any noticeable difference in the results.

  6. Bearskin Rugburn says:

    Like the two posters above, I am somewhat confused by the absence of Marcels from your assessment, particularly given their free and easy availability. Is there a particular reason they were left out?

    Also, I am not sure I understand what you are saying here:
    “All four projections were tested on the same 468 players. The observed results were unadjusted major league stats, so that the results of the test would not be influenced by which park factors or MLE formulas I chose to normalize stats.”

    What I understand this to mean is that the results are not park adjusted, while the projections (and I could be way off the mark here, because I don’t remember the specifics of ZiPS, CHONE, etc.) are park neutral. This seems like a bad idea to me; while I understand that there are lots of ways to do park factors, which can be problematic, the best thing to do here would be to make whatever adjustments each system makes when evaluating that system, if they are available.

    • Park adjustments go into most of the projections (Marcel excluded). Different folks have different factors, although they are generally close. I felt that it would be a bias if I judged someone else’s projection on the basis of my park factor.

      • Bearskin Rugburn says:

        Yes, yes, of course, but what I was suggesting was to correct the performances by using the park factors particular to each system, if they are available (I imagine CHONE’s and ZiPS’s would be, though PECOTA’s may be proprietary). I know it sounds time consuming, but it also seems like the most appropriate way to do things.

      • Dan Szymborski, who does ZiPS, also publishes his park factors. I’m not sure if Sean Smith of CHONE does, and I don’t think PECOTA does. I considered the park factors used to be part of the projection, something that would contribute to the accuracy of the particular projection.

        I prefer play-by-play based factors; I will be updating my MLB factors soon and publishing them here at FanGraphs. I soon will have GameDay data for 2005-2008 for the minor leagues, and will rerun minor league park factors based on that play-by-play, using the same methods I use for MLB.
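        For what the suggested correction would look like mechanically, here is a minimal sketch. The factor values and the simple divide-out convention are illustrative only; real systems differ in how they split home/road games and per-component effects:

```python
def neutralize(observed_woba, park_factor):
    """Remove a park effect by dividing out the factor (> 1.0 means a
    hitter's park).  Assumes the factor is already averaged over the
    player's home and road games."""
    return observed_woba / park_factor

# Illustrative: the same .360 observed wOBA in three hypothetical parks
for pf in (0.95, 1.00, 1.05):
    print(pf, round(neutralize(0.360, pf), 3))
```

        Applying each system's own factors this way would let its projections be scored in the same park-neutral context in which they were generated.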

  7. Ben says:

    Brian – where can we find Oliver projections from previous (2006-2008) years?

  8. The Ancient Mariner says:

    I’ll second Tango and Bearskin Rugburn on this one (my disagreement with Paul Scott is that I think Marcel ought to be the baseline) — I think having that baseline for comparison is important.

  9. Joel says:

    What about the Bill James projections? His always seem way too optimistic, so it would be interesting to see how they fared (if they are available).

    • FanGraphs has purchased rights to republish Bill James, but I would have needed to enter hundreds if not thousands of players by hand. I don’t have my own subscription to James, but I do have a Baseball Prospectus subscription, which allows me to download and evaluate the PECOTAs, but not redistribute them. ZiPS and CHONE are freely distributed.
