How Good Is That Projection?

FanGraphs is now presenting five different player projection systems: Bill James, ZiPS (Dan Szymborski), CHONE (Sean Smith), Marcel (Tom Tango) and Oliver (Brian Cartwright). The natural question from you, the readers, is “Just how good is this projection?”

First, we need to understand the importance of sample size. Season statistics are just a sample of a player’s true talent. You might catch a player during a hot or cold streak, and without enough data, be misled into forming an incorrect perception of the player. On one hand, the more data, the better. On the other, we’re trying to capture a moving target. By the time we get a sufficient sample, the player’s true ability might well have changed.

My first test for a sufficient sample size was to compare wOBAs across all pairs of consecutive single seasons of unadjusted major league batting stats in my projections database, weighting each pair by the smaller of the two seasons' plate appearances. I could presumably get more accurate results by normalizing with park factors, but I did not want to bias these results by using any of my proprietary formulas.

 PA    Players   Mean wOBA Error
  0        94        0.150
 50       105        0.117
100       186        0.072
150       331        0.063
200       563        0.058
250       704        0.053
300       519        0.047
350      1044        0.041
400      1114        0.041
450      1122        0.039
500      1063        0.038
550       592        0.034
600       314        0.033
650       185        0.031
700        30        0.028

350 PAs is the minimum necessary for a decent wOBA calculation, with a mean error of .041. This is where the curve begins to flatten: the reduction in error between 300 and 350 PAs is almost the same as the reduction between 350 and 550. 600 is preferable, even though the error continues to drop at higher numbers of PAs. This means that even with 600 plate appearances in both years, a player has roughly a 1 in 3 chance of posting a wOBA the next year at least 33 points higher or lower than this year's.
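For readers who want to replicate this kind of check, here is a minimal sketch of the matched-pairs comparison, assuming season lines stored as (year, PA, wOBA) tuples per player; the data layout, 50-PA buckets, and PA-weighted absolute error are illustrative simplifications, not the exact code behind the table above.

```python
from collections import defaultdict

# Hypothetical input: {player_id: [(year, PA, wOBA), ...]} sorted by year.
def year_to_year_errors(players):
    buckets = defaultdict(lambda: [0.0, 0.0, 0])   # [PA weight, weighted |diff|, pairs]
    for seasons in players.values():
        for (y1, pa1, w1), (y2, pa2, w2) in zip(seasons, seasons[1:]):
            if y2 != y1 + 1:
                continue                            # only consecutive seasons
            pa = min(pa1, pa2)                      # the smaller of the two PA totals
            bucket = (pa // 50) * 50                # 50-PA buckets, as in the table
            buckets[bucket][0] += pa
            buckets[bucket][1] += pa * abs(w1 - w2)
            buckets[bucket][2] += 1
    return {b: (pairs, diff / weight)               # (player pairs, mean wOBA error)
            for b, (weight, diff, pairs) in sorted(buckets.items()) if weight > 0}
```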

Only a few players manage to get 600 PAs in a single season, and only rarely get to 700. In order to get a better sample of a batter's true talent, more than one season is necessary. The more seasons that are used, the further back in time we reach, increasing the probability that the number we are trying to measure has changed since then. One method to minimize this effect is to give a diminishing weight to past seasons. Tom Tango's Marcel uses three seasons, weighted 5-4-3. Dividing each by 5, so that the most recent year is weighted as 1.0, gives 1.0, 0.8, 0.6. If all three seasons have an equal number of PAs, then last year is 42% of the total, year 2 is 33%, and year 3 is 25%. When developing Oliver, I ran tests which showed that, when using an unlimited number of years, a factor of 0.7 (1.0, 0.7, 0.49, 0.34, 0.24, etc.) minimized the error of each year's projection compared to the next season's actual stats. However, with many years, last year's stats can be as little as 30% of the total sample. I did not feel that this allowed Oliver to be responsive enough to meaningful changes in a player's yearly stats, and have since lowered my weighting factor for past seasons to 0.5 (1.0, 0.5, 0.25, etc.), which puts last year at approximately 50% of the sample.
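As a rough illustration of that kind of weighting (a sketch only, not Oliver's actual code), here is how a decay factor could be applied to a stack of seasons, most recent first; the sample seasons are invented.

```python
# Combine several seasons of (PA, wOBA) into one PA-weighted estimate,
# discounting each older season by the decay factor described above.
def weighted_woba(seasons, decay=0.5):
    num = den = 0.0
    for i, (pa, woba) in enumerate(seasons):   # i = 0 is the most recent season
        weight = pa * decay ** i
        num += weight * woba
        den += weight
    return num / den if den else None

# With three equal 600-PA seasons, the most recent one carries
# 1.0 / (1.0 + 0.5 + 0.25), or about 57% of the total weight.
print(weighted_woba([(600, 0.360), (600, 0.340), (600, 0.330)]))   # about 0.350
```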

Comparing each player's projection in one season to his projection in the following season results in the following mean errors:

  PA    Players   Mean wOBA Error
   0       214        0.026
 100       620        0.026
 200       993        0.024
 300      1188        0.022
 400      1011        0.021
 500      1166        0.020
 600      1343        0.017
 700      1524        0.016
 800      1400        0.015
 900      1163        0.014
1000       652        0.014
1100       367        0.014
1200       252        0.014
1300       108        0.012
1400        14        0.010

The year-to-year errors for the projections are only about half those of comparing actual stats, but that is understandable, because half of each projection is last year's stats. The error curve starts flattening at 600 PAs, and beyond 900 there is virtually no reduction in error.

Presumably, the addition of data from previous years was to give us a more accurate estimate of a player’s “true talent” at this point in time. That is not the same question as what that player’s “true talent” will be at the end of next year, or what next year’s stats will be. The data can be further massaged by taking a player’s upward or downward trends the past few years combined with the average change for a player of his age to estimate where he will be a year from now.
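A purely hypothetical sketch of such an adjustment, with made-up age deltas (this is not Oliver's actual aging curve), just to show the shape of the calculation:

```python
# Invented average year-to-year wOBA changes by age, for illustration only.
AGE_DELTA = {24: +0.004, 27: 0.000, 30: -0.003, 33: -0.007}

def project_next_year(current_estimate, age):
    # Ages not in the toy table default to a small decline.
    return current_estimate + AGE_DELTA.get(age, -0.002)

print(project_next_year(0.350, 30))   # 0.347
```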

So, to combine these two, I took all Oliver projections with a sample size of 900 or more PAs and compared them to the following single season of 350 or more PAs. The first set of numbers is copied from the first table above, showing the mean error of one single season to the next; the second set compares the projections with samples of 900 or more PAs to the next season.

        Season vs. Next Season    Oliver (900+ PA) vs. Next Season
 PA      Players   Mean Error      Players   Mean Error
350         1044        0.041           68        0.042
400         1114        0.041          101        0.040
450         1122        0.039          117        0.036
500         1063        0.038          166        0.035
550          592        0.034          177        0.033
600          314        0.033          194        0.037
650          185        0.031          171        0.035
700           30        0.028           62        0.038

This seems like a problem: the mean errors of comparing Oliver to the following season were no better than comparing just the previous season to the next! Does this mean that all the work in developing the projection was for nothing? Fortunately, the answer is no. What we have here is an equation x - y = z. We have worked to reduce the mean error of x in order to reduce z, but y is still the same. The result of a computation is only as accurate as its least accurate input. No matter how perfect any one projection system is, as long as its mean error is less than that of comparing any two consecutive seasons, that increased accuracy will be masked by the noise inherent in a single-season measurement. That's why Marcel the Monkey looks so smart; this test cannot show whether any other system is better.
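One rough way to see that masking numerically, assuming the projection error and the single-season noise are independent and combine in quadrature (an assumption for illustration, not something measured in the tables above):

```python
import math

# If the observed comparison error is roughly sqrt(projection_error^2 + season_noise^2),
# shrinking the projection error has a muted effect once the season noise dominates.
season_noise = 0.033   # roughly the year-to-year wOBA error at 600 PA (first table)

for proj_error in (0.033, 0.020, 0.010, 0.000):
    observed = math.sqrt(proj_error ** 2 + season_noise ** 2)
    print(f"projection error {proj_error:.3f} -> observed error {observed:.3f}")
# Even a perfect projection (error 0.000) would still show about 0.033 against a single season.
```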

Besides, when using only major league statistics, the major tools available to build a projection are past-season weighting, park factors, aging, and regression to the mean. I think you could expect any competent system to give virtually the same results. Most projections also incorporate minor league data, which requires calculating the difference in level of competition between each minor league and the majors, and that's likely where more variance will be seen between systems.
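Of those tools, regression to the mean is the most mechanical. Here is a minimal sketch, with a league-average wOBA and a stabilization constant chosen purely for illustration rather than taken from Oliver:

```python
# Regress an observed wOBA toward league average in proportion to sample size.
def regress_to_mean(woba, pa, league_mean=0.330, ballast_pa=220):
    return (woba * pa + league_mean * ballast_pa) / (pa + ballast_pa)

print(regress_to_mean(0.400, 150))   # a hot 150-PA stretch regresses to about .358
```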

Next time, comparing accuracies of Oliver, ZiPS, CHONE and PECOTA.


Brian got his start in amateur baseball, as the statistician for his local college summer league in Johnstown, Pa., which also hosts the annual All-American Amateur Baseball Association. A longtime APBA and Strat-O-Matic player, he still tends to look at everything as a simulation. He has also written for StatSpeak and SeamHeads. You can contact him at

7 Responses to “How Good Is That Projection?”

  1. bikozu says:

    Cool. Some good stuff in here that I haven’t read/seen previously. That factor of .7 thing is pretty interesting too.

    When will we get park factored wOBA (the one used on this site that includes baserunning etc) on the projection page? This information could be useful.


    • Brian Cartwright says:

      Reminds me of one of the points I need to lead the comparative article with: Oliver's wOBAs are park adjusted, representing the player in a neutral park. Each batting component (SI, DO, TR, HR, BB, SO) is normalized by ballpark, league and age, then reassembled into a batting line, and finally the new rate stats, including wOBA, are calculated. To the best of my knowledge, ZiPS and CHONE are adjusted to the player's home park at the time the projection was made. Marcel does not park adjust. Oliver will have lower wOBAs than ZiPS or CHONE for players in Colorado, Milwaukee, Philadelphia, Cincinnati, etc. I am also working on a new formula for HR factors which will not dock the premier power hitters as much, but will dock the medium and lower guys more, when they play in a small park.


  2. Peter Jensen says:

    What we have here is an equation x - y = z. We have worked to reduce the mean error of x in order to reduce z, but y is still the same. The result of a computation is only as accurate as its least accurate input. No matter how perfect any one projection system is, as long as its mean error is less than that of comparing any two consecutive seasons, that increased accuracy will be masked by the noise inherent in a single-season measurement.

    This statement seems ridiculous to me. If you have x - y = z, where x is year 1 performance and y is year 2 performance, and you know the error range for a single year's performance is ±10, then the error range for z is going to be ±20. If you instead use multiple years, and adjust for aging to try to have x more closely represent true talent, and you are successful so that your error range of x is decreased to ±5, then the error range of z is going to be ±15. The actual mean error will depend on the distribution curves for each equation, but you should end up with a mean error in the 2nd example of about 75% of the first. That you did not is not a result of “the increased accuracy being masked by the noise inherent in a single season measurement”. It is a result of the particular manipulations that you performed to produce your projection having failed to provide you with any better predictor than the previous year's stats.


    • Brian Cartwright says:

      What I did not explicitly state was that in the first two charts, when comparing two rates, the sample size is the lower of the two. So it was 350 or more PA in year 1 compared to 325-375 PA in year 2, then 400 or more PA in year 1 compared to 375-425 PA in year 2, etc. This is standard procedure: express the results by the lower of the two sample sizes. General matched-pairs methodology does it the same way; the counting stats of the larger sample are scaled down to the size of the smaller sample, or to the harmonic mean of the two sample sizes, which will not be much larger than the smaller. This is because the level of unreliability of the smaller of the two samples will determine the outcome. I was saving the comparison of projections for a later article, but I could have shown that ZiPS and PECOTA gave just about the same results in chart 3. For the various subgroup tests that I have run, the rms error is always just about .030 wOBA, regardless of which of the three projections is used. Therefore, I've concentrated on total error, looking for any biases.


  3. Peter Jensen says:

    Brian – Perhaps I have misunderstood what you are doing. Could you explain again what exactly you are comparing in chart 2 and how it differs from chart 3?


  4. Ben says:

    Are Oliver projections available for previous seasons? If so, where can I find these?


  5. Brian Cartwright says:

    Oliver projections for previous seasons have not been posted on the Internet, but I can run them for all the players currently in my database, generally back to 1998. However, the further back we go, the smaller the percentage of players represented. Soon (weeks) I will have a complete set of minor league stats from 2005-2008 (GameDay data), and will add that to complete major league records from 1998-2008, with scattered records for both before those dates.

    I will be glad to send you an email with projections for previous seasons.
