FanGraphs is now presenting five different player projection systems: Bill James, ZiPS (Dan Szymborski), CHONE (Sean Smith), Marcel (Tom Tango) and Oliver (Brian Cartwright). The natural question from you, the readers, is “Just how good is this projection?”

First, we need to understand the importance of sample size. Season statistics are just a sample of a player’s true talent. You might catch a player during a hot or cold streak, and without enough data, be misled into forming an incorrect perception of the player. On one hand, the more data, the better. On the other, we’re trying to capture a moving target. By the time we get a sufficient sample, the player’s true ability might well have changed.

My first test for a sufficient sample size was to compare wOBAs in all pairs of consecutive single seasons of unadjusted major league batting stats in my projections database, weighting each pair by the smaller of the two seasons' plate appearance totals. I could presumably get more accurate results by normalizing with park factors, but I did not want to bias these results by using any of my proprietary formulas.
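As a rough illustration of the comparison described above, here is a minimal Python sketch. It assumes that the "mean error" is a weighted mean absolute difference and that each pair is grouped by its smaller PA total rounded down to a 50-PA bin; the text states neither detail. The function name, the data layout and the example numbers are all hypothetical.

```python
from collections import defaultdict


def binned_weighted_error(pairs, bin_width=50):
    """Weighted mean absolute wOBA difference between consecutive seasons.

    pairs: iterable of (pa_year1, woba_year1, pa_year2, woba_year2).
    Each pair is weighted by the smaller of its two PA totals and grouped
    by that weight rounded down to a multiple of bin_width (an assumption).
    Returns {bin: (number_of_pairs, weighted_mean_abs_difference)}.
    """
    weighted_sums = defaultdict(float)  # bin -> sum of weight * |difference|
    weight_totals = defaultdict(float)  # bin -> sum of weights
    counts = defaultdict(int)           # bin -> number of season pairs
    for pa1, woba1, pa2, woba2 in pairs:
        weight = min(pa1, pa2)
        if weight <= 0:
            continue  # a season with no plate appearances carries no weight
        bin_start = (weight // bin_width) * bin_width
        weighted_sums[bin_start] += weight * abs(woba1 - woba2)
        weight_totals[bin_start] += weight
        counts[bin_start] += 1
    return {
        b: (counts[b], weighted_sums[b] / weight_totals[b])
        for b in sorted(weighted_sums)
    }


# Made-up season pairs, for illustration only.
example_pairs = [
    (620, 0.350, 610, 0.380),
    (640, 0.330, 605, 0.320),
    (120, 0.300, 400, 0.340),
]
result = binned_weighted_error(example_pairs)
for bin_start, (n, err) in result.items():
    print(f"{bin_start:>4} | {n} | {err:.3f}")
```

Binning by the smaller PA total means a 120 PA season followed by a 400 PA season counts as a 100 PA sample, since the smaller season limits how reliable the comparison can be.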

| Sample (PA) | Players | wOBA error |
|---|---|---|
| 0 | 94 | 0.150 |
| 50 | 105 | 0.117 |
| 100 | 186 | 0.072 |
| 150 | 331 | 0.063 |
| 200 | 563 | 0.058 |
| 250 | 704 | 0.053 |
| 300 | 519 | 0.047 |
| 350 | 1044 | 0.041 |
| 400 | 1114 | 0.041 |
| 450 | 1122 | 0.039 |
| 500 | 1063 | 0.038 |
| 550 | 592 | 0.034 |
| 600 | 314 | 0.033 |
| 650 | 185 | 0.031 |
| 700 | 30 | 0.028 |

A sample of 350 PAs is the minimum necessary for a decent wOBA estimate, with a mean error of .041. This is where the curve begins to flatten: the error drops almost as much between 300 and 350 PAs (six points) as it does between 350 and 550 (seven points). A sample of 600 is preferable, although the error continues to drop at higher numbers of PAs. Even with 600 plate appearances in both years, a player has roughly a 1 in 3 chance of posting a wOBA next year that is at least 33 points higher or lower than this year's.
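The 1 in 3 figure can be checked with a short calculation. This sketch assumes year-to-year wOBA differences are roughly normally distributed around zero, which the text does not state. It computes the chance of a miss of 33 points or more under two readings of "mean error": as a standard deviation, and as a mean absolute error. The function name is hypothetical.

```python
import math


def prob_at_least(threshold, sd):
    """P(|X| >= threshold) for X ~ Normal(0, sd)."""
    return math.erfc(threshold / (sd * math.sqrt(2)))


mean_error = 0.033  # the 600 PA row of the table

# Reading 1: the mean error is a standard deviation (an RMS error).
p_if_sd = prob_at_least(mean_error, sd=mean_error)

# Reading 2: the mean error is a mean absolute error. For a normal
# distribution the standard deviation is then MAE * sqrt(pi / 2).
p_if_mae = prob_at_least(mean_error, sd=mean_error * math.sqrt(math.pi / 2))

print(f"treated as a standard deviation: {p_if_sd:.3f}")
print(f"treated as a mean absolute error: {p_if_mae:.3f}")
```

Under the first reading the probability is just under one third, which matches the figure quoted above. Under the second reading a miss of that size would be somewhat more common.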

Only a few players manage to get 600 PAs in a single season, and they only rarely get to 700. In order to get a better sample of the batter’s true talent, more than one season is necessary. The more seasons that are used, the further back in time we reach, increasing the probability that the number we are trying to measure has changed since then. One method to minimize this effect is to give a diminishing weight to past seasons. Tom Tango’s Marcel uses three seasons, weighted 5-4-3. Dividing each by 5, so that the most recent year is weighted as 1.0, gives 1.0, 0.8, 0.6. If all three seasons have an equal number of PAs, then last year is 42% of the total, year 2 is 33%, and year 3 is 25%.

When developing Oliver, I ran tests which showed that when using an unlimited number of years, a factor of 0.7 (1.0, 0.7, 0.49, 0.34, 0.24, etc.) minimized the error of each year’s projection compared to the next season’s actual stats. However, with many years, last year’s stats can be as little as 30% of the total sample. I did not feel that this allowed Oliver to be responsive enough to meaningful changes in a player’s yearly stats, and I have since lowered my weighting factor for past seasons to 0.5 (1.0, 0.5, 0.25, etc.), which puts last year at approximately 50% of the sample.
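The shares quoted above follow directly from the weights. The sketch below assumes every season has the same number of PAs, so that only the weights matter. The function names are hypothetical, and 20 seasons stands in for "an unlimited number of years".

```python
def season_shares(weights):
    """Fraction of the total sample contributed by each season,
    assuming every season has the same number of PAs."""
    total = sum(weights)
    return [w / total for w in weights]


def most_recent_share(decay, n_seasons):
    """Share of the most recent season when each earlier season is
    weighted by a further factor of `decay` (1, decay, decay**2, ...)."""
    weights = [decay ** k for k in range(n_seasons)]
    return season_shares(weights)[0]


marcel = season_shares([5, 4, 3])
print("Marcel 5-4-3:", [round(s, 2) for s in marcel])

share_07 = most_recent_share(0.7, 20)
share_05 = most_recent_share(0.5, 20)
print(f"decay 0.7, 20 seasons: {share_07:.2f}")
print(f"decay 0.5, 20 seasons: {share_05:.2f}")
```

With many seasons the most recent share approaches 1 minus the decay factor, since the weights form a geometric series. That is why a factor of 0.7 leaves last year at about 30% and a factor of 0.5 at about 50%.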

Comparing season-to-season projections results in the following mean errors:

| Sample (PA) | Players | wOBA error |
|---|---|---|
| 0 | 214 | 0.026 |
| 100 | 620 | 0.026 |
| 200 | 993 | 0.024 |
| 300 | 1188 | 0.022 |
| 400 | 1011 | 0.021 |
| 500 | 1166 | 0.020 |
| 600 | 1343 | 0.017 |
| 700 | 1524 | 0.016 |
| 800 | 1400 | 0.015 |
| 900 | 1163 | 0.014 |
| 1000 | 652 | 0.014 |
| 1100 | 367 | 0.014 |
| 1200 | 252 | 0.014 |
| 1300 | 108 | 0.012 |
| 1400 | 14 | 0.010 |

The year-to-year errors for the projections are only about half those from comparing actual stats, but that is understandable because half of each projection is last year’s stats. The error curve starts flattening at 600 PAs, and after 900 there is virtually no reduction in error.

Presumably, the point of adding data from previous years was to give us a more accurate estimate of a player’s “true talent” at this point in time. That is not the same question as what that player’s “true talent” will be at the end of next year, or what next year’s stats will be. The data can be further massaged by combining a player’s upward or downward trend over the past few years with the average change for a player of his age to estimate where he will be a year from now.

So, to combine these two, I took all Oliver projections with a sample size of 900 or more PAs, and compared them to the following single season of 350 or more PAs. The first set of numbers is copied from the first table above, showing the mean error from one single season to the next; the second set is for the projections of size 900 or more compared to the next season.

| Sample (PA) | Players (single season) | wOBA error | Players (Oliver, 900+ PA) | wOBA error |
|---|---|---|---|---|
| 350 | 1044 | 0.041 | 68 | 0.042 |
| 400 | 1114 | 0.041 | 101 | 0.040 |
| 450 | 1122 | 0.039 | 117 | 0.036 |
| 500 | 1063 | 0.038 | 166 | 0.035 |
| 550 | 592 | 0.034 | 177 | 0.033 |
| 600 | 314 | 0.033 | 194 | 0.037 |
| 650 | 185 | 0.031 | 171 | 0.035 |
| 700 | 30 | 0.028 | 62 | 0.038 |

This seems like a problem: the mean errors from comparing Oliver to the following season were no better than those from comparing just the previous season to the next! Does this mean that all the work in developing the projection was for nothing? Fortunately, the answer is NO. What we have here is an equation x - y = z, where x is the projection, y is the following season's actual stats, and z is the error between them. We have worked to reduce the mean error of x in order to reduce z, but y is still the same. The result of a computation is only as accurate as its least accurate input. No matter how perfect any one projection system is, as long as its mean error is less than that of comparing any two consecutive seasons, that increased accuracy will be masked by the noise inherent in a single season measurement. That’s why Marcel the Monkey looks so smart: this test cannot show if any other system is better.
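A small simulation makes the masking effect concrete. This is a toy model, not Oliver or any real data: every player has a fixed true wOBA, next season's observed wOBA is that talent plus single-season noise, and three hypothetical projections estimate the talent with different amounts of error. All of the noise levels are invented for illustration.

```python
import random


def mean_abs_error(xs, ys):
    """Mean absolute difference between two equal-length lists."""
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)


def simulate(n_players=20000, season_noise=0.035, seed=1):
    """Return {system: (error vs true talent, error vs observed season)}."""
    rng = random.Random(seed)
    # True talent and the noisy season we test against (made-up spreads).
    talent = [rng.gauss(0.330, 0.030) for _ in range(n_players)]
    observed = [t + rng.gauss(0.0, season_noise) for t in talent]
    results = {}
    # Three hypothetical systems with 0, 10 and 20 points of projection error.
    for name, proj_noise in (("perfect", 0.0), ("good", 0.010), ("rough", 0.020)):
        projection = [t + rng.gauss(0.0, proj_noise) for t in talent]
        results[name] = (
            mean_abs_error(projection, talent),
            mean_abs_error(projection, observed),
        )
    return results


results = simulate()
for name, (vs_talent, vs_season) in results.items():
    print(f"{name:>7}: vs true talent {vs_talent:.3f}, vs next season {vs_season:.3f}")
```

Against true talent, the "rough" system's error is about twice the "good" system's. Against the observed season the two land close together, and even the "perfect" projection is left with a sizable error, because the single-season noise in y dominates the comparison.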

Besides, when using only major league statistics, the major tools available to build a projection are past season weighting, park factors, aging, and regression to the mean. I think you could expect any competent system to give virtually the same results. Most projections also incorporate minor league data, which requires calculating the difference in level of competition between each minor league and the majors, and that’s likely where more variance will be seen between systems.

Next time, comparing accuracies of Oliver, ZiPS, CHONE and PECOTA.