How Good Is That Projection?
FanGraphs is now presenting five different player projection systems: Bill James, ZiPS (Dan Szymborski), CHONE (Sean Smith), Marcel (Tom Tango) and Oliver (Brian Cartwright). The natural question from you, the readers, is “Just how good is this projection?”
First, we need to understand the importance of sample size. Season statistics are just a sample of a player’s true talent. You might catch a player during a hot or cold streak, and without enough data, be misled into forming an incorrect perception of the player. On one hand, the more data, the better. On the other, we’re trying to capture a moving target. By the time we get a sufficient sample, the player’s true ability might well have changed.
My first test for a sufficient sample size was to compare wOBAs in all pairs of consecutive single seasons of unadjusted major league batting stats in my projections database, weighting each pair by the smaller of its two plate appearance totals. I could presumably get more accurate results by normalizing with park factors, but I did not want to bias these results by using any of my proprietary formulas.
| Sample (PA) | Players | wOBA error |
| 0 | 94 | 0.150 |
| 50 | 105 | 0.117 |
| 100 | 186 | 0.072 |
| 150 | 331 | 0.063 |
| 200 | 563 | 0.058 |
| 250 | 704 | 0.053 |
| 300 | 519 | 0.047 |
| 350 | 1044 | 0.041 |
| 400 | 1114 | 0.041 |
| 450 | 1122 | 0.039 |
| 500 | 1063 | 0.038 |
| 550 | 592 | 0.034 |
| 600 | 314 | 0.033 |
| 650 | 185 | 0.031 |
| 700 | 30 | 0.028 |
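
The comparison described above the table can be sketched in a few lines. This is only an illustration, not the code behind the table: it assumes the error is a root-mean-square difference, the function name `weighted_rms_difference` is hypothetical, and the season pairs are made up.

```python
import math

def weighted_rms_difference(pairs):
    """Weighted RMS wOBA difference between consecutive seasons.

    pairs: iterable of (woba_year1, pa_year1, woba_year2, pa_year2).
    Each pair is weighted by the smaller of its two PA totals.
    """
    total_weight = 0.0
    total_squared = 0.0
    for woba1, pa1, woba2, pa2 in pairs:
        weight = min(pa1, pa2)
        total_weight += weight
        total_squared += weight * (woba2 - woba1) ** 2
    if total_weight == 0:
        raise ValueError("no plate appearances in sample")
    return math.sqrt(total_squared / total_weight)

# Made-up season pairs, for illustration only (not real players).
toy_pairs = [
    (0.340, 600, 0.360, 550),
    (0.310, 400, 0.300, 620),
    (0.380, 650, 0.350, 640),
]
print(round(weighted_rms_difference(toy_pairs), 4))
```

Binning the pairs by the smaller PA total and calling the function once per bin would give one row of a table like the one above.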

A sample of 350 PAs is the minimum necessary for a decent wOBA calculation, with a mean error of .041. This is where the curve begins to flatten: the reduction in error between 300 and 350 PAs (.006) is almost the same as between 350 and 550 (.007). A sample of 600 is preferable, although the error continues to drop at higher numbers of PAs. This means that even with 600 plate appearances in both years, any player has about a 1 in 3 chance of having a wOBA the next year at least 33 points higher or lower than this year’s.
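
The “1 in 3” figure can be checked with a quick calculation. This sketch assumes that the year-to-year differences are roughly normally distributed and that the .033 in the table behaves like a standard deviation. Both are assumptions made here for illustration, and `prob_outside` is a hypothetical helper name.

```python
import math

def prob_outside(threshold, sd):
    """P(|X| > threshold) for X ~ Normal(0, sd)."""
    return 1.0 - math.erf(threshold / (sd * math.sqrt(2.0)))

# Chance of missing by more than one standard deviation (33 points of wOBA).
print(round(prob_outside(0.033, 0.033), 3))  # about 0.317, roughly 1 in 3
```

Under those assumptions, the same function gives the chance of a miss of any other size; for example, `prob_outside(0.066, 0.033)` covers a swing of 66 points or more.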
Only a few players manage to get 600 PAs in a single season, and they rarely reach 700. To get a better sample of the batter’s true talent, more than one season is necessary. The more seasons that are used, the further back in time we reach, increasing the probability that the number we are trying to measure has changed since then. One way to minimize this effect is to give a diminishing weight to past seasons. Tom Tango’s Marcel uses three seasons, weighted 5-4-3. Dividing each by 5, so that the most recent year is weighted as 1.0, gives 1.0, 0.8, 0.6. If all three seasons have an equal number of PAs, then last year is 42% of the total, year 2 is 33%, and year 3 is 25%. When developing Oliver, I ran tests which showed that when using an unlimited number of years, a factor of 0.7 (1.0, 0.7, 0.49, 0.34, 0.24, etc.) minimized the error of each year’s projection compared to the next season’s actual stats. However, with many years in the sample, last year’s stats can be as little as 30% of the total. I did not feel that this allowed Oliver to be responsive enough to meaningful changes in a player’s yearly stats, and have since lowered my weighting factor for past seasons to 0.5 (1.0, 0.5, 0.25, etc.), which puts last year at approximately 50% of the sample.
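
The arithmetic behind those percentages is easy to reproduce. The sketch below assumes every season has the same number of PAs, as in the example above; the helper names `recency_shares` and `decay_weights` are hypothetical and are not taken from Marcel or Oliver.

```python
def recency_shares(weights):
    """Fraction of the total sample contributed by each season."""
    total = sum(weights)
    return [w / total for w in weights]

def decay_weights(factor, n_years):
    """Geometric weights 1, factor, factor**2, ... for n_years seasons."""
    return [factor ** k for k in range(n_years)]

# Marcel-style 5-4-3 weighting over three equal-sized seasons.
print([round(s, 3) for s in recency_shares([5, 4, 3])])  # [0.417, 0.333, 0.25]

# Share of the most recent season under geometric decay.
for factor in (0.7, 0.5):
    share_10_years = recency_shares(decay_weights(factor, 10))[0]
    long_run_share = 1 - factor  # limit as the number of seasons grows
    print(factor, round(share_10_years, 3), round(long_run_share, 3))
```

With a decay factor f and many seasons, the most recent year's share approaches 1 - f, which is where the 30% and 50% figures come from.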
Comparing season to season projections results in the following mean errors:
| Sample (PA) | Players | wOBA error |
| 0 | 214 | 0.026 |
| 100 | 620 | 0.026 |
| 200 | 993 | 0.024 |
| 300 | 1188 | 0.022 |
| 400 | 1011 | 0.021 |
| 500 | 1166 | 0.020 |
| 600 | 1343 | 0.017 |
| 700 | 1524 | 0.016 |
| 800 | 1400 | 0.015 |
| 900 | 1163 | 0.014 |
| 1000 | 652 | 0.014 |
| 1100 | 367 | 0.014 |
| 1200 | 252 | 0.014 |
| 1300 | 108 | 0.012 |
| 1400 | 14 | 0.010 |

The year to year errors for the projections are only about half those from comparing actual stats, but that is understandable because half of the projection is last year’s stats. The error curve starts flattening at 600 PAs, and after 900 there is virtually no reduction in error.
Presumably, the addition of data from previous years was to give us a more accurate estimate of a player’s “true talent” at this point in time. That is not the same question as what that player’s “true talent” will be at the end of next year, or what next year’s stats will be. The data can be further massaged by combining a player’s upward or downward trend over the past few years with the average change for a player of his age to estimate where he will be a year from now.
So, to combine these two, I took all Oliver projections with a sample size of 900 or more PAs, and compared them to the following single season of 350 or more PAs. The first set of numbers is copied from the first table above, showing the mean error of one single season to the next; the second set is the projections of size 900 or more compared to the next season.
| Sample (PA) | Players | wOBA error (season to season) | Players | wOBA error (Oliver, 900+ PA) |
| 350 | 1044 | 0.041 | 68 | 0.042 |
| 400 | 1114 | 0.041 | 101 | 0.040 |
| 450 | 1122 | 0.039 | 117 | 0.036 |
| 500 | 1063 | 0.038 | 166 | 0.035 |
| 550 | 592 | 0.034 | 177 | 0.033 |
| 600 | 314 | 0.033 | 194 | 0.037 |
| 650 | 185 | 0.031 | 171 | 0.035 |
| 700 | 30 | 0.028 | 62 | 0.038 |
This seems like a problem – the mean errors of comparing Oliver to the following season were no better than comparing just the previous season to the next! Does this mean that all the work in developing the projection was for nothing? Fortunately, the answer is NO. What we have here is an equation x – y = z. We have worked to reduce the mean error of x in order to reduce z – but y is still the same. The result of a computation is only as accurate as its least accurate input. No matter how perfect any one projection system is, as long as its mean error is less than that of comparing any two consecutive seasons, that increased accuracy will be masked by the noise inherent in a single season measurement. That’s why Marcel the Monkey looks so smart – this test cannot show if any other system is better.
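
One way to picture the x – y = z argument is to treat the error in the projection (x) and the noise in the following season (y) as independent, so that they combine in quadrature. That independence is an assumption made for this sketch, not something tested in the article, and the numbers below are hypothetical rather than taken from the tables.

```python
import math

def error_of_difference(err_x, err_y):
    """RMS error of z = x - y when the errors in x and y are independent."""
    return math.hypot(err_x, err_y)

season_noise = 0.030  # hypothetical noise in one observed season, in wOBA

# Shrink the projection error step by step and watch the error in z.
for projection_err in (0.030, 0.020, 0.010, 0.0):
    z_err = error_of_difference(projection_err, season_noise)
    print(f"projection error {projection_err:.3f} -> observed error {z_err:.4f}")
```

Under this assumption the observed error can never fall below the noise in the single season being predicted, no matter how small the projection error becomes. How much of the remaining improvement is visible depends on how large the two error terms really are.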
Besides, when using only major league statistics, the major tools available to build a projection are past season weighting, park factors, aging, and regression to the mean. I think you could expect any competent system to give virtually the same results. Most projections also incorporate minor league data, which necessitates having to calculate the difference in level of competition from each minor league to the majors, and that’s likely where more variance will be seen between systems.
Next time, comparing accuracies of Oliver, ZiPS, CHONE and PECOTA.
Cool. Some good stuff in here that I haven’t read/seen previously. That factor of .7 thing is pretty interesting too.
When will we get park factored wOBA (the one used on this site that includes baserunning etc) on the projection page? This information could be useful.
Reminds me of one of the points I need to lead the comparative article with – Oliver’s wOBAs are park adjusted, representing the player in a neutral park. Each batting component (SI, DO, TR, HR, BB, SO) is normalized by ballpark, league and age, then reassembled into a batting line, from which the new rate stats, including wOBA, are finally calculated. To the best of my knowledge, ZiPS and CHONE are adjusted to the player’s home park at the time the projection was made. Marcel does not park adjust. Oliver will have lower wOBAs than ZiPS or CHONE for players in Colorado, Milwaukee, Philadelphia, Cincinnati, etc., but I am also working on a new formula for HR factors which will not dock the premier power hitters as much, but will dock the medium and lower guys more, when they play in a small park.
What we have here is an equation x – y = z. We have worked to reduce the mean error of x in order to reduce z – but y is still the same. The result of a computation is only as accurate as its least accurate input. No matter how perfect any one projection system is, as long as its mean error is less than that of comparing any two consecutive seasons, that increased accuracy will be masked by the noise inherent in a single season measurement.
This statement seems ridiculous to me. If you have x – y = z where x is year 1 performance and y is year 2 performance, and you know the error range for a single year’s performance is +-10, then the error range for z is going to be +-20. If you instead use multiple years and adjust for aging so that x more closely represents true talent, and you succeed in decreasing the error range of x to +-5, then the error range of z is going to be +-15. The actual mean error will depend on the distribution curves for each equation, but you should end up with the mean error of the second example being about 75% of the first. That you did not is not a result of “the increased accuracy being masked by the noise inherent in a single season measurement”. It is a result of the particular manipulations that you performed to give you your projection having failed to provide you with any better predictor than the previous year’s stats.
What I did not explicitly state was that in the first two charts, when comparing two rates, the sample size is the lower of the two. So it was 350 or more PA in year 1 compared to 325-375 PA in year 2, then 400 or more PA in year 1 compared to 375-425 PA in year 2, etc. It is standard procedure to express the results by the lower of the two sample sizes. General matched pairs methodology does it the same way: the counting stats of the larger sample are scaled down to the size of the smaller sample, or to the harmonic mean of the two sample sizes, which will not be too much larger than the smaller. This is because the level of unreliability of the smaller of the two samples will determine the outcome. I was saving the comparison of projections for a later article, but I could have shown that ZiPS and PECOTA gave just about the same results in chart 3. For the various subgroup tests that I have run, the rms error is always just about .030 wOBA, regardless of which of the three projections is used. Therefore, I’ve concentrated on total error, looking for any biases.
Brian – Perhaps I have misunderstood what you are doing. Could you explain again what exactly you are comparing in chart 2 and how it differs from chart 3?
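
As a quick illustration of why the harmonic mean stays close to the smaller sample, here is a small sketch. The PA totals are arbitrary examples and `harmonic_mean` is just a hypothetical helper.

```python
def harmonic_mean(n1, n2):
    """Harmonic mean of two sample sizes."""
    return 2.0 * n1 * n2 / (n1 + n2)

# Pair a 350 PA season with progressively larger samples.
for pa_small, pa_large in [(350, 350), (350, 600), (350, 5000)]:
    print(pa_small, pa_large, round(harmonic_mean(pa_small, pa_large), 1))
```

However large the second sample gets, the harmonic mean stays below twice the smaller one, so the smaller sample always dominates.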
Are Oliver projections available for previous seasons? If so, where can I find these?
Oliver projections for previous seasons have not been posted on the Internet, but I can run them for all the players currently in my database, generally back to 1998. However, the further back I go, the smaller the percentage of players represented. Soon (within weeks) I will have a complete set of minor league stats from 2005-2008 (GameDay data), and will add that to the complete major league records for 1998-2008, with scattered records for both before those dates.
I will be glad to send you an email with projections for previous seasons.