Comparing 2010 Hitter Forecasts Part 1: Which System is the Best?

A number of baseball player forecasts are published freely online.  As Dave Allen notes in his article on Fangraphs Fan Projections, and as I find as well, some projections are definitely better than others.  Part 1 of this article examines the overall fit of six different player forecasts: Zips, CHONE, Marcel, CBS Sportsline, ESPN, and Fangraphs Fans.  I find that the Marcel projections are the best based on average error, followed by the Zips and CHONE projections.  However, once we control for the over-optimism of each projection system, the forecasts are virtually indistinguishable.

This second result is important in that it requires us to dig a little deeper to see how much each of these forecasts is actually helping to predict player performance.  This is addressed in Part 2 of this article.

The tool generally used to compare the average fit of a set of forecasts is Root Mean Squared Forecasting Error (RMSFE).  This measure is imperfect in that it doesn’t consider the relative value of an over-projection versus an under-projection; for example, in earlier rounds of a fantasy draft we may be drafting to limit risk, while in later rounds we may be seeking risk.  That said, RMSFE is easy to understand and is thus the standard for comparing the average fit of projections.
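As a minimal sketch of the calculation, RMSFE is the square root of the average squared gap between projection and outcome. The function name and the four-hitter sample below are hypothetical, chosen only to illustrate the arithmetic; they are not taken from the data used in this article.

```python
import math

def rmsfe(projected, actual):
    """Root mean squared forecasting error over paired projections and outcomes."""
    if len(projected) != len(actual) or not projected:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(projected, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical run totals for four hitters (not real data).
projected_runs = [100, 80, 60, 90]
actual_runs = [90, 85, 40, 95]

print(round(rmsfe(projected_runs, actual_runs), 2))
```

Because the errors are squared before averaging, a miss of 20 runs counts four times as much as a miss of 10, and an over-projection is penalized exactly as much as an under-projection of the same size.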

Table 1 shows the RMSFE of each projection system in each of the five main fantasy categories for hitters.  Here, we see that the three “mechanical” projection systems (Marcel, Zips, and CHONE) generally fit better than the three “human” projections.  Each value can be read as roughly the standard deviation of that forecast’s error.  In other words, about two-thirds of the time, a player projected by Marcel to score 100 runs will score between 75 and 125 runs.

Table 1. Root Mean Squared Forecasting Error

System          R      HR    RBI    SB    AVG
Marcel          24.43  7.14  23.54  7.37  0.0381
Zips            25.59  7.47  26.23  7.63  0.0368
CHONE           25.35  7.35  24.12  7.26  0.0369
Fangraphs Fans  29.24  7.98  32.91  7.61  0.0396
ESPN            26.58  8.20  26.32  7.28  0.0397
CBS             27.43  8.36  27.79  7.55  0.0388

Another important measure is bias.  Bias occurs when a projection consistently over- or under-predicts; a positive value means the system projected more than players actually produced.  Bias inflates the mean squared forecasting error (and therefore the RMSFE), so a simple bias correction may improve a forecast’s fit substantially.  In Table 2, we see that the human projection systems exhibit substantially more bias than the mechanical ones.
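To make the link between bias and fit concrete, here is a rough sketch that takes bias as the average of projection minus actual and applies the correction by subtracting that average from every projection. The function names and the four-hitter home run sample are hypothetical illustrations, not the data behind the tables.

```python
import math

def forecast_errors(projected, actual):
    """Per-player errors, signed as projection minus actual."""
    return [p - a for p, a in zip(projected, actual)]

def mean_bias(projected, actual):
    """Average signed error; positive means the system over-projects."""
    errors = forecast_errors(projected, actual)
    return sum(errors) / len(errors)

def msfe(projected, actual):
    """Mean squared forecasting error (the square of RMSFE)."""
    errors = forecast_errors(projected, actual)
    return sum(e ** 2 for e in errors) / len(errors)

def bias_corrected_rmsfe(projected, actual):
    """RMSFE after subtracting the average bias from every projection."""
    bias = mean_bias(projected, actual)
    adjusted = [p - bias for p in projected]
    return math.sqrt(msfe(adjusted, actual))

# Hypothetical home run totals for four hitters (not real data).
projected_hr = [30, 25, 18, 40]
actual_hr = [27, 20, 19, 31]

bias = mean_bias(projected_hr, actual_hr)
raw = msfe(projected_hr, actual_hr)
corrected = bias_corrected_rmsfe(projected_hr, actual_hr) ** 2

# The raw MSFE splits into squared bias plus the bias-corrected MSFE.
print(bias, raw, bias ** 2 + corrected)
```

The last line shows why a heavily biased system can look much worse than it is: the squared bias is added directly on top of the error that remains after correction.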

Table 2. Average Bias

System          R      HR    RBI    SB    AVG
Marcel          7.12   2.09  5.82   1.16  0.0155
Zips            11.24  2.55  11.62  0.73  0.0138
CHONE           10.75  2.67  9.14   0.61  0.0140
Fangraphs Fans  17.75  4.03  23.01  2.80  0.0203
ESPN            13.26  3.78  11.59  1.42  0.0173
CBS             15.09  4.08  14.17  2.05  0.0173

We can get a better picture of which forecasting system is best by correcting for bias in the individual forecasts.  Table 3 presents the bias-corrected RMSFEs.  The results tighten considerably: once bias is removed, the forecasting systems perform about the same.

Table 3. Bias-corrected Root Mean Squared Forecasting Error

System          R      HR    RBI    SB    AVG
Marcel          23.36  6.83  22.81  7.28  0.0348
Zips            22.98  7.02  23.52  7.59  0.0341
CHONE           22.96  6.85  22.33  7.24  0.0341
Fangraphs Fans  23.24  6.88  23.53  7.08  0.0340
ESPN            23.03  7.27  23.62  7.14  0.0357
CBS             22.91  7.29  23.90  7.27  0.0347
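The three tables can be cross-checked against each other. Assuming the correction subtracts each system's average bias from every projection, the corrected RMSFE should be close to the square root of (RMSFE squared minus bias squared). The sketch below tries this on the first stat column of each table (runs, going by the 100-run example in the text); the dictionary and function names are my own.

```python
import math

# (RMSFE from Table 1, average bias from Table 2), first stat column.
first_column = {
    "Marcel": (24.43, 7.12),
    "Zips": (25.59, 11.24),
    "CHONE": (25.35, 10.75),
    "Fangraphs Fans": (29.24, 17.75),
    "ESPN": (26.58, 13.26),
    "CBS": (27.43, 15.09),
}

# Bias-corrected RMSFE from Table 3, first stat column.
table3 = {
    "Marcel": 23.36,
    "Zips": 22.98,
    "CHONE": 22.96,
    "Fangraphs Fans": 23.24,
    "ESPN": 23.03,
    "CBS": 22.91,
}

def implied_corrected_rmsfe(rmsfe, bias):
    """Corrected RMSFE implied by removing squared bias from squared RMSFE."""
    return math.sqrt(rmsfe ** 2 - bias ** 2)

for system, (rmsfe, bias) in first_column.items():
    implied = implied_corrected_rmsfe(rmsfe, bias)
    gap = implied - table3[system]
    print(f"{system:15s} implied={implied:.2f} table3={table3[system]:.2f} gap={gap:+.3f}")
```

For every system the implied value lands within 0.02 of the published one, which is about what rounding in the tables would produce. It also shows where the tightening comes from: Fangraphs Fans has the worst raw RMSFE mostly because it has the largest bias.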

So where does this leave us if these six forecasts are basically indistinguishable?  As it turns out, evaluating the performance of individual forecasts doesn’t tell the whole story.  Each forecasting system may contain useful information that the others lack, so an average or weighted average of forecasts may prove to be a better predictor than any individual forecast.  Part 2 of this article examines this in some detail.  Stay tuned!

18 Responses to “Comparing 2010 Hitter Forecasts Part 1: Which System is the Best?”

  1. Sky says:

    Love the bias adjustment. Every analysis of forecasts needs it!

  2. Rui says:

    I’d really like to see the Bill James projections added in. Those are generally perceived as the most optimistic, which should show up in the bias, yeah?

    Also dividing hitters and pitchers up may provide even more insights to the accuracy of the projection systems

  3. Will Larson says:

    @Rui: I’d love to put in Bill James’ projections, but I’d need to buy them first. My goal was to only use free and online projections. When I have time, I’d like to do the same as I’ve done here with pitchers.

  4. Will Larson says:

    @Sky: Yes!!! Also, forecast encompassing, which part 2 of this article will cover.

  5. Colin Wyers says:

    When you say you adjusted the bias, what do you mean by that?

  6. Will Larson says:

    @Colin: In my bias adjustment, I first calculate the average bias for each stat for a projection system. Then, I subtract this bias from each original statistic for each player, and re-calculate the error. For example, if hitter A was projected to have 30 home runs and actually had 27, the error would be +3. But, if the projection system had a bias of +4, then I’d first subtract the bias from the 30 home runs, leading to a “bias adjusted” error of -1.

  7. Colin Wyers says:

    So when you figure the average bias, do you prorate based on playing time?

  8. Will Larson says:

    @Colin: Nope. I literally take what the other systems use for their projections and calculate how much they were over/under the actual numbers on average. This number is the bias. In numbers, for N players, bias=1/N*sum_{n=1}^{N} {stat_projection_n – stat_actual_n}

  9. Jeremy says:

    Are biases consistent from year to year? Otherwise including them is useless from a predictive standpoint. However, if they are consistent, then why don’t the propagators of each system use them before the season starts?

  10. Will Larson says:

    @Jeremy: Great question. I’m not sure if they’re consistent from year to year. That is definitely something that needs to be looked into. I wouldn’t say it’s useless to calculate it for one year though. If we see similar relative projections in other years, I think it’s safe to infer that the bias will be similar as well.

    In terms of bias corrections: virtually all forecasts will end up being biased upwards at all times and it’s likely due in part to selection bias (players with forecasts are replaced by people who aren’t forecasted). Because players who aren’t forecasted usually don’t have starting gigs, this introduces a positive bias into the forecast errors. However, some are more biased than others, and it’s a great question as to why this happens. In theory, you’re right–you should be able to notice that your forecasts are consistently biased and should thus adjust your projections accordingly. I know Marcel is constructed in a manner that tries to adjust for this, but I’m not sure about the others. This is probably why Marcel has among the lowest biases of all the forecasting systems.

  11. Zubin says:

    Really enjoyed this. Question: Did you compute RMSE based on the entire sample of projections that a system gave or did you limit the sample by some filter (like players w/ greater than 350 ABs)?

  12. Will Larson says:

    @Zubin: Thanks! All of the stats are based on the sample of players that is forecasted by ALL of the forecasting systems, which is 266 players. There isn’t a separate AB cutoff.

  13. evo34 says:

    Can you look at OBP and SLG errors? Not sure that counting stats are the best metric for evaluating forecast quality, even after correcting for bias.

  14. Will Larson says:

    @evo34: I didn’t look at OBP or SLG (actually OPS is often the metric compared if you’re using just one); sorry! It would definitely be interesting to consider, especially since counting stats are basically the rate X plate appearances (e.g. HR = HR_per_plate_appearance X plate_appearances), so the forecast can go wrong on either part.

  15. Interesting comparison. Was wondering what you thought of the free forecasts at

  16. Ross Gore says:

    It would be interesting to see how the average compares to the weighted average stats computed by AggPro. Cameron Snapp, TJ Highley and I developed a methodology to compute weights based on prior seasons that showed the computed weighted average applied to upcoming season projections was more accurate than any of the constituent projections. Full paper that was published in the SABR journal is here:

  17. MDL says:

    Will, the BPP website looks like a fantastic resource… definitely going to explore around some more this weekend!

    I just wanted to point out something about the footnote you linked to on the BPP website: the Prospectus Projections Project was co-authored by one “David Cameron”. Assuming that this is FG’s own, just contact him about any potential naming issues.

  18. MDL says:

    Whoops, wrong article! Meant to post in the 2012 version :-P
