Community - FanGraphs Baseball - Comments on Comparing 2010 Hitter Forecasts Part 1: Which System is the Best?

Love the bias adjustment. Every analysis of forecasts needs it!

Comment by Sky — January 26, 2011 @ 3:29 pm

I’d really like to see the Bill James projections added in. Those are generally perceived as the most optimistic, which should show up in the bias, yeah?

Also, dividing hitters and pitchers up may provide even more insight into the accuracy of the projection systems.

Comment by Rui — January 26, 2011 @ 3:34 pm

@Rui: I’d love to put in Bill James’ projections, but I’d need to buy them first. My goal was to only use free and online projections. When I have time, I’d like to do the same as I’ve done here with pitchers.

Comment by Will Larson — January 26, 2011 @ 3:36 pm

@Sky: Yes!!! Also, forecast encompassing, which part 2 of this article will cover.

Comment by Will Larson — January 26, 2011 @ 3:37 pm

When you say you adjusted the bias, what do you mean by that?

Comment by Colin Wyers — January 26, 2011 @ 3:49 pm

@Colin: In my bias adjustment, I first calculate the average bias for each stat for a projection system. Then, I subtract this bias from each original statistic for each player, and re-calculate the error. For example, if hitter A was projected to have 30 home runs and actually had 27, the error would be +3. But, if the projection system had a bias of +4, then I’d first subtract the bias from the 30 home runs, leading to a “bias adjusted” error of -1.

Comment by Will Larson — January 26, 2011 @ 4:25 pm
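
The arithmetic in the reply above can be checked with a minimal Python sketch. The function name `bias_adjusted_error` is hypothetical and is not taken from the article; the numbers are the ones from the home run example.

```python
def bias_adjusted_error(projected, actual, system_bias):
    """Error left over after the system's average bias is removed from the projection."""
    return (projected - system_bias) - actual

# Hitter A from the example: projected 30 HR, actually hit 27, system bias of +4.
raw_error = 30 - 27                               # 3
adjusted_error = bias_adjusted_error(30, 27, 4)   # -1
print(raw_error, adjusted_error)
```

A positive raw error means the system projected too high; after removing a +4 bias, the same projection comes out slightly too low.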

So when you figure the average bias, do you prorate based on playing time?

Comment by Colin Wyers — January 26, 2011 @ 10:12 pm

@Colin: Nope. I literally take what the other systems use for their projections and calculate how much they were over/under the actual numbers on average. This number is the bias. In numbers, for N players, $\text{bias} = \frac{1}{N}\sum_{n=1}^{N}\left(\text{projection}_n - \text{actual}_n\right)$.

Comment by Will Larson — January 26, 2011 @ 10:56 pm
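
As a rough illustration of the bias calculation described in this reply, here is a short Python sketch. The player numbers are made up for the example, and the use of root-mean-square error as the error measure is an assumption based on the RMSE mentioned elsewhere in this thread, not a reproduction of the author's code.

```python
import math

def mean_bias(projected, actual):
    """Average of (projection - actual) over all players, with no playing-time weighting."""
    return sum(p - a for p, a in zip(projected, actual)) / len(projected)

def rmse(projected, actual, bias=0.0):
    """Root-mean-square error after subtracting `bias` from every projection."""
    squared = [((p - bias) - a) ** 2 for p, a in zip(projected, actual)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical home run projections and outcomes for four players.
projected_hr = [30, 22, 15, 10]
actual_hr = [27, 20, 9, 5]

bias = mean_bias(projected_hr, actual_hr)          # (3 + 2 + 6 + 5) / 4 = 4.0
raw_rmse = rmse(projected_hr, actual_hr)           # error before the adjustment
adjusted_rmse = rmse(projected_hr, actual_hr, bias)  # error after removing the bias
print(bias, raw_rmse, adjusted_rmse)
```

In this toy sample the adjusted error is smaller than the raw error, because every projection was too high and the adjustment removes that shared component.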

Are the biases consistent from year to year? Otherwise including them is useless from a predictive standpoint. However, if they are consistent, then why don’t the propagators of each system use the adjustment before the season starts?

Comment by Jeremy — February 7, 2011 @ 9:22 am

@Jeremy: Great question. I’m not sure if they’re consistent from year to year. That is definitely something that needs to be looked into. I wouldn’t say it’s useless to calculate it for one year though. If we see similar relative projections in other years, I think it’s safe to infer that the bias will be similar as well.

In terms of bias corrections: virtually all forecasts will end up being biased upwards at all times and it’s likely due in part to selection bias (players with forecasts are replaced by people who aren’t forecasted). Because players who aren’t forecasted usually don’t have starting gigs, this introduces a positive bias into the forecast errors. However, some are more biased than others, and it’s a great question as to why this happens. In theory, you’re right–you should be able to notice that your forecasts are consistently biased and should thus adjust your projections accordingly. I know Marcel is constructed in a manner that tries to adjust for this, but I’m not sure about the others. This is probably why Marcel has among the lowest biases of all the forecasting systems.

Comment by Will Larson — February 7, 2011 @ 12:30 pm

Really enjoyed this. Question: Did you compute RMSE based on the entire sample of projections that a system gave or did you limit the sample by some filter (like players w/ greater than 350 ABs)?

Comment by Zubin — February 9, 2011 @ 12:46 pm

@Zubin: Thanks! All of the stats are based on the sample of players that is forecasted by ALL of the forecasting systems, which is 266 players. There isn’t a separate AB cutoff.

Comment by Will Larson — February 9, 2011 @ 12:53 pm
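
Restricting the comparison to players covered by every system amounts to a set intersection. The sketch below shows one way to do that in Python; the system and player names are placeholders, and the data layout (one dictionary of projections per system) is an assumption made for the example.

```python
def common_players(systems):
    """Return the set of players that every system has a projection for."""
    player_sets = [set(projections) for projections in systems.values()]
    return set.intersection(*player_sets)

# Hypothetical home run projections keyed by system, then by player.
systems = {
    "system_a": {"player_1": 30, "player_2": 12, "player_3": 8},
    "system_b": {"player_1": 28, "player_3": 10},
    "system_c": {"player_1": 25, "player_2": 15, "player_3": 9, "player_4": 4},
}

shared = common_players(systems)
print(sorted(shared))
```

Only `player_1` and `player_3` appear in all three systems here, so only those two would enter the error calculations.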

Can you look at OBP and SLG errors? Not sure that counting stats are the best metric for evaluating forecast quality, even after correcting for bias.

Comment by evo34 — February 12, 2011 @ 11:59 pm

@evo34: I didn’t look at OBP or SLG (actually OPS is often the metric compared if you’re using just one); sorry! It would definitely be interesting to consider, especially since counting stats are basically the rate X plate appearances (e.g. HR = HR_per_plate_appearance X plate_appearances), so the forecast can go wrong on either part.

Comment by Will Larson — February 16, 2011 @ 8:08 am
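
To make the point about counting stats concrete: since HR = HR per plate appearance × plate appearances, a counting-stat error can be split into a part caused by the rate forecast and a part caused by the playing-time forecast. The Python sketch below uses one possible split (an assumption; other splits are equally valid) and made-up numbers.

```python
def split_counting_error(proj_rate, proj_pa, actual_rate, actual_pa):
    """Split (projected HR - actual HR) into a rate part and a playing-time part.

    Uses the identity
        proj_rate*proj_pa - actual_rate*actual_pa
          = (proj_rate - actual_rate)*actual_pa + proj_rate*(proj_pa - actual_pa)
    """
    rate_part = (proj_rate - actual_rate) * actual_pa
    playing_time_part = proj_rate * (proj_pa - actual_pa)
    return rate_part, playing_time_part

# Hypothetical hitter: the HR rate was forecast exactly, but he played less than projected.
rate_part, playing_time_part = split_counting_error(
    proj_rate=0.05, proj_pa=600, actual_rate=0.05, actual_pa=540
)
total_error = 0.05 * 600 - 0.05 * 540  # roughly 3 home runs too high
print(rate_part, playing_time_part, total_error)
```

In this example the whole error comes from the plate appearance forecast, which is the kind of miss a rate stat such as OBP or SLG would not penalize.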

Interesting comparison. Was wondering what you thought of the free forecasts at baseballguru.com

Comment by Craig Tomarkin — March 5, 2011 @ 11:35 am

It would be interesting to see how the average compares to the weighted average stats computed by AggPro. Cameron Snapp, TJ Highley and I developed a methodology to compute weights based on prior seasons that showed the computed weighted average applied to upcoming season projections was more accurate than any of the constituent projections. Full paper that was published in the SABR journal is here: http://www.cs.virginia.edu/~rjg7v/AggPro.pdf

Comment by Ross Gore — March 13, 2011 @ 2:39 pm
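
For readers unfamiliar with the idea, the difference between a simple average and a weighted average of projections can be sketched as follows. This is not the AggPro method: the weights below are invented for illustration, whereas the comment above describes weights computed from prior seasons.

```python
def weighted_projection(projections, weights):
    """Weighted average of one player's projections across systems."""
    total_weight = sum(weights[name] for name in projections)
    return sum(weights[name] * value for name, value in projections.items()) / total_weight

# Hypothetical home run projections for a single player.
hr_projections = {"system_a": 30.0, "system_b": 26.0, "system_c": 22.0}

equal_weights = {"system_a": 1.0, "system_b": 1.0, "system_c": 1.0}
unequal_weights = {"system_a": 0.5, "system_b": 0.25, "system_c": 0.25}  # made-up weights

simple_average = weighted_projection(hr_projections, equal_weights)      # 26.0
weighted_average = weighted_projection(hr_projections, unequal_weights)  # 27.0
print(simple_average, weighted_average)
```

With equal weights the function reduces to the plain average; unequal weights pull the combined projection toward the systems that are trusted more.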

Will, the BPP website looks like a fantastic resource… definitely going to explore around some more this weekend!

I just wanted to point out something about the footnote you linked to on the BPP website: the Prospectus Projections Project was co-authored by one “David Cameron”. Assuming that this is FG’s own, just contact him about any potential naming issues.

Comment by MDL — March 2, 2012 @ 12:25 pm

Whoops, wrong article! Meant to post in the 2012 version :-P

Comment by MDL — March 2, 2012 @ 12:30 pm