Community - FanGraphs Baseball



  1. “3) A weighted average of the forecasts performs much better than any individual forecast.”

    Well, duh. We shouldn’t expect a single model to outperform all other models in all five categories, which is what would have to happen for this statement to have any chance of being untrue. Otherwise, you could just give the model that is best in each category a weight of 1.00 and make the same assertion.

    What you really want to know, though, is not whether a blended model would have performed better after the season, but whether a blended model can consistently outperform the individual models, with the weights having been set prior to the season.

    Comment by Marver — January 27, 2011 @ 3:33 pm

  2. @Marver: Easy there! You’re exactly right. That’s why I’m working on a set of forecasts for 2011 that are in part based on the weights from 2010. Then we’ll see if the approach works or not.

    Comment by Will Larson — January 27, 2011 @ 3:40 pm

  3. Wouldn’t it be a good idea to run this sort of analysis back several years to see if there’s any consistent weighting of the different systems, or if it’s just noise?

    Comment by Everett — January 27, 2011 @ 4:13 pm

  4. @Everett: I haven’t had time to dig up all of the old forecasts, but that would be good to do. The problem is that the fangraphs fan projections, which are one of the top performers in my analysis, are only a year old. For older forecasts and forecast systems, check out Nate Silver’s work at

    Comment by Will Larson — January 27, 2011 @ 4:17 pm

  5. @Will…then just include fangraphs’ projections with much smaller weights for 2011, or build two models: one with and one without.

    Ultimately, the exercise you’re trying to do will prove very difficult due to year-to-year variations in ideal weights, plus the fact that many projection systems incur tweaks to their logic, coding, etc. that further distort their year-to-year weights.

    I ran this analysis last year on a few sources and came to the conclusion that the weights were unstable year-to-year, producing an edge that was negligible and ultimately not worth the time/assets that went into it. That’s not to say it isn’t a good article to write!

    Comment by Marver — January 27, 2011 @ 4:30 pm

  6. @Marver: You raise some good points. Weights will vary from year to year; that is certain. I am also sure that the methods of a particular forecast system (ESPN and CBS especially) will vary from year to year. Those are shortcomings, for sure.

    However, there is quite a bit of formal study on forecast averaging and this is the general result:*

    1) forecast averages computed using previously optimal weights are better than
    2) forecast averages computed using a simple average of other forecasts, which are better than
    3) any single forecast

    Again, this is something that we should be investigating, and the next step is to get some forecasts based on this procedure that we can start to look at next year.

    *see Stock and Watson (2004), “Combination forecasts of output growth in a seven country data set,” Journal of Forecasting, 23, 405-430 and follow their citations if you want to look into this more.
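    To make the comparison concrete, here is a minimal Python sketch of options 1) and 2) for a single stat; all the numbers and weights below are made up for illustration, not taken from the article.

```python
# Hypothetical HR forecasts for one player from three systems
# (numbers invented for the example).
forecasts = {"Marcel": 28, "Fans": 32, "ESPN": 25}

# 1) weighted average using previously optimal weights
#    (weights here are assumed, purely for illustration)
optimal_weights = {"Marcel": 0.5, "Fans": 0.5, "ESPN": 0.0}
weighted = sum(optimal_weights[s] * v for s, v in forecasts.items())

# 2) simple (equal-weight) average of the same forecasts
simple = sum(forecasts.values()) / len(forecasts)

print(weighted)  # 30.0
print(simple)
```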

    Comment by Will Larson — January 27, 2011 @ 4:51 pm

  7. I absolutely agree; they are better. The problem is that this is certainly more true in some fields than others, and baseball projection is relatively untested compared to other fields in which projection models are prevalent, like weather.

    I’ve done basically the exact same thing you’re about to replicate and while I found that the result is a better projection system than any of its constituent parts, the difference was small in terms of added applied value to fantasy baseball teams. The difference was especially small when comparing the time put into developing/grading the system to other studies that could have been completed in the same amount of time.

    Comment by Marver — January 27, 2011 @ 5:02 pm

  8. @Will I think your articles are very interesting. I don’t have a statistics background and I’ve wondered for years why people didn’t take the useful (unique?) data from the various projection systems to develop a weighted “super” system.

    Do you plan on looking at pitchers as well? What about expanding the hitter categories (K’s, BB’s, XBH’s)?

    Comment by Jeremiah — January 27, 2011 @ 6:00 pm

  9. @Marver: Isn’t fangraphs awesome? We wouldn’t be having this conversation on any other site. Maybe we should be doing some work together.

    @Jeremiah: Thanks!! There are 2 things that I’d like to do now that these articles are out. First, I want to get some hitter forecasts on record for next year so I can see if this whole idea works in practice. Second, I’d like to do for pitchers the same thing I’ve done for hitters. I don’t have plans right now to expand the hitter categories, but that’s a pretty natural extension of what I’ve done here if someone else wants to take a look at it.

    Comment by Will Larson — January 27, 2011 @ 6:06 pm

  10. For the 2010 season, I forecasted averages computed using a simple average of forecasts (Zips, Marcel, Chone and ESPN) and it worked rather well. For the 2011 season I plan on adding a simple weighting to my forecasts.

    1) What do you feel is a good way to weight the various projections? I had initially thought of ranking the 6 projections, giving 6 to Marcel, 5 to FG Fans, 4 to Chone, etc. The denominator would be the sum of the ranks, 21, so Marcel would be weighted 6/21 and CBS 1/21. Is this too simple?

    2) Once I’ve created my projections, I want to do an ESPN-like player-rater calculation to give weights to marginal production in each roto category. I usually play in a points H2H league where such a calculation is easy. Do you have any experience performing such calculations? Any insight?

    3) Does anyone know of any sites where rotoheads can contribute or co-develop such projection resources?
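    The rank-based weighting in question 1) can be sketched in a few lines of Python (the ranking order is the one proposed above; this is just the arithmetic):

```python
# Rank the six systems, then weight each by rank / sum-of-ranks.
ranks = {"Marcel": 6, "FG Fans": 5, "Chone": 4, "ZiPS": 3, "ESPN": 2, "CBS": 1}
total = sum(ranks.values())  # 21
weights = {system: r / total for system, r in ranks.items()}

print(weights["Marcel"])  # 6/21 ≈ 0.2857
print(weights["CBS"])     # 1/21 ≈ 0.0476
```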

    Comment by Brett — January 27, 2011 @ 7:08 pm

  11. @Brett: Your intuition was right in doing a simple average of each of those forecasting systems. That’s usually pretty tough to beat. You’re also on the right track trying to weight the forecasts by the ones that have historically performed the best and have the most useful information in them.

    My article here says that you should use different weights depending on the category. For example, when you want to forecast HRs, it’s best to do about 50% marcel and 50% fangraphs fans and ignore the other systems because they don’t add anything beyond those two. For SBs, it’s best to do 1/3 marcel, 1/3 fangraphs fans, and 1/3 ESPN.

    If you had a limited amount of time, what I would do is take the marcel projections and the fangraphs fan projections and do a simple average of the two. It’s tough to go wrong there.
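    A rough sketch of that category-specific blending, with the weights from the paragraph above but hypothetical player projections:

```python
# Category-specific weights per the discussion above;
# the player projections are made-up numbers for illustration.
weights = {
    "HR": {"Marcel": 0.5, "Fans": 0.5},
    "SB": {"Marcel": 1 / 3, "Fans": 1 / 3, "ESPN": 1 / 3},
}

projections = {  # one hitter, by system (hypothetical)
    "Marcel": {"HR": 24, "SB": 12},
    "Fans":   {"HR": 30, "SB": 9},
    "ESPN":   {"HR": 27, "SB": 15},
}

def blend(category):
    """Weighted average across systems for one stat category."""
    return sum(w * projections[system][category]
               for system, w in weights[category].items())

print(blend("HR"))  # 27.0
print(blend("SB"))
```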

    As for your questions about 2), I don’t. I’m sure you can find some stuff here or at I know they do some point share analysis over there.

    As for 3), yes! Fangraphs! Upload your projections in the fan projections page. If you posted a link to an excel file of your projections somewhere online, then this time next year I could see how well you did relative to the other systems. I plan on doing this as well. Frankly, this is the only way to figure out which methods of forecasting work and which ones don’t.

    Comment by Will Larson — January 27, 2011 @ 7:49 pm

  12. Will,

    I saw your weighting by source in the article above. I was under the impression that these weights were for 2010. Is there any reason to believe that any system is better at projecting a given category from year-to-year?

    Also, do you plan to do the same review for pitching forecasts?


    Comment by Brett — January 27, 2011 @ 9:54 pm

  13. @Brett: see ^^^^^^^:

    “However, there is quite a bit of formal study on forecast averaging and this is the general result:

    1) forecast averages computed using previously optimal weights are better than
    2) forecast averages computed using a simple average of other forecasts, which are better than
    3) any single forecast”

    I plan on doing the same thing for pitcher forecasts in the next couple of months.

    Comment by Will Larson — January 28, 2011 @ 11:31 am

  14. @Will – since CHONE is off the free market, how would you suggest a simple weighted average of these three systems: fangraph fans, ZIPS and Marcel?



    Comment by Jeremiah — January 28, 2011 @ 1:55 pm

  15. @marver: i have forecasted that you need more fiber in your diet.

    Comment by jaywrong — January 29, 2011 @ 7:36 am

  16. @jaywrong: Marver had some good points. It’s your comment that has no place here.

    @Marver: keep ‘em coming! Do you have any forecasts for 2011? I think we should compile forecasts somewhere and do a big comparison at the end of the year. Maybe like 20 or so, including the main ones, then a bunch of personal forecasts from different people trying different things (average, weighted average, subjective, etc).

    @fangraphs crew: Is there any way to upload forecasts en masse as opposed to manually entering individual stats for individual players? Then, is there any way to get access to the individual forecasts done by the users here so we can see who did the best?

    Comment by Will Larson — January 29, 2011 @ 10:40 am

  17. @Jeremiah: It’s not as simple as just removing the weight from CHONE and splitting it among the remainder, because CHONE potentially duplicates information in the other projections. I’d have to re-specify and run my routines in order to get CHONE-less weights. In the absence of this, I’d just do a 1/3, 1/3, 1/3 average between Fans, ZiPS, and Marcel, or 50/50 Fans and Marcel.

    Comment by Will Larson — January 29, 2011 @ 10:43 am

  18. Will,

    How does this help you in any way shape or form?

    Pick any of the RMSFE numbers for runs as an example. Knowing that the projections are going to be off by plus or minus 20-some runs doesn’t help a whole lot, does it? That means if a player is predicted to score 90 runs, he could score anywhere from less than 70 to more than 110 runs.

    Sure, now you’ve done the statistical analysis to know how accurate the projections are and you have basically shown that all of the projections have a big enough error that they really can’t be trusted. But we need something to base our draft picks on, do we not?

    I guess my question is, how do you use any of this information as an advantage come draft day?

    Comment by Brian — February 2, 2011 @ 1:48 pm

  19. @Brian: I’ve gone through periods where I ask the same question.

    What it comes down to though, is that any sort of projection, ranking, draft order, etc, is going to be uncertain. The real question is, despite this uncertainty, can we rank players based on their expected performance? The answer to this question is yes.

    While we may have difficulty getting any single particular player right, we can do much better, on average, by having a solid draft list constructed using solid projections. Any increase in the accuracy of our forecasts will make our draft lists better. If by bias correcting, weighting our forecasts, and averaging them, we can make a forecast that’s 5%-10% better, then I think that’s worth it, even if the individual forecasts are still pretty random.

    Comment by Will Larson — February 2, 2011 @ 2:11 pm

  20. @Will: I’m totally convinced. Compiling a bunch of forecasts and creating your own to gain an advantage over everyone else sounds like a great idea, because in the end, isn’t that what we’re all looking for? A way to dominate our friends so we can boast about being the best.

    Are you going to have these projections somewhere to share so that the rest of us can see them, or are you just describing a way for us to do our own, more accurate projections?

    Comment by Brian — February 3, 2011 @ 1:22 pm

  21. @Brian: Both. I’m putting a website together where I’ll gather the main online projections and allow users to submit their own projections. Then, when the season is over, we can see what systems did the best.

    The beta is up at

    If you have any other ideas (or anyone else!!!) let me know what you’d like out of a site like that.

    Comment by Will Larson — February 3, 2011 @ 1:51 pm

  22. I found that using ~2/3 Bill James and ~1/3 Marcel produced the highest Pearson correlation and lowest RMSE against actual results for HITTERS.

    I did this a little over a year ago with the 2009 and 2008 stats. I’m not really good enough with SQL to go back any further.

    PECOTA was best for projecting pitchers, if I recall.
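    For anyone wanting to replicate the comparison, the two metrics mentioned above can be computed in plain Python without SQL; the projections and actuals below are hypothetical:

```python
import math

def pearson_and_rmse(projected, actual):
    """Pearson correlation and root-mean-square error between two series."""
    n = len(projected)
    mp, ma = sum(projected) / n, sum(actual) / n
    cov = sum((p - mp) * (a - ma) for p, a in zip(projected, actual))
    sp = math.sqrt(sum((p - mp) ** 2 for p in projected))
    sa = math.sqrt(sum((a - ma) ** 2 for a in actual))
    rmse = math.sqrt(sum((p - a) ** 2 for p, a in zip(projected, actual)) / n)
    return cov / (sp * sa), rmse

# Hypothetical HR projections vs. actuals for five hitters
r, rmse = pearson_and_rmse([25, 30, 18, 40, 22], [28, 27, 20, 35, 25])
```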

    Keep up the good work Will,


    Comment by Matt Goldfarb — February 14, 2011 @ 11:12 pm

  23. @Matt: I’m surprised you found that. I also ran Bill James’ numbers for hitters (and pitchers, for that matter) and I found it to be a pretty poor performer, and not adding any new information beyond the 6 freely available systems. What stats were you using to compare?

    Comment by Will Larson — February 15, 2011 @ 8:44 am

  24. Will, I’m just not sure you understand sample size, and in-sample data vs. out-of-sample data. You cannot try to find optimal weights using one year of data. There is way too much variance in baseball performance to think that the stat-specific weights you mention (below) are anything but noise. The only way to prove otherwise is to generate your optimized weights on a set of training data, and check the performance on a (completely different) set of test data.

    So to say you “should do” or “it’s best to do” various stat-specific system weightings based on this extremely limited study can only do more harm than good.

    “My article here says that you should use different weights depending on the category. For example, when you want to forecast HRs, it’s best to do about 50% marcel and 50% fangraphs fans and ignore the other systems because they don’t add anything beyond those two. For SBs, it’s best to do 1/3 marcel, 1/3 fangraphs fans, and 1/3 ESPN.”
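    The training/test discipline described here can be sketched in a few lines; all data below are hypothetical. The weight is fit on one season, then judged only on a different season it has never seen:

```python
import math

def rmse(pred, actual):
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual))

def blend(w, sys_a, sys_b):
    # weighted average of two systems' projections
    return [w * a + (1 - w) * b for a, b in zip(sys_a, sys_b)]

# Training season: pick the weight that minimizes RMSE on 2009 data...
marcel_09, fans_09, actual_09 = [20, 25, 30], [24, 27, 28], [22, 26, 29]
best_w = min((rmse(blend(w / 10, marcel_09, fans_09), actual_09), w / 10)
             for w in range(11))[1]

# ...then judge it only on 2010 data the weight has never seen.
marcel_10, fans_10, actual_10 = [18, 31, 26], [22, 28, 27], [21, 30, 25]
oos_error = rmse(blend(best_w, marcel_10, fans_10), actual_10)
```

In this toy example the fitted weight is perfect in the training season but still makes errors out of sample, which is exactly the gap that testing on the same data would hide.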

    Comment by evo34 — February 19, 2011 @ 6:52 pm

  25. @evo34: Evo, I’m very clear about the limitations of my work. This article is looking at 2010 hitter forecasts and is thus a purely ex post analysis. Weighted averages using historic weights are useful in forecasting in other areas, so I’m presenting the hypothesis that this weighted forecast will be better than an average forecast. Your hypothesis that “there is way too much variance in baseball performance to think that the stat-specific weights you mention (below) are anything but noise” is testable as well. Before you make bold statements about my work (such as “this extremely limited study can only do more harm than good”), please realize the limitations of your statements as well.

    I frankly don’t care whose hypothesis is correct or not–I just want to figure out how to make better baseball player forecasts. In the future, I hope that you approach my work in this spirit as opposed to the adversarial posture that you’ve chosen to take.

    Comment by Will Larson — February 21, 2011 @ 1:12 am

  26. Well done, Will.

    Obviously there will be year-to-year deviation, but since the factors underlying the mechanical predictions should remain consistent, a historical accumulation of projection data should be helpful. Even the fan projections should be consistent on some level…

    Comment by Joel — February 27, 2011 @ 11:07 am

  27. Will, if you’ve done any prediction work whatsoever (stocks, sports, weather, anything), you should know that you CANNOT optimize parameters of a system on the same data you are using to test said system, and expect it to be successful. This is Data Mining 101.

    What you have done in this article is describe what has occurred in the past. By itself, that would be fine. Not very useful, but fine. But you take the reckless step of claiming that you have found the best system to use to predict the future: “My article here says that you should use different weights depending on the category.”

    That’s a completely inappropriate conclusion based on the (lack of) analysis you have done.

    Comment by evo34 — June 12, 2011 @ 7:34 am

  28. Evo,

    Both of our claims are testable hypotheses. We will see after this season. If my weighted forecast-averaging approach is worthless, we will be able to see it clearly in the data!


    Comment by Will Larson — June 12, 2011 @ 9:48 am

  29. This is exactly the same mentality that led to your original erroneous conclusions. You cannot test any model-creation hypothesis on one season of data.

    It’s critical to understand this when you are in the business of prediction.

    Comment by evo34 — July 3, 2011 @ 7:22 pm
