FanGraphs

Is It Bad to Have an Optimistic Forecast?

Just because you are optimistic overall doesn’t mean you are doing anything good or bad.

When the median is higher than the mean, guessing the median can reduce the absolute error. For example, which is the better guess?

Guess1	Guess2	Actual
490	575	700
490	575	675
490	575	650
490	575	625
490	575	600
490	575	550
490	575	500
490	575	400
490	575	200
490	575	0

Guess1 is the average of the actual values, while Guess2 is their median.

Here is the absolute error of each guess for each player (Error1 = |Guess1 − Actual|, Error2 = |Guess2 − Actual|):

Guess1	Guess2	Actual	Error1	Error2
490	575	700	210	125
490	575	675	185	100
490	575	650	160	75
490	575	625	135	50
490	575	600	110	25
490	575	550	60	25
490	575	500	10	75
490	575	400	90	175
490	575	200	290	375
490	575	0	490	575

The average of Error1 is 174 and the average of Error2 is 160. So guessing higher, at the median, reduces the average absolute error.
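The averages above can be checked with a short Python sketch. It recomputes the mean (Guess1), the median (Guess2), and the average absolute error of each from the ten actual PA values in the table; the helper name `mean_abs_error` is just for illustration.

```python
from statistics import mean, median

# Actual PA for the ten players in the illustration above.
actual = [700, 675, 650, 625, 600, 550, 500, 400, 200, 0]

guess1 = mean(actual)    # 490, the average
guess2 = median(actual)  # 575.0, midway between 550 and 600

def mean_abs_error(guess, outcomes):
    """Average of |guess - actual| over all players (same guess for everyone)."""
    return mean(abs(guess - a) for a in outcomes)

error1 = mean_abs_error(guess1, actual)  # 174
error2 = mean_abs_error(guess2, actual)  # 160.0
print(f"Guess1={guess1}  Error1={error1}")
print(f"Guess2={guess2}  Error2={error2}")
```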

Guess1 totaled 4900 PA while Guess2 totaled 5750 PA. The actual total was 4900 PA. So, in this particular illustration, if it represents anything resembling reality, fans are justified in guessing PA about 17% above what the group total would suggest.
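A quick check of the totals, using only the table's numbers: ten players at 490 PA reproduce the actual 4900 PA total, while ten at 575 PA overshoot it by roughly 17%.

```python
actual = [700, 675, 650, 625, 600, 550, 500, 400, 200, 0]
n = len(actual)

total_actual = sum(actual)  # 4900 PA
total_guess1 = 490 * n      # 4900 PA: the mean reproduces the total exactly
total_guess2 = 575 * n      # 5750 PA

# How far above the actual total the median-based forecast lands, in percent.
pct_above = (total_guess2 / total_actual - 1) * 100
print(f"Guess2 total is {pct_above:.1f}% above the actual total")
```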

Basically, fans are not only justified but probably correct: they are not trying to forecast random events, and they are not spreading that randomness evenly across every player.

Indeed, in this particular illustration, the fans could have set the forecast anywhere between 550 and 600, and the average error would stay at its minimum of 160.
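One way to see that flat minimum is to brute-force it: try every whole-number guess from 0 to 700 PA and keep the ones tied for the lowest average absolute error. This sketch checks integer guesses only, which is enough to show the plateau.

```python
from statistics import mean

actual = [700, 675, 650, 625, 600, 550, 500, 400, 200, 0]

def mean_abs_error(guess, outcomes):
    """Average of |guess - actual| when every player gets the same guess."""
    return mean(abs(guess - a) for a in outcomes)

# Evaluate each whole-number guess, then collect the guesses tied for the minimum.
errors = {g: mean_abs_error(g, actual) for g in range(0, 701)}
best = min(errors.values())
flat = [g for g, e in errors.items() if e == best]
print(best, flat[0], flat[-1])  # lowest error, first and last guess achieving it
```

Any guess between the two middle values (550 and 600) leaves five players above and five below, so moving the guess within that range trades error one-for-one and the total does not change.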

So, it is NOT a requirement that things actually add up at the league or team level. Indeed, being optimistic may very well be the right thing to do.

At the same time, you have to be careful about taking these forecasts out of context. You can't add up a team's forecasted WAR or forecasted HR and assume the sum is the best forecast at the team level. If we add up all the individual forecasted WAR, we're going to get a total like 1500, when in reality it's going to come in at 1000.
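To make the team-level caution concrete, here is a sketch with a made-up distribution (an assumption, not real data): every player has an 85% chance of a 650 PA season and a 15% chance of an injury-shortened 200 PA season. Each player's median is 650, but his expected PA is only 582.5, so adding up medians across a hypothetical 13-man lineup overshoots the expected team total. The same logic applies to summing forecasted WAR or HR.

```python
from fractions import Fraction

# Hypothetical per-player PA distribution, for illustration only:
# 85% chance of a full season, 15% chance of an injury-shortened one.
outcomes = {650: Fraction(85, 100), 200: Fraction(15, 100)}

def dist_mean(dist):
    """Probability-weighted average outcome."""
    return sum(pa * p for pa, p in dist.items())

def dist_median(dist):
    """Smallest outcome whose cumulative probability reaches 1/2."""
    cumulative = Fraction(0)
    for pa in sorted(dist):
        cumulative += dist[pa]
        if cumulative >= Fraction(1, 2):
            return pa

players = 13  # hypothetical roster size
sum_of_medians = players * dist_median(outcomes)  # adding up per-player medians
expected_total = players * dist_mean(outcomes)    # what the team is expected to total
print(sum_of_medians, float(expected_total))
```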



11 Responses to “Is It Bad to Have an Optimistic Forecast?”

  1. Bluebird says:

    I’d work on making those charts a little more readable, but regardless, nice premise. This is pretty much what was being discussed at THT not long ago.

  2. Xeifrank says:

    I have said this before, but I would prefer to see projections be as efficient as possible in each of the following three categories.

    1. Accurate rate stats, i.e., HR%, 2B%, 3B%, etc.
    2. 50/50 using PA as a Vegas over/under.
    3. Total PA, IP and other counting stats matching what the empirical overall total should be.

    A good system will do well in all three of these. My guess is that a system like CHONE would fail #2 by being weighted toward the under, but it would probably do very well in #3.

    vr, Xei
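Criteria #2 and #3 can be scored directly on the article's ten-player illustration, treating the single guess as every player's line. In this sketch (the function names are hypothetical), the median hits the 50/50 over/under exactly but overshoots the total by 850 PA, while the mean matches the total but 7 of 10 players go over, so no single number satisfies both criteria in this example.

```python
actual = [700, 675, 650, 625, 600, 550, 500, 400, 200, 0]

def over_rate(line, outcomes):
    """Criterion 2: share of players who beat the PA line (ideally about 0.5)."""
    return sum(a > line for a in outcomes) / len(outcomes)

def total_gap(line, outcomes):
    """Criterion 3: forecast total minus actual total (ideally about 0)."""
    return line * len(outcomes) - sum(outcomes)

for name, line in [("Guess1 (mean)", 490), ("Guess2 (median)", 575)]:
    print(name, over_rate(line, actual), total_gap(line, actual))
```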

  3. Charlie says:

    So should I not be counting on Joba Chamberlain to start 39 games for my fantasy team?

    I think part of the problem might be people making unrealistic forecasts; even Halladay and Greinke at 37 GS is a joke, and I’m a bit curious how these numbers came out as an average. Are some people not taking it seriously at all and projecting 100 GS for these guys? Are those projections being weeded out?

    • The GS are not technically part of the fan projections since IP is the actual projected stat. There are some problems with the code that extrapolates GS, especially when the projections are split between starter/reliever.

      The other category that still needs to be fixed is Hits for pitchers, so while these are both a problem, they have nothing to do with how the fans projected anything.

  4. tangotiger says:

    At some point, there will be weeding. Right now, it’s just a straight average.

    As for GS, I don’t think fans are asked for those, right? Just IP? I think David, for now, is just giving everyone 6 IP per GS, when in reality, it should be proportional to IP.

    So, someone with 220 IP forecasted is probably a 7 IP per GS pitcher, while someone with 160 IP is more like 5.5 or something.

    It’s important to note that fans have limited patience/time. You can’t ask them to forecast too much; otherwise, instead of forecasting 15 players, they might only forecast 10. Is it really that important to get the GS forecast at the cost of losing one forecasted player (just as a fer instance)?
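A rough sketch of the proportional idea: interpolate IP per start along a straight line through the comment's two ballpark points (160 IP at 5.5 IP/GS, 220 IP at 7 IP/GS), clamping outside that range. The straight line, the clamping, and the function names are assumptions for illustration, not how the site actually computes GS.

```python
def ip_per_start(ip, lo=(160, 5.5), hi=(220, 7.0)):
    """Hypothetical IP-per-start rate that rises with projected IP.

    Straight line through the two rough anchor points from the comment,
    clamped to the endpoint rates outside that IP range.
    """
    (x0, y0), (x1, y1) = lo, hi
    if ip <= x0:
        return y0
    if ip >= x1:
        return y1
    return y0 + (ip - x0) * (y1 - y0) / (x1 - x0)

def projected_gs(ip, flat_rate=None):
    """Starts implied by a starter's IP forecast; flat_rate=6 mimics a flat 6 IP/GS rule."""
    rate = flat_rate if flat_rate is not None else ip_per_start(ip)
    return ip / rate

for ip in (160, 190, 220):
    print(ip, round(projected_gs(ip, flat_rate=6), 1), round(projected_gs(ip), 1))
```

Under the flat 6 IP/GS rule, a 220 IP forecast turns into about 37 GS; the scaled rate brings it down to about 31.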

  5. notdissertating says:

    I don’t know a whole lot about the specific techniques used in most projection systems, but my guess is that most are variations on multivariate least-squares (i.e., mean) regression. If it is true that the distribution of true (unobserved) outcomes for players is skewed downward (a long tail of low outcomes, which puts the median above the mean), then why not use median (rather than mean) regression in projections? Has anyone tried this? Seems like a pretty simple tweak that could lead to more accurate projections…

    • mcrawford620 says:

      Of course it all depends on what your conception of “correct” is, but for general and fantasy purposes I think that’s probably the way to go.

      The mean (PA, for example) is really brought down by the blowouts and injuries. In fantasy you can just drop those guys, i.e., it doesn’t matter that your projection is way off. So using a mean squared error doesn’t make as much sense, because it penalizes you more for being way off. Minimizing the absolute error, with a median projection, is better.

      This is great stuff. Love reading about this.
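The loss-function point can be checked on the article's ten-player example: the mean wins under squared error, while the median wins under absolute error. A minimal Python sketch:

```python
from statistics import mean, median

actual = [700, 675, 650, 625, 600, 550, 500, 400, 200, 0]

def mae(guess):
    """Mean absolute error of one guess applied to every player."""
    return mean(abs(guess - a) for a in actual)

def mse(guess):
    """Mean squared error of one guess applied to every player."""
    return mean((guess - a) ** 2 for a in actual)

m, med = mean(actual), median(actual)  # 490 and 575.0
print("mean:  ", mae(m), mse(m))
print("median:", mae(med), mse(med))
```

With these numbers the mean scores an MSE of 47,025 against 54,250 for the median, while the MAE ordering flips (174 for the mean vs 160 for the median). Which forecast is "better" depends on which loss you care about.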

  6. tangotiger says:

    I don’t know that it would be more “accurate”. The forecasting systems should be treated as their own universe. So, as long as the ordinal rankings are where you want them, then does it really matter if the mean HR forecast is 20 instead of 17?

    The total UZR forecasted is 200 runs (when it should obviously be 0). Do we really need to knock out 0.5 runs from each player’s forecast to get it to 0?
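Shifting every forecast by the same constant moves the total without changing anyone's rank. Knocking 0.5 runs off each player removes 200 runs only if about 400 players are forecasted (200 / 0.5 = 400). The sketch below uses four hypothetical players with made-up UZR values.

```python
def recenter(forecasts, target_total=0.0):
    """Shift every forecast by the same amount so they sum to target_total."""
    shift = (sum(forecasts.values()) - target_total) / len(forecasts)
    return {player: value - shift for player, value in forecasts.items()}

def ranking(forecasts):
    """Players ordered from best to worst forecast."""
    return sorted(forecasts, key=forecasts.get, reverse=True)

# Hypothetical UZR forecasts (runs), for illustration only.
uzr = {"A": 8.0, "B": 2.5, "C": -1.0, "D": 0.5}
adjusted = recenter(uzr)
print(adjusted, ranking(uzr) == ranking(adjusted))
```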

  7. Jon says:

    A very interesting and informative post. With all the hubbub on this site about the fans’ projections not adding up in terms of total WAR or total PA (and therefore being less valuable than other projections), it’s good to see an alternate take. I feel like the fans usually try to project the most likely scenario for any given player, as opposed to the weighted average of all possible scenarios (injury, prolonged slumps, etc.). While a weighted average sounds very sabermetric, it really doesn’t provide a true “projection” in my mind. For instance, does anyone really think Albert Pujols is more likely than not to score fewer than 100 runs for just the second time in his career next year? Marcel and CHONE both predict a career low in runs scored for Albert, never mind the fact that he hasn’t had better hitters around him since 2004. CHONE predicts Albert’s lowest WAR since 2002 by a significant margin.

    I feel like projections should always be of the most likely scenario for each individual player. If you think there’s a 10% chance a player gets injured and misses significant time, you probably shouldn’t project him missing much if any time, because a significant injury is not the most likely scenario. Hedging your bets by using a weighted average to project player performance (like CHONE and Marcel seem to do) will yield a sum of predictions that looks good overall, but on a player-by-player basis it will look like you consistently low-ball your predictions.

    • Xeifrank says:

      I agree! Projections should be like a Vegas over/under on playing time. If the CHONE projections were used to set the line, Vegas would lose its shirt to people betting the over.
      vr, Xei

  8. Jimbo says:

    For as much as we quantify things, has there been much work done to understand differences between projected and actual at bats?

    I could see a model-based adjustor to fan projections:

    1. age is an automatic contributor
    2. expected team strength (more competitive teams will give fewer cups-o-coffee)
    3. injury risk (1-10 scale for fan input?)

    Take Carlos Lee vs. Matt Kemp. In Guess1, both get 490. In Guess2, both get 575. However, Lee is older and on a team not expected to compete… What if he lost 15 at bats and Kemp gained 15? Then Kemp is at 590 and Lee at 560, and I’d be willing to bet those would beat either guess.
