Is It Bad to Have an Optimistic Forecast?
Just because you are optimistic overall doesn’t mean you are doing anything good or bad.
Guessing the median rather than the mean may reduce the absolute error, and here the median is the higher of the two. For example, which is the better guess?
Guess1  Guess2  Actual
490     575     700
490     575     675
490     575     650
490     575     625
490     575     600
490     575     550
490     575     500
490     575     400
490     575     200
490     575       0
Guess1 represents the actual average, while Guess2 represents the median.
Here is the absolute error for each pick:
Guess1  Guess2  Actual  Error1  Error2
490     575     700     210     125
490     575     675     185     100
490     575     650     160      75
490     575     625     135      50
490     575     600     110      25
490     575     550      60      25
490     575     500      10      75
490     575     400      90     175
490     575     200     290     375
490     575       0     490     575
The average of Error1 is 174 and the average of Error2 is 160. So, guessing higher reduces the overall average error.
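For anyone who wants to check the arithmetic, here is a minimal Python sketch using the ten Actual values from the illustration. The function name `mean_abs_error` and the variable names are my own, chosen just for this example.

```python
from statistics import mean, median

# The ten Actual PA outcomes from the illustration.
actual = [700, 675, 650, 625, 600, 550, 500, 400, 200, 0]

def mean_abs_error(guess, outcomes):
    """Average absolute miss when the same guess is used for every player."""
    return sum(abs(guess - x) for x in outcomes) / len(outcomes)

guess1 = mean(actual)    # 490, the average of the outcomes
guess2 = median(actual)  # 575, halfway between 550 and 600

mae_guess1 = mean_abs_error(guess1, actual)
mae_guess2 = mean_abs_error(guess2, actual)
print(mae_guess1, mae_guess2)  # 174.0 160.0
```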
Guess1 totaled 4900 PA while Guess2 totaled 5750 PA. The actual PA was 4900. So, in this particular illustration, if it represents anything resembling reality, fans are justified in guessing PA 10% above what the group total would suggest.
Basically, fans are not only justified but are probably correct: they are not guessing on random events, and they are not handling randomness by spreading it evenly across every player.
Indeed, in this particular illustration, the Fans could have set the forecast at anything between 550 and 600, and the average error would remain at the (minimum) of 160.
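The flat bottom between 550 and 600 can be seen by trying a range of constant forecasts. This is only a sketch on the ten illustrative outcomes; the 25-PA grid and the names `candidates` and `flat_bottom` are assumptions made for the example.

```python
# The ten Actual PA outcomes from the illustration.
actual = [700, 675, 650, 625, 600, 550, 500, 400, 200, 0]

def mean_abs_error(guess, outcomes):
    """Average absolute miss when the same guess is used for every player."""
    return sum(abs(guess - x) for x in outcomes) / len(outcomes)

# Try every constant forecast from 0 to 700 PA in steps of 25.
candidates = range(0, 701, 25)
errors = {g: mean_abs_error(g, actual) for g in candidates}

best = min(errors.values())
flat_bottom = [g for g, e in errors.items() if e == best]
print(best, flat_bottom)  # 160.0 [550, 575, 600]
```

With an even number of outcomes, any forecast between the two middle values (here 550 and 600) gives the same average absolute error, which is why the minimum is a range rather than a single number.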
So, it is NOT a requirement that things actually add up at the league or team level. Indeed, being optimistic may very well be the right thing to do.
At the same time, you now have to be careful about taking things out of context. You can’t add up a team’s individual WAR or HR forecasts and treat the sum as the best forecast at the team level. If we add up all the individual forecasted WAR, we’re going to get a total like 1500, when in reality, it’s going to come in at 1000.

I’d work on making those charts a little more readable, but regardless, nice premise. This is pretty much what was being discussed at THT not long ago.
I have said this before, but I would prefer to see projections be as efficient as possible in each of the following three categories.
1. Accurate rate stats, i.e., HR%, 2B%, 3B%, etc.
2. 50/50 using PA as a Vegas over/under.
3. Total PA, IP and other counting stats matching what the empirical overall total should be.
A good system will do well in all three of these. My guess is that a system like Chone would fail #2 and be weighted toward the under. But it would probably do very well in #3.
vr, Xei
So should I not be counting on Joba Chamberlain to start 39 games for my fantasy team?
I think part of the problem might be people making unrealistic forecasts. Even Halladay and Greinke at 37 GS is a joke, and I’m a bit curious how these numbers came out as an average. Are there people not taking it seriously at all and projecting 100 GS for these guys? Are those projections being weeded out?
The GS are not technically part of the fan projections since IP is the actual projected stat. There are some problems with the code that extrapolates GS, especially when the projections are split between starter/reliever.
The other category that still needs to be fixed is Hits for pitchers, so while these are both a problem, they have nothing to do with how the fans projected anything.
At some point, there will be weeding. Right now, it’s just a straight average.
As for GS, I don’t think fans are asked for those, right? Just IP? I think David, for now, is just giving everyone 6 IP per GS, when in reality, it should be proportional to IP.
So, someone with 220 IP forecasted is probably a 7 IP per GS pitcher, while someone with 160 IP is more like 5.5 or something.
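To put rough numbers on that, here is a small Python sketch comparing a flat 6 IP per GS with a rate that rises with forecasted IP. The straight line through the two points mentioned above (160 IP at 5.5, 220 IP at 7) is purely my assumption for illustration, not how the site computes GS, and the function names are made up.

```python
def ip_per_start(ip):
    """Hypothetical IP-per-GS rate: a straight line through (160, 5.5) and (220, 7.0)."""
    slope = (7.0 - 5.5) / (220 - 160)
    return 5.5 + slope * (ip - 160)

def games_started(ip, flat_rate=None):
    """Implied GS for a starter, using a flat rate if given, else the sliding rate."""
    rate = flat_rate if flat_rate is not None else ip_per_start(ip)
    return ip / rate

for ip in (160, 220):
    flat = games_started(ip, flat_rate=6.0)
    sliding = games_started(ip)
    print(ip, round(flat, 1), round(sliding, 1))
```

At 220 IP the flat rate implies about 36.7 starts, while the sliding rate implies about 31.4. The line should not be trusted much outside the 160 to 220 IP range, and it says nothing about pitchers who split time between starting and relieving.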
It’s important to note that fans have limited patience/time. You can’t ask them to forecast too much, otherwise instead of forecasting 15 players, they might only forecast 10. Is it really that important to get the GS forecast at the cost of losing one player being forecasted (just as a fer instance)?
I don’t know a whole lot about the specific techniques used in most projection systems, but my guess is that most are variations on multivariate mean least squares regression. If it is true that the distribution of true (unobserved) outcomes for players is skewed upwards (and thus leading to the median being higher than the mean), then why not use median (rather than mean) regression in projections? Has anyone tried this? Seems like a pretty simple tweak that could lead to more accurate projections…
Of course it all depends on what your conception of “correct” is, but for general and fantasy purposes I think that’s probably the way to go.
The mean (PA, for example) is really brought down by the blowouts and injuries. In fantasy you can just drop those guys — i.e., it doesn’t matter that your projection is way off. So using a mean squared error doesn’t make as much sense, because it’s penalizing you more for being way off. Then reducing the absolute error, with a median projection, is better.
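A quick way to see the difference between the two yardsticks is to score a mean-style guess and a median-style guess under both. This sketch reuses the ten PA outcomes from the post's illustration; the helper names are my own.

```python
# The ten Actual PA outcomes from the post's illustration.
actual = [700, 675, 650, 625, 600, 550, 500, 400, 200, 0]

def mean_abs_error(guess, outcomes):
    """Average absolute miss: a miss of 400 counts twice as much as a miss of 200."""
    return sum(abs(guess - x) for x in outcomes) / len(outcomes)

def mean_sq_error(guess, outcomes):
    """Average squared miss: a miss of 400 counts four times as much as a miss of 200."""
    return sum((guess - x) ** 2 for x in outcomes) / len(outcomes)

mean_guess, median_guess = 490, 575

mse_mean = mean_sq_error(mean_guess, actual)
mse_median = mean_sq_error(median_guess, actual)
mae_mean = mean_abs_error(mean_guess, actual)
mae_median = mean_abs_error(median_guess, actual)

print(mse_mean, mse_median)  # 47025.0 54250.0
print(mae_mean, mae_median)  # 174.0 160.0
```

The mean guess wins on squared error and the median guess wins on absolute error, so which projection looks "better" depends on how heavily you want to punish the big misses on the 0 and 200 PA seasons.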
This is great stuff. Love reading about this.
I don’t know that it would be more “accurate”. The forecasting systems should be treated as their own universe. So, as long as the ordinal rankings are where you want them, then does it really matter if the mean HR forecast is 20 instead of 17?
The total UZR forecasted is 200 runs (when it should obviously be 0). Do we really need to knock out 0.5 runs from each player’s forecast to get it to 0?
A very interesting and informative post. With all the hubbub on this site about the fans’ projections not adding up as far as total WAR or total PA (and therefore being less valuable than other projections), it’s good to see an alternate take. I feel like the fans usually try to project the most likely scenario for any given player, as opposed to the weighted average of all possible scenarios (injury, prolonged slumps, etc). While a weighted average sounds very sabermetric, it really doesn’t provide a true “projection” in my mind. For instance, does anyone really think Albert Pujols is more likely than not to score fewer than 100 runs for just the second time in his career next year? Marcel and CHONE both predict a career low in runs scored for Albert, never mind the fact that he hasn’t had better hitters around him since 2004. CHONE predicts Albert’s lowest WAR since 2002 by a significant margin.
I feel like projections should always be of the most likely scenario for each individual player. If you think there’s a 10% chance a player gets injured and misses significant time, you probably shouldn’t project him missing much if any time, because a significant injury is not the most likely scenario. Hedging your bets by using a weighted average to project player performance (like CHONE and Marcel seem to do) will yield a sum of predictions that looks good overall, but on a player-by-player basis it will look like you consistently low-ball your predictions.
I agree! Projections should be like a Vegas over/under on playing time. If the Chone projections were used to set the line, Vegas would lose their shirt from people betting the over.
vr, Xei
For as much as we quantify things, has there been much work done to understand differences between projected and actual at bats?
I could see a model-based adjustor to fan projections.
age is an automatic contributor
expected team strength (more competitive teams will give fewer cups-o-coffee)
injury risk (1-10 scale for fan input?)
Take Carlos Lee vs Matt Kemp. In guess 1, both get 490. In guess 2, both get 575. However, Lee is older and on a team not expected to compete. What if he lost 15 at bats and Kemp gained 15? Then it is 590 vs 560, and I’d be willing to bet those would beat either guess.