## More Optimistic Forecasts

I looked at all pitcher forecasts with at least 8 fan votes. There are 329 pitchers, which is 11 per team.

The total W-L is 2410-2036. Since there are 2430 wins (and 2430 losses) available, fans pretty much nailed the wins column. But many losses are unaccounted for. The win% comes in at .542, which is 7 losses too few per 162 games. (This is the same story as with the position players.)
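To check that arithmetic, here is a short Python sketch that recomputes the projected win% and how many wins above .500 (equivalently, losses too few) it implies over 162 games. The totals are the post's; the variable names are only illustrative.

```python
# Totals from the post: summed W-L for the 329 projected pitchers
proj_wins, proj_losses = 2410, 2036
GAMES = 162

win_pct = proj_wins / (proj_wins + proj_losses)
# Over a fixed 162 games, every win above .500 is a loss that went missing
missing_losses_per_162 = win_pct * GAMES - GAMES / 2

print(f"win% = {win_pct:.3f}")
print(f"losses too few per {GAMES} games: {missing_losses_per_162:.1f}")
```

This prints a win% of 0.542 and about 6.8 missing losses per 162 games, which rounds to the 7 quoted above.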

The average ERA is 3.98, which is pretty optimistic compared to the norm of about 4.3. Total runs allowed per 9 IP is 4.28, which is 10% too low.

Total IP is 39310, which works out to 146 nine-inning games per team. So the IP estimates are actually low by 10%.
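The per-team innings figure follows from dividing by 30 teams and comparing against a full season of about 162 × 9 = 1458 innings. A minimal sketch, assuming that 1458 is a fair "full season" baseline (it ignores extra innings and home teams skipping the bottom of the 9th):

```python
total_ip = 39310
TEAMS = 30
# Rough full-season baseline: 162 nine-inning games (an approximation)
FULL_SEASON_IP = 162 * 9  # 1458

ip_per_team = total_ip / TEAMS
nine_inning_games = ip_per_team / 9
shortfall = 1 - ip_per_team / FULL_SEASON_IP

print(f"{ip_per_team:.0f} IP per team = {nine_inning_games:.0f} nine-inning games")
print(f"shortfall vs. {FULL_SEASON_IP} IP: {shortfall:.0%}")
```

That gives about 1310 IP per team, or 146 nine-inning games, which is roughly 10% short of 1458.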

Total WAR is 570, which is a similar story to the position players: multiply by 75% in order to get the number to make sense.


I think there probably should be some wins, losses, and innings unaccounted for – minor league call-ups and such will account for some of these, no? Having the existing pitching pool account for about 90% of the IP sounds pretty spot on (although I haven’t attempted to verify this). Probably a few too many wins accounted for, but too few losses.

Agreed that 90% sounds just about right for pitchers (and 95% for nonpitchers).

If you take away 138 wins and count them as losses, that leaves you with 2410 - 138 = 2272 wins (93.5% of all wins) and 2036 + 138 = 2174 losses (89.5% of all losses). That sounds just about right.
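The reallocation above is easy to reproduce. This sketch just moves the commenter's 138 wins into the loss column and expresses each total as a share of the 2430 wins (and 2430 losses) available in a 30-team, 162-game season:

```python
AVAILABLE = 2430  # league-wide wins, and likewise losses: 30 teams * 162 games / 2
proj_wins, proj_losses = 2410, 2036
SHIFT = 138  # wins reassigned as losses, per the comment

adj_wins = proj_wins - SHIFT
adj_losses = proj_losses + SHIFT

print(adj_wins, f"{adj_wins / AVAILABLE:.1%}")      # share of all wins
print(adj_losses, f"{adj_losses / AVAILABLE:.1%}")  # share of all losses
```

The result is 2272 wins (93.5%) and 2174 losses (89.5%), matching the figures in the comment.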

The total WAR for the pitchers not in the forecast pool should come in at 0, just by definition/expectation. And, by definition/expectation, total WAR for pitchers should be around 430 or so, which means the forecasts are about 140 WAR too high.

This 140 is pretty much in line with the extra 138 wins in the W/L forecast.
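The WAR comparison works the same way. A minimal sketch, taking the roughly 430 expected pitcher WAR from the comment above as given rather than derived:

```python
forecast_war = 570  # summed fan-forecast WAR for the 329 pitchers
EXPECTED_WAR = 430  # rough expectation for all pitchers, per the comment (an assumption here)

excess_war = forecast_war - EXPECTED_WAR
scale = EXPECTED_WAR / forecast_war  # factor to bring the forecasts in line

print(f"excess WAR: {excess_war}, scale factor: {scale:.0%}")
```

The excess is 140 WAR and the scale factor is about 75%, the same multiplier the post suggests.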

There’s no way around it: fans are very optimistic.

Fans are fans of particular teams and have limited time. Thus many fans, like this one, project their favorite players, or just the players on their favorite teams, etc. Those projections are optimistic, and as the sample grows, that team-by-team optimism spreads across the board until it looks like widespread, unbiased optimism.

I project Jon Lester to go 29-4 with a 1.93 ERA and 310 Ks. Sue me.

I was thinking more like 33-1 with a 1.02 ERA and 400 Ks.

While curing cancer.

I agree. Unless every single fan who has entered projections has entered projections for every single pitcher in the sample of 329, aggregating the projections is meaningless.

You’re right, Joe. Jon will have a cure-for-cancer lightbulb moment during a mound session with V-Mart.

How are fans supposed to pick out the Chien-Ming Wangs of 2010?

I think an interesting measure of fan optimism would be to compute the runs per 9 innings implied by all the fan hitting projections and compare that to the fan-projected run average of all the pitchers.
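Since every run scored is also a run allowed, a consistent set of projections should give hitters' runs per 27 outs about equal to pitchers' runs allowed per 9 IP. Here is a minimal sketch of that comparison. It assumes each hitter projection carries runs, AB, and hits, approximates outs as AB - H (ignoring double plays, sacrifices, and caught stealing), and uses projected runs scored instead of a proper run estimator such as Base Runs. The class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class HitterProj:
    runs: float  # fan-projected runs scored
    ab: float
    hits: float

@dataclass
class PitcherProj:
    runs_allowed: float
    ip: float  # decimal innings (200.1 in box-score notation would be 200.333...)

def hitter_runs_per_9(hitters):
    """Runs per 27 outs implied by the hitter projections (outs approximated as AB - H)."""
    runs = sum(h.runs for h in hitters)
    outs = sum(h.ab - h.hits for h in hitters)
    return 27 * runs / outs

def pitcher_ra9(pitchers):
    """Runs allowed per 9 innings implied by the pitcher projections."""
    runs = sum(p.runs_allowed for p in pitchers)
    ip = sum(p.ip for p in pitchers)
    return 9 * runs / ip

# Made-up projections, for illustration only
hitters = [HitterProj(100, 550, 165), HitterProj(80, 500, 140)]
pitchers = [PitcherProj(80, 200), PitcherProj(40, 70)]
print(f"hitters: {hitter_runs_per_9(hitters):.2f} R/9, pitchers: {pitcher_ra9(pitchers):.2f} RA9")
```

If the hitters' number came out well above the pitchers', fans would be optimistic on both sides of the ball at once.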

Of course, first you’d need to account for the call-ups: you’d fill the extra innings with replacement-level performance (unless there’s a significant number of good unprojected players, in which case you’d need above-replacement-level performance). For the extra PAs, I’d find the number of extra PAs needed at each position and fill in replacement-level hitting at each (hitting at a level that, combined with an average fielder at that position, would yield replacement-level performance).
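The innings half of that adjustment can be sketched in a few lines: top up a team's projected innings to about 1458 with replacement-level innings, then recompute the staff run average. The 5.5 RA9 for replacement level is an assumed placeholder, not a figure from the post, and the function name is hypothetical.

```python
def staff_ra9_with_fill(proj_runs, proj_ip, target_ip=1458.0, repl_ra9=5.5):
    """Blend projected pitchers with replacement-level innings up to target_ip.

    repl_ra9 is an assumed replacement-level run average, not a sourced value.
    """
    fill_ip = max(0.0, target_ip - proj_ip)      # innings nobody was projected to pitch
    fill_runs = repl_ra9 * fill_ip / 9           # runs those innings are assumed to cost
    return 9 * (proj_runs + fill_runs) / (proj_ip + fill_ip)

# Hypothetical team: 1310 projected IP at a 4.28 RA9 (about 623 runs)
print(f"{staff_ra9_with_fill(623, 1310):.2f}")
```

In this made-up case the staff run average rises from 4.28 to about 4.40 once the missing innings are filled.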

You could also do this with other projection systems, of course. Or maybe projection systems do something like this internally to check sanity? I don’t know a lot about how projections are typically constructed, but I’ve thought before that if I ever went insane and tried to make one I’d include a sanity check like this.

Reminds me of the over/under lines on team wins in Vegas before the season starts. The average is normally around 81.5 wins. I’m assuming it’s because the books get more action from people backing their favorite team than from people shorting another squad.

They should have wins over/under projections on FanGraphs.

On a human level I’m not really surprised. We are (usually) optimistic by nature, no?

Is there some place that defines exactly what these projections are meant to represent? Sure, they don’t fit the seasonal norms; I’m just wondering why they should.

I’m curious how the aggregate fan forecasts compare to the aggregate Bill James/CHONE/Marcel forecasts.

Oh, and while we’re at it, I wonder how much less optimistic the “other fan” projections are compared to the “team fan” projections.

I’ll bet those projections end up being more accurate, in spite of lack of expertise/personal experience.

My thoughts exactly.

I’ve always thought of the Bill James numbers as very optimistic as well. Is that forecasting model internally accurate (I would assume it is)?

Although it would certainly be more technically challenging on the back end for FanGraphs, moving to a Baseball Prospectus-like system would probably remove much of this fan optimism.

What I mean is that rather than presenting each player individually, BP lists the players by team. Each team has the nine batting spots listed, and each batting spot shows each player’s projected share of playing time at that spot. For instance, for next year it might list Matt Kemp in the 3-spot 60% of the time and in the 7-spot 35% of the time (no player can get over 95%, to account for injuries, off-days, etc.). This has the advantage of allowing fans to put players in two (or more) spots if we’re not sure of the batting order yet; for example, I just heard that Kinsler might bat 5th/6th next season. It also results in a system similar to double-entry accounting: all playing time is accounted for, and you can’t give a team too much playing time since each batting spot is capped at 100%. Any unused playing time can be assumed to be taken by replacement-level players.
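Here is a minimal sketch of that double-entry bookkeeping. Each entry gives a player a share of one batting spot. The check rejects any player over 95% or any spot over 100%, and whatever is left in each of the nine spots goes to replacement level. This is an illustration of the idea, not BP’s actual implementation; `allocate` and `MAX_PLAYER_SHARE` are hypothetical names.

```python
from collections import defaultdict

MAX_PLAYER_SHARE = 0.95  # per the comment: room for injuries, off-days, etc.
EPS = 1e-9               # tolerance for floating-point sums

def allocate(assignments):
    """assignments: (player, batting_spot, share) tuples for one team.

    Returns the replacement-level share left in each of the nine spots,
    or raises ValueError if a player or a spot is over its cap.
    """
    spot_totals = defaultdict(float)
    player_totals = defaultdict(float)
    for player, spot, share in assignments:
        spot_totals[spot] += share
        player_totals[player] += share
    for player, total in player_totals.items():
        if total > MAX_PLAYER_SHARE + EPS:
            raise ValueError(f"{player} is over {MAX_PLAYER_SHARE:.0%} playing time")
    for spot, total in spot_totals.items():
        if total > 1.0 + EPS:
            raise ValueError(f"batting spot {spot} is over 100%")
    # Unused time in each spot is assumed to go to replacement-level players
    return {spot: round(1.0 - spot_totals[spot], 6) for spot in range(1, 10)}

# The Kemp example from the comment: 60% in the 3-spot, 35% in the 7-spot
print(allocate([("Kemp", 3, 0.60), ("Kemp", 7, 0.35)]))
```

Because playing time is assigned per spot rather than per player, a team can never be credited with more than nine full lineup spots.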

Similarly, a team’s inning projections would be limited to ~1458 innings.
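On the pitching side, the equivalent constraint is a cap on the staff’s total innings. This sketch scales every pitcher down proportionally when the total exceeds about 1458 IP. Proportional scaling is an assumed policy chosen for illustration (a real form might simply reject the entry), and the function name is hypothetical.

```python
TEAM_IP_CAP = 162 * 9  # about 1458 innings

def cap_staff_ip(ip_by_pitcher, cap=TEAM_IP_CAP):
    """Scale each pitcher's projected IP down proportionally if the staff exceeds cap."""
    total = sum(ip_by_pitcher.values())
    if total <= cap:
        return dict(ip_by_pitcher)  # within budget: leave the projections alone
    factor = cap / total
    return {name: ip * factor for name, ip in ip_by_pitcher.items()}

# Made-up staff that was given 1600 innings
print(cap_staff_ip({"A": 800.0, "B": 800.0}))
```

Any shortfall below the cap would then be filled with replacement-level innings, as in the call-up adjustment discussed earlier in the comments.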

Second, the projections could be improved if they only asked for rates rather than counting stats. So rather than asking how many HRs a player will hit, it should simply ask at what rate he will hit HRs, e.g., 6% (obviously FanGraphs should present the prior year’s stats in rate format to provide context). This way the projection system can automatically determine how many HRs a player will hit based on how many PAs he will get, which is in turn based on the fan’s projection for batting order, playing time %, and team quality.
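Converting a fan’s rate into a counting stat only needs a PA estimate from batting-order shares. In this sketch the PA-per-game values for each lineup spot are rough hypothetical numbers, not measured averages, and team quality is ignored. The function names are likewise illustrative.

```python
# Hypothetical PA per game for each lineup spot (assumed values, roughly 0.1 apart)
PA_PER_GAME = {1: 4.7, 2: 4.6, 3: 4.5, 4: 4.4, 5: 4.3, 6: 4.2, 7: 4.1, 8: 4.0, 9: 3.9}
GAMES = 162

def projected_pa(spot_shares):
    """spot_shares: {batting_spot: fraction of team games batting in that spot}."""
    return sum(GAMES * share * PA_PER_GAME[spot] for spot, share in spot_shares.items())

def counting_from_rate(rate_per_pa, spot_shares):
    """Counting stat implied by a per-PA rate and the batting-order shares."""
    return rate_per_pa * projected_pa(spot_shares)

# Kemp example: 60% in the 3-spot, 35% in the 7-spot, with a made-up 4% HR rate
shares = {3: 0.60, 7: 0.35}
print(f"{projected_pa(shares):.0f} PA, {counting_from_rate(0.04, shares):.0f} HR")
```

With these assumed values, that works out to about 670 PA and 27 HR. The fan only supplies the rate and the batting-order shares; the counting stats follow.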

I would think that these two changes in combination (obviously too late to do this year) would make projections easier and ultimately more accurate.

I agree that the forecasting could stand to change a bit with regard to rates. If fans are asked to project rates, then the counting stats should be calculated from the estimated PA and shown to them before they submit their projections. Conversely, if the rates are shown before the user actually submits the projection, it may help avoid situations like Kenji Johjima’s projected .800 SLG.

As David noted, he didn’t put in an error check for when people went back and set Johjima’s playing time to zero but could still forecast HR for him.
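An error check like the one described could look roughly like this. The function name and the .700 SLG warning threshold are assumptions for illustration, not FanGraphs’ actual code.

```python
def check_batter_forecast(pa, ab, hr, total_bases, slg_warn=0.700):
    """Return a list of problems with one fan forecast (empty if it looks sane).

    slg_warn is an assumed plausibility threshold, not an official cutoff.
    """
    problems = []
    if pa == 0 and hr > 0:
        problems.append("HR forecast with zero playing time")
    if ab > pa:
        problems.append("AB exceeds PA")
    if 4 * hr > total_bases:
        problems.append("total bases fewer than 4 x HR")
    if ab > 0 and total_bases / ab > slg_warn:
        problems.append(f"SLG {total_bases / ab:.3f} looks implausible")
    return problems

print(check_batter_forecast(pa=0, ab=0, hr=10, total_bases=40))
print(check_batter_forecast(pa=110, ab=100, hr=10, total_bases=80))
```

The first call flags a HR forecast with no playing time; the second flags an .800 SLG, the kind of result that reducing a player’s playing time without touching his HR total can produce.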

That’ll be fixed at some point.

That said, I agree with you that the forecasts should be done on a rate basis.

The problem is that people are more used to forecasting counting numbers. And if you add the extra step (forecast rate stats, calculate the counting stats, let user fix his rate stats) it becomes a longer process, which means fewer forecasts.

It’s a tradeoff in process.

How about the fact that replacement-level players are likely not a part of the fans’ projections? There’s no fun in projecting Garrett Olson’s 2010.

Yeah, the lack of Garrett Olsons and their ilk probably means there is an “all-star effect” happening. If Tangotiger’s analysis is not accounting for this, each individual projection could be pretty close to reasonable. Although if there are 11 pitchers per team, maybe people are submitting Garrett Olson projections.

As I’ve been browsing the various projections, it’s the Bill James projections that seem quite optimistic to me. Does anyone have info on his track record?

I think much of it has to do with predicting injuries. As I go through each player I project what I’d consider a reasonable season for each player. What I don’t do is think, “he’s probably going to get hurt, so I’ll take off 200 PAs.” I assume that players without known injuries are going to play a near full season. This isn’t what happens in real life though. People get injured and don’t produce to their expectations. However, I’m not going to try to make the numbers add up in total by discounting a reasonable season by X%.

On an individual player basis, I find myself agreeing with the fans’ projections more than with any of the other systems. Looking at the top 10 pitchers by projected WAR, CHONE and Marcel project about 20 to 30 fewer innings per pitcher than the fans do. In aggregate CHONE and Marcel are probably more accurate, but I’d rather be closer on 9 of 10 projections and horribly wrong on one than just chop 10% off each player’s productivity.

I agree that for playing time, people are forecasting the mode (over/under), and so, the 10% chance of collapse simply is not factored in.
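The gap between forecasting the mode and forecasting the expected value is easy to quantify. This sketch treats a season as a simple two-outcome mixture: the “normal” season the fan has in mind, or a collapse with some probability. The 10% collapse chance follows the comment; the innings figures are made up.

```python
def expected_ip(mode_ip, collapse_ip, p_collapse=0.10):
    """Expected innings when the fan's 'normal season' (the mode) happens with
    probability 1 - p_collapse and an injury/collapse season otherwise."""
    return (1 - p_collapse) * mode_ip + p_collapse * collapse_ip

# Hypothetical starter: 200 IP if healthy, 50 IP if he gets hurt
print(f"{expected_ip(200, 50):.0f}")
```

The fan forecasts 200 IP, but the expected value is about 185. That gap, summed over a whole staff, looks like the aggregate shortfall discussed in the post.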

***

People ARE forecasting the callups. I didn’t check, but there are at least 30 players per team being forecasted. If I included all those as well, we’d be way over the limit, especially for the batters.

As with the position players, FanGraphs is again wrong and fans are correct. A significant number of pitchers will get 0 or very few wins due to injuries, while their replacements will have losing records.

Should we divide these injuries out among all players making all projections inaccurate? Or should we assume no injuries and be more accurate on most pitchers, less accurate on injured pitchers?

Also, for this study to be worthwhile, they should do starters and relievers separately. RP W/L projections are meaningless predictions; lumping them in with starters delegitimizes everything.

Same question I posed a few weeks ago. I guess it all depends on what you are using the projections for. If you are using them to project team WAR, you want to put the injury factor in for all players, and they will all be inaccurate (slightly low on playing time). If you want to use them for a fantasy baseball draft, then you want to project playing time so that each PA forecast works like a Vegas over/under on PA, with half of players ending up above it and half below. I prefer the Vegas o/u method for what I do, which to some extent is different from what CHONE does.

vr, Xei
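The two uses described above correspond to the median (the over/under) and the mean (what team totals need). Summing medians across a staff overshoots whenever injuries skew each pitcher’s outcomes downward. A minimal sketch with a made-up outcome distribution:

```python
from statistics import mean, median

# Hypothetical outcomes for one starter: 8 healthy 200-IP seasons, 2 injury-shortened 60-IP ones
outcomes = [200] * 8 + [60] * 2
rotation = [outcomes] * 5  # five starters with the same made-up distribution

sum_of_medians = sum(median(o) for o in rotation)  # Vegas over/under style
sum_of_means = sum(mean(o) for o in rotation)      # what a team total needs

print(sum_of_medians, sum_of_means)
```

Each starter’s median is 200 IP but his mean is 172, so the rotation’s median-based total (1000 IP) exceeds its expected total (860 IP). Both forecasts are “right” for their own purpose.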

We aren’t doing too good at this, are we Tom?

Oh well.

Much of the analysis that is in the comments should have been in the original article.

I added a new blog post.