## The Fans Versus The Algorithms

Here on FanGraphs, we host several different projection systems, most of which are algorithms that take a player’s performance history and then mix in things like regression and aging curves to develop a forecast for 2013 production. But, we have one set of projections that is very different from the rest – the Fans Projections.

Instead of being based on any kind of mathematical model, these are simply crowdsourced from our readers, with you guys creating the projections with your various opinions about player performances for next year. While there are certainly some imperfections with any kind of crowdsourcing project, the wisdom of the crowds has also been shown to do pretty well in situations like this, and over the years we’ve done the Fans Projections, we’ve seen that the system actually holds its own when stacked up against the algorithms, though it does require one manual adjustment in order to make the system work properly: deflation.

Put simply, you guys are just too darn optimistic — I guess that’s why you’re called fans — and annually overproject total WAR by something like 20%. So, if you look at the data from the Fans Projections next to something like ZIPS or Steamer, you’ll see some huge discrepancies, but a lot of those simply have to do with the scale, and once the Fans Projections are deflated to create a more accurate overall total, many of the variances go away.

Not all of them, though. There’s one clear type of player where the Fans and the algorithms disagree, and it’s probably a telling area, given what we know about the fine line between hope and irrational exuberance. That type of player? Prospects.

To illustrate this point, I pulled the hitter projections from both the Fans and the Steamer system off of the site, dumped them side by side, and then added a column for player age. I deflated the Fans totals in order to put them on the same scale as Steamer, then sorted by the age column. I then broke the 326 players on the spreadsheet into four bins: 24 and under, 25-28, 29-33, and 34 and up. This gives us approximately even-sized groups (both in number of players and projected plate appearances) for both young and old, and then also in pre-prime and post-prime.
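The article does not spell out exactly how the deflation was done, so the following is only a minimal sketch under one assumption: that every Fans value is multiplied by the same factor so the Fans total matches a target total. The function names `deflate` and `age_bin` and the sample numbers are hypothetical; only the four age bins come from the text.

```python
def deflate(fan_war, target_total):
    """Scale every fan WAR value by one factor so the total equals target_total."""
    factor = target_total / sum(fan_war.values())
    return {name: war * factor for name, war in fan_war.items()}


def age_bin(age):
    """Assign a player age to one of the four bins used in the article."""
    if age <= 24:
        return "<=24"
    if age <= 28:
        return "25-28"
    if age <= 33:
        return "29-33"
    return ">=34"


# Made-up numbers, for illustration only (not real projections).
raw_fans = {"Player A": 6.0, "Player B": 3.0, "Player C": 3.0}
deflated = deflate(raw_fans, target_total=10.0)
print(deflated)
print(age_bin(20), age_bin(26), age_bin(30), age_bin(35))
```

A single multiplicative factor keeps the ranking of players intact, which is why the remaining disagreements after deflation are about who the fans like, not about scale.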

Here are the total WAR projections from both Steamer and the Fans for all four groups.

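| Group | Number | PA | Fans WAR | Steamer WAR | Fans WAR/600 | Steamer WAR/600 |
|-------|--------|--------|----------|-------------|--------------|-----------------|
| <=24 | 45 | 23,298 | 115 | 100 | 3.0 | 2.6 |
| 25-28 | 118 | 60,827 | 250 | 245 | 2.5 | 2.4 |
| 29-33 | 122 | 66,810 | 298 | 308 | 2.7 | 2.8 |
| >=34 | 40 | 20,064 | 71 | 79 | 2.1 | 2.4 |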
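For anyone who wants to check the rate columns, here is a small sketch that recomputes WAR per 600 plate appearances from the group totals in the table. The helper name `war_per_600` is just an illustrative choice; the inputs are the PA and WAR figures shown above.

```python
def war_per_600(war, pa):
    """Convert a WAR total into a rate per 600 plate appearances."""
    return war / pa * 600


# (group, PA, Fans WAR, Steamer WAR) taken from the table above.
groups = [
    ("<=24", 23298, 115, 100),
    ("25-28", 60827, 250, 245),
    ("29-33", 66810, 298, 308),
    (">=34", 20064, 71, 79),
]

rates = {
    group: (round(war_per_600(fans, pa), 1), round(war_per_600(steamer, pa), 1))
    for group, pa, fans, steamer in groups
}

for group, (fans_rate, steamer_rate) in rates.items():
    print(f"{group:>5}  Fans {fans_rate:.1f}  Steamer {steamer_rate:.1f}")
```

Working per 600 PA matters here because the groups differ a lot in size, so raw WAR totals alone would mostly reflect how many players are in each bin.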

While Steamer projects each group to produce at roughly the same rate regardless of age — keep in mind, fans generally don’t project bench players and scrubs, so we’re only dealing with the ~11 best position players on each team — the fans believe that the best performing group is actually those with the least amount of Major League service time, and the worst players are the ones with the largest data samples to pull from.

During the prime ages of 25-33, the fans and Steamer don’t really disagree much at all, at least in the aggregate. Sure, there are some established players like Miguel Cabrera where the two sides differ, but the big gaps are almost all found among the very young. Here are the 16 players that the Fans project for at least +1 WAR more than Steamer, even after deflating the total projections to put them on the same scale:

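| Name | PA | Fans WAR | Steamer WAR | Diff | Age |
|------|----|----------|-------------|------|-----|
| Bryce Harper | 678 | 5.1 | 3.0 | 2.1 | 20 |
| Jurickson Profar | 476 | 2.9 | 0.9 | 2.0 | 20 |
| Franklin Gutierrez | 518 | 2.4 | 0.7 | 1.7 | 30 |
| Dustin Ackley | 693 | 3.7 | 2.2 | 1.5 | 25 |
| Andrelton Simmons | 608 | 3.9 | 2.4 | 1.5 | 23 |
| Desmond Jennings | 673 | 4.2 | 2.9 | 1.3 | 26 |
| Michael Saunders | 586 | 2.1 | 0.8 | 1.3 | 26 |
| Manny Machado | 579 | 3.1 | 1.9 | 1.2 | 20 |
| Jean Segura | 528 | 2.1 | 0.9 | 1.2 | 23 |
| Yonder Alonso | 638 | 2.1 | 0.9 | 1.2 | 26 |
| Alex Gordon | 696 | 5.1 | 4.0 | 1.1 | 29 |
| Brett Lawrie | 609 | 4.8 | 3.7 | 1.1 | 23 |
| Jason Kipnis | 681 | 3.8 | 2.7 | 1.1 | 26 |
| Joey Votto | 654 | 6.7 | 5.7 | 1.0 | 29 |
| Billy Hamilton | 344 | 1.6 | 0.6 | 1.0 | 22 |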
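The Diff column and the age profile of this list can be recomputed from the rows as listed. The sketch below uses only the rows shown above; the variable names are illustrative, and the mean age it reports is for these listed rows.

```python
# (name, Fans WAR, Steamer WAR, age) for the rows listed above.
rows = [
    ("Bryce Harper", 5.1, 3.0, 20),
    ("Jurickson Profar", 2.9, 0.9, 20),
    ("Franklin Gutierrez", 2.4, 0.7, 30),
    ("Dustin Ackley", 3.7, 2.2, 25),
    ("Andrelton Simmons", 3.9, 2.4, 23),
    ("Desmond Jennings", 4.2, 2.9, 26),
    ("Michael Saunders", 2.1, 0.8, 26),
    ("Manny Machado", 3.1, 1.9, 20),
    ("Jean Segura", 2.1, 0.9, 23),
    ("Yonder Alonso", 2.1, 0.9, 26),
    ("Alex Gordon", 5.1, 4.0, 29),
    ("Brett Lawrie", 4.8, 3.7, 23),
    ("Jason Kipnis", 3.8, 2.7, 26),
    ("Joey Votto", 6.7, 5.7, 29),
    ("Billy Hamilton", 1.6, 0.6, 22),
]

# Gap between the deflated Fans projection and Steamer, rounded like the table.
diffs = {name: round(fans - steamer, 1) for name, fans, steamer, _ in rows}

# Mean age of the listed players, and who is older than 26.
mean_age = sum(age for *_, age in rows) / len(rows)
over_26 = sorted(name for name, _, _, age in rows if age > 26)

print(diffs)
print(round(mean_age, 1))
print(over_26)
```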

The average age of those 16 players? 24.5 years old. Gutierrez is the only guy on the list not in his twenties, and then after Votto and Gordon, nobody is over 26. Harper, Profar, and Machado are the only three 20-year-olds that the Fans projected, and all three show up on the list of guys that the fans like far more than Steamer. In fact, of the 25 players in the projection that are listed as 23 or younger, the fans are higher on 20 of them, and the entire difference between the two systems in the <=24 crowd can actually be found in those 23-and-under players, as the two systems are in almost perfect agreement on total WAR for players headed into their age-24 season.

The gap is most pronounced when talking about the game's elite prospects. On a per 600 plate appearances basis, the fans project Mike Zunino to be every bit as good as Albert Pujols. They see Manny Machado as the equal of Jose Reyes, Jurickson Profar able to match the performance of Jay Bruce, and Oscar Taveras is already as good as B.J. Upton. Steamer is not nearly as bullish on those four, grading them out as no better than average players, falling well short of the expected production of established stars.

Part of being a fan is dreaming about what could be in the future. It is much easier to dream about improvement from a talented young star-in-the-making than it is to dream about positive regression to the mean from an aging player coming off his worst season. Only one of those two things is exciting. But, as much fun as it might be to dream about how good Profar could be, it’s also useful to have a reality check like Steamer around to tell everyone to not get too carried away.

The same goes for players on the decline as well. A bad year from a player over age-34 is often taken as a sign of a marked loss of skills, and the fans expect that kind of age related decline to continue into the future. You look at the list of guys that Steamer likes more than the fans, and you find guys like Derek Jeter, Marco Scutaro, Lance Berkman, and Michael Young. While fans jump off the bandwagon when a player passes 35, Steamer still sees value in formerly good players who just aren’t quite as good as they used to be. And, again, I think Steamer is correct here.

Overall, I think the Fans Projections look very reasonable once you just take out the across-the-board optimism that inflates the overall total, but it’s also worth noting where the differences lie. It’s great to be excited about prospects, but the evidence suggests that prospect hype has probably gone a bit too far, and we should rein in our expectations of how even elite young talents are going to do in 2013.

Dave is the Managing Editor of FanGraphs.

### 39 Responses to “The Fans Versus The Algorithms”

1. LuckyStrikes says:

I’ve read in the past that Oliver does a solid job of projecting prospects/rookies. Wondering if Oliver would be the happy medium between Fans and Steamer…?

• Dave Cameron says:

Oliver likes minor leaguers even more than the fans do. There is no projection system in the world that likes minor leaguers more than Oliver. I don’t put a lot of stock into the MLEs it produces.

• Steve 1 says:

OLIVER BURN

• evo34 says:

Dave, When will the 5-year Oliver projections be posted on the site?

2. geefee says:

Granted I’m a fan and not an algorithm, but the supposedly irrationally bullish fan projections on that table almost invariably make more sense.

• algorithm says:

[sternly] *bee-boop-beep*

• Eric says:

Keep in mind the fan projections listed by Dave have already been deflated from the actual fan projections. For example:

Brett Lawrie
Fan Projection 5.7
Fan Projection (Deflated) 4.8
Steamer 3.7

3. Crumpled Stiltskin says:

What is missing is an examination as to which projections have proven right. I.e., I’m betting fans would have been closer on Trout and Harper last year, wouldn’t they have? Or Heyward or Stanton? Given that they are more optimistic.

• brendan says:

That’s cherry picking. Plenty of top prospects never work out, or disappoint at first, e.g. Montero, Smoak, Ackley, etc.

• It’s not cherry picking. It’s suggesting that at times the “overly optimistic projections” are not at all overly optimistic, and that there might be specific types of players fans are in fact better at projecting than analytical computer systems.

Montero, Ackley and Smoak just aren’t comparable to Stanton, Heyward, Trout and Harper. Ackley and Smoak are older players, having gone to college (and the fact that all three now play in a ballpark that kills certain players’ swings, Beltre for one, makes them less than ideal comparisons). Even Montero is slightly too old to fit the same age profile, but his hype relies mainly on power potential, the fact that he was originally a catcher (a position he can’t really play) and age relative to league, and not on the numbers he was able to produce. As a career DH or defensively challenged 1B, I don’t think most people would think him as exciting a prospect.

Perhaps fans are better at predicting numbers for baseball prodigies because in these cases the computer programs are too conservative. Historically, position players who make their major league debut prior to turning twenty are usually good to great in short order, especially when their minor league numbers back up their promotion. (And Giancarlo Stanton was 20.)

Or perhaps not. But it’s not cherry picking to suggest it as a possibility, especially when there’s no exploration of past projections and their viability as regarding individual player, which was really the main point of the statement.

• jesse says:

I guess this would be the question. If we could look at these numbers from the last few years, or just wait patiently for a year….

4. Delmon Youngs sprained left fat says:

At least fans know you’re supposed to suck worse as you get later into your 20’s.

• Chummy Z says:

Nice name

5. Argoyle says:

Do we know if these exuberant fan projections from years past were off the mark?

• Anon21 says:

He says right in the article that fans annually overpredict WAR (presumably just for the players they project) by 20%.

• Argoyle says:

I knew skimming would pay off

• futant462 says:

But after normalizing for that, the point stands: which is more accurate? Seems like an obvious omission.

• philosofool says:

It would be a fun discovery to learn that twenty four and under players are on average half a win better than the next best age group, but it would also be pretty surprising.

• Argoyle says:

But the 25-28 group would include more mediocre and less talented players than the under 24 set, no?

• Anon21 says:

Less talented, yes; less productive, probably not.

• Argoyle says:

It’s at least an open question to my mind. The best of the 25-28 group will likely be better, but more of the under-24 group may congregate around 2-3 WAR, while more 25-28-yr old players are closer to replacement level.

Maybe not likely, but it’s possible.

• philosofool says:

Remember that talented 24-year-olds graduate to being players in their prime. They survive and often improve, while less talented players are selected out of the pool.

• Anon21 says:

Possible, sure, but I think we’re on fairly firm ground thinking the relationship goes the other way. Young, unproductive, but toolsy players are more likely to receive playing time (particularly on noncontenders) than older nonproductive players, who are usually presumed to have reached whatever potential they have.

• Argoyle says:

It seems there are a fair number of guys aged 26, 27 whose ceiling is around 2 WAR and who are more likely to give you 0.5-1.5 WAR, but are plugged in to fill holes in lineups and the back end of rotations, even on contending teams. These guys are “solid.”

So, you are not a fan anymore Dave?

• boss says:

#### +12

• smiley54663 says:

Dave’s a robot

• Steve 1 says:

Guess that’s what happens when you’re a Mariners fan that moves to North Carolina.

7. HAL 9000 says:

I know I’ve made some very poor decisions recently, but I can give you my complete assurance that my work will be back to normal. I’ve still got the greatest enthusiasm and confidence in the mission. And I want to help you.

8. Tom in Ohio says:

George W. Bush did 9-11

• Well-Beered Englishman says:

Barry Bonds did 756

I would assume that the sample of Fans projecting would probably be skewed towards projecting players on their favorite teams, as opposed to random samples, no? I certainly know of a certain subset of the general fanbase that believes that the Baltimore Orioles are favorites in the AL East this year, for example. Steve Johnson and Miguel Gonzalez don’t actually project to be good this year, right?

10. Ben says:

So if the fans overproject WAR by 20% every year, how close does Steamer get every year?

• Rudy Gamble says:

I reviewed 2012 Steamer, FanGraphs Fans, and a gaggle of other projections from a Fantasy perspective (http://razzball.com/fantasy-baseball-projections-review-2012/).

The nature of my test – correlating a projection source’s valuation of a fantasy team’s draft with their final standings points – should remove any systemic inflation by the Fans.

The 2nd test I did applies each source’s rate stats against actual playing time to remove that variable (this is the first year that Steamer is publishing projections using FanGraphs Fan projections). In that test, FanGraphs and Steamer tested equally for hitting while Steamer crushed it in pitching.

Based on my tests, I’d say that the fans are better at projecting hitting vs. pitching (even after accounting for the increased difficulty in projecting pitching).

11. wanderin says:

Great analyses. This is why I keep coming back to this site.

12. DBA says:

What am I missing here?:
“Sure, there are some established players like Miguel Cabrera where the two sides differ”.

The Steamer and Fans projections as listed look nearly identical!
Perhaps Dave’s comment reflects the fact that the fans haven’t inflated Cabrera’s value the way they have other players? IE, the Fans’ pre-deflation number is in line with projections while for most players it’s markedly higher?

13. James says:

Killjoy.

14. Paul says:

What is the impact of projecting defense in this comparison? I suspect it’s the case that defense is wildly over-projected moreso than offense. And this would change the analysis from “over-optimistic fans” to “we’re drawing conclusions based on limited data.” Fans are doing defensive projections for a 28 YO player based on an average 3+ years of UZR. Since it’s recommended to (more or less) average three seasons of UZR, wouldn’t we expect much larger error rates in defensive projections for young players?

When all you have to go on is a scouting report that says Eric Hosmer is a gold glove caliber 1B, then he posts a -8 UZR, seems a little unfair to just slap the “overly optimistic” tag on the fans.

15. kylemcg says:

Since we have had some discussion around here lately on the topic of error bars and probability distributions, I think maybe there’s a more nuanced approach to adjusting “fan exuberance”. Qualitatively, I feel that fans just tend towards a player’s “upside” in some fashion. If a player has less “upside” (i.e. a smaller spread of possible values to occupy on the WAR scale) they tend to be closer to the projection’s mean value. If a player (like a prospect) has more risk involved, they simply trend upward from the mean.

Maybe there’s a formula here for expected fan projection. Like projection_mean + (projection_standard_deviation * k), where k is some constant.

I don’t know how these projection systems work, but if you have access to a standard deviation that might allow you to make a better adjustment to the fans’ estimation.
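A rough sketch of the formula suggested in this comment, for anyone who wants to play with it. Everything here is hypothetical: the function names, the sample means and standard deviations, and the least-squares fit for `k` (a through-the-origin fit is just one simple way to estimate a single constant, not something any projection system is known to do).

```python
def expected_fan_projection(projection_mean, projection_sd, k):
    """The comment's formula: the fans sit k standard deviations above the mean."""
    return projection_mean + projection_sd * k


def fit_k(means, sds, fan_values):
    """Estimate k by least squares through the origin on (fan - mean) versus sd."""
    numerator = sum(sd * (fan - mean) for mean, sd, fan in zip(means, sds, fan_values))
    denominator = sum(sd * sd for sd in sds)
    return numerator / denominator


# Made-up inputs, for illustration only (not real projections).
means = [2.0, 3.0, 1.0]   # system's mean WAR projection per player
sds = [1.0, 0.5, 2.0]     # system's uncertainty per player
fans = [expected_fan_projection(m, s, 0.5) for m, s in zip(means, sds)]

k_hat = fit_k(means, sds, fans)
print(fans)
print(k_hat)
```

Under this model a prospect with a wide spread gets a bigger fan bump than a veteran with a narrow one, which is the pattern the article describes for the youngest players.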