Man vs. computer

A few days before the start of the 2009 season, I wrote a column here titled, “29 players I think the THT projections got wrong.” The title is pretty self-explanatory, but let me quote the introduction to that column so that you know where I was going with it:

Each of the past three years, we’ve released projections for thousands of players, and each year, I have received tons of e-mails relating to specific players readers think we have over- or underrated. Frankly, I’m with the readers—our system is very good, but it is not perfect. Sometimes, I think I know more than it does, and today, I’ve decided to test that thought.

What follows, then, is a list of 15 hitters and 14 pitchers who I think will either over- or underperform their projections, with my reasoning explained. I formed this list without looking at other projection systems, since the idea here is to figure out if human intuition can beat a computer-based system, rather than trying to find areas where some other projection system outperforms THT. At the end of the season, I will check in to see if my hunches were correct, or if the computer knows best.

To be clear, I selected only hitters projected to have at least 500 major league plate appearances and pitchers projected to have at least 100 major league innings pitched; I wanted to avoid, as best as possible, players who won’t play much in the major leagues in 2009.

The comments I got on that column were mostly skeptical; Mitchel Lichtman, a former senior advisor to the Cardinals, for example, put it bluntly: “I am always skeptical of these, ‘I can beat a good forecast system just by looking at the forecasts,’” he wrote. Fair enough.

A commenter on Baseball Think Factory was of the same opinion: “I expect the outcome will be that DSG (those are my initials) can’t beat the computer.” Frankly, I felt the same way. Still, intuitively, the 29 projections I challenged looked wrong to me, and I figured it was worth it to find out if my gut actually could see something that a computer cannot.

Now that the season is over, we can answer that question, so let’s get to the results.

First up are the hitters. Let’s start with those I thought would do better than their projections. Those were Justin Upton, Alex Gordon, Delmon Young, Robinson Cano, Ichiro Suzuki, Evan Longoria and B.J. Upton. Right away, it’s easy to see that some of these hitters indeed beat expectations while others actually went the opposite way.

Overall, however, we projected this group to have a .779 OPS (weighted by the number of plate appearances they had this season). In actuality, they posted an .819 OPS, which amounts to a 40-point difference! (Actually, 41 after rounding.) So far, so good: I thought these hitters would beat their projections, and in sum, they sure did.
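For readers wondering what "weighted by plate appearances" means in practice, here is a minimal sketch. The three hitters and their numbers are invented purely for illustration (they are not the 2009 figures behind the group averages above), and the function name `weighted_ops` is my own label, not something taken from the THT system.

```python
# Hypothetical illustration only: these hitters and numbers are made up,
# not the actual 2009 figures behind the group averages in this column.
def weighted_ops(players):
    """Average OPS across players, weighting each by plate appearances."""
    total_pa = sum(pa for pa, _ in players)
    return sum(pa * ops for pa, ops in players) / total_pa

# (plate appearances, OPS) for three invented hitters
group = [(600, 0.850), (300, 0.700), (100, 0.900)]

simple_mean = sum(ops for _, ops in group) / len(group)
print(f"PA-weighted OPS: {weighted_ops(group):.3f}")
print(f"Unweighted mean: {simple_mean:.3f}")
```

The point of the weighting is that a hitter who barely played counts for less in the group average than one who played every day.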

So what about the hitters I thought would do worse than we projected? That list consisted of Chipper Jones, David Ortiz, Miguel Cabrera, Mike Napoli, Carlos Delgado, Jack Cust, Ryan Howard and Chris Davis. Again, we have a fun mix of guys, and overall the THT projections had them posting a combined OPS of .934 this season. Instead, they posted an .843 OPS, or 91 points below expectation. That’s another big win for me: I’m two-for-two!

Let’s move on to the pitchers. I thought that Dan Haren, Clay Buchholz, Rich Harden, Mark Buehrle, Edinson Volquez, Zack Greinke and Francisco Liriano all would beat their projections. Perhaps Greinke’s name tips you off as to how I did with this group: overall, our projections had them posting a 4.33 ERA, but they blew that out of the water, combining for a 3.63 ERA instead. That’s a huge difference, and I have to say, my predictions are looking good thus far.

We still have one more group to look at, though, and that’s the pitchers I thought our projections overrated. They were Derek Lowe, Fausto Carmona, Jeremy Bonderman, CC Sabathia, Justin Duchscherer, Dana Eveland and Joe Blanton. Our projections thought these guys would be good for a 3.69 ERA this season; instead, they came in at 4.58, a whole 89 points worse than expected! Clearly, I’m a genius.

Or am I? After I wrote my column, some suggested that my predictions were indeed going to be right, but that rather than being a feature of my extraordinary brilliance, it was merely an indication that the THT projection system was not very good. That’s a double whammy for me—not only does it call into question my intelligence, but I also designed the guts of the THT projections system. I think it’s fair to ask whether there is some bug in the design of our projections that allowed me to beat them.

To answer that question I looked at what another projection system said about the four groups of players we just examined. Essentially, since I did not consult any other projection systems when making my predictions, the other system can be used as an independent control: If my predictions turned out to be right simply because I was taking advantage of some defect in the THT system, another system would have the players correctly projected. If, on the other hand, my gut was able to see something a computer could not, any computer-based system would have been off for these players.

I turned to CHONE, which has been shown to be one of the best projection systems over the past few years. CHONE is also a completely computer-based system, making it ideal for this test. Due to the nature of statistics, CHONE should do a better job projecting these players than THT did no matter what; if I had chosen the 29 CHONE projections I hated most at the beginning of the season, the THT projections too would have been closer to the truth.
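The "nature of statistics" point can be sketched with a toy simulation. Everything here is an assumption made for illustration: two made-up systems whose errors are independent and equally noisy, and an idealized picker who flags exactly the 29 players the first system missed by the most. It is not a model of how THT or CHONE actually work.

```python
import random

# Toy simulation under stated assumptions, not a model of THT or CHONE:
# each system's error is independent noise of the same size.
rng = random.Random(0)
N_PLAYERS, N_PICKS, NOISE = 500, 29, 0.030  # hypothetical values

players = []
for _ in range(N_PLAYERS):
    err_a = rng.gauss(0, NOISE)  # system A's projection error
    err_b = rng.gauss(0, NOISE)  # system B's projection error
    players.append((err_a, err_b))

# Idealized picker: flags the 29 players system A missed by the most.
picks = sorted(players, key=lambda p: abs(p[0]), reverse=True)[:N_PICKS]

mean_abs_a = sum(abs(a) for a, _ in picks) / N_PICKS
mean_abs_b = sum(abs(b) for _, b in picks) / N_PICKS
print(f"system A mean miss on picks: {mean_abs_a:.3f}")
print(f"system B mean miss on picks: {mean_abs_b:.3f}")
```

Because the picks are selected on system A's errors, system B looks better on that subset even though the two are equally good overall. Real projection systems share most of their inputs, so their errors are not independent, which is exactly why the direction of CHONE's misses is the interesting part of the test.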

Let’s start with the hitters I thought would beat their projections. CHONE predicted they would post a .796 OPS, a number they bested by 23 points, OPS’ing .819. As for the hitters I pegged to underperform their projections, CHONE thought they would combine for an .898 OPS as a group; they actually finished with an .843 OPS, which is 55 points worse.

The pitchers I thought would outperform expectations got a 3.97 ERA projection from CHONE; they managed to beat that by 34 points, with a 3.63 ERA. CHONE gave an overall projection of 3.75 to the pitchers I thought would underperform; instead, they had a collective 4.58 ERA, a whole 83 points worse.

To recap:

Hitters     THT     CHONE   Actual
Better      0.779   0.796   0.819
Worse       0.934   0.898   0.843

Pitchers     THT     CHONE   Actual
Better       4.33    3.97    3.63
Worse        3.69    3.75    4.58
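As a quick check on the arithmetic, here is a short sketch that recomputes each system's miss from the recap table. The group averages are copied from the table; the function name `misses` and the "actual minus projection" sign convention are my own choices.

```python
# Group averages copied from the recap table: (THT, CHONE, actual).
groups = {
    "hitters, better": (0.779, 0.796, 0.819),
    "hitters, worse": (0.934, 0.898, 0.843),
    "pitchers, better": (4.33, 3.97, 3.63),
    "pitchers, worse": (3.69, 3.75, 4.58),
}

def misses(tht, chone, actual):
    """Return (THT miss, CHONE miss) as actual minus projection."""
    return round(actual - tht, 3), round(actual - chone, 3)

for name, row in groups.items():
    tht_miss, chone_miss = misses(*row)
    same_side = (tht_miss > 0) == (chone_miss > 0)
    chone_closer = abs(chone_miss) < abs(tht_miss)
    print(f"{name:17} THT {tht_miss:+.3f}  CHONE {chone_miss:+.3f}  "
          f"same direction: {same_side}  CHONE closer: {chone_closer}")
```

On these numbers, both systems miss in the same direction in all four groups, and CHONE's miss is the smaller of the two each time.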

Overall, the CHONE results confirm that my predictions were spot-on! Though the CHONE projections were closer to the ultimate truth than THT’s, they still were too low for the players I thought would beat their projections and too high for those I saw faltering. In other words, I do appear to be some sort of genius.

Well, not really. For one, I have no idea why I was able to beat two very good projection systems at their own game. My expectation was that a computer would be much better at assimilating a lot of statistical information into one final prediction than the human brain, and while I still do believe that to be the case, it does appear that we humans can see something computers do not.

Looking at the hitters I thought would beat their projections, I saw a lot of special players, most of them young, but all very talented. Not all have capitalized on their abilities (*cough* Delmon Young), but overall, I think this is the kind of situation in which a scouting eye can tell you something that cold, hard numbers cannot (not that I have a scouting eye, but even I can see insane talent like the Upton brothers).

The hitters I thought our projections overrated were mostly some combination of old, fat and strikeout-prone. That’s never been a good combination, and though the statistics should bear that out, perhaps the computers aren’t quite as quick as a human to catch on to when a player is going to falter due to those factors.

The pitchers are a little more difficult to classify. The only thing that really jumps out at me is that I liked a lot of high-strikeout guys, while a lot of the pitchers I didn’t like are below-average at whiffing hitters. I think it’s very possible that pitchers with big arms often can break out in a way their past statistics would not predict, while those with low strikeout rates walk a very fine line between successful major leaguer and batting practice tosser. Perhaps we humans are a little better at seeing that line than computers.

But maybe not. Honestly, though I am fairly convinced that computer projections are not perfect, and that a baseball-crazed human being can pick out some numbers that just aren’t right, regardless of their statistical validity, I can’t say at this point that I know why. The human mind is a complex machine, more complex than any supercomputer yet built, and so it is not simple to decipher what processes exactly allow us to better a computer projection with our gut.

The important lesson here, however, is that human analysis does indeed have something to add in understanding a player’s abilities and talents beyond what a computer projection will tell you. Computer projections are very good, and 99 percent of the time, they’re as good as or better than what we can do, but that other 1 percent? Well, that’s where we analysts come in handy.


Jeff

Not to quibble, David, but while you clearly beat the forecasting system using groups of players, from an individual standpoint it doesn’t seem as if you hit on a significantly higher percentage. Just to look at the pitchers, for example: Greinke was better, Buchholz was worse, Haren was better, and Volquez got hurt and was exceptionally lucky (given his walk rate) until then. Buehrle put up the second-highest FIP of his career, same for Harden, and well, Liriano was just plain shock-the-lead-out-of-your-pencil bad. As for Dan Haren, he had a great season, but it seems…
Marty Winn

A computer model is only as good as the information it is given. You obviously have some gut feelings about how players are likely to perform. That needs to make its way into a test (like you did) to see if these characteristics (fat, age, pitch velocity) have an effect on the player beyond what base stats show. If you get a correlation, put it in the projection model. Of course, some of this is unquantifiable and might make the equation unworkably complex.

David Gassko

Hey Jeff,

I would argue that all we care about is how I did with each group overall, but to be clear, a poster on another site did the math and says that 21 out of 28 picks were correct (not counting Duchscherer since he didn’t pitch this season), which is a very high percentage.

Jeff

David, that is indeed a great rate. It’s nice to know not everyone is lazy.

ecp

@Jeff – That was my first thought as well…Why reassess in groups when the original assessments were done on the individual level? Also seems to me that those who did not meet the minimum innings pitched (100) or plate appearances (500) requirements stated in the beginning should have been tossed as not counting. David, I haven’t looked at the results person by person, but first glance tells me that you were correct about 50% of the time. Plus a guy like Greinke really skews the whole group of those pitchers you expected to outperform. He was SO much better than…
ecp

Just saw your comments on 21 of 28 being right, so my 50% off-the-top-of-my-head assessment is wrong. But I still think guys like Edinson Volquez, Jeremy Bonderman, Carlos Delgado, and Alex Gordon should be tossed because their lengthy injuries severely curtailed their playing time. As should guys who spent a large amount of time in the minors for effectiveness reasons, such as Dana Eveland and Chris Davis.

Bruce

Claiming victory over CHONE is validating a prediction you didn’t make. While it’s interesting that the actual performances were “more” than predicted by CHONE, in the aggregate groups, CHONE’s over/under was as correct as yours.

David Gassko

I don’t think I really get your comments, Bruce. CHONE was used as an independent control, to see if the problem was not computer-generated projections but THT projections specifically. It wasn’t. And like I said in the article, it is a mathematical fact that the CHONE projections would be closer to the truth than THT’s (provided THT’s were off) because I picked the worst THT projections I could find (worst of course being my own subjective, though now confirmed, opinion). If I had gone through the CHONE projections and done the same, using the THT projections as an independent…
David Gassko

“As should guys who spent a large amount of time in the minors for effectiveness reasons, such as Dana Eveland and Chris Davis.”

How does that make any sense? Why shouldn’t I be rewarded for pinpointing guys that would underperform their projections, just because they underperformed so badly they were sent down to the minors?

ecp

“Why shouldn’t I be rewarded for pinpointing guys that would underperform their projections, just because they underperformed so badly they were sent down to the minors?” Point taken. I was thinking that, however, because at the beginning you said you wanted to pick out only those guys who would have at least 100 IP or 500 PA, so in my mind anybody who doesn’t meet that threshold doesn’t get included. Especially when one of them is a guy like Bonderman, who spent most of the year in the minors for reasons other than ineffectiveness and pitched a grand total…
Bruce

Using CHONE as a control is an effective approach, and I agree with the broader idea that human input (intuitive or otherwise) can help us learn about and improve projection systems. I have a strong reaction to at least a handful of projections each year, and I enjoyed reading about yours (which are surely more likely to prove correct than mine). My earlier comment is a reaction to… “For one, I have no idea why I was able to beat two very good projection systems at their own game.” …since you didn’t. That’s a separate conclusion from the fact that…
Jonathan

What happens if instead of downweighting low PA players like Alex Gordon, we add in a league average or replacement level player for the rest of the PAs?

David

DG: whilst most of the analysis is a bit of puffery, you hit the nail on the head in the final paragraph: i.e., that there are instances where statistical projection just misses the boat, and that is where the human eye is necessary. You did not actually “beat” the system at its own game; you looked through projections which are based on imperfect information, identified the handful that did not have face validity, and then predicted that x would be better than their projections, and y would be worse. If you wanted to assess your ability to beat the system,…
Bill P

Great post. I’m not really sure what goes into THT or CHONE, but I assume the predictions are based almost entirely on past performance. We humans read all kinds of baseball stuff, though, not just numbers-based stuff, and absorb it all as “common wisdom”. A lot of the common wisdom is basically scouting. Writers quote scouts saying “this guy’s skills are way better than what we’ve seen so far”, while for others we hear that they’ve been overperforming their skills. Although I trust numbers more than scouting, I still think scouts can add information that can’t be quantified. When you…
Swami

Thanks for a very interesting study. The computer is an excellent processor of objective data, while you are an imprecise processor of both objective and subjective data. In most cases, the computer probably comes out better, because of the precision of its processing (you can test this by projecting random players). However, in a few cases, the influence of subjective factors (and errors in estimating impact of objective factors) will be larger – in those cases the computer projection will be significantly off. In some of those cases, your imprecise mental processor that takes subjective factors into account shows a…
Alex Zelvin

What’ll be really interesting is if you can figure out what’s missing from the projection models that could make them more accurate. There’s something that I’ve never heard anyone discuss that could explain some of these. Assume you’ve got a 25-year-old pitcher and a 39-year-old pitcher. Both have put up very similar stats for the past five years…in fact so similar that once adjusted for the expected improvement (due to age) of the 25-year-old and expected decline of the 39-year-old, we have the exact same projection for them this year. Let’s say they…