As a FanGraphs reader, you are no doubt familiar with the many projection systems whose statistical lines we display on the site’s player pages. These are fantastic models, and I have mentioned many times how big a fan I am of the Steamer pitcher forecasts. Unfortunately, it is thought that these systems only have around a 70% accuracy rate, which may or may not seem high to you. The belief is that any additional factors added to the computer models will barely move the needle, so hope for an 80%+ success rate should probably be given up. That said, there are many reasons why these models fall short at projecting players, and they have nothing to do with the fact that we are dealing with human beings and that, no matter how much data we have, sometimes that .001% event happens. Projection systems have weaknesses and they are pretty easy to identify. Let’s discuss them, shall we?
One of the biggest headaches in fantasy baseball is dealing with injuries. Forget about handling them on your team during the season. How about trying to project a player who was injured the previous season? Who could possibly know how that injured player is going to recover? I think that Matt Kemp will be hampered by the off-season shoulder surgery he underwent and that his power might be down this year, but no one knows. Of course, that just puts us on equal footing with the projections. However, the projection systems have no clue he even had the surgery to begin with and have no reason to suspect his power may be down. So they have absolutely no shot at being correct about his 2013 power production, except by accident.
Check out the playing time estimates for Jacoby Ellsbury. Of the non-Bill James systems, the highest projected AB total is just 541. This is for a player who will be leading off and who, when healthy, has garnered more than 600 at-bats. In both 2010 and 2012, his seasons were decimated by freak injuries. Is he more prone to freak injuries? Of course not. The very definition of a freak injury is that it is completely random and unpredictable, with a low chance of occurrence. But all the systems see is 78 at-bats in 2010, followed by 660 in 2011 and just 303 at-bats last season. They assume this is an injury-prone player and knock down his playing time estimate. And how about last year’s power? The shoulder injury likely affected him, though we obviously cannot be sure. All the systems do expect a healthy rebound as they factor in his 2011 breakout, but maybe it would have been an even larger rebound if they knew about last year’s shoulder injury.
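To make the Ellsbury point concrete, here is a deliberately crude sketch of how a blind playing time estimate gets dragged down by injury seasons. The 5/4/3 weights and the `weighted_at_bats` function are my own assumptions for illustration, not the formula of any real projection system (the real ones are more sophisticated and land higher, as the 541 figure shows). The only real numbers are the three at-bat totals quoted above.

```python
# Assumed weights: the most recent season counts most.
# These are illustrative only, not any real system's formula.
WEIGHTS = (5, 4, 3)


def weighted_at_bats(at_bats_recent_first, skip=()):
    """Weighted average of past at-bat totals, most recent season first.

    `skip` holds the positions of seasons that a human has flagged as
    freak-injury years, so they are left out of the average.
    """
    pairs = [
        (weight, at_bats)
        for i, (weight, at_bats) in enumerate(zip(WEIGHTS, at_bats_recent_first))
        if i not in skip
    ]
    total_weight = sum(weight for weight, _ in pairs)
    return sum(weight * at_bats for weight, at_bats in pairs) / total_weight


# Ellsbury's at-bats for 2012, 2011 and 2010, as quoted in the text.
ellsbury = [303, 660, 78]

blind = weighted_at_bats(ellsbury)                   # sees every season as equally meaningful
informed = weighted_at_bats(ellsbury, skip={0, 2})   # a human drops the freak-injury years

print(round(blind), round(informed))  # prints: 366 660
```

Throwing out both injury seasons entirely is obviously too generous, but the gap between the two numbers shows how much one piece of human knowledge can swing a playing time estimate.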
And then you have the hitters you know were playing through injury last year. In Jeff Zimmerman’s amazingly insightful MASH Reports, he actually lists a bunch of hitters who played through injury last year and as a result, expects them to exceed their power projections. All it would take is some quick manual intervention to adjust these players’ projections and voilà, that accuracy rate rises.
Dealing with injuries is something a person could do a much better job of than blind computer models with no injury data to work with. Simply adding a manual injury layer could, I believe, measurably improve the accuracy rate of the systems.
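A manual injury layer could be as simple as a table of hand-entered multipliers applied on top of whatever the system spits out. The sketch below is one hypothetical way to do it. The function name `apply_injury_layer`, the player labels and every number in it are placeholders I made up for illustration, not actual projections or recommended adjustments.

```python
def apply_injury_layer(projections, adjustments):
    """Return a new set of projections with manual multipliers applied.

    projections: {player: {stat: projected value}}
    adjustments: {player: {stat: multiplier}}, entered by hand
    Players and stats with no adjustment are passed through unchanged.
    """
    adjusted = {}
    for player, stats in projections.items():
        factors = adjustments.get(player, {})
        adjusted[player] = {
            stat: value * factors.get(stat, 1.0)
            for stat, value in stats.items()
        }
    return adjusted


# Placeholder projections, invented for illustration only.
projections = {
    "Player A": {"HR": 20.0, "AB": 500.0},
    "Player B": {"HR": 30.0, "AB": 600.0},
}

# Hypothetical manual call: Player A played through an injury last year,
# so we bump his power projection by 10%.
adjustments = {"Player A": {"HR": 1.10}}

adjusted = apply_injury_layer(projections, adjustments)
print(adjusted["Player A"], adjusted["Player B"])
```

The hard part, of course, is not the code but deciding what the multipliers should be, which is exactly where the human comes in.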
Hitter Plate Changes
This lumps together changes in batting stance, mechanics, swing, etc. Anything the hitter alters in the batter’s box falls under this umbrella. Remember when Jose Bautista went from mediocre Pirates outfielder to JOSE BAUTISTA? Of course you do. You also might recall all the articles that were written during his big breakout year with the Blue Jays about how he changed his mechanics at the plate. While it was impossible to know those changes would have the effect they did, they sure provide a darn good explanation for his sudden surge.
Did the projection systems read these same articles and think, “hmmm, maybe Bautista’s breakout actually was for real, this time we won’t project a massive regression like we do in the typical post-career-year cases”? Of course not. Computers aren’t people, and they don’t read. But we do. We’re able to learn about these changes and judge whether the improvement that followed was caused by the change or was merely a coincidence. If it was merely a coincidence, the projection systems will luck into being right, since the hitter is more likely to revert to career levels. If the change really was the cause, like in the case of Bautista, they won’t be.
Pitchers change too, and these changes include a pitcher’s repertoire, pitch mix and velocity. If a pitcher learns a sinker or a cutter, those pitches have the potential to dramatically affect his results. A sinker could lead to an increased ground ball rate, while a cutter could turn a pitcher into the starting version of Mariano Rivera. A couple of weeks ago, I posted a Pod Projection for Jeff Samardzija. In the comments, many noted that he was experimenting with a curve ball in June, which coincided with his worst monthly performance (10.41 ERA! 15 walks in 23.1 innings!). Can we be absolutely positive that experimentation is what caused the poor results? Of course not. But it sure sounds like a legitimate explanation. Unfortunately, the projection systems haven’t a clue as to what happened and treat his entire season’s results as one data point.
Last season, Max Scherzer’s fastball velocity jumped 1.1 miles per hour, which ranked as the third-highest spike among all starting pitchers with at least 100 innings pitched in both 2011 and 2012. Sure enough, his strikeout rate catapulted, as did his SwStr%. As usual, correlation does not imply causation, but we know about this velocity increase. The projection systems do not (well, most of them don’t, Steamer does). That might explain why the Fans’ K/9 projection is significantly higher than the others, or maybe it’s simply because the Fans are optimistic about everyone. Oddly, the one system that does take velocity into account, Steamer, is projecting the lowest K/9.
One last example, and the inspiration behind this post, is R.A. Dickey. He’s a guy who simply breaks the projection systems. If they had hands, he would cause them to throw them up while shaking their heads (if they had those as well). After posting a strikeout rate above 5.9 just once before, Dickey saw his K/9 surge to 8.9 last year. Because that’s what they do, the projection systems are forecasting major regression, all settling in between the 6.0 and 7.0 mark. Yet again though, the Fans are by far the most optimistic with a 7.8 projection. Might they know more than the systems? You bet they do. Do the systems know Dickey is a knuckleballer? How about one who throws his knuckler harder than knuckleballers typically have in the past? While his knuckleball velocity was almost identical in 2010 and 2011, it jumped a mile per hour in 2012. It might be a stretch to believe that an extra mile per hour was the only factor behind such a strikeout rate spike, but it had to have helped.
So those are all the ways I can think of at the moment that we, as human beings, have an advantage over the projection systems. I am sure there are more and I am confident you will be sharing them with us in the comments. When a computer projection system is developed that factors all of the above variables into its model, well, I won’t be sharing that system with anyone because I will be using it to beat everyone in fantasy baseball!