ERA estimators estimate how well a pitcher has pitched in the present, and pitcher projections estimate how well a pitcher is expected to pitch in the future. Naturally, we'd expect projections to predict pitchers' future performance more accurately, since that's what they're designed to do. But it appears that ERA estimators *can* predict future performance quite well, and SIERA, in particular, has actually done a better job projecting pitcher performance than traditional projections.

Projecting pitchers is harder than projecting hitters. Not only do pitchers' skill levels change often, but simply estimating a pitcher's skill at any given time is challenging. Separating a pitcher's performance from that of his fielders, and removing luck, are difficult but necessary tasks for isolating a pitcher's true talent.

Testing an ERA estimator's ability to predict future ERA is the most common way to assess its reliability: if the estimator is close to a pitcher's future ERA, it is probably capturing the pitcher's true skill level. The two standard metrics for this test are the correlation with future ERA and the root mean square error (RMSE) against future ERA. The difference is that correlation measures how closely two numbers move together, while RMSE measures how close they are to each other.
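To make the distinction concrete, here is a minimal Python sketch that computes both metrics directly from their definitions. The four pairs of values are made-up toy numbers, not real pitchers.

```python
import math

def pearson_r(pred, actual):
    """Pearson correlation: how closely two series move together (ignores shifts and scale)."""
    n = len(pred)
    mean_p, mean_a = sum(pred) / n, sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(pred, actual))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    return cov / math.sqrt(var_p * var_a)

def rmse(pred, actual):
    """Root mean square error: the typical size of a miss, in runs of ERA."""
    n = len(pred)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / n)

# Hypothetical toy data: one season's estimator vs. the next season's ERA.
estimator = [3.20, 3.90, 4.40, 5.10]
next_era = [3.60, 4.10, 4.30, 5.40]

print(round(pearson_r(estimator, next_era), 3))
print(round(rmse(estimator, next_era), 3))
```

Adding the same constant to every estimate leaves the correlation unchanged but changes the RMSE, which is why the two metrics can disagree about which estimator is "best."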

Say you were a general manager a year ago, and you saw two pitchers on the trading block who were coming off sub-3.00 ERAs: Jaime Garcia and Mat Latos. You knew both were due for a reversion to the mean. If you wanted to target the superior pitcher, you would want the ERA estimator with the stronger correlation with future ERA. But if you were more interested in pinpointing each pitcher's future ERA to determine how competitive your team would be with either one, you should focus on the ERA estimator with the better (lower) RMSE. In other words, the pitching metric with the best correlation with future ERA ranks pitchers correctly, while the pitching metric with the best RMSE is better at predicting their performance in an absolute sense, rather than relative to other pitchers.

But you might wonder: why not just use a future ERA projection? The implicit assumption is that if you only want to predict future performance, you might as well factor in several years of data, aging, park effects and changes in skill level that a more sophisticated system would recognize. But I would strongly argue that doing a projection requires two steps:

1) Figuring out how well a pitcher actually pitched

2) Figuring out how this will change

If you don’t do the first step well, what can you expect from the second step?

Consider the following test for 2011, which shows the correlation of several ERA estimators and projection systems with future ERA. ZiPS, Marcel, PECOTA and Oliver are projection systems; SIERA, xFIP, FIP and tERA are ERA estimators, and raw ERA is included for comparison.

| Estimator (N=258 pitchers) | Correlation of 2010 statistic with 2011 ERA* |
|---|---|
| SIERA | .480 |
| ZiPS | .470 |
| xFIP | .438 |
| ERA | .424 |
| Marcel | .420 |
| PECOTA | .413 |
| FIP | .402 |
| tERA | .396 |
| Oliver | .371 |

*For all tables, I use only pitchers with 40 IP in both years.

SIERA actually tops the three most commonly cited projection systems in correlation with future ERA. In fact, you would have done a better job just looking at park-neutral 2010 SIERAs than at any of the four projection systems' park-adjusted ERAs. Interestingly, 2010 ERA itself didn't fare that badly at ranking pitchers' 2011 ERAs, and it actually topped Marcel, PECOTA and Oliver.

What if we look at RMSE? This will penalize ERA estimators that let luck play too large a role, even if they rank pitchers appropriately. But here, we again see that SIERA tops the pack, and xFIP edges out ZiPS and Marcel while beating PECOTA and Oliver rather handily.

| Estimator (N=258 pitchers) | RMSE of 2010 statistic with 2011 ERA |
|---|---|
| SIERA | 1.048 |
| xFIP | 1.069 |
| ZiPS | 1.071 |
| Marcel | 1.076 |
| FIP | 1.132 |
| PECOTA | 1.155 |
| Oliver | 1.168 |
| tERA | 1.171 |
| ERA | 1.221 |

Just to make sure that this result wasn't a 2011 quirk, I looked at six years of data (2006-2011). Note that PECOTA has changed architects several times, so its figures aggregate the performances of its different versions. Also note that I left out Oliver because it wasn't available for all six years.

| Estimator (N=1,576 pitchers) | Correlation of statistic with next year's ERA (2006-2011) |
|---|---|
| SIERA | .428 |
| ZiPS | .402 |
| PECOTA | .394 |
| tERA | .384 |
| Marcel | .377 |
| xFIP | .375 |
| FIP | .355 |
| ERA | .328 |

Over time, SIERA appears to be the best at ranking pitchers. The projection systems generally beat the other ERA estimators, though tERA does edge out Marcel. ZiPS is the best among projections, though PECOTA's strong 2007 and 2008 performances kept the aggregate score close.

| Estimator (N=1,576 pitchers) | RMSE of statistic with next year's ERA (2006-2011) |
|---|---|
| SIERA | 1.126 |
| Marcel | 1.132 |
| PECOTA | 1.141 |
| ZiPS | 1.143 |
| xFIP | 1.148 |
| FIP | 1.212 |
| tERA | 1.236 |
| ERA | 1.387 |

The lesson from this last RMSE table is that, over time, projections come closer to future ERA than the ERA estimators do, with the exception of SIERA, which remains on top. It's worth noting that Marcel tops PECOTA and ZiPS here because its projections have a far lower standard deviation (.527 for Marcel, versus .652 for PECOTA and .721 for ZiPS). Marcel might not rank pitchers as effectively, but it predicts their performance better simply by assuming that all pitchers are closer to average than they appear. When Ubaldo Jimenez or Javier Vazquez falls back to earth, Marcel catches him; but Marcel also expected Tim Lincecum and Cliff Lee to regress significantly, while ZiPS took a stronger stand on both. This made ZiPS better at correlation and Marcel better at RMSE.

What's particularly interesting about this result is that ERA estimators should, in theory, have far inferior correlations and RMSEs. Projection systems use several years of performance to understand a pitcher's true talent level, while ERA estimators use only one year of data. Ryan Vogelsong had a very low 2.71 ERA in 2011, but his SIERA was 3.97. That's well and good, but his years of ERAs and SIERAs hovering around 5.00 suggest an even harder crash in 2012, something his 2011 SIERA alone cannot see. Using more data, and regressing it more heavily, should significantly benefit projection systems. But that extra information isn't enough to overcome SIERA's advantage.

This is happening because we still aren't properly incorporating the lessons of ERA estimators into our projections. ERA estimators give a truer estimate of how well a pitcher actually pitched. If we continue to ignore the interplay among different pitching skills and their effect on run prevention, our ERA projections will fall short. First, we need to understand the information that strikeout rate contains about BABIP, HR/FB and situational pitching. We need to know what information is and isn't contained in batted-ball data. And we must understand how all of these statistics combine to affect run prevention. Once we know all of this, we can understand how well a pitcher is pitching at the moment, and carry that forward to predict how well he might pitch in the future.

When predicting the future, using several years of SIERA will do better than using one year, and adjusting previous years' SIERAs for park effects and aging will also help. Most importantly, regressing SIERA toward the mean is necessary if we're interested in approximate talent level rather than just rankings. After all, SIERA's RMSE advantage over the projection systems is small, while its correlation advantage is far larger. At this stage, using SIERA as a jumping-off point appears to be the best way to project pitcher performance.
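The recipe in the paragraph above (several years of SIERA, regression toward the mean, then aging and park adjustments) can be sketched as follows. This is a minimal illustration, not SIERA's or any projection system's actual method: `Season` and `project_era` are names invented for this example, and the recency weights, league baseline, ballast innings, aging rate and multiplicative park-factor convention are all hypothetical placeholders.

```python
from dataclasses import dataclass

@dataclass
class Season:
    siera: float  # park-neutral SIERA for one season
    ip: float     # innings pitched that season

# Every constant below is a hypothetical placeholder, not a fitted value.
YEAR_WEIGHTS = (3.0, 2.0, 1.0)  # most recent season first; older seasons are ignored
LEAGUE_SIERA = 4.00             # assumed league-average SIERA to regress toward
BALLAST_IP = 200.0              # assumed innings of league-average pitching mixed in
AGING_PER_YEAR = 0.05           # assumed runs of decline per year of age beyond 30

def project_era(seasons, age, park_factor=1.0):
    """Sketch of a SIERA-based projection.

    Step 1 (how well did he pitch?): a recency- and innings-weighted SIERA.
    Step 2 (how will it change?): regress toward the league mean, then
    adjust for aging and for the park he will pitch in.
    """
    used = seasons[:len(YEAR_WEIGHTS)]
    weight_sum = sum(w * s.ip for w, s in zip(YEAR_WEIGHTS, used))
    if weight_sum == 0:
        return LEAGUE_SIERA * park_factor  # no information: predict league average
    weighted = sum(w * s.ip * s.siera for w, s in zip(YEAR_WEIGHTS, used)) / weight_sum

    # More innings means more trust in the pitcher's own numbers.
    total_ip = sum(s.ip for s in used)
    reliability = total_ip / (total_ip + BALLAST_IP)
    regressed = LEAGUE_SIERA + reliability * (weighted - LEAGUE_SIERA)

    aged = regressed + AGING_PER_YEAR * max(0, age - 30)
    return aged * park_factor  # assumed convention: park_factor > 1 for a hitter-friendly park

# Hypothetical pitcher, most recent season first.
history = [Season(3.50, 180), Season(4.20, 150), Season(4.60, 120)]
print(round(project_era(history, age=32), 2))
```

Keeping the two steps separate mirrors the argument above: step 1 is only as good as the skill estimate fed into it, and step 2 is where the regression that improves RMSE happens.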