Predicting baseball statistics is a tough job, especially when it comes to pitchers.

For every Roy Halladay pitch machine, there are 10 James Shieldses – guys whose ERAs change a run or two every year. Basically, it’s a crapshoot when it comes to figuring out the next ace – or the former ace-in-waiting who’ll lose his job by the All-Star break. Don’t believe me? Consider this: In the past 11 years, only four different hitters have led the major leagues in WAR, while eight different pitchers have led the majors in ERA.

So while the league-leading run producers might be predictable, league-leading run preventers change almost every year. But hope isn’t completely lost when it comes to figuring out the next pitching superstar – or dud. Some pitching stats are *very* predictable, and focusing on those few numbers might lead us to a better system to evaluate talent.

Enter Skill-Interactive ERA, or SIERA, which follows in the footsteps of xFIP by using statistics that don’t change much from year to year: strikeout rate, walk rate and ground-ball rate.

When Eric Seidman and I developed the original SIERA — which appeared at Baseball Prospectus more than a year ago — we didn’t totally appreciate why it worked. Essentially, since it uses regression analysis, it asks the question: What’s the typical ERA for pitchers with similar strikeout, walk and ground-ball rates in recent years?

Our thought was that the main reason SIERA worked so well was that it took into account the interplay between those three statistics. But on further analysis, it turns out that SIERA is successful mainly because it assumes a low BABIP and HR/FB for strikeout pitchers (and for fly ball pitchers, as well).

By including both of these effects, SIERA measures pitcher performance in unique ways. Unlike batters, pitchers generally participate in consecutive plate appearances. Metrics like wOBA can approximate hitter performance using linear weights — effectively assuming that the hitter isn’t responsible for the events immediately before or after his performance. Pitchers also have only a share of the responsibility for the outcome of a plate appearance.

Metrics like FIP and xFIP attempt to isolate the outcomes that pitchers *do* control — like strikeouts, walks and home runs — and then credit them only with those outcomes. Using linear weights and highlighting the run-preventing and run-creating effects of these defense-independent events, the numbers can approximate a pitcher’s contribution to wins and losses far better than a simple ERA formula.

Take Jered Weaver, the American League ERA leader, as an example. xFIP sees Weaver’s 7.81 K/9, 2.06 BB/9 and 48.3% fly ball rate and gives him a 3.32. SIERA sees the same numbers and puts Weaver at 3.15.

So why is SIERA such a Weaver fan? It’s pretty simple: For pitchers, the year-to-year correlation in strikeout rate, walk rate and home run rate is significantly higher than the year-to-year correlation in BABIP. That implies pitchers have a greater ability to control those outcomes.
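To make the year-to-year idea concrete, here is a minimal sketch of how such a correlation could be computed: pair each pitcher's stat in one season with his stat in the next, then take the Pearson correlation across all pairs. The data layout, the function names and the toy numbers are all assumptions made up for illustration; none of them are real pitcher data or the method behind the figures cited here.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def year_to_year(seasons, stat):
    """Correlate a stat in year N with the same pitcher's stat in year N+1.

    `seasons` maps (pitcher, year) -> dict of stats (a hypothetical layout).
    """
    this_year, next_year = [], []
    for (pitcher, year), stats in seasons.items():
        following = seasons.get((pitcher, year + 1))
        if following is not None:
            this_year.append(stats[stat])
            next_year.append(following[stat])
    return pearson(this_year, next_year)


# Made-up numbers purely to exercise the functions; not real pitchers.
toy_seasons = {
    ("A", 2009): {"k_rate": 0.25, "babip": 0.280},
    ("A", 2010): {"k_rate": 0.26, "babip": 0.310},
    ("B", 2009): {"k_rate": 0.15, "babip": 0.300},
    ("B", 2010): {"k_rate": 0.16, "babip": 0.290},
    ("C", 2009): {"k_rate": 0.20, "babip": 0.310},
    ("C", 2010): {"k_rate": 0.19, "babip": 0.295},
}

k_corr = year_to_year(toy_seasons, "k_rate")
babip_corr = year_to_year(toy_seasons, "babip")
```

A stat whose correlation sits close to 1 is one that pitchers repeat from season to season; a stat whose correlation sits near 0 is mostly noise. Three toy pitchers prove nothing, so treat this as a recipe rather than a result.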

Understanding this, Tom Tango developed FIP, which credits a pitcher with the effects of strikeouts, walks and home runs, while assuming that the player had a league-average BABIP. Taking that a step further, xFIP also assumes a league-average home-run-per-fly-ball rate. Nate Silver took another approach by creating QERA, which uses regression analysis to estimate ERA. And last year, Eric Seidman and I developed SIERA, which also uses regression analysis to predict earned run average but makes several tweaks to Silver’s original version.
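For readers who want to see the FIP and xFIP ideas as arithmetic, here is a short sketch using the commonly published form of FIP (13 per home run, 3 per walk or hit batter, minus 2 per strikeout, divided by innings, plus a constant). The constant and the league HR/FB rate change from year to year, so the defaults below are placeholder assumptions, and the pitcher line at the bottom is hypothetical.

```python
def fip(hr, bb, hbp, k, ip, constant=3.10):
    """Fielding Independent Pitching from counting stats.

    `ip` is innings as a true decimal (200 1/3 innings -> 200.333...).
    `constant` puts FIP on the ERA scale; it is league- and year-specific,
    and 3.10 is only a placeholder assumption.
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


def xfip(fb, bb, hbp, k, ip, lg_hr_per_fb=0.105, constant=3.10):
    """xFIP: FIP with actual home runs replaced by expected home runs.

    Expected home runs = fly balls allowed * league HR/FB rate.
    The 0.105 default is a placeholder assumption, not an official value.
    """
    expected_hr = fb * lg_hr_per_fb
    return fip(expected_hr, bb, hbp, k, ip, constant)


# Hypothetical pitcher line, not a real season.
example_fip = fip(hr=20, bb=50, hbp=5, k=200, ip=200.0)
example_xfip = xfip(fb=200, bb=50, hbp=5, k=200, ip=200.0, lg_hr_per_fb=0.10)
```

Note what is missing from both functions: hits on balls in play never appear, which is how FIP and xFIP end up treating every pitcher as if he had a league-average BABIP.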

As a result, SIERA gives more credit for a strikeout than FIP and xFIP do, and less blame for a fly ball. It not only credits pitchers with the run-dampening effect of a strikeout, but also assumes that high-strikeout pitchers allow fewer hits and home runs than FIP and xFIP would assume, because they do. While xFIP and FIP both assumed that Weaver’s 2010 BABIP was .297, SIERA assumed Weaver had a BABIP similar to that of other pitchers with high strikeout and fly-ball rates – about .278. His actual BABIP was .277.

Weaver isn’t an exception. Last year’s strikeout-rate leader, Jon Lester, had a BABIP of .291. Tim Lincecum led the league in strikeout rate in 2009, and his BABIP was .288. And while Lincecum’s BABIP was .310 in 2008, the 2007 strikeout-rate leader was Erik Bedard, who had a BABIP of .284.

In fact, look at the BABIPs for the league-leaders in strikeout rate during the past nine years:

| Year | Pitcher | SO/PA | BABIP |
| --- | --- | --- | --- |
| 2010 | Jon Lester | 26.1% | .291 |
| 2009 | Tim Lincecum | 28.8% | .288 |
| 2008 | Tim Lincecum | 28.6% | .310 |
| 2007 | Erik Bedard | 30.2% | .284 |
| 2006 | Johan Santana | 26.5% | .271 |
| 2005 | Mark Prior | 26.8% | .281 |
| 2004 | Randy Johnson | 30.1% | .267 |
| 2003 | Kerry Wood | 30.0% | .275 |
| 2002 | Randy Johnson | 32.3% | .291 |

Pitchers who allow less contact see weaker contact from hitters. While the batter, the defense and luck have a larger influence on whether a batted ball lands for a hit, pitchers have a role, too. Pitchers who allow ground balls also allow more hits, but fewer extra-base hits on balls in play.

Looking at all 3,328 pitcher-seasons (with at least 40 innings pitched) between 2002 and 2010, I sorted players into four groups by strikeout rate. BABIP falls with each step up in strikeout rate, and HR/FB either falls or holds steady.

| STRIKEOUT GROUP | BABIP | HR/FB |
| --- | --- | --- |
| HIGH | .286 | 9.1% |
| MEDIUM-HIGH | .295 | 10.2% |
| MEDIUM-LOW | .298 | 10.7% |
| LOW | .301 | 10.7% |
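A grouping like the one in this table could be reproduced along these lines: rank the pitcher-seasons by strikeout rate, cut the ranking into four groups, and average BABIP and HR/FB within each. This is only a sketch under assumptions. The equal-sized groups, the unweighted averages, the field names and the toy records are all illustrative choices, and the cutoffs behind the table may have been different.

```python
def summarize_by_strikeout_group(seasons, n_groups=4):
    """Rank pitcher-seasons by strikeout rate and average BABIP and HR/FB per group.

    `seasons` is a list of dicts with hypothetical keys
    "k_rate", "babip" and "hr_per_fb". Group 0 has the highest strikeout rates.
    """
    ranked = sorted(seasons, key=lambda s: s["k_rate"], reverse=True)
    size, extra = divmod(len(ranked), n_groups)
    summary = []
    start = 0
    for i in range(n_groups):
        # Spread any remainder across the first `extra` groups.
        end = start + size + (1 if i < extra else 0)
        group = ranked[start:end]
        start = end
        if not group:
            continue
        summary.append({
            "group": i,
            "n": len(group),
            # Simple unweighted means; weighting by balls in play is another option.
            "babip": sum(s["babip"] for s in group) / len(group),
            "hr_per_fb": sum(s["hr_per_fb"] for s in group) / len(group),
        })
    return summary


# Made-up pitcher-seasons purely to exercise the function.
toy = [
    {"k_rate": 0.30, "babip": 0.280, "hr_per_fb": 0.090},
    {"k_rate": 0.28, "babip": 0.290, "hr_per_fb": 0.092},
    {"k_rate": 0.24, "babip": 0.295, "hr_per_fb": 0.100},
    {"k_rate": 0.22, "babip": 0.295, "hr_per_fb": 0.104},
    {"k_rate": 0.18, "babip": 0.300, "hr_per_fb": 0.106},
    {"k_rate": 0.17, "babip": 0.296, "hr_per_fb": 0.108},
    {"k_rate": 0.13, "babip": 0.300, "hr_per_fb": 0.107},
    {"k_rate": 0.12, "babip": 0.302, "hr_per_fb": 0.107},
]
groups = summarize_by_strikeout_group(toy)
```

With real data, the 40-inning minimum would be applied before calling the function, so that tiny samples don't end up in any of the groups.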

Pitchers do have some control over their BABIPs, but there’s too much noise in their actual numbers to infer their true skill levels. Weaver has had BABIPs as low as .238 and as high as .316 during his career. Still, even small samples of peripherals say more about a pitcher’s BABIP skill than his BABIP alone can.

On the extreme end, SIERA assumed the lowest BABIPs in 2010 for:

Jered Weaver (.278 predicted, .277 actual)

Ted Lilly (.280 predicted, .254 actual)

Phil Hughes (.284 predicted, .275 actual)

Colby Lewis (.284 predicted, .277 actual)

Matt Cain (.285 predicted, .254 actual)

It assumed the highest BABIPs for:

Jon Garland (.307 predicted, .268 actual)

Paul Maholm (.306 predicted, .332 actual)

Fausto Carmona (.306 predicted, .284 actual)

Rick Porcello (.305 predicted, .308 actual)

Clay Buchholz (.305 predicted, .263 actual)
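One rough way to read the two lists above is to compare how far each assumption missed on average. The sketch below copies the ten predicted and actual BABIPs from those lists and measures the mean absolute error of SIERA's implied BABIP against a flat .297, the figure FIP and xFIP assumed for Weaver's 2010 season. Applying that same flat .297 to all ten pitchers is my assumption for the sake of comparison.

```python
# (pitcher, SIERA-implied BABIP, actual 2010 BABIP), copied from the two lists.
pitchers = [
    ("Jered Weaver", 0.278, 0.277),
    ("Ted Lilly", 0.280, 0.254),
    ("Phil Hughes", 0.284, 0.275),
    ("Colby Lewis", 0.284, 0.277),
    ("Matt Cain", 0.285, 0.254),
    ("Jon Garland", 0.307, 0.268),
    ("Paul Maholm", 0.306, 0.332),
    ("Fausto Carmona", 0.306, 0.284),
    ("Rick Porcello", 0.305, 0.308),
    ("Clay Buchholz", 0.305, 0.263),
]

# Assumption: use the .297 cited for Weaver as the flat guess for everyone.
LEAGUE_BABIP = 0.297


def mean_abs_error(pairs):
    """Average absolute gap between (predicted, actual) pairs."""
    pairs = list(pairs)
    return sum(abs(pred - actual) for pred, actual in pairs) / len(pairs)


siera_mae = mean_abs_error((pred, actual) for _, pred, actual in pitchers)
flat_mae = mean_abs_error((LEAGUE_BABIP, actual) for _, _, actual in pitchers)
```

On these ten pitchers, the SIERA-implied figures miss by about .021 on average and the flat guess by about .027. Ten hand-picked extremes are far too few to settle anything, so this is a sanity check and not a test of the metric.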

While Weaver’s strikeout-inducing prowess makes him an extreme example, SIERA is on the money more often than not. Time after time, a player’s future ERA is closer to his SIERA than to any similar pitching metric. SIERA does this not by factoring in projection trends, like reversion to the mean and aging, but by answering the most basic question that can be asked about a pitcher: How well did the guy actually pitch?

The next four stories will:

1. Discuss the previous research I’ve done on pitching and how SIERA utilizes it. Discuss the SIERA changes for FanGraphs. Introduce the new formula and explain why it works.

2. Discuss pitchers with large differences in their xFIPs and SIERAs and explain what they teach us about pitching.

3. Test SIERA against different ERA estimators.

4. Discuss attempted SIERA changes that didn’t work.