Poll: Which Group of Pitchers Performs Better?

If you are familiar with my feelings on pitchers, you know that I put little stock in ERA over smaller samples. Instead, I choose to largely ignore perhaps the most accepted metric to describe a pitcher’s performance by focusing on his peripherals and ERA estimators, my favorite of which is SIERA. Sure, over a long career, ERA is most certainly the better of the two to judge a player’s performance, but at the all-star break of a season, give me SIERA. With most starting pitchers having thrown only about 120 innings, the sample size remains far too small for ERA to provide significant predictive value over the remainder of the season.

Of course, it’s hard to ignore ERA. Do you think a Matt Cain owner wants to hear that his pitcher has simply suffered from some poor fortune? Of course not. It’s human nature to place greater emphasis on the most recent past (recency bias), and since it’s ultimately the earned runs that count, not the imaginary expected runs SIERA believes a pitcher should have allowed, ERA is the statistic people focus on.

So as we sit here and wonder how Jeff Locke and Travis Wood parlayed fantastic defensive support into All-Star appearances, I decided it would be fun to play a little game. Below are two groups of pitchers. Group A is composed of the 10 pitchers whose ERAs sit furthest below their SIERA marks. Group B, on the other hand, features the 10 pitchers whose ERAs exceed their SIERA marks by the largest amount. The game is simple: vote for which group of pitchers will post the better ERA from the All-Star break through the end of the season. Don’t forget to also vote for the range each group’s ERA will fall into over the rest of the season.

I will be closing the polls right before the first pitch when games resume on Friday. At the end of the season, I will revisit this post and publish the results of the voting, as well as the performances of the pitcher groups. For the record, I vote for group B.

Group A – The SIERA Outperformers

Pitcher IP K% BB% BABIP LOB% HR/FB ERA SIERA ERA-SIERA
Jeff Locke 109.0 16.7% 10.8% 0.228 83.3% 6.7% 2.15 4.56 -2.41
Travis Wood 122.2 17.6% 7.8% 0.227 76.5% 5.8% 2.79 4.45 -1.66
Bartolo Colon 126.2 14.0% 3.0% 0.287 80.2% 6.0% 2.70 4.19 -1.49
Mike Leake 117.0 15.0% 5.5% 0.260 79.6% 10.0% 2.69 4.11 -1.42
Jason Marquis 112.1 14.7% 13.2% 0.256 79.7% 19.6% 3.77 5.11 -1.34
Clayton Kershaw 145.1 24.8% 6.3% 0.238 78.7% 5.4% 1.98 3.24 -1.26
Patrick Corbin 130.1 21.2% 6.4% 0.246 81.9% 7.8% 2.35 3.61 -1.26
Hiroki Kuroda 118.2 17.7% 5.1% 0.252 82.6% 9.8% 2.65 3.88 -1.23
Jorge de la Rosa 109.1 16.6% 8.5% 0.294 76.4% 6.7% 3.21 4.32 -1.11
Bronson Arroyo 123.2 13.7% 4.6% 0.254 78.9% 11.6% 3.42 4.41 -0.99
Average 121.2 17.4% 7.0% 0.253 79.6% 8.8% 2.74 4.15 -1.40

Group B – The SIERA Underperformers

Pitcher IP K% BB% BABIP LOB% HR/FB ERA SIERA ERA-SIERA
Joe Blanton 112.1 18.2% 5.1% 0.343 70.6% 18.1% 5.53 3.85 1.68
Wade Davis 94.2 19.9% 9.4% 0.381 66.3% 13.5% 5.89 4.21 1.68
Rick Porcello 99.1 19.4% 4.6% 0.317 65.4% 15.7% 4.80 3.15 1.65
Edinson Volquez 109.2 18.8% 10.2% 0.342 63.3% 8.8% 5.74 4.31 1.43
Edwin Jackson 100.1 19.5% 8.1% 0.320 62.3% 10.6% 5.11 3.83 1.28
Roberto Hernandez 108.1 18.2% 5.4% 0.304 69.7% 21.2% 4.90 3.63 1.27
Matt Cain 112.0 22.1% 7.9% 0.257 63.4% 12.7% 5.06 3.84 1.22
Ian Kennedy 108.0 19.1% 8.4% 0.298 67.1% 12.6% 5.42 4.26 1.16
Jeremy Hellickson 117.2 20.3% 5.5% 0.296 66.9% 10.8% 4.67 3.74 0.93
Yovani Gallardo 113.2 18.3% 8.7% 0.310 65.7% 12.5% 4.83 4.10 0.73
Average 107.2 19.3% 7.3% 0.315 65.9% 13.6% 5.17 3.88 1.29
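
A sketch of how group averages like these can be built: weight each pitcher's ERA and SIERA by innings pitched, which is the same as pooling earned runs over pooled innings. It is an assumption that the Average rows were computed this way (the post does not say), and the sketch lands within about 0.02 of them rather than matching exactly. The `ip_to_innings` helper (a name chosen here for illustration) converts the IP column's baseball notation, where .1 and .2 mean one and two outs.

```python
# Sketch: innings-weighted group averages from the two tables above.
# Assumption: the published Average rows are innings-weighted; the post does not say.

def ip_to_innings(ip: str) -> float:
    """Convert baseball IP notation ("122.2" = 122 and 2/3 innings) to a float."""
    whole, _, outs = ip.partition(".")
    return int(whole) + int(outs or "0") / 3

# (pitcher, IP, ERA, SIERA), copied from the tables
GROUP_A = [
    ("Jeff Locke", "109.0", 2.15, 4.56), ("Travis Wood", "122.2", 2.79, 4.45),
    ("Bartolo Colon", "126.2", 2.70, 4.19), ("Mike Leake", "117.0", 2.69, 4.11),
    ("Jason Marquis", "112.1", 3.77, 5.11), ("Clayton Kershaw", "145.1", 1.98, 3.24),
    ("Patrick Corbin", "130.1", 2.35, 3.61), ("Hiroki Kuroda", "118.2", 2.65, 3.88),
    ("Jorge de la Rosa", "109.1", 3.21, 4.32), ("Bronson Arroyo", "123.2", 3.42, 4.41),
]
GROUP_B = [
    ("Joe Blanton", "112.1", 5.53, 3.85), ("Wade Davis", "94.2", 5.89, 4.21),
    ("Rick Porcello", "99.1", 4.80, 3.15), ("Edinson Volquez", "109.2", 5.74, 4.31),
    ("Edwin Jackson", "100.1", 5.11, 3.83), ("Roberto Hernandez", "108.1", 4.90, 3.63),
    ("Matt Cain", "112.0", 5.06, 3.84), ("Ian Kennedy", "108.0", 5.42, 4.26),
    ("Jeremy Hellickson", "117.2", 4.67, 3.74), ("Yovani Gallardo", "113.2", 4.83, 4.10),
]

def weighted_averages(rows):
    """Return (total innings, innings-weighted ERA, innings-weighted SIERA)."""
    innings = [ip_to_innings(ip) for _, ip, _, _ in rows]
    total = sum(innings)
    era = sum(w * row[2] for w, row in zip(innings, rows)) / total
    siera = sum(w * row[3] for w, row in zip(innings, rows)) / total
    return total, era, siera

for label, rows in (("Group A", GROUP_A), ("Group B", GROUP_B)):
    total, era, siera = weighted_averages(rows)
    print(f"{label}: {total:.0f} IP, ERA {era:.2f}, SIERA {siera:.2f}")
```

Summing the innings this way gives 1,215 IP for Group A and 1,076 IP for Group B.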



Mike Podhorzer produces player projections using his own forecasting system and is the author of the eBook Projecting X: How to Forecast Baseball Player Performance, which teaches you how to project players yourself. His projections helped him win the inaugural 2013 Tout Wars mixed draft league. He also sells beautiful photos through his online gallery, Pod's Pics. Follow Mike on Twitter @MikePodhorzer and contact him via email.

33 Responses to “Poll: Which Group of Pitchers Performs Better?”

  1. Sky Kalkman says:

    After the year, will you also look at how their peripherals have changed, please? I have this pet theory that K and BB rates are in somewhat of an equilibrium with BABIP and HR rate. Perhaps as BABIP regresses for Group A, their peripherals will improve, so that they don’t fully regress to their current SIERAs…

    • Giovani says:


      • Sky Kalkman says:

        Throwing more strikes reduces walks, but increases BABIP. Throwing more borderline breaking stuff increases walks but also increases strikeouts. There are trade-offs for where and what you throw. My theory is that some pitchers haven’t reached their equilibrium. SIERA overachievers are probably lucky, but perhaps they are also the ones showing a bit of a (mostly unsustainable) BABIP skill. Going forward, they’ll adjust slightly, seeing their BABIP rise but also seeing improvement in their peripherals. My guess is that this happens more with MLB rookies, as they adjust to the tougher level. Guys who come up and show nice peripherals but get killed in BABIP need to be willing to sacrifice peripherals in order to achieve an MLB-level BABIP.

      • Yes, I’ll include as many of the stats from this article’s tables as I can in the final one.

  2. pudieron89 says:

    There aren’t enough big-name pitchers in Group B for it to really compete with Kershaw/Colon/Corbin/Kuroda in Group A. The SIERA underperformers who do have name value (Cain/Kennedy/Jackson/Gallardo) all seem to be dealing with mechanical issues that are inflating their BABIPs, and I’m not sure regression alone will bring them back to their career norms.

    I didn’t mention Hellickson as name value in Group B because I find it funny that, just as we were ready to put him in the same class as Cain as a serial sabermetric outperformer, both of them are having down years with respect to traditional metrics. That’s not as peculiar for Cain, since his FIP is inflated and he’s giving up a lot more homers. But Hellickson is posting his best strikeout and walk rates since his 2010 callup, when he flashed promise over ~40 innings, and, conversely, his ERA now matches his FIP from the two seasons since then, when he got by with low-ERA, low-BABIP, pitch-to-contact stuff.

    • MLB Rainmaker says:

      That’s one of the problems with sabermetric analysis: it too often relies on the assumption that a given player has a talent baseline and will perform to that baseline.

  3. Wobatus says:

    I voted Group A, with a projected ERA between 3.75 and 3.99, and Group B with 4.25-4.49. There’s just a bunch of guys in B with a history of mid-4 ERAs or struggling this year (OK, there’s the recency bias lurking). B may get it down to 4.20 or so, but I decided to go with the worse scenario.

  4. jim says:

    Group B will rely on Porcello reaching that super-low SIERA. I think his rest-of-season peripherals will be closer to his usual SIERA of around 4, but if I am wrong, Group B should win out.

  5. Giovani says:

    This is fun.

  6. Jordan says:

    It would help put everything in perspective if we knew the average ERA and average SIERA of each group.

  7. Mister says:

    I feel wrong for doing this, but I’ve got to go with Group A. I believe in SIERA, but when you pick the 20 guys with the biggest ERA/SIERA disparities, I suspect you will end up with many pitchers who have real reasons for over- or underperforming their SIERAs (defense, home park, etc.). And 3.88 vs. 4.15 is not a large SIERA difference between the two groups. If it were more like 3.7 vs. 4.2, I’d have gone with Group B.

    That said, it ought to be close, and both groups’ 2nd-half ERAs should be much closer to their 1st-half SIERAs than to their 1st-half ERAs. I think Group A finishes in 3.90-4.00 and Group B finishes in 4.00-4.10.

  8. Nate says:

    Quick question. How would I discuss SIERA with someone in person? In my mind I say ‘sierra’ as in Sierra Mist, but should it be ‘S-I-E-R-A’?

  9. Sky Kalkman says:

    Also, what do ZiPS and Steamer have to say about each group’s rest-of-season performance? When you have a projection, why not use a projection?

    • Ooohhh, good question Sky. I’ll look into this.

      • Wobatus says:

        Group B’s ZiPS ROS ERA is 4.12.

      • Wobatus says:

        Group A’s ZiPS ROS ERA is 4.16. Pretty close.

        ZiPS just seems off on some of these. Marquis and Leake are both projected to have ERAs worse by a good margin than their career and year-to-date ERAs, xFIPs, and SIERAs. Leake’s projected ROS ERA: 4.46; season/career ERA 2.69/3.93, xFIP 3.96/3.88, and SIERA 4.11/4.03. Marquis’s and Kuroda’s projected ERAs seem a tad high on that scale, too.

      • Thanks for saving me the work! Interesting how close the two groups are projected at.

  10. jfree says:

    The biggest problem I have with pitching projections is that a season is nowhere near a large enough sample to expect regression toward a single data point (the mean of some estimator). At best, SIERA estimates (BABIP too) have a range of variance around them (the real-world difference between actual ERA and the SIERA estimate, measured over whatever time period).

    That range is almost certainly tighter over the course of a season than over the course of a half-season. But any expected regression over a measurement period will, IMO, tend to be toward the outer limits of that range rather than toward the single-point mean.

    Basically, it is a different form of bias — treating a probabilistic data point as a deterministic one. You see this bias all the time at work. Give an executive a spreadsheet with nice discrete numbers — and watch the executive’s confidence grow to the point of hubris. The mere presence of numbers creates absolute truthity once those numbers hit the human brain.

    • Wobatus says:

      You’ve got over a thousand innings in each group. That’s a little larger data sample.
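
The sample-size question in this exchange can be made concrete with a toy simulation. It assumes, as a simplification rather than a claim about how runs are actually distributed, that a pitcher with a true 4.00 ERA allows earned runs in each inning as an independent Poisson draw with mean 4.00/9. Under that model the spread of observed ERA shrinks like 9 * sqrt(λ / innings): roughly 0.55 runs at 120 innings (about a starter's first half) versus roughly 0.18 at 1,100 innings (about the size of one of the groups above). Real run scoring is lumpier than Poisson (big innings), so actual spreads are likely wider than this sketch suggests.

```python
import math
import random
import statistics

def poisson(lam: float, rng: random.Random) -> int:
    """Draw from Poisson(lam) using Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k

def era_spread(innings: int, true_era: float = 4.00, trials: int = 500, seed: int = 2013) -> float:
    """Standard deviation of simulated ERA over `innings` innings.

    Toy model (assumption): earned runs in each inning are independent Poisson(true_era / 9).
    """
    rng = random.Random(seed)
    runs_per_inning = true_era / 9
    eras = []
    for _ in range(trials):
        earned_runs = sum(poisson(runs_per_inning, rng) for _ in range(innings))
        eras.append(9 * earned_runs / innings)
    return statistics.pstdev(eras)

for n in (120, 1100):
    analytic_sd = 9 * math.sqrt((4.00 / 9) / n)  # Poisson variance equals its mean
    print(f"{n:>5} IP: simulated SD {era_spread(n):.2f}, analytic SD {analytic_sd:.2f}")
```

Pooling innings narrows the range but never collapses it to a single point, which is compatible with both comments: the group-level sample is much larger, yet a predictor like SIERA still comes with a band of plausible outcomes.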

  11. Ruki Motomiya says:

    I voted for Group A and here is why:

    Clayton Kershaw is in Group A, and he seems to have an established ability to keep his BABIP and home runs below average: his ERA over 1089.1 innings is .30 below his FIP and .60 below his xFIP, and he has a career BABIP of .270 and a career HR rate of 6.6%. Because of this, Kershaw strikes me as someone likely to significantly beat his SIERA and other predictors, especially if his K/9 goes back up over 9 (he lowered his walk rate!)

    Jeff Locke’s K/9 is significantly worse than it was in the minor leagues, so it could improve.

    Hiroki Kuroda has a career BABIP of .278 over 1037.1 innings, so his BABIP could stay below league average for the rest of the year. In addition, his SIERA would put him at the average of Group B, so at least he is unlikely to hurt Group A.

    Patrick Corbin strikes me as the kinda guy who could keep his rate numbers a little below the predictors: solid Ks, low walks, that sort of thing, even if he is not THIS good.

    Bronson Arroyo is another player who has actually demonstrated BABIP controlling abilities: Compared to what I hear is the expected .300, he has .281 over 2200 innings. His career ERA is .40 lower than his FIP/xFIP. And his ERA is right in line with last year’s and should only see a minimal rise.

    Jason Marquis sucks, though.

    And as for Group B…

    Yovani I always felt was kinda overrated, so I have bias. His FB is gone, his K/9 is plummeting and his control is shot. Even predictors don’t say he’ll be necessarily /good/, just better than his 4.83 ERA.

    I’ve generally felt pitchers with a higher walk rate seem more likely to underperform their peripherals (More baserunners = more likely that sequencing can go poorly and score more runs than predicted) and Volquez has no control. SIERA predicts him to do worse than the average of Group A!

    Rick Porcello is showing strikeout power we’ve never seen from him before. In addition, he has never had a high LOB% aside from way back in 2009: his career LOB% is 68.9% in 791.0 innings. So he seems likely to slightly underperform his peripherals, and there are questions about whether his K/9 will hold. (His career ERA is 4.59, compared to a 4.17 FIP and 3.97 xFIP.)

    Wade Davis has not established that he can strike out this many batters as a starter, even if I want to believe it is real.

    E-Jax has always been bad when he walks too many guys and good when he keeps it under 3. With a career 3.50 BB/9, plus the fact he has underperformed his peripherals since 2010, I see him as not fully meeting his SIERA.

    I’m feeling bullish on some players in Group A as well (I think de la Rosa could pick it up in the second half), so I predicted a total ERA of 3.50 to 3.74 for Group A. I went with 4.00 to 4.24 for Group B.

    As an aside, another reason people might look at ERA is that a player might not provide value even if he pitches well from here on out. For example, if Matt Cain posts a 3.32 ERA in the second half, as ZiPS projects, his total ERA contribution to your team would still be a high 4.31. And if you’re in a head-to-head league, having so many bad weeks could really hurt you! So even if SIERA better predicts the future, first-half ERA still changes a pitcher’s ultimate value.
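
The 4.31 figure comes from combining the two halves by innings, since ERA is 9 × earned runs / innings and innings-weighting two ERAs pools their earned runs. The comment does not say how many second-half innings it assumes, so the sketch below solves for the implied number (about 85) instead of guessing; Cain's 5.06 ERA over 112.0 innings is from the table, and the 3.32 and 4.31 are the figures quoted in the comment. Both function names are chosen here for illustration.

```python
def combined_era(era1: float, ip1: float, era2: float, ip2: float) -> float:
    """Season ERA from two partial-season ERAs, weighted by innings (IP as true innings)."""
    return (era1 * ip1 + era2 * ip2) / (ip1 + ip2)

def implied_second_half_ip(era1: float, ip1: float, era2: float, season_era: float) -> float:
    """Second-half innings at `era2` that would make the season ERA equal `season_era`.

    Solves (era1 * ip1 + era2 * x) / (ip1 + x) = season_era for x.
    """
    return ip1 * (era1 - season_era) / (season_era - era2)

# Matt Cain: 5.06 ERA over 112.0 first-half innings; 3.32 second-half ERA; 4.31 season ERA.
ip2 = implied_second_half_ip(5.06, 112.0, 3.32, 4.31)
print(round(ip2, 1))                                   # 84.8
print(round(combined_era(5.06, 112.0, 3.32, ip2), 2))  # 4.31
```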

    • Wobatus says:

      Paragraph 10 is dead on. Pitchers with higher walk rates underperform their peripherals, which means the formulas need to be tweaked: higher walk rates are worse than SIERA and xFIP account for. Lower walk rates are great unless coupled with high BABIPs or homer rates.

      Of course, I have no actual data to back this up.

      • Ruki Motomiya says:

        That sounds like an interesting idea for an article: Do players with a higher walk rate underperform their peripheral numbers at a higher rate than players with an average or better walk rate? I doubt I have the skill to do such a thing, but it is an interesting idea.
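
A minimal sketch of the mechanics of that test, using the 20 pitchers from the tables: split by BB% and compare the average ERA-minus-SIERA gap in each bucket. The 8.0% cutoff is a hypothetical threshold chosen only for illustration. These 20 were selected for having the most extreme gaps, so the output says nothing about the theory; a real test would use every qualified pitcher and, ideally, check whether high-walk pitchers' *later* ERAs land above their SIERAs.

```python
from statistics import mean

# (pitcher, BB%, ERA minus SIERA), copied from the two tables above
ROWS = [
    ("Jeff Locke", 10.8, -2.41), ("Travis Wood", 7.8, -1.66),
    ("Bartolo Colon", 3.0, -1.49), ("Mike Leake", 5.5, -1.42),
    ("Jason Marquis", 13.2, -1.34), ("Clayton Kershaw", 6.3, -1.26),
    ("Patrick Corbin", 6.4, -1.26), ("Hiroki Kuroda", 5.1, -1.23),
    ("Jorge de la Rosa", 8.5, -1.11), ("Bronson Arroyo", 4.6, -0.99),
    ("Joe Blanton", 5.1, 1.68), ("Wade Davis", 9.4, 1.68),
    ("Rick Porcello", 4.6, 1.65), ("Edinson Volquez", 10.2, 1.43),
    ("Edwin Jackson", 8.1, 1.28), ("Roberto Hernandez", 5.4, 1.27),
    ("Matt Cain", 7.9, 1.22), ("Ian Kennedy", 8.4, 1.16),
    ("Jeremy Hellickson", 5.5, 0.93), ("Yovani Gallardo", 8.7, 0.73),
]

HIGH_WALK_CUTOFF = 8.0  # hypothetical threshold, for illustration only

def gap_by_walk_rate(rows, cutoff=HIGH_WALK_CUTOFF):
    """Mean ERA-minus-SIERA for pitchers at/above vs. below the BB% cutoff, plus group sizes."""
    high = [gap for _, bb, gap in rows if bb >= cutoff]
    low = [gap for _, bb, gap in rows if bb < cutoff]
    return mean(high), mean(low), len(high), len(low)

high, low, n_high, n_low = gap_by_walk_rate(ROWS)
print(f"BB% >= {HIGH_WALK_CUTOFF}: n={n_high}, mean ERA-SIERA {high:+.2f}")
print(f"BB% <  {HIGH_WALK_CUTOFF}: n={n_low}, mean ERA-SIERA {low:+.2f}")
```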

  12. Dingbat says:

    Hmm, I voted for group B, but then I decided to check out last year’s data:

    1st Half ERA (league-wide): 4.00
    1st Half SIERA (league-wide): 3.91
    2nd Half ERA (league-wide): 4.03

    Looking team-by-team, it looks like 1st half ERA outperforms 1st half SIERA as a predictor for 2nd half ERA for 17 teams, SIERA does better for 11 teams, and they perform the same for two teams.

    For what it’s worth, first half xFIP last year was 4.04, suggesting that it would do better than SIERA as a predictor. Of course, we’d need to look at more than just one year’s worth of data to compare the predictive value of these stats, but what I saw for 2012 is making me reconsider my vote.

    • Dingbat says:

      Okay, so I went back and looked at five years of data, comparing the success of FIP, xFIP, tERA, SIERA, and ERA in the “first half” at predicting ERA in the “second half.” For each stat, I took the absolute value of the difference between the 1st-half stat and 2nd-half ERA and then took the average difference over five years, 2008-2012. Here’s what I found:

      FIP: 0.11
      xFIP: 0.09
      tERA: 0.20
      SIERA: 0.12
      ERA: 0.15

      Based on this (still limited) sample, it looks like xFIP is the best predictor of 2nd-half ERA, followed by FIP, SIERA, ERA, and tERA. For the purposes of this article, it should be noted that SIERA outperformed ERA (by between 0.07 and 0.11) in three out of the five years, while ERA only outperformed SIERA in one year.
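
The averaging described here is a mean absolute error: for each first-half stat, average |stat - 2nd-half ERA| across seasons and rank the stats by that error. A minimal sketch follows; only the 2012 league-wide first-half figures quoted in the earlier comment are filled in, because the other seasons' inputs are not given in the thread. The same functions (names chosen here for illustration) work team by team, as in the earlier 17-vs-11 comparison, by passing one entry per team.

```python
def mean_abs_error(predictions, actuals):
    """Average of |prediction - actual| over paired seasons (or teams)."""
    return sum(abs(p - a) for p, a in zip(predictions, actuals)) / len(actuals)

def rank_predictors(first_half, second_half_era):
    """Sort first-half stats by how closely they tracked 2nd-half ERA (lowest error first)."""
    errors = {
        stat: mean_abs_error(values, second_half_era)
        for stat, values in first_half.items()
    }
    return sorted(errors.items(), key=lambda item: item[1])

# 2012 league-wide values from the thread; further seasons would be appended to each list.
first_half = {"ERA": [4.00], "SIERA": [3.91], "xFIP": [4.04]}
second_half_era = [4.03]
for stat, err in rank_predictors(first_half, second_half_era):
    print(f"{stat}: {err:.2f}")
```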

  13. Dingbat says:

    Happy to do it. I’ve done some follow-up work on this that I think I’ll turn into a Community post, but here’s a preview of my results. I basically reproduced what you did here, selecting the 10 top underperformers (Under) and overperformers (Over) based on the gap between 1st-half ERA (ERA1) and 1st-half SIERA (SIERA1), and then compared each measure to 2nd-half ERA (ERA2). Here’s what I found for the past three years (all are average values for each group):

    2012 ERA1, SIERA1, ERA2
    Under: 5.42, 3.89, 3.94
    Over: 2.63, 4.22, 4.30

    2011 ERA1, SIERA1, ERA2
    Under: 5.21, 4.07, 3.90
    Over: 2.70, 4.25, 4.55

    2010 ERA1, SIERA1, ERA2
    Under: 5.32, 4.01, 4.29
    Over: 2.71, 4.23, 4.13

    Hopefully the formatting showed up okay. In 2012 and 2011, the Unders did better in the 2nd half than the Overs did, while in 2010, the Overs still outperformed the Unders in the second half. In all three years, though, the 2nd-half ERAs much more closely resembled the 1st-half SIERAs than the 1st-half ERAs. This is what you’d expect when comparing a statistic that accounts for regression to the mean to one that does not. Once I work through a few more years, I’ll make a full Community post out of this and share some more interesting observations.
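
The two claims in this comment can be checked directly from the three years of group averages: which group had the lower 2nd-half ERA each year, and whether SIERA1 missed ERA2 by less than ERA1 did. The sketch below uses only the numbers above; the dictionary layout and function name are illustrative choices.

```python
# (ERA1, SIERA1, ERA2) group averages, from the comment above
RESULTS = {
    2012: {"Under": (5.42, 3.89, 3.94), "Over": (2.63, 4.22, 4.30)},
    2011: {"Under": (5.21, 4.07, 3.90), "Over": (2.70, 4.25, 4.55)},
    2010: {"Under": (5.32, 4.01, 4.29), "Over": (2.71, 4.23, 4.13)},
}

def summarize(results):
    """Per year: each group's (ERA1 miss, SIERA1 miss) and the group with the lower ERA2."""
    summary = {}
    for year, groups in results.items():
        misses = {
            group: (abs(era1 - era2), abs(siera1 - era2))
            for group, (era1, siera1, era2) in groups.items()
        }
        better_second_half = min(groups, key=lambda g: groups[g][2])
        summary[year] = (misses, better_second_half)
    return summary

for year, (misses, better) in summarize(RESULTS).items():
    for group, (era_miss, siera_miss) in misses.items():
        print(f"{year} {group}: ERA1 missed by {era_miss:.2f}, SIERA1 by {siera_miss:.2f}")
    print(f"{year}: lower 2nd-half ERA -> {better}")
```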

    • Awesome, looking forward to it. Looks like these results illustrate exactly what I was hoping they would.

      • Dingbat says:

        Yeah – it’s not too surprising that when you look at two groups of outliers in the 1st half, they converge in the 2nd half. I think the key is that the difference between the 1st-half SIERAs for the two groups is fairly small, so given expected variance between observed 2nd-half ERA and predicted ERA (as predicted by SIERA), sometimes you’ll get the Unders outperforming the Overs, and vice-versa.
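
The point that a small SIERA gap leaves room for upsets (Mister made a similar one earlier about 3.88 vs. 4.15) can be put in rough numbers. A minimal sketch, assuming each group's 2nd-half ERA scatters normally around its 1st-half SIERA with spread `sigma`, independently of the other group. Neither the normal model nor the sigma values come from the thread; they are hypothetical inputs for illustration.

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_favorite_wins(favorite_pred: float, underdog_pred: float, sigma: float) -> float:
    """P(the group with the lower predicted ERA actually posts the lower ERA).

    Assumptions: each group's actual ERA ~ Normal(prediction, sigma), the two groups are
    independent, and the predictions are unbiased. The difference of the two ERAs then has
    standard deviation sigma * sqrt(2).
    """
    spread_of_difference = sigma * math.sqrt(2.0)
    return normal_cdf((underdog_pred - favorite_pred) / spread_of_difference)

# This year's groups: Group B's 3.88 average SIERA vs. Group A's 4.15; sigmas are hypothetical.
for sigma in (0.2, 0.3, 0.4):
    print(f"sigma={sigma}: P(Group B lower) ~ {prob_favorite_wins(3.88, 4.15, sigma):.2f}")
```

The larger the assumed spread relative to the gap, the closer the answer drifts toward a coin flip, which is why the Unders need not win every year even when SIERA is the better guide.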
