Poll: Which Group of Pitchers Performs Better? – The Results

During the all-star break, I decided to run a little experiment. I took two groups of 10 starting pitchers: those whose ERAs had outperformed their SIERA marks by the largest margins, and those whose ERAs had underperformed them by the largest margins. There were 437 of you who answered the question “Which Group Posts a Lower ERA RoS?” and 61.1% of you voted for Group A, the SIERA outperformers. Despite this group actually posting a higher SIERA than Group B, you felt that the magic would continue. Let’s find out the results and whether the majority was correct.

I’ll start by reviewing how the SIERA outperformers did in both halves:

Group A – The SIERA Outperformers, 1st Half

Pitcher IP K% BB% BABIP LOB% HR/FB ERA SIERA ERA-SIERA
Bartolo Colon 126.2 14.0% 3.0% 0.287 80.2% 6.0% 2.70 4.19 -1.49
Bronson Arroyo 123.2 13.7% 4.6% 0.254 78.9% 11.6% 3.42 4.41 -0.99
Clayton Kershaw 145.1 24.8% 6.3% 0.238 78.7% 5.4% 1.98 3.24 -1.26
Hiroki Kuroda 118.2 17.7% 5.1% 0.252 82.6% 9.8% 2.65 3.88 -1.23
Jason Marquis 112.1 14.7% 13.2% 0.256 79.7% 19.6% 3.77 5.11 -1.34
Jeff Locke 109.0 16.7% 10.8% 0.228 83.3% 6.7% 2.15 4.56 -2.41
Jorge de la Rosa 109.1 16.6% 8.5% 0.294 76.4% 6.7% 3.21 4.32 -1.11
Mike Leake 117.0 15.0% 5.5% 0.260 79.6% 10.0% 2.69 4.11 -1.42
Patrick Corbin 130.1 21.2% 6.4% 0.246 81.9% 7.8% 2.35 3.61 -1.26
Travis Wood 122.2 17.6% 7.8% 0.227 76.5% 5.8% 2.79 4.45 -1.66
Average 121.2 17.4% 7.0% 0.253 79.6% 8.8% 2.74 4.15 -1.40

Group A – The SIERA Outperformers, 2nd Half

Pitcher IP K% BB% BABIP LOB% HR/FB ERA SIERA ERA-SIERA
Bartolo Colon 63.2 17.5% 5.2% 0.307 79.7% 6.0% 2.54 4.13 -1.59
Bronson Arroyo 78.1 17.3% 3.5% 0.288 76.3% 18.3% 4.37 3.75 0.62
Clayton Kershaw 90.2 26.7% 4.9% 0.272 83.3% 6.7% 1.59 2.76 -1.17
Hiroki Kuroda 82.2 18.9% 5.4% 0.324 68.5% 11.0% 4.25 3.67 0.58
Jason Marquis 5.1 0.0% 11.1% 0.333 45.5% 0.0% 10.13 7.23 2.90
Jeff Locke 57.1 18.9% 13.5% 0.365 67.0% 14.7% 6.12 4.53 1.59
Jorge de la Rosa 58.1 14.2% 9.0% 0.320 73.9% 9.3% 4.01 4.67 -0.66
Mike Leake 75.1 15.6% 6.7% 0.321 75.3% 13.9% 4.42 4.26 0.16
Patrick Corbin 78.0 19.9% 6.1% 0.337 68.6% 13.7% 5.19 3.67 1.52
Travis Wood 77.1 17.4% 8.4% 0.280 78.6% 8.4% 3.61 4.58 -0.97
Average 66.2 18.6% 6.7% 0.310 74.6% 11.2% 3.98 3.96 0.02
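
A quick check on how the Average row may have been built. The sketch below assumes the IP column uses standard baseball notation (.1 and .2 mean one-third and two-thirds of an inning) and that the group ERA is weighted by innings pitched. The post doesn't state either, so both are inferences from the numbers, and the helper name ip_to_thirds is made up for illustration.

```python
# Second-half IP and ERA for Group A, copied from the table above.
group_a_second_half = [
    ("Bartolo Colon", "63.2", 2.54),
    ("Bronson Arroyo", "78.1", 4.37),
    ("Clayton Kershaw", "90.2", 1.59),
    ("Hiroki Kuroda", "82.2", 4.25),
    ("Jason Marquis", "5.1", 10.13),
    ("Jeff Locke", "57.1", 6.12),
    ("Jorge de la Rosa", "58.1", 4.01),
    ("Mike Leake", "75.1", 4.42),
    ("Patrick Corbin", "78.0", 5.19),
    ("Travis Wood", "77.1", 3.61),
]


def ip_to_thirds(ip: str) -> int:
    """Convert baseball IP notation ('63.2' = 63 and 2/3 innings) to thirds of an inning."""
    whole, _, frac = ip.partition(".")
    return int(whole) * 3 + int(frac or 0)


total_thirds = sum(ip_to_thirds(ip) for _, ip, _ in group_a_second_half)
total_innings = total_thirds / 3

# Assumption: the group ERA weights each pitcher's ERA by his innings.
weighted_era = (
    sum(ip_to_thirds(ip) * era for _, ip, era in group_a_second_half) / total_thirds
)
# For contrast: a plain mean that treats every pitcher equally.
simple_mean_era = sum(era for _, _, era in group_a_second_half) / len(group_a_second_half)

print(f"total innings:   {total_innings:.1f}")
print(f"IP-weighted ERA: {weighted_era:.2f}")
print(f"simple mean ERA: {simple_mean_era:.2f}")
```

Under those assumptions the weighted figure lands on the table's 3.98, while a simple mean of the ten ERAs would come out around 4.62, largely because of Jason Marquis's 10.13 ERA in just 5.1 innings.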

Remember the magic this group benefited from in the first half, the trio of a lucky BABIP, LOB% and HR/FB ratio? Yeah, that good fortune disappeared. Their BABIP and HR/FB marks jumped right back up to the second half league average, and although they did sustain an above average LOB%, it dropped dramatically from the first half. Granted, 10 pitchers is a relatively small sample, but in aggregate they showed no special ability. As a group, their ERA was essentially the same as their SIERA in the second half, a far cry from the 1.40 runs by which they outperformed their SIERA in the first half.

I asked another question in my original post, and that was “Which Range Will Group A’s ERA Fall Into RoS?”. The 3.50-3.74 range garnered the highest percentage of votes at 24.4%, while the correct range of 3.75-3.99 earned the third highest percentage at 21.4%. It seems pretty clear that everyone assumed regression, but not as much as actually occurred.

Interestingly, this group’s strikeout and walk rates improved from the first half, which pushed its SIERA below 4.00. In the comments, Sky Kalkman, man of many saber-friendly Internet sites, shared his theory that perhaps as BABIP regresses like we saw in this group, their peripherals will improve. That is exactly what happened. It’s still too small a sample to conclude anything, but this theory has me intrigued now.

Group B – The SIERA Underperformers, 1st Half

Pitcher IP K% BB% BABIP LOB% HR/FB ERA SIERA ERA-SIERA
Edinson Volquez 109.2 18.8% 10.2% 0.342 63.3% 8.8% 5.74 4.31 1.43
Edwin Jackson 100.1 19.5% 8.1% 0.320 62.3% 10.6% 5.11 3.83 1.28
Ian Kennedy 108.0 19.1% 8.4% 0.298 67.1% 12.6% 5.42 4.26 1.16
Jeremy Hellickson 117.2 20.3% 5.5% 0.296 66.9% 10.8% 4.67 3.74 0.93
Joe Blanton 112.1 18.2% 5.1% 0.343 70.6% 18.1% 5.53 3.85 1.68
Matt Cain 112.0 22.1% 7.9% 0.257 63.4% 12.7% 5.06 3.84 1.22
Rick Porcello 99.1 19.4% 4.6% 0.317 65.4% 15.7% 4.80 3.15 1.65
Roberto Hernandez 108.1 18.2% 5.4% 0.304 69.7% 21.2% 4.90 3.63 1.27
Wade Davis 94.2 19.9% 9.4% 0.381 66.3% 13.5% 5.89 4.21 1.68
Yovani Gallardo 113.2 18.3% 8.7% 0.310 65.7% 12.5% 4.83 4.10 0.73
Average 107.2 19.3% 7.3% 0.315 65.9% 13.6% 5.17 3.88 1.29

Group B – The SIERA Underperformers, 2nd Half

Pitcher IP K% BB% BABIP LOB% HR/FB ERA SIERA ERA-SIERA
Edinson Volquez 60.2 17.4% 9.4% 0.293 67.1% 17.2% 5.64 4.39 1.25
Edwin Jackson 75.0 14.5% 7.0% 0.324 64.6% 9.2% 4.80 4.34 0.46
Ian Kennedy 73.1 22.6% 10.4% 0.290 72.2% 14.3% 4.17 4.04 0.13
Jeremy Hellickson 56.1 14.6% 9.2% 0.328 65.2% 11.1% 6.23 4.94 1.29
Joe Blanton 20.1 14.9% 7.9% 0.361 60.1% 24.0% 8.85 4.26 4.59
Matt Cain 72.1 18.8% 6.1% 0.264 84.5% 8.1% 2.36 4.04 -1.68
Rick Porcello 77.2 19.1% 7.1% 0.312 75.1% 12.1% 3.71 3.68 0.03
Roberto Hernandez 42.2 15.9% 7.1% 0.318 72.4% 20.0% 4.85 3.73 1.12
Wade Davis 40.2 15.0% 9.4% 0.316 71.0% 4.5% 3.98 4.65 -0.67
Yovani Gallardo 67.0 19.2% 8.3% 0.278 79.1% 10.9% 3.09 3.96 -0.87
Average 58.2 17.7% 8.1% 0.303 72.2% 12.3% 4.39 4.17 0.22

This group was terrible in the first half, hampered by a high BABIP and HR/FB ratio and an inability to strand runners. But most of those problems suddenly disappeared in the second half, and the group went from underperforming their SIERA marks by 1.29 runs to just 0.22 runs. Yes, they still underperformed, but 0.22 is a much smaller gap and within a reasonable error range. In fact, the group actually posted a lower BABIP than Group A in the second half! The other luck metrics weren’t much worse than Group A’s either.

The leading vote-getter to the question of “Which Range Will Group B’s ERA Fall Into RoS?” was 4.00-4.24, with 36.2% of the vote. This proved to be a bit too optimistic and surprisingly only 9.8% of you guessed the correct range of 4.25-4.49, which garnered the fourth highest percentage of votes.

For a second time, we observe a change in peripherals, this time a decline, as the strikeout rate dropped and walk rate increased. This was on the heels of a BABIP decline, once again giving some early credence to Sky’s theory mentioned above. Ultimately, the worse skills led to a higher SIERA in the second half versus the first half.

Now let’s directly compare the average lines of each group in the second half:

Group IP K% BB% BABIP LOB% HR/FB ERA SIERA ERA-SIERA
SIERA Outperformers 66.2 18.6% 6.7% 0.310 74.6% 11.2% 3.98 3.96 0.02
SIERA Underperformers 58.2 17.7% 8.1% 0.303 72.2% 12.3% 4.39 4.17 0.22

So the 267 of you who voted that Group A, the SIERA outperformers, would post a lower rest-of-season ERA, give yourselves a pat on the back, as you were correct. But I bet it would still surprise many to learn how much the gap narrowed between the two groups. Group A posted the better peripherals and SIERA, so they should have posted a better ERA. But the story is that they essentially matched their SIERA after significantly outperforming it in the first half. It’s always tempting to fish for an explanation and try to justify the outperformance with unique visual observations and scouting type analysis. Nobody wants to shrug their shoulders and say they don’t know. But I’m here to tell you that it’s okay, you are allowed to simply chalk it up to luck over a small sample size.
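
To put numbers on how much the gap narrowed, here is a small sketch that works only from the group average lines in the tables above. The function names are just for illustration, and because the inputs are already rounded to two decimals, a recomputed difference can be off by 0.01 from the ERA-SIERA column (Group A's first half comes out to -1.41 rather than the listed -1.40).

```python
# Group average lines copied from the tables in this post: (ERA, SIERA).
averages = {
    ("A", "1st half"): (2.74, 4.15),
    ("A", "2nd half"): (3.98, 3.96),
    ("B", "1st half"): (5.17, 3.88),
    ("B", "2nd half"): (4.39, 4.17),
}


def era_minus_siera(group: str, half: str) -> float:
    """How far a group's ERA sat above (+) or below (-) its SIERA."""
    era, siera = averages[(group, half)]
    return round(era - siera, 2)


def era_gap_between_groups(half: str) -> float:
    """Group B's ERA minus Group A's ERA for the given half."""
    return round(averages[("B", half)][0] - averages[("A", half)][0], 2)


for half in ("1st half", "2nd half"):
    print(
        half,
        "| A ERA-SIERA:", era_minus_siera("A", half),
        "| B ERA-SIERA:", era_minus_siera("B", half),
        "| B ERA minus A ERA:", era_gap_between_groups(half),
    )
```

By this arithmetic the ERA gap between the groups shrank from 2.43 runs in the first half to 0.41 in the second.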

In the comments of the original post, Wobatus was kind enough to figure out each group’s rest of season ZiPS projections. He calculated Group A’s as 4.16 and Group B’s as 4.12. So, essentially the same. With that context, we do note that Group A slightly outperformed their RoS projection, while Group B underperformed it. However, we don’t know what peripherals the ZiPS projections were projecting, so we’re missing crucial information necessary for a complete analysis.

Obviously over just two halves of a season, any pitcher could outperform or underperform their SIERA marks like we saw with Bartolo Colon and Jeremy Hellickson. But knowing ahead of time which pitchers are going to do that is a fool’s errand. If you know that as a group, the outperformers will regress, while the underperformers will improve, then you have to try your hardest to ignore ERA and rely on the underlying skills and SIERA marks.
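
For anyone who wants to see the group-level effect in isolation, here is a toy simulation. It is not a model of real pitchers: it assumes that every pitcher's ERA-minus-SIERA gap is pure random noise with a standard deviation of 1.0 run per half, which is a number picked for illustration only. It then selects the 10 biggest first-half outperformers out of 150 and checks how the same pitchers do in the second half, averaged over many simulated seasons.

```python
import random


def simulate(trials=200, pitchers=150, picks=10, noise_sd=1.0, seed=0):
    """Average first- and second-half ERA-minus-SIERA gap of the top outperformers.

    Assumption: no pitcher has any real ability to beat SIERA, so the gap
    in each half is an independent draw from a normal distribution.
    """
    rng = random.Random(seed)
    first_total = 0.0
    second_total = 0.0
    for _ in range(trials):
        first = [rng.gauss(0.0, noise_sd) for _ in range(pitchers)]
        second = [rng.gauss(0.0, noise_sd) for _ in range(pitchers)]
        # The most negative gaps are the biggest first-half outperformers.
        chosen = sorted(range(pitchers), key=lambda i: first[i])[:picks]
        first_total += sum(first[i] for i in chosen) / picks
        second_total += sum(second[i] for i in chosen) / picks
    return first_total / trials, second_total / trials


first_half_gap, second_half_gap = simulate()
print(f"selected group, first half:  {first_half_gap:+.2f}")
print(f"selected group, second half: {second_half_gap:+.2f}")
```

In a world with no SIERA-beating skill at all, the selected group still shows a large first-half outperformance simply because it was selected for it, and then lands close to zero afterward. That is the same shape as Group A's line, though a toy like this can't tell you how much of any real pitcher's gap is luck.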


Mike Podhorzer produces player projections using his own forecasting system and is the author of the eBook Projecting X: How to Forecast Baseball Player Performance, which teaches you how to project players yourself. His projections helped him win the inaugural 2013 Tout Wars mixed draft league. He also sells beautiful photos through his online gallery, Pod's Pics. Follow Mike on Twitter @MikePodhorzer and contact him via email.

25 Responses to “Poll: Which Group of Pitchers Performs Better? – The Results”

  1. Mister says:

    Cool. This is pretty close to what I thought would happen, but I’m surprised to see that Group A ended up winning by such a large margin. I figured it would be very close in the end.

    I can’t come up with a good reason as to why peripherals moved to push SIERAs in the direction of ERAs. BABIP regression would affect per 9 peripherals, but shouldn’t affect per PA peripherals, and SIERA is calculated using K% and BB%, correct?


    • Mister says:

      I guess the moral of the story is that everything regresses. Any time you have significantly above or below average performances, you should expect regression to the mean. More stable parameters like those that go into SIERA will regress less than metrics like ERA, but they will still regress.

      And now I am preaching to the choir.


      • some.guy says:

        “Any time you have significantly above or below average performances, you should expect regression”

        Patently untrue, the way you worded it.

        I only mention this because as a Braves fan, I still hear people saying that Kimbrel’s 90% LOB% will regress to the mean, when instead, that number is the mean which accurately represents his talent. Deviations from that mean of 90% should regress, but not to an average performance of around 72%, as you seem to suggest.


      • Mister says:

        Well, I didn’t say what I meant by “average.” It could mean league average, or it could mean a player’s career average from previous seasons. I almost put in a caveat there to say “league average, unless a player has a long track record of being better/worse than league average,” or something like that, but I couldn’t get the English right so I just left it slightly vague.

        And the context here implied that I wasn’t referring to specific individuals, but rather to groups of players. A single player can always be an exception, but probably not a group of 10. Basically, if you pick the top 10 players in ANY stat at the halfway point of the season and then track them the rest of the season, you should find that they move in the direction of the league average in that stat, either to a large or small degree.


      • Mister says:

        Never mind though, I’m wrong about this theory applying to this case. The league average SIERA this year was 3.87, and Group B’s first half SIERA was 3.88. So according to my previous explanation, Group B’s SIERA shouldn’t have changed much during the 2nd half. Either this is just random variation, or there really is something about underperforming your SIERA that leads to your SIERA increasing somewhat.


      • Giovani says:

        You’re sure using a lot of words and comments section space to essentially say nothing and show you don’t understand regression.


      • Mister says:

        You’re using a small number of words to show that you’re an asshole.


      • A mustachioed business tycoon says:

        Hey a lot of us have hard work to do reading through comment boards around here!


  2. Skeptic says:

    SIERA still strikes me as dramatically under-theorized. Any particular reason that you chose it, instead of a more comprehensible metric like xFIP or FIP?


    • SIERA is significantly more comprehensive than xFIP or FIP. Those two are the most simple. I automatically throw out FIP because it assumes that the HR/FB rate is 100% in the pitcher’s control, which is wrong.


    • CPT says:

      I recall reading an article that found SIERA to be the most predictive of the four for future ERA.


  3. murphym45 says:

    With Group B, it’s possible that the poor “luck-based” results (BABIP, LOB%) in some way affected the skills decline? If a pitcher is pitching normally and putting up solid peripherals but getting bad results, they might try to mix things up – go away from their strengths, nibble more to avoid hard hit balls, etc. – which could lead to a decline in K% and increase in BB%.


    • I think that’s part of what Sky’s theory suggested.


    • Sky Kalkman says:

      And this could hold whether their poor early-season BABIPs were real, partially real, or not real at all.

      My theory is that perhaps the high BABIP is at least partially real, and sacrificing a bit of K and BB in order to bring it down is a worthwhile trade-off.

      On a larger scale, I think one reason MLB BABIPs tend to cluster is that there are tradeoffs between all of BABIP, K, BB, and HR. Sacrificing K and BB to reduce BABIP is worth it to a certain extent (down to the .300 range), but not farther than that. This equilibrium is different for different pitchers and different types of pitchers.

      *** ***

      Does this mini-study prove anything? No, but it’s kinda fun to see SIERA and K/BB shifts, though. Maybe it motivates someone to do a more rigorous study? Please?


      • What do you think pitchers can do to choose between K and BB rates versus BABIP? I would think they would be related, throwing more pitches that are difficult to hit (mostly based on location of the pitch I’d assume) would both reduce BABIP and increase strikeouts.


      • Ruki Motomiya says:

        I’m not a smart enough guy to really provide much help here, but I think it could be related to picking a strikeout pitch vs. a hittable one, but one more likely to be hit weakly. And some pitchers (Kershaw) can mix in both to great effect.


      • slash12 says:

        K% has been shown to have an inverse correlation with BABIP. More K’s, lower BABIP. This holds even when considered in the same regression as FB% (which has an even stronger relationship).


      • slash12 says:

        Here’s an equation I came up with for calculating pitcher BABIP:


        Where GB% and K% are written in decimal form. I found this to be a pretty decent predictor, when I put park and team defense factors on top.


      • slash12 says:

        the +.006 at the end is an adjustment for starting pitchers (took me a few to remember that)


  4. Andrew says:

    I would guess that the difference between the final results of the two groups is largely due to the significant difference in the size of each “half.” If the innings pitched in each half were equal, I suspect that we would see different results.


    • Jason B says:

      But you’re aggregating results across pools of ten pitchers, so you’re looking at 2nd half samples of over 660 and 580 innings respectively. Which are quite large samples indeed.


  5. Wobatus says:

    I got both A and B correct, that A would be 3.75-3.99 and B 4.25-4.49 rest of way. I think I went A mostly because of Kershaw. He and Colon salvaged the group.


    • Wobatus says:

      Also, that beats ZIPS, since the ROS projection for group A was an e.r.a. of 4.16 and it came in at 3.98, and for B the ROS was 4.12 and it came in at 4.39. Although ZIPS seemed to get more right than wrong (I thought the projection for Leake seemed off but it was very close). Again, Kershaw seems like he is just too good even for projection systems. :)


  6. Kinanik says:

    How much is the change in Group A’s peripherals due to selection bias? Those whose peripherals were worse became worse pitchers, which caused them to lose playing time, which improved the peripherals of the group. (And vice-versa, the best were leaned on more heavily as the season came to a close). You would expect the same selection bias in both groups, of course…


    • Yeah, there is certainly some selection bias. It was only 10 pitchers, so this shouldn’t be used as some exhaustive study to prove anything. It was more a fun little exercise, though did essentially accomplish what I was hoping for – that Group A really never possessed any magical SIERA-beating abilities in the aggregate.
