Cain and Regression

When I told the writing staff here of my desire to write a post on regression to the mean, using Matt Cain as a proxy of sorts, Dave replied: “Admit it, you just want to write about Cain again…” He wasn’t really wrong, as many of you know that Cain happens to be my favorite non-Greg Maddux pitcher, but his 2009 season has been so interesting to date that he seems like the perfect subject for a discussion of regression. I have seen it happen countless times: fans interested in developing their statistical knowledge tend to go through a few stages with regard to a particular metric.

First, they are very skeptical, wary of accepting something new as meaningful. Next, they grasp the underlying meaning and begin to incorporate the stat into analysis. Finally, basking in the fact that they understand the benefits of the stat, they toss it around whenever possible and treat it like gospel. Unfortunately, when this last stage arrives, the understanding is not yet fully developed, and definitive claims are often a bit off course. This is in no way a criticism, as I myself have gone through the same stages at one time or another, but rather an observation.

With regard to Cain, I have seen far too many analyses discussing his ERA-FIP disconnect and insisting that an ugly regression, one that would cause his ERA to balloon, was inevitable. I profiled this over at Baseball Prospectus, begging those making such claims to dig deeper and find out what pitchers are doing differently, if anything, before jumping to conclusions. After all, not everyone regresses, and not all regressions are bad. The problem is that the term regression carries such negative connotations these days that it seems odd for it to portend anything positive. Regression is in fact a two-way street, though, and deserves to be treated that way.

No, Matt Cain is not very likely to sustain an 88% strand rate, but he is also unlikely to post walk and strikeout rates that stray drastically from his true talent level. A pitcher with strikeout rates ranging from 7.4 to 8.4 per nine innings has a pretty low likelihood of suddenly whiffing hitters at a rate closer to 6.0; likewise, one with an established unintentional walk rate around 3.5 probably will not finish the next season closer to 4.3, barring unforeseen circumstances. Even so, after Cain’s eighth start, when his 2.65 ERA supremely bested an FIP built upon a 4.24 UBB/9 and 6.0 K/9, nobody really thought to suggest that those rates would regress (in this case a positive regression). The moral here, I suppose, is that even though his strand rate will not stay that high, he is going to allow fewer baserunners who need to be stranded.
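To make the two-way nature of regression concrete, here is a minimal sketch of shrinking an early-season rate toward an established talent level. It is not the method behind any projection system mentioned in this post. The innings total, the 7.9 K/9 midpoint, and the `stabilizing_innings` weight are all assumptions chosen purely for illustration; only the 6.0 K/9, 4.24 UBB/9, and 3.5 UBB/9 figures come from the text above.

```python
def regress_to_talent(observed_rate, sample_innings, talent_rate, stabilizing_innings):
    """Blend an observed per-nine rate with an established talent level.

    stabilizing_innings is a hypothetical weight: the number of innings at
    which the current sample and the prior talent level count equally.
    """
    weight = sample_innings / (sample_innings + stabilizing_innings)
    return weight * observed_rate + (1 - weight) * talent_rate


# ASSUMPTIONS (not from the post): roughly 50 innings over eight starts,
# a 7.9 K/9 talent level (midpoint of the 7.4-8.4 range), and a
# stabilizing weight of 100 innings.
SAMPLE_INNINGS = 50
STABILIZING_INNINGS = 100

k9_estimate = regress_to_talent(6.0, SAMPLE_INNINGS, 7.9, STABILIZING_INNINGS)
bb9_estimate = regress_to_talent(4.24, SAMPLE_INNINGS, 3.5, STABILIZING_INNINGS)

print(f"Regressed K/9:   {k9_estimate:.2f}")
print(f"Regressed UBB/9: {bb9_estimate:.2f}")
```

Under these assumed weights, the strikeout estimate moves up from 6.0 and the walk estimate moves down from 4.24. Both shifts count as regression, and both help the pitcher.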

Five starts later, Cain has reduced his UBB/9 to 3.44 and increased his strikeout rate to 7.10. The ERA is still quite low thanks to the extraordinary strand rate, but his FIP is itself regressing toward the levels of the recent past. If I had to bet money on it, I would agree that Cain’s ERA is more likely to increase than his FIP is to decrease, but regression does not occur in just one metric. If his strikeout rate continues to regress and his walk rate either improves or holds steady, combined with a regression in stranding runners, Cain could conceivably have an ERA around 3.25 with an FIP at the 3.70 mark. At that point, the disconnect between the two stats isn’t that vast.
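For a rough sense of how much those five starts moved the needle, the sketch below plugs the walk and strikeout rates quoted above into the standard FIP formula expressed per nine innings. Several pieces are assumptions: the home run rate is a hypothetical placeholder held constant (the post does not give one), hit batsmen are ignored, UBB/9 stands in for the walk term, and the additive constant is a typical value rather than the actual league figure. The home run term and the constant cancel when the two snapshots are compared, so the difference depends only on the rates from the text.

```python
def fip_from_rates(hr9, ubb9, k9, constant=3.10):
    """Approximate FIP from per-nine rates.

    Standard weights: 13 per home run, 3 per walk, -2 per strikeout.
    ASSUMPTIONS: hit batsmen are ignored, UBB/9 is used for the walk
    term, and `constant` is an illustrative league constant.
    """
    return (13 * hr9 + 3 * ubb9 - 2 * k9) / 9 + constant


HR9 = 0.8  # hypothetical home run rate, held fixed in both snapshots

fip_after_8_starts = fip_from_rates(HR9, ubb9=4.24, k9=6.0)
fip_after_13_starts = fip_from_rates(HR9, ubb9=3.44, k9=7.10)
fip_change = fip_after_13_starts - fip_after_8_starts

print(f"Change in FIP from walks and strikeouts alone: {fip_change:+.2f}")
```

With home runs held fixed, the improvement in walks and strikeouts alone accounts for a drop of about half a run in FIP.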

In fact, ZiPS sees Cain finishing the season with numbers similar to those above: a 3.28 ERA and a 3.83 FIP. Both marks are certainly very solid, and the main reason the disconnect would shrink is regression toward established talent levels in walks and strikeouts that has yet to fully play out this season.

These are certainly big ifs, but I really just wanted to hammer home two points. First, the numbers beneath the numbers need to be analyzed in order to find out why certain rates are where they are. Second, regression works both ways, meaning we should not ignore the areas bound to experience a positive regression, which in turn could reduce the amount of negative regression inherent in a dataset.


Eric is an accountant and statistical analyst from Philadelphia. He also covers the Phillies at Phillies Nation and can be found here on Twitter.

14 Responses to “Cain and Regression”

  1. It’s funny, when Matt Cain was posting records like 7-16, we were all running around “proving” he was a much better pitcher than that. Now he’s 9-1 and we are all running around “proving” he’s a much worse pitcher!

    Thanks for the level-headed analysis. It is nice to see Cain get some run support and rack up some W’s. He seems to be more mature and confident this year, and taking a page from the Book of Maddux, he seems to be allowing guys to get themselves out, not just trying to blow them away. He seems to have made better use of his off-speed stuff, too. Of course, those are all emotional fan-boy observations about my favorite Giant–I will not be held responsible for their lack of statistical validity!


  2. Davidceisen says:

    I was ridiculed here before the season started for arguing that Matt Cain was better than his FIP would indicate.

    If Cain truly is better than his FIP, doesn’t that invalidate the WAR system this site uses for pitchers?


    • B says:

      One statistical outlier doesn’t invalidate the whole system. The authors noted a while ago that Javier Vazquez is a player who typically underperforms his FIP, and he may just be an outlier on the other end. Almost everyone else, however, falls into the category where FIP accurately values them. As a Giants fan, I hope Cain truly is an outlier who consistently outperforms his FIP.

      That said, I’m not the hugest fan of using win values this early in the season to try to describe a pitcher’s performance.


  3. Lefty Malo says:

    In Cain’s last start, the complete-game gem Sunday vs. Oakland, he was throwing his change-up as a strikeout pitch seemingly at will, the first time I’ve seen it so dominant. This is where scouting needs to mix with stats — no stat system can take into account a “light bulb” moment where a player suddenly figures something out (a hitch in his swing, a new grip on a new pitch, etc) or adds a new weapon.


  4. Great analysis! As usual.

    I would add that Cain has been soaking up advice from Randy Johnson, who has learned the art of changing from a fireballer to a pitcher and is sharing that knowledge with Cain and whoever else will listen on the Giants staff. He is just paying forward the advice he got from Nolan Ryan long ago that helped him figure things out. He’s been a great addition to the staff for that reason alone; his influence will affect the Giants for years afterward if Cain and Lincecum pick up good advice from him.


  5. SharksRog says:

    Matt’s pitching HAS improved as the season has gone along. Beginning with his May 12th start, his hit, walk and homer rates have all declined, while his strikeout rate has improved.

    But he hasn’t pitched nearly as well as his 2.39 ERA would indicate.

    I thought Matt pitched a true gem Sunday, better even than Tim Lincecum’s second career shutout two days before. But even there Matt benefitted from some good luck early.

    To the first 11 batters he yielded a homer, two doubles and a single that should have been a double. Yet because of a sparkling defensive play that turned the single into an out when the batter/runner assumed he had a double and didn’t run full out, Matt got out of all that allowing only one run.

    The next inning he gave up a shot that was caught on a nice play by Nate Schierholtz, the same guy who made the great throw earlier and then hit an inside-the-park home run on which he showed he might be the fastest major leaguer of his size. From that point on, though, Matt was nails.

    I think the 3.25/3.70 or 3.28/3.83 ERA/FIP numbers seem like good estimates for Matt’s entire season. I recently put the over/under on his ERA this season at 3.25.

    Either Matt is going to begin pitching closer to the level his ERA implies, or his ERA is likely to go up considerably. I see no reason Matt shouldn’t start the All-Star game for the NL, but I suspect at season’s end we may be asking why.

    Then again, if he continues to get double the run support he has received in the past two seasons, his won-loss record should continue to be quite good even as his pitching regresses toward his mean.

    It appears Matt is being rewarded by the run support gods for his long suffering the past two seasons.


  6. FishFrenzy says:

    Loved this piece, Eric. As a guy trying to make his way into writing and using these sorts of metrics to profile players, I have found it difficult not to throw around stuff like BABIP and ERA/FIP regression like gospel and make rash regression remarks. I certainly think I’ve been guilty of it, but it takes a good eye to see each piece of potential regression and think about how that will work out. The projection systems like PECOTA, ZiPS, and others help, but as an analyst you have to peruse all the information and find all the things that are pertinent to the discussion. I’m trying my best, and I’m glad this site is championing that kind of writing.


  7. Aaron B. says:

    Eric is cool


  8. EGC says:

    So when are we going to see this regression?


  9. cpdodger says:

    Why does Cain’s ERA consistently outperform his FIP and xFIP?


    • Adam says:

      Because after almost 1000 IP in his career, he’s shown that he has control over two things most sabermetricians assume pitchers have little to no control over: his IFFB% (12.4% for his career, 10% is MLB average), and his HR/FB% (6.6% for his career, 9-11% is MLB average). He seems to be an outlier in that batters don’t get good contact on his pitches, so even though he is an extreme flyball pitcher, those flyballs don’t hurt him nearly as much as they would most other pitchers.

      Usually, a low HR/FB and a high IFFB causing a low BABIP would scream regression, but after over 5 years and 1K innings, I think we have to admit that Cain is just a unique pitcher.
