Reaffirming our faith in DIPS

Are we really so sure that Matt Cain has some sort of special home run-preventing skill? (Icon/SMI)

Last week, a long discussion about DIPS, regression to the mean, xFIP, HR/FB, BABIP, and forecasting in general developed over at RotoWire. I linked to it here at THTF, and a few of you posted comments on it. There was one comment in particular that I wanted to respond to at length:

Dan Haren is singlehandedly destroying my faith in FIP, xFIP, and SIERA.

I’ve kept him on my team all year long and he just continues to kick me in the nuts, like today with his 7 earned runs allowed. His K/BB is great, his strikeout rate is good, yet his BABIP continues to be off-the-charts high.

Meanwhile, I dropped Tim Hudson in early May because I thought his .240 BABIP was not sustainable and his K/BB was barely at 1.0. Hudson just keeps rolling along. His BABIP is now .235.

I’m beginning to think too much knowledge is a bad thing. I make moves based on underlying peripherals and with the thought of “regression to the mean” in mind, and I’m behind owners who pick up Carlos Silva and Livan Hernandez.

Another commenter followed with:

You need to look at the actual player too. Some players are good at bettering the stats while others don’t live up to them. Matt Cain seems to better them while someone like David Bush does not. Bush had 2 seasons with a 1.14 WHIP and his ERA was like 4.4 and 4.2.

The coin flip analogy

While Matt Cain has posted better-than-average HR/FBs for a few years now (probably the best and longest run we’ve seen since batted ball data became available), that doesn’t necessarily mean he’s any better at preventing home runs on fly balls than Dave Bush. Think about it this way: If we have 8,000 fair coins and we flip them, about 4,000 will land on heads and about 4,000 on tails. If we take the “heads” coins and flip them again, about 2,000 will land on heads again. Flip those, and you get about 1,000 of them landing on heads. Do this another nine times, and you’ll probably end up with two or three coins that have landed on heads every single time.

But are these coins any different from the others we’ve been flipping? Is there something special about them that makes them more likely to land on heads than any of the coins that dropped out along the way? Of course not. I told you in the beginning that they were fair coins. So if we flipped those last two or three another 8,000 times each, I’ll bet you they’d land on heads close to 4,000 times each.
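For anyone who would rather see this than take my word for it, here is a minimal simulation sketch of the exercise. The function name, the seed, and the use of Python's standard `random` module are my own choices for illustration; the only inputs taken from the example are the 8,000 fair coins and the twelve rounds of flips.

```python
import random

def surviving_coins(n_coins, n_rounds, rng):
    """Flip every remaining coin each round, keeping only those that land heads."""
    survivors = list(range(n_coins))
    for _ in range(n_rounds):
        survivors = [coin for coin in survivors if rng.random() < 0.5]
    return survivors

rng = random.Random(2010)  # arbitrary seed, only so the run is repeatable
n_coins, n_rounds = 8000, 12

# Each round halves the field on average: 8000 / 2**12 is a little under 2
expected = n_coins / 2 ** n_rounds
survivors = surviving_coins(n_coins, n_rounds, rng)
print(f"expected survivors: {expected:.2f}, this run: {len(survivors)}")

# Flip each "streak" coin another 8,000 times. It is still a fair coin,
# so the heads count should land close to 4,000.
for coin in survivors:
    heads = sum(rng.random() < 0.5 for _ in range(8000))
    print(f"coin {coin}: {heads} heads in 8000 flips")
```

The exact number of survivors varies from run to run, and some runs will end with none at all, but whatever coins do survive show no lasting tendency toward heads.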

While it’s hard to view humans in this way, we do know that humans don’t have ultimate control over everything in a baseball game and that random chance is involved. If it weren’t, we’d have a much easier time projecting performance.

But which coins will they be?

Most players are clustered toward the middle, but when a dataset is distributed normally, there will always be a few outliers in the 0.2% area.

We know that stats like HR/FB follow a (relatively) normal distribution (the same as our coin flips would). They form a bell curve (of sorts), with most players clustered toward the middle, but there are always outliers who are far removed from the middle. We also know that these outliers are rarely the same from year to year—the same as if we performed our coin flip exercise several times and marked each coin, we wouldn’t end up with the same two or three coins at the end of each trial. They’d always be different coins, even though we could be certain that we’d always end up with two or three of them. But predicting precisely which two or three would be impossible to do beforehand.
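The "marked coins" idea can be sketched the same way. This is only an illustration under my own assumptions (five trials, seeds 0 through 4, hypothetical function name): it repeats the twelve-flip exercise several times on the same 8,000 numbered coins and checks whether any coin shows up as a survivor more than once.

```python
import random
from collections import Counter

def streak_coins(n_coins, n_rounds, seed):
    """Return the IDs of coins that land heads in every round of one trial."""
    rng = random.Random(seed)
    return {
        coin for coin in range(n_coins)
        if all(rng.random() < 0.5 for _ in range(n_rounds))
    }

# Run the same exercise five times on the same 8,000 marked coins
trials = [streak_coins(8000, 12, seed) for seed in range(5)]
for i, coins in enumerate(trials):
    print(f"trial {i}: surviving coins {sorted(coins)}")

# Count how often each coin survived across all trials
counts = Counter(coin for coins in trials for coin in coins)
repeats = [coin for coin, n in counts.items() if n > 1]
print("coins that survived more than one trial:", repeats)
```

With only a handful of survivors per trial out of 8,000 coins, a repeat survivor is rare, which is the point: we can count on having a few outliers, but not on which ones they will be.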

And the same holds true for things like BABIP and HR/FB. Sure, Livan Hernandez and Tim Hudson are having years where their ERAs don’t match their peripherals. But ask yourself this: How long do you expect them to continue doing that? If you don’t answer “indefinitely, because they truly deserve low BABIPs and HR/FBs,” then don’t beat yourself up. There’s nothing you can do, because the fact of the matter is, they are getting lucky. For the 2010 season, they are those final two coins remaining from the 8,000 we started with. And it’s as simple as that.

And I put my money where my mouth is. I happen to own both Livan Hernandez and Carlos Silva in LABR NL this year (part of a strategy that involved owning a few crappy pitchers), but despite their successes, I’ve only used Livan for 87 innings and Silva for 70. I have begun to start Silva regularly over the past couple of months, though, because he’s combined legitimately good peripherals with a change in approach. Our coin flip example would still hold for him to an extent, because no one expected him to outperform his projections by this much unless they scouted him in Spring Training and noticed his improved change-up, improved breaking ball, renewed control, etc.

Second-half splits

To go along with this, I wanted to bring up one last comment from a post I made at the CardRunners site:

Dan Haren is an example of a first half ace. He’s a bum every second half… not only does his ERA jump about a run (3.29 to 4.22), but his WHIP goes from 1.10 to 1.31.

First half ERAs from 2006 to 2009: 3.52, 2.30, 2.72, 2.01. Second half from 2006 to 2009: 4.91, 4.15, 4.18, 4.62

As with BABIP and HR/FB, “second-half ERA” is a stat with lots of variation. It takes many years to stabilize, and because it’s normally distributed, there will always be outliers, especially when dealing with smaller samples. In Haren’s case, we are dealing with a small sample of four poor second halves (plus two years where his second half was better than his first), so claiming that he’s merely a “first-half ace” may be a bit hasty.
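To put a rough number on how unremarkable four worse second halves in six seasons can be, here is a small sketch. It rests on a simplifying assumption of mine: that for a pitcher with no real first-half/second-half difference, each season is a 50/50 coin flip as to which half comes out worse. It ignores the size of the gaps and the fact that we noticed Haren only after the pattern appeared.

```python
from math import comb

def prob_at_least(k, n, p=0.5):
    """Probability of k or more successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Chance that a pitcher with no true split still has a worse
# second half in at least 4 of 6 seasons
p = prob_at_least(4, 6)
print(f"P(4+ worse second halves in 6 seasons) = {p:.2f}")  # 22/64, about 0.34
```

Under that assumption, roughly a third of pitchers with no split at all would show a pattern at least this lopsided over six seasons.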

So does that mean we know nothing?

No, it doesn’t. Just because it’s possible that Matt Cain is a true 11% HR/FB pitcher doesn’t mean that he absolutely is. Even though we know that we’re looking at a mere sample and that what we’ve seen could be simple random variation, we have still seen something. And what we’ve seen for Cain is a career 7.8% HR/FB. So what we do is weight his career and regress to the mean to remove the effects of luck as well as possible. Once we do that for Cain, we probably arrive at an expectation for his HR/FB of around 9% or so.
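The arithmetic behind "regress to the mean" can be sketched as a simple weighted average. This is a generic shrinkage formula, not necessarily the exact weighting used for the estimate above: the function names and the `half_weight_sample` parameter are hypothetical, the .280/.300 case is just an illustration of regressing halfway, and treating 11% as the league-average HR/FB is my reading of the paragraph above.

```python
def regress_to_mean(observed, league_mean, sample, half_weight_sample):
    """Shrink an observed rate toward the league mean.

    The observed rate gets weight sample / (sample + half_weight_sample),
    so when sample == half_weight_sample it is regressed exactly halfway.
    """
    weight = sample / (sample + half_weight_sample)
    return league_mean + weight * (observed - league_mean)

def implied_weight(observed, league_mean, estimate):
    """Weight on the observed rate that a given regressed estimate implies."""
    return (estimate - league_mean) / (observed - league_mean)

# Illustration: a .280 rate regressed halfway to a .300 mean
halfway = regress_to_mean(0.280, 0.300, sample=6, half_weight_sample=6)
print(round(halfway, 3))

# Cain: career 7.8% HR/FB, assumed 11% league mean, expectation of about 9%
cain_weight = implied_weight(0.078, 0.11, 0.09)
print(round(cain_weight, 3))
```

Read this way, an expectation of about 9% amounts to giving Cain's own track record a bit more than half the weight and the league average the rest.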

And as I said in my previous article in this long-running discussion, that expectation would change if we have other data (such as scouting or a PITCHf/x study). But unless we have that data, that’s the best we can do.

Concluding thoughts

I think that covers everything I wanted to cover, so if you have any questions or comments, feel free to let me know. I’m sure there will be some of you who will still be skeptical, so feel free to voice your concerns if you are.


Mike Podhorzer
The coin-flipping example is a perfect illustration of this point. This also reminds me of another good example that I learned about a while ago. You know those gambling ads that promise “guaranteed winners” and give you a phone number to call? Well the way it actually works is that half the callers are told one team and the other half the opposing team. If 1,000 called, this means 500 people will win. The following week, 250 of the previous week’s winners will win again. The next week, 125, and so on and so forth. Several people who win…
keith

Nice work.  Still, it sure is frustrating when the process is seemingly correct and the results are for the birds!

12 Team Roto
11 pts in K/BB
10 pts in WHIP
2 pts in ERA

><

obsessivegiantscompulsive

According to what I read on The Book blog, TangoTiger says that it takes around 7 seasons worth of results for a starting pitcher to have enough “coin flips” to show that his BABIP is legitimately lower than the .300 mean most regress to.  Zito has passed that threshold and Cain will soon, as well, at least for BABIP.

Given that BIP is significantly larger than flyballs, I guess that means that most pitchers, except for those who pitch maybe 15+ seasons as a starter, can never “prove” to be significantly below the HR/FB norm, right?

microwave donut
My theory is that Matt Cain has typical R/L splits for a RHP and SF’s home park kills HR’s to right field. However, looking at the HR/FB home/away splits you don’t really see anything but noise. As for the better in the first/second half point I think it’s easy to forget that these guys aren’t robo-players. So much of the game is mental and focus can absolutely fade or intensify throughout a season. What about players who are slow starters? The most common examples (to me) are Mark Teixeira, Adam LaRoche and Alexei Ramirez, and they all held the trend…
Derek Carty

Well put, again, Mike.

I hear you, Keith, and that was a point I meant to include.  It sucks when you’re making the right decisions and getting poor results, but it’s a part of the game we play.

Derek Carty
obsessivegiantscompulsive, I believe it’s actually 6 years, but to clarify a bit, it’s not that at 6 years something magically happens. It’s not some magic number that, once we arrive at it, we’re all good. It’s more of a continuum. Basically, once we have 6 years worth of BIP, we’d regress a pitcher’s BABIP half of the way to the mean. So let’s say we’re regressing a career .280 BABIP to a mean of .300 (ignoring weighting, aging, etc for now), at exactly 6 years, our regressed BABIP would be .290. But that doesn’t mean we can’t regress with less…
Pat
I think you hit the nail on the head. Randomness does factor a lot in it, but it is not completely random because it involves humans. Some love pressure, while others fear it. The reason I used Bush is that he is a different pitcher when he has runners on. It is possible that it is a ton of bad luck, but because of his long track record I would bet against it. Maybe the reason is that he is pitching from the stretch or the pressure. From a statistics standpoint, someone like Bush should be looked at as a different…
Pat

I like the stats. The BABIP in 07 and 08 looks very out of line. I think the question here is when, if ever, do you ignore the underlying stats and go with the results?

I actually would take Hudson for the rest of this year and Haren next year.

Where do you get BABIP, K/9, etc. split stats from?

PS: If anyone here listens to the ESPN Fantasy Focus Podcast, Matthew Berry and Nate Ravitz have the Dan Haren argument all the time.

patrick dicaprio
I have said it before and will say it again: when it comes to this topic you are mostly right. This is a perfect example of the Texas Sharpshooter Fallacy. Pat, above, talks about pressure, and others talk about stuff that tries to explain and give reasons for things that happen for no reason. There is not necessarily an explanation or a reason why everything happens; randomness is easily the biggest factor in most of life, and fantasy baseball projections are no different. You get points as an analyst for process, not for individual players. It is a fact that…
BobbyRoberto

@Pat,

For the BABIP, K/9 splits, I used Baseball-Reference.com.  I looked up the player, then used the Game Logs tab to look at a specific year.  Then I highlighted the first game and the final game before the All-Star break and got the totals for that stretch of games (this included BABIP).  For K/9, I just did the math based on the total Ks and IPs for the highlighted stretch of games.

obsessivegiantscompulsive

Thank you for clarifying Derek and for the preview on your future article.

BobbyRoberto
Even Haren’s first half/second half splits are wonky. If you look at ERA and WHIP, there’s a compelling case that he’s better in the 1st half:
2009/1st: 2.01, 0.81
2009/2nd: 4.62, 1.26
2008/1st: 2.72, 0.95
2008/2nd: 4.18, 1.37
2007/1st: 2.30, 1.00
2007/2nd: 4.15, 1.50
If you look further, he looks mostly like the same pitcher each half, except for BABIP:
2009/1st: .233 BABIP, 8.9 K/9, 1.1 BB/9
2009/2nd: .315 BABIP, 8.5 K/9, 2.0 BB/9
2008/1st: .256 BABIP, 8.0 K/9, 1.6 BB/9
2008/2nd: .375 BABIP, 9.4 K/9, 1.8 BB/9
2007/1st: .234 BABIP, 7.0 K/9, 2.2 BB/9
2007/2nd: .357 BABIP, 8.8 K/9,…
Derek Carty

Count me in for Haren 100%.  When you consider that what’s really wrong with his second-half is BABIP and that he’s basically been bad for about 175 second-half innings, that’s NOTHING in terms of BABIP.  You need 10 times that to account for even half of the inherent variation in BABIP!

Pat
Thanks Bobby. I understand that fantasy baseball is a ton of luck and there are plenty of things that happen that cannot be explained. I don’t think you can remove the human element from the equation. Players in general will regress toward the mean, but it does not mean it is true for everyone. Why is it impossible to draw reasonable conclusions that a pitcher pitches differently in certain situations? It is nowhere near fool-proof and may involve luck, but I think it is helpful in evaluating a player. What about some closers who pitch better when they are…
Derek Carty
Pat, You ignore the underlying stats when you have legitimate reason to. You see something in his mechanics or in his approach or in his PITCHf/x data that indicates “Hey, this isn’t the same pitcher he was when he posted these past numbers.” And even then, you wouldn’t ignore those numbers completely, just alter the way we’re projecting his future performance. FanGraphs has the splits by month, but not by half. And like Pat DiCaprio said, of course there is a human element to all of this, but pretending like we know what that element is for each individual player…
Pat
I agree with what you are saying. There is not always an explanation (other than luck) as to why a pitcher’s ERA does not match the underlying stats. I know you can get yourself in trouble by trying to find a reason in something that does not have a reason. I think it is a worthwhile cause if used properly. Like you stated in Haren’s case, I understand that his 1st/2nd half splits are statistically reasonable. The theory you are explaining is that if we could put 2 Dan Harens in the MLB, in the exact same situation it is…
Derek Carty
Fair enough, Pat. I would be interested to hear exactly why you think Haren is an exception, though. What leads you to believe that? Verlander might not have been a good example because his ERA was so good despite a worse-than-average BABIP, but I don’t think it would be hard to dig out a bunch more examples of guys who had an unlucky BABIP for a year and a half. My point was mostly that over just a year and a half, nearly everyone is willing to write off BABIP as random variation. Shoot, we’ve got nearly a year of…
Pat
The thing with Dan Haren is that there is a drastic difference in 1st and 2nd half ERA and it has continued for the last several years: his whole career minus 2006. Although the 375 innings are technically random statistics-wise, they are not completely random because they are taken after Haren has pitched 125 innings or so. Does something happen to Haren after 100 or so innings? Dan Haren is not a coin that has the same odds on the 1st flip as the millionth flip. As you mentioned you would need thousands of more innings for BABIP to stabilize. Therefore,…