FanGraphs

A Random Walk with FIP

Recently, I have begun to notice more and more disdain for Defense Independent Pitching Statistics (DIPS). A sizable group of people believes that some DIPS, such as Fielding Independent Pitching (FIP), are poor metrics because certain pitchers consistently “outperform” their FIP. More specifically, some starting pitchers consistently post lower Earned Run Averages than their FIP, which is taken to imply that there is something FIP fails to account for. There is no denying that FIP is imperfect, but all metrics are imperfect, so saying so is somewhat trivial. Unfortunately for those who use Matt Cain and the like as poster boys for “Why FIP is Flawed”, a small handful of counterexamples is incapable of delegitimizing a stat like FIP.

Thought Experiment

Let’s begin by making some assumptions:

1) FIP is a perfect statistic that accurately measures a pitcher’s true talent level.
2) ERA equals FIP + ε, where ε can be seen as the luck or error term, so ε = ERA - FIP.
3) ε is independent from pitcher to pitcher and from season to season, and it is symmetrically distributed around zero (equivalently, ERA is symmetrically distributed around FIP).
4) There are 100 starting pitchers in the league. (There are in fact about 150, but we’ll use 100 for simplicity.)

Now that we have established this idealized situation, we can begin our thought experiment. Because ε is symmetrically distributed around zero, a pitcher is just as likely to post an ERA lower than his FIP as one higher than his FIP. It follows that in year one, with 100 pitchers, we expect about half, or 50, to outperform their FIP. Of those 50, we would expect 25 (.5*50) to outperform their FIP again in year two by pure chance, and of those 25 we would expect 12.5 to do so in year three. Continuing down this path, halving the number from the year before, we would expect about 6 pitchers (6.25) to have outperformed their FIP four years running, and just over 3 (3.125) to have done it five years running by pure luck.
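For readers who want to check the halving argument, here is a minimal Python sketch of the idealized league. It assumes, as in the list above, that each season’s ε is an independent draw from a symmetric distribution; the normal distribution and its 0.5-run spread are arbitrary choices for illustration, since only the symmetry matters for the counts.

```python
import random

def expected_streaks(n_pitchers: int, years: int) -> list[float]:
    """Expected number of pitchers still beating FIP after each year,
    assuming every season is an independent 50/50 coin flip."""
    return [n_pitchers * 0.5 ** y for y in range(1, years + 1)]

def simulate_streaks(n_pitchers: int, years: int, trials: int, seed: int = 0) -> list[float]:
    """Monte Carlo average of how many pitchers have ERA < FIP (eps < 0)
    in every one of the first y seasons, for y = 1..years."""
    rng = random.Random(seed)
    totals = [0] * years
    for _ in range(trials):
        for _ in range(n_pitchers):
            for y in range(years):
                # Assumed symmetric luck term: ERA - FIP ~ Normal(0, 0.5 runs).
                eps = rng.gauss(0.0, 0.5)
                if eps >= 0:
                    break  # ERA was not below FIP this season, so the streak ends
                totals[y] += 1
    return [t / trials for t in totals]

print(expected_streaks(100, 5))  # [50.0, 25.0, 12.5, 6.25, 3.125]
print(simulate_streaks(100, 5, trials=2000))
```

The closed-form line and the simulated averages should agree closely; replacing the normal distribution with any other continuous symmetric distribution leaves the expected counts unchanged.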

Because we started with 100 pitchers, we expect about three of them to outperform their FIP in five consecutive years by randomness alone. Many people point to such pitchers and say, “Clearly, FIP is not accounting for something those three pitchers do.” Within this simulation, we can discount that argument completely, because we have assumed FIP to be perfect. Thought experiments are nice because they make a phenomenon easy to comprehend and visualize, but there is not a lot to glean from one if it is completely incongruent with reality.

 

Reality Check

I began by working backwards, looking at starting pitchers in 2011. Thankfully, our leaderboard has a stat called E-F, which is precisely the ε term I described above. 50% of the pitchers outperformed their FIP in 2011, which is a nice start. Of the 50% that outperformed their FIP in ’11, 41% (a bit lower than we would expect) also outperformed their FIP in 2010, giving us about 22% of our original group of pitchers (in the thought experiment we had 25% at this point). Of the remaining 22%, 52% outperformed their FIP in 2009, giving us about 11% of the original group, very close to the expected 12.5%. Of the remaining 11%, 56% outperformed their FIP in 2008, giving us about 6% of our initial group, almost identical to the 6.25% from the thought experiment. Finally, of the just over 6% that outperformed their FIP for four consecutive years, 60% also did so in 2007, giving us a not-so-unexpected final total of 3.6%, against 3.125% in the thought experiment. At every step, these numbers track the thought experiment remarkably closely.

I then went through the same process again, this time starting in 2007 and working my way forward to 2011. The results were 56%, 25%, 11%, 6.7%, and finally 4% of the original starting pitcher group in 2007, 2008, 2009, 2010, and 2011 respectively, which is again strikingly similar to our idealized situation above. The pitchers who outperformed their FIP for 5 consecutive years were Ted Lilly, Jeremy Guthrie, and, of course, Matthew Cain.
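A few lines of Python can line the reported cumulative shares up against the coin-flip expectation of 100 × 0.5^n percent after n seasons. The percentages are exactly the ones quoted above; nothing else is assumed, and this is a side-by-side comparison rather than a significance test.

```python
# Cumulative share (%) of the starting group that beat FIP in every season so far,
# as reported for the backward (2011 -> 2007) and forward (2007 -> 2011) passes.
BACKWARD_2011_TO_2007 = [50.0, 22.0, 11.0, 6.0, 3.6]
FORWARD_2007_TO_2011 = [56.0, 25.0, 11.0, 6.7, 4.0]

def coin_flip_share(years: int) -> float:
    """Percent expected to beat FIP in each of `years` straight seasons
    if every season is an independent 50/50 coin flip."""
    return 100 * 0.5 ** years

for n, (back, fwd) in enumerate(zip(BACKWARD_2011_TO_2007, FORWARD_2007_TO_2011), start=1):
    expected = coin_flip_share(n)
    print(f"{n} season(s): expected {expected:6.3f}%  "
          f"backward {back:4.1f}%  forward {fwd:4.1f}%")
```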

This doesn’t mean that Cain & Friends have solely benefitted from luck, nor does it mean that FIP is in fact perfect, but it does mean that using Cain and others like him to discredit FIP doesn’t make sense.

 




87 Responses to “A Random Walk with FIP”

  1. Mike Green says:

    Many have a more nuanced view. If you project next-year ERA from 1 year statistics (i.e. 160-240 innings), FIP will on average be more accurate than ERA. If you project next-year ERA from longer-term statistics (600+ innings say), ERA will be on average more accurate than FIP. FIP is a tool, and (like all tools) has its limitations.

  2. Tom O says:

    Have you read Mike Fast’s new work on batted ball speed? He claims to be able to predict a pitcher’s BABIP based on the speed of the batted balls he allows, up to a certain level of reliability. He doesn’t pretend to have the whole thing figured out yet, but there is certainly evidence that pitchers can outperform their FIP through skill (for example, Matt Cain or Mariano Rivera, who both consistently did well). That said, he also doesn’t claim that his work means FIP is unreliable, just that the concept could be made even better and more accurate through the use of more updated information.

    • Naveed says:

      The most plausible theory with Cain has always been that he generates weak contact. It would be nice to look at that data to see if the assumption holds up.

      • Mike Fast says:

        Matt Cain had his worst BABIP allowed year in 2008, and the HITf/x data seem to show he deserved what he allowed that year, but it makes it tough to draw any conclusions about his lower BABIP allowed in other years.

    • Nathan says:

      Love this post. FIP is a very useful stat and really tells us a lot more about the vast majority of pitchers than ERA does. But, it is not perfect, and everyone should (and I believe does) agree that better data is the only way to improve FIP (see the “HitFx” comment below).

  3. Noah Isaacs says:

    I haven’t read it yet, but our own Matt Swartz wrote on what pitchers could and couldn’t control:

    http://www.baseballprospectus.com/article.php?articleid=10281

    I am not arguing that pitchers have no control over their BABIP – I think what little control they have is somewhat insignificant and unsustainable.

    • Tom O says:

      Good point. Mike’s article definitely suggests that pitchers can control it, but he only studied one season of data (and used it to predict a second season) because he only had 2008 HitFx data available, so it’s far from a definitive finding on whether they can control it consistently. Very interesting topic.

  4. Marver says:

    I’d be interested in seeing the magnitude of epsilon used rather than a simple +/- binary view of it. However, this is still a reasonable exercise, confirming what the theoretic statisticians believe.

    • Anon says:

      The view that magnitude has no relevance (as displayed in your thought experiment) is a glaring flaw in your logic.

      How many pitchers were active for 5 consecutive years? This seems to be a requirement of this experiment.

      • Noah Isaacs says:

        I was examining a flawed argument. I do believe the magnitude makes a difference, but that was not part of the flawed logic I was examining. Also, I am confident that people make the same mistake when looking at the normal distribution (not just in the binomial case).

        If you look at some data and see that it is normal, and you then see that one of the points is 3 standard deviations away from the mean, what do you conclude? Outlier? That is what we are taught in Stats 101, but if there are 1000 data points, we would expect to see, on average, about 3 data points outside of the three stdev line, by pure randomness. I think people, myself included, forget this all the time.
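The “about 3 in 1,000” figure can be checked directly from the two-sided tail of the standard normal distribution, P(|Z| > 3) = erfc(3/√2). A minimal standard-library sketch:

```python
import math

def expected_beyond(k_sd: float, n_points: int) -> float:
    """Expected number of draws from a normal distribution lying more than
    k_sd standard deviations from the mean (both tails combined)."""
    p_two_sided = math.erfc(k_sd / math.sqrt(2))
    return n_points * p_two_sided

print(round(expected_beyond(3.0, 1000), 2))  # 2.7
```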

      • B N says:

        @Noah: The issue is though, we don’t just have this mystical coin-flip distribution as noted. Think of these two cases:

        1. Performance of pitcher ERA relative to FIP is distributed according to a normal distribution for each season.
        2. Performance of pitcher ERA relative to FIP is distributed according to a normal distribution particular to each pitcher.

        They’re both normal distributions. In both cases you would expect 3 guys outside the line out of 1000. The difference is that in case 1, it will typically be 3 different guys every year, while in case 2 it will typically be the same guys.

        What people are arguing is that it is typically the same guys who significantly outperform. The fact that ERA eventually becomes a better estimator of future ERA than FIP pretty much confirms that.
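B N’s two cases are easy to contrast in a small simulation. The spreads below (0.6 runs of fresh season-to-season noise, plus a 0.3-run persistent per-pitcher offset in case 2) are arbitrary assumptions chosen only to make the difference visible; they are not estimates from real data.

```python
import random

def count_five_year_outperformers(n_pitchers: int, persistent_sd: float,
                                  seasonal_sd: float, seed: int) -> int:
    """Count simulated pitchers whose ERA - FIP is negative in all five seasons.

    persistent_sd: spread of a per-pitcher offset that stays fixed across seasons
                   (case 2); 0.0 reproduces case 1 (pure seasonal noise).
    seasonal_sd:   spread of fresh, independent noise drawn every season.
    """
    rng = random.Random(seed)
    count = 0
    for _ in range(n_pitchers):
        offset = rng.gauss(0.0, persistent_sd)  # same offset for all five seasons
        if all(offset + rng.gauss(0.0, seasonal_sd) < 0 for _ in range(5)):
            count += 1
    return count

case1 = count_five_year_outperformers(10_000, persistent_sd=0.0, seasonal_sd=0.6, seed=1)
case2 = count_five_year_outperformers(10_000, persistent_sd=0.3, seasonal_sd=0.6, seed=1)
print(case1, case2)
```

Case 1 should land near 10,000/32 ≈ 312; case 2 produces noticeably more five-year outperformers because the same pitchers keep getting a head start.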

      • Anon says:

        (B N covered some of this.) Yes, more data would create the expectation of results further from the mean. Follow up with more analysis. How many outliers are expected? How many actually result? What are the odds of a person having repeated outlier seasons? Does anyone?

        Outliers don’t necessarily result from statistical chance. Barry Bonds and Pedro Martinez both had career peaks that are outliers (as a result of skill). The goal of many advanced baseball statistics is to determine skill by removing chance. In my opinion, FIP takes this a step too far: it sacrifices accuracy by removing a hard-to-measure skill in order to reduce the noise from chance.

  5. Bill but not Ted says:

    This article is why I check FG every day. Very well written.

    • Noah Isaacs says:

      Thanks BbnT, I appreciate it.

    • Barkey Walker says:

      I agree, this is a great example of a FG article.

      • Husker says:

        Me, three. Outstanding.
        It amazes me how many commenters are still determined to find FIP fundamentally flawed, even though pointing out the statistically expected number of exceptions is no refutation at all.

      • Husker,

        Not a single commenter in these comments has attempted to dispute FIP just because there are exceptions (which as you say should be expected). Rather, the commenters who have criticized FIP in these comments are arguing that there are pitcher skills that FIP does not account for…and that the existence of these skills is why FIP has certain fundamental weaknesses.

        Read through again. No one here is saying what you are accusing them of saying.

      • Husker says:

        You need to be more rational, “rational.” At least two commenters used Matt Cain to refute this article, and one used Mariano Rivera, without offering any proof of what their “special ability” is.

      • I didn’t say that people were offering PROOF. I said that people are making distinct claims from what you and Isaacs are claiming. They are claiming that there is an ability to control things FIP says you cannot control. You and Isaacs are both discussing the claim that statistical outliers mean that FIP fails (which is an obviously wrong claim).

        The only reference to Rivera in the comments (besides yours) reads as follows:

        “there is certainly evidence that pitchers can outperform their FIP through skill (for example, Matt Cain or Mariano Rivera, who both consistently did well)”

        See how that comment is making an argument different from the one Isaacs is discussing? The commenter is not saying that the mere presence of Cain and Rivera as statistical outliers disproves FIP, but rather that Cain and Rivera outperform their FIP because they can do something FIP does not believe they can do (control BABIP).

        And, reading through all the comments referring to Cain, I see absolutely nowhere where anybody suggests the argument that you claim commenters are making. Where do you see this?

      • Obviously, that new claim has to be discussed. In other words, can certain pitchers actually control their BABIP, HR/FB, etc.? All I am trying to say is that Isaacs’ argument doesn’t refer to that claim at all, and THAT is the claim that most FIP critics make (including in these comments).

  6. craigtyle says:

    I’m not clear what the universe was — all starters in 2011, or only those who were starting pitchers for the past five years? If the former, didn’t some of them “drop out” (i.e., were not in MLB in the earlier years)?

    • Noah Isaacs says:

      It was pitchers with at least 130 IP. Some pitchers do drop out, that is why I dealt in percentages instead of absolute numbers.

    • wahooo says:

      I’m with you….working backwards there were 3 pitchers that made up the 4% that beat the FIP for 5 years–does that mean that 75 pitchers pitched all 5 years? Seems a bit high–what was the IP cut-off?

  7. jesse says:

    I’m sorry, this is a really useless analysis. Why not assume a normal distribution, then use hypothesis testing to see if there is some sort of disturbance in the error? This symmetric-distribution assumption does nothing to tell me how significant the number of pitchers with consistently large error terms is. The decrease from ~50 to ~25 etc. seems like a nice fit for your thought experiment, but evaluating them as a series with a certain direction is improper. It would be more reasonable to compare the first year to 50, the second to 25 and the third to 12.5 etc., whereupon the gaps may be much more significant (1/6, 2/25) than they appear by just grossly lining up the comparison.

    The thought experiment itself is also fairly flawed. Why assume that, to be counted as an aberration, a pitcher’s bias should be repeated every year? You only followed the group that had one bias in all four years. The number with a bias on average, or even in 4 out of 5 years, would be much larger, and frankly more interesting. There is no reason to impose a false time series.
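Under the article’s coin-flip assumption, jesse’s looser definitions are easy to quantify with the binomial distribution. This sketch counts how many of the thought experiment’s 100 pitchers would be expected to beat FIP in at least k of 5 seasons; the 100-pitcher league is the article’s simplification, not real data.

```python
from math import comb

N_PITCHERS = 100  # size of the thought-experiment league
SEASONS = 5

def expected_at_least(k: int, n: int = SEASONS, pitchers: int = N_PITCHERS) -> float:
    """Expected number of pitchers beating FIP in at least k of n seasons,
    if every season is an independent 50/50 coin flip."""
    favorable = sum(comb(n, j) for j in range(k, n + 1))
    return pitchers * favorable / 2 ** n

for k in (5, 4, 3):
    print(k, expected_at_least(k))
```

Relaxing “every year” to “at least 4 of 5” raises the chance count from about 3 to about 19 pitchers, which is why the choice of definition matters.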

    I was hoping you’d actually use a random walk analysis, but instead you imposed a path dependency.

    • Noah Isaacs says:

      The point of the article was to point out that the argument, “some starting pitchers consistently have lower Earned Run Averages than their FIP implying that there is something that FIP fails to account for.” is not reasonable. Of course there could be further analysis.

      • I believe you do a good job refuting that argument, but I fear that virtually no one makes that argument, without an accompanying claim that there is some skill that these pitchers employ that FIP does not account for (e.g. ability to suppress BABIP).

        In other words, it is fairly obviously wrong to say that JUST BECAUSE there are statistical outliers, FIP fails. But most people arguing against FIP don’t merely make this claim. Rather, they attempt to explain WHY certain pitchers tend to be outliers.

      • Kevin S. says:

        But given that lower BABIPs are one of the chief ways to have a positive ERA-FIP, doesn’t this still lead us to determine whether BABIP suppression was skill or randomness?

      • Yes, exactly. I was just noting that THAT is the real issue that needs to be discussed. This article, however, merely refutes (correctly) an argument that few people make.

        I am not saying that pitchers can or can not control their BABIP through skill. I am just saying that most FIP critics base their arguments on the claim that pitchers CAN control their BABIP, and not on the claim that the mere presence of statistical outliers refutes FIP (which is what this article argues against).

      • Wouldn’t one use a similar analogy and logic to ‘decoding’ why some blackjack players beat house odds routinely and others don’t? So are Cain and his ilk, the ‘card counters’ of baseball?

      • Cliff says:

        Absolutely not. A card counter’s results could not be statistical noise; the sample is too enormous and the results far too much of an outlier.

      • Nathan says:

        RSF, I completely disagree. EVERY DAY on this site when FIP is referenced in an article, there is a small handful of trolls in the comments that dismiss FIP out of hand because they can find one or two examples where FIP doesn’t look right for a certain pitcher.

        Just to be clear, I agree with the opinion that FIP is the best tool we have today to evaluate a pitcher’s ability independent of as many outside factors as possible. But I also fully acknowledge that as a result, we are sacrificing by leaving out some skill that certain pitchers deserve credit for. FIP is far from perfect, but it is more useful than considering fielding, errors, subjective score keeping, park effects, etc.

      • Fair enough, Nathan. I haven’t really noticed that, but if those people are out there, then this article serves a purpose.

      • Husker says:

        Wrong and wrong again, rational. Every commenter who disagrees with FIP uses exceptions to “disprove” it and offers no proof for the “special skill.”
        Yes, they say it must be inducing weak contact or something, but that is just flailing for an explanation for which there is no evidence.

      • Husker,

        The issue is not whether anyone has yet offered proof. I am not saying that people should accept the claim that “Certain pitchers can control their BABIP, HR/FB, etc.” I am saying that that claim is what most FIP critics make. They are NOT merely making the claim that the presence of statistical outliers (which should be expected) means that FIP must be a failed stat.

        Once again, I am not attacking FIP here. I am just saying that your and Isaacs’ comments do not dispute what FIP critics actually say. If you want to do that, you need to argue against the claim that pitchers can control their BABIP, HR/FB, etc.

      • We might be talking past each other, Husker.

        Here’s what I am saying: Isaacs refutes Argument A, but FIP critics don’t make Argument A. They make Argument B.

        Here’s what I think you are saying: There is no good evidence to support Argument B.

        All the above claims can be true at the same time.

  8. mister_rob says:

    This study identifies the guys who managed to do it 5 years in a row,
    but it doesn’t address guys like Buehrle who have done it 7 of the last 8 years.

    I think the one pitcher type that FIP continually underrates is the guy known not as a flame thrower but as a really smart pitcher (like Maddux, Buehrle, Lilly, etc.). Those guys all made a living pitching to contact, throwing the ball where the scouting report said to throw it.

    • Kevin S. says:

      Why do people constantly lump Maddux in with this type of argument? He only outperformed his FIP by a tenth of a run over the course of his career. That’s not exactly significant.

      • mister_rob says:

        look at Maddux’s career more closely

        In the 15 years between his first real good year and his last real good year, he outpitched his FIP 12 times. 7 times by a margin of a half a run or greater

        As he was in the twilight with the Cubs, Dodgers, and Padres… he couldn’t pitch up to his FIP, which is why his career looks more balanced.

        compare his 12 out of 15, and 7 times by over a half a run to his teammate Smoltz, who never ever in his career outpitched his FIP by more than .21
        you tell me that is a coincidence??

    • Noah Isaacs says:

      The probability of someone randomly outperforming FIP in exactly 7 of 8 years is 3.125% ((8 choose 7)*(.5^7)*(1-.5)^1 = 8/256), meaning that if we started with 100 pitchers, we would expect about three to accomplish that feat by pure chance.
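The same binomial arithmetic in Python, assuming (as the thought experiment does) that each season is an independent 50/50 flip. Since “7 of the last 8” arguably means “at least 7”, the sketch also includes the 8-for-8 case.

```python
from math import comb

def prob_exactly(k: int, n: int, p: float = 0.5) -> float:
    """Binomial probability of beating FIP in exactly k of n independent seasons."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def prob_at_least(k: int, n: int, p: float = 0.5) -> float:
    """Binomial probability of beating FIP in k or more of n seasons."""
    return sum(prob_exactly(j, n, p) for j in range(k, n + 1))

print(prob_exactly(7, 8))   # 0.03125    (8/256)
print(prob_at_least(7, 8))  # 0.03515625 (9/256)
```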

  9. Matt says:

    Good article, the problem I do have with it is you seem to treat outperforming FIP as a single thing. However, there are multiple reasons one might outperform FIP. Are they doing it by limiting home runs? Limiting BABIP? Stranding runners?

    I’m no math major, but it seems like if the pitcher is outperforming FIP by doing the same thing better than average the chances of it being part of the random 12% 6% 3% chance are lower than that.

  10. Anon says:

    RationalSportsFan said it well:
    I am just saying that most FIP critics base their arguments on the claim that pitchers CAN control their BABIP, and not on the claim that the mere presence of statistical outliers refutes FIP (which is what this article argues against).

    My problem with FIP is that it ignores BABIP. In other words, it assumes BABIP is completely determined by park, defense, and chance. This is untrue. Pitcher defense and quality of contact both have an effect that FIP (and fWAR) ignore. Both of these factors have a controllable skill range. Thus FIP (and fWAR) underrate pitchers who excel at these skills.

  11. B N says:

    There’s an issue with even this thought experiment, mainly that it throws out a huge amount of data for no good reason. ERA and FIP are both interval measures- the concept of distance is important. Instead of using a statistical analysis that considers that, you lump everything into a binary analysis (outperformed vs no).

    It’s obvious why this is a bad idea. If I have a pitcher who outperforms his FIP by a full run every year, your analysis makes him entirely equivalent to the guy who was 0.000001 better on ERA than FIP. A better analysis would be to examine how the distribution of |ERA – FIP| shrinks as you increase the innings observed. That way, you can examine how much your error is shrinking as you increase your data. If your error stops shrinking at a certain point, you’ll have a bit of an idea of how much systematic error is in play. Of course, this assumes that pitchers’ “natural talent” isn’t changing over time, but it still beats throwing out swathes of data in my opinion.
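A minimal sketch of the analysis B N describes, on synthetic data only: every pitcher is given a hypothetical persistent ERA-FIP gap (random sign) plus random noise whose spread shrinks like 1/√IP. Both the 0.15-run gap and the noise scale are made-up parameters, and, like the comment, the sketch assumes talent does not change over time.

```python
import math
import random
import statistics

def mean_abs_gap(innings: float, systematic: float, per_inning_sd: float,
                 n_pitchers: int, seed: int) -> float:
    """Average |ERA - FIP| across simulated pitchers observed for `innings` IP.

    systematic:    hypothetical persistent gap each pitcher carries (random sign),
                   i.e. something FIP misses; 0.0 means FIP misses nothing.
    per_inning_sd: hypothetical noise scale; the random part of the gap shrinks
                   like 1/sqrt(innings), as averages of independent noise do.
    """
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_pitchers):
        true_gap = systematic * rng.choice((-1.0, 1.0))
        noise = rng.gauss(0.0, per_inning_sd / math.sqrt(innings))
        gaps.append(abs(true_gap + noise))
    return statistics.fmean(gaps)

for ip in (150, 600, 2400, 9600):
    print(ip, round(mean_abs_gap(ip, systematic=0.15, per_inning_sd=9.0,
                                 n_pitchers=20_000, seed=7), 3))
```

With a systematic component, the average gap levels off near that component instead of continuing toward zero; setting `systematic=0.0` lets it keep shrinking, which is the contrast B N proposes to look for.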

    Finally, the reason why DIP stats get a bad name is that they are an exceptionally crude metric for underlying talent. If you can get a better estimate of someone’s ERA from the trendline of their ERA as data increases, that basically means that FIP is damping random error at the cost of introducing additional systematic error. All told, while still useful, that’s not what I’d generally term as a “true talent” metric if it introduces systematic error.

    • Nick44 says:

      Each probability would have to be the result of a distributional fit and then integrating to the edge of the distribution to get the probability of outperforming it by that much.

      Heck, I’ll do that.

      • B N says:

        Yah, if one has the time and the stats packages on-hand, these things aren’t like some sort of arcane science. That is why I was a bit surprised that Noah went with a coin-flip analysis. Convenient, I guess, but doesn’t really show all that much.

  12. Cliff says:

    The point is that you can’t use the fact that a few pitchers consistently outperform their FIP to argue that FIP is not working. It could just be random chance.

    Yes, lots more work can be done on this and I believe pitchers can influence BABIP to some extent. But no one is providing any evidence of that in the comments.

    • mister_rob says:

      I think the evidence would be that teams spend lots of time and energy doing things like pitch charts and advanced scouting for a reason

      They use those things to come up with a sort of pitching/defense game plan. And the pitchers who can actually execute that game plan generally will have better results than a guy who can’t.

      I think any pitcher would tell you that pitching to the right spot has a lot to do with the quality of contact against him. And surprise, the guys known for having pinpoint control within the zone (i.e., Maddux, Buehrle) are the guys generally outpitching their FIP every year.

    • “The point is that you can’t use the fact that a few pitchers consistently outperform their FIP to argue that FIP is not working. It could just be random chance.”

      Yes, but basically everyone agrees on that. It would be like having 100 people flip a coin 5 times in a row. We should fully expect about 3 to flip heads every time. Obviously that (by itself) does not give us reason to conclude that those three are able to control the results of their flips. We should expect those extremes, given our sample size.

      But, again, virtually nobody makes the mere claim that “If there are players who consistently outperform their FIP, then FIP is a failed stat.” Rather, FIP critics attempt to explain why certain players outperform their FIP, ultimately making a claim like “If pitchers can control their BABIP in a meaningful way, then FIP is a failed stat,” and then “Pitchers can control their BABIP in a meaningful way.”

      So, if no one makes the argument that this article is refuting, then what work is this refutation doing? No one who has thought about this for half a second would say that the mere presence of outliers means FIP is a failed stat.

      • Nathan says:

        Sad thing is, no, people don’t agree on that. There are probably hundreds of posts on this site with comments sections littered with trolls that either just don’t get it, despite the fact it is a simple concept, or they are being intentionally dense.

      • TK says:

        In other words, this is a strawman argument, at least on Fangraphs.

      • Husker says:

        Three strikes and you’re out, rational. The fans who argue against FIP do exactly what you say they don’t do.
        You find one case where any of them has actually produced any evidence (other than the exceptions).

      • Husker, read my above comment. You and I are not even disagreeing. You are just misunderstanding what I am saying.

    • Cliff Notes version:

      “The point is that you can’t use the fact that a few pitchers consistently outperform their FIP to argue that FIP is not working. It could just be random chance.”

      Yes, and virtually every critic of FIP agrees with this. So, who is this article addressing?

  13. Eric R says:

    Just to expand the sample a bit– I grabbed every pitcher season of 130+ IP from 1996 to 2011 [1832 of them].

    This sample gives 342 runs of five seasons [with overlap, ie AJ Burnett has 2005-2009, 2006-2010 and 2007-2011 as his five year runs].

    I got 5.2% ERA>=FIP and 8.2% ERA<FIP, both well above the expected ~3%.

    • Eric R says:

      Adding a sixth year, we’d expect ~1.5% each to over or underperform in all six… I got 4.0% and 4.4%

      Seven? 0.75% each… I get 3.4% and 2.0%

      Unless I’m making a mistake somewhere, over a larger sample there definitely appear to be quite a few more than expected by chance alone.

      • LarryinLA says:

        The overlaps mean your runs are correlated. So, you can’t do it that way.

      • Eric R says:

        “The overlaps mean your runs are correlated. So, you can’t do it that way.”

        Easy to test. Break it out by year.

        1996-2000: 32 pitchers with 130+ IP each year. Three pitchers had ERAs worse than FIP every year and two had ERAs that were better every year. That is 9.4% and 6.3% respectively, with random chance suggesting ~3%.

        1997-2001: 32 pitchers, 9.4% and 3.1%.
        1998-2002: 31 pitchers, 12.9% and 0%
        1999-2003: 28 pitchers, 7.1% and 7.1%
        2000-2004: 28 pitchers, 3.6% and 17.9%
        2001-2005: 30 pitchers, 3.3% and 13.3%
        2002-2006: 29 pitchers, 3.4% and 10.3%
        2003-2007: 32 pitchers, 6.3% and 12.5%
        2004-2008: 30 pitchers, 3.3% and 6.7%
        2005-2009: 27 pitchers, 0.0% and 7.4%
        2006-2010: 25 pitchers, 0.0% and 8.0%
        2007-2011: 27 pitchers, 0.0% and 11.1%

        The averages of those come out to 4.9% and 8.6%.
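To make the averages easy to re-check, this sketch copies the twelve five-year windows listed above and recomputes the unweighted mean of each percentage column, overall and for the early (1996 to 1998 starts) and late (2000 to 2007 starts) periods. Nothing beyond the listed rows is assumed; weighting by the number of pitchers would give slightly different figures.

```python
import statistics

# (start year, pitchers, % with ERA worse than FIP all five years,
#  % with ERA better than FIP all five years), copied from the list above.
RUNS = [
    (1996, 32, 9.4, 6.3), (1997, 32, 9.4, 3.1), (1998, 31, 12.9, 0.0),
    (1999, 28, 7.1, 7.1), (2000, 28, 3.6, 17.9), (2001, 30, 3.3, 13.3),
    (2002, 29, 3.4, 10.3), (2003, 32, 6.3, 12.5), (2004, 30, 3.3, 6.7),
    (2005, 27, 0.0, 7.4), (2006, 25, 0.0, 8.0), (2007, 27, 0.0, 11.1),
]
COIN_FLIP_PCT = 100 * 0.5 ** 5  # 3.125% expected in each direction by chance

def period_means(start_min: int, start_max: int) -> tuple[float, float]:
    """Unweighted means of the two percentage columns over a range of start years."""
    rows = [r for r in RUNS if start_min <= r[0] <= start_max]
    return (statistics.fmean(r[2] for r in rows),
            statistics.fmean(r[3] for r in rows))

print(period_means(1996, 2007))  # all twelve windows
print(period_means(1996, 1998), period_means(2000, 2007))
print(COIN_FLIP_PCT)
```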

      • Eric R says:

        In the groups 2000-2004 through 2007-2011 the ERA>FIP crowd did stay very close to the expected 3%… in the 1996-2000 through 1998-2002 groups the ERA<FIP guys were also very near the expected.

        I wonder if there are some league-wide differences which have made it "easier" to have a low ERA relative to FIP recently vs making it harder earlier?

        In the periods starting in 1996-1998 the two groups are at 10.6% and 3.1%… then 1999 at 7.1% and 7.1%… then 2.5% vs. 10.9% afterward.

      • Paul Thomas says:

        Eric, unless Fangraphs is adjusting the constant in FIP (the classic equation just uses 3.20 for the constant), the oddities you’re seeing are likely to be artifacts of the (substantial) changes in the leaguewide run-scoring environment between 1998 and 2011.

      • Paul, I believe the constant is adjusted every year by the following formula: (LeagueAVG ERA – LeagueAVG FIP)
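For reference, here is a sketch of FIP as it is commonly published: (13×HR + 3×(BB+HBP) − 2×K)/IP plus a league constant chosen so that league-average FIP equals league-average ERA, which is the adjustment described in the comment above. The league and pitcher totals in the usage lines are hypothetical round numbers, not real season data.

```python
from dataclasses import dataclass

@dataclass
class PitchingLine:
    ip: float  # innings pitched as a decimal (e.g. 200.333 for 200 1/3)
    hr: int
    bb: int
    hbp: int
    k: int
    er: int = 0  # earned runs; only needed for the league line

def raw_fip(line: PitchingLine) -> float:
    """FIP before the league constant: (13*HR + 3*(BB+HBP) - 2*K) / IP."""
    return (13 * line.hr + 3 * (line.bb + line.hbp) - 2 * line.k) / line.ip

def fip_constant(league: PitchingLine) -> float:
    """Constant that makes league-average FIP equal league-average ERA."""
    league_era = 9 * league.er / league.ip
    return league_era - raw_fip(league)

def fip(line: PitchingLine, league: PitchingLine) -> float:
    return raw_fip(line) + fip_constant(league)

# Hypothetical totals for illustration only.
league = PitchingLine(ip=43000.0, hr=4500, bb=14500, hbp=1600, k=34000, er=19000)
pitcher = PitchingLine(ip=200.0, hr=18, bb=60, hbp=5, k=180)
print(round(fip(pitcher, league), 2))  # 3.42
```

Because the constant is recomputed from each season’s league totals, a shift in the run environment moves league ERA and league FIP together.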

      • Eric R says:

        Paul- but that shouldn’t change whether players are putting up ERAs better or worse than their FIPs, should it?

        In any case, since my years are just starting points for five year runs, it would still seem odd for such an abrupt change, no? I suppose I would have expected a gradual change if it was just about the more recent ‘correction’.

  14. KSinDC says:

    Maybe I’m not following something here, but if we invented a junk metric (call it LIP and base it on the number of times a pitcher shakes off his catcher), centered it according to ERA and gave it some sort of non-crazy distribution, wouldn’t we see the same sort of properties, even though LIP would be worthless?

    If you have two distributions centered at the same point and stretching over roughly the same range, it’s going to be rare for a value in one distribution to consistently be more of an outlier than the corresponding value in the other distribution but this tells us nothing about the quality of either distribution.

    If your point is that we shouldn’t find a few consistent outliers as proof of FIP’s flaws, that’s true, but isn’t it also true that the fact that there are only a few consistent outliers tells us nothing about FIP’s quality?

  15. FFFFan says:

    Nice way to visualize the FIP/ERA relationship, but I’m pretty sure you simply verified that the FIP coefficients are roughly correct (from a regression, the epsilon will be orthogonal). Also, the magnitude of epsilon may be better represented by the R squared.

  16. thomas says:

    I am not one of those who think FIP is lousy; it’s a great stat for doing what it should be used for: predicting next year. It should not be used in WAR, since WAR measures this year; for that you should use park- and defense-adjusted RA.

    • Nathan says:

      But you are one of those people who are missing the very clear point that FIP is still a record of only things that actually happened on the baseball field.

      If you have a bone to pick, it is with xFIP.

      • Barkey Walker says:

        But did they happen? Ever seen a called third strike 6 inches off the plate?

      • Paul Thomas says:

        Sure, it’s measuring things that happened on a baseball field; it’s just that those things have a far less direct relationship to winning than runs allowed does. For measuring the past (obviously not for predicting the future), they’re just bad proxies of what people really care about.

        It’s like claiming that the best football offense in a given year was the one that gained the most yards rather than the one that scored the most (field-position-adjusted) points.

  17. Evan says:

    Matt Cain got this strawman argument to hit an infield popup.

  18. Tom says:

    The problem is some in the SABR community (and I’m not suggesting this author is) get really defensive when questions about the accuracy of an advanced statistic are asked. Suggesting that some pitchers who outperform their FIP may be indicative of a skill (other than K’s, BB’s and limiting flyballs) is suddenly considered attacking the stat instead of looking at the error bar of the stat and/or potentially evolving it. (Yes, I used a strawman, much like the author did in setting up this article.)

    The key failure in this article is how “consistently” is defined in this thought experiment. As some have mentioned, magnitude is also important, and “consistently” should not be used interchangeably with “every year”. If a pitcher has a significant delta and had only 1 or 2 years out of 10 where ERA > FIP, I still think he would be pointed to as “consistently” outperforming his FIP.

    Rather than just making this into a binary coin flip, why not look at the actual standard error of FIP (which I have never seen a study on) and use that to determine how many outliers there are over 5 years (or whatever time period is desired)? A statistical test of significance would be able to isolate the probability of those pitchers simply being outliers versus there being a real difference.

    Unfortunately this statistical shorthand and thought experiment is not really testing/analyzing what the author is suggesting.
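One way to act on the significance-test suggestion, sketched under assumptions: treat each season's ERA − FIP gap as an independent draw and test whether its mean differs from zero with a one-sample t statistic. The function name is hypothetical, seasons are unweighted even though short seasons are noisier, and a real study would also need to correct for testing many pitchers at once.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence


def gap_t_statistic(era_minus_fip: Sequence[float]) -> float:
    """One-sample t statistic for the mean of a pitcher's seasonal ERA - FIP gaps.

    Null hypothesis: the true mean gap is 0 (no skill beyond what FIP captures).
    A negative value means ERA has run below FIP on average.
    Assumes seasons are independent and equally weighted (a simplification).
    """
    n = len(era_minus_fip)
    if n < 2:
        raise ValueError("need at least two seasons")
    standard_error = stdev(era_minus_fip) / sqrt(n)  # sample SD / sqrt(seasons)
    return mean(era_minus_fip) / standard_error
```

Compare |t| with a t table at n − 1 degrees of freedom; for ten seasons (9 degrees of freedom) the two-sided 5% critical value is about 2.262. Unlike a count of good years, this uses the size of each gap, which addresses the objection that magnitude matters.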

    • Nathan says:

      The problem is actually three-fold. Yes, there are stat evangelists who get immediately defensive. The second part of the problem is the “old school thinkers” who say, “Billy Beane ain’t won nuthin’!” and therefore dismiss stats other than Wins and Batting Average. The third part of the problem is the people who acknowledge the usefulness of stats but for some inexplicable reason have a difficult time understanding that the majority of people who have worked toward creating good DIPS were among the first to say, “This is good, but we can do better, and will continue to try to do better.”

    • B N says:

      The other issue being that FIP is considered an “advanced statistic.” I mean… it’s a weighted linear sum of outcomes. That’s advanced?

      At least from my perspective, FIP is an incredibly crude stat, very much on par with ERA but with a different utility. The problem is that a truly advanced stat (see: a more reliable measure that separates performance from outcomes) would be hard to intuitively understand. There would almost certainly be non-linear interactions, at a minimum.

  19. Jonny5 says:

    I absolutely love this site as well as the comments. This is actually my first comment. Unlike many of you, I use these metrics in a very different way: I use them to evaluate what has happened and don’t even try to get into predictions or fantasy baseball. I just enjoy the game immensely and love the work people at fangraphs contribute to the game and the understanding of statistics they share with the world. I’m a lurker here, and I felt it was time to say thanks for all the work you do to expand the knowledge of metrics to anyone willing to learn it. One day this knowledge will be used on a much wider basis, I’m sure of it. And once again, thank you. On the “FIP” side, I don’t think any of these metrics can predict what an imperfect human will do next year, but they do accurately set the bar for what players are most likely capable of, and you can’t ask for much more than that. Math is perfect and people, well, they’re far from that. Anyway, I’m rambling, and I just wanted to show appreciation for your contribution to my own understanding of these stats. Many thanks, fangraphs contributors and commenters.

  20. Chok says:

    I’m somewhat new to sabermetric analysis. What strikes me as odd in these conversations is that “Fielding Independent Pitching” is an obvious oxymoron: it is not fielding-independent. Innings pitched (getting the outs) is obviously fielding-dependent to a large degree, and that’s half the stat (the denominator). Strikeouts only account for about a third of outs. Some portion of the remaining two-thirds is routine glovework and some portion is fielder placement, skill, etc. I played in an ottoneu points league last year (flawed as any fantasy game will be, but really fun, with pitcher points based on FIP) and was struck by how some relief pitchers would come into games and give up hard line-drive double after hard double, give up the lead, exit with a single out, and penalize the fantasy team not a bit (zero points in zero innings, as if they’d stayed on the bench). I’m not sure if xFIP would have caught something like that (line drive/fly ball), but I’d almost rather see the denominator in the FIP equation be the number of pitches thrown. Obviously, I’m generalizing from the tiniest sample size, but I don’t think it’s an absurdly small sample size; my experience of the game tells me that such outings are the pitcher’s fault to a significant degree. I think perhaps what the “FIP” stat is missing is something on how hard/accurately the bat can be manipulated against certain pitchers. Maybe leave singles out of the equation but let doubles and triples be factors, along with fly balls. What do you think?

  21. gnomez says:

    “This doesn’t mean that Cain & Friends have solely benefitted from luck, nor does it mean that FIP is in fact perfect, but it does mean that using Cain and others like him discredit FIP doesn’t make sense.”

    “Me fail English? That’s unpossible!”

    Seriously, what?

    I’m guessing you mean “nor does it mean that FIP is in fact IMPERFECT” and “like him TO discredit”, but good research.

    • Richard says:

      No, “perfect” is correct there (remember, he was assuming its perfection for the purposes of the original exercise); the only mistake there is the missing “to”, which you so helpfully pointed out.

  22. MK says:

    If outperforming FIP is a skill it would likely be a very difficult skill to attain (we don’t see many pitchers do it). Further, it would likely be a command-related skill, not velocity. Therefore, injuries would likely have an effect on the skill, even small ones because this is a very specialized skill that requires absolute peak performance.

    When we look at the data over a five-year period, it is unlikely that many pitchers would be able to stay healthy enough to consistently maintain the skill of outperforming FIP. One more thing…

    Another thought… it could also be a lefty thing; the ball just moves funny coming from a left-hander. Look at Buehrle, Lilly, and Glavine (during his full seasons, 1988-2007, Tom Glavine outperformed his FIP 16 of 20 times).

    The injury idea came from watching John Lannan (who is also a lefty); he has outperformed his FIP in three of four years. Watching him in 2010 (the season he didn’t outperform FIP), you could tell something was off…
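The counts in this comment can be set against the article's coin-flip null with a binomial tail probability: if each season were a fair 50/50 chance of beating FIP, how likely is a record at least this good? A minimal sketch; `binomial_tail` and `chance_any` are hypothetical helpers, and the independence assumption is exactly what a persistent factor (health, handedness, a steady defense) would break.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p): chance of at least k of n seasons beating FIP by luck."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k, n + 1))


def chance_any(p_single: float, n_pitchers: int) -> float:
    """Chance that at least one of n_pitchers independent pitchers reaches the record by luck."""
    return 1 - (1 - p_single) ** n_pitchers


if __name__ == "__main__":
    print(f"Glavine, 16+ of 20: {binomial_tail(16, 20):.4f}")  # exactly 6196 / 2**20
    print(f"Lannan, 3+ of 4:   {binomial_tail(3, 4):.4f}")     # exactly 5 / 16
```

Under that null, 16 or more of 20 has probability 6196/2^20 ≈ 0.006, while 3 or more of 4 is 5/16 ≈ 0.31, so a three-of-four record is quite consistent with luck. A 16-of-20 record looks unusual on its own, but `chance_any` shows why it should be judged against how many long careers were searched to find it.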

  23. Theo says:

    I don’t like FIP because:

    It uses old data (walks, strikeouts, HRs). To me, it’s just a glorified K/BB ratio. I already know that Ks are good and that walks/HRs are bad.

    The sample size is too small. Walks/Ks/HRs only account for about 30% of a pitcher’s performance. What about the other 70%? You need to use data from every batter the pitcher faced.

    It doesn’t use more comprehensive Ball/Strike data. Working the count is more than half the battle. A pitcher that gets 2 strikes has a major advantage. Avoiding 3 ball counts helps a lot too.

    It’s an ERA-type number. If the purpose of FIP is to avoid ERA, then it shouldn’t be trying to duplicate ERA. It’s like the pot calling the kettle black.

    • Nathan says:

      I think your points are fair, but I don’t get how FIP uses “old” data. Those data points are fundamental outcomes in the game of baseball, and will always be critically important to both offense and pitching analysis.

      DIPS needs improvement to be sure, but that wouldn’t result in “new” data replacing the “old” data — it will result in “new” data being brought in to better balance the “old.” Ks, BBs, and HRs will still be of the utmost importance when describing a pitcher’s performance.

      It is scaled to ERA to make it easy for newbies to understand contextually, so we can easily use its predictive powers, and so writers like Noah can have articles like this one.

      • Theo says:

        FIP is just the outer shell of an entire component. You have to use Frankenstein math to figure out FIP. The result is that you get a Frankenstein like statistic, that’s dumb, lazy, and mindless.

        It’s just old basic data that you can’t make advanced stats out of.

  24. Barkey Walker says:

    This article is a very good idea, thanks for writing it. However, I’m not sure I believe the results: isn’t fielders’ UZR correlated over time, so that I should *expect*, even under the null hypothesis that pitchers don’t control balls in play, that epsilon is not iid but is instead correlated through time (the alternative hypothesis)?

    Example: shouldn’t having Longoria behind me improve my epsilon for the entire length of his contract?
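A sketch of this objection under made-up parameters: give each hypothetical pitcher a fielding effect d that stays with him for all five seasons, so epsilon = d + fresh yearly noise, and count how many of 100 pitchers have ERA below FIP every year. With `defense_sd = 0` this reduces to the article's independent coin flips (about 100/2^5 ≈ 3 expected). The function name and the spreads (in runs per nine innings) are illustrative, not estimates.

```python
import random


def streaks_with_defense(n_pitchers=100, n_years=5, defense_sd=0.2,
                         noise_sd=0.5, seed=0):
    """Count pitchers with ERA < FIP (epsilon < 0) in every one of n_years seasons.

    Hypothetical model: epsilon = d + e_t, where d is a fielding effect that stays
    with the pitcher for all n_years (e.g. the same defense behind him) and e_t is
    independent season-to-season noise. defense_sd = 0 recovers the article's
    independent coin flips.
    """
    rng = random.Random(seed)
    streaks = 0
    for _ in range(n_pitchers):
        d = rng.gauss(0.0, defense_sd)  # persistent part, drawn once per pitcher
        if all(d + rng.gauss(0.0, noise_sd) < 0 for _ in range(n_years)):
            streaks += 1
    return streaks


if __name__ == "__main__":
    for sd in (0.0, 0.2, 0.4):
        avg = sum(streaks_with_defense(defense_sd=sd, seed=s) for s in range(300)) / 300
        print(f"defense_sd={sd}: {avg:.2f} of 100 with ERA < FIP all five seasons")
```

Any defense_sd above zero raises the expected count above 100/32, because a pitcher with a good defense behind him has better-than-even odds each season. Under this toy model, persistent fielding quality alone could produce some repeat "outperformers" even if pitchers had no control over balls in play.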

    • Husker says:

      An excellent point. Fielding stats wouldn’t mean anything at all if they didn’t affect ERA.
      Aren’t fielding stats intended to be, in fact, PIDs (pitching-independent defense)?
