For those who still don’t believe FIP is bad for fantasy analysis…

Our friend Brian Joura of RotoGraphs posted an article today citing my own article about the problems with FIP from earlier in the year. My assertion from then, which I still stand by completely:

While the original, underlying premise for FIP is sound, and while it’s absolutely better to use than simple ERA, and while there are certainly uses for FIP in some circumstances, for 99 percent of fantasy purposes, I ignore FIP completely and absolutely.

I noticed a few comments to Brian’s article that didn’t seem to completely buy my explanation, so I thought I’d run some quick numbers to help provide further evidence that a stat like LIPS or xFIP is better than FIP.

HR/FB instability

By definition, the only substantial difference between FIP and xFIP is that xFIP adjusts each stat line to assume a league average HR/FB, so this crude study will focus entirely on HR/FB.
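
For readers who want to see that difference mechanically, here is a minimal sketch. The weights (13, 3, 2) follow the commonly published FIP formula, and some versions also add hit batters to walks. The constant, the league HR/FB rate, and the pitcher's stat line are illustrative assumptions, not figures from this article.

```python
# Minimal sketch of FIP vs. xFIP. FIP_CONSTANT and LEAGUE_HR_PER_FB are
# illustrative assumptions; real values vary by season and by data source.
FIP_CONSTANT = 3.10
LEAGUE_HR_PER_FB = 0.11


def fip(hr, bb, k, ip, constant=FIP_CONSTANT):
    """ERA estimator that uses the pitcher's actual home runs allowed."""
    return (13 * hr + 3 * bb - 2 * k) / ip + constant


def xfip(fb, bb, k, ip, league_hr_per_fb=LEAGUE_HR_PER_FB, constant=FIP_CONSTANT):
    """Same as FIP, but home runs are replaced by fly balls * league HR/FB."""
    expected_hr = fb * league_hr_per_fb
    return fip(expected_hr, bb, k, ip, constant)


# Hypothetical pitcher: 200 fly balls but only 12 HR (a 6% HR/FB season).
lucky_fip = fip(hr=12, bb=50, k=150, ip=180.0)
lucky_xfip = xfip(fb=200, bb=50, k=150, ip=180.0)
print(f"FIP  = {lucky_fip:.2f}")
print(f"xFIP = {lucky_xfip:.2f}")
```

With these made-up inputs the two estimators disagree by roughly seven tenths of a run, and the whole gap comes from the HR/FB assumption.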

I looked at all pitchers with at least 12 games started in adjacent seasons from 2004 to 2008. Over this period, we find 63 pitcher seasons where a pitcher’s HR/FB strays at least four percent from league average* in Year 1. In Year 2, just 5 of those 63 pitchers (7.9%) failed to regress in the direction of league average. That’s a very small number, especially when you consider that Chien-Ming Wang (who may be one of the rare exceptions I mentioned) and Brett Myers (who almost certainly is one of those rare exceptions) accounted for 2 of those 5 seasons. Set those two aside, and the percentage becomes 4.8% (3 of 63).
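
The counting behind a study like this can be sketched as follows. The league average, the (Year 1, Year 2) pairs, and the exact definition of "failed to regress" (Year 2 did not move back toward league average at all) are assumptions for illustration, not the data or the precise rule used here.

```python
# HR/FB values are in percent. LEAGUE_HR_PER_FB is an assumed aggregate
# league average; THRESHOLD is the four-point cutoff described above.
LEAGUE_HR_PER_FB = 11.0
THRESHOLD = 4.0


def is_outlier(year1, league=LEAGUE_HR_PER_FB, threshold=THRESHOLD):
    """True if Year 1 HR/FB strays at least `threshold` from league average."""
    return abs(year1 - league) >= threshold


def failed_to_regress(year1, year2, league=LEAGUE_HR_PER_FB):
    """True if Year 2 did not move from Year 1 toward league average."""
    if year1 > league:
        return year2 >= year1
    return year2 <= year1


# Hypothetical (Year 1, Year 2) HR/FB pairs, not real pitchers.
pairs = [(16.0, 12.0), (6.0, 10.0), (15.5, 17.0), (12.0, 11.0), (6.5, 9.0)]
outliers = [p for p in pairs if is_outlier(p[0])]
failures = [p for p in outliers if failed_to_regress(*p)]
print(f"{len(failures)} of {len(outliers)} outliers failed to regress")

# The article's own arithmetic: 5 of 63, then 3 of 63.
print(f"{5 / 63:.1%}")
print(f"{3 / 63:.1%}")
```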

This is a very crude study, but hopefully it reinforces my point. HR/FB is unstable, and because FIP makes no adjustment for it, FIP will be misleading and less accurate than other indicators. David Gassko did some much more thorough work on HR/FB in the THT Annual 2007 (which can be read for free here), but the short version is that for pitchers with 350+ TBF, the previous season’s HR/FB explains just 3% of the variance in the following season’s HR/FB.
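
To put that 3% in perspective, here is a rough sketch of what it implies for a projection. It assumes a simple one-variable linear relationship, a positive correlation, and a similar spread of HR/FB in both seasons, so that the regression slope is roughly the correlation. The league average and the sample pitcher are made up.

```python
import math

R_SQUARED = 0.03          # share of variance explained, as cited above
LEAGUE_HR_PER_FB = 11.0   # assumed league average, in percent

# Under the stated assumptions the slope is about the square root of R^2.
r = math.sqrt(R_SQUARED)


def projected_hr_per_fb(year1, league=LEAGUE_HR_PER_FB, slope=r):
    """Shrink last season's HR/FB toward league average by the slope."""
    return league + slope * (year1 - league)


# Hypothetical pitcher coming off a 16% HR/FB season.
projection = projected_hr_per_fb(16.0)
print(f"correlation ~ {r:.2f}")
print(f"projected HR/FB ~ {projection:.1f}%")
```

Under these assumptions, a pitcher five points above league average would be projected to keep less than one point of that gap the following year.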

*I used a rough estimation of league average, using the aggregate league average for all five years. This is the lazy way to do it but won’t change my point.

Anecdotal evidence and precision

One comment from Brian’s article that I thought would be useful to answer for everyone:

“Well…FIP definitely helped predict Ricky Nolasco’s turnaround. Not sure what his xFIP was….”

We must remember that FIP is not so utterly useless that it will be incorrect in every scenario. In scenarios where the pitcher has a lucky or unlucky BABIP or LOB% (Nolasco’s BABIP was over .400 at one point), FIP will be able to predict the general direction the pitcher’s ERA should move as long as the HR/FB isn’t too far away from league average.

While we’ll know that Nolasco isn’t a 6.00 ERA pitcher, it is important to make a distinction over whether his ERA should be 4.50 or 4.00 or 3.50. Even the difference between a 4.25 and 4.00 ERA is the difference between ‘solid starter’ and ‘waiver wire material’ in many leagues. FIP is ill-equipped to make this distinction.

We can’t allow anecdotal evidence to rule our decision making. While FIP may have worked in Nolasco’s case given a very rough objective, the numbers tell us that a stat like xFIP or LIPS will be more accurate, for more pitchers.


Brandon Heikoop
So it’s not that FIP is “bad” for fantasy analysis; it’s that looking at FIP without context is bad fantasy analysis. One could have recognized this under any circumstance, as looking at any one statistic without context is bad analysis in any field, baseball included. Hockey fans, who aren’t as stat-savvy as baseball fans, know that you cannot simply look at a goalie’s GAA and deem him #1; there are other factors that must be evaluated (SV%, SOG, SHO, PPGA, SHGA, etc.). Basketball fans, who are becoming more stat-savvy, know that the shooter with the highest FG%…
Derek Carty
Brandon, I suppose that’s one way to look at it. But when we have readily available stats like xFIP, I don’t think it makes much sense to look at FIP and try to make subjective estimates about context. All the necessary adjustments have already been made for us. My beef is mostly that analysts continue to use FIP without looking at context, treating it as the exact number that a pitcher’s ERA should be, even when it’s completely off because of his HR/FB. If a pitcher’s HR/FB isn’t exactly league average, I just don’t think looking at FIP is…
TheKid

This is a great back and forth discussion you have begun with Mr. Joura.  I love both sites and read both regularly.

Josh

What if we normalized for BABIP and Strand Rate as well?  Would that be more useful?  Is it hard?

K76154

I want to use LIPS ERA instead, but the problem is that the major websites do not provide LIPS ERA data, and it’s hard to calculate by myself.

Will Larson

Derek and others: The stats that really matter are the contact stats (hr/fb, gb%, ld%, iffb%) and the non-contact stats (k/9, bb/9). From these, you can predict how a pitcher will do much better than with FIP or any of the others. BABIP, LOB%, and ERA are all heavily dependent on the type of contact a pitcher induces. Thus, a pitcher with a very high fb% will have a permanently low BABIP but a higher hr/9 than the league average. Check out http://www.williamlarson.com/?p=95 or http://www.williamlarson.com/baseball_document.pdf for more info.

Derek Carty
William, you’re absolutely right that these are stats that we should be looking at for pitchers. If you go through the THT Fantasy archives, you’ll see countless examples where we use them. However, I believe you might be missing the point of stats like FIP, LIPS, etc. What you cited are component skills. What ERA estimators look to do is combine certain component skills in the proper proportions to estimate what a pitcher’s “luck neutral” ERA would be. Sure, we can look at K/9, BB/9, GB%, etc., but we don’t know the exact contribution each has to ERA in our…
Will Larson
Derek, thank you for your thoughtful response. I think this is a great conversation and one that people should be having more often. I agree with most of what you’re saying. However, I think you underestimate the effect batted ball stats have on ERA. First, batted ball stats affect BABIP. Regression analysis shows that a 10-percentage-point increase in GB% from LD% reduces your expected BABIP by 50 points (from .300 to .260, for example). A 10-percentage-point increase in GB% from FB% increases your expected BABIP by 10 points (from .300 to .310, for example). Where batted ball stats…
Brandon Heikoop

Derek,

I agree. But again, in a bubble, no statistic is truly accurate. We see all the time that authors will call a player a sleeper based solely on his BABIP compared to the league average, which can be useful but, in the same sense, is not 100% accurate.

For me personally, I will often use FIP contextually because I am lazy and typically utilize Fangraphs as my statistical database. If someone challenged me on this I certainly would not disagree and say, “Noooo, FIP is correct” in the same sense that you wouldn’t with xFIP, LIPS, or DIPS.

Derek Carty

Brandon,
Absolutely right.  No stat is 100% accurate.  If it’s a matter of being lazy, that’s perfectly fine for fantasy players.  But for analysts to be lazy is by no means fine (and I’m pretty sure they’re not lazy, just misinformed or unaware or something like that).  While we will never be 100% accurate, a stat like LIPS or xFIP will be *more* accurate than FIP, and that’s the best we can do.

Derek Carty
William, I don’t think I’m underestimating anything. While you’re absolutely right that batted ball shifts can have big effects, the biggest ones are more a matter of luck evening out than anything else. We must remember that pitchers do not have much control over LDs, and that when we evaluate batted ball stats we should always normalize the LD% first. If you dig through the archives, you’ll see I always refer to xGB% as opposed to actual GB%. xGB% normalizes the LD%. I don’t think it’s a sound approach to say, “Player X has a 10% LD% and .300 BABIP. …
Will Larson
Derek, I think you’re right. We are talking about basically the same thing but from different directions. I’m saying, “We observe a BABIP, LOB%, and ERA. Now let’s try to explain it using X, Y, and Z.” You’re saying, “Usually, X, Y, and Z result in so many hits and HRs, so this is what the BABIP and ERA should be.” I think we arrive at basically the same conclusions either way. Please do think about LOB% though. There is a substantial multiplier effect of an inflated BABIP on ERA via LOB%. When this season is over, I’m going to…
NadavT
I’m curious how you’re defining “more accurate” when you’re comparing FIP and xFIP. Are you only interested in predicting a pitcher’s performance in the following season, or are you looking at within-season accuracy as well? Using THT’s pitcher stats, I took a quick look comparing FIP and xFIP to ERA for the top 50 pitchers in both 2007 and 2008 (ranked by xFIP), and FIP was closer to ERA more than 60% of the time. 100 pitcher-seasons might not be enough of a sample for the HR/FB regression to truly even out, but that’s sort of the point: for…
Derek Carty
We can absolutely be confident that HR/FB rates will regress towards their expected values, Nadav. As I noted in this article, of the 63 players with lucky/unlucky HR/FBs in Year 1, just 7.9% of them *did not* regress in the direction of league average in Year 2 (and there may even be some permanent outliers included in that percentage). As to FIP and ERA matching up better, this is actually the result we should expect. xFIP is a luck neutral stat, while ERA and FIP both incorporate some elements of luck. ERA is very luck dependent, while FIP is somewhat…
Derek Carty
Will, no argument that BABIP impacts LOB%. It certainly does. But I don’t think that the controllable aspects of BABIP will result in huge LOB% differences. If we run a quick regression, we get a regression equation that looks something like this:

xLOB% = 100.69 - 95.99 * BABIP

which yields a table like this:

BABIP   xLOB%
0.280   73.82
0.285   73.34
0.290   72.86
0.295   72.38
0.300   71.90
0.305   71.42
0.310   70.94
0.315   70.46
0.320   69.98

If we were to predict a pitcher to have a BABIP of .310 or even .315 as a result of GB%, he’d still only lose about 1% off of his LOB% as a result. K rate would impact it…
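
For anyone who wants to reproduce that table, a minimal sketch using the two coefficients quoted in the comment above follows. The function name is made up for illustration, and because the quoted coefficients are presumably rounded, the output can differ from the listed values by about 0.01.

```python
# Coefficients as quoted in the comment; presumably rounded from the fit.
INTERCEPT = 100.69
SLOPE = -95.99


def expected_lob_pct(babip):
    """Expected LOB% (in percent) for a given BABIP under the quoted line."""
    return INTERCEPT + SLOPE * babip


# Walk BABIP from .280 to .320 in steps of .005, like the table.
for step in range(9):
    babip = 0.280 + 0.005 * step
    print(f"{babip:.3f}  {expected_lob_pct(babip):.2f}")

# Cost in LOB% of moving from a .300 to a .310 BABIP.
drop = expected_lob_pct(0.300) - expected_lob_pct(0.310)
print(f"drop: {drop:.2f} points")
```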
Will Larson

I’m glad you use regression analysis. Makes this much easier to talk about :)

If you’re interested in xLOB metrics, just regress LOB% on K/9 and BABIP. Keep in mind that xLOB% is a function of xBABIP though (you have to adjust your BABIP first, then use this as an input to compute xLOB%). That explains most of what you can. xLOB% and xBABIP can be found for 2008 at http://www.williamlarson.com/baseball_spreadsheet.xls

Will Larson

I can write an article introducing xBABIP and xLOB% if you’d like, as well as numbers for this year. Please email me if you’re interested.

NadavT
Thanks for the response, Derek. I realized the point about FIP matching up better with ERA by definition (because it takes HRs as a given, rather than being influenced by luck) after I posted the question. Nevertheless, I still think it’s important to distinguish between regression that’s expected to occur over an entire season and what might be expected to occur over a month or two. I know that people have looked at the different sample sizes necessary to have confidence in a trend for each stat, so I don’t know if HR/FB is the kind of stat that regresses…
Clint

One issue I haven’t seen mentioned in these discussions is that ANY fielding-independent metric needs to be put into context when used for fantasy. You don’t necessarily WANT to remove fielding from the analysis, as it influences 3 of the 4 standard fantasy SP stats (although it’s not a great idea to chase Wins). Ignoring the inherent flaws of FIP, this is the #1 reason in my mind to avoid it in fantasy discussions (without proper context, at the very least).

Derek Carty

Will,
I’ll check out what you’ve done.  I’ve actually got my own versions of xBABIP and xLOB% in my personal stat database, so I’ll be interested to see how they match up.

Derek Carty
NadavT, I don’t think I agree with distinguishing “between regression that’s expected to occur over an entire season and what might be expected to occur over a month or two.” There is actually no difference. If regression is to occur, we must expect it to occur immediately. Absolutely must. Let’s say we’re at the All-Star Break and we expect the regression to be complete by the end of the season. How can a player’s numbers regress in that time, though, if we are constantly expecting it to *not* regress in the upcoming small sample of games? If we can say…
Derek Carty

You’re right Clint, that context is important.  Unfortunately, fielding isn’t easily added to stats like these, and simply using ERA because fielding is included is certainly not the answer.  Still, that doesn’t mean ERA estimators should be avoided altogether.  It’s very important to be able to say something about the ability of the pitcher himself.  We just need to remember to apply the context afterwards.  I’m not sure if you’re a long-time reader, but if not, I’d recommend checking out my CAPS stat that accounts for a lot of different contextual things (and more additions are planned):
http://www.hardballtimes.com/main/fantasy/article/introducing-quality-of-opponent-adjustments-and-caps-for-pitchers/
http://www.hardballtimes.com/main/fantasy/article/introducing-caps-road-park-factors/

Will Larson

Derek, can you email/post your xBABIP/xLOB% for 2008? I’d like to compare with a full season of data if possible.
