# Why you can’t subtract FIP from ERA

Okay, so the title is misleading. They’re both numbers and can be subtracted from one another with wild abandon. Just don’t expect it to mean anything.

What do I mean? The basic formula for FIP is:

FIP = (13*HR + 3*BB - 2*K) / IP + 3.2

You can fancy it up a bit (or a whole lot, if you want something like tRA), but what you’re getting is still a linear model of run scoring. Which is fine, so long as you understand what that means.
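
To make the formula concrete, here is a minimal sketch of it as a function. The function name and the example pitcher's stat line are mine, and the 3.2 constant is the generic one from the formula above; in practice the constant is recomputed each season so that league-average FIP matches league-average ERA.

```python
# A minimal sketch of the basic FIP formula above. The 3.2 constant is the
# league-calibration term; in practice it is set each season so that
# league-average FIP equals league-average ERA.

def fip(hr, bb, k, ip, constant=3.2):
    """Basic Fielding Independent Pitching from HR, BB, K, and IP."""
    return (13 * hr + 3 * bb - 2 * k) / ip + constant

# A hypothetical pitcher: 20 HR, 50 BB, 180 K in 200 IP
# (260 + 150 - 360) / 200 + 3.2 = 3.45
```

Note that the formula is linear in each component: every extra home run is worth the same 13/IP regardless of who is on base, which is exactly the limitation discussed below.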

Let’s take the home run term of that equation for a minute. It’s supposed to correspond with the number of runs allowed per home run. Here’s the thing, though. The number of runs allowed per home run goes down if you have a low walk rate or a high strikeout rate or both, because that means there will typically be fewer runners on base when a home run is hit. A linear model of run scoring doesn’t account for that.

What this means is that you have a much narrower band of results when you use FIP than when you look at ERA. To illustrate:

The red line represents ERA graphed against itself. As one can imagine, ERA has a one-to-one relationship with itself. The blue line represents FIP relative to ERA. It’s a bit disjointed, because it’s based upon sample data.

But what you can see is that FIP doesn’t stretch as far as ERA – while ERA on the graph goes from 1.00 to 9.50, FIP runs from about 2.71 to 6.92. (The slope of the line is also rather different.)

So, let’s take an example, from Cyril Morong’s recent blog post about the Dodgers:

> ESPN shows that the Dodgers DIPS% is 107, meaning that their pitchers would have an ERA that is 7% higher than it actually is if they allowed a league average of hits on balls in play (they are, of course, better than average). With their actual ERA being 3.61, then their DIPS ERA is 3.86. So here their fielders save .25 runs per game (that is, if the pitchers have nothing to do with batting average on balls in play). The Dodgers have played 115 games, so this is an additional 28.75 runs saved. Adding the 10 in from fewer unearned runs gives us 38.75 runs. Since it usually takes about 10 runs to win one game, a rough estimate is that the Dodgers have won close to 4 games this year with their fielding.
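
For reference, Morong's arithmetic can be reproduced step by step; every number here comes from the quoted paragraph:

```python
# Checking the arithmetic in the quoted paragraph, step by step.
era = 3.61
dips_pct = 1.07                        # DIPS% of 107
dips_era = round(era * dips_pct, 2)    # 3.86
runs_per_game = dips_era - era         # 0.25 runs per game saved by fielders
games = 115
runs_saved = runs_per_game * games     # 28.75
total_runs = runs_saved + 10           # plus ~10 fewer unearned runs -> 38.75
wins = total_runs / 10                 # ~10 runs per win -> about 3.9 wins
```

The arithmetic itself checks out; the problem, as argued below, is in the comparison of a linear DIPS ERA to actual ERA, not in the multiplication.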

Except.

For an ERA of 3.5, we would expect a FIP of about 3.92, based upon the graph above. If we smooth out that line a bit with a linear regression, we can estimate that a 3.61 ERA should result in a 3.96 FIP. (FIP and DIPS ERA aren’t precisely the same thing, but they are both defense-independent component ERAs based upon linear models of run scoring, so I don’t feel too bad in conflating the two.)
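
For concreteness, here is what that smoothing step looks like. The (ERA, FIP) points below are illustrative stand-ins for the graphed sample data, not the actual values behind the graph, so treat the fitted numbers as a sketch of the method rather than the real estimate:

```python
# A sketch of the smoothing step: fit a straight line of FIP against ERA,
# then read off the FIP a given ERA "should" produce. The (era, fip)
# points are illustrative stand-ins, not the post's actual sample.
import numpy as np

era = np.array([2.0, 3.0, 4.5, 6.0, 8.0])
fip = np.array([3.0, 3.6, 4.5, 5.4, 6.4])

slope, intercept = np.polyfit(era, fip, 1)  # degree-1 least-squares fit
est = slope * 3.61 + intercept              # estimated FIP for a 3.61 ERA
```

A slope well under 1 is the whole point: FIP moves less than half a run for every full run of ERA, which is the compression the graph shows.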

So the effect Morong is seeing here is almost entirely a function of the linearity of FIP, not of the Dodgers’ defense at all.

This doesn’t mean that FIP is useless, of course – it should do a good job of putting pitchers in the right ordinal ranking – the best pitchers will generally have the lowest FIPs and the worst will have the highest, at least within the limits of sample size. But what it will do is distort the distance between the best and worst pitchers.

And that’s why you can’t just subtract FIP from ERA. (Or, again – you can, but you shouldn’t.)

**UPDATE:** Someone asked about tRA. Well, I have that data, along with xFIP. Excuse me if I’m getting a bit too wild with the Photoshop effects; I promise in a few days I’ll stop feeling like a kid in a candy store and learn some restraint.

xFIP has an even smaller spread, which should surprise nobody – it normalizes differences between pitchers’ home run rates. This has the benefit of being more predictive of future ERA, one should note.

tRA and FIP are nearly identical in this regard, which again shouldn’t surprise anyone.

So can we just linearly transform FIP so that it matches the ERA line above?

Eyeballing it, it looks like

FIP 4.50 = 4.50 ERA

FIP 3.00 = 2.00 ERA

FIP 6.00 = 8.00 ERA

If we let

nFIP = (2*(FIP – 4.5)) + 4.5

Then we get

nFIP 4.50 = 4.50 ERA

nFIP 1.50 = 2.00 ERA

nFIP 7.50 = 8.00 ERA

Much closer. And of course, this could be improved by using the actual numbers, a larger sample, etc.
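
The stretch above can be written as a one-line function. The 4.5 pivot and the slope of 2 are the eyeballed values from the post, not fitted ones:

```python
# The eyeballed stretch transform: pivot at 4.5 and double the distance
# from it. Pivot and slope are rough visual estimates, not fitted values.

def nfip(fip):
    """Linearly stretch FIP so its spread roughly matches ERA's."""
    return 2 * (fip - 4.5) + 4.5
```

As a check, nfip(4.50) gives 4.50, nfip(3.00) gives 1.50, and nfip(6.00) gives 7.50, matching the table above.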

Does this make sense to do, or does this introduce some new problems that I haven’t considered?

So does this also mean that the way FanGraphs WAR is calculated is going to underestimate how much above replacement pitchers at the low end are and how much below replacement pitchers at the very high end are?

I meant a component ERA.

Does the ESPN DIPS% account for the fact that the Dodgers play in one of the strongest pitcher’s parks in the majors?

You can subtract ERC from FIP though, right? Both are linear formulas. I think that’s what people should be looking at when testing defense.

We might also check to see if there is some kind of correlation between walk rates and the difference between the blue and red lines. It might be something weird, like a parabola.


Cyril – Check out line 12, which is the part where dER are estimated. That’s the part we’re concerned with when we are talking about linearity, and that’s certainly a linear equation.

Okay, but we still don’t know if DIPS 2.0 has the same bias as FIP. I am about to grab some 2008 data from ESPN to see what biases there are between ERA and DIPS 2.0 ERA.

Colin, what is your math? I don’t understand how you got your results for your (very pretty) graphs.

Colin

As you might guess, I don’t think I am hung up on minutiae. But can you tell us some nonlinear formulas that work better for pitchers?

Cy

What I did was round all observed ERAs to the nearest half. Then I took the weighted average of the FIP for each of those groups of pitchers.

I assume for this season only? Or for more than one season?

I ask because I think your graph significantly overstates the non-linearity of run scoring and you’re likely capturing a lot of outlier performances when collecting info this way. IOW, there is more going on behind your data than simply the non-linearity of scoring runs.

For FIP and dERA, I used 1993-2008. For the other graph, I used 2003-2008 (because I’m missing batted ball data for 2000-2002).

Did you use career totals, which would mostly get rid of the outlier issue, or did you use seasonal totals, which wouldn’t?

I thought you used a simulation or something like that, in which you would hold fielding performance constant. I don’t think this is a legitimate analysis if you didn’t.

I don’t have access to a sim, Dave. (Would that I did!) What I do have is Retrosheet data, however. I’ll see if I can do something with that.

I think the only way to really prove your point is to model it somehow—there will always be fielding bias in whatever “real life” sample you choose.

I haven’t looked at this dynamic specifically, but I doubt the issue is nearly as significant as your graph shows. It exists, but only at the extremes.

FWIW, you might be interested in this post, from about six years ago. It doesn’t address your point directly, but I think it has some related issues.

http://www.baseballgraphs.com/main/index.php/site/article/my_vacation_with_fip_and_der/

That is a great blog post. Thanks for that.

I came up with some ideas on a long car ride to the zoo and back that I’ll have to test later. But I think that the issue isn’t linearity, but what Matt Swartz was talking about with variance. (In short – I get to write about regression to the mean again! Yay!)
