# Reinforcing the power of predictive FIP

In October, I introduced a metric called Predictive FIP (or pFIP for short). This metric is a slightly modified version of Tom Tango’s commonly used fielding independent pitching (FIP) statistic.

Tango’s version of FIP is meant to describe a pitcher’s performance in terms of the three true outcomes (walks, strikeouts and home runs). The FIP equation weights each of those three outcomes in a descriptive manner:

**FIP = (13*HR + 3*BB – 2*K)/IP + Constant (typically ~3.20)**
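As a quick illustration, the formula can be written as a small Python function. This is only a sketch: the function name, argument names and the stat line are made up for the example, and the 3.20 default is just the typical constant quoted above.

```python
def fip(hr, bb, k, ip, constant=3.20):
    """Descriptive FIP as written above: (13*HR + 3*BB - 2*K)/IP + constant."""
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    return (13 * hr + 3 * bb - 2 * k) / ip + constant


# Hypothetical stat line: 20 HR, 60 BB, 160 K over 200 innings.
example_fip = fip(hr=20, bb=60, k=160, ip=200)
print(round(example_fip, 2))
```

For this invented line the three outcomes net out to 0.60 runs per inning-weighted unit, so the result lands at about 3.80 once the constant is added.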

FIP works fairly well as a predictor of future ERA or runs allowed (RA9); thus, many use the statistic to predict, despite the fact that it is not meant to do so. A good way to think about FIP is what a pitcher’s ERA *should* have been, or better yet, what his ERA would be based solely on Ks, BBs and HRs. FIP is not meant to tell us what a pitcher’s ERA is going to be in the future.

I set out to convert FIP from its descriptive form into a predictive metric.

After a few tests and some advice, I changed some of the methodology behind FIP. First, the FIP weights and constant are meant to describe ERA; I decided to make pFIP a predictor of runs allowed per nine innings rather than ERA. Second, I made plate appearances (or batters faced) the denominator of the statistic rather than innings pitched.

The result was this equation:

**pFIP = (17.5*HR + 7*BB – 9*K)/PA + Constant (typically ~5.18)**

The major differences between FIP and pFIP come in the weighting of strikeouts and home runs. Strikeouts become more important when predicting future runs, while home runs become less important.
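To see how the re-weighting plays out, here is a minimal sketch of the pFIP equation in Python. The function name and both stat lines are hypothetical, and the 5.18 default is simply the typical constant given above.

```python
def pfip(hr, bb, k, pa, constant=5.18):
    """Predictive FIP as written above: (17.5*HR + 7*BB - 9*K)/PA + constant."""
    if pa <= 0:
        raise ValueError("plate appearances must be positive")
    return (17.5 * hr + 7 * bb - 9 * k) / pa + constant


# Two hypothetical pitchers, each facing 800 batters and walking 60.
strikeout_heavy = pfip(hr=25, bb=60, k=200, pa=800)  # more Ks, more HRs
homer_stingy = pfip(hr=15, bb=60, k=120, pa=800)     # fewer Ks, fewer HRs

print(round(strikeout_heavy, 2), round(homer_stingy, 2))
```

Under these made-up lines, the strikeout-heavy pitcher projects to about 4.00 RA9 and the other to about 4.68, even though the first allowed ten more home runs.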

pFIP held up very well against other more commonly accepted “ERA estimators” (including descriptive FIP). That being said, just because something works fairly well does not mean one should not at least attempt to improve it.

A while back, I attempted to reform pFIP by regressing each of its components (Ks, BBs, HRs) to the mean. Strikeouts and walks are less volatile over one- to two-year samples; thus, their regression was not nearly as significant as the regression for home runs. Interestingly, regressing the components to the mean did not improve the metric.

My next idea to improve pFIP was to focus only on the home run component of the statistic.

Dave Studeman, the leader of the Hardball Times, converted Tango’s FIP into a version known as expected fielding independent pitching (xFIP).

According to the THT Glossary, xFIP is:

An experimental stat that adjusts FIP and “normalizes” the home run component. Research has shown that home runs allowed are pretty much a function of fly balls allowed and home park, so xFIP is based on the average number of home runs allowed per outfield fly. Theoretically, this should be a better predictor of a pitcher’s future ERA.

The FanGraphs Sabermetrics Library explains how xFIP is calculated:

(xFIP) is calculated in the same way as FIP, except it replaces a pitcher’s home run total with an estimate of how many home runs he should have allowed. This estimate is calculated by taking the league-average home run to fly ball rate (~9-10 percent depending on the year) and multiplying it by a pitcher’s fly ball rate.

Over most small-to-medium samples, xFIP is a better predictor of future ERA than FIP; thus, I decided to apply this concept to pFIP.

xFIP simply inserts the expected number of home runs directly into the FIP equation:

**xFIP = ((13*(FB% * League-average HR/FB rate))+(3*(BB+HBP))-(2*K))/IP + constant**
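A rough sketch of that substitution in Python follows. One assumption to flag: the equation above writes FB%, but this sketch takes a raw count of fly balls allowed, since multiplying a count by the league HR/FB rate is what yields an expected home run total. The function names, the 0.10 league rate and the 3.20 constant are illustrative placeholders, not fixed values.

```python
def expected_home_runs(fly_balls, league_hr_per_fb):
    """Home runs a pitcher 'should' have allowed at the league HR/FB rate."""
    return fly_balls * league_hr_per_fb


def xfip(fly_balls, bb, hbp, k, ip, league_hr_per_fb=0.10, constant=3.20):
    """xFIP as written above, with expected HR in place of actual HR.

    Assumes fly_balls is a count, not a percentage.
    """
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    exp_hr = expected_home_runs(fly_balls, league_hr_per_fb)
    return (13 * exp_hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# Hypothetical line: 200 fly balls, 55 BB, 5 HBP, 160 K over 200 innings.
example_xfip = xfip(fly_balls=200, bb=55, hbp=5, k=160, ip=200)
print(round(example_xfip, 2))
```

Note that the pitcher's actual home run total never enters the calculation; two pitchers with the same fly balls, walks and strikeouts get the same xFIP no matter how many balls left the park.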

I decided against inserting the expected number of home runs into the pFIP equation with its current weights.

### An attempt to contrive an xpFIP

I took a sample of starting pitchers who had at least 100 innings in Year X and at least 100 innings in Year X+1 for the years 2007-12 (n = 479).
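The sample construction can be sketched roughly like this. The record layout (`pitcher`, `year`, `ip` keys), the function name and the toy seasons are all hypothetical, and the sketch assumes a pitcher can contribute more than one Year X/Year X+1 pair.

```python
def pair_consecutive_seasons(seasons, min_ip=100):
    """Return (year_x, year_x_plus_1) row pairs where both seasons reach min_ip."""
    by_key = {(s["pitcher"], s["year"]): s for s in seasons}
    pairs = []
    for (pitcher, year), row in sorted(by_key.items()):
        following = by_key.get((pitcher, year + 1))
        if following is None:
            continue  # no season on record for the next year
        if row["ip"] >= min_ip and following["ip"] >= min_ip:
            pairs.append((row, following))
    return pairs


# Toy data: made-up pitchers and innings totals.
toy_seasons = [
    {"pitcher": "A", "year": 2007, "ip": 180},
    {"pitcher": "A", "year": 2008, "ip": 150},
    {"pitcher": "A", "year": 2009, "ip": 90},
    {"pitcher": "B", "year": 2008, "ip": 120},
    {"pitcher": "B", "year": 2009, "ip": 110},
    {"pitcher": "C", "year": 2007, "ip": 200},
    {"pitcher": "C", "year": 2009, "ip": 200},
]
toy_pairs = pair_consecutive_seasons(toy_seasons)
print(len(toy_pairs))
```

Here pitcher A's 2008-09 pair drops out because of the 90-inning season, and pitcher C never qualifies because his two seasons are not consecutive.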

Then, I ran a multiple regression with strikeouts, walks and flyball percentage times the league average HR/FB in Year X against RA9 for each starter in Year X+1. This regression resulted in this regressed or xpFIP equation:

**xpFIP = ((5*(FB% * League-average HR/FB rate)) + (9*BB) – (9*K))/PA + constant**

**In this case the constant was 5.23.**

Because the home run total is estimated rather than observed, its weight drops to roughly half of the weights on Ks and BBs (5 versus 9). In the original pFIP equation, by contrast, home runs are weighted about twice as heavily as those two components (17.5 versus 7 and 9).
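For readers who want to see the mechanics of a multiple regression like this one, below is a minimal pure-Python sketch using ordinary least squares via the normal equations. It is not the fitting code behind the article. The helper names are invented, the predictors are assumed to be per-PA rates, and the toy targets are generated directly from the xpFIP weights, so the fit simply hands those weights back.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, targets):
    """Least-squares fit; returns [intercept, coef_1, ..., coef_p]."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * y for d, y in zip(design, targets)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Toy Year X rates per PA: (expected HR, BB, K). All values are made up.
rates = [
    (0.020, 0.070, 0.150),
    (0.030, 0.090, 0.200),
    (0.025, 0.060, 0.250),
    (0.035, 0.100, 0.180),
    (0.028, 0.080, 0.220),
    (0.022, 0.050, 0.170),
]
# Synthetic "Year X+1 RA9", built from the xpFIP weights with no noise.
next_year_ra9 = [5.23 + 5 * xhr + 9 * bb - 9 * k for xhr, bb, k in rates]

weights = fit_ols(rates, next_year_ra9)
print([round(w, 2) for w in weights])
```

With real data the targets are noisy, so the fitted weights would only approximate whatever relationship exists; the noise-free toy data is there purely to show that the procedure recovers known coefficients.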

Then, this xpFIP equation was tested against these other ERA estimators:

- pFIP
- FIP
- xFIP
- kwERA
- SIERA

I ran a linear regression, on the same sample, between each starter’s ERA estimator in Year X and his RA9 in Year X+1.

I used r-squared as the measure of the predictive value of each estimator, and found these results:

| Predictor | r^2 |
|---|---|
| pFIP | 18.50% |
| xpFIP | 17.78% |
| kwERA | 17.73% |
| SIERA | 15.63% |
| FIP | 15.33% |
| xFIP | 14.82% |

This new xpFIP equation did fairly well, beating almost all of the other estimators tested. However, regressing the home run component hurt the predictive ability of the original pFIP, which was the strongest predictor.
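For a linear regression with a single predictor, r-squared is just the squared Pearson correlation between the two columns, so the comparison above can be sketched in a few lines of Python. The function name and the five data points are invented for illustration; they are not drawn from the sample.

```python
def r_squared(xs, ys):
    """r^2 of a one-predictor linear regression (squared Pearson correlation)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r^2 is undefined when either column is constant")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical Year X estimator values and Year X+1 RA9 for five starters.
estimator_year_x = [3.9, 4.2, 4.6, 5.0, 4.4]
ra9_year_x1 = [4.1, 4.0, 5.2, 4.8, 4.9]

print(round(r_squared(estimator_year_x, ra9_year_x1), 4))
```

Running the same function once per estimator, always against the same Year X+1 RA9 column, gives one r-squared per row of a table like the one above.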

Before scrapping the idea of regressed home runs in pFIP completely, I tested the equation on a different sample. I used the same minimum requirements (100 IP) and the same estimators and ran the same linear regression for the years 2002-07 and found these results:

| Predictor | r^2 |
|---|---|
| pFIP | 19.19% |
| SIERA | 16.56% |
| FIP | 16.33% |
| xpFIP | 16.03% |
| kwERA | 15.79% |
| xFIP | 15.29% |

The xpFIP equation did not predict future RA9 nearly as well for this sample. My original pFIP equation did significantly better than the other ERA estimators at predicting future RA9.

*Why does the pFIP with a regressed home run component do worse than the non-regressed pFIP?*

It’s interesting that the statistic that uses actual home runs is more predictive than the regressed version, despite the random variation that affects home run numbers.

My best guess for the reason behind this finding has to do with survivor bias. It has been shown that some pitchers have the ability to suppress home runs and consistently have lower than average HR/FB rates. I think it is entirely possible that a fair number of pitchers who are allowed to throw 200+ innings over the course of two seasons have some ability to control their home run rates.

Also there is the issue of park factors. The majority of these players did not change teams during the span of two seasons. It makes abstract sense that a pitcher who made half of his starts in a park that suppressed home runs would have a lower than average home run rate over those two seasons, and vice versa for a pitcher in a home run-friendly park.

I think it’s well within the realm of possibility that regressing the home run component of pFIP would benefit the statistic when looking at pitchers who change teams between Year X and Year X+1.

### pFIP vs. ZIPS

At this point, I’m pretty confident in the strength of pFIP as a predictor.

However, I had always simply assumed that projection systems were more useful, as they consider many more factors other than just the three true outcomes, when attempting to project future runs for pitchers. (Although, this Matt Swartz article caused me to be a little uncertain about that opinion.)

So, mainly for fun, I compared pFIP’s RA9 projections for last year (2012) to the RA9 projections of the popular ZIPS projection system.

First, I looked at a sample of every pitcher who threw at least 100 innings in 2011 and at least *one* inning in 2012 (n=137) and compared how well each system (or metric) did at projecting future RA9:

| Predictor | r^2 |
|---|---|
| pFIP | 17.72% |
| ZIPS | 14.65% |

Much to my surprise, pFIP explained about three percentage points more of the variation in RA9 than ZIPS. However, my minimum inning threshold for 2012 (one!!) was admittedly silly.

Thus, to eliminate some outliers and converted relievers, I set the minimum threshold in 2012 to be at least five games started in the season (n=118). I found these results:

| Predictor | r^2 |
|---|---|
| pFIP | 19.84% |
| ZIPS | 17.20% |

This change improved the predictive ability of both systems, and closed the gap slightly between pFIP and ZIPS. Interestingly though, pFIP still came out ahead of the much more sophisticated system.

This is very obviously a small sample. I looked at starting pitchers in only one season; thus, it could have been pure luck that pFIP was a better predictor of future runs than ZIPS. Also (and more importantly) ZIPS and other projection systems are built to predict many more factors (IP, GS, Ks, BBs, etc.) than just runs.

At the same time, I think these two short studies (regressing home runs and comparing to ZIPS) do a fair job of reinforcing the strength of this simple predictive re-weighting of the FIP equation.

**References & Resources**

All data comes courtesy of FanGraphs


Glenn,

One question on the ZIPS comparison. Were any 2012 data used in determining the coefficients in the pFIP model used in the comparison?

When I determined the coefficients I used 1996-2012, so short answer is yes.

That’s part of why I admitted the comparison was more fun than anything else. At the same time, over those years tested the coefficients were fairly stable, so if 2012 wasn’t included the pFIP model would be almost exactly (if not exactly) the same.

It will be interesting to see how pFIP holds up against other projection models for the 2013 season.

Looking at the pFIP formula, the HR weight is about twice that of BB and K. Why not simplify the formula and just use (K-2HR-BB)/PA ? There’s no big reason it has to be on the RA scale.

“The constant is what really puts pFIP on the RA9 scale.”

Yes, it is a phony number, added to another MUCH SMALLER number to somehow make the sum look more baseballish, whatever that means.

@rubes not sure what you’re trying to say? FIP is a very small number that doesn’t look baseballish until you add a constant.

That small number can tell you a lot, even before the constant is added, it also can tell you a lot more than a pitcher’s RA9 in the previous season.

Yeah, you might want to actually run those numbers. Hint: even though pFIP is on the RA9 scale, which is ~1.08x higher than the FIP/ERA scale, his pFIP from 2012 stats still starts with a 3.

Yes, I guess a simpler way to say it is that it is fine to scale the results of whatever FIP calculation using a constant, but when the results are applied or compared elsewhere, the constant, which is a much larger number than the calculation result, becomes the data.

Glenn’s adding 2 to the constant is by far the most significant / impactful part of this exercise.