Park Factors and ERA Estimators: Part I

(Note: I noticed a coding issue in the data, which resulted in three parks having a different classification. The data has been re-run to reflect the new results and the article updated to reflect the findings.)

Researchers have gone to great pains to highlight and account for factors outside of an individual player’s control when evaluating their performance and value. The standard for this is of course Voros McCracken’s seminal research into defense independent pitching and Tom Tango’s fielding independent pitching (FIP). While baseball is arguably the most “individualistic” of the major team sports, players do not perform in isolation from each other or from their environment.

Lately I’ve become more interested in how the physical environment of a team and its players affects their outcomes on the field. My initial research led me to look at whether a team’s home park and the degree to which it inflated or suppressed run scoring put the team at a fundamental advantage or disadvantage in terms of winning. The results suggested that hitter-friendly parks do, in fact, put a team at a fundamental disadvantage, likely due to the stress that playing 81 games a year in that environment places on the pitching staff.

In this article, I am concerned with how park factors may affect the various constructs we’ve developed to help us better evaluate a player’s talent and likely performance in the future. Specifically, to what extent do park factors affect the usefulness of various ERA estimators? It seems reasonable to assume that much of what happens when a ball is put in play is not controlled by the pitcher. However, given that some extreme parks are likely to exert their own environmental force on the outcome of batted balls, it stands to reason that ERA estimators that factor in a pitcher’s batted ball profile may do a better job in certain types of parks than in others.

To test this hypothesis I collected data on all pitchers that threw 40 or more innings in successive seasons from 2002 through 2010 for the same team. For each season pair I collected the pitcher’s ERA, FIP, xFIP, tERA, and SIERA in Year 1 as well as their ERA in Year 2. I then calculated the correlation between each Year 1 metric and Year 2 ERA, as well as the Root Mean Square Error (RMSE) for each, which measures how close each metric’s predictions came to the actual values (the lower the RMSE, the better the predictor).
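For readers who want to try something similar, here is a minimal sketch of the pairing and scoring steps in Python. It is not the code used for this article: the `Season` record, its field names, and the toy data are hypothetical, and the RMSE is taken from the residuals of a simple least-squares line fitting Year 2 ERA to the Year 1 metric, which is an assumption about how the predictions were formed.

```python
import math
from collections import namedtuple

# Hypothetical record layout; the field names are illustrative only.
Season = namedtuple("Season", "pitcher team year ip era fip")

def season_pairs(seasons, min_ip=40.0):
    """Pair each qualifying season with the same pitcher's next season for the same team."""
    by_key = {(s.pitcher, s.team, s.year): s for s in seasons if s.ip >= min_ip}
    pairs = []
    for (pitcher, team, year), first in by_key.items():
        second = by_key.get((pitcher, team, year + 1))
        if second is not None:
            pairs.append((first, second))
    return pairs

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def rmse_of_linear_fit(xs, ys):
    """RMSE of the residuals from the least-squares line ys ~ a + b * xs (assumed method)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    squared_errors = [(y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(sum(squared_errors) / n)

# Made-up seasons, only to show the pairing rule at work.
toy_seasons = [
    Season("A", "SEA", 2002, 100.0, 3.50, 3.80),
    Season("A", "SEA", 2003, 120.0, 3.90, 3.70),
    Season("A", "NYA", 2004, 90.0, 4.10, 4.00),   # changed teams, so 2003-2004 is not a pair
    Season("B", "COL", 2002, 39.0, 5.00, 4.50),   # under 40 innings, so it cannot start a pair
    Season("B", "COL", 2003, 80.0, 4.80, 4.40),
]
toy_pairs = season_pairs(toy_seasons)
print(len(toy_pairs))
```

With a real set of pairs, `pearson_r` and `rmse_of_linear_fit` would be called once per estimator, passing the Year 1 values as `xs` and Year 2 ERA as `ys`.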

Now, let’s see what we see:

All Parks (n=1400)

Metric   R       RSQR   RMSE
ERA      0.313   10%    1.142
FIP      0.371   14%    1.117
xFIP     0.369   14%    1.118
tERA     0.382   15%    1.112
SIERA    0.413   17%    1.096

In aggregate, we find that each of the estimators does a better job of predicting the following year’s ERA than the previous year’s ERA does. Each estimator has a higher correlation with YR2_ERA and a lower RMSE. This is consistent with previous research.

However, when we segment the Year1 and Year2 pairs by type of park we see some interesting differences.

Hitter Parks (n=168)

Metric   R       RSQR   RMSE
ERA      0.320   10%    1.048
FIP      0.433   19%    0.997
xFIP     0.400   16%    1.104
tERA     0.459   21%    0.983
SIERA    0.409   17%    1.009

When we look at ERA estimator performance in hitter-friendly parks we find that the advantage of the estimators relative to YR1_ERA generally increases. The easiest way to visualize this is to look at the amount of variance each estimator alone explains (i.e. R-squared, or RSQR in the tables). For all parks, each estimator explains at least 4 percentage points more of the variation in YR2_ERA than YR1_ERA does. However, when we focus just on hitter-friendly parks, the difference increases to at least 6 percentage points (xFIP) and a maximum of 11 (tERA, more than double the explanatory power of YR1_ERA).
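The arithmetic behind those gaps can be checked directly from the tables above. This is a small sketch in Python: the R values are copied from the All Parks and Hitter Parks tables, RSQR is simply R squared expressed as a rounded percentage, and the function names are my own.

```python
# Correlations with Year 2 ERA, copied from the All Parks and Hitter Parks tables.
all_parks = {"ERA": 0.313, "FIP": 0.371, "xFIP": 0.369, "tERA": 0.382, "SIERA": 0.413}
hitter_parks = {"ERA": 0.320, "FIP": 0.433, "xFIP": 0.400, "tERA": 0.459, "SIERA": 0.409}

def r_squared_pct(r):
    """R-squared as a whole-number percentage, matching the RSQR column."""
    return round(100 * r * r)

def gains_over_era(table):
    """Percentage-point gain in R-squared for each estimator over Year 1 ERA."""
    base = r_squared_pct(table["ERA"])
    return {name: r_squared_pct(r) - base for name, r in table.items() if name != "ERA"}

print(gains_over_era(all_parks))
print(gains_over_era(hitter_parks))
```

The smallest gain is 4 points across all parks and 6 points (xFIP) in hitter-friendly parks, while the largest in hitter-friendly parks is 11 points (tERA).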

SIERA and tERA do provide additional insight above just YR1_ERA, but FIP manages to outperform every other metric except tERA. That’s interesting, given that tERA and SIERA take batted ball profile into account.

Pitcher Parks (n=154)

Metric   R       RSQR   RMSE
ERA      0.274   8%     1.233
FIP      0.334   11%    1.209
xFIP     0.345   12%    1.203
tERA     0.313   10%    1.218
SIERA    0.399   16%    1.176

For pitcher-friendly parks, the pattern is more familiar. Compared to all parks, the predictive power of the estimators actually decreases, but the relative ranking of each metric is about what we would expect. What’s more interesting is that a clear advantage emerges for SIERA in terms of R-squared and RMSE relative to both YR1_ERA and all other estimators.

What about neutral parks?

Neutral Parks (n=1078)

Metric   R       RSQR   RMSE
ERA      0.313   10%    1.150
FIP      0.364   13%    1.124
xFIP     0.367   13%    1.123
tERA     0.379   14%    1.117
SIERA    0.415   17%    1.099

By definition, neutral park pairs make up the largest part of the sample, so the results look quite similar to those for all parks. Each estimator’s correlation is about the same as in the full sample (not surprising, since neutral-park pairs drive the overall sample).

So how should we interpret these results?

Well, first of all, we should note this is a first step. All this study looked at was how ERA and its estimators fared, segmented by park type, when pitchers threw in the same park in consecutive seasons. It also included both relief pitchers and starting pitchers. Follow-on articles in this series will segment by pitcher type (starters versus relievers).

Second, estimators that take into account batted ball profile do not show a clear advantage in hitter-friendly environments. This goes against my initial hypothesis. While tERA had the strongest showing, plain old FIP did nearly as well, and better than SIERA. Since FIP uses a pitcher’s actual home runs allowed (rather than normalizing HR/FB, as xFIP does), it appears able to pick up the effects of playing in a high run-producing environment. At this point, I am not sure why SIERA essentially performed the same as xFIP.

Third, the analysis suggests that different estimators will be more applicable to pitchers with different batted ball profiles. Like park factors, it may be that different estimators do a better job of predicting next year’s ERA for extreme ground ball or flyball pitchers. More importantly, it may also be that combinations of park factors and batted ball profiles lend themselves to some ERA estimators over others.

Fourth, it is not clear from this analysis which estimator is best to use when trying to get a handle on how a pitcher will perform when moving from one park to another. Originally, I thought that in predicting the future performance of someone like Michael Pineda, who is moving from Safeco to Yankee Stadium next year, the safest route would be to rely on estimators that take into account batted ball profile. However, the initial analysis here suggests that while tERA performed the best in hitter-friendly parks, relying on FIP might be just as useful. When it comes to pitcher-friendly parks, however, SIERA initially appears to be the best bet. This will also be the focus of a future article.

——————–

*In order to segment the pairs I took 3-year regressed park factors (courtesy of Seamheads) for each season since 2002 and calculated the standard deviation of park factors across the league each year. For each season, a park was considered hitter-friendly if its park factor was more than one standard deviation above the league average for that season. A park was considered pitcher-friendly if its park factor was more than one standard deviation below the league average. The balance of parks were considered neutral. Under this coding scheme, on average, 76% of all parks each year were neutral, 14% were hitter-friendly, and 10% were pitcher-friendly. Season pairs were coded using the park factors from the first year of the pair. The average difference between Year1 and Year2 park factors was only .05 (i.e. a Year1 factor of 95 would have, on average, a Year2 factor of 95.5).
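As a rough illustration of that coding scheme, here is a short Python sketch. The park names and factors are made up, and using the population standard deviation (rather than the sample version) is an assumption; the real inputs were the Seamheads 3-year regressed factors for each season.

```python
import statistics

def classify_parks(park_factors):
    """Label each park in one season as hitter, pitcher, or neutral.

    A park is 'hitter' if its factor is more than one standard deviation
    above the league mean for that season, 'pitcher' if it is more than one
    standard deviation below, and 'neutral' otherwise.
    """
    values = list(park_factors.values())
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # assumption: population standard deviation
    labels = {}
    for park, factor in park_factors.items():
        if factor > mean + sd:
            labels[park] = "hitter"
        elif factor < mean - sd:
            labels[park] = "pitcher"
        else:
            labels[park] = "neutral"
    return labels

# Hypothetical eight-park league for a single season (100 = league average).
toy_factors = {"A": 126, "B": 108, "C": 102, "D": 100, "E": 100, "F": 98, "G": 94, "H": 82}
toy_labels = classify_parks(toy_factors)
print(toy_labels)
```

Each season pair would then inherit the label of the pitcher’s home park in the first year of the pair, and the correlation and RMSE would be recomputed within each group.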




Bill (@BillPetti) is a staff writer at FanGraphs and a consultant and regular guest on MLB Network's Clubhouse Confidential. He is also a contributor at Beyond the Box Score and Amazin’ Avenue.

13 Responses to “Park Factors and ERA Estimators: Part I”

  1. Joe Peta says:

    Bill, I’d love to know if it is possible to factor in the quality of the defense behind the pitchers in the hitter-friendly and pitcher-friendly parks. For instance, Tampa Bay pitchers consistently outperform their SIERA and xFIP because they have low BABIP figures. But, as long as TB defense stays strong, we shouldn’t expect those to regress. All of that, of course, is intertwined with the Trop being pitcher-friendly. I wonder if that would still be the case were Mark Reynolds at 3B instead of Evan Longoria, etc.

  2. RC says:

    You never really defined what “Hitter Friendly” and “Pitcher friendly” are. Is it a PF below .9/above 1.1? Is it a combination of factors?

    I.e., what is Fenway? It has a huge positive factor for doubles, but is negative for home runs, and really bad for home runs to center. Without knowing what hitter/pitcher friendly actually means, it’s tough to judge how much merit this article actually has.

    Also, I’m not surprised at all to see that FIP is less predictive than ERA in some cases, and xfip is even worse.

    I’ve been saying that FIP rewards pitchers for giving up hits instead of getting outs, and that looks like exactly what its doing.

    • RC says:

      nevermind the first part, missed the end chunk about how you chose park factors.

      Still, I don’t like generic park factors at all. Parks can be extremely positive for hitters from one side of the plate, while negative for the other side of the plate.

  3. A really thought-provoking first piece. Looks like Fangraphs has another terrific writer.

  4. Jeff Wise says:

    This is some really cool information! I was actually thinking of this the other week. I’m a big M’s fan and I always wondered if it was only perception about Safeco that made it harder for them to put up better offensive numbers.

    I also kept hearing free agent batters don’t want to come because it’s not a hitters park.

    Despite the neat stats…is some of how a player performs due to perception or comfort?

  5. RC says:

    I also think Standard Deviations is a poor way to do this.

    10% is 3 parks. The 3 parks at 82, 89, and 90 are pitcher friendly, but the 4 teams at 94 are neutral. Eh, but not so bad.

    14% is 4 teams. So, 126, 111, 110 (CHN), and 110 (FLO) are hitter friendly, but … 110 (TEX) isn’t. That’s strange. Also, 108 and 107 are neutral. When the gap between members in your set is larger than the distance between your set and the items outside it, you’re generally not doing a very good job picking your set. You’re assuming a normal distribution, and that’s not the case.

    My guess would be that SIERA as a whole fits the curve better than anything else. FIP/xFIP fit the curve closer the lower the park factors get (because their erroneous assumptions are closer to true in pitchers’ parks).

    • Bill Petti says:

      It’s a fair point.

      The use of SD was a starting point. I really wanted to pull out the more extreme parks on either side. You end up averaging a little over 4 parks per year for hitter-friendly and 3 parks per year for pitcher-friendly. Does that leave out some parks? Sure. I can go back and run with different coding, but my initial thought is we won’t see a ton of difference. But I will see what it does.

  6. Jack Nugent says:

    I’ve made this comment on FG a few times now, but before anyone sets out to try and make a point using park factors, the following should be absolutely mandatory reading:

    http://highboskage.com/stat-corrections.shtml

    Perhaps you’ve considered Eric Walker’s research on the subject, but unless the author can tell me how specifically Seamheads addresses the enormous obstacles to calculating park factors, I’m not sure any of what’s been written here is at all useful.

    I’m afraid too many people have turned a blind eye to the simple reality that calculating useful park factors in this era of baseball may be completely impossible.

  7. Lex Logan says:

    A good, thought-provoking article. As for that High Boskage article, while it pointed out some potential flaws in park factors, I saw no attempt to quantify the magnitude of any of the problems listed; “pile of reeking garbage” does not impress me as sabermetrics.

  8. Jack Nugent says:

    Well, if that doesn’t have you sold, and it’s more numbers crunching that you’re looking for, then:

    http://highboskage.com/making-park-factors.php

    This was linked to in the same article I mentioned before. These are the numbers that demonstrate why calculating park factors is next to impossible.

  9. Jack Nugent says:

    By the way, Eric Walker isn’t just any old dude writing about baseball. He worked for the Oakland A’s during Sandy Alderson’s tenure and basically “invented” moneyball.

  10. KJOK says:

    http://www.seamheads.com/ballparks/about.php

    Fangraphs seems to have eaten my long post, but in short we’re very familiar with Mr. Walker’s work on park factors and largely agree with him, but we disagree that multi-year properly regressed factors cannot be useful in analysis.
