Using all pitchers with at least 40 IP in consecutive seasons, weighting by IP in the first season, and taking next season's park-adjusted ERA as the dependent variable:

ERA = 1.50 + .95*SIERA - .28*xFIP

(all p-stats <.002)
(This has multicollinearity issues, since the two stats correlate and we are trying to separate their individual effects. Splitting into odd and even seasons, I get:)
ERA = 1.47 + .86*SIERA - .19*xFIP
ERA = 1.50 + 1.04*SIERA - .37*xFIP
(p<.003 for all, except the xFIP coefficient in the first equation, which has p = .128)
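For anyone wanting to reproduce this kind of fit, here's a minimal sketch of an IP-weighted regression in Python. The data are simulated (the real inputs would be each pitcher's SIERA, xFIP, first-year IP, and next-year park-adjusted ERA); the point is just the weighted-least-squares mechanics, done here with plain numpy by scaling each row by the square root of its weight.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Simulated stand-ins for the real columns. xFIP is built to be
# strongly correlated with SIERA, as in the actual sample.
siera = rng.normal(4.0, 0.7, n)
xfip = siera + rng.normal(0.0, 0.3, n)
ip = rng.uniform(40, 220, n)                  # innings pitched in year 1

# Generate next-year ERA from the published coefficients plus noise,
# so the fit should roughly recover them.
era_next = 1.50 + 0.95 * siera - 0.28 * xfip + rng.normal(0.0, 0.8, n)

# Weighted least squares with weights = IP: multiply each row of the
# design matrix and the response by sqrt(IP), then solve ordinary LS.
X = np.column_stack([np.ones(n), siera, xfip])
w = np.sqrt(ip)
beta, *_ = np.linalg.lstsq(X * w[:, None], era_next * w, rcond=None)
print(np.round(beta, 2))   # intercept, SIERA coef, xFIP coef
```

With real data you would read the three columns from your pitcher-season table instead of simulating them; everything after that is unchanged.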
Using FIP in place of xFIP, we get:
ERA = 1.30 + .61*SIERA + .14*FIP
(all p-stats < .01)
ERA = 1.32 + .56*SIERA + .15*FIP
ERA = 1.27 + .67*SIERA + .06*FIP
(p-stats <.01, except FIP in the last equation, at .265)
With both FIP and same-year ERA in the regression:
ERA = 1.31 + .61*SIERA + .04*FIP + .07*ERA
(p-stats .000, .398, .007)
Splitting into halves again...
ERA = 1.32 + .56*SIERA + .10*FIP + .06*ERA
(p-stats: .000, .146, .132)
ERA = 1.28 + .66*SIERA - .02*FIP + .08*ERA
(p-stats: .000, .764, .020)
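The shaky p-values in these multi-predictor fits are what the multicollinearity caveat above predicts. With only two correlated predictors, the variance inflation factor reduces to 1/(1-r^2), so the problem is easy to quantify. A sketch with simulated values (a real check would use the actual SIERA and FIP columns):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# Simulated predictors; FIP is built to track SIERA closely,
# mimicking the real overlap between the two stats.
siera = rng.normal(4.0, 0.7, n)
fip = siera + rng.normal(0.0, 0.3, n)

# With exactly two predictors, VIF = 1 / (1 - r^2).
r = np.corrcoef(siera, fip)[0, 1]
vif = 1.0 / (1.0 - r ** 2)
print(round(r, 2), round(vif, 1))
```

A VIF much above ~5 means the standard errors on the individual coefficients are badly inflated, which is why the split-sample coefficients bounce around even though the combined fit looks stable.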
Anything else to run? I'm happy to run tests as asked if my programming skills can handle it! Thanks for the great idea, Brent!

Furthermore, the weights should differ depending on the time frame used. That is, if I’m forecasting based on data for a relatively short period (say, half a season), I’d probably give almost no weight to ERA, but if a pitcher’s ERA is consistently better than the defense-independent metrics for several years, then I’d figure there are skills being captured by ERA (holding and picking off base runners, pitcher fielding, skill at inducing infield flies) that are missing from metrics like SIERA. So I’d guess the coefficients of FIP and ERA would tend to rise when forecasting using data for longer time spans.

One thought on your closing lines: is there any meaningful trend among the pitchers for whom xFIP and SIERA produce significantly different predictions? Are those disparities random or consistent?

I’m testing SIERA against next-year ERA, while the formula was devised as a regression against same-year ERA. As a hypothetical, I showed above what a biased regression like the one you describe would look like when I developed the “fake SIERA” called SIERA*. Its RMSE against next-year ERA was 0.920, as opposed to regular SIERA’s 1.040.
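For reference, the RMSE figures here are just the root mean squared error of a projection against next-year ERA. A sketch of a small helper; the optional IP-weighting parameter is my own assumption, added to match the IP-weighted regressions discussed above:

```python
import numpy as np

def rmse(predicted, actual, ip=None):
    """Root mean squared error; optionally weight each pitcher by IP."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    # Equal weights unless innings-pitched weights are supplied.
    w = np.ones_like(predicted) if ip is None else np.asarray(ip, dtype=float)
    return float(np.sqrt(np.average((predicted - actual) ** 2, weights=w)))

# Toy values, just to show the call:
print(rmse([3.5, 4.2], [4.0, 4.0]))   # -> ~0.381
```

Run with a projection column and a next-year-ERA column, the same function reproduces the kind of 0.920-vs-1.040 comparison quoted above.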

In the original SIERA run, Eric and I developed the formula on 2003-2007 data and then applied it out of sample, using 2008 peripheral data against 2009 ERA, and it worked. The original formula, released in February 2010, also did well using 2009 peripherals to predict 2010 ERA.

Is that what you were asking about?
