I’d like to have a more private conversation about some of this stuff, and I have a few general questions about FanGraphs, too. Can I e-mail you?

How do you add park effects?

You can’t have it both ways:

1) claiming that coefficients matter but the large standard errors don’t when it comes to deciding which explanatory variables belong, and

2) using these same standard errors to argue that the coefficients from the reliever sub-sample aren’t really all that different from the coefficients from the full sample.

If in fact there’s all this noise due to their smaller number of innings pitched, then doesn’t it make more sense to just exclude relievers from the analysis altogether and apply your conclusions just to starters?? Instead, you’re making an assertion (and that’s all it is) that even though we can’t see it in the data, we should just ASSUME that relievers’ performance can be explained by the same model that explains starters’ performance.

The other thing going on with relievers, though, is that reliever usage varies; even with comparable IP, that would keep your R^2 from matching, since usage patterns explain some of the differences in ERA between relievers too.

I’d focus more on the coefficients than the significance levels when you’re comparing datasets, since the t-stats are just coef/std.err. and we know the std.err. will be different.
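To make the coef/std.err. point concrete with made-up numbers (a sketch, not anyone's actual estimates): the identical point estimate can look significant in one sample and insignificant in another purely because the standard error grows.

```python
# Sketch with hypothetical numbers: the same coefficient can be
# "significant" or "insignificant" depending only on its standard error.

def t_stat(coef, std_err):
    # The t-statistic is just the coefficient over its standard error.
    return coef / std_err

full_sample_t = t_stat(10.0, 4.0)   # larger sample, smaller SE -> t = 2.5
sub_sample_t = t_stat(10.0, 12.0)   # smaller sample, larger SE -> t ~ 0.83

print(round(full_sample_t, 2), round(sub_sample_t, 2))
```

Same coefficient both times; only the precision differs, which is exactly why comparing point estimates across the two fits is more informative than comparing significance stars.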

I doubt that the differences I see when I break the dataset into starters and relievers are due mostly to sampling error: I have 1745 observations for starters and 1494 for relievers. Obviously there is some impact from the fact that the average number of innings for starters in my sample is around 138, while the average for relievers is 61. But still, the idea that all of the insignificant terms I see in the reliever equation are due purely to sample size doesn’t make sense. Here are the t-statistics I got for the reliever equation:

SO/PA -3.41

(SO/PA)^2 -0.17

BB/PA 0.97

(BB/PA)^2 0.43

nGB/PA -0.92

(nGB/PA)^2 -2.23

(SO*BB)/PA 0.99

(SO*nGB)/PA 0.79

(BB*nGB)/PA 0.37

And again, the R^2 was around 0.28. What this says to me is that there’s something beyond linear or nonlinear Ks, BBs, and nGBs, and their interactions with each other, that explains reliever performance; as opposed to your suggestion that we KNOW these things are tied to performance and it’s just small sample sizes or bad Retrosheet data preventing us from seeing it.
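For what it's worth, that nine-term specification is easy to replicate on synthetic data. A sketch (everything below is randomly generated for illustration, not Retrosheet data) of fitting the quadratic-plus-interaction equation by OLS and computing t-statistics by hand:

```python
import numpy as np

# Sketch on synthetic data (not real pitching data): fit the
# quadratic-plus-interaction specification by ordinary least squares
# and compute t-statistics as coefficient / standard error.
rng = np.random.default_rng(0)
n = 1494  # same order as the reliever sample discussed above
so = rng.uniform(0.10, 0.35, n)    # SO/PA
bb = rng.uniform(0.04, 0.14, n)    # BB/PA
ngb = rng.uniform(-0.05, 0.20, n)  # nGB/PA

X = np.column_stack([
    np.ones(n), so, so**2, bb, bb**2, ngb, ngb**2,
    so * bb, so * ngb, bb * ngb,
])
# Made-up "true" relationship plus heavy noise, mimicking reliever ERA.
era = 5.5 - 10.0 * so + 8.0 * bb - 2.0 * ngb + rng.normal(0, 1.2, n)

beta, *_ = np.linalg.lstsq(X, era, rcond=None)
resid = era - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])          # residual variance
se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
t_stats = beta / se
print(np.round(t_stats, 2))
```

With noise this heavy, many of the quadratic and interaction t-statistics come out small even though the linear effects are real by construction, which is the pattern under debate here.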

As to your specific questions:

The BB^2 term in the starter equation was 41.48 with a t-stat of 1.68; for relievers it was 10.24 with a t-stat of 0.43.

The SO^2 term in the starter equation was 9.11 with a t-stat of 1.26; for relievers it was -1.00 with a t-stat of -0.17.

So for starters I’m not too concerned about the significance levels, because the point estimates are similar enough to what you found for the entire sample. But not so for the relievers — there’s got to be something else going on that is just not being captured well by the variables in your equation IMO.

Thanks again for continuing this discussion; the more we can learn from the data, the better off we’ll all be.

Firstly, you don’t need to add and subtract terms from a regression as you go, since we’re not trying to model each effect in isolation to see whether it exists. We know that all of these things have effects; we’re just not sure how large they are, and we’re not sure of the functional form. If you’re familiar with calculus (which it sounds like you must be, since you speak pretty knowledgeably about regression), think of it as checking:

dERA/dSO = a + b*SO + c*BB + d*nGB

dERA/dBB = e + f*SO + g*BB + h*nGB

dERA/dnGB = i + j*SO + k*BB + m*nGB

In that sense, I’m only assuming that the direct and indirect effects of each variable are not constant, but vary at most linearly with every other thing we know affects them. A term being insignificant or flipping sign isn’t a huge deal.
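Just to make explicit the model those three partial derivatives imply (my reconstruction, not a quoted equation): since the cross-partials of a single function must match, c = f, d = j, and h = k, and integrating back up gives

ERA = const + a*SO + e*BB + i*nGB + (b/2)*SO^2 + (g/2)*BB^2 + (m/2)*nGB^2 + c*SO*BB + d*SO*nGB + h*BB*nGB

which is exactly the quadratic-plus-interactions specification in the t-statistic list above.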

I did use Retrosheet data with Eric Seidman in version one of SIERA, and we also found that BB^2 wasn’t significant with that data set. Given my findings on situational pitching and walks (namely, a higher proportion of inopportune UBB out of all UBB for pitchers with higher UBB rates), it is highly likely that BB^2 should have a positive term. Thus, I think the insignificance in the Retrosheet tests comes from batted-ball measurement error causing attenuation bias.
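The attenuation-bias mechanism is easy to demonstrate on simulated data (a sketch; the numbers are invented, not from Retrosheet): classical measurement error in a regressor shrinks its estimated coefficient toward zero.

```python
import numpy as np

# Sketch with invented numbers: classical measurement error in a
# regressor biases its OLS slope toward zero (attenuation bias).
rng = np.random.default_rng(1)
n = 50000
x_true = rng.normal(0.0, 1.0, n)
y = 2.0 * x_true + rng.normal(0.0, 1.0, n)   # true slope = 2
x_noisy = x_true + rng.normal(0.0, 1.0, n)   # measured with error

def ols_slope(x, y):
    # Simple bivariate OLS slope: cov(x, y) / var(x).
    return np.cov(x, y, ddof=0)[0, 1] / np.var(x)

b_clean = ols_slope(x_true, y)   # close to the true 2.0
b_noisy = ols_slope(x_noisy, y)  # shrunk toward 2 * var(x)/(var(x)+var(err)) = 1.0
print(round(b_clean, 2), round(b_noisy, 2))
```

The noisier the measurement of the regressor (and batted-ball classification is noisy), the more the estimated coefficient is pulled toward zero, which can turn a real positive term statistically insignificant.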

Relievers, by the way, are going to have a lower R^2, of course! They have more variance in their ERA due to smaller sample sizes. If you took two months of pitcher data at a time, I’d bet you’d get a similar R^2 to what you get for a full year of relievers. That’s just the magnitude of the error, I’m pretty sure. There are also other factors, like when a reliever gets brought in and against whom, that add variance to his ERA in a way that is potentially not very correlated with peripherals.
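That mechanical point can be seen in a quick simulation (all numbers below are made up for illustration): hold the skill-to-ERA relationship fixed and only vary the outcome noise, and R^2 falls on its own.

```python
import numpy as np

# Sketch: identical skill -> ERA relationship, but more outcome noise
# (fewer innings) mechanically lowers R^2. All numbers are illustrative.
rng = np.random.default_rng(2)
n = 20000
skill = rng.normal(4.0, 0.5, n)  # hypothetical "true talent" ERA

def r_squared(noise_sd):
    # R^2 of a model that perfectly recovers the signal but not the noise.
    era = skill + rng.normal(0.0, noise_sd, n)
    resid = era - skill
    return 1.0 - resid.var() / era.var()

r2_low_noise = r_squared(0.5)   # starter-like: more innings, less noise
r2_high_noise = r_squared(1.5)  # reliever-like: fewer innings, more noise
print(round(r2_low_noise, 2), round(r2_high_noise, 2))
```

Even a model that captures the underlying skill perfectly posts a much lower R^2 on the noisier group, so a lower reliever R^2 by itself doesn’t indicate a different underlying model.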

I’m guessing that the insignificant SO^2 term is probably a sample size issue. Does the coefficient change much or turn negative? Or does the p-value just get bigger because the standard error is larger?
