Also, it might be good to look at the pooled FIP, xFIP, etc. of all qualified pitchers, weighting by IP and ignoring the constant added per year to scale to ERA.
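To make that concrete, pooling by IP is just an innings-weighted average. A minimal sketch in Python; the (FIP, IP) pairs are made-up illustrative values, not real pitchers:

```python
# Hypothetical (FIP, innings pitched) pairs for qualified pitchers --
# illustrative values only, not real data.
pitchers = [(3.10, 120.0), (4.25, 98.0), (2.80, 110.0)]

total_ip = sum(ip for _, ip in pitchers)
# IP-weighted pooled FIP: each pitcher counts in proportion to innings thrown
pooled_fip = sum(fip * ip for fip, ip in pitchers) / total_ip
print(round(pooled_fip, 2))  # -> 3.34
```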

]]>Additionally, I think Rich C. gets at an important point: you should probably treat this year as a fluctuation from the past and then ask how likely it is given what we knew ahead of time. We could get pretty Bayesian here, but let’s keep it simple. The first and most important question is: what is a baseline year? To answer that we need a sample set, and we have one. Now, do you include 2000 when forming your a priori hypothesis? I wouldn’t; ten years and nine years are equally arbitrary cutoffs, so we are free to call the past nine years an era and ask whether this year breaks from it. You did include 2000, though, so let’s look at that case first.

If you average 2000-2009 to get a baseline year, the Gaussian probability of seeing 8.88 runs or fewer per game in 2010 is about 4% (mean 9.49, SD 0.357). If you exclude 2000 from your data set, which looks to me like a number from the height of the steroid era, the probability of seeing 8.88 runs or fewer per game is about 1.5% (mean 9.40, SD 0.238).
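Those tail probabilities are easy to verify with nothing but the standard library; a quick sketch using the means and SDs quoted above:

```python
from math import erf, sqrt

def normal_cdf(x, mean, sd):
    """P(X <= x) for a normal distribution with the given mean and SD."""
    return 0.5 * (1.0 + erf((x - mean) / (sd * sqrt(2.0))))

# 2000-2009 baseline: mean 9.49 RS/G, SD 0.357
p_with_2000 = normal_cdf(8.88, 9.49, 0.357)     # ~0.044
# 2001-2009 baseline (2000 excluded): mean 9.40, SD 0.238
p_without_2000 = normal_cdf(8.88, 9.40, 0.238)  # ~0.015
```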

Essentially, we need to define what we think a normal year is before we can test The Year of the Pitcher. To ask the question at all, you need an opinion about the baseline you’re comparing against, and you can choose any criteria you want for that baseline, especially in a statistically messy data set like the one baseball necessarily produces. In fact, looking at anything longer than three or four years is probably silly (player attrition rates would be a helpful guide here). That said, I feel pretty comfortable choosing to look at a decade and then throwing out 2000 as The Year of the Hitter, which is a bigger outlier than this year.

So, with 2000 included, we are 96% sure that underlying factors have shifted in the direction of pitching. With 2000 excluded, we are better than 98.5% sure that underlying talent levels have shifted. This method can’t tell us how much they have shifted, but, as you can see, we are pretty sure that they have. (Key caveats remain: this is an incomplete season, and we haven’t checked whether run scoring rises as the year goes on, etc.)

]]>No, they absolutely don’t. The statistics show that this COULD be a normal fluctuation, not that it is. That is an important distinction.

]]>Don’t get me wrong, I’m impressed with the statistical analysis, and I actually found it pretty intriguing. I just don’t think it does anything to disprove this being a year of the pitcher. In fact, I think it did more to prove that it really IS the most pitcher-friendly year of the last decade.

]]>Jason – I only looked for a simple trend; I didn’t run a regression. Thanks for the heads-up on that. I don’t recall claiming that RS/G should remain constant, only that it should consistently fall within a specific range. You’re right that we can call this the Year of the Pitcher because of the low run totals. I suppose my point was more along the lines of “this isn’t that far out of the ordinary” or “this is probably not sustainable.” Thanks for the advice on my method; I’ll try to apply some of it in the future.

Thanks,

Arjuna

]]>I can see from your use of standard deviation that you understand there is noise in any data point. As such, I am doubly confused that you dismiss a trend because in four of the ten years RS/G went up from the previous year instead of down. Did you graph these data? It looks like there’s a trend there. I ran the numbers using simple regression, which is by no means a perfect tool for analyzing this sample, but it does allow for a trend (z-scores don’t).

The results: a coefficient of -.07 for time. This indicates that, over the period you looked at, the trend was for RS/G to fall by that amount each year. That’s not an insubstantial number; it suggests that over a 14-year period we’d expect scoring to drop by a full run per game. And to re-introduce confidence intervals: with the data at hand, we can be more than 95% certain that there is a downward trend in scoring.
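For anyone who wants to replicate this kind of regression, here’s a minimal sketch with numpy. The RS/G values below are placeholders, not the actual 2000-2009 league figures, so the fitted slope will differ from the -.07 quoted above:

```python
import numpy as np

# Hypothetical RS/G by season, 2000-2009 -- illustrative values only
years = np.arange(2000, 2010)
rs_per_game = np.array([10.0, 9.6, 9.3, 9.5, 9.7, 9.2, 9.7, 9.6, 9.3, 9.2])

# Least-squares line: the slope is the average change in RS/G per year.
# A negative slope means scoring is trending down over the sample.
slope, intercept = np.polyfit(years, rs_per_game, 1)
```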

This says nothing about whether 2010 is The Year of the Pitcher, though. By your method, it seems you are asking whether the level of pitching talent this year is observably different from the previous nine years. There are statistical tools for finding “breaks” in data sets that are largely dependent on time, and those would be the appropriate tools here. There are several problems with that, though: (1) you wouldn’t be separating the talent of pitchers from that of hitters, and (2) with this sample size, and without several years of data after 2010, it would be almost impossible to isolate the break, to name two.
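One such break test is a Chow test: fit the trend separately on each side of a candidate break year and compare against a single pooled fit. A sketch under the same caveat as before (hypothetical RS/G numbers; and note that a break at 2010 itself leaves almost no post-break data, which is exactly problem (2)):

```python
import numpy as np

def chow_f(x, y, split):
    """Chow-test F statistic for a structural break before index `split`,
    using a simple intercept-plus-slope model on each side."""
    def ssr(xs, ys):
        coef = np.polyfit(xs, ys, 1)
        resid = ys - np.polyval(coef, xs)
        return float(resid @ resid)

    k = 2  # parameters per fit: intercept and slope
    pooled = ssr(x, y)                                  # one line for everything
    s1, s2 = ssr(x[:split], y[:split]), ssr(x[split:], y[split:])
    return ((pooled - (s1 + s2)) / k) / ((s1 + s2) / (len(x) - 2 * k))

# Hypothetical RS/G series, 2000-2010 -- illustrative values only
x = np.arange(2000, 2011, dtype=float)
y = np.array([10.0, 9.6, 9.3, 9.5, 9.7, 9.2, 9.7, 9.6, 9.3, 9.2, 8.9])

# Candidate break before 2009; a larger F favors "two regimes" over one line
F = chow_f(x, y, split=9)
```

A large F relative to the F(k, n-2k) distribution would suggest the two sub-periods follow genuinely different trends; with a sample this small the test has very little power either way.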

In the end, this YotP question is really semantic. It is perfectly fine to call this The Year of the Pitcher based only on the fact that scoring is the lowest it’s been in a decade. The “why” question about pitcher talent applies only if you want it to. We can’t say that this year’s scoring level isn’t luck-based, but we don’t correct WAR production for BABIP either. If you’ve produced, it doesn’t matter for analyzing the past whether it was due to luck or not. There is no question that, so far this year, pitching has succeeded at a rate better than it has across any whole year previously in the sample. To that extent, it has been The Year of the Pitcher.

Anyway, best of luck in your future calculations.

Cheers,

Jason
