FanGraphs Baseball - Comments on ‘Stabilizing’ Statistics: Interpreting Early Season Results

Pizza Cutter’s methodology was impressive, but his conclusions fly in the face of what my stats books tell me about rate statistics — the variance of a sample proportion (and therefore the ability to infer anything about “true rates”) depends only on the sample size and the proportion. There is no way different rates can exhibit different “stabilizing” behavior unless the events are non-independent. Take groundball rates — do pitchers compensate by varying what they throw based on how many ground balls they’ve been getting recently?

Since Pizza Cutter’s article was published in 2007, we’ve had three more seasons — how about double-checking his work by looking at rate stats at the end of April and see how well they project the rest of the season? Pizza Cutter mentioned social science in his article — one hallmark of the social sciences is that no one ever verifies anyone else’s work. We should not accept that for baseball analysis. Looking at the past three seasons would give us some clue how much faith to put in these early returns for 2011.

Comment by lex logan — May 6, 2011 @ 5:38 pm

oh fiddlesticks! just when I thought I had gotten that one bit of intel that would give me the deciding edge lex has to ruin everything.

good story and response, thanks. go charlie morton….. hope you didn’t take anything from doc ellis to help you make this adjustment…..

Comment by jimbo — May 6, 2011 @ 5:51 pm

PC’s study looked at statistics from a given year to determine when different statistics “stabilized.” He was trying to determine “at what point in the season (measured by PA) do stats go from being garbage to being meaningful and actually describing something about the player?”

There are two problems with using that here.

1) PC overstates his case. 50 PA is not the point at which swing % “go[es] from being garbage to being meaningful,” it is the point (rounded to the nearest 50 PA) where swing % reaches a split-half correlation of .7.

In other words, 50 PA is not some kind of talismanic number where a light bulb suddenly goes on. It is simply the point at which you can say that a guy with a swing % of N, in a league where the average swing % is M, has a “true” swing % of (N+M)/2. Before that point, you can still say that his true level is closer to N than M, and after that point, you can continue to build evidence that his true level is closer to N than (N+M)/2.
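The arithmetic in this comment can be sketched in a few lines. This is a minimal illustration of shrinking an observed rate toward the league mean, where the weighting scheme (sample size over sample size plus stabilization point) is one common convention that reproduces the (N+M)/2 midpoint exactly at the threshold; the swing rates used are made-up numbers.

```python
# Sketch of the regression-to-the-mean idea above: at the "stabilization"
# point, the best estimate of a player's true rate is halfway between
# his observed rate and the league mean, and every PA past that point
# shifts the estimate further toward the observed rate.
def regressed_rate(observed_rate, league_mean, pa, stabilization_pa):
    """Shrink an observed rate toward the league mean.

    The weight on the observed rate grows with sample size;
    at pa == stabilization_pa the estimate is exactly (N + M) / 2.
    """
    w = pa / (pa + stabilization_pa)
    return w * observed_rate + (1 - w) * league_mean

# A hitter swinging at 55% of pitches in a league that swings at 45%:
print(regressed_rate(0.55, 0.45, 50, 50))   # at the 50 PA threshold: 0.50
print(regressed_rate(0.55, 0.45, 200, 50))  # with 200 PA: 0.53, nearer 0.55
```

There is no light-bulb moment in this formula, which is the comment’s point: the estimate slides continuously from the league mean toward the observed rate as PA accumulate.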

2) PC’s research was done in a vacuum, where he was simply looking at a player’s performance in a single season. It is not directly applicable to the question here, which is when N PA tells you that someone’s skill level has changed. If you want to determine that, the size of the sample isn’t the only thing that matters — how much the sample differs from expectations matters as well. Lance Berkman isn’t anywhere near the threshold where his HR/FB rate “stabilizes” (300 PA), yet ZiPS has gone from projecting him to hit 17 HR over 504 PA before the season to projecting 17 HR over 413 remaining PA as of today.

Why? Because Berkman’s performance to date is already statistically significant. He is multiple standard deviations above his projected performance to the point where it is more likely than not that his projection is no longer accurate going forward.

Calculating exactly how much significance his performance to date has — and thus how much to adjust his projection — will relate to similar concepts as “stabilization,” but you cannot simply treat any PA/BF threshold as a magic number where things suddenly become meaningful. Every PA and BF carries some tiny bit of significance, and if they consistently point in the same direction, they can add up to real meaning long before a statistic “stabilizes.”
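A quick way to see the “already statistically significant” claim is a binomial z-score against the projection. The HR counts below are illustrative, not Berkman’s actual 2011 line; only the 17 HR per 504 PA projected rate comes from the comment above.

```python
import math

# Sketch of the point above: even in a "small" sample, a large enough
# departure from a projection is statistically meaningful.
def z_score(successes, n, expected_rate):
    """How many standard deviations the observed count sits above expectation,
    using the binomial mean n*p and SD sqrt(n*p*(1-p))."""
    expected = n * expected_rate
    sd = math.sqrt(n * expected_rate * (1 - expected_rate))
    return (successes - expected) / sd

# A hitter projected at ~3.4% HR/PA (17 HR per 504 PA) who has
# hit 10 HR in his first 120 PA (hypothetical counts):
print(round(z_score(10, 120, 17 / 504), 2))  # about 3 SDs above projection
```

Three standard deviations in 120 PA is exactly the kind of evidence that moves a projection well before any “stabilization” threshold is reached.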

Comment by Chris — May 6, 2011 @ 7:49 pm

I don’t understand why no one uses confidence intervals with sabermetrics. Point estimates are essentially worthless; as you say, they don’t switch from meaningless to meaningful when you cross some magic threshold. Instead of making observations and predictions and then wasting verbiage with the mandatory “Small Sample Size” disclaimer, why not report a range of plausible values that automatically improves as more data is acquired?

I admit that the simple confidence intervals taught in introductory stats classes might not be the complete answer; but I suspect the literature has examples of forecasting a range. In project management, for example, times to complete a project are often estimated using a “best case–most likely–worst case” estimate for each component, weighted 1-4-1.
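The project-management estimate lex describes is the classic PERT three-point formula. A minimal sketch, with made-up task durations:

```python
# The "best case / most likely / worst case" estimate from project
# management (PERT), with the 1-4-1 weighting mentioned above.
def pert_estimate(best, likely, worst):
    """Weighted mean of a three-point estimate: (best + 4*likely + worst) / 6."""
    return (best + 4 * likely + worst) / 6

def pert_stddev(best, worst):
    """Conventional PERT spread: one sixth of the best-to-worst range."""
    return (worst - best) / 6

# Estimating days to finish a task:
mean = pert_estimate(4, 7, 16)  # -> 8.0
sd = pert_stddev(4, 16)         # -> 2.0
```

The appeal for forecasting is the same as for confidence intervals: the output is a center plus a spread, not a bare point estimate.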

Comment by lex logan — May 7, 2011 @ 1:24 pm

Well said.

Comment by Luke in MN — May 7, 2011 @ 6:00 pm

How are you going to come up with a confidence interval?

Comment by imabookie3 — May 7, 2011 @ 8:26 pm

The Zimmerman link is to the 1999 player, not the currently active Jordan Zimmerman.

Comment by lex logan — May 8, 2011 @ 12:05 am

A simple formula is p +/- z*sqrt(p(1-p)/n) , where p (stands for proportion) is the observed rate, n is the number of observations (sample size), sqrt means take the square root, and z* depends on the desired level of confidence. Common values for z* are 1.96 for 95% confidence, 1.645 for 90%, and 1.28 for 80%. For Jordan Zimmerman’s 2011 GB%, he’s had 59 grounders out of 139 batted balls, for a .424 rate. An 80% confidence interval would be

.424 +/- 1.28*sqrt(.424(.576)/139), or .370 to .477.
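The same Wald interval can be computed directly from the raw counts given above (59 grounders on 139 batted balls). Using the unrounded proportion 59/139 the bounds land a hair above the hand calculation:

```python
import math

# Normal-approximation (Wald) confidence interval for a proportion:
# p +/- z * sqrt(p * (1 - p) / n); z = 1.28 gives ~80% confidence.
def proportion_ci(successes, n, z=1.28):
    p = successes / n
    margin = z * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin

lo, hi = proportion_ci(59, 139)  # Jordan Zimmerman's 2011 grounders
print(f"{lo:.3f} to {hi:.3f}")   # prints "0.371 to 0.478"
```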

The advantage of confidence intervals is that as you acquire more data, you are dividing by a larger “n”, so the confidence interval gets tighter; you don’t need a fuzzy “small sample size” disclaimer. However, does a GB% between 37% and 48% tell us anything meaningful? And if a batter gets 11 hits in his first 50 ABs, do we really think his true batting average is between .145 and .295? More likely he’s just been unlucky so far, and a BA over .300 is probably more likely than one below the Mendoza line for an established or talented hitter. So I think a better approach would employ “regression to the mean”, but also produce a range rather than a point estimate.
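One way to combine the two ideas, sketched under assumed numbers: treat the league mean as a pseudo-sample of prior observations (beta-binomial-style shrinkage), then put an interval around the shrunk estimate. The prior strength of 200 ABs below is an illustrative guess, not a fitted value.

```python
import math

# "Regress to the mean, but report a range": mix the league mean in as
# prior_n pseudo-observations, then build a normal-approximation interval
# around the shrunk estimate using the combined sample size.
def regressed_interval(successes, n, league_mean, prior_n, z=1.28):
    p = (successes + league_mean * prior_n) / (n + prior_n)
    margin = z * math.sqrt(p * (1 - p) / (n + prior_n))
    return p - margin, p + margin

# 11 hits in 50 ABs, league BA .260, prior worth ~200 ABs of evidence:
lo, hi = regressed_interval(11, 50, 0.260, 200)
print(f"{lo:.3f} to {hi:.3f}")  # prints "0.217 to 0.287"
```

The resulting range is both tighter than the raw .145-to-.295 interval and centered near the league mean, matching the intuition that the hitter has probably just been unlucky.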

Comment by lex logan — May 8, 2011 @ 12:55 am

Just a question: if a player or group of players is aware of and reacting to a statistic, does the cumulative stat or its variance damp at a different rate than if the player were ignorant of the stat? I’m thinking of Greinke’s efforts to refine his ground ball rate last year.

Comment by channelclemente — May 8, 2011 @ 1:01 pm

I second that – Chris’s and Lex’s replies are two of the most thoughtful and well-stated I’ve read in quite some time.

Comment by Jason B — May 9, 2011 @ 9:55 am

“Pizza Cutter’s methodology was impressive, but his conclusions fly in the face of what my stats books tell me about rate statistics — the variance of a sample proportion (and therefore the ability to infer anything about “true rates”) depends only on the sample size and the proportion.”

Lex, that’s only true of the random causes of variance. What you’re describing would only be true if baseball were entirely a game of chance – if all players were equally talented, in other words.

Comment by Colin Wyers — May 9, 2011 @ 9:55 am

That possibility — feedback basically — might explain some of the differences in stabilization Pizza Cutter found. If every event were known to be independent, I would dismiss PC’s findings out of hand. But the possibility of ongoing adjustments suggests the classic probability models may not be appropriate.

Comment by lex logan — May 9, 2011 @ 3:36 pm