‘Stabilizing’ Statistics: Interpreting Early Season Results

As I’m sure many of you are aware, early season baseball analysis can be a difficult thing. It’s tempting for saberists to scream “Small sample size!” whenever someone makes a definitive statement about a player, and early season results should always be viewed with a heavy dose of skepticism. After all, it’s a heck of a long schedule: the season started over a month ago, but we’re still less than 20% of the way finished. With most players, we have years and years of data on them, whether in the majors or minors, so why should we trust their results over a mere 100 plate appearances? More data almost always leads to better predictions, so at this point in the season, trusting 2011 results over a player’s past history is a dangerous thing.

At the same time, completely ignoring 2011 results is a horrible idea too. Some players do make dramatic improvements in their game from year to year, and there are always players that age at a different rate than expected — young players that develop fast (or slow) and old players that age quickly (or slowly). Some of a player’s early season results might be the result of a slump or streak, but sometimes there’s also an underlying skill level change that’s tied in with that slump or streak.

So how do we untangle what’s random variation and what’s a skill level change? Scouting information is huge when evaluating players in small samples, but sadly, not many of us are scouts. But stats can still help; you just have to know where to look.

This is common sense to anyone that’s played fantasy baseball, but some statistics are more fluky than others.  Even very casual baseball fans can recognize that ERA and Wins bounce up and down from year to year, and players’ batting averages fluctuate like crazy over the course of a season. And while some statistics shouldn’t be trusted even over the course of a full season, there are some statistics that stabilize quite rapidly. Thanks to research by Pizza Cutter (which can always be found in the FanGraphs Library), we can see that there are four statistics that have stabilized so far in 2011 for most players: swing and contact rates for position players (50-100 PA), and strikeout and groundball rates for starting pitchers (150 BF).

When I say “stabilize”, I don’t mean that these rates won’t change at all over the remaining course of the season. Instead, all it means is that once a player approaches these sample sizes, you can consider that there’s something more than just random variation going on: there’s some underlying change in a player’s approach/skill level/process/etc. in play as well. Matt Garza isn’t guaranteed to finish the year with a 12 K/9 rate because his strikeout rate has “stabilized”, but at the same time, I wouldn’t be surprised if his final strikeout rate is higher than what it’s been in the past.

With this in mind, let’s take a quick look at some of the early season standouts in each of these stats:

Swing Rate – 50 PA

Jose Bautista is only swinging at 33% of pitches thrown to him, which isn’t all that surprising considering that pitchers are acting like he’s the reincarnation of Barry Bonds and only throwing him strikes 34% of the time. But Bautista doesn’t even have the lowest swing rate in the AL; that belongs to Carlos Santana at 31%. This can’t be a good long-term strategy: while Santana is walking a lot, he’s also getting thrown strikes 43% of the time and striking out at a higher rate than last season. His plate approach still needs some refining.

On the other end of the spectrum, Vladimir Guerrero is giving new meaning to the phrase “swing at anything”. He’s swinging at 64% of pitches he sees, which is crazy high even for him (career 60% swing rate). Other players with notable high rates: Alfonso Soriano (58%) and Robinson Cano (57%).

Contact Rate – 100 PA

The list of players with low contact rates shouldn’t surprise anyone. Adam Dunn, Carlos Pena, and Nelson Cruz have all made contact on only 66% of their swings, but that rate isn’t a large aberration for any of them; they’re simply sluggers that strike out a lot. Mike Stanton is looking to join their group, though, making contact 68% of the time.

There aren’t any big surprises on the other side of the list either. There’s nobody dramatically performing better than expected, and the list is topped with slap hitters like Michael Brantley, Ichiro, and Denard Span.

Strikeout Rate – 150 BF

Now we switch over to the pitchers, and there are immediately some odd results so far this season. Matt Garza leads the majors with nearly 12 K/9? While Garza has always had the stuff to be a dominant pitcher, he’s never struck out more than 8.3 per nine over the course of a season before. And on the opposite end of the spectrum, there are a number of pitchers with low strikeout totals so far this season. Wade Davis, Clay Buchholz, and Jordan Zimmermann are all young starters that have struck out over 6 batters per nine in past seasons, but are averaging only around 4.5 strikeouts per nine (or less, in Davis’ case) this season. These pitchers are just barely over the 150 BF threshold, but I’d keep my eyes on them just in case.

Groundball Rate – 150 BF

As if there wasn’t already enough reason to be worried about John Lackey, it turns out his groundball rate has plummeted this year from 45% to 33%. Meanwhile, his rotation-mate Jon Lester has increased his groundball rate for the third year in a row, bringing it all the way up to 58% so far this year. The largest increase in the majors, though, comes from the enigmatic Charlie Morton, who has increased his groundball rate from 47-50% all the way to 64%.

Are all of these players with dramatic increases or decreases in their stats going to continue to perform at this rate over the rest of the season? No, of course not. But in each of these cases, the sample size has grown large enough that we can realistically consider that their skill level may be different than what we originally projected for them this season. Only time will tell in all of these cases, but don’t ignore the early returns. There’s value to be found in them if you look in the right places.




Steve is the editor-in-chief of DRaysBay and the keeper of the FanGraphs Library. You can follow him on Twitter at @steveslow.


12 Responses to “‘Stabilizing’ Statistics: Interpreting Early Season Results”

  1. lex logan says:

    Pizza Cutter’s methodology was impressive, but his conclusions fly in the face of what my stats books tell me about rate statistics — the variance of a sample proportion (and therefore the ability to infer anything about “true rates”) depends only on the sample size and the proportion. There is no way different rates can exhibit different “stabilizing” behavior unless the events are non-independent. Take groundball rates — do pitchers compensate by varying what they throw based on how many ground balls they’ve been getting recently?

    Since Pizza Cutter’s article was published in 2007, we’ve had three more seasons — how about double-checking his work by looking at rate stats at the end of April and see how well they project the rest of the season? Pizza Cutter mentioned social science in his article — one hallmark of the social sciences is that no one ever verifies anyone else’s work. We should not accept that for baseball analysis. Looking at the past three seasons would give us some clue how much faith to put in these early returns for 2011.

    • Chris says:

      PC’s study looked at statistics from a given year to determine when different statistics “stabilized.” He was trying to determine “at what point in the season (measured by PA) do stats go from being garbage to being meaningful and actually describing something about the player?”

      There are two problems with using that here.

      1) PC overstates his case. 50 PA is not the point at which swing % “go[es] from being garbage to being meaningful,” it is the point (rounded to the nearest 50 PA) where swing % reaches a split-half correlation of .7.

      In other words, 50 PA is not some kind of talismanic number where a light bulb suddenly goes on. It is simply the point at which you can say that a guy with a swing % of N, in a league where the average swing % is M, has a “true” swing % of (N+M)/2. Before that point, you can still say that his true level is closer to N than M, and after that point, you can continue to build evidence that his true level is closer to N than (N+M)/2.

      2) PC’s research was done in a vacuum, where he was simply looking at a player’s performance in a single season. It is not directly applicable to the question here, which is whether N PA tells you when someone’s skill level has changed. If you want to determine that, the size of the sample isn’t the only thing that matters; how much the sample differs from expectations matters as well.

      Lance Berkman isn’t anywhere near the threshold where his HR/FB rate “stabilizes” (300 PA), yet ZiPS has gone from projecting him to hit 17 HR over 504 PA before the season to projecting 17 HR over 413 remaining PA as of today.

      Why? Because Berkman’s performance to date is already statistically significant. He is multiple standard deviations above his projected performance to the point where it is more likely than not that his projection is no longer accurate going forward.

      Calculating exactly how much significance his performance to date has — and thus how much to adjust his projection — will relate to similar concepts as “stabilization,” but you cannot simply treat any PA/BF threshold as a magic number where things suddenly become meaningful. Every PA and BF carries some tiny bit of significance, and if they consistently point in the same direction, they can add up to real meaning long before a statistic “stabilizes.”
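
One way to picture evidence building up gradually, rather than switching on at a threshold, is a simple regression-to-the-mean weighting. The sketch below is an assumption for illustration, not something taken from Pizza Cutter's study or from ZiPS: it weights the observed rate by n / (n + stabilization_n), so the estimate lands exactly at (N+M)/2 when the sample reaches the stabilization point. The function name, the 45% league average, and the sample sizes are all hypothetical.

```python
def regressed_rate(observed, league_avg, n, stabilization_n):
    """Shrink an observed rate toward the league average.

    The weight on the observation is n / (n + stabilization_n), so the
    estimate is exactly halfway between the two when n == stabilization_n.
    This weighting is an illustrative assumption, not the article's method.
    """
    weight = n / (n + stabilization_n)
    return weight * observed + (1 - weight) * league_avg


# Hypothetical hitter swinging at 64% of pitches; the 45% league average
# and the 50 PA stabilization point are assumed values for illustration.
for pa in (10, 50, 200):
    estimate = regressed_rate(0.64, 0.45, pa, 50)
    print(f"{pa:>3} PA -> estimated true swing rate {estimate:.3f}")
```

The estimate moves smoothly toward the observed rate as plate appearances accumulate, which is the sense in which no single PA count is a magic number.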

      • lex logan says:

        I don’t understand why no one uses confidence intervals with sabermetrics. Point estimates are essentially worthless; as you say, they don’t switch from meaningless to meaningful when you cross some magic threshold. Instead of making observations and predictions and then wasting verbiage with the mandatory “Small Sample Size” disclaimer, why not report a range of plausible values that automatically improves as more data is acquired?

        I admit that the simple confidence intervals taught in introductory stats classes might not be the complete answer; but I suspect the literature has examples of forecasting a range. In project management, for example, times to complete a project are often estimated using a “best case–most likely–worst case” estimate for each component, weighted 1-4-1.
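
As a rough sketch of the 1-4-1 weighting mentioned above, the three-point estimate is just a weighted mean of the three scenarios. The function name and the strikeout-rate scenarios below are invented for illustration; nothing here comes from a real projection.

```python
def three_point_estimate(best, likely, worst):
    """Weighted mean with 1-4-1 weights: (best + 4*likely + worst) / 6."""
    return (best + 4 * likely + worst) / 6


# Hypothetical K/9 scenarios for a pitcher (made-up numbers).
estimate = three_point_estimate(best=10.0, likely=8.0, worst=6.5)
print(f"Weighted K/9 estimate: {estimate:.2f}")
```

Reporting the best and worst cases alongside the weighted value gives the range, while the weighted value plays the role of the point estimate.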

      • Luke in MN says:

        Well said.

      • Jason B says:

        I second that – Chris’s and Lex’s replies are two of the most thoughtful and well-stated I’ve read in quite some time.

    • Colin Wyers says:

      “Pizza Cutter’s methodology was impressive, but his conclusions fly in the face of what my stats books tell me about rate statistics — the variance of a sample proportion (and therefore the ability to infer anything about “true rates”) depends only on the sample size and the proportion.”

      Lex, that’s only true of the random causes of variance. What you’re describing would only be true if baseball were entirely a game of chance – if all players were equally talented, in other words.
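
A small sketch may help show why two rate stats with the same sample size can still stabilize at different speeds. Across a group of players, the observed variance is approximately the variance in true talent plus the binomial noise, p(1-p)/n. The noise term depends only on the rate and the sample size, but the talent term differs from stat to stat. The talent spreads and the 45% mean rate below are hypothetical values chosen only to illustrate the idea.

```python
def reliability(talent_sd, mean_rate, n):
    """Approximate share of observed variance that reflects true talent.

    Observed variance across players is modeled as talent variance plus
    binomial noise, mean_rate * (1 - mean_rate) / n.
    """
    talent_var = talent_sd ** 2
    noise_var = mean_rate * (1 - mean_rate) / n
    return talent_var / (talent_var + noise_var)


# Hypothetical stats: same mean rate, different spread in true talent.
for label, sd in (("wide talent spread", 0.06), ("narrow talent spread", 0.02)):
    shares = [round(reliability(sd, 0.45, n), 2) for n in (50, 150, 600)]
    print(f"{label}: talent share at 50/150/600 trials = {shares}")
```

Under these assumptions, the stat with the wider talent spread reaches any given reliability with far fewer trials, even though the random noise is identical for both.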

  2. jimbo says:

    oh fiddlesticks! just when I thought I had gotten that one bit of intel that would give me the deciding edge lex has to ruin everything.

    good story and response, thanks. go charlie morton….. hope you didn’t take anything from doc ellis to help you make this adjustment…..

  3. imabookie3 says:

    How are you going to come up with a confidence interval?

    • lex logan says:

      A simple formula is p +/- z* sqrt(p(1-p)/n), where p (for proportion) is the observed rate, n is the number of observations (sample size), sqrt means take the square root, and z* depends on the desired level of confidence. Common values for z* are 1.96 for 95% confidence, 1.645 for 90%, and 1.28 for 80%. For Jordan Zimmermann’s 2011 GB%, he’s had 59 grounders out of 139 batted balls, for a .424 rate. An 80% confidence interval would be
      .424 +/- 1.28 sqrt(.424(.576)/139), or .370 to .478.

      The advantage of confidence intervals is that as you acquire more data, you are dividing by a larger “n”, so the confidence interval gets tighter; you don’t need a fuzzy “small sample size” disclaimer. However, does a GB% of between 37% and 48% tell us anything meaningful? And if a batter gets 11 hits in his first 50 AB’s, do we really think his true Batting Average is between .145 and .295? More likely he’s just been unlucky so far, and a BA over .300 is probably more likely than one below the Mendoza line for an established or talented hitter. So I think a better approach would employ “regression to the mean”, but also produce a range rather than a point estimate.
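
The formula above can be sketched in a few lines of Python. This is a minimal illustration of the normal-approximation interval only; the function name is made up, and the inputs are the two examples from the comment (59 grounders in 139 batted balls, and 11 hits in 50 at-bats).

```python
import math


def proportion_interval(successes, trials, z=1.28):
    """Normal-approximation interval: p +/- z * sqrt(p * (1 - p) / n).

    The default z of 1.28 corresponds to roughly 80% confidence.
    """
    p = successes / trials
    margin = z * math.sqrt(p * (1 - p) / trials)
    return p - margin, p + margin


# Groundball example from the comment: 59 grounders in 139 batted balls.
gb_low, gb_high = proportion_interval(59, 139)
print(f"GB%: {gb_low:.3f} to {gb_high:.3f}")

# Batting average example from the comment: 11 hits in 50 at-bats.
ba_low, ba_high = proportion_interval(11, 50)
print(f"BA:  {ba_low:.3f} to {ba_high:.3f}")
```

Passing a larger trial count with the same rate shrinks the margin, which is the tightening described above. Note that this interval treats every player as a blank slate; it does not include any regression toward the mean.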

  4. lex logan says:

    The Zimmerman link is to the 1999 player, not the currently active Jordan Zimmermann.

  5. channelclemente says:

    Just a question: if a player or group of players is aware of and reacting to a statistic, does the cumulative stat or its variance damp at a different rate than if the player were ignorant of the stat? I’m thinking of Greinke’s efforts to refine his ground ball rate last year.

    • lex logan says:

      That possibility — feedback basically — might explain some of the differences in stabilization Pizza Cutter found. If every event were known to be independent, I would dismiss PC’s findings out of hand. But the possibility of ongoing adjustments suggests the classic probability models may not be appropriate.
