# The control hitters have over everything

A couple weeks ago, I wrote an article titled “The control hitters have over LD%,” examining why it’s a bad idea to use single-year line drive rates in any discussion of a hitter’s underlying skills. Afterward, I received an e-mail from a reader who wanted me to go a step further:

Hi Derek,

I really enjoyed your post on the stability of LD% over time. It was very helpful to have the GB% correlation (.65) as a comparison. I want to encourage you to do a post at some point on the stability of a variety of common conventional and sabermetric stats; I fully understand the concept of looking for stable, repeatable skills but I have little idea what is stable and repeatable! For example, how stable is a player’s walk rate? Strikeout rate? HR/FB rate? Just a table of 20 of these stats would be really cool for perspective.

With that, here we go…

### The results

As I said last time, this is far, far from a comprehensive study. For comparative purposes, though, it can be quite useful. I looked at all hitters from 2004 through 2008 who amassed at least 350 at-bats in each of two adjacent seasons (and played on the same team both years, to eliminate some park-to-park biases). What you’re seeing is the R-squared results for each stat, which essentially tells us how much of the variation in Year 2 can be explained by the Year 1 figure.

| Stat | R-squared |
|---|---|
| Batting Average | 0.18 |
| On-Base Percentage | 0.36 |
| Slugging Percentage | 0.37 |
| OPS | 0.35 |
| ISO Power | 0.52 |
| ISO Discipline | 0.60 |
| Batting Average with RISP | 0.06 |
| | |
| Contact (K) Rate | 0.76 |
| Walk Rate | 0.61 |
| HBP Rate | 0.37 |
| Pitches per PA | 0.61 |
| | |
| BABIP | 0.15 |
| 1B per BIP | 0.21 |
| 2B per BIP | 0.16 |
| 3B per BIP | 0.26 |
| AB/HR | 0.42 |
| HR/FB | 0.59 |
| GIDP Rate | 0.13 |
| | |
| LD% | 0.09 |
| GB% | 0.60 |
| OF FB% | 0.52 |
| IF FB% | 0.43 |
| | |
| SBO% | 0.33 |
| SBA% | 0.80 |
| SB% | 0.10 |
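For readers who want to try this kind of comparison themselves, here is a minimal Python sketch of the approach described above: pair each qualifying season with the same hitter's next season, then compute the R-squared between the Year 1 and Year 2 values. It is not the code behind the table. The record layout (`player`, `year`, `team`, `ab`) and the sample numbers are invented purely for illustration, and it assumes one record per player per year.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two paired lists.

    For a one-predictor linear fit (Year 2 on Year 1), this equals R-squared.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)


def year_pairs(seasons, stat, min_ab=350):
    """Return (Year 1, Year 2) values of `stat` for qualifying adjacent seasons."""
    by_key = {(s["player"], s["year"]): s for s in seasons}
    pairs = []
    for (player, year), s1 in by_key.items():
        s2 = by_key.get((player, year + 1))
        if s2 is None:
            continue  # no following season on record
        if s1["team"] != s2["team"]:
            continue  # changed teams, so park effects could muddy the comparison
        if s1["ab"] < min_ab or s2["ab"] < min_ab:
            continue  # too few at-bats in one of the two years
        pairs.append((s1[stat], s2[stat]))
    return pairs


# Invented sample records (hypothetical players and numbers, not real data).
seasons = [
    {"player": "A", "year": 2004, "team": "X", "ab": 500, "contact": 0.85},
    {"player": "A", "year": 2005, "team": "X", "ab": 480, "contact": 0.84},
    {"player": "B", "year": 2004, "team": "Y", "ab": 400, "contact": 0.75},
    {"player": "B", "year": 2005, "team": "Z", "ab": 410, "contact": 0.77},  # new team
    {"player": "C", "year": 2004, "team": "W", "ab": 360, "contact": 0.70},
    {"player": "C", "year": 2005, "team": "W", "ab": 300, "contact": 0.72},  # under 350 AB
    {"player": "D", "year": 2006, "team": "V", "ab": 450, "contact": 0.78},
    {"player": "D", "year": 2007, "team": "V", "ab": 460, "contact": 0.80},
    {"player": "E", "year": 2007, "team": "U", "ab": 520, "contact": 0.90},
    {"player": "E", "year": 2008, "team": "U", "ab": 510, "contact": 0.88},
]

pairs = year_pairs(seasons, "contact")
r2 = r_squared([p[0] for p in pairs], [p[1] for p in pairs])
print(len(pairs), round(r2, 2))
```

With a real data set, the same two functions could be run once per stat to build a table like the one above.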

### Quick takeaways

As we always stress here at THT Fantasy, stats like batting average and BABIP are poor indicators of a player’s actual skill. It’s much better to focus on component skills like contact rate, which is one of the most stable stats around. Home runs are relatively stable, which might surprise some but really shouldn’t—after all, Juan Pierre isn’t going to start posting 30-home run seasons, nor is Ryan Howard going to hit only five home runs.

As we saw last time, line drive rate is very unstable, while the other batted ball stats are much more stable. And for those who like to blame hitters for being “unclutch” with runners in scoring position (I hear far too much of this from fellow Mets fans), check out no. 7 on the list.

### Quick glossary

**EDIT**: I’m adding this late per request. Sorry for some things being a little unclear to begin with.

**ISO Power**: SLG-AVG

**ISO Discipline**: OBP-AVG

**Contact (K) Rate**: Contact rate on a per AB basis (not a per pitch basis). Calculated as (AB-K)/AB

**HR/FB**: Home runs per outfield fly ball

**GIDP Rate**: GIDP/BIP

**LD%**: Line drives as a percentage of all non-bunt balls in play

**GB%**: Groundballs as a percentage of all non-bunt balls in play

**OF FB%**: Outfield flies as a percentage of all non-bunt balls in play

**IF FB%**: Infield flies as a percentage of all non-bunt balls in play

**SBO%**: Stolen base opportunity rate. The percentage of times a hitter reaches first and thus is in position to attempt a steal. Calculated as (1B+BB+HBP-IBB)/TPA.

**SBA%**: Stolen base attempt rate. The percentage of times a hitter attempts a steal given that he is on first base. Calculated as (SB+CS)/(1B+BB+HBP-IBB).

**SB%**: Stolen base success rate. The percentage of times a hitter is successful on a steal attempt. Calculated as SB/(SB+CS).
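To make the glossary formulas concrete, here is a small Python sketch that computes them from one season line. The dictionary keys and the sample numbers are invented for illustration. Two things are my assumptions rather than glossary definitions: OBP uses the standard (H+BB+HBP)/(AB+BB+HBP+SF) formula, and the non-bunt balls in play are taken to be the sum of the four batted-ball counts.

```python
def glossary_rates(c):
    """Compute the glossary stats from a dict of counting stats (keys are hypothetical)."""
    singles = c["h"] - c["2b"] - c["3b"] - c["hr"]
    total_bases = singles + 2 * c["2b"] + 3 * c["3b"] + 4 * c["hr"]
    avg = c["h"] / c["ab"]
    slg = total_bases / c["ab"]
    # Standard OBP formula (an assumption; the glossary does not spell it out).
    obp = (c["h"] + c["bb"] + c["hbp"]) / (c["ab"] + c["bb"] + c["hbp"] + c["sf"])
    # Assumption: these four batted-ball types cover all non-bunt balls in play.
    bip = c["ld"] + c["gb"] + c["of_fb"] + c["if_fb"]
    on_first = singles + c["bb"] + c["hbp"] - c["ibb"]
    attempts = c["sb"] + c["cs"]
    return {
        "iso_power": slg - avg,
        "iso_discipline": obp - avg,
        "contact_rate": (c["ab"] - c["k"]) / c["ab"],
        "hr_per_fb": c["hr"] / c["of_fb"],
        "gidp_rate": c["gidp"] / bip,
        "ld_pct": c["ld"] / bip,
        "gb_pct": c["gb"] / bip,
        "of_fb_pct": c["of_fb"] / bip,
        "if_fb_pct": c["if_fb"] / bip,
        "sbo_pct": on_first / c["tpa"],
        "sba_pct": attempts / on_first,
        "sb_pct": c["sb"] / attempts,
    }


# An invented season line, not a real player.
season = {
    "ab": 500, "h": 150, "2b": 30, "3b": 5, "hr": 20,
    "bb": 60, "ibb": 5, "hbp": 5, "sf": 5, "k": 100,
    "sb": 15, "cs": 5, "tpa": 570, "gidp": 10,
    "ld": 80, "gb": 180, "of_fb": 120, "if_fb": 20,
}

rates = glossary_rates(season)
for name, value in rates.items():
    print(f"{name}: {value:.3f}")
```

A hitter with no steal attempts or no outfield flies would trigger a division by zero here, so real use would need a guard for those cases.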

### Concluding thoughts

That’s all for today. Any questions, feel free to comment or e-mail me!


“What you’re seeing is the R-squared results for each stat, which essentially tells us how much of the variation in Year 2 can be explained by the Year 1 figure.”

Huh? I am sure you’ve done some nice math here, but that sentence makes no sense. Let me give you a concrete example to illustrate.

| Year | BA |
|---|---|
| 1 | .278 |
| 2 | .302 |

What you’re seeing is the R-squared results for each stat, which essentially tells us how much of the .024 can be explained by the .278.

I’m not sure what your example means, but the R squared measures how much of the variation among all players in Year 2 can be attributed to the variation among those same players in Year 1.

Brilliant idea for a piece. When doing research for my fantasy team next season, I will be sure to look up guys with high contact rates who have underachieved this season…could be another article even.

“Pitchers per PA” is close to 1.0 for everyone in the league. I’m sure you mean “pitches per PA”.

I would guess that Batting Average with RISP appears to be more unstable from year to year than just Batting Average simply because the sample size, the number of PA we are using, for each season is smaller.

Detroit Michael,

You’re right; I changed “pitchers per PA” to “pitches.” Good catch.

You’re absolutely right on BA with RISP as well. If we’re looking at players with 350 ABs for the year, they might only have 150 ABs or so with RISP, so the number is much more unstable. If we were to look at all batters with exactly 350 regular ABs and all batters with exactly 350 ABs with RISP (given a large, fictional, perfectly-constructed-for-our-needs-data set), the correlations would probably be almost identical.
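The sample-size point can be illustrated with a quick simulation. This is only a sketch under made-up assumptions: every hitter gets a fixed "true" average drawn from a normal distribution (mean .270, standard deviation .025, both numbers chosen by me), and each at-bat is an independent coin flip at that average. The only thing that differs between runs is the number of at-bats per season.

```python
import random


def r_squared(xs, ys):
    """Squared Pearson correlation between two paired lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)


def simulate_r2(n_players, at_bats, seed=0):
    """Year-to-year R-squared of simulated batting averages.

    Assumed model (not from the article): true talent ~ Normal(.270, .025),
    each at-bat is an independent hit-or-out trial at that talent level.
    """
    rng = random.Random(seed)
    year1, year2 = [], []
    for _ in range(n_players):
        talent = min(max(rng.gauss(0.270, 0.025), 0.0), 1.0)
        hits1 = sum(rng.random() < talent for _ in range(at_bats))
        hits2 = sum(rng.random() < talent for _ in range(at_bats))
        year1.append(hits1 / at_bats)
        year2.append(hits2 / at_bats)
    return r_squared(year1, year2)


# Same simulated talent distribution, different sample sizes per season.
for ab in (150, 350):
    print(ab, round(simulate_r2(500, ab), 2))
```

Under these assumptions the 150 at-bat seasons should come out noticeably less stable than the 350 at-bat seasons, even though the underlying skill is identical in both runs.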

It should be remembered that all of these correlations are artificially high due to the 350 AB cutoffs used – that substantially reduces the variance and therefore increases the correlation. This is why a weighted correlation is preferable.

Gotcha on keeping things simple. I’d probably just report the coefficient on the lagged variable. Under the same assumptions you’re using, it would be just as informative. Under less restrictive assumptions, it would be more informative. Of course, your articles are in any case also extremely informative.

Are these stats defined anywhere? For instance, is OF FB% a percentage of all balls hit that are outfield flies, or a percentage of fly balls that are outfield flies? And what is SBO% and the other SB stats?

Last point: it would be nice to see references and comparisons with the many other studies of this that have been done in the past.