Basic Hitting Metric Correlation 1955-2012, 2002-2012

The distinction between observed performance and true talent is one that is, in a way, intuitive, yet tends to be elusive. Even the most careful of us can slip from talking of one to talking of the other. Determining the difference for a specific skill or a specific player can be difficult, but the general idea itself is not so hard to understand. Even the most casual fan of baseball understands that pretty much any player can go 0-4 or 4-4 in any given game without thinking that player’s true talent batting average is either .000 or 1.000. That understanding already contains the basic notion of a proper sample size and its relation to true talent. The reader can peruse the appropriate sections of the Sabermetric Library to get caught up.

Instead, this post is going to look at some specific metrics for hitters (in the near future I may do a similar post for pitchers) and compare how well they correlate from year to year. This gives us an idea of how these assorted metrics compare (relative to each other) with respect to predicting a player’s future performance (another way of saying true talent). I have included a couple of tables for reference, along with some brief commentary.

I make no claims to this sort of project being original or groundbreaking. This sort of project goes far back in the history of sabermetric research, but is worth revisiting. In the recent past, our own Bill Petti posted something very similar on correlation during the 2011 season at Beyond the Boxscore. A different, more mathematically sophisticated version of this sort of thing was done by Russell Carleton (a.k.a. Pizza Cutter), whose research was summarized in a post by Eric Seidman. So why am I bothering to do this kind of thing yet again?

It started because I just wanted to see if I could do something like this myself, but that does not explain why it is worth posting. I will let each reader judge that for herself or himself, ultimately, but here are some reasons I thought it was worth sharing. For one, I look at some different correlations for different metrics than I have seen published in similar studies. For example, I wanted to look at how some of the metrics derived from a binomial approach fared compared to others. In addition, I look at a pretty big sample, starting from 1955. That limited what metrics I could look at (more on that below), but, well, gave me a bigger sample. So for people interested in the same sorts of metrics I am (as well as others), this post could serve as a quick reference.

This little project does not establish how much each of the following metrics (component or otherwise) should be regressed or weighted; it merely gives an idea of how much regression might be required for one metric as compared with another. It should also be said that year-to-year correlation is not the only, or necessarily the best, way to determine such things. It is, however, relatively easy to do, which is why I chose it for this relatively unambitious project.
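As a rough illustration only of why a higher correlation implies less regression, one naive reading treats the year-to-year correlation r as the slope of a simple linear regression of next year's rate on this year's. That reading assumes the metric has the same mean and spread in both seasons and ignores playing time, aging, and everything else a real projection would handle. The function name and the league-average inputs below are hypothetical, not anything from the study itself.

```python
def regress_to_mean(observed, league_mean, r):
    """Naive projection: keep a fraction r of the gap between the
    observed rate and the league mean, and regress the rest away.

    Assumes r is the year-to-year correlation for the metric and that
    the metric's mean and variance are the same in both seasons.
    """
    return league_mean + r * (observed - league_mean)


# Hypothetical league averages; the r values are taken from the 1955-2012 table.
avg_projection = regress_to_mean(observed=0.300, league_mean=0.260, r=0.477)
so_projection = regress_to_mean(observed=0.250, league_mean=0.180, r=0.868)

print(round(avg_projection, 3))  # about .279: most of the 40-point gap is regressed away
print(round(so_projection, 3))   # about .241: most of the 70-point gap is kept
```

Under those assumptions, a batting average keeps less than half of its distance from the mean, while a strikeout rate keeps most of it.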

I have included two tables for two different sets of correlations. The first covers all seasons from 1955-2012. I excluded pitchers hitting, and used only player-seasons with at least 400 plate appearances. That does introduce a sort of selection bias, but since we are typically concerned with major-league regulars, I do not think it is a big issue. I also tried to minimize the effect of park switches by only including correlations from players who were with the same team in both seasons (although if the team switched home parks, that would obviously make a difference, but that was a bit more complicated to code. Sorry.).
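The filtering described above can be sketched roughly as follows. This is not the code used for the study, just a minimal sketch of the same idea. It assumes one row per player-season stored as a dict with hypothetical keys `player`, `year`, `team`, and `pa`, plus a column for whatever metric is being tested.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def year_to_year_pairs(seasons, metric, min_pa=400):
    """Pair each player-season with the same player's following season.

    A pair is kept only if both seasons reach min_pa plate appearances
    and the player is with the same team in both years.
    """
    # Assumes one row per (player, year); a later duplicate would overwrite.
    by_key = {(s["player"], s["year"]): s for s in seasons}
    pairs = []
    for (player, year), current in by_key.items():
        following = by_key.get((player, year + 1))
        if following is None:
            continue  # no consecutive season on record
        if current["pa"] < min_pa or following["pa"] < min_pa:
            continue  # not a regular in both years
        if current["team"] != following["team"]:
            continue  # changed teams, so likely changed home parks
        pairs.append((current[metric], following[metric]))
    return pairs


def year_to_year_correlation(seasons, metric, min_pa=400):
    """Correlation between year N and year N+1 values of one metric."""
    pairs = year_to_year_pairs(seasons, metric, min_pa)
    return pearson([a for a, _ in pairs], [b for _, b in pairs])
```

Pitchers would need to be dropped from `seasons` before calling this, and the sketch does nothing about a team that moves to a new home park.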

I included various sorts of hitting metrics that have different uses. One metric having a higher year-to-year correlation than another does not necessarily make it “better”; it simply means it has a higher year-to-year correlation and, by itself, may require less regression to the mean to get from observed performance to true talent. Some metrics included may seem redundant, like home runs per contacted ball (the binomial version) and the simpler home runs per plate appearance, but I wanted to see how they compared. With those qualifications out of the way, here are the correlations from 1955-2012:

Hitting Metric Year-to-Year
SO/(AB+SF) 0.872
SO/PA 0.868
HR/(AB-K+SF) 0.809
HR/PA 0.785
uBB/(PA-HBP-IBB) 0.781
ISO 0.767
SLG 0.678
wOBA 0.641
HBP/(PA-IBB) 0.641
iBB/PA 0.635
OBP 0.632
3B/(2B+3B) 0.515
3B/PA 0.508
AVG 0.477
BABIP 0.463
(3B+2B)/(H-HR) 0.461
2B/PA 0.455

For those who have seen this sort of thing before, there probably are not any surprises. What stands out to me, initially, is the ordering of the three true outcomes. Strikeout rate, in either of its forms, has the highest correlation. With home run rate correlating so well year-to-year, it is not all that surprising that isolated power does, too, although slugging surprised me a bit. Both correlate slightly higher than wOBA, but that should not be taken as a problem with wOBA. After all, wOBA is a composite stat that depends on just about everything else in here. I would guess that few people are taken off-guard by batting average and BABIP ranking so low here, but the rate of doubles being even lower might raise some eyebrows. I found it pretty funny that hit by pitch and unintentional walk rates correlate better than batting average. Note that while the binomial versions of metrics included here had quite small advantages in correlation over their per-plate-appearance counterparts, that is not the only reason those are to be preferred.
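To make the difference between the binomial and per-plate-appearance versions concrete, here is a minimal sketch that computes both from a season's counting stats. The dictionary keys and the sample stat line are made up for illustration; the formulas are the ones in the table's row labels.

```python
def rate_metrics(c):
    """Compute per-PA and binomial-style rates from a dict of counting stats.

    Expects hypothetical keys: PA, AB, H, 2B, 3B, HR, BB, IBB, HBP, SO, SF.
    """
    unintentional_bb = c["BB"] - c["IBB"]
    return {
        # Per-plate-appearance versions: every PA is the denominator.
        "SO/PA": c["SO"] / c["PA"],
        "HR/PA": c["HR"] / c["PA"],
        "3B/PA": c["3B"] / c["PA"],
        "2B/PA": c["2B"] / c["PA"],
        # Binomial versions: the denominator is only the chances left
        # after the earlier outcomes have been removed.
        "SO/(AB+SF)": c["SO"] / (c["AB"] + c["SF"]),
        "HR/(AB-K+SF)": c["HR"] / (c["AB"] - c["SO"] + c["SF"]),
        "uBB/(PA-HBP-IBB)": unintentional_bb / (c["PA"] - c["HBP"] - c["IBB"]),
        "HBP/(PA-IBB)": c["HBP"] / (c["PA"] - c["IBB"]),
        "(3B+2B)/(H-HR)": (c["3B"] + c["2B"]) / (c["H"] - c["HR"]),
        "3B/(2B+3B)": c["3B"] / (c["2B"] + c["3B"]),
    }


# A made-up stat line, not a real player's season.
sample = {"PA": 600, "AB": 530, "H": 150, "2B": 30, "3B": 3, "HR": 20,
          "BB": 55, "IBB": 5, "HBP": 5, "SO": 100, "SF": 5}

for name, value in rate_metrics(sample).items():
    print(f"{name:18s} {value:.3f}")
```

In the binomial versions each rate is conditional on the plate appearance not having ended in one of the earlier outcomes, so a change in strikeout rate, for example, does not mechanically drag the home run rate around with it.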

The second table includes all of the above metrics and more, such as those which depend on batted-ball classification. That data is only available since 2002, which is why I have included a separate chart so that the comparisons would draw from the same sample of hitters (since 2002, at least 400 plate appearances, matching teams). Those classifications are still somewhat controversial, and the following should not be taken as a comment either way about their accuracy or usefulness. Obviously, with different samples, the correlations for metrics found in both charts will be different. Without any further ado, here are the results of that research.

Hitting Metric Year-to-Year
Contact% 0.896
SwStr% 0.886
SO/(AB+SF) 0.870
SO/PA 0.861
Swing% 0.851
O-Contact% 0.848
Z-Swing% 0.845
O-Swing% 0.826
Z-Contact% 0.821
GB% 0.793
HR/(AB-K+SF) 0.775
Zone% 0.768
uBB/(PA-HBP-IBB) 0.765
FB% 0.759
HR/FB 0.740
HR/PA 0.737
iBB/PA 0.734
ISO 0.722
HBP/(PA-IBB) 0.627
OBP 0.611
3B/(2B+3B) 0.606
SLG 0.605
IFFB% 0.605
F-Strike% 0.593
3B/PA 0.590
wOBA 0.581
AVG 0.427
(3B+2B)/(H-HR) 0.407
BABIP 0.373
2B/PA 0.315
LD% 0.293

It is hardly surprising that contact on swings would correlate so highly compared to other metrics, given that we have already seen that strikeout rate (which follows it closely) does so. As has been commented on many times before, it is curious that fly ball and ground ball rates both correlate so well, while line drive rate does not. I will leave that discussion to others.

Much else could be said, and different people will gain or notice different things depending on what they are looking for (or not looking for). Hopefully this will be helpful to others.

Matt Klaassen reads and writes obituaries in the Greater Toronto Area. If you can't get enough of him, follow him on Twitter.

Dan Rozenson

If I had to guess, I’d say the large variation in LD% owes to a lower base rate. That magnifies the apparent fluctuations.

Unless I’m making a math logic error.


I’d guess that some of it is due to scoring issues also. There’s been some research that has shown that hits are more likely to be scored as LDs, and outs more likely to be classified something else.

A “Line drive” handled by the shortstop may be hit on the same trajectory as a “ground ball” hit up the middle and handled by the CF.

Brian Cartwright

3B/(2B+3B) has a low rate, generally under 0.10, with small samples (at most 50 chances per year, many times 30-35), and its correlation is .606. LDs are just too subjective, even when Matt limited the sample to players on the same team in consecutive years.