Recently, in these pages, I’ve been looking at the relationship between ground-out/air-out ratios (GO/AO) — available in slightly different iterations at MLB.com and Baseball Reference (B-R) — and the ground-ball percentages (GB%) we host here at FanGraphs. (See Part One and Part Two of this most exciting of explorations.)
We know that GB%s measure the percentage of all batted balls that were hit on the ground and that GO/AOs are a ratio only of ground-outs to air-outs (with the Retrosheet data from B-R appearing to include line-outs as part of the air-out data and MLB appearing to exclude line-outs entirely). Despite this difference, the correlation between GB% and GO/AO, especially the iteration available at B-R, appears to be rather strong. Plotting actual GB%s against “expected GB%s” (xGB%), that is, the expected GB% using only GO/AO as the input, gives us a correlation coefficient of .98 for qualified pitchers from 2002 to 2010. The root mean square error (RMSE) between the GB%s and xGB%s is a mere 1.4%.
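For readers who want to see how a correlation coefficient and an RMSE of this kind are computed, here is a minimal sketch in Python. It is not the equation from the earlier parts of this series: the straight-line fit on ground-out share, the function names, and the five pitcher seasons are all assumptions made up for illustration.

```python
import math

def ground_out_share(go_ao):
    """Convert a GO/AO ratio into the share of those outs made on the ground."""
    return go_ao / (1.0 + go_ao)

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

def rmse(actual, expected):
    """Root mean square error between actual and expected values."""
    squared = [(a - e) ** 2 for a, e in zip(actual, expected)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical pitcher seasons (invented numbers, not real data).
go_ao = [0.80, 1.00, 1.25, 1.60, 2.10]
gb_pct = [38.0, 42.5, 46.0, 51.5, 57.0]

# Assumed model: xGB% is a straight-line function of ground-out share.
shares = [ground_out_share(r) for r in go_ao]
slope, intercept = fit_line(shares, gb_pct)
xgb_pct = [slope * s + intercept for s in shares]

print(f"r    = {pearson_r(gb_pct, xgb_pct):.3f}")
print(f"RMSE = {rmse(gb_pct, xgb_pct):.2f} percentage points")
```

The only step here that is certain rather than assumed is the conversion of a ratio into a share, GO/(GO+AO); everything downstream of it is one plausible way to build an xGB%.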
Because the data at B-R goes back to 1950, what started off merely as an attempt to provide a reference for people who had access to GO/AO but not GB% has potentially become an opportunity to estimate with some accuracy the GB%s of pitchers going back to the middle of last century.
“OMG” is what you’re saying to yourself, I’m sure.
Because I’m not very smart, this work is unfolding somewhat slowly — and with no little help from statistical luminaries such as Tangotiger, Colin Wyers, and Eric Van.
What I’ve done since last time is apply our equation to the average MLB GO/AOs at B-R for every year going back to 1950.
Here’s what it looks like:
A couple points on that:
1. The average xGB% for all years combined since 1950 is 43.7% — i.e. very, very close to the 43.8% mark for the years (2002-10) for which we have actual ground-ball data.
2. The graph raises an immediate question: Why do the xGB%s rise so dramatically from 1950 to the mid-1960s?
Here are some guesses at an answer to the question in point No. 2:
1. The data is somehow bad. For whatever reason, more air-outs or fewer ground-outs appear in the data than actually exist, thus skewing GO/AO numbers and producing lower xGB%s.
2. Pitchers were actually inducing fewer grounders.
3. Pitchers were inducing the same number of grounders, but infielders were converting fewer of those grounders into outs.
4. Perhaps owing to larger parks or weaker batters or different hitting approaches or some other reason, a greater percentage of balls hit in the air were caught by fielders.
We can sort of test this last point by looking to see if there’s any relationship between an increase in home runs and a decrease in the number of air-outs recorded relative to ground-outs.
First, here’s a graph of home runs per batted ball since 1950 (where “batted ball” is actually defined, for purposes of ease in finding the data, as any plate appearance that ended in neither a walk nor strikeout):
We see here a spike in that early part where we were getting lower-than-expected xGB%s.
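The batted-ball shortcut used for that graph is simple enough to write down. The sketch below follows the definition given above (plate appearances minus walks minus strikeouts); the function name and the league totals in the example are made up for illustration.

```python
def hr_per_batted_ball(hr, pa, bb, so):
    """Home runs per batted ball, where a batted ball is approximated as
    any plate appearance that ended in neither a walk nor a strikeout."""
    batted_balls = pa - bb - so
    if batted_balls <= 0:
        raise ValueError("need at least one plate appearance without a walk or strikeout")
    return hr / batted_balls

# Hypothetical league totals for one season (invented numbers).
rate = hr_per_batted_ball(hr=2000, pa=95000, bb=9000, so=11000)
print(f"HR per batted ball: {rate:.4f}")
```

Note that this shortcut counts anything that is not a walk or strikeout, such as a hit-by-pitch, as a batted ball, which is the price of keeping the data easy to find.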
On the other hand, when we plot our HR/Batted Ball data against all xGB%s, we find basically no correlation at all.
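A check like this can be run over all seasons or over any narrower window of years. Here is a rough sketch of how that might look; the row layout, the function names, and the sample seasons are assumptions invented for illustration, not the data behind the plot.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one series is constant")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

def correlation_in_window(rows, first_year, last_year):
    """Correlate HR per batted ball with xGB% for seasons in an inclusive
    year window. Each row is a (year, hr_per_batted_ball, xgb_pct) tuple."""
    window = [(h, g) for (year, h, g) in rows if first_year <= year <= last_year]
    if len(window) < 3:
        raise ValueError("need at least three seasons in the window")
    hr_rates, xgb_pcts = zip(*window)
    return pearson_r(hr_rates, xgb_pcts)

# Hypothetical league seasons (invented numbers, not real data).
sample_rows = [
    (1950, 0.024, 40.1), (1955, 0.027, 39.2), (1960, 0.025, 41.0),
    (1970, 0.024, 44.3), (1980, 0.020, 44.0), (1990, 0.022, 43.6),
    (2000, 0.032, 43.9), (2010, 0.026, 44.1),
]

print(f"all seasons: r = {correlation_in_window(sample_rows, 1950, 2010):.2f}")
print(f"1950-1960:   r = {correlation_in_window(sample_rows, 1950, 1960):.2f}")
```

With only a handful of seasons in a window, any coefficient this produces should be read loosely; a few points can line up by chance.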
Is it possible that an increase in HR per Batted Ball explains the early spike (1950-1964) but does little to account for changes after that?
Maybe, actually, yes:
If I’m concluding anything from this, it’s that I’m wary of making any statements about pitcher xGB%s prior to 1964. On the other hand, the GO/AO numbers since 1964 appear to be mostly stable, producing xGB%s neither greater nor less than present-day rates by more than 1%.