Breaking New Grounders: Ground Balls Since 1950
Recently, in these pages, I’ve been looking at the relationship between ground-out/air-out ratios (GO/AO) — available in slightly different iterations at MLB.com and Baseball Reference (B-R) — and the ground-ball percentages (GB%) we host here at FanGraphs. (See Part One and Part Two of this most exciting of explorations.)
We know that GB%s measure the percentage of all batted balls that were hit on the ground and that GO/AOs are a ratio only of ground-outs to air-outs (with the Retrosheet data from B-R appearing to include line-outs as part of the air-out data and MLB appearing to exclude line-outs entirely). Despite this difference, it appears as though the correlation between GB% and GO/AO, especially the iteration available at B-R, is rather strong. Plotting actual GB%s against “expected GB%s” (xGB%), that is, the expected GB% using only GO/AO as the input, gives us a correlation coefficient of .98 for qualified pitchers from 2002 to 2010. The root mean square error (RMSE) between the GB%s and xGB%s is a mere 1.4%.
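For anyone who wants to run the same kind of check at home, here is a minimal Python sketch of the two summary numbers mentioned above, the correlation coefficient and the RMSE. The function names and the sample values are made up for illustration (they are not the actual pitcher data), and the xGB% inputs are assumed to have been computed from GO/AO already.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def rmse(actual, expected):
    """Root mean square error between actual and expected values."""
    n = len(actual)
    return math.sqrt(sum((a - e) ** 2 for a, e in zip(actual, expected)) / n)

# Made-up pitcher seasons, in percentage points (NOT real data).
gb_pct = [41.0, 47.5, 52.3, 38.9]
xgb_pct = [42.1, 46.8, 51.0, 40.2]

print(round(pearson_r(gb_pct, xgb_pct), 3))
print(round(rmse(gb_pct, xgb_pct), 2))
```

Because both inputs are in percentage points, the RMSE comes out in percentage points too, which is how a figure like 1.4% should be read.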
Because the data at B-R goes back to 1950, what started off merely as an attempt to provide a reference for people who had access to GO/AO but not GB% has potentially become an opportunity to estimate with some accuracy the GB%s of pitchers going back to the middle of last century.
“OMG” is what you’re saying to yourself, I’m sure.
Because I’m not very smart, this work is unfolding somewhat slowly — and with no little help from statistical luminaries such as Tangotiger, Colin Wyers, and Eric Van.
What I’ve done since last time is apply our equation to the average MLB GO/AOs at B-R for every year going back to 1950.
Here’s what it looks like:
A couple points on that:
1. The average xGB% for all years combined since 1950 is 43.7% — i.e. very, very close to the 43.8% mark for the years (2002-10) for which we have actual ground-ball data.
2. The graph raises an immediate question: Why do the xGB%s rise so dramatically from 1950 to the mid-1960s?
Here are some guesses at an answer to the question in point No. 2:
1. The data is somehow bad. For whatever reason, more air-outs or fewer ground-outs appear in the data than actually exist, thus skewing GO/AO numbers and producing lower xGB%s.
2. Pitchers were actually inducing fewer grounders.
3. Pitchers were inducing the same number of grounders, but infielders were converting fewer of those grounders into outs.
4. Perhaps owing to larger parks or weaker batters or different hitting approaches or some other reason, a greater percentage of balls hit in the air were caught by fielders.
We can sort of test this last point by looking to see if there’s any relationship between an increase in home runs and a decrease in the amount of air-outs recorded relative to ground-outs.
First, here’s a graph of home runs per batted ball since 1950 (where “batted ball” is actually defined, for purposes of ease in finding the data, as any plate appearance that ended in neither a walk nor strikeout):
We see here a spike in that early part where we were getting lower-than-expected xGB%s.
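The batted-ball stand-in used for that graph is simple enough to write down. A minimal Python sketch follows, with a hypothetical function name and made-up league totals (not real data): batted balls are approximated as plate appearances that ended in neither a walk nor a strikeout.

```python
def hr_per_batted_ball(hr, pa, bb, so):
    """Home runs per 'batted ball', where batted balls are approximated as
    plate appearances that ended in neither a walk nor a strikeout."""
    batted_balls = pa - bb - so
    if batted_balls <= 0:
        raise ValueError("no batted balls in sample")
    return hr / batted_balls

# Made-up league totals for a single season (NOT real data).
rate = hr_per_batted_ball(hr=2000, pa=95000, bb=9000, so=11000)
print(f"{rate:.4f}")
```

Note that this proxy still counts hit-by-pitches and the like as batted balls, which is part of the "for purposes of ease" trade-off.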
On the other hand, when we plot our HR/Batted Ball data against all xGB%s, we find basically no correlation, at all.
Regard:
Is it possible that an increase in HR per Batted Ball explains the early spike (1950-1964) but does little to account for changes after that?
Maybe, actually, yes:
If I’m concluding anything from this, it’s that I’m wary of making any statements about pitcher xGB%s previous to 1964. On the other hand, the GO/AO numbers since 1964 appear to be mostly stable, producing xGB%s neither greater nor less than present-day rates by more than 1%.
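The era split discussed above (1950-1964 versus everything after) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names, the tuple layout, and every number in the sample are made up, and the 1964 cutoff is a parameter rather than anything sacred.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def split_correlation(seasons, cutoff_year=1964):
    """Correlate HR/batted ball with xGB% separately for seasons up to and
    including cutoff_year, and for seasons after it.

    seasons: list of (year, hr_per_batted_ball, xgb_pct) tuples.
    Returns (early_r, late_r).
    """
    early = [(h, g) for y, h, g in seasons if y <= cutoff_year]
    late = [(h, g) for y, h, g in seasons if y > cutoff_year]

    def corr(pairs):
        hs, gs = zip(*pairs)
        return pearson_r(hs, gs)

    return corr(early), corr(late)

# Made-up league seasons (NOT real data): (year, HR per batted ball, xGB%).
sample = [
    (1950, 0.030, 40.1), (1955, 0.034, 39.2), (1960, 0.031, 40.6),
    (1970, 0.029, 43.9), (1980, 0.026, 43.5), (1990, 0.028, 44.2),
]
early_r, late_r = split_correlation(sample)
print(round(early_r, 2), round(late_r, 2))
```

A strong negative value for the early group alongside a weak one for the later group would be consistent with the idea that home runs explain the early spike but little after it.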




Carson: what if you focus ONLY on the Dodger games in the pre-1966 time period? That data is apparently very solid, since we even have pitch data there. I’d like to see if there’s a difference.
I wasn’t aware of that. Where is it available?
Retrosheet. What I mean to ask is if you repeat your xGB% chart (your first one), but ONLY with the Dodger games. Do you still get the same pattern or does it look smoother?
A shrinking of ballparks from 1950 to 1964 would help explain the data neatly. Looking at the last chart, the next thing I’d want to do is calculate the aggregate yearly HR/Batted Ball in all the parks that were replaced or added* at any time in these years versus in all those which weren’t. Comparing the two year-by-year will tell you if parks were shrinking.
*Braves / County
Sportsman’s / Memorial
Shibe / Municipal
Griffith I / II / RFK
Ebbets / Coliseum / Dodger
Polo Grounds / Seal / Candlestick
+ Wrigley (LA) / Dodger
+ Metropolitan
+ Colt
+ Polo Grounds / Shea
Versus the other 10 franchises.
Oh, and thanks for the shout-out! This is fabulous work.
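The park comparison suggested above could be sketched roughly like this in Python. Everything here is an assumption for illustration: the row layout (year, park, HR, batted balls), the function name, and the sample park names and totals are made up, and pulling the real rows out of Retrosheet is left aside.

```python
from collections import defaultdict

def yearly_hr_rate_by_group(rows, changed_parks):
    """Aggregate HR per batted ball by year for two groups of parks.

    rows: iterable of (year, park, hr, batted_balls) tuples.
    changed_parks: set of park names that were replaced or added.
    Returns {year: {"changed": rate_or_None, "unchanged": rate_or_None}},
    where each rate is summed HR divided by summed batted balls.
    """
    totals = defaultdict(lambda: {"changed": [0, 0], "unchanged": [0, 0]})
    for year, park, hr, batted_balls in rows:
        group = "changed" if park in changed_parks else "unchanged"
        totals[year][group][0] += hr
        totals[year][group][1] += batted_balls

    rates = {}
    for year in sorted(totals):
        rates[year] = {
            name: (hr / bb if bb else None)
            for name, (hr, bb) in totals[year].items()
        }
    return rates

# Made-up rows with hypothetical park names (NOT real data).
rows = [
    (1950, "Park A", 60, 3000),
    (1950, "Park B", 90, 3000),
    (1950, "Park C", 50, 2500),
]
print(yearly_hr_rate_by_group(rows, changed_parks={"Park A", "Park B"}))
```

Summing totals before dividing, rather than averaging park rates, keeps a park with few batted balls from counting as much as a full-season park.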
I’m humbled, sir — by your praise specifically, and by my shortcomings in attempting to wrestle this work to the ground. (And also, apparently, my shortcomings in constructing a decent metaphor with regard to my shortcomings.)
As for the suggestion about ballparks, I’ll consider tackling it. But also: it looks hard. Would it be hard?
Whether it’s hard completely depends on the data set you’re working with, and / or how quick and comfortable you are processing Retrosheet data.
It’s actually pretty easy, I think, and I might tackle it myself after the Academy Awards. I have eight movies to see before then (the curse of multiple obsessions strikes again!).
The “early spike” that you date 1950-64 actually seems to peak a bit later. Bear in mind that after 1968, when batting averages were very low (in the AL the top average was .301, the second-best .290) changes were made, notably to the mound, to increase offense. In other words, if the peak is about ’68, there could be a reason why the curve would rise to that point, then drop a bit and stabilize.
While it looks like xGB% does peak around 1969, it was on a clear and steady rise from the mid-1950s on.
Considering the grouping over the past 8 years or so, my guess is that previous data simply wasn’t sufficient. Recent data points are grouped closer together than during any other time frame on the chart. More consistent data hints at (though doesn’t necessarily prove) more reliable data.
A thought-provoking read.
Is there any chance that integration plays a role in the style of baseball (rebirth of small-ball a la #42/Wills/Mays/Brock) or the impact of speed in the outfield in that period? My impression is that the expansion of the talent pool was weighted toward a huge influx of rangy outfielders. This coupled with the greater availability of hitters with plus speed might have fed into deliberate shifts in batting approaches or managerial strategy. The list of potential explanations for the difference in GO/AO over that span understates that it represents a rising trend, not just an era with a distinct mean. Option 4 mentions hitting approaches and the possibility that more flyballs were caught in ’50 than ’64. Perhaps the decreasing denominator in the ratio (AO) reflects that for a growing number of batters, it made sense to keep the ball on the ground.
Excellent work, thoroughly enjoyable and informative.
The rapid increase in the late 50s corresponds with a significant change in glove technology. Modern gloves with the edge-u-cated heel started replacing the old gloves that had been worn for generations in the late 1950s.
I’m not sure how this change would affect GB rates, but it has to have some impact.
I dunno if this actually made such a big difference, but this is an awesome insight.
If you look at the data, ’50-’58 is one era, ’64 and onward is another, and ’59-’63 is the transition between them. If the use of the new gloves began significantly in ’59 and their use was universal by ’64, that would fit.
It would make sense that improved glove technology would be much more important to infielders than outfielders, based on the frequency with which modern players get their glove on a ball but fail to make the play.
This excellent Rob Neyer article could help explain the increasing GB tendency. A major switch, as Neyer suggests happened in the ’70s, from curve to slider would result in a lot more ground balls, no? And it appears from the article that the pitch gained favor with pitchers over time.
http://sports.espn.go.com/mlb/columns/story?columnist=neyer_rob&id=1786104
I wonder if pitch type usage matters. I know the fastball and change have been around forever, but when were other pitches invented? Perhaps as more pitch types were invented, or perfected by some, ground balls became more likely.
I think part of what you’re seeing may just be a change in the data.
See the study “FIELDER ‘RANGE’ RATINGS” at
http://www.retrosheet.org/Research/Research.htm
and especially the graph entitled “Per Cent Frequency of Hit Descriptions”.
Since the B-R data comes from Retrosheet I believe, this should be applicable.
Looks like I’m late to the party, but is there a chance xGB% dating back to at least 1964 could be presented on the site, and possibly something like XxFip (xFip with xGB%)? That would be really awesome.