Converting GO/AO to GB% (Retrosheet Remix)

Last Friday, I submitted for the readership’s consideration a brief post on how one might convert the ground-out/air-out ratios (GO/AO) found at MLB.com to the ground-ball rates (GB%) found here at FanGraphs. Though, as the much esteemed Tangotiger noted, the work wasn’t entirely grounded (get it?) in logic, the effort satisfied my immediate concern: namely, to create a quick-reference table for translating MLB’s GO/AOs (which are, for example, sometimes included with press-box stat sheets) into the GB%s with which most saber-oriented readers will be more familiar.

Of course, MLB.com is not the only site that publishes GO/AO data. Retrosheet (via Baseball-Reference) has GO/AO ratios going back to 1950. If it so happened that Retrosheet’s GO/AO numbers correlated strongly with our GB%s here, then we might — and I’ll stress might — have a tool with which to look back at some 60 years’ worth of ground-balling data.

To test the correlation between Retrosheet’s GO/AO and GB%, I took every qualified pitcher season from 2002 to 2010 (i.e. the years for which we have GB% data).

Here are the results:

The reader will note that the correlation coefficient here is actually higher than the one from the single year of MLB data we looked at on Friday.

The reader will also note that Retrosheet’s GO/AOs are generally lower than MLB’s — averaging 1.18 over 2002-10, where the average GO/AO for qualified pitchers in 2010 per MLB was 1.45. This appears to be consistent with the fact that — among other factors — MLB omits line-outs from its data, thus decreasing the denominator.
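To see why leaving line-outs out of the denominator pushes the ratio up, here is a minimal sketch. The out counts are made up for illustration, and the function name and its `count_line_outs` flag are hypothetical; it also assumes the Retrosheet-style figure counts line-outs as air outs, which the comparison above implies but does not spell out.

```python
def go_ao(ground_outs, fly_outs, line_outs, count_line_outs):
    """Ground-out/air-out ratio, with line-outs optionally counted as air outs."""
    air_outs = fly_outs + (line_outs if count_line_outs else 0)
    return ground_outs / air_outs

# Hypothetical season totals, invented for illustration only.
ground_outs, fly_outs, line_outs = 240, 160, 40

with_line_outs = go_ao(ground_outs, fly_outs, line_outs, True)      # 240 / 200
without_line_outs = go_ao(ground_outs, fly_outs, line_outs, False)  # 240 / 160

print(round(with_line_outs, 2), round(without_line_outs, 2))
```

The same pitcher gets a higher GO/AO once line-outs are dropped, which is the direction of the gap between the two sources.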

Calculating the expected ground-ball percentages (xGB%) for all the pitcher seasons in the data set using the equation from the above graph, we find that the root mean square error (RMSE) for said xGB%s comes out to a mere 1.4%. That seems pretty good.
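For readers who want to repeat the exercise on their own data, here is a rough sketch of the two steps: fit a line of GB% on GO/AO, then compute the RMSE of the resulting xGB%s. It assumes a straight-line fit, which may not be the form of the equation in the graph, and the five data points are invented for illustration; they are not from the 2002-10 data set.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical (GO/AO, GB%) pairs, invented for illustration only.
go_ao = [0.80, 1.00, 1.20, 1.50, 2.00]
gb_pct = [38.0, 42.0, 45.0, 49.0, 56.0]

slope, intercept = fit_line(go_ao, gb_pct)
x_gb_pct = [intercept + slope * x for x in go_ao]  # expected GB% per season
print(round(rmse(gb_pct, x_gb_pct), 2))
```

The RMSE is in the same units as GB%, so it reads directly as the typical miss in percentage points.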

As with last time, here are the leaders by GB%:

And the laggards:

Finally, here’s the quick-reference table for the approximate equivalencies between Retrosheet’s GO/AO data and our GB%s:

As I note above, it’s possible that, owing to the strength of the correlation between the GO/AO and GB% data, we might be able to make some reasonably confident statements about ground-ballers prior to 2002. That will be an area of focus in my next post.

Carson Cistulli has recently started a new project called Paris Matches.

17 Responses to “Converting GO/AO to GB% (Retrosheet Remix)”

1. Carlos D. Corredor says:

Great post. This also comes in handy to evaluate the players in the winter leagues. Milb.com gives you GB/AO, although I’m thinking they probably calculate it the MLB way (1.45 average) so I guess the final table needs some adjustment.

2. mmoritz22 says:

So it seems like we can use GO/AO rates as a ‘luck indicator’, is that right? As in, maybe a pitcher has a .90 GO/AO rate, but instead of having the 40% GB rate that he should have, it might be 45%, due to some luck?

• Kind of.

Our GB% here expresses the percentage of all fair balls — outs or not — that were hit on the ground. Because GO/AO deals only with outs, it’s obviously going to represent some other variables, as well.

But, yes, one of those variables is luck. If a pitcher’s xGB% (GO/AO as the input) is 40%, but said pitcher ACTUALLY had a 45 GB%, there’s a CHANCE that the difference is due, in part, to bad luck (a higher than expected BABIP on ground balls, for example).

It could, of course, also be due to poor fielding. Or excellent outfielders (raising the denominator, the AO), or a lower-than-expected rate of homers per ball in air.

So, I’d be careful about attributing it all to bad luck. Ultimately, if you can, it’s ideal to look at the GB%s. But if you can’t, or if you’re looking at data from 1995, for example, this should act as a decent reference.
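A toy illustration of the point, with made-up numbers: hold the batted balls fixed, change only how often grounders become outs, and GO/AO moves while GB% stays put. The function names and out rates below are hypothetical, and the sketch ignores home runs and line drives entirely.

```python
def gb_pct(ground_balls, air_balls):
    """Share of batted balls hit on the ground, outs or not, as a percentage."""
    return 100.0 * ground_balls / (ground_balls + air_balls)

def go_ao(ground_balls, air_balls, ground_out_rate, air_out_rate):
    """Ground-out/air-out ratio implied by the given out rates."""
    return (ground_balls * ground_out_rate) / (air_balls * air_out_rate)

# Hypothetical batted-ball totals and out rates, invented for illustration.
ground_balls, air_balls = 270, 330

typical = go_ao(ground_balls, air_balls, ground_out_rate=0.76, air_out_rate=0.80)
unlucky = go_ao(ground_balls, air_balls, ground_out_rate=0.70, air_out_rate=0.80)

print(gb_pct(ground_balls, air_balls))        # identical in both scenarios
print(round(typical, 3), round(unlucky, 3))   # GO/AO drops when grounders find holes
```

A lower out rate on ground balls could come from luck or from a poor infield; the arithmetic can’t tell them apart, which is why the caution above applies.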

So are you saying that Derek Lowe and Brandon Webb were consistently lucky?

• Telo says:

He’s saying if GO/AO and GB% don’t jibe, then there is probably some luck one way or the other.

And I am wondering if he thinks that Brandon Webb and Derek Lowe can have consistent luck. I think they can’t.

• To clarify: the reason Lowe and Webb are on that list is because they’ve posted some of the better GB%s of the last nine years. Ground-ball rate becomes reliable after just 150 or so batters faced. In a typical season, Lowe and Webb have faced 900 or 1000 batters, probably. Also, they’re known for throwing excellent sinkers. So, no, they’re not getting lucky.

Luck might be a contributing factor if, for some reason, an expected rate of those ground balls weren’t becoming outs. But poor infield defense, excellent outfield defense, and lower-than-expected HR/BIA could all be other reasons.

• Barkey Walker says:

While luck may play a factor, there isn’t a huge residual here, so for FIP to be worth a darn vs ERA, that residual had better be strongly predicted by a defensive player performance measure of the team and NOT luck.

Which is to say, perhaps it is the Padres defense, not luck. But Carson Cistulli is not focusing on the luck vs defense part (for now?).

3. Barkey Walker says:

I always think this site is at its best when you follow up on great comments. Bravo! I look forward to the next installment.

4. tangotiger says:

• Cool. I also just submitted a question for your consideration at your blog — it concerns whether I should normalize GO/AO ratios from the past to modern-day figures, or just use the raw numbers.

5. kds says:

A pitcher is “ground ball lucky” if his GB% is much greater than is predicted (GB%+) from his GO/AO. Looking at the top table, Lowe may be consistently lucky, Webb is not. Of course, when we say “lucky” here we are just saying that GO/AO explains less of GB% than we would normally expect. But it may well be other things, such as defense, that are out of the pitcher’s control but could be measured, that we are calling luck here.

6. Nathaniel Dawson says:

Carson, could you also do this as GB/FB ratio? It should work with MLB data, as line drives are excluded. Would be nice to know how to convert to a groundball to flyball ratio as well as GB%.

7. Eric M. Van says:

Re the leaderboard: you can explain 90% of the difference between GB% and xGB% (r = .95, p = .0003) with the formula

GB% – xGB% = 3.4 + .05 * Team UZR – 1.6 * (Pitcher is Brandon Webb)

The p values on the coefficients are .0017 for UZR and .00025 for Webb; of the intercept, 2 * 10^-5. So, yeah, even with a sample size of 10, there’s little doubt that this is actually what’s going on.

I initially looked at INF vs. OF UZR, on the assumption that a disparity would affect GO / AO. They were both significant but the coefficients, surprisingly, were identical. I also looked at (Pitcher is Derek Lowe); that wasn’t significant.

All of these pitchers pitched for bad defensive teams; the ’08 Diamondbacks were least bad at -6.8, the ’03 Red Sox worst at -47.1.
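As a quick check on what the formula implies, here is a small sketch that plugs in the two team UZR figures quoted above. The function name is hypothetical, the result is assumed to be in GB% points, and treating the ’08 Diamondbacks season as a Webb season and the ’03 Red Sox season as a non-Webb season is an assumption made for the example.

```python
def predicted_gap(team_uzr, is_webb):
    """GB% minus xGB% implied by the fitted formula (coefficients from the comment)."""
    return 3.4 + 0.05 * team_uzr - 1.6 * (1 if is_webb else 0)

# Team UZR values quoted in the comment; the Webb flags are assumptions.
dbacks_2008 = predicted_gap(-6.8, is_webb=True)     # 3.4 - 0.34 - 1.6
red_sox_2003 = predicted_gap(-47.1, is_webb=False)  # 3.4 - 2.355

print(round(dbacks_2008, 2), round(red_sox_2003, 2))
```

Both gaps come out positive, so under the formula GB% sits above xGB% for either season.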

Conclusions in English:

– The formula underestimates the GB% of the most extreme GB pitchers, which is to say the most extreme GB pitchers have a lower GO/AO than expected.

– The better the overall team defense, the bigger the underestimation, which is to say that good defense (anywhere on the field) lowers the GO/AO of extreme GB pitchers.

– Brandon Webb consistently has a higher GO/AO than other pitchers with similar GB%.

– Tossing in team INF and OF UZR into the spreadsheet might be helpful.

It makes sense that an individual pitcher such as Webb might consistently have a higher GO/AO. Why GO/AO is a function of overall team defense for these pitchers, rather than the disparity between infield and outfield defense, is by no means obvious.

• Eric M. Van says:

I should of course have said that the *worse* the team defense, the smaller the underestimation. Since we have no examples of an extreme GB pitcher with a better than average defense, it may not follow that good defense increases the estimation error.

• Eric M. Van says:

OK, the team defense thing is obvious: the better (least bad) defenses are catching more line drives both in the infield and outfield, thus increasing the AO in the denominator. A team defense’s ability to catch line drives is going to very consistently warp the difference between GB% and GO / AO.

Note that this effect shows up profoundly in this sample size of just 10, while the expected effect of the disparity between INF and OF defense is completely undetectable.

• Eric M. Van says:

If anyone is unclear as to the relationship I’ve found here (hey, you visual thinkers!) … if you plot the error of the xGB% estimate on the X axis and team UZR on the Y, you get six points that fall in a pretty good line, and four other points which fall on an even better line, parallel to the first and beneath it. The second line is all Webb.

The correlation for the six non-Webb seasons is r = .81, p < .05. For the four Webb seasons it's r = .97, p < .03. The slopes are .050 and .043 respectively. That Webb forms a neater pattern is not surprising, since he's one guy in one park.