The Quest to Predict HR/FB Rate, Part 2

Yesterday, Mike Podhorzer shared part one of this epic journey and teased that today we would be incorporating batted ball angle to see if that helps our model. I am, in fact, going to do that today.

But first, I am going to take a quick step back and let you know my reasons for doing this. Next, I’ll introduce some data on batted ball angle, as noted above.

A few days ago, Bill Petti and I had an extended Twitter conversation about Matt Kemp and Andrew McCutchen that, by the end, centered on whether or not McCutchen’s HR/FB surge in 2012 (8.7% in 2010, 12.2% in 2011 and 19.4% in 2012) was likely to stick. The issue is that when a player suddenly hits more home runs than they did before, it is difficult to know what to expect from them the next year. The first step to answering this is finding the ingredients of a HR rate, which is what Mike and I have been doing so far. The second step is figuring out if we should expect changes in those ingredients to be permanent or temporary. This second step is what started me on my half of this quest and resulted in Mike and me finding each other on the path.

So now that you know why I am here, let’s see if batted ball angle helps.

For all the data today, I am using a slightly smaller set of data than Mike did yesterday. I wanted to look at the HR/FB rate in year X in light of the rate in year X-1, as well as distance and batted ball angle in year X. In order to do this, I needed to take the 1,742 hitter seasons Mike started with yesterday and eliminate all seasons where I don’t have the previous year’s data. This eliminates all 2007 seasons (since we don’t go back prior to that) and any player who didn’t play in the previous year (so, for example, since Brandon Moss only accrued 6 PA in 2011, his 2012 has to be thrown out, as well).
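For readers who want to try this kind of filtering themselves, here is a minimal sketch in Python. It is not the code used for this article. The `Season` record and the `pair_with_previous` helper are hypothetical names, and the input is assumed to already contain only qualifying hitter seasons. The McCutchen rates are the ones quoted above; the second hitter is made up purely to show a season being dropped.

```python
from typing import NamedTuple


class Season(NamedTuple):
    """One hitter season (hypothetical record layout)."""
    player: str
    year: int
    hr_fb: float  # HR/FB rate, in percent


def pair_with_previous(seasons):
    """Keep only seasons whose previous year is also in the data set.

    Returns a list of (current_season, previous_season) tuples.
    """
    by_key = {(s.player, s.year): s for s in seasons}
    pairs = []
    for s in seasons:
        prev = by_key.get((s.player, s.year - 1))
        if prev is not None:
            pairs.append((s, prev))
    return pairs


seasons = [
    Season("McCutchen", 2010, 8.7),
    Season("McCutchen", 2011, 12.2),
    Season("McCutchen", 2012, 19.4),
    # Made-up hitter with no 2011 season in the data set,
    # so the 2012 season has no previous year to pair with.
    Season("Hypothetical Hitter", 2010, 10.0),
    Season("Hypothetical Hitter", 2012, 15.0),
]

pairs = pair_with_previous(seasons)
for cur, prev in pairs:
    print(cur.player, cur.year, cur.hr_fb, "previous:", prev.hr_fb)
```

The first season for every player always drops out, which is the same reason all 2007 seasons disappear from the sample here.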

That leaves me with 1,095 hitter seasons for which I have batted ball information, that year’s HR/FB and the previous year’s HR/FB. With that, let’s look at some correlations to start:

Variable                              Correlation to HR/FB
Prev. Year HR/FB                      0.6453
Distance                              0.7117
Absolute Value of Batted Ball Angle   -0.0779
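As a rough illustration of how correlations like these can be computed, here is a minimal Python sketch of the Pearson correlation. The numbers in it are invented for the example and are not from the data set used here. The sign convention for angle (0 meaning straightaway center, negative and positive meaning the two sides of the field) is also an assumption.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented sample values, one entry per hitter season.
hr_fb = [8.0, 12.0, 19.0, 5.0, 15.0]            # HR/FB, percent
distance = [275.0, 285.0, 300.0, 265.0, 290.0]  # average fly ball distance, feet
angle = [-12.0, 4.0, -20.0, 9.0, 15.0]          # average batted ball angle, degrees

# Taking the absolute value measures how far from center the ball is hit,
# regardless of which side of the field it goes to.
abs_angle = [abs(a) for a in angle]

print("distance vs HR/FB:", pearson(distance, hr_fb))
print("|angle| vs HR/FB:", pearson(abs_angle, hr_fb))
```

Using the absolute value matters because a hitter who sprays balls to both sides would otherwise average out to an angle near zero.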

The first thing that jumped out at me was that the correlation here for year-to-year HR/FB is much lower than what Matt Klaassen found and Mike noted in the first part of this series. Matt, though, was working with a different data set (2002-2012) and I could come up with at least three reasons for the differences. First, the smaller data set I am using would allow any outliers or random variation in the data to have an outsized effect on the analysis, leading to a lower correlation. Second, Matt limited the data set to players who did not change teams, and I did not. This could obviously cause an issue (a guy moving from Colorado to San Diego would expect to see a big drop in HR/FB). Third, something could be fundamentally different from 2002-2007 vs. 2008-2012 that results in less stability in HR/FB. My guess is the third is not a major issue, and I think that for our purposes we can safely work around #1 and #2. What we are looking at here are relative differences in how the variables explain HR/FB in a given year. Park changes, etc. would impact all three correlations equally, so as long as we stick to relative analysis within this data set, we should be fine.

The next thing I noticed was that current season batted ball distance does a better job explaining current season HR/FB rates than the previous season’s HR/FB rate does. The difference is moderate, at best, but it is there.

And, finally, we get a pretty meaningless number for the correlation between batted ball angle and HR/FB rate. This surprised me, at first – HRs are easier to hit down the lines, so a pull-happy hitter should have a higher HR/FB rate. But the fact remains, a batter needs power and distance more before you can worry about angle. A 400 ft drive will make it out of more parks, regardless of angle, than a 300 ft. drive hit straight towards the pole. What we need to do is look at the impact batted ball angle has once you account for fly ball distance.

To do that, I ran four regressions:

Variables            R-Squared
Distance             0.5066
Angle                0.0052
Angle and Distance   0.5104
Prev Year HR/FB      0.4164

The results for distance and for angle shouldn’t surprise you. If the first row looks different than what Mike showed you yesterday, it’s because I am using the smaller data set of players for whom we have two years of data. The disappointing result here is that the R-Squared for the regression with both distance and angle is pretty much no different than the one with just distance. Batted ball angle hasn’t really helped us at all.
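To make the comparison between regressions concrete, here is a minimal Python sketch that computes R-squared for an ordinary least squares fit with an intercept, using one or several predictors. It is an illustration under stated assumptions, not the code behind the table: `solve` and `r_squared` are hypothetical helper names, and the sample numbers are invented.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def r_squared(predictors, y):
    """R-squared of an OLS fit of y on the predictor columns plus an intercept."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    k = len(rows[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Invented sample values, one entry per hitter season.
hr_fb = [8.0, 12.0, 19.0, 5.0, 15.0, 10.0, 7.0, 14.0]
distance = [275.0, 285.0, 300.0, 265.0, 290.0, 280.0, 272.0, 288.0]
abs_angle = [12.0, 4.0, 20.0, 9.0, 15.0, 6.0, 18.0, 11.0]

r2_distance = r_squared([distance], hr_fb)
r2_both = r_squared([distance, abs_angle], hr_fb)
print("distance only:", r2_distance)
print("distance and angle:", r2_both)
```

Two general facts about this kind of fit are worth keeping in mind when reading the table. With a single predictor, R-squared is the square of the correlation, which is why the 0.6453 correlation for previous year's HR/FB lines up with an R-squared of about 0.4164. And adding a predictor can never lower the in-sample R-squared, so a small bump from adding angle is expected even if angle carries little information.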

This does not mean, however, that batted ball angle isn’t useful here, as it may serve as a proxy for a change in approach (the way many people, myself included, have read the shift in approach and significant power improvement from Jose Bautista), but we will get into this deeper in a later part.

Before we go for the day, I want to bring in two more regression results:

Variables                              R-Squared
Angle, Distance and Prev. Year HR/FB   0.6027
Distance and Prev. Year HR/FB          0.5991

And now we see the largest R-Squared yet – the result of a regression including Distance, Angle, and Previous Year’s HR/FB rate as variables. Leaving out Angle gets us to almost the same R-Squared.

What does all of this suggest? Well, first it suggests that we still have a lot of work to do, and that more will be coming. Second, it suggests that looking at both distance and previous HR/FB rates will help you do a better job of explaining the current HR/FB rate than looking at either one of them alone.

Tomorrow, Mike will get us one step closer to a method for finding an expected HR/FB equation.

Chad Young is a product manager at Amazon by day and a baseball writer (RotoGraphs, Let's Go Tribe), sports fan and digital enthusiast at all times. Follow him on Twitter @chadyoung.

Daniel Schwartz

I am commenting just b/c you deserve credit as well ;)

Excellent insight