- FanGraphs Fantasy Baseball - http://www.fangraphs.com/fantasy -

The Quest to Predict HR/FB Rate, Part 4

**For those of you who have listened to the Fantasy Baseball Roundtable radio show in the past, I wanted to share that we are officially back on the air! And if you have never listened, well here’s your chance to hear my manly voice talking about nerdy stats. Listen live every Wednesday night at 9-10 PM EST.

The quest continues! On Monday, Chad Young and I set out on a journey to use the fly ball and home run distances and angles found on Baseball Heat Maps to answer several questions regarding a hitter’s HR/FB rate. As expected, we found a strong correlation between HR/FB rate and batted ball distance. But distance alone wasn’t telling us the whole story. Chad decided to incorporate batted ball angle and the previous season’s HR/FB rate, and that certainly improved our equation. Then yesterday I took that a step further and found that also including the HR/FB rate from two seasons ago was even better. But I wasn’t satisfied. A hitter’s HR/FB rate in 2012 should not be affected by how he performed in the metric in previous years. We pinpointed what we thought might be one of the major hindrances: the angle data was presented as a simple average, so balls hit to left and right field offset each other.

Well sure enough, like a superhero, Jeff Zimmerman came through. Not only did he provide me with batted ball angles that take the average of the absolute values of each angle (rather than differentiating between left and right field with one side a negative number and the other a positive), the data listed every single hitter instead of the truncated list of around 250 per season on the leader boards page of his site. Better data and more of it?! I was giddy.
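To see why the averaging method matters, here is a minimal sketch comparing the two approaches. The spray angles and the sign convention (negative for one side of center field, positive for the other) are made up for illustration and are not taken from Jeff's data.

```python
def mean_signed_angle(angles):
    """Plain average: angles to opposite fields cancel each other out."""
    return sum(angles) / len(angles)


def mean_absolute_angle(angles):
    """Average of absolute values: measures distance from center field."""
    return sum(abs(a) for a in angles) / len(angles)


# Hypothetical fly ball angles in degrees (sign convention is an assumption).
pull_hitter = [-30.0, -25.0, -28.0, -32.0]   # everything to one side
spray_hitter = [-30.0, 25.0, -28.0, 32.0]    # down both lines

for name, angles in [("pull", pull_hitter), ("spray", spray_hitter)]:
    print(name, mean_signed_angle(angles), mean_absolute_angle(angles))
```

Both hypothetical hitters put the ball equally far from center field on average, but the signed average makes the second one look like he hits everything up the middle. The absolute-value version treats them the same.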

I then spent hours in my mother’s basement playing around with the data and crunching the numbers. The good news? We are making progress folks! I ran all the same regressions and tests that I did in Part 3, but this time included the more useful angle data. I also experimented with squaring both the distance and angle due to the suggestions of many. I required a minimum of 30 fly balls and home runs and ended up using 1,926 player seasons from 2008-2012. Here are the results:

Model                                   Adjusted R-Squared
Distance                                0.5613
Distance & Distance^2                   0.5713
Distance & Angle                        0.5852
Distance, Distance^2, Angle, Angle^2    0.5980
Distance^2 & Angle^2                    0.5892
Distance^2 & Angle                      0.5900
Distance & Angle^2                      0.5839

The exciting part is that whereas we saw essentially no change in the R-Squared when adding the old angle data, we now see a meaningful increase compared to the first two models, which only include distance. Unfortunately, that increase was not as large as I had hoped. You might remember that in Part 3, the best model was Yr1 and Yr2 HR/FB ratio and Yr3 Distance, with an R-Squared of 0.5848. The best model here is barely ahead. This is perfectly fine. My mission was to find a model that only used current year data so I did not have to use HR/FB rates from previous seasons. The R-Squared is now just as good as that of the model that cheated.
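For anyone wondering why the table reports adjusted R-Squared rather than plain R-Squared: the adjustment penalizes a model for every extra predictor, which keeps the four-variable model from winning just because it has more terms. Here is a minimal sketch of the standard formula. The plain R-Squared of 0.60 is a hypothetical input, not a number from my regressions; the 1,926 is the sample size above.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Standard adjustment: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


n_seasons = 1926          # player seasons used in the regressions
plain_r2 = 0.60           # hypothetical plain R-Squared, for illustration only

# Same plain R-Squared, different numbers of predictors.
for p in (1, 2, 4):
    print(p, round(adjusted_r_squared(plain_r2, n_seasons, p), 4))
```

With nearly 2,000 player seasons the penalty is tiny, so the differences in the table reflect real differences in fit rather than the number of terms.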

The model that included distance, distance squared, angle, and angle squared proved to be best and it was confirmed through another residual sum of squares test. The equation is thus:

xHR/FB = (-0.00845 * distance) + (0.00002 * distance^2) + (0.02125 * angle) + (-0.00043 * angle^2) + 0.61064


The max distance from those 1,900-plus player seasons was 324 feet, while the max angle was 26.2 degrees. I was curious what HR/FB rate the above equation would spit out given those maximums. If a player averaged both of those numbers, the result from the equation would act as a cap for the highest xHR/FB it would ever produce. That answer is 27.4%, which is usually very close to where the MLB leader sits.
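If you want to plug numbers into the equation yourself, here is a minimal sketch. The function name is mine, and the coefficients are the rounded ones printed above. Because the distance^2 coefficient is shown to only one significant digit, results from this version can differ by a few percentage points from figures computed with the unrounded regression output, including the cap quoted above.

```python
def x_hr_fb(distance, angle):
    """Expected HR/FB from average distance (feet) and average absolute angle (degrees).

    Uses the rounded coefficients as published, so treat the output as approximate.
    """
    return (
        -0.00845 * distance
        + 0.00002 * distance ** 2
        + 0.02125 * angle
        - 0.00043 * angle ** 2
        + 0.61064
    )


# Evaluate at the sample maximums: 324 feet and 26.2 degrees.
cap = x_hr_fb(324.0, 26.2)
print(f"xHR/FB at the sample maximums: {cap:.1%}")
```

The small squared-distance coefficient gets multiplied by a number over 100,000, so even a tiny amount of rounding moves the result noticeably.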

However, as you can see in the chart, the equation still seems to underestimate the top tier of home run hitters. The equation produces an xHR/FB of 20%-25% for very few players, only 39 to be exact. Yet 114 hitters in the data set posted an actual HR/FB rate of at least 20%. The thought was that squaring the data would give an extra boost to the higher distances, but it doesn’t seem to have been enough. I have been playing around with raising the distance to various higher powers, such as the 10th and 15th, to get the boost we need.

So we are definitely inching closer, but we know there is at least one variable not accounted for yet: park factors. That would be impossible to incorporate though, so the search continues for the holy grail. A player list with xHR/FB rates, including an analysis of the hitters with the largest discrepancies, is coming... as soon as I feel we have exhausted all avenues and are left with an equation we are unable to improve any further (which might very well be the one just above!).