As a forecaster of baseball player performance for use in my fantasy leagues, I am always looking for new data to incorporate into my projections and new methods for predicting outcomes. Home runs are a result that excites everyone; even chicks dig ’em. But how can we better determine whether Chase Headley’s breakout, for example, was for real? The answer may lie in newly available data collected by our own researcher extraordinaire, Jeff Zimmerman.
Historically, my home run projections have been a function of the hitter’s contact rate, fly ball rate and home runs per fly ball rate (or HR/FB). The contact and fly ball rates are pretty easy to project. Recently, Matt Klaassen found these two metrics correlated year-to-year at 0.896 and 0.759, respectively, from 2002 to 2012. The HR/FB rate was less stable at 0.740, though that was actually pretty close to FB%. The goal is to identify a metric or group of metrics that is more stable than HR/FB rate and would therefore be a better predictor of it.
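To make that starting point concrete, here is a minimal sketch of how the three components could combine into a home run total. The multiplicative form, the function name `project_home_runs` and the sample inputs are all assumptions for illustration, not necessarily the exact formula behind the projections. It assumes contact rate is the share of at-bats that do not end in a strikeout and fly ball rate is the share of batted balls that are fly balls.

```python
def project_home_runs(at_bats, contact_rate, fly_ball_rate, hr_per_fb):
    """Hypothetical projection: chain the three rates onto an at-bat total.

    Assumes contact_rate = (AB - K) / AB, fly_ball_rate = FB / batted balls,
    and hr_per_fb = HR / FB. All rates are fractions between 0 and 1.
    """
    batted_balls = at_bats * contact_rate      # at-bats that put a ball in the air or on the ground
    fly_balls = batted_balls * fly_ball_rate   # portion of those that are fly balls
    return fly_balls * hr_per_fb               # portion of fly balls that leave the park


# Made-up inputs for illustration only, not any real hitter's line.
projected = project_home_runs(at_bats=550, contact_rate=0.80,
                              fly_ball_rate=0.40, hr_per_fb=0.15)
print(round(projected, 1))
```

Because the three rates multiply, an error in the least stable one (HR/FB) carries straight through to the home run total, which is why a better predictor of it matters.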
That’s where Zimmerman’s “Angle and Distance of a Hitter’s Batted Balls” tool comes in. This tool provides the average distance and angle of every type of batted ball hit by a batter for any date range. If you want to get even more granular, you can also specify the pitch type the batted ball was hit off of. Throughout last season, we referenced a hitter’s average home run plus fly ball distance a lot. Angle, however, has gotten limited virtual ink. What the angle tells us is where on the field the balls are hit, on average. From this, we can quickly determine whether a hitter is pull-happy or hits to all fields. While we have used this data a lot, there have been no exhaustive studies done to determine how useful the data actually is. Until now.
Chad Young and I have teamed up to analyze the data going back to 2007, for a full six seasons of numbers. The first thing I wanted to do was determine how average distance and HR/FB ratio correlated with each other. I included 1,742 hitter seasons from 2007 to 2012, which was everyone listed on the leader boards on Baseball Heat Maps. Here is a plot of the results.
The good news is that the two are indeed positively correlated, as we would expect. The p-value was ridiculously tiny, far below the 0.05 threshold that corresponds to 95% confidence, so the relationship is very unlikely to be the product of chance. The R-Squared is just barely lower than the 0.5476 mark for HR/FB ratio from Year 1 to Year 2 as found by Matt Klaassen. So we are definitely close and on the right track. The ultimate goal is to identify additional components that would increase that R-Squared to a level well above the year-to-year HR/FB R-Squared.
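For anyone wondering where 0.5476 comes from: with a single predictor, R-Squared is simply the correlation squared, so it is the 0.740 year-to-year HR/FB correlation cited earlier, multiplied by itself. A short sketch of that conversion, applied to the three year-to-year correlations quoted above (the dictionary and variable names are just illustrative):

```python
# Year-to-year correlations quoted earlier in the article (2002-2012).
year_to_year_r = {
    "contact rate": 0.896,
    "fly ball rate": 0.759,
    "HR/FB": 0.740,
}

# For a one-variable linear relationship, R-Squared equals r squared.
year_to_year_r_squared = {
    metric: r ** 2 for metric, r in year_to_year_r.items()
}

for metric, r2 in year_to_year_r_squared.items():
    print(f"{metric}: R-Squared = {r2:.4f}")
```

Squaring always shrinks a correlation below 1, so an R-Squared should only be compared with another R-Squared and never directly with a raw correlation.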
Right now, the best fit equation doesn’t work so well at the top end. If you take the highest average distance posted by any hitter last year and plug it into the equation in the graph, you will quickly realize we haven’t reached the finish line just yet. Matt Kemp’s 313.26 average distance led baseball (well, led those listed on the leader board, which required at least 46 total home runs plus fly balls to appear). If we plug Kemp’s distance into the equation, we get an expected HR/FB rate of 19.7%. While that is pretty close to Kemp’s actual mark of 21.7%, it also means that the highest expected HR/FB rate this formula would calculate doesn’t even reach 20%. Last season, 16 hitters posted a HR/FB rate of at least 20%, so clearly there is more that this formula isn’t capturing.
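If you want to run the same kind of fit on your own export of the data, a minimal sketch follows. It assumes a plain straight-line, least-squares fit, which may not match the exact form of the equation in the graph, and the five data points are made-up placeholders rather than real player seasons. The helper names `fit_line`, `r_squared` and `expected_hr_fb` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def expected_hr_fb(distance, slope, intercept):
    """Plug an average HR + FB distance (feet) into the fitted line."""
    return slope * distance + intercept


# Placeholder values for illustration only, NOT real hitter seasons.
distances = [260.0, 275.0, 285.0, 295.0, 310.0]   # average distance in feet
hr_fb_rates = [0.04, 0.08, 0.10, 0.14, 0.19]      # HR/FB as a fraction

slope, intercept = fit_line(distances, hr_fb_rates)
fit_quality = r_squared(distances, hr_fb_rates, slope, intercept)
top_end = expected_hr_fb(max(distances), slope, intercept)
print(f"slope={slope:.4f}, intercept={intercept:.3f}, R-Squared={fit_quality:.3f}")
print(f"expected HR/FB at the longest distance: {top_end:.3f}")
```

With a straight line and a positive slope, the largest expected HR/FB rate the fit can ever return is the one for the longest average distance in the sample, which is the ceiling problem described above.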
Earlier in the article, I mentioned that I wanted to identify components that were more stable year-to-year than HR/FB rate. Unfortunately, we found that average home run plus fly ball distance only had a year-to-year correlation of 0.61. While that’s pretty good, it’s not as high as the 0.740 for HR/FB rate itself. Of course, that doesn’t mean we should stop our analysis here. At the very least, average distance could simply be another piece of data to analyze when projecting HR/FB rate, the same way SwStr% is a metric not to be ignored when projecting a pitcher’s strikeout rate.
In future parts, we will be incorporating batted ball angle to see if that improves our R-Squared. It would make sense for it to play a role, no? The distances down the lines at ballparks are much shorter than in and around center field, so given equal distance, a pull hitter has a much better shot of seeing his balls clear the fence. We will also be looking at what a significant spike or drop in average distance means for future seasons. Last, we hope to identify what other variables to include in order to come up with a best fit equation good enough that we can give you some names of potential HR/FB surgers and decliners.
Oh, and for those wondering after my tease in the intro, Chase Headley’s average distances in 2012, 2011 and 2010 were 303, 282 and 283, respectively. His 2012 distance ranked 10th on the leader board. Maybe he’s not a fluke after all?