FanGraphs Fantasy Baseball - http://www.fangraphs.com/fantasy

The Definitive Home Run Tracker Study

In 2006, ESPN Home Run Tracker (then known as Hit Tracker Online) launched to provide us stat fiends with data about every home run hit in baseball. It wasn’t enough anymore to know that Ryan Howard led all of baseball with 58 long balls. No, we craved more information, and more information we got, in the form of distance, speed off the bat and a classification into a specific bucket depending on how far past the fence the ball flew. For years, I have been referencing a hitter’s percentage of “Just Enough” or “No Doubt” home runs, operating under the assumption that they have some predictive value. Although a small-scale study I did several years ago seemed to support this thesis, nothing exhaustive has been done since. Until now.

Process

I first collected, for every batter from 2006 through 2012, the number of home runs that fell into each bucket: “No Doubt”, “Plenty” and “Just Enough”. “Lucky” is another classification, but Lucky home runs are a subset of those three buckets rather than a category of their own. A home run of any type can be Lucky, which is defined as a “home run that would not have cleared the fence if it had been struck on a 70-degree, calm day.” I then calculated the percentage of each player’s total home runs that fell into each bucket (e.g., the percentage of Matt Kemp’s total home runs that were labeled No Doubt), including Lucky homers. Last, I added each hitter’s HR/FB rate, as well as his total fly balls (for proper weighting), to my massive spreadsheet. That left me with a population of 3,090 hitter seasons. The following table shows the averages for the entire population from 2006-2012.

HR/FB Rate | No Doubt % | Plenty % | Just Enough % | Lucky %
10.9% | 18.4% | 48.9% | 32.7% | 6.6%
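The per-season calculations described in the Process section can be sketched in Python. This is a minimal sketch, not the author’s spreadsheet: the `HitterSeason` record and its field names are hypothetical, and weighting the population’s bucket shares by home runs (while weighting HR/FB by fly balls, as the article describes) is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass

BUCKETS = ("no_doubt", "plenty", "just_enough")


@dataclass
class HitterSeason:
    # Hypothetical record; the study's actual spreadsheet columns are not published.
    name: str
    year: int
    no_doubt: int
    plenty: int
    just_enough: int
    lucky: int  # Lucky homers are a subset of the three buckets above
    fly_balls: int

    @property
    def home_runs(self) -> int:
        # Lucky homers are already counted in a bucket, so they are not added again.
        return self.no_doubt + self.plenty + self.just_enough

    @property
    def hr_fb(self) -> float:
        return self.home_runs / self.fly_balls

    def bucket_pct(self, bucket: str) -> float:
        """Share of this season's home runs in `bucket` (0.0 if no home runs)."""
        if self.home_runs == 0:
            return 0.0
        return getattr(self, bucket) / self.home_runs


def population_averages(seasons: list[HitterSeason]) -> dict[str, float]:
    """Aggregate rates: HR/FB weighted by fly balls, bucket shares by home runs."""
    total_fb = sum(s.fly_balls for s in seasons)
    total_hr = sum(s.home_runs for s in seasons)
    averages = {"hr_fb": total_hr / total_fb}
    for bucket in BUCKETS + ("lucky",):
        averages[bucket] = sum(getattr(s, bucket) for s in seasons) / total_hr
    return averages


# Two made-up seasons, for illustration only.
seasons = [
    HitterSeason("Hitter A", 2011, no_doubt=6, plenty=12, just_enough=2, lucky=1, fly_balls=150),
    HitterSeason("Hitter B", 2011, no_doubt=1, plenty=5, just_enough=6, lucky=2, fly_balls=120),
]
print(f"{seasons[1].bucket_pct('just_enough'):.1%}")  # 6 of 12 homers -> 50.0%
print(population_averages(seasons))
```

Because Lucky homers overlap the other buckets, the three bucket shares sum to 100% while the Lucky share sits on top of them, which is also how the averages in the table above behave.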

The league-average rates for each bucket will be helpful when evaluating an individual player’s distribution. Next, I narrowed the 3,090 hitter seasons down to only those hitters who played in consecutive seasons, in order to investigate what happens in year two. That left me with a population of 1,943 batters to work with. The relevant averages for this population are:

Yr 1 HR/FB Rate | Yr 2 HR/FB Rate | No Doubt % | Plenty % | Just Enough % | Lucky %
11.2% | 11.0% | 18.6% | 49.1% | 32.3% | 6.6%
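Narrowing the population to consecutive seasons amounts to pairing each season with the same hitter’s next one. A minimal sketch, assuming each season is a dict with hypothetical keys `player_id`, `year`, `hr` and `fb` (matching on an ID rather than a name is also an assumption). Under this sketch, a hitter with three straight seasons contributes two year 1/year 2 pairs.

```python
from __future__ import annotations


def consecutive_pairs(seasons: list[dict]) -> list[tuple[dict, dict]]:
    """Pair each season with the same hitter's following season, if one exists."""
    by_key = {(s["player_id"], s["year"]): s for s in seasons}
    pairs = []
    for s in sorted(seasons, key=lambda row: (row["player_id"], row["year"])):
        following = by_key.get((s["player_id"], s["year"] + 1))
        if following is not None:
            pairs.append((s, following))  # (year 1, year 2)
    return pairs


def aggregate_hr_fb(seasons: list[dict]) -> float:
    # Fly-ball-weighted HR/FB: total home runs over total fly balls.
    return sum(s["hr"] for s in seasons) / sum(s["fb"] for s in seasons)


# Made-up seasons, for illustration only.
seasons = [
    {"player_id": 1, "year": 2010, "hr": 20, "fb": 160},
    {"player_id": 1, "year": 2011, "hr": 18, "fb": 150},
    {"player_id": 1, "year": 2012, "hr": 25, "fb": 170},
    {"player_id": 2, "year": 2010, "hr": 12, "fb": 110},
    {"player_id": 2, "year": 2012, "hr": 9, "fb": 100},  # skipped 2011, so no pair
]
pairs = consecutive_pairs(seasons)
year1 = aggregate_hr_fb([p[0] for p in pairs])
year2 = aggregate_hr_fb([p[1] for p in pairs])
print(len(pairs), f"{year1:.1%}", f"{year2:.1%}")
```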

Finally, I had the data set to begin my quest. Since I wasn’t exactly sure how best to slice the data, I decided to go all out and test several different approaches. I started by using the entire population of hitters and taking the top and bottom 10% of each bucket. But that didn’t yield meaningful results, likely because many of the players hit so few homers that their percentages mean little. I then used only those hitters who hit at least 15 home runs in year 1, again with the 10% cutoff. Next, I looked at only those hitters who hit at least 10 homers in year 1, with the same 10% grouping. Last, I took that same data set (Yr 1 HR >= 10) but shrank the cohorts to the top and bottom 5%.
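Each of these methods boils down to a minimum year 1 home run filter followed by a percentile cut on one bucket’s share. A minimal sketch, assuming year 1/year 2 pairs of dicts with hypothetical keys (`hr` plus one count per bucket, such as `no_doubt`). The article doesn’t say how cohort sizes were rounded or how ties at the cutoff were broken, so flooring the count and keeping the sort order are assumptions.

```python
def top_bottom_cohorts(pairs, bucket, min_hr=0, fraction=0.10):
    """Split qualifying pairs into top and bottom cohorts by year-1 bucket share.

    pairs    -- list of (year1, year2) dicts; year1 needs "hr" and `bucket` keys
    min_hr   -- minimum year-1 home runs (10 or 15 in the article; 0 = everyone)
    fraction -- cohort size as a share of qualifying hitters (0.10 or 0.05)
    """
    # Hitters with no homers have undefined bucket shares, so always require one.
    eligible = [p for p in pairs if p[0]["hr"] >= max(min_hr, 1)]
    ranked = sorted(eligible, key=lambda p: p[0][bucket] / p[0]["hr"], reverse=True)
    n = int(len(ranked) * fraction)  # floor; the study's rounding is unknown
    if n == 0:
        return [], []
    return ranked[:n], ranked[-n:]


# Ten made-up hitters with 10 homers and 0-9 No Doubt homers, plus one with 5 homers.
pairs = [({"hr": 10, "no_doubt": k}, {}) for k in range(10)]
pairs.append(({"hr": 5, "no_doubt": 5}, {}))
top, bottom = top_bottom_cohorts(pairs, "no_doubt", min_hr=10, fraction=0.2)
print([p[0]["no_doubt"] for p in top], [p[0]["no_doubt"] for p in bottom])
```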

For each method, I compared the test group’s average HR/FB rate in year 1 to its average rate in year 2. But the key was to then compare that year 1 to year 2 change with the change experienced by a control group drawn from the same population. So, I arbitrarily settled on a control group of 200 hitters who, in aggregate, averaged the same year 1 HR/FB rate as the test group it was being compared to. It is not enough to know that the test group’s HR/FB rate declined from 12% to 11%, because the control group may have seen its HR/FB rate decline as well. So we must compare the two changes to learn what, if any, predictive value the various buckets have.
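The comparison described above is a difference in changes: (test Yr 2 - test Yr 1) - (control Yr 2 - control Yr 1). The sketch below assumes year 1/year 2 pairs of dicts with hypothetical `hr` and `fb` keys. The article doesn’t say how its 200-hitter control groups were drawn, so `nearest_control` is a hypothetical stand-in that takes the non-test hitters whose year 1 HR/FB is closest to the test group’s aggregate rate; in general it only approximately matches that rate.

```python
def aggregate_hr_fb(seasons):
    # Fly-ball-weighted HR/FB: total home runs over total fly balls.
    return sum(s["hr"] for s in seasons) / sum(s["fb"] for s in seasons)


def hr_fb_change(group):
    """Year 1 to year 2 change in aggregate HR/FB for (year1, year2) pairs."""
    return aggregate_hr_fb([p[1] for p in group]) - aggregate_hr_fb([p[0] for p in group])


def nearest_control(population, test_group, size=200):
    """Hypothetical control: the `size` non-test pairs closest in year-1 HR/FB."""
    target = aggregate_hr_fb([p[0] for p in test_group])
    test_ids = {id(p) for p in test_group}  # exclude test members by identity
    candidates = [p for p in population if id(p) not in test_ids]
    candidates.sort(key=lambda p: abs(p[0]["hr"] / p[0]["fb"] - target))
    return candidates[:size]


def difference_in_changes(test_group, control_group):
    # Positive: the test group held its HR/FB better than the control group did.
    return hr_fb_change(test_group) - hr_fb_change(control_group)


def pair(hr1, hr2, fb=100):
    # Made-up pair with equal fly balls in both years, for illustration only.
    return ({"hr": hr1, "fb": fb}, {"hr": hr2, "fb": fb})


test_grp = [pair(15, 14)]  # 15.0% -> 14.0%
population = test_grp + [pair(5, 6), pair(14, 12), pair(30, 25), pair(16, 13)]
control = nearest_control(population, test_grp, size=2)
print(f"{difference_in_changes(test_grp, control):+.1%}")  # +1.5%
```

Here the two nearest candidates happen to average exactly 15.0% in year 1; they fall to 12.5% while the test hitter only falls to 14.0%, so the test group comes out 1.5 points ahead.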

Keep in mind that all data sets include only those batters with consecutive seasons and use a 200-hitter control group. And apologies in advance for the presentation of the data, as it was difficult to determine how best to display it.

No Doubt Bucket Analysis

Let’s start with the No Doubt (ND) bucket. The hypothesis for this home run type is that a batter who hits a high percentage of ND home runs is more likely to maintain, or perhaps increase, his HR/FB rate. On the other hand, a hitter with a low percentage of ND home runs, or none at all, is at greater risk of a decline in HR/FB rate in year 2.

Pop Group | Test Group | Test N | Test Yr 1 HR/FB | Test Yr 2 HR/FB | Test Change | Control Yr 1 HR/FB | Control Yr 2 HR/FB | Control Change | Test Change – Control Change
Yr 1 HR >= 15 | Top 10% | 73 | 15.5% | 14.7% | -0.8% | 15.5% | 13.4% | -2.1% | 1.3%
Yr 1 HR >= 10 | Top 10% | 97 | 14.2% | 14.2% | 0.0% | 14.2% | 12.7% | -1.5% | 1.5%
Yr 1 HR >= 10 | Top 5% | 48 | 14.2% | 14.2% | 0.0% | 14.2% | 12.7% | -1.5% | 1.5%
Yr 1 HR >= 15 | Bottom 10% | 73 | 12.5% | 10.8% | -1.7% | 12.5% | 11.9% | -0.6% | -1.1%
Yr 1 HR >= 10 | Bottom 10% | 97 | 12.1% | 11.1% | -1.0% | 12.1% | 11.8% | -0.3% | -0.7%
Yr 1 HR >= 10 | Bottom 10%, 0 ND HRs | 99 | 9.2% | 9.2% | 0.0% | 9.2% | 10.0% | 0.8% | -0.8%
Yr 1 HR >= 10 | Bottom 5% | 48 | 13.2% | 11.6% | -1.6% | 13.2% | 12.5% | -0.7% | -0.9%
Yr 1 HR >= 10 | Bottom 5%, 0 ND HRs | 49 | 10.4% | 9.2% | -1.2% | 10.4% | 10.5% | 0.1% | -1.3%

First, we see that batters who hit a high percentage of ND home runs post a higher HR/FB rate than those who hit a low percentage of them, which makes sense. However, the most important column to focus on in this and the following tables is the last one, “Test Change – Control Change”. It is the test group’s year 1 to year 2 change in HR/FB rate minus the control group’s change. Given our hypothesis, we would expect the top ND groups (the first 3 rows) to show positive numbers in that column, while the bottom ND groups (the last 5 rows) should show negative numbers. And that is exactly what we find.
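Because every figure in the No Doubt table is an aggregate rate, the last column can be rechecked by hand: it equals (Test Yr 2 - Test Yr 1) - (Control Yr 2 - Control Yr 1). A small Python check, with the table’s rows typed in as percentage points, recomputes that column and tests each row against the hypothesis (positive for top ND groups, negative for bottom ones).

```python
import math

# (test group, is_top, test yr1, test yr2, control yr1, control yr2, reported)
ND_ROWS = [
    ("Yr 1 HR >= 15, Top 10%", True, 15.5, 14.7, 15.5, 13.4, 1.3),
    ("Yr 1 HR >= 10, Top 10%", True, 14.2, 14.2, 14.2, 12.7, 1.5),
    ("Yr 1 HR >= 10, Top 5%", True, 14.2, 14.2, 14.2, 12.7, 1.5),
    ("Yr 1 HR >= 15, Bottom 10%", False, 12.5, 10.8, 12.5, 11.9, -1.1),
    ("Yr 1 HR >= 10, Bottom 10%", False, 12.1, 11.1, 12.1, 11.8, -0.7),
    ("Yr 1 HR >= 10, Bottom 10%, 0 ND HRs", False, 9.2, 9.2, 9.2, 10.0, -0.8),
    ("Yr 1 HR >= 10, Bottom 5%", False, 13.2, 11.6, 13.2, 12.5, -0.9),
    ("Yr 1 HR >= 10, Bottom 5%, 0 ND HRs", False, 10.4, 9.2, 10.4, 10.5, -1.3),
]


def check_rows(rows):
    results = []
    for label, is_top, t1, t2, c1, c2, reported in rows:
        diff = (t2 - t1) - (c2 - c1)
        # Reported values are rounded to 0.1 percentage points.
        consistent = math.isclose(diff, reported, abs_tol=0.05)
        supports = diff > 0 if is_top else diff < 0
        results.append((label, round(diff, 1), consistent, supports))
    return results


for label, diff, consistent, supports in check_rows(ND_ROWS):
    print(f"{label:38s} {diff:+.1f}  recomputed OK: {consistent}  supports hypothesis: {supports}")
```

All eight rows recompute to the reported value and carry the sign the hypothesis predicts.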

Just Enough Bucket Analysis

Next up is the Just Enough (JE) bucket. Our hypothesis for this classification is that batters who hit a high percentage of JE homers are unlikely to be as lucky in year 2 and should see a HR/FB rate decline. On the other side, those hitters who post a low rate of JE homers should see better luck in year 2 and improve their HR/FB rates.

Pop Group | Test Group | Test N | Test Yr 1 HR/FB | Test Yr 2 HR/FB | Test Change | Control Yr 1 HR/FB | Control Yr 2 HR/FB | Control Change | Test Change – Control Change
Yr 1 HR >= 15 | Top 10% | 73 | 13.5% | 12.1% | -1.4% | 13.5% | 12.4% | -1.1% | -0.3%
Yr 1 HR >= 10 | Top 10% | 107 | 10.0% | 9.6% | -0.4% | 10.0% | 10.4% | 0.4% | -0.8%
Yr 1 HR >= 10 | Top 5% | 53 | 9.4% | 9.0% | -0.4% | 9.4% | 10.2% | 0.8% | -1.2%
Yr 1 HR >= 15 | Bottom 10% | 73 | 14.9% | 14.2% | -0.7% | 14.9% | 13.4% | -1.5% | 0.8%
Yr 1 HR >= 10 | Bottom 10% | 107 | 13.6% | 13.4% | -0.2% | 13.6% | 12.5% | -1.1% | 0.9%
Yr 1 HR >= 10 | Bottom 5% | 53 | 14.0% | 14.5% | 0.5% | 14.0% | 12.8% | -1.2% | 1.7%

To start, we see that the players who hit the highest percentage of JE homers post lower HR/FB rates than those with a low percentage of JE homers. Then comes the exciting part. If you peek over at the last column, you find that our hypothesis is supported once again. We would expect the top JE groups (the first 3 rows) to display negative numbers in that column, while the bottom JE groups (the last 3 rows) should display positive numbers, and they do. We also see once again that the closer to the top or bottom you get, the more pronounced the difference between the test and control groups, which is another welcome finding. In fact, the bottom 5% in JE percentage among those with 10 or more home runs actually increased their HR/FB rate, the only group in the No Doubt or Just Enough tables to do so.
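The claim that the effect grows toward the extremes can be checked directly against the Yr 1 HR >= 10 rows of the Just Enough table by comparing the 10% and 5% cohorts. The values below are typed in from that table; the dictionary layout is just for illustration.

```python
# Yr 1 HR >= 10 rows of the Just Enough table, in percentage points:
# (test yr1, test yr2, control yr1, control yr2)
JE_HR10 = {
    ("Top", "10%"): (10.0, 9.6, 10.0, 10.4),
    ("Top", "5%"): (9.4, 9.0, 9.4, 10.2),
    ("Bottom", "10%"): (13.6, 13.4, 13.6, 12.5),
    ("Bottom", "5%"): (14.0, 14.5, 14.0, 12.8),
}


def diff_in_changes(t1, t2, c1, c2):
    # Test change minus control change (the table's last column).
    return (t2 - t1) - (c2 - c1)


gaps = {key: diff_in_changes(*vals) for key, vals in JE_HR10.items()}
for side in ("Top", "Bottom"):
    wide, narrow = gaps[(side, "10%")], gaps[(side, "5%")]
    print(f"{side}: 10% cohort {wide:+.1f}, 5% cohort {narrow:+.1f}, "
          f"narrower is larger: {abs(narrow) > abs(wide)}")
```

On both sides the gap widens when the cohort shrinks from 10% to 5% (from -0.8 to -1.2 at the top, and from +0.9 to +1.7 at the bottom).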

Lucky Bucket Analysis

Last is the Lucky subcategory, which can include a home run from any of the three buckets. The sample of Lucky home runs each season is quite small, though, as the leaders in the category hit only 5 to 8. So, I was unsure how much I would find here. The hypothesis, of course, is the same as for the JE bucket.

Pop Group | Test Group | Test N | Test Yr 1 HR/FB | Test Yr 2 HR/FB | Test Change | Control Yr 1 HR/FB | Control Yr 2 HR/FB | Control Change | Test Change – Control Change
Yr 1 HR >= 15 | Top 10% | 55 | 13.7% | 12.5% | -1.2% | 13.7% | 12.5% | -1.2% | 0.0%
Yr 1 HR >= 10 | Top 10% | 107 | 11.2% | 11.0% | -0.2% | 11.2% | 10.7% | -0.5% | 0.3%
Yr 1 HR >= 10 | Top 5% | 53 | 11.4% | 11.7% | 0.3% | 11.4% | 11.1% | -0.3% | 0.6%
Yr 1 HR >= 15 | Bottom 10% | 55 | 17.2% | 15.7% | -1.5% | 17.2% | 14.7% | -2.5% | 1.0%
Yr 1 HR >= 10 | Bottom 10% | 107 | 15.1% | 13.5% | -1.6% | 15.1% | 13.2% | -1.9% | 0.3%
Yr 1 HR >= 10 | 0 Lucky HRs | 183 | 13.4% | 12.5% | -0.9% | 13.4% | 12.3% | -1.1% | 0.2%
Yr 1 HR >= 10 | Bottom 5%, 0 Lucky HRs | 53 | 15.8% | 14.6% | -1.2% | 15.8% | 13.7% | -2.1% | 0.9%

So we see that the high Lucky percentage hitters have lower HR/FB rates than the low Lucky percentage guys. We would expect the top Lucky groups (the first 3 rows) to display negative numbers in the last column, while the bottom Lucky groups (the last 4 rows) should display positive numbers. Unfortunately, every number in that column is positive except for the first row, which shows no difference at all. As I warned above, there doesn’t appear to be any meaningful pattern here. At best, a hitter’s Lucky percentage is descriptive: it should line up with his year 1 HR/FB rate rather than predict his year 2 HR/FB rate.

Conclusion

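Phew! We learned a lot today, so feel free to take a snack break and then return and comment on this post. We can sum up our conclusions as follows:

- Hitters with a high percentage of No Doubt home runs held their HR/FB rates better than their control groups in year 2, while hitters with a low percentage of them, or none at all, saw larger declines.
- Hitters with a high percentage of Just Enough home runs saw their HR/FB rates fall relative to their control groups, while those with a low percentage fared better, and the effect grew as the cohorts narrowed toward the extremes.
- Lucky home run percentage showed no meaningful predictive pattern.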

Of course, now you are going to want names. Be patient; they will be arriving on your computer screens in due time.