## The Quest to Predict HR/FB Rate, Part 1

As a forecaster of baseball player performance for use in my fantasy leagues, I am always looking for new data to incorporate into my projections and new methods for predicting outcomes. Home runs are a result that excites everyone; even chicks dig ’em. But how can we better determine whether Chase Headley’s breakout, for example, was for real? The answer may lie in newly available data collected by our own research extraordinaire, Jeff Zimmerman.

Historically, my home run projections have been a function of the hitter’s contact rate, fly ball rate and home runs per fly ball rate (or HR/FB). The contact and fly ball rates are pretty easy to project. Recently, Matt Klaassen found these two metrics correlated year-to-year at 0.896 and 0.759, respectively, from 2002-2012. The HR/FB rate was less stable at 0.740, but that was actually pretty close to FB%. The goal is to identify a metric or group of metrics that are more stable than HR/FB rate and would therefore be a better predictor of it.

That’s where Zimmerman’s “Angle and Distance of a Hitter’s Batted Balls” tool comes in. The tool provides the average distance and angle of every type of batted ball hit by a batter over any date range. To get even more granular, you can specify the pitch type the batted ball was hit off of. Throughout last season, we referenced a hitter’s average home run plus fly ball distance a lot. Angle, however, has gotten little virtual ink. The angle tells us where on the field the balls are hit, on average. From it, we can quickly determine whether a hitter is pull-happy or hits to all fields. While we have used this data a lot, no exhaustive study has been done to determine how useful it actually is. Until now.

Chad Young and I have teamed up to analyze the data going back to 2007, a full six seasons of numbers. The first thing I wanted to do was determine how average distance and HR/FB rate correlated with each other. I included 1,742 hitter seasons from 2007 to 2012, which was everyone listed on the leader boards at Baseball Heat Maps. Here is a plot of the results.

The good news is that the two are indeed positively correlated, as we would expect. The p-value was ridiculously tiny, far below the 0.05 cutoff for significance at the 95% confidence level, so the relationship is very unlikely to be due to chance. The R-Squared is just barely lower than the 0.5476 mark for HR/FB rate from Year 1 to Year 2 found by Matt Klaassen (the square of his 0.740 correlation). So we are definitely close and on the right track. The ultimate goal is to identify additional components that would increase that R-Squared to a level well above the year-to-year HR/FB R-Squared.

Right now, the best-fit equation doesn’t work so well at the top end. If you take the hitter who had the highest average distance last year and plug it into the equation in the graph, you will quickly realize we haven’t reached the finish line just yet. Matt Kemp’s 313.26-foot average distance led baseball (well, led those listed on the leader board, which required at least 46 total home runs plus fly balls to appear). Plugging Kemp’s distance into the equation gives an expected HR/FB rate of 19.7%. While that is pretty close to Kemp’s actual mark of 21.7%, it means the highest expected HR/FB rate this formula produces for any hitter on last year’s leader board doesn’t even reach 20%. Last season, 16 hitters posted a HR/FB rate of at least 20%, so clearly there is more that this formula isn’t capturing.

Earlier in the article, I mentioned that I wanted to identify components that were more stable year-to-year than HR/FB rate. Unfortunately, we found that average home run plus fly ball distance only had a year-to-year correlation of 0.61. While that’s pretty good, it’s not as high as HR/FB rate itself. Of course, that doesn’t mean we should stop our analysis here. At the very least, average distance could simply be another piece of data to analyze when projecting HR/FB rate, the same way SwStk% is a metric not to be ignored when projecting a pitcher’s strikeout rate.

In future parts, we will be incorporating batted ball angle to see if that improves our R-Squared. It would make sense for it to play a role, no? The distances down the lines at ballparks are much shorter than in and around center field, so given equal distance, a pull hitter has a much better shot of seeing his balls clear the fence. We will also be looking at what a significant spike or drop in average distance means for future seasons. Last, we hope to identify which other variables to include to arrive at a best-fit equation good enough to give you some names of potential HR/FB surgers and decliners.

Oh, and for those wondering after my tease in the intro, Chase Headley’s average distance in 2012, 2011, 2010 order: 303, 282, 283. His 2012 distance ranked 10th on the leader board. Maybe he’s not a fluke after all?


*Projecting X 2.0: How to Forecast Baseball Player Performance*, which teaches you how to project players yourself. His projections helped him win the inaugural 2013 Tout Wars mixed draft league. He also sells beautiful photos through his online gallery, Pod's Pics. Follow Mike on Twitter @MikePodhorzer and contact him via email.

Good stuff. This type of research is usually about trying things that don’t work until you find one that does.

I incorporated a similar concept into my projections last year. My system basically just used the number of “just enough” home runs (from Hit Tracker) compared to actual home runs. It turned out to be a bad predictor of future home runs.

I actually planned to take my second look at JE home runs within the next week or two. Email me, let’s discuss (email in signature above).

the secret sauce may just be park factor yes/no?

Parks definitely play a role, I bet. Park factors themselves are so flawed that they aren’t going to improve things very much, though. Trying to judge a park for all types of hitters by using one number is just never going to cut it.

Very true. Higher fences, wind patterns, different seating arrangements and signage year to year….perhaps this is why they play the games….

I also wonder if some players spike in HR because they see more FB in a given season for whatever reason.

Yes, park factors will def play a role. Feels like it will be very difficult to incorporate them into some formula though.

Park factors and ISO?

Park factors, yes, but ISO is the result, and home runs are part of it. So you can’t use that to predict HR/FB.

Hey Mike,

How do you think opposite field home run hitters can be evaluated?

I am specifically asking because of Mike Moustakas. He is one of the FB% leaders, and his HR/FB seems abnormally low. I felt that he would be a guy making a good jump in HRs in 2013. Now, if you look at his HRs and fly balls, he obviously goes the other way a lot (which a billion television analysts say is the mark of a “good hitter”). Most of the HR leaders appear to be pull hitters, though. I was burned by Jesus Montero, who had a similar reputation, and I am not sure now how to proceed with Moustakas for next year.

Have you heard anything about opposite field hitters and power?

I’m not sure where you are seeing that he goes opposite field a lot. Both his angle data and his home run chart on Hit Tracker tell us that he’s a pull hitter.

However, it’s a good question about opposite field hitters. Jeff Zimmerman said that those hits would fall into the line drive bucket and so would be excluded from the data we’re analyzing.

Thanks Mike. My mistake, forgot he bats left-handed and only throws right-handed. Haha

Well at least I brought up a point for discussion.

Typo: “Recently, Matt Klaassen found these two metrics correlated year-to-year at 0.896 and 0.759, respectively, from 2002-2012. The HR/FB rate was less stable at 0.759, but that was actually pretty close to FB%.”

Notice that it says .759 twice. This is because the second one should read .740

Woah, this is weird. Last night I made a couple of edits, including what you noted and nothing went through. Crap. I even removed a sentence and it’s still showing!

There we go. My edits last night were autosaved, maybe I forgot to hit the update button. Everything fixed now.

> Oh, and for those wondering after my tease in the intro, Chase Headley’s average distance in 2012, 2011, 2010 order: 303, 282, 283. His 2012 distance ranked 10th on the leader board. Maybe he’s not a fluke after all?

What I took away from those numbers was that 2011 gave no indication of 2012 performance. Maybe it was a fluke?

Ahhh, good observation. We are trying to answer two different questions:

1) Given the distance data and any other components we identify, is Player X’s HR/FB rate legit? Basically, we’re looking for an expected home run formula. This is the question under which Headley would look like he wasn’t a fluke.

2) Given the previous year’s distance data and any other components we identify, what should we project for Player X’s HR/FB rate in the following year? Chad is working on this question, trying to determine how sustainable distance spikes are. This could help answer what we should expect from Headley this season.

I guess it depends on how you define fluke. Was it a lucky performance like the wind sent 10 balls over the wall that wouldn’t have gone over any other day? Or do you define it as simply a performance that you don’t expect to occur again, even if he didn’t benefit from any kind of good fortune?

But wouldn’t the most helpful thing be not to determine if his spike is sustainable, but to have predicted the spike itself? In other words, isn’t this more an effort to determine what a player’s HR/FB rate ‘should’ have been, and less what it is going to be going forward?

Not that determining sustainability isn’t helpful, it certainly is.

Yes, what the hitter’s HR/FB rate “should have been” is that expected HR/FB rate metric we are trying to come up with. But that’s going to be largely based on the average distance data and whatever else we find. I honestly don’t think it’s possible to project beforehand that Headley was going to see his distance spike. All we could do is try to figure out if the distance spike is repeatable and whether it justifies the HR/FB rate spike.

Why wouldn’t you try a quadratic or exponential best-fit equation?

Here’s my (obvious) point: If your avg. distance as a hitter was 25 ft., then there would be NO CHANCE = 0% that any FB could turn into a HR.

But if your avg. distance was 500 ft., then maybe 90% of your fly balls would turn into HRs, approaching a 100% HR/FB rate with increasing distance.

Sorry, not a 0% chance with an avg. distance of 25 ft., but rather a probability approaching zero as the avg. distance decreases.

I think I know what you mean by this. I’m going to give that a shot and see what happens.

Great stuff, Mike. I look forward to future articles on this topic. Hey, do you happen to know which is more sustainable, a breakout in FB% or HR/FB, and to what degree? It seems like you’ve studied this sort of thing.

Thanks. I have not studied it. It’s an interesting question though. Since FB% is pretty stable year to year, a significant change could signal a change in approach/hitting mechanics/batting stance. And that usually sticks. But of course, sometimes it’s just random and we see the player return to his historical rate the following year.

I’m sorry I really can’t answer your question, but it might be something worth looking into next. I know Chad is looking into a breakout in average distance and how sustainable that is, which is exciting.

Hi Mike, kind of a follow-up to this question. I often hear people evaluating prospects say stuff like “they’ll be a 30 HR hitter as they grow into their power,” or “those minor league doubles will turn into major league HRs.” Assuming these guys aren’t just full of it, do you take these comments to mean that as players mature they will develop a better HR approach (higher FB%) or that they will just figure out how to hit the ball harder so it goes over the fence (HR/FB increase)?

I’d be interested to see which side you tend to fall on, b/c I think trying to figure out where the growth is more likely to occur will help us fantasy players to pick the next big breakout stars. I mean if I’m looking for the next big thing should I be more interested in Moustakas who hits 40%+ FBs but under 10% HR/FB, or David Freese who hits under 30% FB but had a 20% HR/FB? In looking at recent examples of players who took a step forward in the power department, I see guys on both sides. Billy Butler last year increased his HRs by 50% while actually decreasing his already low FB%, as compared to Jose Bautista who figured out how to double his HR/FB rate a few years ago.

Any thoughts?

Excellent question. I like the Moustakas side and do believe there is a lot of merit to those quotes you included. We see all the time minor league hitters who appear to be 10 HR guys that suddenly experience MLB power surges and those stick. It’s more difficult to increase your FB% I think.

For Butler, I expect to see a HR/FB regression but a FB% rebound. It all leads to my projected homer total of 24, which is between last year’s total and his previous seasons. Freese will never end up on my team. I can’t believe he can continue to maintain such a high HR/FB rate, and given that low FB%, he’s a great risk.

Good stuff, Mike, looking forward to seeing whether angle plays a significant role. Is there any way of incorporating changes in approach, such as location of pitches swung at, particularly those that went for home runs/long distances? I’m thinking of Headley in particular, in that I remember someone commenting during a game (after he had hit such a pitch for a home run) that he was hitting pitches low in or even below the strike zone for long distances, and a quick glance seems to indicate that there was a change in his swing percentage for that location.

I don’t think that really matters. It would show up in longer distances if the approach was a positive one.

I would suspect a non-linear relationship between HR/FB and FB distance. Here’s why. For any given hitter, his FB distance distribution will be roughly normal (i.e., Gaussian), perhaps with a non-Gaussian tail on the low side. Looking at things very very simply, suppose we say that a HR is any fly ball exceeding, say, 380 ft. You can choose your own distance, it really doesn’t matter for my argument. You will see that that threshold falls on the falling tail on the high side of the distribution. Therefore, a small shift in the mean of the distribution could have a very large and highly non-linear effect on the fraction of FB exceeding the threshold (i.e., HR/FB). I wrote an article about this a few years ago for SABR’s Baseball Research Journal, summarizing earlier work by Roger Tobin on the effect of steroids on HR production. For a dramatic example of what I am talking about, see Fig. 1 of that article:

http://webusers.npl.illinois.edu/~a-nathan/pob//BRJ-Steroids-v3.pdf

Thanks Alan, sounds like you’re suggesting what another commenter earlier suggested: trying some quadratic/exponential equation where we square the components?

Mike…something like that, although I’m not sure exactly what. For the recent ProGuestus article I wrote, I had access to some TrackMan data from StL for the 2009 and 2010 seasons. For about 1700 fly balls hit with a vertical launch angle larger than 10 deg, the distribution of distance was pretty close to Gaussian, with a mean of 290 ft and standard deviation of 65 ft. You might be able to do something with that distribution, keeping in mind that all batters are included (which must increase the standard deviation considerably). Of course, if you had access to much more data of this type, you could look at the distributions for individual batters.
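Nathan’s threshold argument can be checked numerically. The sketch treats fly ball distance as normally distributed and, purely as a simplification, counts any fly ball beyond a fixed 380 feet as a home run (his example threshold). The 290-foot mean and 65-foot standard deviation come from the TrackMan sample he describes, which mixes all batters, so an individual hitter’s spread would likely be narrower; the function name is hypothetical.

```python
from statistics import NormalDist  # Python 3.8+


def share_beyond(mean_ft, sd_ft, threshold_ft=380.0):
    """Fraction of a normal distance distribution that clears a fixed threshold."""
    return 1.0 - NormalDist(mean_ft, sd_ft).cdf(threshold_ft)


# Shift the mean in 10-foot steps and print how much each step adds to the tail.
previous = None
for mean in (290, 300, 310, 320):
    share = share_beyond(mean, 65)
    gain = "" if previous is None else f" (+{share - previous:.3f})"
    print(f"mean {mean} ft: {share:.3f} of fly balls beyond 380 ft{gain}")
    previous = share
```

As long as the mean sits below the threshold, each equal step in average distance adds a larger slice of the tail than the one before, which is the non-linear behavior Nathan describes.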

Another suggestion: divide distance into convenient bins, then get average HR/FB for each bin. You might be able to discern more easily the non-linear behavior and fit to a more realistic function (quadratic?).
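The binning suggestion is easy to sketch: group hitter seasons into fixed-width distance bins, average HR/FB within each, and look at how the bin averages change from one bin to the next. The 10-foot bin width and the function name are assumptions for illustration; the bins Chad has been using aren’t described here.

```python
from collections import defaultdict


def hr_fb_by_bin(avg_distance, hr_fb, bin_width=10):
    """Mean HR/FB for hitter seasons grouped into fixed-width distance bins.

    Returns a dict {bin_start_ft: mean HR/FB}, ordered by bin.
    """
    groups = defaultdict(list)
    for distance, rate in zip(avg_distance, hr_fb):
        bin_start = int(distance // bin_width) * bin_width  # e.g. 287.4 -> 280
        groups[bin_start].append(rate)
    return {start: sum(rates) / len(rates) for start, rates in sorted(groups.items())}
```

This weights every hitter season equally; dividing total home runs by total fly balls within each bin would instead weight hitters by how many fly balls they hit.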

The bin idea I like. Chad has been using bins on what he has been working on. I will take a look as well.