## The Quest to Predict HR/FB Rate, Part 3

On Monday, I laid the groundwork for a quest to predict a hitter’s HR/FB ratio using the new data, first made available last year, that tells us the average distance of a hitter’s batted balls. Our goal was to answer several questions:

1) Given a hitter’s average distance and any other factors we identify, what should that hitter’s HR/FB rate have been in Year X? In other words, this would be a backward-looking xHR/FB rate.

2) Given a hitter’s average distance and any other factors we identify, what should we project that hitter’s HR/FB rate to be in Year X + 1? In other words, this would be a forward-looking projected HR/FB rate.

While trying to answer question 2, we are also attempting to determine:

3) If a hitter experiences a significant change in average distance in Year X, how much of it sticks in Year X + 1?

To begin, we are trying to answer the first question and come up with an equation to calculate an xHR/FB rate. Once we are happy with our findings, we will turn our attention to the second and third questions to project HR/FB rate in the next year.

In Part 1, I confirmed that average distance does indeed have a strong correlation with HR/FB rate and that we should therefore use it as a variable to determine both what a hitter’s HR/FB rate should have been and what we should project for the next season. Yesterday, Chad Young took that finding one step further and concluded that an equation that includes the previous season’s HR/FB rate was even better than distance alone. This bothers me because it suggests that something besides average distance, perhaps several things, affects a hitter’s HR/FB rate.

We originally thought that batted ball angle was the answer. Given the same distance, a ball hit down the line is significantly more likely to land in the seats than a ball hit to center field. However, I realized that the way batted ball angle is presented as an average is problematic. Consider a hitter who hits half his fly balls and home runs down the left field line and the other half down the right field line. The angle statistic would be 0.0 for this hitter (one side of the field is considered negative, the other positive), suggesting he hits balls to dead center on average. This, of course, is not the case in the example and might explain why our attempts to incorporate the data into our regressions have barely moved the R-squared needle. So for now, we are going to ignore batted ball angle.
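To make the averaging problem concrete, here is a minimal Python sketch using made-up spray angles for the hypothetical hitter described above (0 = dead center, one foul line negative and the other positive). It also shows one simple alternative, averaging the absolute value of each angle, which keeps how far from center the balls went but discards which side; that alternative is only an illustration, not something we have tested in the regressions.

```python
from statistics import mean

# Hypothetical spray angles in degrees: 0 = dead center, one foul line is
# negative and the other positive. Half of this made-up hitter's fly balls
# go down one line and half down the other.
angles = [-40.0, -42.0, -38.0, 40.0, 42.0, 38.0]

# The signed average cancels out and suggests "dead center".
avg_angle = mean(angles)

# Averaging absolute values keeps the distance from center field,
# at the cost of losing which side of the field the ball went to.
avg_abs_angle = mean(abs(a) for a in angles)

print(f"average angle:          {avg_angle:.1f}")
print(f"average absolute angle: {avg_abs_angle:.1f}")
```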

In Part 2, Chad mentioned reducing the player list I used in Part 1 to only those players who had also played the previous season. That led him to a model that included average distance in Year X and HR/FB rate in Year X – 1. I went a step further and reduced the data set to only those players who played in any three consecutive seasons, which left me with n = 697. I decided to test five different potential models. For ease of understanding, Year 3 is the season I am calculating an xHR/FB rate for. Here are the results:

| Variable(s) | R-Squared |
|---|---|
| Yr1 & Yr2 HR/FB, Yr3 Distance | 0.5848 |
| Yr2 HR/FB, Yr3 Distance | 0.5698 |
| Yr3 Distance | 0.4830 |
| Yr1 & Yr2 HR/FB | 0.4277 |
| Yr2 HR/FB | 0.3784 |

I also tested the models using Yr3 Distance squared as some have suggested, but that screwed with the p-values, so I left them out. You will notice that the R-squared for “Yr2 HR/FB, Yr3 Distance” is a bit lower than what Chad found yesterday. This is due to the different data set I am using.

I then ran all five models through a Residual Sum of Squares (RSS) test and the results came in the same order as the above. Thus, the best model at this point appears to be using HR/FB rates from the previous two seasons along with the average distance for the most recent season you are calculating the xHR/FB rate for.
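For readers who want to try this kind of model comparison themselves, the sketch below fits the five candidate models by ordinary least squares with NumPy and reports R-squared and RSS for each. The data are synthetic (randomly generated, including a hypothetical "mystery" factor that distance doesn't capture), not the real 697-player sample, so the numbers will not match the table above. It also makes one thing visible: when every model is fit to the same seasons, R-squared equals 1 - RSS/TSS with the same total sum of squares, so if the RSS comparison is a straight comparison on the same sample, ranking by RSS necessarily gives the same order as ranking by R-squared.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit of y on the columns of X plus an intercept.

    Returns (coefficients, intercept, r_squared, rss).
    """
    X1 = np.column_stack([X, np.ones(len(y))])      # add an intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    rss = float(resid @ resid)                       # residual sum of squares
    tss = float(((y - y.mean()) ** 2).sum())         # total sum of squares
    return beta[:-1], float(beta[-1]), 1.0 - rss / tss, rss

# Synthetic stand-in data, NOT the real sample. HR/FB as decimals, distance in feet.
# Assumes distance skill is stable across all three seasons.
rng = np.random.default_rng(42)
n = 697
dist3 = rng.normal(285.0, 15.0, n)           # Yr3 average fly ball/HR distance
mystery = rng.normal(0.0, 0.02, n)           # stable factor distance misses (angle? park?)
true_rate = 0.002 * dist3 + mystery - 0.47
hrfb1 = true_rate + rng.normal(0.0, 0.03, n)  # noisy season-by-season observations
hrfb2 = true_rate + rng.normal(0.0, 0.03, n)
hrfb3 = true_rate + rng.normal(0.0, 0.03, n)  # the season we want an xHR/FB for

models = {
    "Yr1 & Yr2 HR/FB, Yr3 Distance": np.column_stack([hrfb1, hrfb2, dist3]),
    "Yr2 HR/FB, Yr3 Distance": np.column_stack([hrfb2, dist3]),
    "Yr3 Distance": dist3[:, None],
    "Yr1 & Yr2 HR/FB": np.column_stack([hrfb1, hrfb2]),
    "Yr2 HR/FB": hrfb2[:, None],
}

results = {name: fit_ols(X, hrfb3) for name, X in models.items()}

# Sorting by RSS (ascending) gives the same order as sorting by R-squared (descending).
for name, (coefs, intercept, r2, rss) in sorted(results.items(), key=lambda kv: kv[1][3]):
    print(f"{name:32s} R^2={r2:.4f}  RSS={rss:.4f}")
```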

The equation is:

**xHR/FB = (0.165864 * Yr1 HR/FB) + (0.263489 * Yr2 HR/FB) + (0.002081 * Yr3 Distance) – 0.528386**
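Here is the equation written as a small Python function, a sketch for plugging in your own numbers. It assumes HR/FB rates are entered as decimals (0.30 for 30%) and distance in feet; that reading is inferred from the Ryan Howard example below, which this function reproduces. The function and parameter names are my own labels.

```python
def xhr_fb(yr1_hr_fb: float, yr2_hr_fb: float, yr3_distance_ft: float) -> float:
    """Expected Year 3 HR/FB from the three-variable regression.

    HR/FB inputs are decimals (0.30 = 30%); distance is the Year 3 average
    distance of fly balls and home runs, in feet.
    """
    return (0.165864 * yr1_hr_fb
            + 0.263489 * yr2_hr_fb
            + 0.002081 * yr3_distance_ft
            - 0.528386)

# Howard-style inputs: 320 ft average distance, 30% HR/FB in both prior seasons.
print(f"{xhr_fb(0.30, 0.30, 320):.1%}")  # 26.6%
```

Since the coefficients were fit on the three-consecutive-season sample, estimates for inputs far outside that range (for example, distances well beyond 320 feet) are extrapolations.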

Is it cheating to use HR/FB ratios from previous years? Yes. I am not happy about it. The xHR/FB rate for Year 3 really should have nothing to do with how a hitter has performed in previous seasons. But as you may recall from Part 1, distance alone was simply not telling the whole story. Whether it’s a combination of the elusive angle data and park factors or something else at work, we have yet to figure it out. So at the moment, HR/FB rates from previous seasons are essentially acting as a proxy for these other mystery variables.

I was also unhappy with the highest xHR/FB that the distance-only model introduced in Part 1 spit out. Its max for 2012 was 19.7%, for Matt Kemp. This new formula arrives at a 20.6% estimate, which is closer to his actual 21.7% mark. In addition, the new equation does a better job of estimating other hitters on the high end. Ryan Howard used to consistently average 320 feet and post 30% HR/FB rates. Under the old distance-only equation, Howard’s xHR/FB rate would be just 21.8%. No one averages more than 320 feet, so that model essentially places a hard cap of nearly 22% on xHR/FB. Plugging that 320-foot average distance, along with 30% HR/FB rates in the previous two seasons, into the new equation above, we arrive at a more reasonable 26.6% xHR/FB rate.

I still believe that batted ball angle is going to prove important. We are currently working on getting the data into a more useful format to incorporate into our regressions. Logic suggests that whether a fly ball goes for a home run depends on a) how far it travels, b) where in the park it is hit and c) what park it was hit in. Taking park dimensions into account would be far too difficult and time-consuming, so I have to think that batted ball angle is the holy grail. That said, I’m all ears for your thoughts on what other variables could be included.


*Projecting X: How to Forecast Baseball Player Performance*, which teaches you how to project players yourself. His projections helped him win the inaugural 2013 Tout Wars mixed draft league. He also sells beautiful photos through his online gallery, Pod's Pics. Follow Mike on Twitter @MikePodhorzer and contact him via email.

Maybe exclude bunts and groundball distance instead of using distance of all batted balls?

Only fly balls and home runs are included.

It could add a lot of work, but if you divided the outfield into areas by batted ball angle, such as just left, center and right field, ran the regression for each area and summed the results, you could have something.

However you divide the field (thirds would be easiest), you could also get more specific and figure out the average point where most parks become ‘longer’ or clearly CF territory.

Figuring out both the LF and RF angles in their respective buckets vs. CF would be helpful, I would think, as CF shots are the least likely to go out of the park. A hitter’s tendency to hit to LF/RF (however you define those above) might correlate well.
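A minimal Python sketch of the bucketing idea. It assumes the signed angle convention from the article with negative angles toward left field (the article doesn't say which side is negative) and fair territory split into equal thirds at ±15 degrees; the boundaries, function names and sample batted balls are all hypothetical.

```python
from collections import Counter

def field_zone(angle_deg: float) -> str:
    """Bucket a signed spray angle into thirds of fair territory.

    Hypothetical convention: 0 = dead center, -45/+45 = the foul lines,
    negative = left field.
    """
    if angle_deg < -15.0:
        return "LF"
    if angle_deg > 15.0:
        return "RF"
    return "CF"

def hr_fb_by_zone(batted_balls):
    """batted_balls: iterable of (angle_deg, is_home_run), fly balls and HRs only.

    Returns {zone: HR / (fly balls + HRs hit to that zone)}.
    """
    fly_balls = Counter()
    homers = Counter()
    for angle, is_hr in batted_balls:
        zone = field_zone(angle)
        fly_balls[zone] += 1
        homers[zone] += int(is_hr)
    return {zone: homers[zone] / fly_balls[zone] for zone in fly_balls}

# Made-up batted balls for one hitter.
sample = [(-40.0, True), (-35.0, False), (0.0, False),
          (5.0, False), (10.0, True), (30.0, False)]
print(hr_fb_by_zone(sample))
```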

I’m completely on board with this “batted ball angle” aspect.

You’re all way smarter than I am, but if positives/negatives netting to 0 is the issue, can’t we just take the absolute value of the angle?

Interesting idea, and I think an improvement on simply using the raw angle. However, I don’t think it will get at the whole story either. Anecdotally, it seems the vast majority of hitters hit balls much harder down the line on the pull side than when they go the other way.

That sounds like something else that can be determined and weighted appropriately. If you take a player’s handedness, you can determine pull/opposite (not sure what to do for switch-hitters), and we can measure how much harder (on an individual level) they pull. You can then adjust their pull/opposite angles appropriately based on those weights, right?
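A minimal Python sketch of the handedness adjustment. It assumes negative angles point toward left field (not confirmed by the article) and that the data record which side the hitter batted from on each batted ball, which is one possible way to handle switch-hitters; the function name and sample angles are hypothetical. Weighting by how much harder each hitter pulls would be a further step not shown here.

```python
from statistics import mean

def pull_angle(spray_angle_deg: float, bat_side: str) -> float:
    """Re-sign a spray angle so positive always means toward the pull side.

    Assumes negative raw angles point toward left field. A right-handed batter
    pulls to left field, so his angles are flipped; a left-handed batter pulls
    to right field, so his are kept as-is. bat_side is the side the hitter
    batted from on that ball ("R" or "L"), which also covers switch-hitters.
    """
    if bat_side == "R":
        return -spray_angle_deg
    if bat_side == "L":
        return spray_angle_deg
    raise ValueError(f"unknown bat_side: {bat_side!r}")

# Hypothetical right-handed pull hitter: two balls to left field, one to right.
balls = [(-40.0, "R"), (-38.0, "R"), (20.0, "R")]
print(mean(pull_angle(a, side) for a, side in balls))
```

With the re-signed angles, a pull-heavy hitter's average no longer cancels toward zero the way the raw signed average can.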

Referring to it as pull/push would be so much easier and more fun.

Makes me think back to the “Opposite Field HR Increases as Indicator of Breakout” study from my Baseball Forecaster a few years ago.

I was thinking something similar as I was reading the article.

A question you might want to ask is whether pulling the ball has any correlation to HRs. I’m not sure if you can do this, but maybe you could look at the angle for left-handed batters and the angle for right-handed batters separately. If my hypothesis is correct, fly balls that are pulled will result in more home runs than other fly balls.

Yup, this is the answer. I’m currently working with Jeff on getting this data, and I’m crossing my fingers that the new data, if we get it, provides what we need.

Awesome – can’t wait to see the results!

If you’re going to consider previous year’s HR/FB, would it not make sense to also look at ISO? Maybe ISO – 3*HR/AB to remove the HR component?
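The arithmetic behind that suggestion, as a Python sketch: ISO is (2B + 2·3B + 3·HR)/AB, so subtracting 3·HR/AB leaves only the doubles and triples component. The function names are my own and the stat line in the example is made up.

```python
def iso(doubles: int, triples: int, homers: int, at_bats: int) -> float:
    """Isolated power: (2B + 2*3B + 3*HR) / AB, i.e. SLG minus AVG."""
    return (doubles + 2 * triples + 3 * homers) / at_bats

def iso_without_hr(doubles: int, triples: int, homers: int, at_bats: int) -> float:
    """ISO - 3*HR/AB: the extra-base power left after removing home runs."""
    return iso(doubles, triples, homers, at_bats) - 3 * homers / at_bats

# Made-up stat line: 30 2B, 2 3B, 25 HR in 550 AB.
print(round(iso(30, 2, 25, 550), 3), round(iso_without_hr(30, 2, 25, 550), 3))
```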

Hmm, not sure. I really don’t want to finalize an equation that includes anything relating to past years. But if it must be done, I will look into ISO. Thanks.

I also think that batted ball angle in the vertical plane could be looked at. For instance, a fly ball hit in the 30-degree plane has a higher probability of being a home run than a ball hit in the 40-degree plane (most likely a pop-up). Also, I don’t know if you can get your hands on batted ball speed data, but I think that could be huge too. If a guy like Mike Stanton hits a 330-foot laser beam for a home run, that doesn’t have a big effect on his average distance, but it will show up in the speed of the ball off the bat. Those three things together (distance, angle in the vertical plane and batted ball speed) could, I think, produce a really nice R-squared. Thoughts?

Let me clarify: balls hit in the 30-degree plane will have a higher probability than balls hit in the 70- or 80-degree plane. The 40-degree plane is most likely still a good HR angle.

Jeff’s site lets you filter by batted ball type, and I’m only working with fly balls and home runs, so that should take care of the different angles at which a ball is hit. It was a good thought, though. But does it really matter how fast the ball travels? I would think speed would be an input in predicting distance, but we already have the distance. I actually thought of speed off the bat as well and then made that realization.

I think you may be underestimating the importance of speed off the bat. Is every HR considered a fly ball, or can HRs also be categorized as line drives? Because if every HR is a fly ball, speed and vertical angle will definitely play a role here. A line shot that travels only 320 feet but with tremendous speed will not have a large effect on distance. I’m sure a lot of HRs in MLB are line shots, but they may not be categorized as such in the data; they may be counted as fly balls. Do you know the answer to that?

Home runs are a separate category on Baseball Heat Maps. So I simply chose the fly balls bucket, as well as the home runs bucket. I am assuming the fly ball and line drive buckets do not include home runs, since long balls are a separate filter.

I think you’re missing his point, or I’m missing something. You are using fly balls and HRs in the data, but a HR hit 380 feet has different possibilities.

For a given launch speed, a projectile reaches its maximum distance when launched at a 45-degree angle.

Hitters who hit at 45 degrees more often will have a better HR/FB. At less-than-ideal angles, you can still get HR distance if the ball comes off your bat faster than it does off someone else’s.

I don’t think you can ignore the very basics of what makes a ball go farther. Angle off the bat will have a large amount of luck associated with it, but it must be a skill that stabilizes given enough time. Speed off the bat should be more stable and can be factored into this.

If a player’s HR distance goes down in one year but his bat speed is the same, either he’s lost the ability to hit the ball at the right angle or he had an unlucky year (where many non-luck factors play a role).

I guess figuring out how much of hitting a ball at 45 degrees is a repeatable, predictable skill would be important.
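The textbook result behind the 45-degree claim, written out as a short derivation. It assumes a projectile in a vacuum, launched and landing at the same height, with launch speed v, angle θ above horizontal and gravitational acceleration g; real batted balls feel drag and spin, which push the best launch angle below 45 degrees, so treat this as the idealized case only. It does support the point about speed: at any fixed angle, range grows with the square of launch speed.

$$
R(\theta) = \frac{v^{2}\sin(2\theta)}{g}
\quad\Rightarrow\quad
R_{\max} = \frac{v^{2}}{g} \text{ at } 2\theta = 90^{\circ},\ \theta = 45^{\circ},
\qquad
\frac{R(\theta;\,1.1v)}{R(\theta;\,v)} = 1.1^{2} = 1.21
$$

So, in this idealized model, 10% more speed off the bat means 21% more distance at the same angle.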

But doesn’t already knowing the distance make the speed off bat and ball angle irrelevant at that point? What I’m saying is, if you want to predict the distance the ball would travel, you would use the angle and speed off bat as variables. However, we already know the distance. So it seems to me that those extra factors are moot.

Am I still missing something?

I guess the point is that the distance the ball goes can be misleading, because it depends on other factors. The distance can be considered luck if the guy doesn’t have improved or high-end ball speed off the bat; maybe he just squared some balls up that year. Or, on the other side, a batter with superior bat speed sees his HR distance go down significantly in a year. The vertical angle of the ball off the bat can be like BABIP: it’s got some luck factored into it, but it is also a skill to a degree.

Anyway, you’re looking at the result, ball distance, when that’s just another stat that can misrepresent the truth, like batting average.

If you want to predict HR/FB, don’t you want to know how fast the player is hitting the ball and how frequently he can square it up at the right flight angle, so you can predict how likely he is to keep hitting the ball out at his current rate? It seems to me that if you want to predict future likelihood, you don’t base it on results so much as on weighting what allowed the player to get those results. The horizontal angle at which the ball is hit is an equally important factor in this. If a player can get better ball speed off the bat by pulling it down the line, or can square it up on the meat of the bat more frequently, then his HR/FB would be better. Distance alone seems like just a way to look back at a season retrospectively and say whether he over- or underachieved.

Of course, if horizontal and vertical angles aren’t stats that stabilize for individual players, then they wouldn’t be very useful.

Ahhh, I see what you’re saying. The problem is that the data isn’t available. This whole quest really started because of the distance and angle data on Jeff’s site and using that. So we have to assume that there is no luck involved in the distance itself, which obviously there is going to be.

If we truly wanted to project HR/FB rate based on our choice of any variables, then yes, speed off the bat, trajectory, angle, etc. would all have to be included.

I think you used batted ball angle the wrong way. Simply making an equation to find the average place where the ball is hit won’t work, mostly for the reasons you identified. I think a better way to do this would be to A) make two different charts for righties and lefties so you can differentiate who’s pulling the ball, and therefore likely getting more power on the swing, and B) assign a value to each zone where the ball would be hit over the fence. In other words, without considering whether the batter is a righty or lefty, hitting the ball down either line would be considered roughly the same thing. You would do this for each possible zone, assigning the likely percentage of balls hit in that direction that leave the park, and turn those numbers into an equation.
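One way to turn that zone idea into an equation, sketched in Python: weight each zone’s league-wide home run rate by the share of a player’s fly balls that land in that zone. It assumes angles have already been converted to pull/center/opposite using handedness; the zone names, league rates and player counts below are invented for illustration, and real zone rates would have to be estimated from data we don’t yet have.

```python
def zone_weighted_xhr_fb(player_zone_counts: dict, league_zone_hr_rate: dict) -> float:
    """Expected HR/FB = sum over zones of
    (player's share of fly balls in the zone) * (league HR rate for that zone)."""
    total = sum(player_zone_counts.values())
    return sum(
        count / total * league_zone_hr_rate[zone]
        for zone, count in player_zone_counts.items()
    )

league_rates = {"pull": 0.15, "center": 0.05, "opposite": 0.07}  # invented rates
player_counts = {"pull": 60, "center": 30, "opposite": 10}        # invented counts
print(f"{zone_weighted_xhr_fb(player_counts, league_rates):.1%}")  # 11.2%
```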

Yeah, I definitely agree that the angle data we have now is holding us back. The hope is that we can get something that could greatly help.

Forgive me if this was mentioned throughout your series of articles but have you thought to look at xHR/FB for pitchers as well?

No, I have not. Unfortunately, the batted ball leaderboards on Baseball Heat Maps that we used for hitters are not available for pitchers. I’ll have to bug Jeff to try to get those published.

To include batted ball angle in your formula, may I suggest breaking each player’s data pool into three categories: LF, CF and RF. I would expect an individual player’s HR/FB% to differ for each “field.” The data collected from this can be used in conjunction with the overall percentage of balls a player hits to each “field.”

Just my thoughts after having read the three articles on this subject thus far.

P.S. What about FB trajectories? I think this is a metric that can potentially be very indicative of HR/FB trends one way or the other.

As someone else mentioned above, the distance should already take this into account. If 2 hitters have the same 360 foot distance, but one is a line drive and the other a high fly ball, does knowing the latter information matter? The distance already tells us that the ball landed 360 feet away, so the trajectory should be irrelevant. Am I thinking about this correctly, or am I wrong?

I’m thinking more of trends in trajectory than of distance. For example, if a hitter who hits most of his FBs at, say, a 50º angle or higher suddenly clicks and sees his average FB trajectory come back down to 35º or so, I would expect a substantial jump in his HR/FB rate. This would tell us that he isn’t getting under the ball quite as often (which leads to more easy fly balls). I think trajectory data could be very helpful in predicting an increase or decrease in HR/FB rate as well as in other metrics such as BABIP.

Aren’t the two factors in how far a ball travels (distance) just velocity off the bat, which is determined by strength and bat speed, and the angle (up and down) at which the ball is hit? I agree that the side-to-side angle will help a little in getting really specific and accurate with the predicted HR/FB rate, but shouldn’t we be more focused on the main two factors? Is there a way to measure velocity off the bat and quality of contact (balls hit closest to a 45 degree angle)? Just taking a step back and thinking about it, I feel like these are the factors that determine whether the ball leaves the yard, is a deep fly to the warning track, or is popped a mile into the air to the shortstop…

Yes, if we also wanted to project distance. But we already have distance. So what I’m trying to do is estimate what HR/FB should have been based on that distance and angle. Speed off bat is available on ESPN Hit Tracker, but unfortunately there’s no full list of every player, you would have to manually go to every player’s page to collect it. The trajectory data I’m not aware of being available anywhere. Probably a HitF/X thing.