Predicting Home Runs Per Fly Ball, The Next Step

A year ago, I discovered how highly correlated a hitter’s average home run and fly ball distance is with his HR/FB rate. Chad Young and I then embarked on a quest to use an assortment of data, including this batted ball distance, to construct an expected HR/FB rate, or xHR/FB, metric. Unfortunately, we failed to find an equation much better than the one that used distance alone, which had an R-squared of just 0.54. While this was an excellent start, it simply wasn’t good enough to use in place of plain old HR/FB rate.

Thanks to Jeff Zimmerman, whose Baseball Heat Maps site inspired this quest in the first place, I have been provided with a wealth of additional data. The hope was that it included another piece, or set of pieces, of the HR/FB rate puzzle.

I began with a population of 4,985 hitter seasons from 2008-2013, which included pitchers’ trips to the plate. To keep the randomness of small samples from skewing the results, I removed all player seasons with fewer than 20 total home runs and fly balls. This left me with a pool of 2,645 player seasons ready for analysis.
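The sample filter described above can be sketched in a few lines of Python. This is only an illustration, not the code used for the study: the `HitterSeason` record, its field names, and the example rows are all hypothetical, and it assumes home runs and fly balls are stored as separate counts that are summed against the 20-ball threshold.

```python
from dataclasses import dataclass

# Threshold taken from the text: seasons with fewer than 20 total
# home runs and fly balls are dropped.
MIN_HR_PLUS_FB = 20


@dataclass
class HitterSeason:
    """Hypothetical record for one player season (field names are assumptions)."""
    player: str
    year: int
    home_runs: int
    fly_balls: int  # assumed here to be counted separately from home runs


def filter_seasons(seasons, minimum=MIN_HR_PLUS_FB):
    """Keep only seasons with at least `minimum` home runs plus fly balls."""
    return [s for s in seasons if s.home_runs + s.fly_balls >= minimum]


# Made-up rows purely to exercise the filter; these are not real data.
sample = [
    HitterSeason("Hitter A", 2013, 30, 120),  # 150 total, kept
    HitterSeason("Pitcher B", 2012, 0, 4),    # 4 total, dropped
    HitterSeason("Hitter C", 2010, 2, 18),    # exactly 20, kept
    HitterSeason("Hitter D", 2009, 1, 18),    # 19 total, dropped
]

kept = filter_seasons(sample)
kept_names = [s.player for s in kept]
```

Applied to the full data set, a filter like this is what would take the 4,985 player seasons down to the 2,645 used in the analysis.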

Let us begin with a correlation table:

Mike Podhorzer produces player projections using his own forecasting system and is the author of the eBook Projecting X: How to Forecast Baseball Player Performance, which teaches you how to project players yourself. His projections helped him win the inaugural 2013 Tout Wars mixed draft league. He also sells beautiful photos through his online gallery, Pod's Pics. Follow Mike on Twitter @MikePodhorzer and contact him via email.

7 Responses to “Predicting Home Runs Per Fly Ball, The Next Step”

  1. Elias says:

    Wouldn’t you also want to know how correlated the inputs to your xHR/FB measure are from year to year? If those inputs are themselves products of random within-season variation, then xHR/FB might provide a useful description of what happened that year, but would not tell us much about what we should expect going forward. Your look at under- and over-performers suggests you see this having predictive value. Just curious on how you see this measure being used.


  2. I think it’s two different questions – given a hitter’s combination of distance, angle and std dev, what should his HR/FB rate have been? And given his history in those three metrics, what should his HR/FB rate be in year X + 1? The second question would require figuring out the YoY correlations, but not the first one.

    If you have historical xHR/FB rates (I have a spreadsheet going back to 2007), then you could follow a hitter’s trend and it could be used the same as regular HR/FB rate, with no real need to know how well they correlate with each other.


  3. schoenbl says:

    Could you go into more detail about how you incorporated the “maximums” for each metric, so the formula would not underestimate the best hitters?


  4. It wasn’t incorporated into the formula; it was just a check I did to make sure the equation worked well. I simply found the maximum for each category and plugged those values into the equation to see the highest mark it could spit out.


  5. murphyluke says:

    There is no park adjustment in this equation, correct? So, taking Chris Davis as an example, assuming he maintains the same batted ball distance/angle profile, in HR-friendly Camden Yards he actually ought to do somewhat better than the 22.3% xHR/FB, right? Not that I expect him to pull off 29% again, of course. Just want to make sure I understand the equation properly.


  6. You are correct that there’s no park adjustment, but it wouldn’t work like that. Park factors are affected by environmental factors such as wind, the air (think Coors Field), etc., in addition to the distance of the fences. The environmental factors should already be accounted for when looking at a hitter’s distance. So the only thing remaining is fence distance. Does Camden Yards have closer fences than other parks? That’s what would matter, not just looking at a park’s HR park factor.


  7. murphyluke says:

    Ah, I see. That makes sense. Thanks.

