Who Are The Real HR/FB Outliers?

Last week, I examined the factors that affect a pitcher’s HR/FB rate and constructed a model that can be used to predict the rate at which a pitcher will allow home runs in the future. Using that model, we can examine which pitchers truly have good and bad HR/FB rates, find the realistic range of HR/FB rates, and analyze the pitchers who over- or under-performed their projections.

With this new perspective, Matt Cain no longer looks like a pitcher with completely unexplainable HR/FB numbers. Instead, he looks like a pitcher who has the ideal skill set and ballpark to minimize HR/FB rate.

One quick housekeeping note before we get to the charts. Many of the comments on last week’s post asked about including a lagged HR/FB term in the equation. This is an excellent point that I forgot to touch on previously. A lagged HR/FB variable would be highly significant and influential in the model, but there is a reason why it was excluded: the goal of the model is to examine what factors could help a pitcher exert control over his HR/FB rate. Truthfully, including a lag would improve the model’s predictions on the whole, but it would complicate its use. By excluding a lag, it’s possible to make HR/FB projections for rookies, imports, pitchers who change teams, and pitchers with evolving skill sets, rather than relying on the pitcher’s historical rate.

It’s also important to remember that these projections will only include starters who threw at least 80 innings and did not switch teams during the season. Also note that, while much of this article is about where the projections differ from the actual results, overall the model’s predictions were accurate and had a lower forecast error than simply using the league average rate.
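
The comparison against the league-average baseline can be sketched as follows. This is only an illustration: the article does not say which error metric was used, so root-mean-square error is an assumption here, and the pitcher rates in the lists are made-up placeholders rather than real 2010 data. Only the 10.6% league average comes from the article.

```python
import math

def rmse(predicted, actual):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(actual):
        raise ValueError("sequences must have the same length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical HR/FB rates in percent; NOT real pitcher data.
actual_rates = [7.4, 9.8, 12.5, 10.1, 11.9]
model_projections = [8.6, 9.9, 11.4, 10.3, 11.0]

LEAGUE_AVG = 10.6  # league-average HR/FB quoted in the article

# The naive baseline projects the league average for every pitcher.
baseline = [LEAGUE_AVG] * len(actual_rates)

model_error = rmse(model_projections, actual_rates)
baseline_error = rmse(baseline, actual_rates)
print(f"model RMSE: {model_error:.2f}, baseline RMSE: {baseline_error:.2f}")
```

A model is only worth using over the flat league-average assumption if its error on held-out seasons comes out lower than the baseline's.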

With that out of the way, on to the leaderboards. The first chart shows the 10 pitchers with the lowest projected HR/FB rate.

The biggest thing to note above is that Matt Cain has the third-lowest projection of any pitcher. While he still outperformed his projected rate, he did so by a far smaller margin than if we had simply used the 10.6% league average. Cain was extremely good at keeping fly balls in the park last year, but only about one percentage point better than the model predicts.

The most surprising name on this list has to be Josh Beckett. The Red Sox starter is somewhat notorious for giving up a lot of home runs, but he fits the profile of a pitcher with a low HR/FB rate. Beckett throws hard, has a high strikeout rate and plays in a park which, contrary to popular belief, depresses home runs. Although his 2010 HR/FB rate was in the teens, Beckett has been all over the charts during his career. Beckett posted a 7.2% HR/FB in 2003 and an astronomical 15% in 2006. The model suggests that Beckett’s true rate is between those two extremes, somewhere closer to 9%.

Every pitcher on this list has at least two out of these three characteristics: soft-tosser, control problems, extreme hitter-friendly ballpark. There is no surprise that those three variables matter, but the model provides validation for those long-held assumptions.

Mark Buehrle is the surprising name on this list, but with his ballpark, soft-tossing style, and a career 10.3-percent HR/FB rate, all signs point to 2010 as a fluke.

The two preceding charts provide a basement and ceiling for expected HR/FB variation. According to these predictions, any pitcher with a HR/FB rate below 8.5% or above 11.6% should expect to see some regression to the mean in the near future. This also indicates that pitchers with HR/FB rates between those thresholds might be able to sustain that performance long-term.
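
As a rough illustration of that rule of thumb, the check below classifies an observed rate against the two thresholds. The 8.5% and 11.6% bounds come from the projections above; the function name and the labels it returns are hypothetical, chosen only for this sketch.

```python
# Lowest and highest projected HR/FB rates cited in the article (percent).
LOW, HIGH = 8.5, 11.6

def regression_direction(observed_rate):
    """Say which way an observed HR/FB rate would be expected to move.

    Returns "up" for rates below the projected floor, "down" for rates
    above the projected ceiling, and "sustainable" for anything between.
    """
    if observed_rate < LOW:
        return "up"
    if observed_rate > HIGH:
        return "down"
    return "sustainable"

for rate in (7.0, 10.0, 13.5):  # hypothetical observed rates
    print(rate, regression_direction(rate))
```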

Next, let’s look at the pitchers who under- or over-performed what the model predicted for them. Of the 81 starters who met the selection criteria in 2010, only four had a difference between the model and their actual rate which was significant at the 95-percent level, meaning that the model predicted a rate that was more than two standard deviations away from their actual rate.
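
The two-standard-deviation test can be sketched like this. The article does not spell out which standard deviation is used, so `std_dev` below is an assumed input (for example, the spread of the model's residuals), and the function name is hypothetical. The last lines also work out how many of the 81 pitchers would be expected to cross a 95-percent threshold by chance alone if the misses were purely random.

```python
def is_significant(actual, projected, std_dev, threshold=2.0):
    """True when actual and projected differ by more than `threshold` standard deviations."""
    if std_dev <= 0:
        raise ValueError("std_dev must be positive")
    return abs(actual - projected) / std_dev > threshold

# Pitchers meeting the selection criteria in 2010, per the article.
n_pitchers = 81
alpha = 0.05  # complement of the 95-percent level

# Expected number of pitchers flagged if every miss were random noise.
expected_by_chance = n_pitchers * alpha
print(f"expected by chance: {expected_by_chance:.2f}")
```

Since about four pitchers would be flagged even if the model's errors were pure noise, a count near four is what chance alone would produce.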

There are numerous theories as to why these pitchers outperformed their projected HR/FB rate. There is a definite possibility that a variable can be added or tweaked which could help explain these outliers. Outside of that, it is most likely the case that these results are just one-year flukes, and these pitchers’ HR/FB rates will fall closer to their predictions in 2011. All three pitchers who significantly outperformed their predictions have career rates far higher than their 2010 number.

Likewise, these pitchers may represent a flaw in the model that can be revised in future iterations. Even with a new variable or two, these pitchers likely had fluky 2010 HR/FB rates and are due to regress next season.


Jesse has been writing for FanGraphs since 2010. He is the director of Consumer Insights at GroupM Next, the innovation unit of GroupM, the world’s largest global media investment management operation. Follow him on Twitter @jesseberger.

28 Responses to “Who Are The Real HR/FB Outliers?”

  1. Telo says:

    “With this new perspective, Matt Cain no longer looks like a pitcher with completely unexplainable HR/FB numbers. ”

    When you only look at one year, this seems true. But in reality it probably takes at least 1,000 IP to get a decent guess at HR/FB true talent rates.

    “There are numerous theories why these pitchers outperformed their projected HR/FB rate.”

    When you are looking at one season: Luck. When you put 5+ full seasons together, then you can begin to connect some dots.


  2. Elias says:

    Neat stuff, and will be useful info for my fantasy draft… I’m curious whether the significance tests account for uncertainty in the predicted rate as well as the actual rate?


  3. test says:

    You have to watch out for the “underperformers” in this type of model – anyone with the “skillset” to give up a huge % of HR will not get to stick around and prove they weren’t an outlier with a larger sample. The top of the list, Manny Parra, is a perfect example. He’s had two dreadful years in a row as measured by ERA, and has given up a lot of HRs 3 years running. There’s essentially no such thing as a large sample size for a guy who gives up a ton of homers despite predictors saying otherwise, because they can’t keep a job, generally. I don’t follow the Brewers, is he even a possible starter anymore?


    • grandbranyan says:

      Parra is being used out of the bullpen this year, where he actually performed quite well last year with the obvious small sample caveat.

      If he can repeat his performance from last year and Hawkins comes back kinda healthy they could have a pretty nice pen combined with Axford, Braddock, Saito and Loe plus longer outings out of the starters with the Greinke and Marcum additions.


  4. Ben H says:

    So is there any chance of using this (or a similar) model’s projected HR/FB in pitchers’ xFIP calculations instead of the standard 10._%?


  5. dudley says:

    there are any number of other problems with introducing a lag: multicollinearity, for example. if any independent variable correlates with HR/FB rate, it would also correlate highly with a lagged HR/FB rate, which would really mess up your results. i completely agree with excluding it.


  6. Sandy Kazmir says:

    Any chance you can link to a workbook? I’m curious where James Shields falls in this exceptional look.


  7. B N says:

    This seems like an interesting start, to say the least. I would have to say a few things though.

    1. The total spread in expected HR/FB rates is pretty low, ranging from 8.5% to 11.6% (3.1% total). Is this about what we should expect as true estimates of pitcher skill set/park effects? Or are we just being conservative because there’s a lot of variance we can’t catch?

    2. Is there a way to apply this to looking at multiple years for a pitcher? You’re going to run into an issue where they might change teams (and home park), but I wonder if there’s a way to adjust for that. With a sample set large enough for a multi-year estimate, one could probably really say something about a pitcher’s true talent in this area.


    • Great questions.

      1) It really depends on how much you trust the model. If you think it’s pretty good, then it looks like 8.5 to 11.6 might be the realistic range. If you think it’s incomplete, then there might be pitchers who can consistently live outside that range.

      2) This is a possible followup piece.


      • gabriel says:

        I would think that looking at multiple pitcher years would be a good way of evaluating how accurately the model performs: if we see an unexpected number of outliers from their predicted level, then there’s good reason to think that the model doesn’t capture everything.

        I, for one, would really be interested to see the results.


  8. David Pinto says:

    Maybe we should call this predictor the Rosenheck Index, since Dan got the ball rolling.


  9. bender says:

    Wasn’t league wide hr/fb much lower this year? Like 9.4%?


  10. VivaAyala says:

    In other words, why Matt Cain Is Good But Also Lucky.

    Great article. Whether or not there may be another helpful variable out there, I already prefer this model to using the league average rate in xFIP (discounts individual pitcher skill too much) and to using a pitcher’s career HR/FB rate (discounts the possibility of luck in pitcher’s results too much). To me, the idea that there is a significant but not enormous spread in true talent HR/FB intuitively makes sense.

    More Mariners-specific, I was pleasantly surprised to see Jason Vargas’ name on the list of 10 pitchers with a lower predicted HR/FB. While he is still due for some significant regression, this gives me hope he can continue some of his success from last year. I do find it odd, though, that he does so well in this model even though he does not strike too many out nor throw hard. Is it all Safeco + lefty starter, or is something else in play here? I dunno.


  11. wobatus says:

    I wonder what League’s predicted numbers look like compared to his actual, for his career (obviously with relievers we get tiny annual samples and even career samples can’t be all that helpful). Ditto with Betancourt. Opposite ends of the spectrum.


  12. FYI, Correia was most probably affected greatly by his close brother’s suicide, great article on Correia and his family in ESPN I believe.


  13. ElJimador says:

    Interesting to see Jonathan Sanchez on the list for lowest projected HR/FB rates and his doppelganger de la Rosa on the list for highest projected. Is there any significant difference between those 2 besides their home ballparks?


  14. Eric M. Van says:

    If you look at 81 pitchers, you would expect 4 to appear to be significant at the 95% level. So there’s no evidence that the residuals in the model are anything but random. Relax!


  15. goyo70 says:

    Does a particularly good change-up influence HR/FB %? When batters are fooled, do they softly loft more balls? I don’t know if checking the ratios of pitch types will resolve this, but I’d be curious to know if a good change-up and a large park are common denominators of most of the “outperforming” group.


  16. Colour me confused says:

    How can Moyer be projected to have the 10th highest HR/FB% at 10.7% while maintaining that the league average is 10.6%? Surely there’s not hundreds of pitchers out there who will have exactly a 10.6% average to completely negate the effect that those above and below would have on this number.


  17. Matt Swartz says:

    Hey Jesse, I was curious if you saw my article at BP back in December (http://www.baseballprospectus.com/article.php?articleid=12584, which should be free content). I also did a regression model to predict HR/FB as a function of DIPS stats, and found pretty similar results to you, but I used a different data set which reinforces both of our findings I think. One thing that I did differently was only looking at HR per outfield fly ball, which led to BB rate being insignificant. I liked your inclusion of fastball velocity, because it shows that the K rate is significantly negatively correlated with HR/FB even controlling for the velocity effect. You can email me at MatthewTSwartz at gmail if you’re interested in discussing further. Good work though; great minds think alike, I guess :)


  18. Arjun says:

    Which two of the three characteristics does Lannan have? He’s a soft tosser, but Nats Park is pretty neutral and he has good control.


  19. Half Full says:

    Why is there only a sample of 81 pitchers from the criteria you stated? When I searched for starters with 80+ IP on 1 team I found 145 samples. Did I miss something? I’m also curious (after reading Colour me confused’s comment) as to why the average predicted HR/FB (~10%) is not the same as the historical 10.6% HR/FB.


  20. Mike Fast says:

    It’s because the league average HR/FB is not really 10.6% and hasn’t been for several years now. The league average was around 11% from 2002-2006 and has been around 10% from 2007 and on. In 2010 it was 9.4%, as bender mentioned above.


  21. studes says:

    I’m repeating myself here, but IFFB% shouldn’t be included at all. Of course infield flies aren’t home runs. If you can break them out (which Fangraphs can do), why are they in the dataset? It would be like running this analysis on all batted balls and finding out that pitchers who give up lots of groundballs give up fewer homers. Duh.

    The other issue is, as Matt points out, IFFB% is correlated with some of your other stats (I think I found a correlation with strikeouts). So you think you’ve found something out with your model, but it may not be true.

    Take infield flies out.


  22. Clayton Kershaw says:

    There is no way I’m giving up HRs over a 9.2% rate.


  23. toby says:

    This series of articles (Cameron’s and Jesse’s) and the ensuing debates have been great fun. I’ve been thinking through some different real-world narratives to make sense of this (something sabr folks probably need to do a better job of if they’re ever going to make friends with traditionalists), and also wondering about a few things I haven’t seen directly addressed.

    Jesse, when you were looking league-wide at all the pitching metrics that might correlate with HR/FB rates you found that BB rate was important albeit just barely (tiny coefficient, tiny elasticity, 15% chance of random coincidence), and included it in your predictive model. You noted that this made some sense to you since a better BB rate implies better control, which might mean a pitcher serves up fewer mistake meatballs.

    This seems intuitively correct, but (as he noted in the comments section) over at Baseball Prospectus Matt Swartz found no correlation of BB rate to HR/OFFB rate. He simply (and rightly, I’d say) threw out the IFFB contribution to overall FB rate, the part that (mostly) measures the ability (yes, it’s a separate persistent skill) to induce pop-ups, whereas you included it as an (obviously) predictive factor vis-a-vis HR/total FBs. One might speculate this pop-up induction skill lies in being able to consistently and precisely throw pitches up in the zone juuuuust a bit TOO high to be squared up (probably with good stuff and therefore K rate, since I recall a THT article a while back correlating K rate and pop-up induction), and that guys with good BB rates can do this ever-so-slightly more than guys with lesser BB rates due to their underlying control. But once the ball is squared up enough to go to the outfield, there’s no difference.

    OK, so, I’m going along reading the debate and thinking about the guy/team that started this whole thing — Matt Cain and the Giants and their organizational philosophy/coaching/pitch calling and persistently high-ish BB rates (as found by Cameron in his follow up piece) — and I came to this: What about guys whose actual control — that is, actual ability to throw a pitch with full “stuff” where they want — is better than their BB rate indicates due to their being called upon (or choosing) to pitch around hitters more often than league average? What about guys who walk more people because they INTENTIONALLY throw more balls and/or borderline pitches than a league average pitcher with a league average approach would, many of these near-misses (somewhat) hittable? In effect, can some guys and teams trade walks and a higher out-of-zone contact rate for pop-ups and HR suppression by intentionally just missing? Possibly because their less-homerable near-misses are tastier to hitters due to the impression they give of NOT being as wild as their BB numbers indicate?

    It would then be wrong to assume BB rate is a good proxy for the underlying skill of “true control”, or at least it would be when the guy with above average true control makes it his mission to avoid home runs in a way such that his walk rate goes up, as so many Giants have seemed to do. Thinking about it, there are guys with poor true control who often miss randomly while just trying to throw a freaking strike, including meatball-style, thus having both a less than stellar walk rate and (over time, a slightly) highish HR/FB rate. These two types might exist (neither posting sexy BB rates while having wildly different OFFB outcomes) along with bunches and bunches of average dudes with average control and stuff and approaches and results, along with guys with mediocre stuff and accordingly (over time, slightly) high HR/FB rates who while lacking pinpoint “true control” over their best (so-so) pitches can at least “just throw strikes” to minimize free passes; these latter guys balanced in the aggregate by elite guys with big K-getting, HR-suppressing stuff they have good control over (and hence low BB rates), and in the end it all washes out and we get no correlation between “control” as it putatively manifests itself in the form of BB rate and overall HR/FB rate, except to a tiny extent in IFFB induction skill.

    This matters because of the importance of explaining stuff like this to traditionalists/players/etc. Putting this all another way, how many traditionalists are going to buy the statement “a guy with good control is no better at suppressing home runs than a guy with below average control”? But what if control per the old-school, toolsy understanding is a skill that CAN be applied to suppress HR/FB rates — and probably thereby lead to more successful pitching generally — it’s just that there are widely differing ways in which that skill is in practice applied pursuant to a player’s/team’s/organization’s coaching/philosophy/pitch-calling and therefore totally disparate ways it manifests itself statistically, making BB rate a sometimes poor proxy for true underlying control and likewise a useless number for figuring xHR/OFFB (or whatever you want to call it). And what if, in the end, some players/organizations get better results from similar skill sets? Getting back to the original Cain/Giants thing, if an atypical approach is taken and/or atypical coaching/pitch-calling persists year after year in an organization, so would the statistical misrepresentation of that team’s pitchers’ true control, and in the Giants’ case so would xFIP understate the real skill of their pitchers.

    As a half-time Twins guy the pitcher who jumped out at me in your numbers was Scott Baker, an extreme flyball pitcher with very good statistical control (i.e. BB/9), pretty good strikeout rates, a good career pop-up rate, good O-zone contact (which “should” suppress HR/FB), etc. But last year he got taken deep at an above-average rate per flyball and his extreme flyball tendencies hurt him accordingly. (Similar peripherals in 2007, 2008 and 2009 yielded better HR/FB results, despite the Metrodome’s short right field). Now then, watch enough Twins games and you know all-too-well that there’s one overriding philosophy the organization preaches to its pitchers: throw strikes and trust your stuff. So what if Baker’s true control is all being channeled into obsessive strike throwing at the behest of his superiors, when in fact a little more nibbling might actually serve him well (again: slightly, over enough time)? It’s tempting to imagine Baker going to the Giants and being told: better to miss and walk a few when facing LH bombers than to serve up the ding dong because you absolutely positively don’t want to walk anybody. If his BB rate climbs and his HR/FB falls in the right proportion, he’s a better pitcher with the same underlying skills.

    To totally change track and be a bit more skeptical regarding the existence of a HR/FB skill, there is one other thing bugging me about Cain/the Giants. Granted, Cain’s road numbers aren’t far off from his home numbers (per paapfly’s original piece), but what about the parks/opposition he faces most frequently? Petco obviously comes to mind. They also play at Oakland once a year during interleague (and host the A’s, a team that hasn’t been built around power much during Cain’s tenure). I get that weighting every pitcher’s numbers based on the actual number of PAs they pitched in each park would be way too much to ask, but what about checking whether a pitcher’s DIVISION correlates with his HR/FB rate, beyond whatever’s captured by his home park’s correlation? You’d capture a lot of information about quality of opposition and divisional ballpark effects that way.
