Contact-Quality Data and Its Application

Since Baseball Info Solutions’ contact-quality data was added to FanGraphs, there have been many attempts to predict BABIP from it, without much success. So I tried breaking the data down by batted-ball type using the splits function on the leaderboards, and the results seem promising (for fly balls, at least).

I used data from the 2012 season onward for hitters with a minimum of 250 PA (a completely arbitrary cutoff).

Fly balls showed the best fit, with an r-squared of 0.79 using hard%, soft%, pull% and Speed Score (Spd) as the explanatory variables.

The equation:

xAVG(FB) = 0.7387*hard% + 0.0989*soft% + 0.0596*pull% + 0.0015*Spd - 0.0809
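As a quick illustration, the fly-ball equation can be written as a small Python function. This is only a sketch: the function name and the sample inputs are made up, and it assumes hard%, soft% and pull% are entered as fractions (0.40 rather than 40) and Spd on its usual scale, which is what makes the output land in a batting-average range.

```python
def xavg_fb(hard, soft, pull, spd):
    """Expected average on fly balls, using the coefficients quoted above.

    Assumption: hard, soft and pull are fractions of fly balls (0.40 = 40%),
    and spd is the Speed Score as listed on the leaderboards.
    """
    return 0.7387 * hard + 0.0989 * soft + 0.0596 * pull + 0.0015 * spd - 0.0809


# Hypothetical hitter, not a real player line.
example_fb = xavg_fb(hard=0.40, soft=0.15, pull=0.25, spd=4.0)
print(round(example_fb, 3))
```

With these coefficients, hard% does nearly all the work: five extra points of hard% (0.05) add about .037 to xAVG, while a full point of Spd adds only .0015.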

The usual suspects top the xAVG list: Paul Goldschmidt, Joey Votto, Chris Davis, Ryan Braun and Miguel Cabrera. But the most puzzling result was Mike Trout’s .266 xAVG vs. a .342 AVG. What does Trout do differently to beat the formula? I don’t know.

xAVG on groundballs correlated less well with actual average on grounders, with an r-squared of 0.48, though raising the PA qualification to 600 improves it to 0.52. The lower r-squared probably reflects the fact that success on groundballs depends not only on hitting them hard but also on hitting them through the gaps in the infield, and no variable here captures that effectively.

xAVG(GB) = 0.5096*hard% - 0.0012*soft% - 0.0036*pull% + 0.2328*oppo% + 0.0096*Spd + 0.0892
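The groundball equation can be sketched the same way. Again, the function name and the inputs are hypothetical, and the rates are assumed to be fractions rather than whole-number percentages.

```python
def xavg_gb(hard, soft, pull, oppo, spd):
    """Expected average on groundballs, using the coefficients quoted above.

    Assumption: hard, soft, pull and oppo are fractions of groundballs,
    and spd is the Speed Score as listed on the leaderboards.
    """
    return (0.5096 * hard - 0.0012 * soft - 0.0036 * pull
            + 0.2328 * oppo + 0.0096 * spd + 0.0892)


# Two hypothetical hitters who differ only in speed.
slow_runner = xavg_gb(hard=0.25, soft=0.20, pull=0.50, oppo=0.15, spd=4.0)
fast_runner = xavg_gb(hard=0.25, soft=0.20, pull=0.50, oppo=0.15, spd=7.0)
print(round(slow_runner, 3), round(fast_runner, 3))
```

Compared with fly balls, speed matters far more here: each point of Spd is worth .0096 of xAVG on grounders against .0015 on fly balls, while the soft% and pull% terms are close to zero.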

Mike Trout is restored to where he belongs, at the top of the xAVG list, with A.J. Pollock, Adam Eaton, Carlos Gomez and Willie Bloomquist rounding out the top five. Yasiel Puig’s xAVG shows the biggest difference from his actual average, probably because he has mastered hitting balls in the gaps.

Liners were the least promising, with an r-squared of 0.21 between xAVG and average. Moreover, the constant was the biggest term in the linear equation, meaning average on liners is mostly random. So there is only a slight positive relationship between hitting liners hard and having a high average on them.
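For anyone who wants to check r-squared figures like the ones quoted here, one common way is to square the Pearson correlation between xAVG and actual average. The sketch below assumes that definition (the exact calculation behind the numbers in this post may differ), and the sample values are invented purely to show the mechanics.

```python
from math import sqrt


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and of squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    r = sxy / sqrt(sxx * syy)
    return r * r


# Made-up xAVG and AVG values for five hypothetical hitters.
xavg_values = [0.250, 0.270, 0.300, 0.310, 0.340]
avg_values = [0.240, 0.290, 0.280, 0.330, 0.335]
print(round(r_squared(xavg_values, avg_values), 2))
```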

Overall, contact-quality data is promising, and the estimates should improve as more years of data accumulate. Data from 2002-2010 wasn’t used because it was manually collected and results-based, while 2011 seems to differ from the 2012-2015 data, as league-average hard% that year seems to be 5% lower than normal.

