## Graphing BABIP Against Speed

Faster players get more hits on their ground balls. That should be no surprise. There is a benefit to having speed in that you can beat out more infield hits than slower players. That’s a fairly straightforward assertion, but ultimately I was a bit surprised that the gap is actually quite small.

I included the formula and fit from a linear regression. Looking at various other curves, none were demonstrably better. First, you can see that this is not a strong correlation. Having a higher speed score offers some apparent benefit, but by no means is it a guarantee.
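For readers who want to reproduce this kind of fit, here is a minimal sketch of an ordinary least-squares line with its R-squared. The data points are made up for illustration and are not the data behind the graph; the function name `linear_fit` is likewise just an assumption.

```python
# Hypothetical (speed score, ground-ball BABIP) pairs; NOT the article's data.
speed = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
gb_babip = [0.215, 0.240, 0.225, 0.250, 0.235, 0.260, 0.255]


def linear_fit(xs, ys):
    """Ordinary least squares for y = slope * x + intercept, plus R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a one-variable fit, R^2 is the squared Pearson correlation.
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


slope, intercept, r_squared = linear_fit(speed, gb_babip)
print(f"GB BABIP = {slope:.4f} * Spd + {intercept:.4f}  (R^2 = {r_squared:.3f})")
```

A low R-squared alongside a positive slope is the pattern described here: a real trend, but with a lot of scatter around it.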

While there is a clear upward trend, the focus is on just a fraction of a fraction, so the macro-level differences aren’t that massive. Over 150 ground balls, which is roughly the average number per season for the average player, the difference between a 10th percentile speed score (2.2) and a 90th percentile speed score (6.9) comes out to just four hits.
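To put those numbers in per-point terms, the arithmetic can be run backwards. This is only a sketch built on the rounded figures quoted above (150 ground balls, speed scores of 2.2 and 6.9, four hits), so the implied slope is approximate and is not the fitted coefficient from the graph.

```python
ground_balls = 150            # from the text
spd_p10, spd_p90 = 2.2, 6.9   # 10th and 90th percentile speed scores, from the text
extra_hits = 4                # from the text (a rounded figure)

# Four extra hits over 150 ground balls, expressed as a BABIP gap.
babip_gap = extra_hits / ground_balls

# Spread that gap over the distance between the two speed scores.
implied_slope = babip_gap / (spd_p90 - spd_p10)

print(f"GB BABIP gap between the percentiles: about {babip_gap:.3f}")
print(f"Implied change per speed score point: about {implied_slope:.4f}")
```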

There are outliers to be sure. That very noticeable dot at the top is Matt Kemp’s 2007 season when he posted a .442 BABIP on 104 ground balls. On the bottom rung is Eric Hinske’s 2003 mark of a .105 BABIP on ground balls despite a speed score that year of 6.0.

As mentioned in the introduction, the general trend of BABIP on ground balls rising as players get faster is no shock. That is probably what anyone would have guessed, but I had yet to see it verified in the data this way. Go forth now and speak with slightly more conviction when the topic is broached.


Thinking of beating out a ground ball between say Reyes and S Drew…how quantifiable is the difference in speed? Reyes has a career speed score of 8.5 while Drew lags behind at 5.8. But from home to first, what is the difference? Maybe it’s 0.5 seconds, maybe less? The reaction times of the fielders, the strength of the throw to first, etc…does it make the 0.5 seconds inconsequential mostly?

Take Drew’s 2010 numbers. He was 56/184 on grounders. How many were the “bang bang plays”; 5-10%? The difference in speed may be mitigated by other factors in those cases as well….fielders, throw to first, etc. So I’m not surprised to see the total difference in hits to be 3-5 or so.

I wonder if another factor could be line drive score. That is, the grouping into 3 distinct groups for a batted ball (ground ball, line drive, and fly ball) is limiting. A hard ground ball is classified as just a ground ball. Does someone who hits hard line drives also tend to be more likely to hit hard ground balls? Is there a “solid contact” statistic? Does the fact that Matt Kemp tends to hit the ball harder than say Juan Pierre cause some difference?

Is there any data that measures speed by tracking the average time it takes a player to get down the first baseline?

You should separate lefties and righties; I’d be willing to bet that your stats are getting muddled since lhbs have a much lower babip on grounders than righties.

This is exactly what I was going to say. LHB also get down the line much quicker. Is this factored into Speed Score at all?

.. but, isnt that offset a little by the RH bunted hits, vs the LH bunted hits.. LH have to lean a bit down the 3b line with the bat, and change direction/momentum to down the 1b line. RH are leaning down the 1B line with the bat AND after contact they still go in that direction. Yes, LH get down the line faster on conventional swings, but RH get down faster on bunts. AND, your speed guys are more likely to bunt for a hit than a “non speed” guy.

I’m guessing there aren’t enough bunt-for-a-hit attempts to make this a significant factor.

do LHB really have lower babip’s on grounders than RHB? i hadnt heard this before, whats the reasoning?

To be honest, I don’t have the stats in front of me; i was assuming that since righties generally pull the ball more than lefties push it, they’d get on more because of the higher distance between the fielder and the bag. simply put, because the left-side hole is much bigger than the right-side hole. Though, now that I think about it that may be offset by the fact that lefties get out of the box faster. I’d really have to examine the stats, but i’m not quite sure where to find them.

You might also want to distinguish lefties who face the infield shift versus those who don’t; that seems like it could be a factor.

The L/R pull effect and distance for the throw should be pretty well off set with what I would assume is a much smaller effective fielding range on the right side with the generally poorer fielding 1st basemen and 2nd basemen, as opposed to 3rd basemen and the SS respectively. Plus you have the 1st baseman often holding a runner on 1st, reducing his range.

I say we just split L/Rs up and see the result and put some of this anecdotal guess work to bed.

I have crunched the numbers on this for my baseball game, don’t have the notebook in front of me, but when you hit the ball to the 3B your chances of reaching base go way up compared to hitting it to the 1B. The 2B and SS are about a wash, the only other place better than that is to hit it right up the middle past the pitcher.

There’s the obvious time issues at 3B, you have to field the ball relatively cleanly in order to make the throw in time. And if you’re a faster runner this makes the window close even faster for a 3B or SS. Though there could be lots more.

Speed score might be the best thing we have to try and ascertain how fast players are, but it’s not really giving us an ordinal ranking of who the fastest are. If we had something like elapsed time from moment of contact to first, I’m sure you’d get a much higher correlation there.

(some day in the future…….)

I do wonder though how many players there are who get high speed scores based solely on solid fundamentals. I really doubt there’s enough of them to skew this sample though.

More likely it is something like line drive % that might inform this graph a little more. I imagine there is a significant number of guys in this sample who are below average hitters who stick around because of their speed, which makes me think their BABIP numbers could be deflated by weak contact. LD% might give us some indication of who’s just not hitting the ball hard enough to matter. Maybe just cutting everybody off the list with a LD% below a certain threshold.

Or attempt to come up with a hard/soft hit GB correction based on LD rate? It would be somewhat subjective, as we don’t have data to prove such a thing, but a reasonable enough assumption might be made to improve the accuracy of the result….maybe…

I feel like a problem when calculating speed vs GB BABIP is that it doesn’t include any analysis of defensive values…a ground ball to Adrian Beltre at third is less likely to result in an infield hit than a similarly hit ground ball to Michael Young at third (just for one example). It may be too complicated to calculate, and I’m not sure what values you’d use (especially since you’d have to analyze every individual grounder to every individual defender to make it work), but some sort of Speed/UZR vs GB BABIP *might* have more distinct results than this graph. I do admit that it’s only a hypothesis, though.

If you control for power I think you will find speed not having anything to do with babip, or at least not enough to be of any significance.

HR/FB and GBBABIP?

Guys that hit the ball hard have higher GBBABIPs?

Here’s what I’m guessing the real story is. Speed actually confers a somewhat bigger advantage to BABIP on ground balls than is shown in the simple bivariate regression shown above. But the speedsters “give back” some of that advantage because, as a group, they don’t hit the ball as hard as the average batter. (Think of all the Juan Pierres who are in the group, whereas the Ryan Howards are at the other end of the spectrum.) I.e., some of their softer ground balls are finding infielder’s mitts whereas the stronger hitters are punching them through the infield.

I bet if you were to do a *partial* correlation between GB-BABIP and Speed Score, while covarying out something like ISO (isolated power), you’d see a much more impressive relationship. And that relationship would more faithfully represent the advantage a generic hitter gets on his ground balls due to his speed. I believe you’ll need a more sophisticated stat package than Excel to do this one, but XL may have partial correlation in one of the add-in packages.
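A partial correlation like this does not strictly need a stat package. Here is a minimal sketch using the standard first-order formula, r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)). The player-season numbers are invented for illustration, with faster hitters given less power on purpose; they are not real data, and the function names are assumptions.

```python
from math import sqrt

# Hypothetical player-seasons; NOT real data.
speed = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
iso = [0.250, 0.200, 0.230, 0.160, 0.180, 0.120, 0.140, 0.090]
gb_babip = [0.250, 0.236, 0.256, 0.254, 0.259, 0.260, 0.274, 0.262]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def partial_corr(xs, ys, zs):
    """Correlation of xs and ys after removing the linear effect of zs."""
    r_xy = pearson(xs, ys)
    r_xz = pearson(xs, zs)
    r_yz = pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


raw = pearson(speed, gb_babip)
partial = partial_corr(speed, gb_babip, iso)
print(f"Raw correlation (Spd vs GB BABIP):     {raw:.3f}")
print(f"Partial correlation (controlling ISO): {partial:.3f}")
```

With these made-up numbers the partial correlation comes out higher than the raw one, which is the effect hypothesized above. Whether real data behaves the same way is exactly the open question.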

I agree, and I also think we have to consider defensive positioning. The defense knows who has the wheels, and may cheat in or break in quicker; conversely, they may lay back more against hard-hitting slow pokes.

Should you be using xBABIP?

Is it just me, or is there no mention at all of what the data set is? Is this from 2011? 2010? Are you testing career rates? Seasonal rates? Is there a minimum number of PA to be included in the set?

Assorted single season data, as he mentions Matt Kemp’s 2007 and Eric Hinske’s 2003. Not sure of minimum PA, but it seems pretty decent, as there’s no 100 PA guys there with .600 GBBABIPs

Have you thought of doing an ANOVA analysis by quartiles? That might allow you to just compare the fastest group to the slowest group. I think that all the variation in the middle might be what’s killing your model. I’d just break up the speed scores by quartile and then see if there’s any differences there. I did that with LD% vs babip on pitchers and I got significant differences between the top and bottom groups. That’s really all you’re trying to prove anyways, right? With things like speed, we don’t really care about the guys with mediocre speed; it’d just be nice to find ways to profile the extremes, like juan pierre and Pablo Sandoval
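A minimal sketch of that quartile idea, assuming equal-size groups cut by speed score and a plain one-way ANOVA F statistic computed by hand. The sixteen player-seasons are invented, the function names are assumptions, and no p-value is computed here since that needs the F distribution from a stats package.

```python
# Hypothetical (speed score, GB BABIP) player-seasons; NOT real data.
pairs = [
    (2.1, 0.221), (2.8, 0.240), (3.0, 0.229), (3.4, 0.236),
    (3.9, 0.231), (4.2, 0.247), (4.5, 0.238), (4.8, 0.243),
    (5.1, 0.235), (5.5, 0.250), (5.8, 0.244), (6.1, 0.252),
    (6.6, 0.246), (7.0, 0.259), (7.7, 0.251), (8.4, 0.263),
]


def quartile_groups(data):
    """Sort by speed and split the BABIP values into four equal-size groups.

    Any leftover rows (when len(data) is not a multiple of 4) are dropped.
    """
    ordered = sorted(data)
    size = len(ordered) // 4
    return [[b for _, b in ordered[i * size:(i + 1) * size]] for i in range(4)]


def mean(values):
    return sum(values) / len(values)


def one_way_f(groups):
    """One-way ANOVA F statistic: between-group over within-group variance."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


groups = quartile_groups(pairs)
for label, g in zip(["slowest", "second", "third", "fastest"], groups):
    print(f"{label:>8} quartile mean GB BABIP: {mean(g):.3f}")
print(f"F statistic across quartiles: {one_way_f(groups):.2f}")
print(f"Fastest minus slowest: {mean(groups[3]) - mean(groups[0]):.3f}")
```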

This is not particularly surprising. People always talk about speed when BABIP comes up, but clearly the biggest component to BABIP is ability to hit the ball hard, consistently. Look at the career BABIP leaders among active players: Ichiro, yes, but also Votto, Holliday, Wright, Hamilton, Mauer, Cabrera.

Ground balls should obviously be skewed more towards benefitting speedsters than other batted ball types, but even so, Juan Pierre has a league-average career GB BABIP because every infield hit he beats out is offset by how few of his grounders are hit hard enough to find a hole.

I’m not sure the BABIP of those guys is necessarily an indicator that power leads to a higher BABIP, as the reason those players have some of the best career BABIPs is that they are the best in baseball at hitting line drives. All of those players have career line drive percentages of 21-22%+. Other players like Albert Pujols and A-Rod, who have comparable if not greater power than some of those listed, are lower on the list because they simply hit fewer line drives.

Which is why the author was looking at gb BABIP.

But speed score uses ifh% as a criteria.

So even if speed didn’t increase babip speed score would, because speed score is a function of ifh.

A better study would be multiple regression of all the variables of speed score. It could be that the only relationship is ifh% and none of the other factors matter.

I thought this might be the case. This basically means that these regressions are meaningless.

Not true. Infield hit% is NOT used for these speed scores. From the glossary link:

“Speed score (Spd) is a statistic developed by Bill James that rates a player on speed and baserunning ability. Different locations include slightly different components, but the FanGraphs version consists of, ‘…Stolen Base Percentage, Frequency of Stolen Base Attempts, Percentage of Triples, and Runs Scored Percentage.’”

You have to remember, you’re using data gathered by a large group of people using their own judgment.

One man’s ground ball is another man’s liner.

Is the straightforward conclusion here not “Speedier players don’t leg out quite as many hits on groundballs as we thought”, but more likely “The Fangraphs “Speed” metric doesn’t measure ability to leg out groundballs very well”?

Two things. One, what impact do defensive shifts have on the data given they are exclusively employed against power hitters, and thus likely low speed players? Two, since we’re just taking BABIP into account here, does the lack of home runs in the speed guys’ production also skew the data? I’m not sure what the effect would be exactly in either case, or if it would be more than negligible, but I am curious.

This is awesome. I’ll lay an open bet down: A Player’s weight, pitchers excluded, will correlate better than anything you can dig up.

Just got one of those silly feelings aboot it.

How about BABIP vs. batting average :P

Who’s the outlier on the far right?

Dave Roberts, 2004? Speed score of 9.8 in 371 PA.

I understand he had a stolen base in the postseason that year, as well.

(1) What sample are you using?

(2) Perhaps the composite “Speed” score is not the best predictor. I would be interested in seeing what the formula would be if we entered all the components that make up the “Speed” stat in a step-wise regression and if that would increase our R-squared. Maybe there are factors in Speed that just aren’t good for predicting BABIP. Let’s not just throw out this idea.

What is the slope if you remove the outlier all the way on the right?

The “outlier” on the right isn’t really an outlier. It’s not that far off the line. The reason the slope isn’t good is the point. It’s hard to predict babip from speed score. There is a lot of variation, and only some of it is around the slope

high leverage points are usually very close to the line. After all, they have high leverage. The question is, what happens when you remove them.

But it has more leverage than any other point there.

I doubt it has enough leverage over what looks like hundreds of other data points to move the slope or the R^2 enough to really change any practical interpretation of the results.
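The leverage question can be checked directly. Here is a minimal sketch, assuming a simple one-variable fit where the leverage of point i is 1/n + (x_i - mean)^2 / Sxx, and comparing the slope with and without the highest-leverage point. The data is invented (only the 9.8 speed score echoes the comment above; its BABIP value is made up), and the function names are assumptions.

```python
# Hypothetical (speed score, GB BABIP) points; NOT the article's data.
# The last point sits far to the right, like the one being discussed.
speed = [2.5, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 9.8]
gb_babip = [0.220, 0.245, 0.230, 0.240, 0.250, 0.235, 0.255, 0.245, 0.270]


def ols_slope(xs, ys):
    """Slope of the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def leverages(xs):
    """Leverage of each point in a simple linear regression on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return [1 / n + (x - mean_x) ** 2 / sxx for x in xs]


lev = leverages(speed)
top = max(range(len(speed)), key=lambda i: lev[i])  # index of highest leverage

slope_all = ols_slope(speed, gb_babip)
slope_without = ols_slope(speed[:top] + speed[top + 1:],
                          gb_babip[:top] + gb_babip[top + 1:])

print(f"Highest-leverage point: Spd {speed[top]} (leverage {lev[top]:.2f})")
print(f"Slope with all points:      {slope_all:.4f}")
print(f"Slope without that point:   {slope_without:.4f}")
```

With only nine points, one far-right dot carries a lot of weight. With hundreds of points, as in the graph, each point's leverage shrinks, which is the argument made in the comment just above.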

Even though stats have grown a lot, the problem I have with this, is the problem I have with a lot of stats. Too many variables. We have been “putting up the microscope” and getting more and more particular, ERA+, adjusted ERA+, all the other shit I don’t even know. You can’t get a totally exact measurement. Waaaay too many variables. The amount of time between the last time the infield was watered, where the infield was playing, the pitcher, the handedness of the hitter, time of day, maybe sun in the eye of a third baseman, quality of fielder, accuracy of the umpire, etc. Tons of stuff. Even the “speed score” can’t necessarily be trusted as accurate.

Not only that but let’s say stats show that speed doesn’t increase your average. That may be true, however let’s say it stresses the defense out and they aren’t as mentally stable. They make more errors (would BABIP be accounted for here?), maybe the next batter gets a better pitch selection, maybe the defense is more edgy, lots of intangibles that you can’t really measure.

look up the Central Limit Theorem

I thought speed scores weren’t considered significant over the course of one season; at least in the Bill James handbook, he says that they are better measured after 2 seasons.

I agree with the comment above about breaking out each individual component of speed score…..try a model with all 5-6 components of speed score, and see if any of the coefficients are not significant (likely).
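A rough sketch of what that model could look like, using the four FanGraphs components named in the glossary quote earlier in the thread as predictors. Everything numeric below is invented, the column scales are arbitrary, and the fit uses hand-rolled normal equations rather than a stats package, so it yields coefficients only and no significance tests. The function names are assumptions.

```python
# Hypothetical component values per player-season; NOT real data.
# Columns: SB%, SB attempt frequency, triples %, runs scored %.
components = ["SB%", "SB attempt freq", "Triples %", "Runs scored %"]
rows = [
    (0.60, 0.02, 0.002, 0.28), (0.75, 0.10, 0.006, 0.33),
    (0.68, 0.05, 0.004, 0.30), (0.80, 0.15, 0.010, 0.36),
    (0.55, 0.01, 0.001, 0.27), (0.72, 0.08, 0.008, 0.31),
    (0.78, 0.12, 0.005, 0.35), (0.65, 0.04, 0.007, 0.29),
    (0.70, 0.07, 0.003, 0.34), (0.62, 0.03, 0.009, 0.32),
]
gb_babip = [0.225, 0.250, 0.238, 0.262, 0.220,
            0.247, 0.251, 0.240, 0.236, 0.244]


def solve(a, b):
    """Solve the square system a @ x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: bring the largest entry in this column to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def multiple_regression(xs, ys):
    """Least-squares coefficients [intercept, b1, b2, ...] via normal equations."""
    design = [[1.0] + list(x) for x in xs]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)


coefs = multiple_regression(rows, gb_babip)
print(f"{'Intercept':>16}: {coefs[0]: .4f}")
for name, c in zip(components, coefs[1:]):
    print(f"{name:>16}: {c: .4f}")
```

Because the components tend to move together (fast players rate well on most of them), the individual coefficients in a real fit may be unstable, which is one reason several of them might not come out significant.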

BTW, which version of speed score is being used here?

I’d also be curious to see this done by batter handedness; in Voros McCracken’s original work about BABIP for pitchers, LHP had a BABIP ~.002 lower (better) than RHP.

One thing to consider is that players with more power (and perhaps lower speed scores) will hit harder groundballs, which are more likely to get out of the infield and make the player’s speed unimportant as to whether or not it turns into a hit… it might be worthwhile to take a look at BABIP on groundballs fielded by infielders vs. speed score. My guess is that would carry a higher correlation than just BABIP on all groundballs.

Hypothesis: ground balls hit to opposite field will more likely be hits than be outs.

If so, then the distribution of ground balls will affect the overall G.B. BABIP.

Willie Keeler figured this out at the turn of the prior century.

Quite small? That’s around a 6% difference in BABIP from the start of your line to the end of it. I mean maybe considering we’re going the full range from slowest to fastest to get that 6% means it’s not a huge difference, but 6% on GBs is the separation between a lousy infielder and a good infielder.