An Unsolicited Follow-Up Study of Pull%

I’m always looking for new angles to unlock the mysteries of BABIP, so I was intrigued by Jeff Sullivan’s exploration of pull rates against pitchers.  I grabbed the data and set to work subjecting it to my usual rigmarole of correlations and multiple regressions.  You know how they say if your only tool is a hammer, everything looks like a nail to you?  Well, plug your ears; there’s about to be a lot of wild, uncontrolled pounding going on in here…

I’ll cut right to the chase — did I find anything interesting relating to pitchers’ overall effectiveness when it comes to their Pull%, Middle%, and Opposite%, as I’m calling them?  Well, I found one decent connection that will seem obvious and stupid after you think about it, and a slight but kind of interesting connection.  I’ll provide you with some correlation tables that have left few stones unturned.  But, mainly, the research might help to set some things straight about how important this stuff actually is for pitchers.

My sample was composed of the career numbers of active pitchers (n=244), with a semi-arbitrary minimum of 623 batted balls to analyze per pitcher (long story… not important).  Remember that an “AB” in the context of this study refers only to ABs that resulted in batted balls hit against each pitcher, including home runs (but not uncaught foul balls).  That means if I’m talking about a batting average, it refers to the batting average on batted balls, not the pitcher’s overall batting average against.  It also means strikeouts are out of the equation here.  So are walks, so don’t you worry about on-base rates.  And of course, it’s also going to mean that AVG and SLG, like Pablo Sandoval, are going to be a lot bigger than you’re used to seeing in baseball.

Some Context

As Jeff said, hits that are pulled get hit a lot harder than those that aren’t.  Jeff explained that this has to do with the bat speed being greater towards the end of the swing, but it probably also has to do with bat angles causing more fly balls to the opposite field, as explained here.  As you probably know by now, fly balls have a lower batting average on balls in play, generally being easier putouts (other than home runs, which are removed from the equation for the purposes of BABIP).  I believe there’s even more to it than bat speed and angle, which I’ll get to near the end.  Anyway, here are the results from my sample:

Hit Direction  BABIP  AVG    SLG
Pulled         0.361  0.408  0.726
Middle         0.265  0.285  0.420
Opposite       0.296  0.323  0.507

So it seems balls hit up the middle of the field have the worst results for hitters.  This probably has to do with both defensive alignments and the depth of center fields relative to left and right fields.
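For anyone who wants to reproduce rates like these, here is a minimal Python sketch of AVG, SLG, ISO, and BABIP computed on batted balls only, as described above. The class name, field names, and counts are made up for illustration, and it assumes sacrifices and other edge cases are simply ignored, which may not match how the source data handles them.

```python
from dataclasses import dataclass


@dataclass
class BattedBallLine:
    """Counts for one hit direction; every 'AB' here is a batted ball."""
    batted_balls: int  # includes home runs, excludes uncaught fouls
    singles: int
    doubles: int
    triples: int
    home_runs: int

    @property
    def hits(self) -> int:
        return self.singles + self.doubles + self.triples + self.home_runs

    @property
    def avg(self) -> float:
        return self.hits / self.batted_balls

    @property
    def slg(self) -> float:
        total_bases = (self.singles + 2 * self.doubles
                       + 3 * self.triples + 4 * self.home_runs)
        return total_bases / self.batted_balls

    @property
    def iso(self) -> float:
        return self.slg - self.avg

    @property
    def babip(self) -> float:
        # Home runs leave both the numerator and the denominator.
        return (self.hits - self.home_runs) / (self.batted_balls - self.home_runs)


# Invented counts, purely to show the arithmetic (not from the study).
line = BattedBallLine(batted_balls=1000, singles=200, doubles=60,
                      triples=5, home_runs=35)
```

Because home runs count toward AVG and SLG but are stripped from BABIP, `line.avg` sits above `line.babip` whenever the home run rate exceeds the BABIP-free hit rate difference, which is why the AVG column runs higher than BABIP in the table.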

It’s Correlation Time

So, yes, it matters where the individual ball goes.  Does that mean you can say “X is a ‘pull pitcher,’ so his BABIP should be higher”?  How much does hit location really matter to a pitcher’s overall results, in the grand scheme of things?  To take a deeper look at that, let’s look at the correlation table for my whole sample:

         BABIP   AVG     SLG     ISO     Pull%   Mid%    Opp%
AVG      0.939
SLG      0.398   0.679
ISO     -0.025   0.304   0.906
Pull%   -0.147  -0.039   0.229   0.320
Mid%     0.218   0.070  -0.271  -0.392  -0.508
Opp%    -0.043  -0.022   0.003   0.016  -0.610  -0.373
ROE/AB   0.181   0.083  -0.176  -0.276  -0.028   0.189  -0.143
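A lower-triangle table like this one can be sketched in plain Python as follows. This is an illustration under assumptions, not the original workflow: the dictionary keys, the helper names, and the five placeholder pitchers are all invented, and only the 623 batted-ball minimum comes from the article.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def lower_triangle(rows, stats, min_batted_balls):
    """Correlate each stat with every stat listed before it."""
    kept = [r for r in rows if r["batted_balls"] >= min_batted_balls]
    table = {}
    for i, row_stat in enumerate(stats):
        for col_stat in stats[:i]:
            table[(row_stat, col_stat)] = pearson(
                [r[row_stat] for r in kept],
                [r[col_stat] for r in kept],
            )
    return len(kept), table


# Placeholder pitchers with invented numbers (not from the study).
pitchers = [
    {"batted_balls": 700, "BABIP": 0.290, "Pull%": 0.40, "Mid%": 0.35},
    {"batted_balls": 2500, "BABIP": 0.300, "Pull%": 0.38, "Mid%": 0.37},
    {"batted_balls": 500, "BABIP": 0.320, "Pull%": 0.45, "Mid%": 0.30},
    {"batted_balls": 3100, "BABIP": 0.285, "Pull%": 0.42, "Mid%": 0.33},
    {"batted_balls": 1200, "BABIP": 0.310, "Pull%": 0.36, "Mid%": 0.38},
]

n, table = lower_triangle(pitchers, ["BABIP", "Pull%", "Mid%"],
                          min_batted_balls=623)
```

Raising `min_batted_balls` and rerunning is all it takes to get a stricter-sample version of the same table.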

Before we make too much of that, let’s also look at one where the minimum batted balls for inclusion have been raised to 2000 (n=114):

         BABIP   AVG     SLG     ISO     Pull%   Mid%    Opp%
AVG      0.911
SLG      0.342   0.686
ISO     -0.019   0.375   0.932
Pull%   -0.008   0.132   0.291   0.304
Mid%     0.078  -0.157  -0.439  -0.482  -0.494
Opp%    -0.067   0.015   0.126   0.153  -0.547  -0.457
ROE/AB   0.117   0.006  -0.236  -0.304  -0.038   0.196  -0.151

Part of the effect of raising the minimum is the exclusion of one Jeremy Hellickson, the outlier among outliers.  But I think what the data suggest, overall, is that the best conclusion to draw from pitchers’ Pull/Mid/Opp breakdowns is this: some pitchers have more balls hit up the middle against them, and these pitchers tend to allow lower ISO rates (ISO, or isolated power, is SLG minus AVG).  I think the main reason for this is that home runs are harder to hit to center field, due to the distances involved.

As for BABIP, I’m sorry to say it looks like a dead end, so far.  It seems to have no direct connection at all to Pull/Mid/Opp.  One thing I wanted to point out, though, is that BABIP is correlated with SLG but not ISO — this is due to BABIP’s only connection to SLG being the AVG component of SLG, which is removed in ISO.  Yet ISO is still correlated with AVG.  That’s because home runs are a part of both AVG and ISO but not BABIP.  Just something to keep in mind when you’re analyzing the trends here.
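One way to see why BABIP can track SLG while ignoring ISO is to write SLG as AVG + ISO and expand the correlation. The following is only a sketch of that identity; the notation (r for a correlation, σ for a standard deviation across pitchers) is introduced here and is not from the original analysis.

```latex
\begin{align*}
\mathrm{SLG} &= \mathrm{AVG} + \mathrm{ISO} \\
\operatorname{cov}(\mathrm{BABIP}, \mathrm{SLG})
  &= \operatorname{cov}(\mathrm{BABIP}, \mathrm{AVG})
   + \operatorname{cov}(\mathrm{BABIP}, \mathrm{ISO}) \\
r_{\mathrm{BABIP},\mathrm{SLG}}
  &= \frac{\sigma_{\mathrm{AVG}}}{\sigma_{\mathrm{SLG}}}\, r_{\mathrm{BABIP},\mathrm{AVG}}
   + \frac{\sigma_{\mathrm{ISO}}}{\sigma_{\mathrm{SLG}}}\, r_{\mathrm{BABIP},\mathrm{ISO}}
\end{align*}
```

With the BABIP-ISO correlation near zero (-0.025 in the first table), essentially all of BABIP’s 0.398 correlation with SLG has to come through the AVG term.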

I decided to test out ROE (reached on error) data since it came with the rest of the data.  It’s a weak-ish connection, but a pretty consistent one through the different minimum AB samples I ran correlations on (I tried a few more): pitchers who have more batters reach base on error tend to allow lower ISOs.  I think the explanation is probably a pretty simple one: more ground balls = fewer extra base hits, plus more errors.  I won’t go into that anymore since it’s quite a tangent.

I ran multiple regressions to see if there might be some hidden interactions between Pull/Mid/Opp, but it appears there aren’t.
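To make “hidden interactions” concrete, here is a minimal pure-Python sketch of a regression with an interaction term. It is not the regression actually run for this post: the function names and data points are invented, and it assumes the natural setup of dropping Opp% (since Pull% + Mid% + Opp% = 1) and adding a Pull% × Mid% product column.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Partial pivoting keeps the elimination numerically stable.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(design, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


def design_row(pull, mid):
    # Opp% is left out: Pull% + Mid% + Opp% = 1, so a third share
    # alongside the intercept would be perfectly collinear.
    return [1.0, pull, mid, pull * mid]


# Invented (Pull%, Mid%) pairs; the outcome is generated from a model with
# NO interaction, so the fitted interaction term should come out near zero.
shares = [(0.30, 0.40), (0.35, 0.30), (0.40, 0.42),
          (0.45, 0.28), (0.50, 0.35), (0.38, 0.33)]
outcome = [0.20 + 0.10 * p + 0.30 * m for p, m in shares]
intercept, b_pull, b_mid, b_interaction = ols(
    [design_row(p, m) for p, m in shares], outcome)
```

On real data, an interaction coefficient that stays indistinguishable from zero is what “it appears there aren’t” would look like.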

Righty Batters vs. Lefty Batters

I was curious whether a pitcher’s Pull/Mid/Opp rates had any consistency between righty and lefty batters faced.  I also wanted to find out whether each hit location type got different results depending on the batter’s handedness.   The results:

Comparison                   Rate    BABIP   AVG     SLG
Righty Pull vs. Lefty Pull   0.215  -0.012   0.063   0.248
Righty Mid vs. Lefty Mid     0.359   0.168   0.140   0.010
Righty Opp vs. Lefty Opp     0.491   0.372   0.394   0.375
Righty Pull vs. Lefty Opp   -0.227   0.045  -0.066  -0.210
Righty Opp vs. Lefty Pull   -0.161  -0.011  -0.084  -0.195

Conclusions: by far, the strongest connection here for pitchers overall is between their opposite field rates and results against righty and lefty opposition.  To clarify, the results (BABIP, AVG, and SLG) are not at all influenced by the rates (e.g. Pulled Batted Balls by Righties/Total Batted Balls by Righties) in my calculations.

What’s Going on Here?

I’ve taken the liberty of running correlations between Pull/Mid/Opp and pretty much all of the rate stats FanGraphs has for pitchers, to see what odd connections might pop up.  It’s a little sloppy, since I’m matching up 2003-2012 FanGraphs data (well, 2007-2012 in the case of PitchF/X data) with career numbers, some of which extend beyond 2003.  I grovel at your feet for forgiveness for that, my masters.  I’m only giving you correlations above the 0.3 cutoff that is generally regarded as better than “weak.” Here you go:

Correlation to Pull%
Fastball% -0.472
wFB/C -0.441
FBv -0.437
vFS (pfx) -0.430
vFA (pfx) -0.425
FA% (pfx) -0.415
wFA/C (pfx) -0.407
HR/FB 0.397
wFB -0.383
FIP 0.367
CH% (pfx) 0.367
HR/9 0.356
wFA (pfx) -0.355
CH% 0.350
vCH (pfx) -0.321
FIP- 0.318
BUH% 0.313
CHv -0.312

Main conclusion: throwing hard prevents batters from pulling the ball.  Not a shocker.  Changeups get pulled.
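The screening step behind lists like this one (keep anything at or past the 0.3 cutoff in absolute value, then sort by strength) can be sketched in a few lines of Python. The function name is hypothetical, and the two entries marked as made up are placeholders added to show the filter dropping something; the other three values are taken from the Pull% list.

```python
def notable_correlations(correlations, cutoff=0.3):
    """Keep correlations with |r| >= cutoff, strongest first."""
    kept = [(name, r) for name, r in correlations.items() if abs(r) >= cutoff]
    return sorted(kept, key=lambda item: abs(item[1]), reverse=True)


pull_correlations = {
    "Fastball%": -0.472,       # from the list above
    "HR/FB": 0.397,            # from the list above
    "CHv": -0.312,             # from the list above
    "MadeUpStatA": 0.120,      # made up, below the cutoff
    "MadeUpStatB": -0.250,     # made up, below the cutoff
}

shortlist = notable_correlations(pull_correlations)
```

Sorting on the absolute value is what lets negative and positive correlations interleave the way they do in these lists.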


Correlation to Middle%
PU% -0.532
IFFB% -0.522
GB% 0.482
FB% -0.472
GB/FB 0.456
Z-Contact% 0.431
wFS/C (pfx) 0.354
Z-Contact% (pfx) 0.348
SI% (pfx) 0.333
SwStr% -0.317
HR/9 -0.305
FA-Z (pfx) -0.304

You won’t find popup percentage, “PU%,” in FanGraphs’ glossaries; I introduced it last week, defining it as IFFB% * FB%, or IFFB/Batted Balls.  Pitchers with a lot of infield flies tend not to be up-the-middle types, for whatever reason.  Part of that (probably most of it) has to do with all of the foul popups that become “batted balls” when they’re caught.  That’s kind of a cause for concern: infield flies are undoubtedly making Pull and Opposite results look better (for pitchers) than they really are.  I say that because the fact that they were popups was the only thing that made them “in play”; they’d be non-factors otherwise, like foul liners and grounders.
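The two definitions of PU% agree because the fly-ball count cancels: (IFFB/FB) × (FB/Batted Balls) = IFFB/Batted Balls. A quick sketch in Python, with a hypothetical function name and invented counts:

```python
def popup_rate(infield_flies, fly_balls, batted_balls):
    """PU% built from IFFB% and FB%, as defined in the text."""
    iffb_pct = infield_flies / fly_balls   # IFFB% = IFFB / FB
    fb_pct = fly_balls / batted_balls      # FB%   = FB / batted balls
    return iffb_pct * fb_pct               # FB cancels: IFFB / batted balls


# Invented counts for illustration only.
pu_from_rates = popup_rate(infield_flies=30, fly_balls=300, batted_balls=1000)
pu_direct = 30 / 1000
```

The two values match up to floating-point rounding, so either form can be used depending on which columns the data source provides.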

I think we should also conclude that beyond the obvious distance-to-the-centerfield-wall issue, the lower home run rates for “Middle” pitchers have to do with their higher ground ball rates.

Z-Contact%, by the way, goes hand-in-hand with PU%, as I pointed out here.

Moving on:

Correlation to Opposite%
FA% (pfx) 0.584
wFB/C 0.526
wFA/C (pfx) 0.495
K% 0.492
SI% (pfx) -0.487
K/9 0.484
Z-Contact% -0.477
GB% -0.471
wFB 0.460
PU% 0.453
GB/FB -0.448
IFFB% 0.436
HR/FB -0.434
Relieving 0.433
wFA (pfx) 0.429
FB% 0.425
FA-Z (pfx) 0.424
inLI 0.424
Fastball% 0.406
pLI 0.405
Z-Swing% 0.400
Z-Contact% (pfx) -0.398
FIP- -0.391
LOB% 0.388
exLI 0.384
gmLI 0.382
H/9 -0.374
Outside%* -0.373
FIP -0.370
SIERA -0.363
LD% 0.362
Zone% 0.362
Swing% 0.354
ERA- -0.353
BUH% -0.349
FBv 0.343
ERA -0.330
tERA -0.321
wSF -0.316
Zone% (pfx) 0.315
vFA (pfx) 0.310
Pace 0.308
SwStr% 0.305
CH% -0.304
K/BB 0.301
Heart%* 0.300

*Bill Petti and Jeff Zimmerman’s Edge%, Heart%, and Outside% are 2008-2012, and aren’t completely finalized as of this writing.

Conclusions: well, this is why opposite field rates are more predictable, I’m guessing; see how many fairly strong correlations Opposite% has.  Basically, opposite field is where popups and other fly balls tend to end up, but where not many grounders go.  Hard, frequently thrown fastballs with a good “rise” are how it’s generally done.  Pitchers who induce a lot of opposite-field contact tend to throw either over the middle of the plate or on the inside edge, per Zimmerman and Petti’s Edge% numbers.

In the “Explaining Popups” section of a Community Research article I wrote, I concluded that fooling hitters into swinging underneath pitches is the main explanation behind popups, and you’ll notice that the same factors involved in my formula there are very important to Opposite%.

Final Thoughts

Besides the likely Middle%-HR connection, the most significant thing about Pull%/Middle%/Opposite% breakdowns for pitchers is really the LD%/GB%/PU% connection they translate to.  To a defense, a batted ball’s trajectory and speed make a lot more difference to its degree of fielding difficulty than does the third of the field it was hit to (I mean, unless there’s a major shift that it’s beating).  Still, I think batted ball direction trends lend some useful context to the complex dynamics that are involved.

Hope you’ve enjoyed this over-analysis!


Steve is a robot created for the purpose of writing about baseball statistics. One day, he may become self-aware, and...attempt to make money or something?

3 years 2 months ago

Would you say that the magnitude of the effects you’ve explored is great enough that pitcher pull tendencies should be considered for infield shifts? Or is the variability there washed out by the identity of the hitter?

3 years 2 months ago

Many of the numbers you came up with pass the sniff test, and the low ISO to the Middle definitely makes sense with the longer fences, but do you think there is any connection between the extremely low BABIP to Middle and the typically better and faster defenders who patrol Center as compared to the corner spots?

3 years 2 months ago

The presence of a pitcher and two rangy middle infielders (versus lumbering cornermen) up the middle would figure to make it very difficult for groundballs and lower line drives to get through that zone for a hit, and that could explain the drop in batting average (and thus BABIP) for that zone. There are also groundballs down the foul lines that go for doubles and even triples sometimes, but a groundball double or triple is extremely rare up the middle. Similarly, the line drive and flyball doubles hit to the middle typically have to get past the centerfielder, whereas the line drive and flyball doubles down the lines don’t have to get past the corner outfielders, given that the fielders are being drawn away from second base.

3 years 2 months ago

Probably has more to do with balls which are playable by multiple outfielders – if you hit a ball way to the left of the RF it’s a double, but if you hit it way to his right the CF catches it.

3 years 2 months ago

Doing this sort of thing (multiple regression analyses, specifically) with minor league pitching prospects has revealed that the percentage of flies and liners hit to the batter’s pull-field third against a pitcher usually carries more statistical weight than their groundball/flyball/linedrive rates do in terms of slugging against on batted balls. And slugging against on batted balls (note that it does encompass BAA on batted balls as a major constituent) is very important since it makes an enormous contribution to a pitcher’s run allowance. One thing I’ve learned from this is that prospects with similar groundball, walk, and strikeout rates can be very different pitchers in terms of overall statistical outcomes when you start to examine how frequently they get pulled in the air when they do get hit in the air.

I do expect that there will be some differences at the major league level. It would be helpful if there was a way to filter out groundballs and popups before doing these sorts of analyses with Fangraphs data – I haven’t been able to accomplish this with the Fangraphs leaderboard tables and have thus relied on the Retrosheet play-by-play archives for the MLB analyses I’ve got going.

3 years 2 months ago

Why does nobody who runs regressions on fangraphs report statistical significance?

2 years 11 months ago

Hey, I know I’m pretty late to comment on this. Just wondering where on Baseball-Reference you found the pull % stats?