An Unsolicited Follow-Up Study of Pull%

I’m always looking for new angles to unlock the mysteries of BABIP, so I was intrigued by Jeff Sullivan’s exploration of pull rates against pitchers.  So I grabbed the data from baseball-reference.com, and set to work subjecting it to my usual rigmarole of correlations and multiple regressions.  You know how they say if your only tool is a hammer, everything looks like a nail to you?  Well, plug your ears — there’s about to be a lot of wild, uncontrolled pounding going on in here…

I’ll cut right to the chase – did I find anything interesting relating to pitchers’ overall effectiveness when it comes to their Pull%, Middle%, and Opposite%, as I’m calling them?  Well, I found one decent connection that will seem obvious and stupid after you think about it, and a slight but kind of interesting connection.  I’ll provide you with some correlation tables that have left few stones unturned.  But, mainly, the research might help to set some things straight about how important this stuff actually is for pitchers.

My sample was composed of the career numbers of active pitchers, with a semi-arbitrary minimum of 623 batted balls to analyze per pitcher (long story… not important) (n=244).  Remember that an “AB” in the context of this study refers only to ABs that resulted in batted balls hit against each pitcher, including home runs (but not uncaught foul balls).  That means if I’m talking about a batting average, it refers to the batting average on batted balls, not the pitcher’s overall batting average against.  It also means strikeouts are out of the equation here.  So are walks, so don’t you worry about on-base rates.  And of course, it’s also going to mean that AVG and SLG, like Pablo Sandoval, are going to be a lot bigger than you’re used to seeing in baseball.
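For anyone who wants the definitions spelled out, here is a minimal Python sketch of the batted-ball rate stats described above. The function name and the example counts are made up for illustration. The BABIP line assumes the simple form (H - HR) / (AB - HR) with sacrifice flies ignored, which may not match the source data's exact bookkeeping.

```python
def batted_ball_rates(ab, singles, doubles, triples, hr):
    """Rate stats on batted balls only (strikeouts and walks never enter).

    `ab` counts only at-bats that ended with a batted ball, home runs included.
    Assumption: BABIP = (H - HR) / (AB - HR), ignoring sacrifice flies.
    """
    hits = singles + doubles + triples + hr
    total_bases = singles + 2 * doubles + 3 * triples + 4 * hr
    avg = hits / ab
    slg = total_bases / ab
    return {
        "AVG": avg,
        "SLG": slg,
        "ISO": slg - avg,                  # isolated power
        "BABIP": (hits - hr) / (ab - hr),  # home runs removed top and bottom
    }

# Hypothetical pitcher with 1,000 batted-ball ABs (invented counts)
rates = batted_ball_rates(ab=1000, singles=220, doubles=60, triples=5, hr=35)
```

Because every AB here ends in contact, AVG and SLG come out well above the familiar overall figures.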

Some Context

As Jeff said, hits that are pulled get hit a lot harder than those that aren’t.  Jeff explained that this has to do with the bat speed being greater towards the end of the swing, but it probably also has to do with bat angles causing more fly balls to the opposite field, as explained here.  As you probably know by now, fly balls have a lower batting average on balls in play, generally being easier putouts (other than home runs, which are removed from the equation for the purposes of BABIP).  I believe there’s even more to it than bat speed and angle, which I’ll get to near the end.  Anyway, here are the results from my sample:

Hit Direction  BABIP  AVG    SLG
Pulled         0.361  0.408  0.726
Middle         0.265  0.285  0.420
Opposite       0.296  0.323  0.507

So it seems balls hit up the middle of the field have the worst results for hitters.  This probably has to do with both defensive alignments and the depth of center fields relative to left and right fields.

It’s Correlation Time

So, yes, it matters where the individual ball goes.  Does that mean you can say “X is a ‘pull pitcher,’ so his BABIP should be higher”?   How much does hit location really matter to a pitcher’s overall results, in the grand scheme of things?  To take a deeper look at that, let’s look at the correlation table for my whole sample:

           BABIP     AVG     SLG     ISO   Pull%    Mid%    Opp%
BABIP
AVG        0.939
SLG        0.398   0.679
ISO       -0.025   0.304   0.906
Pull%     -0.147  -0.039   0.229   0.320
Mid%       0.218   0.070  -0.271  -0.392  -0.508
Opp%      -0.043  -0.022   0.003   0.016  -0.610  -0.373
ROE/AB     0.181   0.083  -0.176  -0.276  -0.028   0.189  -0.143

Before we make too much of that, let’s also look at one where the minimum batted balls for inclusion have been raised to 2000 (n=114):

           BABIP     AVG     SLG     ISO   Pull%    Mid%    Opp%
BABIP
AVG        0.911
SLG        0.342   0.686
ISO       -0.019   0.375   0.932
Pull%     -0.008   0.132   0.291   0.304
Mid%       0.078  -0.157  -0.439  -0.482  -0.494
Opp%      -0.067   0.015   0.126   0.153  -0.547  -0.457
ROE/AB     0.117   0.006  -0.236  -0.304  -0.038   0.196  -0.151
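Tables like the two above can be reproduced with a few lines of code. The sketch below assumes plain, unweighted Pearson correlations across pitchers and a simple minimum-batted-balls filter. The function names, the "batted_balls" key, and the four data rows are invented for illustration and are not the actual data.

```python
from math import sqrt

def pearson(xs, ys):
    """Plain (unweighted) Pearson correlation of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def correlation_table(pitchers, stats, min_batted_balls):
    """Lower-triangle correlations among `stats` for qualifying pitchers.

    `pitchers` is a list of dicts holding a "batted_balls" count plus one
    entry per stat. Returns (n, {(row_stat, col_stat): r}).
    """
    sample = [p for p in pitchers if p["batted_balls"] >= min_batted_balls]
    table = {}
    for i, row in enumerate(stats):
        for col in stats[:i]:
            table[(row, col)] = pearson([p[row] for p in sample],
                                        [p[col] for p in sample])
    return len(sample), table

# Made-up rows, purely to show the call
pitchers = [
    {"batted_balls": 700,  "ISO": 0.170, "Mid%": 0.33},
    {"batted_balls": 2500, "ISO": 0.150, "Mid%": 0.36},
    {"batted_balls": 3100, "ISO": 0.185, "Mid%": 0.31},
    {"batted_balls": 4000, "ISO": 0.160, "Mid%": 0.35},
]
n, table = correlation_table(pitchers, ["ISO", "Mid%"], min_batted_balls=2000)
```

Raising `min_batted_balls` shrinks n, which is the trade-off between the two tables: fewer pitchers, but each with a more reliable career line.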

Part of the effect of raising the minimum is the exclusion of one Jeremy Hellickson, the outlier among outliers.  But I think what the data suggest, overall, is that the best conclusion that can be drawn from pitchers’ Pull/Mid/Opp breakdowns is this: some pitchers have more balls hit up the middle against them, and these pitchers tend to allow lower ISO rates (ISO, or isolated power, is SLG minus AVG).  I think the main reason for this is that home runs are harder to hit to center field, due to the distances involved.

As for BABIP, I’m sorry to say it looks like a dead end, so far.  It seems to have no direct connection at all to Pull/Mid/Opp.  One thing I wanted to point out, though, is that BABIP is correlated with SLG but not ISO — this is due to BABIP’s only connection to SLG being the AVG component of SLG, which is removed in ISO.  Yet ISO is still correlated with AVG.  That’s because home runs are a part of both AVG and ISO but not BABIP.  Just something to keep in mind when you’re analyzing the trends here.

I decided to test out ROE (reached on error) data since it came with the rest of the data.  It’s a weak-ish connection, but a pretty consistent one through the different minimum AB samples I ran correlations on (I tried a few more): pitchers who have more batters reach base on error tend to allow lower ISOs.  I think the explanation is probably a pretty simple one: more ground balls = fewer extra base hits, plus more errors.  I won’t go into that anymore since it’s quite a tangent.

I ran multiple regressions to see if there might be some hidden interactions between Pull/Mid/Opp, but it appears there aren’t.
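For readers who would like to try this kind of check themselves, here is a rough sketch of a regression with an interaction term. It is an illustration under stated assumptions, not the workflow behind the results above. The function names are hypothetical, the data are synthetic, and centering the predictors before multiplying them is a choice made here to keep the arithmetic well behaved. Opp% is left out because Pull% + Mid% + Opp% = 1, so all three plus an intercept would be perfectly collinear.

```python
def ols(rows, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y.

    Solved by Gauss-Jordan elimination with partial pivoting; fine for a
    handful of predictors, not meant for ill-conditioned problems.
    """
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

def centered_design(data):
    """Rows of [1, pull - mean, mid - mean, product of the centered terms]."""
    mean_p = sum(p for p, _ in data) / len(data)
    mean_m = sum(m for _, m in data) / len(data)
    return [[1.0, p - mean_p, m - mean_m, (p - mean_p) * (m - mean_m)]
            for p, m in data]

# Synthetic (Pull%, Mid%) pairs; ISO is built with NO interaction on purpose,
# so the fitted interaction coefficient should come out at about zero.
data = [(0.35, 0.30), (0.35, 0.38), (0.45, 0.30), (0.45, 0.38), (0.40, 0.34)]
iso = [0.10 + 0.30 * p - 0.15 * m for p, m in data]
intercept, b_pull, b_mid, b_interaction = ols(centered_design(data), iso)
```

An interaction coefficient near zero (relative to its uncertainty) is what "no hidden interactions" looks like in this setup.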

Righty Batters vs. Lefty Batters

I was curious whether a pitcher’s Pull/Mid/Opp rates had any consistency between righty and lefty batters faced.  I also wanted to find out whether each hit location type got different results depending on the batter’s handedness.   The results:

                                Rate   BABIP     AVG     SLG
Righty Pull vs. Lefty Pull     0.215  -0.012   0.063   0.248
Righty Mid vs. Lefty Mid       0.359   0.168   0.140   0.010
Righty Opp vs. Lefty Opp       0.491   0.372   0.394   0.375
Righty Pull vs. Lefty Opp     -0.227   0.045  -0.066  -0.210
Righty Opp vs. Lefty Pull     -0.161  -0.011  -0.084  -0.195

Conclusions: by far the strongest connection here for pitchers overall is in their opposite field rates and results, which carry over between righty and lefty opposition better than anything else in the table.  To clarify, the results (BABIP, AVG, and SLG) are not at all influenced by the rates (e.g. Pulled Batted Balls by Righties/Total Batted Balls by Righties) in my calculations.

What’s Going on Here?

I’ve taken the liberty of running correlations between Pull/Mid/Opp and pretty much all of the rate stats FanGraphs has for pitchers, to see what odd connections might pop up.  It’s a little sloppy, since I’m matching up 2003-2012 FanGraphs data (well, 2007-2012 in the case of PitchF/X data) with career numbers, some of which extend beyond 2003.  I grovel at your feet for forgiveness for that, my masters.  I’m only giving you correlations above the 0.3 cutoff that is generally regarded as better than “weak.” Here you go:

Correlation to Pull%
Fastball% -0.472
wFB/C -0.441
FBv -0.437
vFS (pfx) -0.430
vFA (pfx) -0.425
FA% (pfx) -0.415
wFA/C (pfx) -0.407
HR/FB 0.397
wFB -0.383
FIP 0.367
CH% (pfx) 0.367
HR/9 0.356
wFA (pfx) -0.355
CH% 0.350
vCH (pfx) -0.321
FIP- 0.318
BUH% 0.313
CHv -0.312

Main conclusion: throwing hard prevents batters from pulling the ball.  Not a shocker.  Changeups get pulled.

Next:

Correlation to Middle%
PU% -0.532
IFFB% -0.522
GB% 0.482
FB% -0.472
GB/FB 0.456
Z-Contact% 0.431
wFS/C (pfx) 0.354
Z-Contact% (pfx) 0.348
SI% (pfx) 0.333
SwStr% -0.317
HR/9 -0.305
FA-Z (pfx) -0.304

You won’t find popup percentage, “PU%,” in FanGraphs’ glossaries; I introduced it last week, defining it as IFFB% * FB%, or IFFB/Batted Balls.  Pitchers with a lot of infield flies tend not to be up-the-middle types, for whatever reason.  Part of that (probably most of it) has to do with all of the foul popups that become “batted balls” when they’re caught.  That’s kind of a cause for concern: infield flies are undoubtedly making Pull and Opposite results look better than they really are.  I say that because the fact that they were popups was the only thing that made them “in play”; they’d be non-factors otherwise, like foul liners and grounders.
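The two forms of the PU% definition are the same quantity, as a quick sketch shows. It assumes IFFB% means infield flies per fly ball and FB% means fly balls per batted ball; the function name and counts are invented.

```python
def popup_pct(iffb, fly_balls, batted_balls):
    """PU% as IFFB% * FB%, which reduces to IFFB / batted balls."""
    iffb_pct = iffb / fly_balls        # IFFB%: infield flies per fly ball
    fb_pct = fly_balls / batted_balls  # FB%: fly balls per batted ball
    return iffb_pct * fb_pct           # the fly_balls terms cancel

# Hypothetical pitcher: 40 infield flies among 400 fly balls, 1,000 batted balls
pu = popup_pct(iffb=40, fly_balls=400, batted_balls=1000)
```

Here IFFB% is 10% and FB% is 40%, so PU% is 4%, the same as 40/1000.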

I think we should also conclude that beyond the obvious distance-to-the-centerfield-wall issue, the lower home run rates for “Middle” pitchers have to do with their higher ground ball rates.

Z-Contact%, by the way, goes hand-in-hand with PU%, as I pointed out here.

Moving on:

Correlation to Opposite%
FA% (pfx) 0.584
wFB/C 0.526
wFA/C (pfx) 0.495
K% 0.492
SI% (pfx) -0.487
K/9 0.484
Z-Contact% -0.477
GB% -0.471
wFB 0.460
PU% 0.453
GB/FB -0.448
IFFB% 0.436
HR/FB -0.434
Relieving 0.433
wFA (pfx) 0.429
FB% 0.425
FA-Z (pfx) 0.424
inLI 0.424
Fastball% 0.406
pLI 0.405
Z-Swing% 0.400
Z-Contact% (pfx) -0.398
FIP- -0.391
LOB% 0.388
exLI 0.384
gmLI 0.382
H/9 -0.374
Outside%* -0.373
FIP -0.370
SIERA -0.363
LD% 0.362
Zone% 0.362
Swing% 0.354
ERA- -0.353
BUH% -0.349
FBv 0.343
ERA -0.330
tERA -0.321
wSF -0.316
Zone% (pfx) 0.315
vFA (pfx) 0.310
Pace 0.308
SwStr% 0.305
CH% -0.304
K/BB 0.301
Heart%* 0.300

*Bill Petti and Jeff Zimmerman’s Edge%, Heart%, and Outside% are 2008-2012, and aren’t completely finalized as of this writing.

Conclusions: well, this is why opposite field rates are more predictable, I’m guessing; see how many fairly strong correlations Opposite% has.  Basically, opposite field is where popups and other fly balls tend to end up, but where not many grounders go.  A good “rise” on hard, frequently thrown fastballs is how it’s generally done.  Pitchers who induce a lot of opposite-field balls tend to throw either over the middle of the plate or on the inside edge, per Zimmerman and Petti’s Edge% numbers.

In the “Explaining Popups” section of a Community Research article I wrote, I concluded that fooling hitters into swinging underneath pitches is the main explanation behind popups, and you’ll notice that the same factors involved in my formula there are very important to Opposite%.

Final Thoughts

Besides the likely Middle%-HR connection, the most significant thing about Pull%/Middle%/Opposite% breakdowns for pitchers is really the LD%/GB%/PU% connection they translate to.  To a defense, a batted ball’s trajectory and speed make a lot more difference to its degree of fielding difficulty than does the third of the field it was hit to (I mean, unless there’s a major shift that it’s beating).  Still, I think batted ball direction trends lend some useful context to the complex dynamics that are involved.

Hope you’ve enjoyed this over-analysis!




Steve is a robot created for the purpose of writing about baseball statistics. One day, he may become self-aware, and...attempt to make money or something?


22 Responses to “An Unsolicited Follow-Up Study of Pull%”

  1. rusty says:

    Would you say that the magnitude of the effects you’ve explored is great enough that pitcher pull tendencies should be considered for infield shifts? Or is the variability there washed out by the identity of the hitter?

    • Steve Staude. says:

      Great question. Well, I haven’t given this enough research to say how much of a role the hitter plays in it all, but that’s something I may have to look into.

      I think teams should probably just keep doing what I’m sure they’re already doing — looking at the pitcher’s and batter’s spray charts. All I can say is that it probably makes less sense to use an extreme pull shift when the pitcher is a flamethrower with fly ball tendencies.

  2. Tim says:

    Many of the numbers you came up with pass the sniff test, and the low ISO to the Middle definitely makes sense with the longer fences, but do you think there is any connection between the extremely low BABIP to Middle and the typically better and faster defenders who patrol Center as compared to the corner spots?

    • Steve Staude. says:

      Great point — I’m sure there’s a lot of truth to that, when you’re looking at an individual batted ball. It probably also has to do with the infield defense being generally more clustered towards the middle.

      However, the overall trend (though extremely weak) is kind of the opposite — pitchers who give up more batted balls up the middle tend to allow higher BABIPs. I think that just means that there are factors that are a lot more important to their BABIP than where balls get hit against them, so the whole Pull/Mid/Opp effect is getting drowned out.

    • reillocity says:

      The presence of a pitcher and two rangy middle infielders (versus lumbering cornermen) up the middle would figure to make it very difficult for groundballs and lower line drives to get through that zone for a hit, and that could explain the drop in batting average (and thus BABIP) for that zone. There are also groundballs down the foul lines that go for doubles and even triples sometimes, but a groundball double or triple is extremely rare up the middle. Similarly, the line drive and flyball doubles hit to the middle typically have to get past the centerfielder, whereas the line drive and flyball doubles down the lines don’t have to get past the corner outfielders, given that the fielders are being drawn away from second base.

    • Tim says:

      Probably has more to do with balls which are playable by multiple outfielders – if you hit a ball way to the left of the RF it’s a double, but if you hit it way to his right the CF catches it.

  3. reillocity says:

    Doing this sort of thing (multiple regression analyses, specifically) with minor league pitching prospects has revealed that the percentage of flies and liners hit to the batter’s pull-field third against a pitcher usually carries more statistical weight than their groundball/flyball/linedrive rates do in terms of slugging against on batted balls. And slugging against on batted balls (note that does encompass BAA on batted balls as a major constituent) is very important since it has an enormous contribution to a pitcher’s run allowance. One thing I’ve learned from this is that prospects with similar groundball, walk, and strikeout rates can be very different pitchers in terms of overall statistical outcomes when you start to examine how frequently they get pulled in the air when they do get hit in the air.

    I do expect that there will be some differences at the major league level. It would be helpful if there was a way to filter out groundballs and popups before doing these sorts of analyses with Fangraphs data – I haven’t been able to accomplish this with the Fangraphs leaderboard tables and have thus relied on the Retrosheet play-by-play archives for the MLB analyses I’ve got going.

    • Steve Staude. says:

      Very interesting. I haven’t worked with minor league data before, but am I correct in saying the talent spread in pitching there is much greater? More distinct BABIP differences and whatnot?

      Anyway, do you have any ideas why it is that the pull fly balls are more significant than the other ones, for example (is that what you’re saying)?

      I’m definitely with you on wishing we had more detailed batted ball breakdowns. Hopefully people will realize that batted ball data is underappreciated, and that will get the ball rolling.

      • reillocity says:

        One thing that’s different about the minors is that the data is spread out across multiple levels, each of which has its own nuances. As you were getting at, the major league data is more condensed down to things like ballpark effects (I’m not sure that minor league pitchers’ BABIPs vary as much with their skill as they do with the things out of their control). I’m finding that a third of the variation in minor league pitchers’ slugging against on batted balls is environment (levels+parks+opponents), a third is batted ball stuff (variants of flyball/linedrive/groundball rates and the aerial pull rate), and a third is unexplainable or yet to be explained (BABIP luck, defense, and the like).

        The HR directions in the majors and minors seem a bit different; in 2012 the MLB HRs went 67%/23%/11% in terms of Pull/Center/Oppo (this you can do at Fangraphs) whereas a subset of the 2005-2012 minors data I’ve looked at suggests that the ratio may be closer to 73%/13%/14% in the minors (due to closer corner fences perhaps, or deeper CFs).

  4. Corey says:

    Why does nobody who runs regressions on fangraphs report statistical significance?

    • Steve Staude. says:

      Honestly, I don’t know the correct n to use. It can’t be as simple as the number of players in the sample, as that completely disregards the playing time per player. So if you can tell me how it’s done, I’ll gladly do that.

      • Corey says:

        n is the number of players in your sample, and that should be reported as well. Statistical significance is about the p-value: the probability of seeing a correlation at least as strong as yours if there were really no correlation and only random noise. You should get a chart when you run the regression (I have no idea what program you’re using) that looks something like this (please format reasonably!)

        Variable Name Coefficient T P

        If the absolute value of T is greater than about 1.96, then P will be less than 0.05, which means a correlation that strong (seen in the coefficient line) would turn up less than 5% of the time from random noise alone. Giving a list of correlations provides almost no information unless you provide some indication of which ones are statistically significant.

  5. Steve Staude. says:

    In my defense, I did report in the third paragraph that “(n=244)”.

    My issue is that treating a pitcher with 600 batted balls the same as one with 6,000 as far as confidence testing goes doesn’t seem right to me. The confidence levels of the individual samples have to be taken into account somehow.

    I mean, which would you take more seriously: a sample of 300 pitchers with an average of 10 innings pitched, or one with 100 pitchers with an average of 100 innings?

    • Steve Staude. says:

      Whoops, that was to Corey, sorry.

      • Corey says:

        Well, while I think sample size is also important, my real complaint was the lack of any indicator of statistical significance of your results. I think the only solution to your point about treating people with few plate appearances or innings the same as people with lots of them is to eliminate people with few plate appearances or innings from the dataset. Maybe a more sophisticated statistician than myself has a more elegant solution to that problem; it’s just that you need to indicate whether your results are statistically meaningful. Sample size is of course related to statistical significance but you’ve really got to report both.

        • Steve Staude. says:

          Yeah, I’ve taken enough stats classes to know the importance of significance, but this seems like a special case to me, as the applications I learned it for had samples that were on equal footing in terms of how reliable they were. Anyway, I’d rather not report it at all than to list something misleading or incorrect.

          I’ve used IP-weighted correlations before in an article. Maybe something like that’s the solution… and then I can just do the standard significance testing on those results. It’s a pain to do, though.

        • Tim says:

          Probably you should bin the innings into congruent samples and then report n from there. So if you’re combining stats from one pitcher who threw 60 innings, one who threw 180, and one who threw 240, your n is 8 bins of 60 innings, even though they were only thrown by three real-life pitchers.

          (This is assuming that you can’t just treat it as 480 innings thrown by a single frankenpitcher, and I think you might be able to get away with that. I’m not convinced that any individual pitcher’s variance matters at all.)

        • Steve Staude. says:

          That makes a lot of sense to me, Tim. That would put everything on equal footing. The problem for me at this point is implementation — I use Excel, and I have no idea how I’d do that efficiently in an automated fashion (duplicating a pitcher’s rates in x number of rows based on their number of batted balls). Would it take some fancy VBA stuff? Does anybody have any ideas, or alternatives to Excel that could accomplish that?

          What you’ve said is making me wonder: even if I use weighted correlations instead of your method, perhaps the n should be based on a least common denominator number of batted balls (or IP), as in your method.

  6. bosoxbro says:

    Hey, I know I’m pretty late to comment on this. Just wondering where on Baseball-Reference you found the pull % stats?
