## Pitching Outside the Box, Literally

So long as you didn’t bring the party too hard over the Christmas holiday, there’s a chance you remember the article I submitted for the readership’s consideration last week. In said article, I roundly praised research conducted by Lookout Landing’s Jeff Sullivan this past August — research in which he explored the relationship between pitcher contact rates and strikeouts. Moreover, I posted a Top 10 Leaderboard of the starting pitchers (50 IP and up) with the best Contact%.

Well, in the comments section of said article, user Toffer Peak brought to our collective attention a study done by user matthan over at DRaysBay. Matthan is the user name of Matt Hanna, and his work is an exciting complement to Sullivan’s as it gives us some idea of the importance of Out-of-Zone Swinging Strikes (OZSwStr%) relative to In-Zone Swinging Strikes (InZSwStr%).

The relevant article provides all the answers your little heart would desire — complete with a Google spreadsheet of every pitcher from last year — but the relevant content for our purposes is this formula that Hanna concludes is the best fit for calculating Expected Strikeout Percentage (eK%). Said formula goes:

eK% = (ClStr% × 0.9) + (Foul% × 0.5) + (InPly% × −0.9) + (InZSwStr% × 1.1) + (OZSwStr% × 1.5)

The Adjusted R-Squared is: 91.4%

The surprising result here is the degree to which OZSwStr% is weighted over and above InZSwStr%. Nor does that even account for the fact that the average for OZSwStr% (4.89%) is already about twice as high as the average for InZSwStr% (2.73%).

Once we adjust for that difference as well, OZSwStr% comes out to roughly 2.5 times more important than InZSwStr%. If I’m being honest, I’ll say right now that that runs counter to what I would’ve guessed. My impression has always been, if a pitcher can throw a pitch past a swinging batter but still place said pitch within the strike zone, then he (i.e. the pitcher) would be truly unhittable. What Hanna’s research suggests is quite the opposite: A pitcher who is able to induce swings (and misses) at pitches out of the zone is, in fact, most likely to tally big strikeout numbers.
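For the arithmetically inclined, here is one way to arrive at that "roughly 2.5 times" figure. It is a sketch under an assumption: that the adjustment simply multiplies the ratio of the two formula weights by the ratio of the two league averages. The variable names are mine, not Hanna's.

```python
# Weights from the eK% formula and league averages quoted in the article.
OZ_WEIGHT, IZ_WEIGHT = 1.5, 1.1
OZ_AVG, IZ_AVG = 4.89, 2.73  # percent of all pitches

# Assumption: relative importance = (weight ratio) x (frequency ratio).
weight_ratio = OZ_WEIGHT / IZ_WEIGHT
volume_ratio = OZ_AVG / IZ_AVG
relative_importance = weight_ratio * volume_ratio

print(f"weight ratio: {weight_ratio:.2f}")
print(f"volume ratio: {volume_ratio:.2f}")
print(f"relative importance: {relative_importance:.2f}")
```

Under that reading the product lands at about 2.44.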

This research is quite relevant to the present interweb site, as FanGraphs carries both O-Contact% and Z-Contact% on every player page and in the leaderboards section.

And though, much like Forrest Gump, I’m not a smart man, I thought it might make sense to create a leaderboard in which O-Contact% (or OZSwStr%) was given its due. To that end, what follows is a Top 10 list of the starting pitchers with 50+ IP who led the league in what I’ll call Adjusted K. In fact, what I did was to find how many standard deviations all such pitchers were from the mean in both O-Contact% and Z-Contact%. I then multiplied the O-Contact% standard deviation by 2.44 and averaged it with the Z-Contact% standard deviation. Here are the results (SDO = Standard Deviations from O-Contact% mean / SDZ = Standard Deviations from Z-Contact% mean):

This list greatly resembles the one we looked at last week — with one exception, that is: Freddy Bloody Garcia. Granted, he only pitched 56 IP through nine starts last year, but it appears to be a skill he’s carried throughout his career, as his 46.5% lifetime mark suggests.
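If you'd like to rebuild the Adjusted K ranking yourself, the calculation described above can be sketched roughly as follows. Several things here are assumptions on my part: the sign convention (positive means less contact than average), the use of the population standard deviation, and the sample rows, which are made up purely to show the mechanics and are not real 2009 figures.

```python
from statistics import mean, pstdev

O_WEIGHT = 2.44  # multiplier the article applies to the O-Contact% score


def sd_below_mean(values):
    """Standard deviations BELOW the mean (positive = less contact allowed)."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    return [(mu - v) / sigma for v in values]


def adjusted_k(pitchers):
    """pitchers: list of (name, o_contact_pct, z_contact_pct) tuples."""
    names = [p[0] for p in pitchers]
    sdo = sd_below_mean([p[1] for p in pitchers])
    sdz = sd_below_mean([p[2] for p in pitchers])
    # Weight the O-Contact% score by 2.44, then average it with the Z score.
    rows = [(n, (O_WEIGHT * o + z) / 2) for n, o, z in zip(names, sdo, sdz)]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical inputs, not real pitchers.
sample = [
    ("Pitcher A", 50.0, 88.0),
    ("Pitcher B", 60.0, 85.0),
    ("Pitcher C", 70.0, 91.0),
]
for name, score in adjusted_k(sample):
    print(f"{name}: {score:+.2f}")
```

Swapping in the sample standard deviation would rescale every score by the same factor, so the ordering would not change.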

So that’s one thing. Now here’s another question of some interest, I think: Which pitchers posted the biggest O-Contact%/Z-Contact% splits in 2009? In other words, which pitchers are best at getting swinging strikes outside the strike zone despite allowing somewhat frequent contact within it? Truly, this would be a list of pitchers who use their talents most efficiently, getting swings and misses outside of the zone, where they are more valuable. Here’s a list of such pitchers (SD O-Z = Standard Deviation of O-Contact% minus the standard deviation of Z-Contact%):

Some of those guys are what you might describe as a pretty big deal. Carpenter and Wainwright, certainly, were at least part of the Cy Young convo in the National League — and both accomplished the feat while conceding a below-average contact rate on balls in the strike zone.
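The split itself is simple to compute once both contact rates are expressed in standard deviations. A minimal sketch, again assuming that positive scores mean less contact than average and using invented rows rather than real 2009 data:

```python
from statistics import mean, pstdev


def sd_below_mean(values):
    """Standard deviations below the mean; positive means less contact allowed."""
    mu, sigma = mean(values), pstdev(values)
    return [(mu - v) / sigma for v in values]


def o_z_split(pitchers):
    """pitchers: list of (name, o_contact_pct, z_contact_pct) tuples."""
    names = [p[0] for p in pitchers]
    sdo = sd_below_mean([p[1] for p in pitchers])
    sdz = sd_below_mean([p[2] for p in pitchers])
    # SD O-Z: out-of-zone score minus in-zone score.
    splits = [(n, o - z) for n, o, z in zip(names, sdo, sdz)]
    return sorted(splits, key=lambda row: row[1], reverse=True)


# Hypothetical inputs, not real pitchers.
sample = [
    ("Pitcher A", 50.0, 92.0),  # misses bats out of the zone, hittable in it
    ("Pitcher B", 60.0, 88.0),
    ("Pitcher C", 70.0, 84.0),
]
for name, split in o_z_split(sample):
    print(f"{name}: {split:+.2f}")
```

A large positive split flags the profile described above: whiffs on pitches off the plate paired with ordinary or worse contact rates on pitches in the zone.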

There are certainly other questions to ask about this work. I’m in Paris right now, though, so I’m probably not gonna ask them for at least a couple days.


Carson Cistulli occasionally publishes spirited ejaculations at The New Enthusiast.

### 19 Responses to “Pitching Outside the Box, Literally”

1. Brendan says:

A lot of these guys have a pretty big hammer (especially Wainwright and Carpenter), which often crosses several planes of the strike zone and may end up in the dirt or outside of the zone.

With that basic thought, is there any correlation with which pitch specifically they get the most swings on outside the zone? Making a very basic assumption, I would hypothesize that curveballs and forkballs/changeups would be the pitches swung on and missed outside the zone the most.

2. Brendan says:

Unless, of course, the guy throws 100 mph. Then reaction time may cause hitters to guess much more on fastballs, allowing the pitcher to stretch the zone.

3. Scottwood says:

Has there ever been a formula developed for expected BB%? That would have a lot of value, as well.

4. gnomez says:

Craig Stammen? Really?

5. geo says:

I find it interesting that the list of pitchers who induce the most misses out of the zone (the first list) are said to be those that will get the most strikeouts, yet of the 11 qualifying pitchers who had K/9 rates of 9.0 or above in 2009, only three are included (Lincecum, de la Rosa, and Vazquez…Gonzalez’s rate was above 9.0 but he doesn’t qualify…And Garcia, in fact, does not have particularly impressive K/9 rates throughout his career.) How do we explain the strikeouts of the other top guns?

6. Git 'er Dunn says:

Very interesting study, but I wonder about a couple things. I thought pitch f/x data were only available from 2007 and on. What are the O-Contact and Z-Contact %s based on for previous years? Also, I was under the impression 2007 and 2008 pitch f/x data were largely unreliable since they were still calibrating the cameras and normalizing the data across the various stadiums?

I’m also curious about who’s included in the study: all the years for any pitcher that qualified in any year, or only those pitchers that qualified each and every year?

7. Sean says:

Warning: this post is coming to you from intuitionville.

That said, it makes sense to me that guys with high OZSwStr% are also the big K guys. The point about being able to fool guys in the strike zone is well made, but as you can see, InZSwStr% is already damn near half the OZSwStr%. That says to me that in the MLB it’s really hard to consistently throw pitches inside the zone and not get hit.

Also, if you can get guys to chase your pitches you are being rewarded while limiting risk. Unless you’re Vlad you probably aren’t going to hit balls out of the zone with any authority.

As for the guys with high OZSwStr% and high Z-Contact%, I’m willing to bet they simply struggle with their control. A guy like AJ Burnett is a perfect example of this: he can throw you that nasty curveball or, if you are Chase Utley, he could throw you a fastball right down the middle in the W.S.

8. Matt Hanna says:

Scottwood,

I was the one Carson referred to in his article. I looked into walks as well. The preliminary research found that the correlation between “pitch result” variables (called strike, in-zone swinging strike, etc.) and walks was a fair amount weaker than for strikeouts. However, the R-Squared for that was still in the 70s if I recall correctly. I just ended up focusing more time on the strikeouts, but I’ll be more than happy to revisit it.

Carson brings up a good point. However the key is to get guys to swing outside the zone. That is highly intuitive as a poster mentioned. A pitcher that has the tools to get hitters to swing outside the zone is a pitcher that is going to rack up the strikeouts.

To get the data I simply used FanGraphs and StatCorner and essentially went back as far as I could go. I used qualified pitchers as my sample; however, there were a few discrepancies in the sample due to differences between FG and SC.

My motivation for looking at this is that far too often we just use real-life K or BB rates without question. We toss them into formulas that spit out great statistics, but we never question whether the component going into that great statistic is due to luck. For example, we have xFIP, but I also wondered how those numbers would look if we had a way to modernize the K or BB component.

I’m hoping that one day we will be able to look at a pitcher and say if you throw X pitch type at Y location Z% more per game then this is the expected result on your K/BB/HR/GB/FB rates. I think we have nearly all of the information we need to make a good educated guess at answering that question.

Also Geo, you may be interested in one of my spreadsheets. This has only first half’ish 2009 data, and it may need some formatting. However, you can see there is a strong relationship between OZSwStr and K’s. Don’t forget there are other components as well. In this data (first half 2009), Mariano Rivera wasn’t a beast when it came to OZSwStr, but was great in terms of Called Strikes and Fouls. Those were the foundation of his K’s. Of course, this study also shows that if he could somehow convert those Fouls into Swinging Strikes, we’d expect his K’s to climb.

http://spreadsheets.google.com/ccc?key=tbL0hwKx5z8WY1TgmHu2Qww

Brendan,

I’d love to throw pitch type, combined with the pitch result of that type, into a giant pot and see what spits out. It is easy to say a pitcher should convert fouls into swinging strikes. Implementation would be far more practical if we could see the relationship with pitch type.

• Sandy Kazmir says:

Matt,

Carson’s post prior to this one inspired me to re-examine the work you did and to update for the entire season. Click the following attachment to see total 2009 for all pitchers at 300+ xOuts.

http://spreadsheets.google.com/ccc?key=0AhdYS83t3IB7dG96cEZycy1Jc1pVRjVxSWdWQUZQcVE&hl=en

I’ve also taken the time to include FIP, xFIP, K%, tRA*, and wOBA. Here’s my post about it that includes the link to the .xls

• Scottwood says:

Thanks for doing that. That is a great collection of stats. Only 2 of the top 10 pitchers in eK% were AL starters. And just 3 of the top 20. Quite an impressive display by Verlander, Lester and Greinke.

• Sandy Kazmir says:

Thanks Scottwood, as mentioned, these are not league nor park adjusted so the NL should look better, comparatively.

9. dyross says:

Hello,

Correct me if I’m wrong, but I believe this article misses something important:

OZSwStr% is a much different stat than OContact% (or 1 – OContact%). OZSwStr% is the percent of a pitcher’s pitches that result in a swinging strike on a ball; (1 – OContact%) is how many of a pitcher’s pitches outside of the zone THAT ARE SWUNG AT are swinging strikes.

I think OZSwStr% is a much more important stat, because it is the pitcher’s skill in producing these events. Pitcher X could have a very low OContact% because he throws all his balls a foot out of the zone, but his OZSwStr% would not be high because few would chase such wild pitches. I think if you redid those leaderboards using OZSwStr% instead of (OContact%), the results would be significantly different.

Bud Norris, for example, fits the description of Pitcher X. His OContact% is well lower than league average, so he shows up second on the first leaderboard. Looking at his K/9 of 6.99, we might think he was unlucky in racking up so few strikeouts, and therefore, he might be a potential breakout candidate. However, his OSwing% is well below average as well, causing his OZSwStr% to be more pedestrian, and therefore his expected K/9 is closer to that 6.99.
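To make the distinction concrete, here is a small sketch with two invented pitchers (the numbers are hypothetical, not anyone's real line). It assumes OZSwStr% can be rebuilt from O-Swing%, Zone% and O-Contact% as O-Swing% × (1 − Zone%) × (1 − O-Contact%), which matches the per-pitch definition above; the function name is mine.

```python
def oz_swstr(o_swing, zone, o_contact):
    """Out-of-zone swinging strikes as a share of ALL pitches (inputs as fractions)."""
    oz_swings = o_swing * (1 - zone)      # swung at and out of the zone, per pitch
    return oz_swings * (1 - o_contact)    # of those, the ones that were missed


# Hypothetical "Pitcher X": great O-Contact%, but hitters rarely chase.
pitcher_x = oz_swstr(o_swing=0.12, zone=0.50, o_contact=0.40)

# Hypothetical "Pitcher Y": worse O-Contact%, but hitters chase far more often.
pitcher_y = oz_swstr(o_swing=0.30, zone=0.50, o_contact=0.55)

print(f"Pitcher X OZSwStr%: {pitcher_x:.2%}")
print(f"Pitcher Y OZSwStr%: {pitcher_y:.2%}")
```

Pitcher X wins on O-Contact% alone, yet Pitcher Y produces nearly twice as many out-of-zone whiffs per pitch thrown, which is the quantity the eK% formula actually weights.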

• Carson Cistulli says:

You’re totally right. My bad. What I’ve submitted above presupposes a constant O-Swing% among all pitchers — which, obviously, that’s not the case. From just the sample I picked, there’s a range of 12.30% (Sidney Ponson) to 32.80% (Hiroki Kuroda). Certainly, pitching outside of the zone in such a way as to induce swings in the first place — that’s important.

Anyway, I re-ran the numbers to see how things would be different. Because Matt did it for all pitches (see his post below), I thought it might be interesting to look at just the OZ pitches that were whiffs. In one way, it makes sense: The idea is to trick the batters but to do it efficiently. The Top 10 list looks like this:

Rich Harden (15.80%)
Felipe Paulino
Chad Billingsley
Freddy Garcia
Ryan Dempster
John Smoltz
Billy Buckner
Javier Vazquez
Tim Lincecum
Randy Johnson (13.59%)

Again, that’s a list of pitchers who (a) threw a pitch out of the zone, (b) induced a swing with said out-of-zone pitch, and (c) received a whiff on said swing.

Next week, I’ll do a version weighted with in-zone swings and misses.

Sorry to’ve screwed up so royally. Thanks for being a reasonable critic.

10. Matt Hanna says:

Nice post, Dyross. I defined OZSWSTR as the % of the TOTAL pitches thrown that resulted in an out-of-zone swinging strike. I was not looking at the % of out-of-zone pitches that a hitter swung at and missed. Essentially I wanted to convert all the numbers to a % of total pitches thrown, so in theory all the variables would add up to 100%. I also looked at it the other way, and I included things such as first-pitch strikes. I just ended up leaning towards converting all the variables to a % of total pitches thrown. Doing so made me toss out things like first-pitch strikes.

To get the number of swinging strikes out of the zone as a share of total pitches thrown, some math is involved. Off the top of my head, you start with Zone% and do (1 − Zone%) to get OZ pitches. Then you find the number of pitches (as a % of total pitches) that were swung at outside the zone: OZSW × (1 − Zone%), where OZSW is O-Swing%. Now you can do O-Contact% × (OZSW × (1 − Zone%)) to get how many balls out of the zone were hit. Then you subtract that from how many balls were swung at outside the zone, and you have OZSWStrikes.

If you wanted, you could convert everything to a # of pitches and work it out that way as well.

Essentially my goal was to get all the variables to add up to 100. For example:

(OZCON + INZCON+ OZSWSTR + IZSWSTR + CLSTR + Ball = 100%)
(OZSWSTR + IZSWSTR+ FOUL + INPLAY + CLSTR+ Ball = 100%)
(OZ + CLSTR + INZSWSTR + INZCON = 100%)

etc…

It isn’t going to add up to 100% perfectly, but it should be within +/- 1%. FanGraphs and StatCorner have slight discrepancies.

For example using AJ Burnett (I used him since it was the easiest to sort)..his 2009 numbers ended up like this…

To get OZSWSTRIKE:

| O-Swing% | Zone% | O-Contact% |
| --- | --- | --- |
| 22.10% | 49.90% | 51.10% |

OZSWING = 22.10% × (1 − 49.90%) = 11.07%
OZCON = OZSWING × 51.10% = 5.66%
OZSWSTR = OZSWING − OZCON = 5.41%

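| OZCON | INZCON | OZSWSTR | IZSWSTR | ClStr% | Ball% |
| --- | --- | --- | --- | --- | --- |
| 5.66% | 28.21% | 5.41% | 3.03% | 19% | 39% |

| OZSWSTR | IZSWSTR | Foul% | InPly% | ClStr% | Ball% |
| --- | --- | --- | --- | --- | --- |
| 5.4% | 3.0% | 16.8% | 17.1% | 18.8% | 38.8% |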

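If you add the first line up it comes out to 99.91%; the second, 99.94%. (Those sums use the unrounded values, e.g. 18.8% and 38.8% rather than 19% and 39%, and 5.41% and 3.03% rather than 5.4% and 3.0%.)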
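The Burnett arithmetic can be checked with a few lines of code. This is only a sketch: the inputs are the figures quoted in this comment, the variable names are mine, and ClStr% and Ball% are taken at one decimal (18.8 and 38.8) on the assumption that the 19% and 39% in the first row are rounded versions of the same numbers.

```python
# A.J. Burnett's 2009 inputs as quoted in the comment (as fractions of 1).
O_SWING, ZONE, O_CONTACT = 0.2210, 0.4990, 0.5110

oz_swing = O_SWING * (1 - ZONE)   # out-of-zone swings per pitch thrown
oz_con = oz_swing * O_CONTACT     # out-of-zone contact per pitch thrown
oz_swstr = oz_swing - oz_con      # out-of-zone whiffs per pitch thrown

# Remaining components in percent, copied from the two rows in the comment.
IZ_CON, IZ_SWSTR = 28.21, 3.03
FOUL, IN_PLAY = 16.8, 17.1
CL_STR, BALL = 18.8, 38.8  # assumption: unrounded forms of 19% and 39%

# Two different ways of slicing every pitch; each should total about 100%.
row_contact = oz_con * 100 + IZ_CON + oz_swstr * 100 + IZ_SWSTR + CL_STR + BALL
row_outcome = oz_swstr * 100 + IZ_SWSTR + FOUL + IN_PLAY + CL_STR + BALL

print(f"OZCON   = {oz_con:.2%}")
print(f"OZSWSTR = {oz_swstr:.2%}")
print(f"contact-based row sums to {row_contact:.2f}%")
print(f"outcome-based row sums to {row_outcome:.2f}%")
```

Both totals fall within the +/- 1% tolerance mentioned above.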

His K% was 21.79%

Using the formula below (which uses InPlay and Foul instead of INZCON and OZCON) you’ll get an…

eK% = (ClStr% × 0.9) + (Foul% × 0.5) + (InPly% × −0.9) + (InZSwStr% × 1.1) + (OZSwStr% × 1.5)

eK% = 21.38%
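For anyone who wants to plug in other pitchers, the formula translates directly into a short function. The coefficients are the ones quoted above; the dictionary keys and function name are hypothetical labels of mine, and the inputs are Burnett's 2009 rates from this comment, in percent.

```python
# Coefficients from the eK% formula quoted above.
WEIGHTS = {
    "cl_str": 0.9,     # called strikes
    "foul": 0.5,       # fouls
    "in_play": -0.9,   # balls in play
    "iz_swstr": 1.1,   # in-zone swinging strikes
    "oz_swstr": 1.5,   # out-of-zone swinging strikes
}


def expected_k_pct(rates):
    """Weighted sum of pitch-result rates, each a percent of total pitches."""
    return sum(WEIGHTS[key] * rates[key] for key in WEIGHTS)


burnett_2009 = {
    "cl_str": 18.8,
    "foul": 16.8,
    "in_play": 17.1,
    "iz_swstr": 3.03,
    "oz_swstr": 5.41,
}

ek = expected_k_pct(burnett_2009)
print(f"eK% = {ek:.2f}%  (actual K% was 21.79%)")
```

With the two-decimal swinging-strike values this comes out at about 21.38, in line with the figure above; using the rounded 5.4 and 3.0 instead gives roughly 21.33.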

• Scottwood says:

Do we have enough data to guess whether or not this is a predictive stat?

If it is predictive and if we could develop an expected BB%, then we would really be on to something.

• David Ross says:

I also think it would be interesting to see which peripherals affect K/BB instead of just K%. I surmise that while high OZSwStr% may lead to higher K%, IZSwStr% could be the clincher to real dominance. Just a thought.

• joser says:

The peripherals that affect K% may be different (or at least differently-weighted) than those that affect BB%, making it more difficult to create predictions for the combination of K/BB. Combining two stats in some cases necessitates an increase in sample size for the same level of confidence as well.

• David Ross says:

Although I agree that it might require a larger sample size, I think it could have compelling results. It’s possible that some peripherals increase K% and BB% simultaneously, while others help just K%. In this case, the latter type would be much more predictive of effectiveness.

11. David Ross says:

Hey Matt,

I noticed that your article used OZSwStr% but the above post used OContact%, which is why I made the above comment. I think your results are quite impressive; it would be great to see a followup with a better expected FIP based on these peripherals. Even though you threw it out, a .7 R^2 for BB% is very compelling, and I would be interested in helping to come up with better formulas for both, and making a leaderboard of players whose peripherals underperformed their, well, peripherals’ peripherals.

Happy Holidays,

Dave