FanGraphs Baseball



  1. A lot of these guys have a pretty big hammer (especially Wainwright and Carpenter), which often crosses several planes of the strike zone and may end up in the dirt or outside of the zone.

    With that basic thought, is there any correlation showing which pitch specifically gets the most swings outside the zone? Making a very basic assumption, I would hypothesize that curveballs and forkballs/changeups are the pitches swung at and missed outside the zone the most.

    Comment by Brendan — December 29, 2009 @ 5:21 pm

  2. Unless, of course, the guy throws 100 mph; then reaction time may cause hitters to guess much more on fastballs, allowing the pitcher to stretch the zone.

    Comment by Brendan — December 29, 2009 @ 5:23 pm

  3. Has there ever been a formula developed for expected BB%? That would have a lot of value, as well.

    Comment by Scottwood — December 29, 2009 @ 8:32 pm

  4. Craig Stammen? Really?

    Comment by gnomez — December 30, 2009 @ 12:12 am

  5. I find it interesting that the pitchers who induce the most misses out of the zone (the first list) are said to be those who will get the most strikeouts, yet of the 11 qualifying pitchers who had K/9 rates of 9.0 or above in 2009, only three are included: Lincecum, de la Rosa, and Vazquez. (Gonzalez’s rate was above 9.0 but he doesn’t qualify, and Garcia, in fact, does not have particularly impressive K/9 rates throughout his career.) How do we explain the strikeouts of the other top guns?

    Comment by geo — December 30, 2009 @ 10:14 am

  6. Very interesting study, but I wonder about a couple things. I thought pitch f/x data were only available from 2007 and on. What are the O-Contact and Z-Contact %s based on for previous years? Also, I was under the impression 2007 and 2008 pitch f/x data were largely unreliable since they were still calibrating the cameras and normalizing the data across the various stadiums?

    I’m also curious about who’s included in the study: all the years for any pitcher that qualified in any year, or only those pitchers that qualified each and every year?

    Comment by Git 'er Dunn — December 30, 2009 @ 11:28 am

  7. Warning: this post is coming to you from intuitionville.

    That said, it makes sense to me that guys with high OZSwStr% are also the big K guys. The point about being able to fool guys in the strike zone is well made, but as you can see it’s already damn near half the OZSwStr%. That says to me that in MLB it’s really hard to consistently throw pitches inside the zone and not get hit.

    Also, if you can get guys to chase your pitches, you are being rewarded while limiting risk. Unless you’re Vlad, you probably aren’t going to hit balls out of the zone with any authority.

    As for the guys with high OZSwStr% and high Z-Contact%, I’m willing to bet they simply struggle with their control. A guy like AJ Burnett is a perfect example of this: he can throw you that nasty curveball or, if you are Chase Utley, he could throw you a fastball right down the middle in the W.S.

    Comment by Sean — December 30, 2009 @ 12:17 pm

  8. Scottwood,

    I was the one that Carson referenced in his article. I also looked into walks. The preliminary research found that the correlation between “pitch result” variables (called strike, in-zone swinging strike, etc.) and walks was a fair amount lower than for strikeouts. However, the R-squared for that was still in the 70s, if I recall correctly. I just ended up focusing more time on the strikeouts, but I’ll be more than happy to revisit it.

    Carson brings up a good point. However, the key is to get guys to swing outside the zone. That is highly intuitive, as a poster mentioned. A pitcher that has the tools to get hitters to swing outside the zone is a pitcher that is going to rack up the strikeouts.

    To get the data I simply used FanGraphs and StatCorner and essentially went back as far as I could go. I used qualified pitchers as my sample; however, there were a few discrepancies in the sample due to differences between FG and SC.

    My motivation for looking at this is that far too often we just use real-life K or BB rates without question. We toss them into formulas that spit out great statistics, but we never question whether the component going into that great statistic is due to luck. For example, we have xFIP, but I also wondered how those numbers would look if we had a way to modernize the K or BB component.

    I’m hoping that one day we will be able to look at a pitcher and say if you throw X pitch type at Y location Z% more per game then this is the expected result on your K/BB/HR/GB/FB rates. I think we have nearly all of the information we need to make a good educated guess at answering that question.

    Also, Geo, you may be interested in one of my spreadsheets. It has only first-half-ish 2009 data, and it may need some formatting. However, you can see there is a strong relationship between OZSwStr and K’s. Don’t forget there are other components as well. In this data (first half 2009), Mariano Rivera wasn’t a beast when it came to OZSwStr, but was great in terms of Called Strikes and Fouls. Those were the foundation of his K’s. Of course, this study also shows that if he could somehow convert those Fouls into Swinging Strikes, we’d expect his K’s to climb.


    I’d love to throw pitch type, combined with the pitch result for that type, into a giant pot and see what it spits out. It is easy to say a pitcher should convert fouls into swinging strikes. Implementation would be far more practical if we could see the relationship with pitch type.

    Comment by Matt Hanna — December 30, 2009 @ 5:54 pm

  9. Hello,

    Correct me if I’m wrong, but I believe this article misses something important:

    OZSwStr% is a much different stat than OContact% (or 1 – OContact%). OZSwStr% is the percentage of a pitcher’s pitches that result in a swinging strike on a ball, while (1 – OContact%) is the share of a pitcher’s pitches outside of the zone THAT ARE SWUNG AT that result in swinging strikes.

    I think OZSwStr% is a much more important stat, because it is the pitcher’s skill in producing these events. Pitcher X could have a very low OContact% because he throws all his balls a foot out of the zone, but his OZSwStr% would not be high because few would chase such wild pitches. I think if you redid those leaderboards using OZSwStr% instead of (OContact%), the results would be significantly different.
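    To make the distinction concrete, here is a rough Python sketch. The two pitchers and all of their numbers are made up for illustration, and the helper names are hypothetical; it also assumes O-Swing% means swings per out-of-zone pitch, so that OZSwStr% can be built as O-Swing% x (1 - Zone%) x (1 - OContact%).

```python
def whiff_per_oz_swing(o_contact):
    """Share of out-of-zone SWINGS that miss: 1 - OContact%."""
    return 1.0 - o_contact


def oz_swstr(o_swing, zone, o_contact):
    """Share of ALL pitches that are out-of-zone swinging strikes."""
    return o_swing * (1.0 - zone) * (1.0 - o_contact)


# Hypothetical inputs, expressed as fractions (0.15 means 15%).
# Pitcher X: rarely chased, but hard to touch when hitters do chase.
# Pitcher Y: chased twice as often, somewhat easier to touch.
pitcher_x = {"o_swing": 0.15, "zone": 0.50, "o_contact": 0.45}
pitcher_y = {"o_swing": 0.30, "zone": 0.50, "o_contact": 0.60}

whiff_x = whiff_per_oz_swing(pitcher_x["o_contact"])
whiff_y = whiff_per_oz_swing(pitcher_y["o_contact"])
swstr_x = oz_swstr(**pitcher_x)
swstr_y = oz_swstr(**pitcher_y)

print(f"X: {whiff_x:.1%} whiffs per chase, {swstr_x:.2%} OZSwStr")
print(f"Y: {whiff_y:.1%} whiffs per chase, {swstr_y:.2%} OZSwStr")
```

    With these invented inputs, X ranks ahead of Y on (1 - OContact%) but behind him on OZSwStr%, which is the kind of leaderboard reshuffling I have in mind.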

    Bud Norris, for example, fits the description of Pitcher X. His OContact% is well lower than league average, so he shows up second on the first leaderboard. Looking at his K/9 of 6.99, we might think he was unlucky in racking up so few strikeouts, and therefore, he might be a potential breakout candidate. However, his OSwing% is well below average as well, causing his OZSwStr% to be more pedestrian, and therefore his expected K/9 is closer to that 6.99.

    Comment by dyross — December 31, 2009 @ 3:06 am

  10. Nice post, Dyross. I know I defined OZSWSTR as the % of the TOTAL pitches thrown that resulted in an out-of-zone swinging strike. I was not looking at the % of out-of-zone pitches that a hitter swung at and missed. Essentially, I wanted to convert all the numbers to a % of total pitches thrown, so in theory all the variables would add up to 100%. I also looked at it the other way, and I included things such as first-pitch strikes. I just ended up leaning towards converting all the variables to a % of total pitches thrown. Doing so made me toss out things like first-pitch strikes.

    To get the number of swinging strikes out of the zone as a share of total pitches thrown, some math is involved. Off the top of my head, you start with Zone% and do (1-Zone%) to get OZ pitches. Then you find the pitches (as a % of total pitches) that were swung at outside the zone: OZSW*(1-Zone). Now you can do (O-Contact*(OZSW*(1-Zone))) to get how many balls out of the zone were hit. Then you subtract that from how many balls were swung at outside the zone, and you have OZSWStrikes.
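    For anyone who would rather see those steps as code, a minimal Python sketch might look like the following. The function and variable names are mine, not anything from FanGraphs or StatCorner, and the inputs are assumed to be fractions rather than percentages; the sample values are a 22.1% O-Swing, 49.9% Zone, and 51.1% O-Contact.

```python
def oz_swinging_strike_rate(o_swing, zone, o_contact):
    """Out-of-zone swinging strikes as a fraction of ALL pitches thrown.

    All inputs are fractions (0.221, not 22.1).
    """
    oz_pitches = 1.0 - zone              # share of pitches outside the zone
    oz_swings = o_swing * oz_pitches     # share swung at outside the zone
    oz_contact = o_contact * oz_swings   # share hit outside the zone
    return oz_swings - oz_contact        # share swung at and missed


rate = oz_swinging_strike_rate(o_swing=0.221, zone=0.499, o_contact=0.511)
print(f"{rate:.2%}")  # prints 5.41%
```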

    If you wanted, you could convert everything to a # of pitches and work it out that way as well.

    Essentially my goal was to get all the variables to add up to 100. For example:

    (OZ + CLSTR + INZSWSTR + INZCON = 100%)


    It isn’t going to add up to 100% perfectly, but it should be within +/- 1%. FanGraphs and StatCorner have slight discrepancies.

    For example, using AJ Burnett (I used him since he was the easiest to sort), his 2009 numbers ended up like this:

    To get OZSWSTRIKE:

    O-Swing%   Zone%    O-Contact%
    22.10%     49.90%   51.10%

    OZSWING = 22.10% * (1 - 49.90%) = 11.07%
    OZCON   = OZSWING * 51.10%      = 5.66%
    OZSWSTR = OZSWING - OZCON       = 5.41%

    OZCON   INZCON   OZSWSTR   IZSWSTR   ClStr%   Ball%
    5.66%   28.21%   5.41%     3.03%     19%      39%

    OZSWSTR   IZSWSTR   Foul%   InPly%   ClStr%   Ball%
    5.4%      3.0%      16.8%   17.1%    18.8%    38.8%

    If you add the first line up it comes out to 99.91%; the second, 99.94%.
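    If you want to check the sums yourself, a quick sketch in Python follows. It assumes the 19% and 39% on the first line are just the rounded 18.8% and 38.8% from the second line, and the helper name and the 1-point tolerance default are only my way of expressing the "within +/- 1%" rule of thumb.

```python
# Burnett's 2009 breakdown as posted, in percent of total pitches.
# Assumption: the first line's 19% and 39% are the rounded 18.8% and 38.8%.
first_line = [5.66, 28.21, 5.41, 3.03, 18.8, 38.8]
second_line = [5.4, 3.0, 16.8, 17.1, 18.8, 38.8]


def within_tolerance(parts, tolerance=1.0):
    """True if the components sum to 100% give or take `tolerance` points."""
    return abs(sum(parts) - 100.0) <= tolerance


for name, parts in (("first", first_line), ("second", second_line)):
    print(name, round(sum(parts), 2), within_tolerance(parts))
```

    The second line comes out a touch under the 99.94% I quoted because the inputs typed here are already rounded to one decimal.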

    His K% was 21.79%

    Using the formula below (which uses InPlay and Foul instead of INZCON and OZCON) you’ll get an…


    eK% = 21.38%

    Comment by Matt Hanna — December 31, 2009 @ 10:38 am

  11. Hey Matt,

    I noticed that your article used OZSwStr% but the above post used OContact%, which is why I made the above comment. I think your results are quite impressive; it would be great to see a followup with a better expected FIP based on these peripherals. Even though you threw it out, a .7 R^2 for BB% is very compelling, and I would be interested in helping to come up with better formulas for both, and in making a leaderboard of players whose peripherals underperformed their, well, peripherals’ peripherals.

    Happy Holidays,


    Comment by David Ross — December 31, 2009 @ 12:40 pm

  12. You’re totally right. My bad. What I’ve submitted above presupposes a constant O-Swing% among all pitchers, which is obviously not the case. From just the sample I picked, there’s a range of 12.30% (Sidney Ponson) to 32.80% (Hiroki Kuroda). Certainly, pitching outside of the zone in such a way as to induce swings in the first place is important.

    Anyway, I re-ran the numbers to see how things would be different. Because Matt did it for all pitches (see his post below), I thought it might be interesting to look at just the OZ pitches that were whiffs. In one way, it makes sense: The idea is to trick the batters but to do it efficiently. The Top 10 list looks like this:

    Rich Harden (15.80%)
    Felipe Paulino
    Chad Billingsley
    Freddy Garcia
    Ryan Dempster
    John Smoltz
    Billy Buckner
    Javier Vazquez
    Tim Lincecum
    Randy Johnson (13.59%)

    Again, that’s a list of pitchers who (a) threw a pitch out of the zone, (b) induced a swing with said out-of-zone pitch, and (c) received a whiff on said swing.

    Next week, I’ll do a version weighted with in-zone swings and misses.

    Sorry to’ve screwed up so royally. Thanks for being a reasonable critic.

    Comment by Carson Cistulli — December 31, 2009 @ 2:15 pm

  13. Do we have enough data to guess whether or not this is a predictive stat?

    If it is predictive and if we could develop an expected BB%, then we would really be on to something.

    Comment by Scottwood — December 31, 2009 @ 2:58 pm

  14. Matt,

    Carson’s post prior to this one inspired me to re-examine the work you did and to update for the entire season. Click the following attachment to see total 2009 for all pitchers at 300+ xOuts.

    I’ve also taken the time to include FIP, xFIP, K%, tRA*, and wOBA. Here’s my post about it that includes the link to the .xls

    Comment by Sandy Kazmir — December 31, 2009 @ 4:24 pm

  15. Thanks for doing that. That is a great collection of stats. Only 2 of the top 10 pitchers in eK% were AL starters. And just 3 of the top 20. Quite an impressive display by Verlander, Lester and Greinke.

    Comment by Scottwood — December 31, 2009 @ 4:53 pm

  16. I also think it would be interesting to see which peripherals affect K/BB instead of just K%. I surmise that while high OZSwStr% may lead to higher K%, IZSwStr% could be the clincher for real dominance. Just a thought.

    Comment by David Ross — December 31, 2009 @ 5:59 pm

  17. The peripherals that affect K% may be different (or at least differently-weighted) than those that affect BB%, making it more difficult to create predictions for the combination of K/BB. Combining two stats in some cases necessitates an increase in sample size for the same level of confidence as well.

    Comment by joser — December 31, 2009 @ 6:40 pm

  18. Thanks, Scottwood. As mentioned, these are neither league- nor park-adjusted, so the NL should look better, comparatively.

    Comment by Sandy Kazmir — December 31, 2009 @ 6:59 pm

  19. Although I agree that it might require a larger sample size, I think it could have compelling results. It’s possible that some peripherals increase K% and BB% simultaneously, while others help just K%. In this case, the latter type would be much more predictive of effectiveness.

    Comment by David Ross — January 1, 2010 @ 3:01 am
