Adventures in Swinging Strike Rate vs. K Rate

A few weeks ago, Eno Sarris took a look at a few batters with high swinging-strike rates and average strikeout rates, showing that a batter with a penchant for (or weakness in) whiffing on pitches doesn’t necessarily post as high a number of strikeouts as you would expect. Josh Hamilton, Delmon Young, and Vladimir Guerrero were identified as players who combine decent strikeout rates with high swinging-strike rates. These batters are known as free swingers and post below-average walk rates. Their aggressive approach presents both fewer strikeout opportunities and fewer walk opportunities as they try to put the ball in play early in the count.

This got me thinking: Since there are batters who avoid strikeouts, presumably by swinging early, are there batters who strike out too often because they don’t swing enough? Clearly, swinging strikes are not the only way to strike out a batter, and a batter who leaves the bat on his shoulder too often will get lots of called strikes. A conservative approach, with few swings at anything in the hopes of drawing a walk, could backfire. Such batters do exist; it’s just a matter of identifying who they are.

For 2010 batters with 200+ PA, I plotted K/PA against SwStr%. Take a look below:

[Scatter plot: K/PA vs. SwStr% for 2010 batters with 200+ PA]

Please click here for an adjusted and embiggenzified image if you want to see the names. It may be more useful to right click and open the link and view it in a new window or tab.

There are plenty of outliers worth looking at. Based on their SwStr%, other batters with lower K rates than you would expect (a la Vlad) include Jake Fox, Juan Uribe, A.J. Pierzynski, and Pedro Feliz. On the other side, batters with higher K rates than expected include Brett Gardner, Eric Patterson, and Wes Helms.

Just eyeballing this scatter plot tells us that there is indeed a decent positive relationship between SwStr% and K rate for batters (and why not?). Note also that if we ignore outliers such as Mark Reynolds (chuckle), Rick Ankiel, Miguel Olivo, Fox, Guerrero, and Patterson, the variance in K/PA for any particular value of SwStr% appears to be consistent (that is, as SwStr% increases, the variance in K/PA is approximately the same). In the statistics world, data behaving as described is known to exhibit homoscedasticity as opposed to heteroscedasticity, where the variance dramatically differs with the x value.

A regression on this relationship shows a positive trend between the two stats with a decent correlation coefficient of 61.6%. Using the regression model to predict K/PA, I found the “expected K rate” or expected K/PA based on SwStr%.

Here are the top batters with 500+ PA who struck out “less” than expected, sorted by the difference between actual and expected K rate. K/PA is actual K/PA, xK/PA is expected K/PA, and Diff is K/PA minus xK/PA:

Name               PA   Swing%  Contact%  SwStr%  K/PA   xK/PA  Diff
Vladimir Guerrero  643  60.6%   80.3%     11.3%   9.3%   22.2%  -12.8%
A.J. Pierzynski    503  56.7%   86.3%     7.5%    7.8%   16.8%  -9.0%
Josh Hamilton      571  55.3%   75.1%     13.3%   16.6%  25.0%  -8.4%
Juan Uribe         575  54.8%   76.8%     12.4%   16.0%  23.7%  -7.7%
Delmon Young       613  59.0%   82.4%     10.2%   13.2%  20.6%  -7.4%
Brandon Phillips   687  52.7%   81.9%     9.3%    12.1%  19.3%  -7.2%
Vernon Wells       646  50.8%   81.1%     9.6%    13.0%  19.7%  -6.7%
Pablo Sandoval     616  57.8%   82.8%     9.3%    13.1%  19.3%  -6.2%
Jeff Francoeur     503  60.4%   80.5%     11.3%   16.1%  22.2%  -6.1%
Carlos Quentin     527  50.6%   77.5%     11.0%   15.7%  21.7%  -6.0%

And here are the batters who struck out “more” than expected:

Name               PA   Swing%  Contact%  SwStr%  K/PA   xK/PA  Diff
Brett Gardner      569  31.0%   90.6%     2.9%    17.8%  10.2%  +7.6%
Casey Blake        571  41.8%   80.2%     8.0%    24.2%  17.5%  +6.7%
Colby Rasmus       534  46.7%   75.7%     10.9%   27.7%  21.6%  +6.1%
Drew Stubbs        583  43.9%   72.3%     11.7%   28.8%  22.7%  +6.1%
Bobby Abreu        667  32.9%   83.1%     5.4%    19.8%  13.8%  +6.0%
Justin Upton       571  41.5%   74.3%     10.2%   26.6%  20.6%  +6.0%
Adam LaRoche       615  45.2%   74.1%     11.3%   28.0%  22.2%  +5.8%
Austin Jackson     675  47.0%   79.4%     9.4%    25.2%  19.5%  +5.7%
Adam Dunn          648  45.0%   68.2%     13.8%   30.7%  25.7%  +5.0%
Mark Reynolds      596  47.0%   62.2%     17.1%   35.4%  30.4%  +5.0%
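
Since xK/PA is a straight-line function of SwStr%, the fitted line can be backed out of the published rows. The sketch below is only that: it runs ordinary least squares on the 20 (SwStr%, xK/PA) pairs from the two tables, not on the full 200+ PA sample behind the original regression, and the names `fit_line` and `expected_k_rate` are made up for illustration. With these rounded values, the line should come out at roughly 6 percentage points plus about 1.4 times SwStr%.

```python
# (SwStr%, xK/PA) pairs copied from the two tables, in percentage points.
# Assumption: these 20 rows are enough to recover the line, because xK/PA
# is by construction a linear function of SwStr% (up to rounding).
rows = [
    (11.3, 22.2), (7.5, 16.8), (13.3, 25.0), (12.4, 23.7), (10.2, 20.6),
    (9.3, 19.3), (9.6, 19.7), (9.3, 19.3), (11.3, 22.2), (11.0, 21.7),
    (2.9, 10.2), (8.0, 17.5), (10.9, 21.6), (11.7, 22.7), (5.4, 13.8),
    (10.2, 20.6), (11.3, 22.2), (9.4, 19.5), (13.8, 25.7), (17.1, 30.4),
]


def fit_line(points):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(rows)


def expected_k_rate(swstr_pct):
    """Hypothetical helper: expected K/PA (in %) for a given SwStr% (in %)."""
    return intercept + slope * swstr_pct


print(f"xK/PA = {intercept:.2f} + {slope:.2f} * SwStr%")
print(f"Guerrero (SwStr% 11.3): xK/PA = {expected_k_rate(11.3):.1f}%")
```

Subtracting `expected_k_rate(swstr)` from a batter’s actual K/PA reproduces the Diff column, give or take a tenth of a point from rounding.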

One consequence of homoscedastic data shows up in the tables above: SwStr% appears to have no bearing on whether a batter struck out more or less than expected. It also should have no bearing on how closely the expected K rate predicted the actual K rate.

Another trend to note is that the first group of batters swing at pitches a lot more often than the second group of batters. Guys like Brett Gardner and Bobby Abreu in the second group swing so rarely that merely an average or slightly below average K rate will place them high on this list (low Swing% leads to low SwStr%, which predicts a low xK/PA via the regression model).
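
The parenthetical above can be made concrete with a little arithmetic. If Swing% is swings per pitch and Contact% is contact per swing, then whiffs per pitch should be about Swing% × (1 − Contact%). That identity is an assumption about how the columns relate, and the helper `implied_swstr` is hypothetical; the listed SwStr% values match it only approximately, probably because of rounding or differences in how each stat is tallied.

```python
def implied_swstr(swing_pct, contact_pct):
    """Whiffs per pitch implied by Swing% and Contact% (all in %).

    Assumes Swing% = swings / pitches and Contact% = contact / swings.
    """
    return swing_pct * (1 - contact_pct / 100)


# (Swing%, Contact%, listed SwStr%) copied from the tables.
batters = {
    "Brett Gardner": (31.0, 90.6, 2.9),
    "Bobby Abreu": (32.9, 83.1, 5.4),
    "Vladimir Guerrero": (60.6, 80.3, 11.3),
    "Delmon Young": (59.0, 82.4, 10.2),
}

for name, (swing, contact, listed) in batters.items():
    implied = implied_swstr(swing, contact)
    print(f"{name}: implied SwStr% {implied:.1f}, listed {listed:.1f}")
```

Gardner and Abreu swing at roughly a third of the pitches they see, so even an ordinary whiff rate per swing leaves them with a small SwStr%, and therefore a small xK/PA.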

So what does this all mean? I’m not exactly sure yet. There is more work to be done in this department, and there are plenty of interesting studies that could follow from this:

– How do low-swing and high-swing batters distribute swings based on the count?
– Which batters strike out via swinging strikes the most? Via called strikes?
– Can swing rate and swinging-strike rate (and others) predict strikeout rate?
– Is there such a thing as batters who should swing more in order to avoid strikeouts?
– Aggressive vs. conservative approach: Which to use based on ability to make contact?
– And how about pitchers?
– Etc. (Any other thoughts?)

Concerning the third point, you might expect that batters who swing a lot also tend to strike out a lot. It turns out that there is little correlation between the two once you consider how varied Major League hitters are at making contact and putting the ball in play. Multicollinearity would also play a role in a potential multiple regression model that uses swing rate and swinging-strike rate to predict strikeout rate, since the two predictors are themselves related.
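
One rough way to gauge the multicollinearity concern is to correlate the two would-be predictors and convert that into a variance inflation factor, which for a two-predictor model is 1 / (1 − r²). The sketch below does this for the 20 batters in the tables only. That is a big caveat: those rows were picked for their extreme Diff values, so they are not a fair sample of the league, and the number it prints should not be read as the league-wide figure. The function name `pearson_r` is illustrative.

```python
from math import sqrt

# (Swing%, SwStr%) for the 20 batters in the two tables.
# Caveat: a hand-picked subsample, not the full 200+ PA population.
pairs = [
    (60.6, 11.3), (56.7, 7.5), (55.3, 13.3), (54.8, 12.4), (59.0, 10.2),
    (52.7, 9.3), (50.8, 9.6), (57.8, 9.3), (60.4, 11.3), (50.6, 11.0),
    (31.0, 2.9), (41.8, 8.0), (46.7, 10.9), (43.9, 11.7), (32.9, 5.4),
    (41.5, 10.2), (45.2, 11.3), (47.0, 9.4), (45.0, 13.8), (47.0, 17.1),
]


def pearson_r(points):
    """Pearson correlation coefficient between the x and y values."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sqrt(sxx * syy)


r = pearson_r(pairs)
# With exactly two predictors, each one's VIF is 1 / (1 - r^2).
vif = 1 / (1 - r ** 2)

print(f"r(Swing%, SwStr%) = {r:.2f}, VIF = {vif:.2f}")
```

A VIF near 1 means the two predictors carry mostly separate information; the larger it gets, the less stable the individual regression coefficients become.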

At this point, I suppose the end goal is to find out which batters swing at pitches purposefully and which do so recklessly, and how each approach helps or hurts the batter’s strikeout rate. More on this to come. Feel free to post your ideas, thoughts, or criticisms below on investigating the relationship between plate discipline statistics and strikeout rate.

Albert Lyu (@thinkbluecrew, LinkedIn) is a graduate student at the Georgia Institute of Technology, but will always root for his beloved Northwestern Wildcats. Feel free to email him with any comments or suggestions.

23 Responses to “Adventures in Swinging Strike Rate vs. K Rate”

  1. Rick says:

    What’s the correlation between P/PA and K/PA? I imagine it’s quite high. If you do things that lead to a lot of 2-strike counts, you’re likely to strike out. And the way to get to 2 strikes is to either take a lot of strikes (which generally means taking a lot of pitches) or swing and miss.

    The guys who exceed expectation do both. The guys who are below expectation swing a lot but don’t take a lot of pitches. Seems pretty straightforward. Swinging-strike rate and pitch taking both drive striking out. The homoscedasticity seen here would seem to suggest a very low correlation between the two. It would be interesting to see the scatterplot of SwStr% and Swing%.

  2. AlexS says:

    Great stuff Albert, you outed yourself as a Simpsons fan as well.

  3. Telo says:

    A few weeks ago, Eno Sarris took a look at a few batters with high swinging-strike rates and average strikeout rates, showing that a batter with a penchant for (or weakness in) whiffing on pitches doesn’t necessarily post as high a number of strikeouts as you would expect.

    I love how this is some sort of revelation. Thank god for the groundbreaking work of Eno Sarris.

    • Eno Sarris says:

      This sounds familiar, so I’ll just repeat what I said in the last comments section, with an addition. I didn’t present the piece as ground-breaking work, nor do I position myself as a pre-eminent researcher. I’m a noob writer just trying to take a critical eye to baseball.

      On that tip, doesn’t it make sense to question even basic assumptions about the game? Isn’t that how we got the genius that is The Book? I’m not comparing myself to Tango, I’m just saying that we can’t just say, oh, that’s obvious, no need to go look at that phenomenon.

      Especially in January and February haha.

  4. philosofool says:

    I think these two studies highlight the need for Klooking% and Kswinging% on fangraphs.

  5. Whelk says:

    I remember an article (here I think, though it might have been over at The Hardball Times) that I’m having trouble finding right now, that examined where pitchers got their swinging strikes and compared that with their strikeout rate. It found that the ability to generate swings out of the zone correlated better with K% than the ability to miss bats within the zone. It’s sort of interesting, then, that the batters who were able to “beat” their swinging strike percentage and avoid strikeouts were a group prone to chasing.

    • Telo says:

      Sounds like reverse correlation. Guys with good breaking stuff who get batters to chase, will get more Ks. Not the other way around.

      To your second point:

      There are two factors involved in a batter striking out:

      – Swinging and missing
      – Taking a lot of pitches

      If you do em both? You strike out a lot.

      Do just one? You strike out an average amount.

      Do neither? Well, you ain’t striking out very often.

      This is obviously rough and dirty, but that last Eno Sarris article was a joke the way people reacted surprised to it (Albert included.) It’s such a basic concept it’s hard to believe some of the people reacting to it had ever played or watched a baseball game.

  6. Jimbo says:

    Could there be a ‘swing score’ the same way there is a ‘speed score’?

    Or measure attributes of hitting apart from box-score results.

    First strike % could be high due to a patient hitter taking more (those who don’t mind hitting from behind in the count) or an aggressive hitter swinging at what might be the best pitch of the at bat.

    Same ratio (first strike %), but deeper analysis based on how a batter gets there.

    Apply similar logic to every possible count combination, then devise a patience index. Or does that already exist?

    Then you might be able to understand what it means when D. Barton and P. Fielder both post a 16% bb%. Completely different value if you can determine Barton’s mark is primarily HIS approach to pitchers, but Prince walks as much as he does due primarily to the pitchers’ approach to HIM.

    Sort of like FIP is to ERA, I would imagine in-depth analysis along the lines mapped out in the article would eventually lead to similar bb/k visibility.

    Great stuff!

    • Jimbo says:

      Example of where this would apply…

      Mark Reynolds “looks” like he walks a lot for a power bat. Granted, he does excel over some, but if you knew that his mark was more from being avoided than having a great eye…then regression is all the more likely if even a single ‘hole’ is found and starts to get challenged.

  7. rodgers37 says:

    Very interesting article…after I read it I spent some time playing around with some data in Excel, and using my rudimentary knowledge of statistics, it seems that swing% and contact%, in tandem (using multiple linear regression), seem to predict K% fairly well. At a high level this makes sense to me because:

    1) The batter can’t strike out if they make contact
    2) The more often the batter swings, the more often the at-bat will end early in the count (when they can’t strike out)

    Of course there would definitely be some correlation between the two percentages (swing% and contact%) which might make it hard to develop a reliable model using multiple linear regression (just like you mention in the article with swing% and SwStr%).

    • EK says:

      “it seems that swing% and contact%, in tandem (using multiple linear regression), seem to predict K% fairly well.”

      The formula is actually a lot simpler: 1-ct% = K%.

      • rodgers37 says:

        I don’t think this is strictly true. Fangraphs defines contact% as follows:

        Contact% – Total percentage of contact made when swinging at all pitches

        At first glance I don’t think this should have the “1-ct% = K%” relation to K%. Checking a bunch of player profiles seems to confirm this.

      • rodgers37 says:

        Just tried it out on some data – you’re right that 1-ct% seems to be a decent predictor – however using swing% as well in the regression better accounts for hitters like:

        1) Brett Gardner (see above) who have a very high ct% but strike out a decent amount because they take more pitches than most.

        2) Vladimir Guerrero who has an average ct% but strikes out infrequently because he swings at a lot of pitches early in the count.

      • EK says:

        I’m sorry Rodgers37, you are correct. I assumed contact % was contact rate, but that’s not the measure used on this site. 1-K% = Contact rate, which is a more important measure and which (as far as I can tell) does not seem to be reflected on a player’s page on this site.

  8. Mike Savino says:


    This is what we should be doing during the offseason. Not complaining about comments.

    It seems to me that there should be some sort of sweet spot, for lack of a better word. Batters who take a lot of pitches would probably strike out more than their projected K rate because they’ll strike out looking more often. Batters who have a high SwStr% would underperform their projected K rate because they’d put the ball into play more (because they’re obviously swinging more). The guys in the middle would probably correlate well, but the guys on either end of the spectrum wouldn’t.

    I’m thinking.

    • Mike Savino says:

      I think we should look at pitches per plate appearance as well as SwStr%. I think that’ll tell a better story. lol, I’m actually sitting in Math 312 at university (which is statistics) so I probably should just run it myself.

  9. mymrbig says:

    Three 2010 D-backs (Upton, LaRoche, Reynolds) on the “struck out more than expected” list. Wonder if that is a 2-strike approach thing or if there is some other correlation. Also wonder how much fluctuation there is year-to-year for individual players. Good stuff.

  10. Cory says:

    Having followed Dunn for a couple seasons here in DC, I’m not surprised to find the data match the impression that he takes a lot of third strikes. It appears that when comparing him to a guy like Vlad, two things stand out:

    First, Vlad’s plate coverage (and tendency to use it to go for anything) is well known. Perhaps guys like Dunn don’t have that ability, whether because of weight, back problems, etc. Jim Edmonds struck me as a guy that took a lot of third strikes as well, and I know he had back problems.

    Second, Dunn seems to take the same approach to every ball, namely, if it looks in a good part of the zone, swing HARD! He rarely seems to change up an approach to go poking balls the other way, even when a massive shift is on. Perhaps guys like Dunn also can’t/won’t take slapping swings that may provide better coverage over a wider variety of locations. If so, Dunn’s approach makes sense given his below average speed and possible reluctance to turn his back on a reach or alter his normal timing and stroke.

  11. cardhorn says:

    Don’t forget that it’s not just guys that take strikes and guys that swing and miss that strike out a lot. There are also the guys that hit a bunch of foul balls. Four foul balls, then a swing and miss. Not a bad contact rate. 100% strikeout rate.

  12. Matt says:

    I did a fair amount of work probably over a year and a half ago regarding pitchers and their expected k rate based on pitch outcomes; in zone ss, oz ss, foul, call str, etc.

    I posted it over on draysbay. I’m on my mobile so I’m not going to hunt for the link. My handle there is matthan.

    Either way there is a strong relationship between pitch outcomes and k rate. Some variables sizeably stronger than others.

    In theory, it can be built on to minimize FIP by increasing K rate, due to pitch outcome optimization. Which can be accomplished by throwing different pitches in different locations maximizing their k rate.

  13. bcp33bosox says:

    I wonder what Ted Williams would have done to any of the theories (or to possible future theories)…lol.

    BB% 20.6 %
    K% 9.2 %
