What to Expect When You’re Expecting Strikeouts
I tend to get rather obsessed about starting pitchers, strikeouts, and predictability – three things that go together like Tracy Morgan, funny stuff, and sanity. What keeps me up at night, and what I find particularly unnerving, are sizable, unexplained variations from year to year. Why did Jhoulys Chacin go from 9.04 strikeouts per nine innings pitched in 2010 to 6.96 in 2011? Why did Ricky Nolasco go from 8.39 K/9 to 6.47? Jered Weaver from 9.35 to 7.56? When obvious indicators of age or injury aren’t there, the resulting chaos has me reaching for torches and pitchforks.
My background is mostly in higher education, and doing social science research reveals that a great deal of human behavior is surprisingly predictable. Perhaps what gets me so riled is that measuring athletic performance just isn’t the same as trying to predict, say, how people in Concrete, WA might vote on an initiative. But what we can attempt to do is take the information available to us and partially explain away the outliers: if we have a model that says a pitcher ought to have performed at a certain level, we can then look outside the controlled variables for answers. Or something like that.
About a month ago, Bradley Woodrum nicely demonstrated the relationship between swinging strike rate and strikeout rate, heteroscedasticity be damned. While the fit of the model wasn’t perfect, it demonstrated a pretty interesting relationship. Bradley then goes on to present the worst-case scenarios of strikeout rates, making it all the more interesting, but the relationship between the two variables is what got my noodle baking. There have also been a few looks at fastball velocity as it relates to strikeouts, most notably Dave Cameron’s effort from a couple years back.
What I’m after is expected K/9, not as a predictive tool looking forward but more as a check, much the way we use expected BABIP. To keep it simple, I looked at the 90 qualified starting pitchers from 2011 and their corresponding K/9 in 2010 (which unfortunately cuts us down to 76 in the sample, as not everyone qualified in 2010). Using 2010 K/9 can perhaps help control for some of the volatility we see year to year, and also capture pitchers who for one reason or another don’t fit a model built only on swinging strike rate and fastball velocity (though it does run the risk of multicollinearity).
Correlating 2011 K/9 with 2010 K/9, SwStr%, and FBv, the results look like this:
| | K/9 2010 | SwStr% | FBv |
|---|---|---|---|
| K/9 2011 | 0.819** | 0.813** | 0.506** |
All three are statistically significant at the .01 level.
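As a rough illustration of the correlation step, the sketch below computes Pearson's r between 2011 K/9 and each predictor with NumPy. The six rows are copied from the results table later in this piece, so whatever it prints describes only that tiny subset and will not match the full-sample figures above. The helper name `pearson_r` and the variable names are mine, not part of the original analysis.

```python
import numpy as np

# Six rows copied from the results table in this article (a subset, for illustration only).
# Columns: 2011 K/9, 2010 K/9, FBv (mph), SwStr% (as a fraction).
rows = np.array([
    [6.85, 7.93, 93.2, 0.099],    # Daniel Hudson
    [4.14, 4.76, 89.0, 0.071],    # Carl Pavano
    [10.19, 10.95, 93.8, 0.115],  # Brandon Morrow
    [4.78, 4.24, 85.6, 0.065],    # Mark Buehrle
    [9.57, 9.34, 93.4, 0.111],    # Clayton Kershaw
    [10.54, 7.40, 92.5, 0.106],   # Zack Greinke
])

def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])

k9_2011 = rows[:, 0]
correlations = {
    "K/9 2010": pearson_r(k9_2011, rows[:, 1]),
    "FBv": pearson_r(k9_2011, rows[:, 2]),
    "SwStr%": pearson_r(k9_2011, rows[:, 3]),
}

for name, r in correlations.items():
    # r squared is the share of variance a one-variable trend line explains
    print(f"{name}: r = {r:.3f}, r^2 = {r * r:.3f}")
```

Squaring each r is how the single-variable R-squared values relate to the correlations in the table: 0.506 squared, for example, is about .256.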
Apologies for the quality of the following graphs, but clicking on them will make them more legible. The trend line for each, with its corresponding R-squared, looks as follows, starting with K/9 2010 and 2011 K/9:
Swinging Strike Rate and 2011 K/9:
FBv and 2011 K/9:
The R-squared values for K/9 2010 and SwStr% are right around .67, while the fit for FBv is considerably lower at .256, as was expected. Taking all three as independent variables with 2011 K/9 as the dependent variable gives us a model that looks like this (with R = .890; R-squared = .792; standard error of the estimate = .689):
K/9 = -7.355 + (.089)*FBv + (39.726)*SwStr% + (.420)*K/9 2010
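For readers who want to fit this kind of model themselves, here is a minimal sketch of ordinary least squares with NumPy. It is not the software or exact procedure used for the numbers above, which the article doesn't show. The function name `fit_ols` and the synthetic check data are assumptions for illustration only.

```python
import numpy as np

def fit_ols(X, y):
    """Fit y = b0 + b1*x1 + ... by ordinary least squares.

    X: (n, k) array of predictors; y: (n,) response.
    Returns (coefficients with intercept first, R-squared).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])  # prepend intercept column
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coefs
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return coefs, 1.0 - ss_res / ss_tot

# Synthetic sanity check (made-up inputs, not real pitchers): build y exactly
# from the published equation and confirm the fit recovers its coefficients.
rng = np.random.default_rng(0)
fbv = rng.uniform(84, 95, size=50)
swstr = rng.uniform(0.05, 0.12, size=50)
k9_prev = rng.uniform(4, 11, size=50)
y = -7.355 + 0.089 * fbv + 39.726 * swstr + 0.420 * k9_prev

coefs, r2 = fit_ols(np.column_stack([fbv, swstr, k9_prev]), y)
print(np.round(coefs, 3), round(r2, 3))
```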
Running that equation against all qualified starters for 2011 gives us an expected K/9 for 2011, helping to gauge whether some pitchers pitched above or beneath their predicted ability.
| Name | K/9 2011 | K/9 2010 | FBv | SwStr% | xK/9 | Difference |
|---|---|---|---|---|---|---|
| Daniel Hudson | 6.85 | 7.93 | 93.2 | 9.9% | 8.20 | -1.35 |
| Edwin Jackson | 6.7 | 7.78 | 94.5 | 9.3% | 8.02 | -1.32 |
| Ricky Nolasco | 6.47 | 8.39 | 90.5 | 8.9% | 7.76 | -1.29 |
| Carl Pavano | 4.14 | 4.76 | 89.0 | 7.1% | 5.39 | -1.25 |
| Luke Hochevar | 5.82 | 6.64 | 92.7 | 8.2% | 6.94 | -1.12 |
| Fausto Carmona | 5.2 | 5.31 | 92.5 | 7.9% | 6.25 | -1.05 |
| Cole Hamels | 8.15 | 9.1 | 91.7 | 11.3% | 9.12 | -0.97 |
| Josh Tomlin | 4.84 | 5.3 | 87.9 | 7.7% | 5.75 | -0.91 |
| Jhoulys Chacin | 6.96 | 9.04 | 91.0 | 8.2% | 7.80 | -0.84 |
| Dan Haren | 7.24 | 8.27 | 90.0 | 9.9% | 8.06 | -0.82 |
| Hiroki Kuroda | 7.17 | 7.29 | 92.0 | 10.3% | 7.99 | -0.82 |
| Ricky Romero | 7.12 | 7.46 | 92.1 | 9.6% | 7.79 | -0.67 |
| Jaime Garcia | 7.21 | 7.27 | 89.8 | 10.5% | 7.86 | -0.65 |
| Joe Saunders | 4.58 | 5.05 | 89.6 | 6.2% | 5.20 | -0.62 |
| Shaun Marcum | 7.09 | 7.6 | 86.9 | 10.3% | 7.66 | -0.57 |
| Jered Weaver | 7.56 | 9.35 | 89.1 | 9.1% | 8.12 | -0.56 |
| Wade Davis | 5.14 | 6.05 | 91.4 | 5.9% | 5.66 | -0.52 |
| Jake Westbrook | 5.11 | 5.68 | 90.0 | 6.3% | 5.54 | -0.43 |
| Bud Norris | 8.52 | 9.25 | 92.6 | 10.5% | 8.94 | -0.42 |
| Mat Latos | 8.57 | 9.21 | 92.8 | 10.6% | 8.98 | -0.41 |
| Josh Beckett | 8.16 | 8.18 | 93.1 | 10.5% | 8.54 | -0.38 |
| James Shields | 8.12 | 8.32 | 91.0 | 10.7% | 8.49 | -0.37 |
| John Lannan | 5.17 | 4.46 | 89.8 | 7.6% | 5.53 | -0.36 |
| Max Scherzer | 8.03 | 8.46 | 93.1 | 9.8% | 8.38 | -0.35 |
| Mike Pelfrey | 4.89 | 5.01 | 92.1 | 5.5% | 5.13 | -0.24 |
| John Danks | 7.13 | 6.85 | 91.6 | 9.3% | 7.37 | -0.24 |
| Justin Masterson | 6.57 | 6.97 | 92.7 | 7.5% | 6.80 | -0.23 |
| Chris Carpenter | 7.24 | 6.86 | 92.5 | 9.2% | 7.41 | -0.17 |
| Tim Lincecum | 9.12 | 9.79 | 92.3 | 10.7% | 9.22 | -0.10 |
| Matt Cain | 7.27 | 7.13 | 91.2 | 9.1% | 7.37 | -0.10 |
| Ervin Santana | 7.01 | 6.83 | 92.8 | 8.4% | 7.11 | -0.10 |
| Gavin Floyd | 7.05 | 7.25 | 91.2 | 8.4% | 7.14 | -0.09 |
| Randy Wolf | 5.68 | 5.93 | 88.4 | 6.8% | 5.70 | -0.02 |
| Rick Porcello | 5.14 | 4.65 | 90.1 | 6.3% | 5.12 | 0.02 |
| Matt Harrison | 6.17 | 5.29 | 92.8 | 7.6% | 6.15 | 0.02 |
| Brandon Morrow | 10.19 | 10.95 | 93.8 | 11.5% | 10.16 | 0.03 |
| Jeremy Guthrie | 5.57 | 5.12 | 92.5 | 6.3% | 5.53 | 0.04 |
| Roy Halladay | 8.47 | 7.86 | 92.0 | 10.8% | 8.42 | 0.05 |
| Bronson Arroyo | 4.88 | 5.05 | 87.0 | 5.8% | 4.81 | 0.07 |
| Jason Vargas | 5.87 | 5.42 | 87.4 | 7.8% | 5.80 | 0.07 |
| Colby Lewis | 7.59 | 8.78 | 89.0 | 8.2% | 7.51 | 0.08 |
| Chad Billingsley | 7.28 | 8.03 | 91.5 | 7.6% | 7.18 | 0.10 |
| Jon Lester | 8.55 | 9.74 | 92.7 | 8.7% | 8.44 | 0.11 |
| Justin Verlander | 8.96 | 8.79 | 95.0 | 10.2% | 8.84 | 0.12 |
| Kyle Lohse | 5.3 | 5.28 | 89.4 | 5.9% | 5.16 | 0.14 |
| Tim Stauffer | 6.2 | 6.64 | 90.4 | 6.5% | 6.06 | 0.14 |
| CC Sabathia | 8.72 | 7.46 | 93.8 | 11.2% | 8.58 | 0.14 |
| Brett Myers | 6.6 | 7.24 | 88.4 | 7.3% | 6.45 | 0.15 |
| Mark Buehrle | 4.78 | 4.24 | 85.6 | 6.5% | 4.63 | 0.15 |
| Tim Hudson | 6.61 | 5.47 | 90.4 | 8.6% | 6.40 | 0.21 |
| Mike Leake | 6.36 | 5.92 | 89.1 | 7.7% | 6.12 | 0.24 |
| Derek Lowe | 6.59 | 6.32 | 88.0 | 8.1% | 6.35 | 0.24 |
| Chris Volstad | 6.36 | 5.25 | 91.3 | 7.9% | 6.11 | 0.25 |
| R.A. Dickey | 5.76 | 5.35 | 84.4 | 7.8% | 5.50 | 0.26 |
| Clayton Kershaw | 9.57 | 9.34 | 93.4 | 11.1% | 9.29 | 0.28 |
| Ted Lilly | 7.38 | 7.71 | 87.4 | 8.5% | 7.04 | 0.34 |
| A.J. Burnett | 8.19 | 6.99 | 92.7 | 10.0% | 7.80 | 0.39 |
| Livan Hernandez | 5.08 | 4.85 | 83.9 | 6.4% | 4.69 | 0.39 |
| Wandy Rodriguez | 7.82 | 8.22 | 89.1 | 8.5% | 7.40 | 0.42 |
| Yovani Gallardo | 8.99 | 9.73 | 92.7 | 9.0% | 8.56 | 0.43 |
| Javier Vazquez | 7.57 | 6.92 | 90.4 | 8.9% | 7.13 | 0.44 |
| Ryan Dempster | 8.5 | 8.69 | 90.3 | 9.3% | 8.03 | 0.47 |
| Trevor Cahill | 6.37 | 5.4 | 89.1 | 7.6% | 5.86 | 0.51 |
| Ian Kennedy | 8.03 | 7.79 | 90.3 | 8.8% | 7.45 | 0.58 |
| Felix Hernandez | 8.55 | 8.36 | 93.3 | 8.8% | 7.96 | 0.59 |
| Paul Maholm | 5.38 | 4.95 | 87.4 | 5.7% | 4.77 | 0.61 |
| Doug Fister | 6.08 | 4.89 | 90.0 | 6.7% | 5.37 | 0.71 |
| Matt Garza | 8.95 | 6.62 | 93.7 | 11.2% | 8.21 | 0.74 |
| Gio Gonzalez | 8.78 | 7.67 | 92.5 | 9.5% | 7.87 | 0.91 |
| David Price | 8.75 | 8.1 | 94.8 | 8.4% | 7.82 | 0.93 |
| Ubaldo Jimenez | 8.6 | 8.69 | 93.5 | 7.5% | 7.60 | 1.00 |
| Madison Bumgarner | 8.4 | 6.97 | 91.7 | 9.2% | 7.39 | 1.01 |
| Anibal Sanchez | 9.26 | 7.25 | 91.7 | 10.9% | 8.18 | 1.08 |
| C.J. Wilson | 8.3 | 7.5 | 91.0 | 8.3% | 7.19 | 1.11 |
| Cliff Lee | 9.21 | 7.84 | 91.5 | 9.3% | 7.78 | 1.43 |
| Zack Greinke | 10.54 | 7.4 | 92.5 | 10.6% | 8.20 | 2.34 |
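The xK/9 and Difference columns can be reproduced from the published equation. The sketch below does that for three rows of the table, treating SwStr% as a fraction (9.9% becomes 0.099), which is the scale the 39.726 coefficient implies. The function name `expected_k9` is mine, and because the published coefficients are rounded to three decimals, the occasional row could in principle land a hundredth off the table.

```python
def expected_k9(fbv, swstr, k9_prev):
    """Expected K/9 from the article's fitted equation.

    fbv: average fastball velocity in mph
    swstr: swinging strike rate as a fraction (9.9% -> 0.099)
    k9_prev: prior-season (2010) K/9
    """
    return -7.355 + 0.089 * fbv + 39.726 * swstr + 0.420 * k9_prev

# (name, 2011 K/9, 2010 K/9, FBv, SwStr%) copied from the table above
pitchers = [
    ("Daniel Hudson", 6.85, 7.93, 93.2, 0.099),
    ("Brandon Morrow", 10.19, 10.95, 93.8, 0.115),
    ("Zack Greinke", 10.54, 7.40, 92.5, 0.106),
]

results = {}
for name, k9_2011, k9_2010, fbv, swstr in pitchers:
    xk9 = round(expected_k9(fbv, swstr, k9_2010), 2)
    diff = round(k9_2011 - xk9, 2)  # positive means he beat the model
    results[name] = (xk9, diff)
    print(f"{name}: xK/9 = {xk9:.2f}, difference = {diff:+.2f}")
```

For these three rows the computed values match the table: 8.20 and -1.35 for Hudson, 10.16 and +0.03 for Morrow, 8.20 and +2.34 for Greinke.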
I won’t spend a great deal of time here, but it’s worth talking through a few of these. While the more interesting cases exist at the poles, I’d like to point out that the model is in nearly full agreement with what Brandon Morrow did in 2011, predicting a 10.16 K/9 against his actual 10.19. The model thinks Zack Greinke, Cliff Lee, and C.J. Wilson ought to have looked much more like their 2010 performances, with Greinke being the real outlier. You’ll also notice there are some pretty solid arms at the bottom of that list, and you need to consider cases such as Felix Hernandez and David Price, who experienced rather low swinging strike rates in 2011.
As for pitchers who might see an uptick in K/9 if their velocity holds and they maintain a similar swinging strike rate: the model thought Daniel Hudson, Cole Hamels, Edwin Jackson, and Ricky Nolasco should all have looked more like their 2010 selves than they did in 2011, each coming in roughly a full strikeout per nine or more below expectation.
This isn’t perfect, and it will fluctuate right along with the fastball velocity and swinging strike rates of each starter. Also, some starting pitchers don’t rely on their fastball for strikeouts, and there may certainly be cases where a pitcher can maintain a higher K/9 rate without a plus fastball (Ian Kennedy, for instance). But it is at least useful for looking back at 2011 and trying to understand some of the year-to-year inconsistencies that we see in K/9.


![trend FBv[1]](http://www.fangraphs.com/fantasy/wp-content/uploads/2011/11/trend-FBv1.png)
Would have been interesting to incorporate called strikes and called 3rd strikes. Some pitchers that nibble relentlessly and consistently produce high called 3rd strike rates like Gallardo may be askew in this model. Fun piece though.
Nice!
Would also have liked to see 2009-2010 data run, with an xK/9 for 2010, to see what happened to the outliers for 2010. Did those outliers return to expected K/9 rates, or were they a harbinger of good or bad for 2011?
Interesting info.
yeah, there are a lot of ways to go with this from here. Bringing in 2009 data, lowering the IP threshold for qualified starters, bringing in relievers, etc. Keep up with the suggestions, they’re helpful!
Thanks for making me go look up a word :)
Noticed Cory Luebke wasn’t on the list since he was a reliever in 2010 and only started most of the 2011 season.
He projects to an 8.96 xK/9 for 2011 using his reliever K/9 # from 2010. Expect him to potentially post a K/9 around 8.5 in 2012.
yeah, Luebke didn’t qualify, but he’s an interesting case. His overall K% was 27.3%, good for third in baseball – sandwiched in between Greinke and Kershaw. Based on his minor league track record, I’m skeptical if it will stay that high, but that’s just gut speaking. Too bad we don’t have 150 IP from 2010 as a backdrop.
I think we’d be better off here with a three-year weighted sample of K/9 (or better yet K%) as input to the predictor. Part of the reason we’re interested in a predictor like this is to spot if a guy exceeded his talent level last year. I think having last year’s K/9 as a primary input dilutes the result. Another interesting tweak might be to exclude K’s vs. pitchers batting in the NL, as those can inflate K/9 for some.
Anyway, interesting article, thanks!
all good suggestions – and I’ve actually started on the K% angle. I did have a measure to control for league, but it wasn’t statistically significant – but we know league is at least part of the story with K’s. Having 2010 K/9 as a control variable actually tightened up the model where SwStr% and FBv didn’t tell the whole story. Still a few directions to go with this stuff though, for sure. Thanks for the suggestions.
Just saw this article now…some thoughts: Simply put, there are many many factors at play. Some of it may have to do with pitch arsenal and rates; change-up guys (DHudson, Hamels, etc) vs cutter throwers. Handedness likely has a fairly large impact as well (particularly handedness in relation to certain pitches…like cutter or 2seam). Called strikes obviously have a LOT to do with it (my perfect example from 2011 would be Bartolo Colon and his 2-seamer). Also have to factor in the obvious ones such as consistency of figures. You are basing this largely on an equation that takes FB velocity and swinging strike rates into account. They vary year to year and at times greatly. Therefore the expected figures would vary as well. I’m assuming all figures pulled for the above are from 2010 (in order to predict 2011). But what ability do we concretely have at predicting the variables in the equation? Other minor things like zone% and/or O-Swing% may be a factor in predicting rates. Also, what about GB/FB rates? And predicting Ks largely falls on secondary pitches, which vary greatly in and of themselves.
I came to chime in about leagues as well. Cliff Lee and Greinke both switched to the National League, and those are two of the three pitchers whose strikeout increases were considered “flukey” by the formula here.
Pumpkin – will have a new one up on Second Opinion for 2012, which has a bit of narrative about the league changes on certain guys. It also looks at K% instead of K/9.
Looking forward to it, Michael.
Love seeing this kind of work.
We’re also to assume the relationships are linear. I’d imagine there are non-linear relationships with velocity.
Yet another thought….you probably shouldn’t run the equations and relationships off of the raw data. I’m thinking you should normalize each value and compare the normalization. It would make it more apples to apples in terms of variability within each column of data.
Using 2010 data I isolated the extremes in some context of swingstr% vs K/9 (top 10 each side). These are the vague eyeball test items I observed (pos/neg referring to the “diff” that you observed, and pitches req at least 2% thrown to compare)
1) The pos tend to have below avg z-swing% and vice versa (62 vs 66%)
2) The neg tend to be high % change-up throwers (22.6 vs 11.4%)
3) Also regarding the change-up…the neg also tended to be “slower” change-ups, while the pos were “faster” (85.0 vs 81.4 mph)
4) Velocity itself was across the board difference with lower velocities being in the neg group (FB 92.5 vs 89.9 mph), (SL 85.6 vs 83.8), (CU 77.1 vs 73.9) and even (CT 87.7 vs 86.1)
5) Summing up SL and CU % thrown…the pos group threw more of them (22.7 vs 15.9%)
Areas that seemed to pose no meaningful difference:
1) Contact percentages
2) FB%
And fwiw, the correlation between normalized K% and K/9 figures was 99% out of qualified SP.
Heteroscedasticity and collinearity be damned? I think we should consider what actually causes strikeouts instead of these unsurprising correlations. Right?
not sure I follow. The heteroscedasticity was a comment on Bradley’s interesting piece (if you read it, he admits to the effect, but it was still interesting research and a jumping off point for other questions). And it’s not that the correlations are surprising or unsurprising, it’s their value as a predictive tool to sniff out over and underperformers relative to strikeout rates.
Why isn’t this on fangraphs?
The second two graphs appear as if the determinant and response variables are flipped on the x and y axis. What am I missing? FB velocity should influence K rate, not vice versa.
Let me point out that K/9 is usually an inferior metric to K%. Simply put, different types of pitchers face different numbers of batters per inning, even when not considering strikeouts. For example, no one gave out fewer BBs than Josh Tomlin last year, by any metric. He also had a pretty low BABIP, fueled in part by his fly ball tendencies. Put that together, and you have a guy whose K/9 is not going to match his K%, since he’s facing fewer batters than any other pitcher with a comparable strikeout ability (which admittedly is not that great).
Or, look at it this way: a pitcher’s punishment for increasing his BB/9 is a chance to increase his K/9. Similarly, variations in BABIP can cause minor variations in K/9.
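To put numbers on the K/9 versus K% point, here is a small sketch with two hypothetical season lines (not real pitchers). Both strike out 20% of the batters they face over the same number of innings, but the second allows more baserunners and so faces more batters per inning. The function names and all figures are made up for illustration.

```python
def k_per_9(strikeouts, innings):
    """Strikeouts per nine innings."""
    return 9.0 * strikeouts / innings

def k_pct(strikeouts, batters_faced):
    """Strikeouts as a fraction of batters faced."""
    return strikeouts / batters_faced

# Hypothetical season lines: same innings and same K%,
# but pitcher B faces more batters per inning than pitcher A.
innings = 200.0
a_bf, a_k = 800, 160   # 4.0 batters faced per inning
b_bf, b_k = 880, 176   # 4.4 batters faced per inning

a = (k_pct(a_k, a_bf), k_per_9(a_k, innings))
b = (k_pct(b_k, b_bf), k_per_9(b_k, innings))
print(f"A: K% = {a[0]:.1%}, K/9 = {a[1]:.2f}")
print(f"B: K% = {b[0]:.1%}, K/9 = {b[1]:.2f}")
```

Identical strikeout ability per batter comes out as 7.20 K/9 for A and 7.92 K/9 for B, purely because B gets more chances per inning.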
Thanks Omikron – that’s exactly why I pubbed this: http://www.fangraphs.com/plus/index.php/mining-for-under-and-over-performers-strikeouts/
Available on FG+, it’s a similar project that uses K% instead of K/9. If you’re a subscriber, I hope you like it. Feedback is always welcome.