## Velocity and K/9

One of the things that I’ve been wanting to look into this off-season is the relationship between velocity and effectiveness. As we all know, major league pitchers are selected off of the strength of their fastball more than anything else. Body type, breaking balls, performance – all of those fail to receive the same level of confidence as fastball velocity. If a guy can throw in the upper 90s consistently, he’s going to get a chance to work out all his other issues. If a guy can’t break 80, he’s going to have to be phenomenal at everything else to even get a crack at a major league job.

However, we know that velocity isn’t everything. Command, movement, the ability to mix pitches – these all count, and in many cases, they count a lot. Jamie Moyer is the obvious example that everyone points to. It’s clear that velocity isn’t a prerequisite for major league success, but that doesn’t really give us an answer for how important it is.

To start looking at the issue, I’ve taken the 427 pitchers who accumulated at least 30 IP in the majors last year and plotted their velocity and K/9 rate on a graph. Rather than talk about it some more, I’ll just show you that graph.


First off, there’s an obvious relationship. The regression line through the middle trends up, which fits with common sense – guys who throw hard strike out more batters than guys who don’t. But perhaps the slope of the line isn’t quite what you might have expected? It’s lower than I thought it would be, honestly. While there are guys like Jonathan Broxton and Fernando Rodney who fit right into the high velocity/high strikeout rate category, there’s also Brandon League and Matt Lindstrom – the two hardest throwers in the sample, and they posted K/9 rates of 6.27 and 6.75 respectively.

If you look down in the right hand corner, you’ll notice the r squared, which is the coefficient of determination. This number, .2299, essentially means that if you had a pitcher’s single year velocity data, you’d know about 23% of what is necessary to know his strikeout rate for that year. The other 77% of strikeout rate is not explained by how hard he throws his fastball. Now, since these are just single year samples, a portion of that unexplained K/9 rate will be noise, so don’t take that to mean that 77% of strikeout rate is command, off-speed stuff, and other factors all under the pitcher’s control. There are variables outside of what the pitcher can influence that are in play, too – the umpires, the opposing batters, etc.

Still, though, it’s important to know that if you’re trying to predict strikeout rate, velocity is about 1/4 of what you need to account for. That makes it likely to be the biggest factor, but it’s not so dominating as to exclude the other things besides throwing hard. A high velocity fastball is a good thing, but it is definitely not the only thing.
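For anyone who wants to reproduce this kind of plot, here is a minimal sketch of fitting the trend line and computing r squared. The numbers are a made-up stand-in for the 427-pitcher sample, not the real data; the slope of 0.3, the noise level, and the function name `fit_line` are all assumptions chosen only for illustration.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (slope, intercept, r2), where r2 is the coefficient of
    determination for this one-predictor fit.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r2

# Synthetic stand-in data (NOT the real sample): velocity in mph, K/9 with noise.
rng = random.Random(0)
velocity = [rng.gauss(90.0, 2.5) for _ in range(427)]
k_per_9 = [6.5 + 0.3 * (v - 90.0) + rng.gauss(0.0, 1.5) for v in velocity]

slope, intercept, r2 = fit_line(velocity, k_per_9)
print(f"slope = {slope:.3f} K/9 per mph, r squared = {r2:.3f}")
```

Swapping the two synthetic lists for real velocity and K/9 columns is the only change needed to run it on actual data.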


One interesting thing about this graph to me is that it seems to suggest that speed does establish a ceiling for a pitcher’s strikeout rate. The upper left quadrant on this graph is almost non-existent.

I also noticed the lack of high-strikeout, low-speed guys, but I think that’s more evidence of selection bias rather than strong support for the idea that you can’t get Ks with slow fastballs.

The fact is, there simply aren’t many pitchers in MLB with low-velocity fastballs. Just look at the number of pitchers with fastball velocities higher than 90 MPH and K/9s lower than 6 and compare it to the total number of pitchers with velocities below 88 MPH. If the trendline from Dave’s analysis holds true, then one would expect that you’d find more than a few pitchers with 85-MPH fastballs putting up K/9s of 6 or higher. It looks like this is related to what Dave said in the first paragraph — that high-velocity pitchers are given more of a chance to work things out in the majors.

I think there’s a certain dose of selection bias on the low speed side of things, too. There are probably quite a few guys who could be successful strikeout pitchers with slow fastballs, but major league clubs won’t ever let them into a starting rotation to prove it, because it goes against the commonly held belief that you need a 90 mph fastball to pitch in the majors.

I’m curious to know why, if you wanted to determine “the relationship between velocity and effectiveness,” you studied the relationship between velocity and K/9 rate. Is K/9 the only/best measure of “effectiveness”?

I would be curious to see the relationship between velocity and FIP.

I’m not done graphing. We’ll look at a bunch of different things.

I think of r-squared a little differently. The r-squared is .23, but that’s the percent of “statistical variance,” which is an abstract concept. The units of the variance in this case are “strike outs per 9 innings squared.” Sure, you’d know 23% of that, but it has no real-world meaning outside of statistical abstraction. r, not r-squared, is what you’d actually use as an estimator for K/9, as its units are (standardized) “strike outs per 9 innings.” r (the correlation coefficient) includes the variance due to velocity plus all the covariance related to velocity, which we know and which does count toward our estimation of the variable of interest. That’s far higher: 0.48. I’d prefer to say you know 48% of the answer.
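To make the r versus r-squared arithmetic concrete, here is a small sketch. The only input taken from the article is the .2299 on the graph; the helper `predicted_k9_z` is a made-up name. It relies on the standard fact that, when both variables are expressed in standard deviations from their means, the slope of the regression line equals r.

```python
import math

r_squared = 0.2299        # value reported on the graph
r = math.sqrt(r_squared)  # positive root, because the trend line slopes upward

def predicted_k9_z(velocity_z, r=r):
    """Predicted K/9, in standard deviations from the mean,
    for a pitcher whose velocity is velocity_z SDs from the mean."""
    return r * velocity_z

print(round(r, 3))                    # 0.479
print(round(predicted_k9_z(2.0), 2))  # 0.96
```

So a pitcher two standard deviations above average in velocity is predicted to be a bit under one standard deviation above average in K/9. Whether that is better described as "48%" or "23%" is a matter of interpretation that the arithmetic alone does not settle.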

I thought the exact same thing, the definition of r-squared is a bit off in the article. You can’t show causal relationships using r-squared calculations. You could throw in an extra 10 variables to your equation (player height, birthdate, left or right handed, etc) in addition to fastball speed and get an r-squared close to 1.00, almost every time. That doesn’t mean we can say “2% of a pitcher’s K/9 is from their height”, but rather that 2% of your statistical model’s variance can be explained by the player’s height.
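The point about extra variables can be checked with a quick simulation. This is a sketch on synthetic data only: 40 fake pitchers, one real predictor, and 10 columns of pure noise standing in for things like height or birthdate. All the names and numbers are assumptions. What it illustrates is that r-squared never goes down when you add predictors to a least squares fit, even useless ones; how close it gets to 1.00 depends on how many predictors you add relative to the number of pitchers.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def r_squared(columns, y):
    """R^2 of a least squares fit of y on the given predictor columns plus an intercept."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])
    xtx = [[sum(row[a] * row[b] for row in rows) for b in range(k)] for a in range(k)]
    xty = [sum(rows[i][a] * y[i] for i in range(n)) for a in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in rows]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

rng = random.Random(1)
n = 40
velo = [rng.gauss(0.0, 2.5) for _ in range(n)]            # mph relative to a 90 mph baseline
k9 = [6.5 + 0.3 * v + rng.gauss(0.0, 1.5) for v in velo]  # synthetic K/9
junk = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(10)]  # pure-noise predictors

r2_velocity_only = r_squared([velo], k9)
r2_with_junk = r_squared([velo] + junk, k9)
print(f"velocity only: {r2_velocity_only:.3f}, plus 10 noise columns: {r2_with_junk:.3f}")
```

The second number comes out at least as large as the first even though the added columns carry no information about strikeouts, which is why a raw r-squared should not be read as a share of causes.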

R-squared is one of the most misunderstood concepts in statistics.

I wonder if the author here was confusing r-squared with the results from something like an ANOVA or GLM?

You could run FIP as the response variable in a model whose predictor variables cover different kinds of skill (FB velocity and BB:K rate would be two nicely different variables), and then assess how much each one contributes to the variation seen in the results. That might allow you to say that FB velocity explains 23% of variation in FIP, for example.

What does it look like if you graph just starters or just relievers (or if it’s easier, just set the bar higher than 30 IP)? I would imagine that LOOGY type pitchers would end up with disproportionately more strikeouts because the matchups consistently play to their advantage.

Dave,

Nice analysis. I wonder what the slope would look like on a truncated sample where you tossed the clear outliers like Moyer and Glavine. OLS regression gives disproportionate emphasis to outliers, and I’d argue that Moyer/Glavine types are probably not good “typical” pitcher profiles–at least not when they’re having undue impact on the regression coefficient.

You could also (wonky bit here) use a Mean Absolute Deviation regression on the full sample, which would give equal emphasis to deviations across the board, rather than disproportionate emphasis on the outliers.
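Here is a rough sketch of the difference, on toy numbers rather than pitcher data. The absolute-deviation fit (usually called least absolute deviations, or LAD) is done by brute force, using the known property that some best-fitting LAD line passes through two of the data points, so it is enough to try every pair. The function names and the single planted outlier are assumptions for illustration.

```python
def ols_line(xs, ys):
    """Ordinary least squares slope and intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def lad_line(xs, ys):
    """Least absolute deviations slope and intercept, by trying the line
    through every pair of points and keeping the one with the smallest
    sum of absolute residuals."""
    best = None
    n = len(xs)
    for i in range(n):
        for j in range(i + 1, n):
            if xs[i] == xs[j]:
                continue  # a vertical line is not a usable fit
            slope = (ys[j] - ys[i]) / (xs[j] - xs[i])
            intercept = ys[i] - slope * xs[i]
            cost = sum(abs(y - (intercept + slope * x)) for x, y in zip(xs, ys))
            if best is None or cost < best[0]:
                best = (cost, slope, intercept)
    return best[1], best[2]

# Toy data: nine points exactly on y = 2x + 1, plus one wild outlier.
xs = [float(x) for x in range(10)]
ys = [2.0 * x + 1.0 for x in xs]
ys[-1] = 100.0

ols_slope, ols_intercept = ols_line(xs, ys)
lad_slope, lad_intercept = lad_line(xs, ys)
print(round(ols_slope, 2), round(lad_slope, 2))  # 6.42 2.0
```

The one outlier more than triples the OLS slope while the LAD line stays on the nine well-behaved points. The brute-force search is slow for large samples; a statistics package with a median (quantile) regression routine would be the practical choice on the full 427-pitcher set.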

Anyway, this is minor stuff, and I quite like the result.

–Chris

I am curious, who are those few who managed to strike out about 9/9IP with 86 mph fastballs?

J.P. Howell and Trever Miller, lending a little support to the earlier LOOGY comments.

except Howell isn’t a LOOGY.

Would it be instructive or helpful to have different graphs for starting pitchers and relievers…?

I’d love to see the regression results with combined variables like +93mphFB x repertoire or +93FB x slider, etc. I’m sure the R-squared would increase, but I’d be more interested in which variable has the highest correlation. Is it a speedy fastball and slider combo or is it a full repertoire of pitches that more accurately correlates with a high K/9?

I think Hardball Times showed that it isn’t necessarily a great fastball, but that the biggest factor is speed differential between the fastball and the offspeed pitches. It would be cool to see the graph above, but replaced with avg speed differential on the x-axis instead of fastball speed.

Really? What’s going on?

This is a big site for nerds. We’re total nerds over here. We don’t even know what a bed is, because we drink 2 gallons of Mountain Dew every day and never sleep. We just stay up all night doing regression analyses in front of our computers until April, at which point we go to sleep until the next off-season, when we wake up and start doing regressions on the last season again, having used up most of the built-up Mountain Dew fat while hibernating.

I can’t believe I bothered responding to this.

(By the way, in case it wasn’t clear, that was intended to be a sarcastic response to Brick, not to be taken seriously. Playing up the stereotype, cause how else can you respond to a very strange comment about shirts and beds?)

Question about an earlier selection bias argument:

Couldn’t selection bias cut the other way as well here? Don’t you expect GMs to cherry pick relatively high k/9 80-90mph pitchers and leave the rest in the minor league scrap bin?

Also, if a great fastball does allow a young prospect to “figure it out” in the majors before he is ready, wouldn’t that depress the k/9 ratio of that mph class in a systematic way? It would theoretically bring down their average, whereas the more tested, “crafty” pitcher would arrive in the big leagues at a more ideal period in his K/9 progression.

I think the sample size is too small to make any conclusions about the k/9 average for a particular mph class, especially at the low-speed end of the spectrum.

My point was just to argue that we would see more low-speed high-k/9 pitchers if we had more low-speed pitchers in the sample. My interpretation is that there are more high-speed pitchers in MLB, not only because of the relationship between speed and Ks, but also because high-speed pitchers are given more chances to succeed. If the trendline holds true, then you’d expect a relatively even distribution of K/9s around the line for any given speed. For pitchers with fastballs faster than 89 MPH, this is pretty much what you find — and I’d say that’s because there are enough pitchers at that level to provide a fairly representative sample. With fewer low-speed pitchers, you’re more likely to get results that look skewed, such as the apparent “ceiling” that Thor noticed above.
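A quick simulation shows how sample size alone can produce an apparent ceiling. This is a sketch under a deliberately simple assumption: every pitcher's K/9 scatters around the trend line with the same spread (1.5 K/9 here, a made-up figure) regardless of velocity, and the only difference between the two velocity bands is how many pitchers are in each (15 versus 300, also made-up counts).

```python
import random

def average_best_residual(n_pitchers, trials=500, sd=1.5, seed=0):
    """Average, over many simulated seasons, of the highest K/9 above the
    trend line seen among n_pitchers who all share the same true spread."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += max(rng.gauss(0.0, sd) for _ in range(n_pitchers))
    return total / trials

small_band = average_best_residual(15)   # e.g. a thin low-velocity band
large_band = average_best_residual(300)  # e.g. a crowded 90+ mph band
print(f"best above trend, 15 pitchers: {small_band:.2f}; 300 pitchers: {large_band:.2f}")
```

Even with identical spread, the crowded band's best pitcher typically sits well further above the line than the thin band's best pitcher, simply because there are more draws. That does not rule out a real ceiling, but it means the empty upper-left corner is weak evidence for one.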

Yes, this is a totally random bump on an old discussion. There are a couple guys this year who thus far are racking up some K’s despite low velocity – Mike Leake and John Ely.


For whatever it’s worth, I repeated this experiment with the following constraints:

The pitcher must have been a starter

Data >= 2001 season

IP >= 500

Excludes postseason innings

I got an R^2 of 0.20 with alpha < 0.05. Seems to be a relatively valid model.