## R-Squared Fun with BB% and SO%

Some have expressed interest in the r-squared values for common metrics like walk and strikeout rates for starting pitchers. With that in mind, I took each pitcher with at least 100 innings pitched in both 2008 and 2009 and compared his BB% and SO% from one season to the next. This gave us a sample of 92 pitchers and, all told, two sets of numbers that are pretty similar.

Here's the graph with important information for walk rate:

And here for strikeouts:

Minimal difference. Both are skills, albeit not 100% so. The more interesting aspect from the data set is the biggest droppers and risers for each. Jon Lester saw his strikeout rate rise from 17.4% to 26.7%; Justin Verlander from 18.5% to 27.4%; Joe Blanton from 13% to 19.5%; Matt Garza from 16.6% to 22%; others who jumped at least five percentage points include Adam Wainwright, Clayton Kershaw, and Zack Greinke. The biggest droppers for Ks were Joba Chamberlain (-9.5%), Scott Kazmir (-7.8%), Ervin Santana (-6.4%), Micah Owings (-6.1%), and Bronson Arroyo (-4.9%). Something must be in the Angels water bottles.

As for walks, Verlander again excels at something good and saw his walk rate decline 3.5%; Barry Zito and Ubaldo Jimenez saw 2.6% drops; Ted Lilly and Ubaldo Jimenez round out the top five with 2.3% drops and Felix Hernandez experienced the sole other 2% drop. Kyle Davies had his walk rate climb 3.5%, Livan Hernandez and Owings (ouch) 3%, Andy Pettitte 2.9%, and finally Santana and Todd Wellemeyer had their walk rates go up 2.5%.

For those who love consistency, Jake Peavy, Lester, and Nick Blackburn saw zero movement in their walk rates and Lilly held the same distinction with his strikeout rate.


First, what season is what axis? I would assume ’08 is the x-axis and ’09 is the y-axis. Second, it’s interesting that the slope of both regression lines is well below 1. K% went up .5% league-wide, so a .76 slope is very surprising. I know pitchers’ K rates generally drop throughout their career, but not that much in one season. BB rates went up .2% league-wide, so a .8 slope seems a little extreme also.

I just figured out that the regression line actually shows regression towards the mean. (who woulda thunk?) I forgot to factor in the intercept in my initial analysis.
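To make the intercept point concrete, here is a minimal Python sketch of how a fitted line with a slope below 1 pulls extreme rates back toward the middle. The 0.76 slope is the strikeout figure quoted above; the 0.045 intercept is purely an assumed value for illustration (the post does not report one), and the function names are hypothetical.

```python
def predict_next(rate, slope, intercept):
    """Next-season rate predicted by a fitted line y = intercept + slope * x."""
    return intercept + slope * rate


def break_even(slope, intercept):
    """Rate at which the line predicts no change (where y equals x)."""
    return intercept / (1 - slope)


SLOPE = 0.76       # strikeout slope quoted in the comment above
INTERCEPT = 0.045  # ASSUMED for illustration only; not taken from the post

pivot = break_even(SLOPE, INTERCEPT)
print(f"break-even K rate: {pivot:.4f}")

# Hypothetical pitchers: one well above the pivot, one well below it.
for this_year in (0.25, 0.12):
    next_year = predict_next(this_year, SLOPE, INTERCEPT)
    print(f"{this_year:.3f} -> {next_year:.3f}")
```

With a positive intercept, pitchers above the break-even rate are projected to slip and pitchers below it are projected to gain, which is why a slope under 1 does not by itself imply a league-wide decline.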

Unrelated, I wonder if league changes really affected the results that much. Other than Blanton, who already had a third of a season with PHI last year, no other league changing pitchers up there, one way or another.

150 starters/year and only 92 threw 100+ innings over 2 years? I find that amazing. Nearly 2 guys per team turnover in the rotation in just one year. Maybe giving long term deals to pitchers really is a bad idea!

Has there been any good work done indicating the circumstances under which K% increases are considered likely to be sustainable?

Whoa now: “Minimal difference. Both are skills, albeit not 100% so.”

Just because something isn’t correlated r^2=1 doesn’t mean it’s not almost entirely a skill. Randomness does not imply, for lack of a better term, luck. There isn’t a lot of luck involved in a BB or K outside of umpire calls and quality of competition. The variation is almost entirely variation in pitchers’ actual skill levels year to year, e.g. new pitches, arm strength/health, mechanics changes, pitching strategy changes, etc.

“There isn’t a lot of luck involved in a BB or K outside of umpire calls and quality of competition.”

The problem is that those two luck variables are a lot more relevant than you might think.

They are significant, I’ve seen the (your?) stuff on it, but dare I venture to say Verlander or Lester’s skyrocketing rates have a lot more to do with their FB jumping by 2.0 and 1.5 MPH respectively—or in reverse, Kazmir losing velo/Joba going from relieving to starting. That’s variation in true skill.

I guess the point stands that “it’s not 100% skill”, but that’s kind of a pointless statement then as no statistic is measuring 100% skill.

Right. Even if it was 100% skill, you would expect a lowish r^2 due to random variation (actually, that could be an interesting study. You could use the binomial distribution to find the expected year to year correlation of a stat that’s 100% skill… but I digress).
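The binomial idea floated above can be sketched as a small simulation. This is only a rough illustration under loudly assumed inputs: each pitcher's true strikeout rate is drawn once and held fixed for both seasons ("100% skill"), and every number other than the 92-pitcher sample size (700 batters faced per season, an 18% average rate, a 4-point spread in true talent) is invented for the example, as are the function names.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate_pure_skill_r(n_pitchers=92, batters_faced=700,
                          mean_rate=0.18, talent_sd=0.04, seed=0):
    """Year-to-year r when true talent never changes (all inputs ASSUMED)."""
    rng = random.Random(seed)
    year1, year2 = [], []
    for _ in range(n_pitchers):
        # One true rate per pitcher, identical in both seasons.
        true_rate = min(max(rng.gauss(mean_rate, talent_sd), 0.01), 0.99)
        for season in (year1, year2):
            # Each batter faced is an independent Bernoulli trial.
            strikeouts = sum(rng.random() < true_rate
                             for _ in range(batters_faced))
            season.append(strikeouts / batters_faced)
    return pearson_r(year1, year2)


r = simulate_pure_skill_r()
print(f"r = {r:.3f}, r^2 = {r * r:.3f}")
```

Because each season is a finite binomial sample, the simulated correlation stays below 1 even though no pitcher's skill moved at all. How far below depends entirely on the assumed spread and batters faced.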

So yeah, K and BB rates are going to be skill for the most part (however, I did find that Pineiro was probably getting a bit lucky with his walks this year), and when you see a massive drop or rise, it’s most likely a combination of skill and luck, but much more skill than luck… if that makes sense.

The magnitude of an r or r^2 does NOT tell us whether something “is a skill” or not or even “how much of a skill” it is. In fact, the terms “is a skill” or “how much of a skill” are almost meaningless with respect to an r or r^2.

The magnitude of an r or r^2 in this kind of a regression is a function of two things and two things only. One, the spread of “skill” (the true rate for each player) in the sample of players (actually the population from which the sample is drawn) and two, the sample size of each variable (dependent and independent).

In this case, he is regressing one year on another year and the sample size of each year is around 150 innings (>100 IP).

To illustrate how the magnitude of an “r” has nothing to do with the “level of skill” that a metric reflects, consider that if he had done the same regression, but regressed 5-year samples on other 5-year samples for each player, the r would be much higher, in the .8 range or so. Would we then say that a pitcher’s K or BB rate is very much a skill now? And what if he did the regression using months rather than seasons? The “r” would be in the .2 or .3 range. Did the skill associated with BB and K rates suddenly diminish?

You might say, “Yeah, but with smaller samples, there is less skill and more luck,” and you would be right. But that still is not necessarily reflected in the “r”.

Imagine that we took all pitchers with around the same BB or K rates, and put them in their own league. BB and K rates would still very much be a skill, wouldn’t they? Do you know what the “r’s” would be if we ran regressions using players from this league? Zero! Regardless of the sample size – year to year, month to month, 5 years to 5 years. All zero! So now we have the same “skill” – BB and K rate – and large samples, but somehow the “r” is zero. Does that mean that BB and K rates have NO skill involved? No. As I said, an “r” in this kind of analysis is a reflection of the skill/noise ratio AND the spread of that skill among the population of players from which the sample is drawn.

So let me repeat. All the “r” tells us is some combination of the sample size in each of the variables and the spread of true talent in the population with respect to that attribute being measured. And without further testing, we have no idea which is stronger – the sample size effect or the spread of true talent. Even with little spread in true talent, if we have large enough sample sizes, the “r” will approach 1. And even with a large spread of true talent, if the sample sizes are small enough, the “r” will approach zero. In addition, a small sample for one attribute may be a large sample for another. That depends on the “noise” (sample and measurement error) inherent in measuring that attribute.

For example, in measuring a player’s true speed, there is little inherent sample and measurement error and you can get a pretty good idea of a player’s true speed with just a few measurements. To measure a player’s true batting skill by recording their hits and outs, you obviously need several thousand opportunities to get a similar reliability. Does that mean that speed is “more of a skill” than hitting ability?

And to reiterate one of my earlier points, even though it is easy to reliably measure a player’s true speed, if all players in a sample/population had the same speed, any regression from one time period to another would yield a zero “r”. Would that mean that speed is not a skill? But if there were a spread of speed (some players are faster than others) in that sample/population, even a small one, a regression where the sample sizes were small would yield an “r” close to 1.
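The two drivers described above can be put into a back-of-the-envelope formula. The sketch below uses the usual signal-to-total-variance ratio and assumes binomial noise and true rates that stay constant across the two samples; the function name, the 18% average rate, the talent spreads, and the batters-faced figures for a month, a season, and five seasons are all assumptions chosen only to show the direction of the effect.

```python
def expected_r(talent_sd, mean_rate, opportunities):
    """Approximate sample-to-sample r for a binomial rate stat.

    Assumes each player's true rate is constant and that the two samples
    have the same number of opportunities (batters faced).
    """
    signal = talent_sd ** 2                               # spread of true talent
    noise = mean_rate * (1 - mean_rate) / opportunities   # binomial sampling variance
    return signal / (signal + noise)


MEAN_RATE = 0.18  # ASSUMED league-average strikeout rate

# ASSUMED batters faced for roughly a month, a season, and five seasons.
for label, bf in (("month", 110), ("season", 650), ("5 seasons", 3250)):
    for spread in (0.0, 0.01, 0.04):
        r = expected_r(spread, MEAN_RATE, bf)
        print(f"{label:>9}  talent_sd={spread:.2f}  r={r:.3f}")
```

A spread of zero gives r = 0 at every sample size, matching the "same-rate league" thought experiment, while any nonzero spread pushes r toward 1 as opportunities grow.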

Question. The chart says BB% and SO% … but doesn’t clearly identify the denominator. Is it innings — *OR* is it batters faced? For pitchers with excellent (or awful) defense, this could make a difference. I know that BB/9 and K/9 are the most commonly displayed rates … but most of the top notch metrics for pitchers remember to use batters faced instead, to keep defensive quality from skewing results in individual cases.
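A quick sketch of why the denominator matters, using two made-up pitchers (not real stat lines). Both face the same number of batters and issue the same number of walks, but one is assumed to get more outs from his defense and so logs more innings. Function names are hypothetical, and innings are plain decimals rather than box-score notation.

```python
def bb_pct(walks, batters_faced):
    """Walks per batter faced."""
    return walks / batters_faced


def bb_per_9(walks, innings):
    """Walks per nine innings pitched."""
    return 9 * walks / innings


# HYPOTHETICAL pitchers: identical walks and batters faced, different innings.
pitchers = {
    "good defense": {"walks": 64, "batters_faced": 800, "innings": 200.0},
    "poor defense": {"walks": 64, "batters_faced": 800, "innings": 185.0},
}

for name, line in pitchers.items():
    pct = bb_pct(line["walks"], line["batters_faced"])
    per9 = bb_per_9(line["walks"], line["innings"])
    print(f"{name}: BB% = {pct:.1%}, BB/9 = {per9:.2f}")
```

The per-batter rate is identical for both, while the per-inning rate makes the pitcher with the weaker defense look wilder than he is.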

Is the data normal? If you look at the X axis and the Y axis by themselves it looks like you have a little bit too much data on the ends and the tails are a little fatter than a strict normal distribution. The further you get away from that idealized picture the less the R^2 will tell you about what you are measuring since the tails will have more influence on the metric since it relies on the squared distance of the sample points from the generated regression line. R^2 is actually very rarely useful – but it is simple to explain and therefore gets used a lot.

R. J.,

Can you include the standard error of the coefficients in the regression when you report them (or p-values, depending on your taste)? These numbers have no meaning without the measure of uncertainty around them.
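For anyone who wants to see what that request involves, here is a minimal sketch of a simple least-squares fit that also reports the textbook standard errors for the slope and intercept. The function name is hypothetical and the six data pairs are invented placeholders, not the 92-pitcher sample from the post.

```python
import math


def ols_with_se(xs, ys):
    """Simple linear regression of ys on xs with coefficient standard errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance uses n - 2 degrees of freedom (two fitted parameters).
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    resid_var = sse / (n - 2)
    se_slope = math.sqrt(resid_var / sxx)
    se_intercept = math.sqrt(resid_var * (1 / n + mean_x ** 2 / sxx))
    return slope, intercept, se_slope, se_intercept


# INVENTED walk rates for illustration only (year 1, year 2).
year1 = [0.07, 0.09, 0.06, 0.11, 0.08, 0.10]
year2 = [0.075, 0.085, 0.07, 0.10, 0.08, 0.09]

slope, intercept, se_slope, se_intercept = ols_with_se(year1, year2)
print(f"slope = {slope:.3f} (SE {se_slope:.3f})")
print(f"intercept = {intercept:.4f} (SE {se_intercept:.4f})")
```

Reporting the slope together with its standard error lets a reader judge, for example, whether a fitted slope is distinguishable from 1.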

Also, some other controls should probably be included. The improvement in walk rate or K rate could also be a function of tenure in the league/age. Those factors need to be controlled for.

Finally, as MGL states at length, R^2 is about the last thing that I will trumpet to declare the usefulness of a regression result.

Also, here is how I would interpret these results: the average pitcher’s correlation between last year’s BB rate and this year’s BB rate is 0.8013, without controlling for other confounding factors. And a one-year cross-section correlation doesn’t tell you much: if you have a time-series of individual pitchers (at least two observations for each pitcher, over a few years), those should be used. And in fact, after a certain age, if you expect them (BB or K rate) to stabilize, that should be the sample to give you your best measure of the degree of persistence (or the skill inherent).