On June 3rd, Marlins pitcher Henderson Alvarez threw an 88-pitch shutout against the Rays, scattering eight hits while not issuing a walk. On July 11th, Alvarez again gave up eight hits without issuing a walk, but he only made it five innings after surrendering six runs. While the circumstances surrounding these two starts aren’t completely the same, they do a good job of illustrating the phenomenon of cluster luck.
Cluster luck, originally discovered and coined by Joe Peta in his book Trading Bases, essentially tells us how lucky a team has been by measuring the difference between the number of runs it would be expected to score, based on its power (total bases) and base runners (hits and walks), and the number of runs it actually scored. In Alvarez’s July start above, he was a victim of poor sequencing, allowing his hits in bunches rather than spreading them out over the course of his start. For a more complete (and easier to understand) definition and some real world examples check out this and this.
What I will be attempting to do in this article is figure out a way to accurately estimate how many runs a pitcher should have allowed, and subsequently what his run average should look like, and then pinpoint certain pitchers who have been lucky or unlucky so far this season. Basically I am trying to normalize a pitcher’s RA by adjusting for sequencing and cluster luck.
Fortunately for me, the heavy lifting for part one has already been done thanks to David Smyth. His metric, Base Runs (BsR), was developed and popularized in the early 1990s and is an extraordinarily simple yet accurate way of estimating runs allowed using standard box score statistics. Base Runs for pitchers takes four inputs: innings pitched, hits, walks, and home runs. These are converted into four factors, A, B, C, and D (base runners, advancement, outs, and home runs, respectively). The final formula looks like A*B/(B+C)+D. For a lengthier piece on Base Runs, its properties, and its pros and cons, consult this and this.
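To make the A*B/(B+C)+D structure concrete, here is a minimal Python sketch. The coefficients are an assumption on my part: they follow one commonly published pitcher version of Base Runs, and nothing above pins down exactly which variant was used. The function names and the innings-notation helper are hypothetical, included only for illustration.

```python
def innings_to_float(ip):
    """Convert box-score innings notation (102.2 = 102 and 2/3) to true innings."""
    whole = int(ip)
    outs = round((ip - whole) * 10)  # .1 means one out, .2 means two outs
    if outs not in (0, 1, 2):
        raise ValueError("fractional innings must be .0, .1, or .2")
    return whole + outs / 3.0


def base_runs(ip, hits, walks, home_runs):
    """Estimate runs allowed using the Base Runs form A*B/(B+C)+D.

    The coefficients are ASSUMED (a commonly cited pitcher variant), not
    taken from this article. `ip` must be true innings, not box-score notation.
    """
    a = hits + walks - home_runs                   # A: base runners
    est_total_bases = 1.12 * hits + 4 * home_runs  # total bases estimated from H and HR
    b = (1.4 * est_total_bases - 0.6 * hits
         - 3 * home_runs + 0.1 * walks) * 1.1      # B: advancement
    c = 3 * ip                                     # C: outs
    d = home_runs                                  # D: home runs always score
    return a * b / (b + c) + d


def base_run_average(ip, hits, walks, home_runs):
    """Scale estimated runs to a per-nine-innings rate (BsRA)."""
    return 9 * base_runs(ip, hits, walks, home_runs) / ip
```

Note that a solo home run contributes exactly one run through D, while everything else depends on how often base runners are expected to be advanced home, which is what B/(B+C) approximates.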
I took these statistics, along with run average, for every pitcher in the majors through July 12th and figured each pitcher’s expected runs allowed by Base Runs, converted that to a Base Run Average (BsRA), and took the difference between BsRA and his actual RA. I also calculated each pitcher’s RA- and BsRA- by dividing his RA or BsRA by the league RA or BsRA and multiplying by 100 (for reference, the league RA is 4.14 and the league BsRA is 4.19). By taking the difference between the two, (BsRA-)-(RA-), we can figure out the percentage of extra runs, compared to league average, that the pitcher should have allowed.
In the tables below you’ll see I’ve given this stat the name Luck%. It’s a poor name, admittedly, since we’re dealing with percentages and I’m sure the differences aren’t completely due to luck, but the name will have to do until I think of something better. For example, Max Scherzer’s RA- is 80.92 (RA of 3.35/league RA of 4.14), meaning he has allowed runs at around 81% of the league average, but his BsRA- is 88.62 (BsRA of 3.71/league BsRA of 4.19), meaning he should have allowed runs at around 89% of the league average. We then get a Luck% of 88.62-80.92=7.71, so Scherzer should have allowed 7.71% more runs compared to league average.
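The RA-, BsRA-, and Luck% arithmetic can be sketched in a few lines of Python. The function names here are hypothetical, and because the inputs quoted above are rounded to two decimals, plugging them in reproduces the figures in this article only approximately.

```python
def index_vs_league(rate, league_rate):
    """Express a rate as a percentage of the league rate (100 = league average)."""
    return 100.0 * rate / league_rate


def luck_pct(ra, bsra, league_ra, league_bsra):
    """Luck% = (BsRA-) - (RA-).

    Positive: the pitcher has allowed fewer runs than Base Runs expects.
    Negative: he has allowed more runs than expected.
    """
    return index_vs_league(bsra, league_bsra) - index_vs_league(ra, league_ra)


# League rates quoted in the article (through July 12th).
LEAGUE_RA = 4.14
LEAGUE_BSRA = 4.19

# Scherzer's rounded RA and BsRA from the example above.
scherzer_luck = luck_pct(ra=3.35, bsra=3.71,
                         league_ra=LEAGUE_RA, league_bsra=LEAGUE_BSRA)
print(round(scherzer_luck, 1))
```

Working from unrounded RA and BsRA values would be the better choice in practice, since small rounding differences in the inputs carry straight through to the final number.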
Whew. Now we can get to the names.
First, the top ten qualified pitchers who have had their numbers most positively affected by cluster luck.
I like this list since it is very diverse. We have pitchers who have been pleasant surprises this season but who we all know aren’t really that good (Vargas and Simon), older pitchers experiencing a late-career resurgence (Beckett and Buehrle), great pitchers (Greinke and Tanaka) and not-so-great pitchers (Chen), hard throwers (Alvarez) and soft throwers (Young), high-strikeout and low-strikeout pitchers, and so on. It’s good to see that not just one type of pitcher is affected, which gives me confidence that cluster luck can play a real role in a pitcher’s numbers even this late in the season.
Now on to the top ten pitchers who have had their numbers most negatively affected by cluster luck.
|Pitcher|IP|RA|BsRA|BsRA-|RA-|Luck%|
|Jorge De La Rosa|102.2|4.91|4.32|103.2|118.6|-15.4|
This is a slightly less diverse list. Most of these guys are having disappointing seasons, but perhaps they haven’t been as bad as we think. Four of these guys have a worse-than-average RA but a better-than-average BsRA (or perfectly average in the case of Kuroda). Then there’s Anibal Sanchez, who might just be one of the most underrated pitchers in baseball, as his BsRA is seventh in all of baseball.
So what does Luck% end up telling us about a pitcher? We know that pitchers have little control over what happens after a ball is put in play, but what we’re doing here is figuring out which pitchers have been victimized by poor sequencing. Perhaps we can look at Luck% the same way we look at BABIP. If the measure is abnormally high compared to a pitcher’s career rate, and the pitcher hasn’t made a substantial improvement in his mechanics or pitch repertoire, perhaps some regression is in order.
So is Anibal Sanchez due for a spectacular second half? Maybe not. A myriad of factors could be influencing his low Luck%. We know that, in general, offense goes up when runners are on base, and Sanchez could be especially susceptible to allowing runs to score in bunches. He has a slow move to the plate, potentially allowing more runners to steal and get into scoring position. Perhaps his stuff is less effective from the stretch due to a breakdown in mechanics. Maybe he focuses too much attention on the runners on base and not enough on the one at the plate. I really don’t know.
I only have half a season of data on 100 or so pitchers, so obviously more research is needed. One could find the correlation between Luck% and peripheral stats such as K% and BB%, or find year-to-year correlations for Luck% to find out how much of the variation is actually luck and how much is skill. I’d definitely be intrigued by those results, and I’ll likely revisit these numbers when the season ends.
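For anyone who wants to try the correlation idea, here is a minimal sketch of a Pearson correlation in plain Python. The function name is hypothetical, and the inputs would be paired lists such as each pitcher’s Luck% in one season and the next, or his Luck% and K%. I have not run this on real data, so no results are implied.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / math.sqrt(var_x * var_y)
```

A year-to-year correlation near zero would suggest Luck% is mostly noise, while a clearly positive one would hint that some pitchers have a repeatable sequencing skill (or flaw).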
I’m still relatively new to performing this kind of analysis, so any constructive criticism would be greatly appreciated, as would pointers to anything similar you’ve seen done elsewhere on the internet. If you have suggestions for any improvements (especially the name) or further research, I’d love to hear them. If you think I majorly screwed up somehow, I’d love to hear about it too.