In the early 21st century, Oakland Athletics general manager Billy Beane revolutionized baseball. He was the first general manager to lean heavily on sabermetrics in his baseball operations. This isn’t a history lesson, though. I bring him up because of his idea that outs are precious: as a hitter, your goal is to not make an out, which is why he prioritized OBP so heavily. In the years since, baseball statistics have seen phenomenal progress for both hitters and pitchers. While I believe FIP and xFIP are both very useful for measuring a pitcher’s true skill, my problem is that they essentially ignore all the batted ball data that we have (GB%, FB%, LD%). SIERA and tERA have solved some of these problems but are far from perfect, and I believe the more statistics we have, the better.
As I mentioned with Beane, while we largely focus on a hitter’s ability to not make an out, we still don’t have a catch-all statistic for how effective pitchers are at getting batters out. If the batter’s goal is to not make an out, the pitcher’s goal is to get the batter out. So I present to you expected out percentage, or exOUT% (the name is certainly a work in progress). exOUT% sets out to answer a simple question: for any plate appearance, what is the likelihood that the pitcher will get the batter out? You could approximate this by just looking at a pitcher’s opponent OBP, but that is rather primitive. We can get a better estimate by focusing on the pitcher’s own skills (striking batters out, not walking them, and the type of contact he gives up) and by negating the effect of the defense behind him by using league averages on balls in play. So to calculate a pitcher’s exOUT%, I used K%, BB%, GB%, LD%, FB%, IFFB%, and the 2014 league batting averages on ground balls, line drives, and fly balls. (HBPs are essentially ignored but can certainly be incorporated in a future version; this is pretty much exOUT% v1.0.)
I want to give full disclosure: I am not a statistician or close to it. Math and statistics are an area of interest, and I am currently pursuing a degree in math-economics, but I am far from a professional, so I recognize there are going to be errors in my data. This is an extremely rough version; there’s even a combination of data from this year and last year, so there will be inconsistencies, as I don’t have the resources to gather all the data I need. If you are interested in this after reading and would like to take it further, please feel free to contact me, whether or not you have the skills necessary to advance it.
I will first post a simple step-by-step breakdown of how to calculate exOUT%, and then get into more detail and take you through it with Clayton Kershaw, because well, he is awesome.
1- Add K% and BB% and subtract the total from 100%. This leaves you with the percentage of plate appearances that end with a ball in play; let’s call it BIP%.
2- Multiply the pitcher’s GB% (written as a decimal, so 40% is .4) by his BIP% (left on the 0 to 100 scale, e.g. 40%). This gives you a ground ball percentage for all PAs, not just balls in play; we’ll call this overall GB%, or oGB%. Now multiply oGB% (still on the 0 to 100 scale) by the league-average share of ground balls that don’t go for hits. The league batting average on ground balls in 2014 was .239, so the out percentage on ground balls is 76.1%, or .761 as a decimal. The result is a percentage on the 0 to 100 scale: if the number is 20%, there’s a 20% chance the pitcher will induce a ground ball out in that PA, assuming league-average defense (which we can assume because we’re using the league batting average on ground balls). We’ll call this exgbOUT%.
3- Follow the same steps with LD% to get exldOUT%, the percentage chance for any given PA that the pitcher will produce a line drive out. (The league average on line drives last season was .685 (!), so there is a 31.5% chance a line drive will result in an out.)
4- Same thing with FB%, sort of, because we also want to incorporate IFFB%. Multiply the pitcher’s FB% by his IFFB%; this gives you the percentage of balls in play that are infield fly balls (bipIFFB%). Multiply this by his BIP% to get the overall percentage of PAs that result in an infield fly (oIFFB%). This is also his exiffbOUT%, because any infield fly ball should be converted into an out, and if it isn’t, that’s no fault of the pitcher, so we won’t punish him. Next, subtract the pitcher’s IFFB% from 1 (or 100, whatever) and multiply the result by his FB%; this is the percentage of balls in play that are normal fly balls to the outfield (bipFB%). Multiply this by his BIP% to get his overall normal FB% for all PAs (onFB%). Multiply that by .793 (the league average on fly balls in 2014 was .207, so there’s a 79.3% chance that a fly ball will result in an out). This is the percentage chance that, for any given PA, the pitcher will produce a fly ball out to the outfield. Add this exnfbOUT% (n for normal) to his exiffbOUT% and you have his exfbOUT%, the percentage chance that for any given PA the pitcher will produce a fly ball out, to the infield or the outfield.
5- Add K% + exgbOUT% + exldOUT% + exfbOUT%.
6- You have your exOUT%.
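For anyone who would rather script this than build a spreadsheet, here is a minimal Python sketch of steps 1 through 6. The function name, argument names, and dictionary keys are made up for illustration and are not from the original Excel doc. It multiplies the outfield fly ball share by FB% before applying BIP%, which is what the worked Kershaw numbers further down do.

```python
# League batting averages on each batted-ball type (2014, as quoted in this post).
LEAGUE_AVG = {"gb": 0.239, "ld": 0.685, "fb": 0.207}


def ex_out_pct(k, bb, gb, ld, fb, iffb, league_avg=LEAGUE_AVG):
    """Return (exOUT%, components); every input is a percentage on the 0-100 scale.

    gb, ld and fb are shares of balls in play; iffb is a share of fly balls.
    Hypothetical helper: the name and signature are illustrative only.
    """
    bip = 100.0 - (k + bb)        # step 1: BIP%, share of PAs ending with a ball in play
    bip_frac = bip / 100.0

    # Steps 2 and 3: overall GB% / LD% times the league out rate on that contact type.
    ex_gb = gb * bip_frac * (1.0 - league_avg["gb"])
    ex_ld = ld * bip_frac * (1.0 - league_avg["ld"])

    # Step 4: infield flies count as automatic outs; other fly balls get the league out rate.
    iffb_frac = iffb / 100.0
    ex_iffb = fb * iffb_frac * bip_frac
    ex_nfb = fb * (1.0 - iffb_frac) * bip_frac * (1.0 - league_avg["fb"])
    ex_fb = ex_iffb + ex_nfb

    # Steps 5 and 6: strikeouts plus the three batted-ball out percentages.
    components = {"k": k, "gb": ex_gb, "ld": ex_ld, "fb": ex_fb}
    return sum(components.values()), components


# Kershaw's line as compiled for this post (8/21).
total, parts = ex_out_pct(k=32.3, bb=4.9, gb=52.8, ld=21.2, fb=26.0, iffb=11.8)
print(round(total, 2), {name: round(value, 2) for name, value in parts.items()})
```

With Kershaw’s line this should land at roughly 75.07, in line with the spreadsheet figure in the walkthrough.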
I kept the terms from getting too technical or scientific so I don’t confuse anyone; I tried to simplify a very complicated procedure as much as possible. To clarify, let’s go through an example with Clayton Kershaw.
Kershaw profiles like this (I compiled this data on 8/21): 32.3 K%, 4.9 BB%, 52.8 GB%, 26 FB%, 11.8 IFFB%, 21.2 LD%.
So let’s look first at the plate appearances where the ball doesn’t go in play: strikeouts and walks. Add the two and the balls-not-in-play percentage is 37.2, which leaves a BIP% of 62.8. The 4.9% that are walks won’t be outs, and the 32.3% that are strikeouts will be. Thus far, Kershaw’s exOUT% is 32.3 (of a possible 37.2).
Now let’s look at the balls in play. People will usually say that a pitcher can’t control what happens once a ball is in play, but I vehemently disagree: the type of contact the pitcher gives up can’t be ignored, and it largely affects what happens to the ball in play. I will quote a FanGraphs article here to explain it: “Generally speaking, line drives go for hits most often, ground balls go for hits more often than fly balls, and fly balls are more productive than ground balls when they do go for hits (i.e. extra base hits). Additionally, infield fly balls are essentially strikeouts and almost never result in hits or runner advancement.” FanGraphs also gives us this data from 2014:
GB: AVG- .239, ISO- .020, wOBA- .220
LD: AVG- .685, ISO- .190, wOBA- .684
FB: AVG- .207, ISO- .378, wOBA- .335
So this means that fly balls are the batted balls most likely to turn into outs, although fly ball pitchers may still be less effective overall, because when their fly balls don’t become outs, the damage is greater than it is for ground ball pitchers. But remember, this statistic is only trying to find the chance that the pitcher will get a hitter out.
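To make that trade-off concrete, here is a small Python sketch that turns the league batting averages above into out rates and sets them next to wOBA. The dictionary layout and the `out_rate` helper are illustrative names, and treating the out rate as 1 minus AVG is the same simplification the rest of this post makes: every batted ball that isn’t a hit counts as an out.

```python
# 2014 league figures by batted-ball type, as quoted from FanGraphs above.
BATTED_BALL = {
    "GB": {"avg": 0.239, "woba": 0.220},
    "LD": {"avg": 0.685, "woba": 0.684},
    "FB": {"avg": 0.207, "woba": 0.335},
}


def out_rate(avg):
    """Share of batted balls assumed to become outs: anything that is not a hit."""
    return 1.0 - avg


# Rank contact types from most to least likely to produce an out.
ranked = sorted(BATTED_BALL, key=lambda kind: out_rate(BATTED_BALL[kind]["avg"]), reverse=True)

for kind in ranked:
    stats = BATTED_BALL[kind]
    print(f"{kind}: out rate {out_rate(stats['avg']):.3f}, wOBA {stats['woba']:.3f}")
```

Fly balls come out on top for out rate, ahead of ground balls and well ahead of line drives, while still carrying a much higher wOBA than ground balls.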
All right, so, let’s calculate Kershaw’s exgbOUT%, exldOUT%, and exfbOUT%; you can follow the numbers along with the steps I listed above.
62.8 x .528 = 33.1584 (oGB%)
33.1584 x .761 = 25.2335424 exgbOUT%
62.8 x .212 = 13.3136
13.3136 x .315 = 4.193784 exldOUT%
26 x .118 = 3.068 (bipIFFB%)
26 x .882 = 22.932 (bipFB%)
62.8 x .22932 = 14.401296 (onFB%)
14.401296 x .793 = 11.4202277 exnfbOUT%
62.8 x .03068 = 1.926704 oIFFB% and exiffbOUT%
exnfbOUT% + exiffbOUT% = 13.3469317 exfbOUT%. This is the number the Excel doc chugged out, so I’m trusting it; if you follow along on a calculator that can’t carry all the decimals (like my iPhone’s), a decimal may be off.
Now add them all up:
32.3 + 25.2335424 + 4.193784 + 1.926704 + 11.4202277 = 75.07%
K% + exgbOUT% + exldOUT% + exiffbOUT% + exnfbOUT% = exOUT%
The league average exOUT%, using 2014 league average statistics for the inputs involved, is 69.8%. Scherzer leads the majors (well, the 89 pitchers whose data I was able to export from FanGraphs) with a 76.43 exOUT%. If you want to look at it as a better version of opponent OBP, subtract it from 100%: his is .236, so, you know, good. Here is a picture of the data for the top 37; the J column is what you are looking at. Betances is in there because I wanted to calculate one reliever.
All right, I’ve explained it a bit in the prologue, but now that you’ve seen it, let me explain more about why I like this stat. Well, first, I created it and calculated it, so, well, yeah… but I also like it because it answers a very simple question: “How good is a pitcher at getting people out?” Pitching, in its simplest form, is exactly that: getting people out. The stat recognizes that there are basically only five outcomes for a plate appearance (strikeout, walk, ground ball, line drive, and fly ball) and looks at the pitcher’s rates in these categories to determine how many batters he should be getting out. The stat is more predictive than evaluative in nature. You can calculate a pitcher’s actual out percentage, but that doesn’t nearly tell the whole story, because a lot of luck is involved with balls in play and other fluky outcomes.
This operates on the assumption that a ground ball will perform the way the average ground ball does, a line drive the way the average line drive does, and a fly ball the way the average fly ball does. There could be guys getting very fortunate: on ground balls, by having a great infield behind them or balls not squeaking through the holes; on line drives, by having them hit right at people; and on fly balls, by having them stay in the park or having outfielders who cover a lot of ground. And there could be guys who are getting unlucky: the ground balls are getting through the holes, the infielders don’t have range, line drives seem to always go for hits, and fly balls are falling in. The stat says that a pitcher can’t control any of that, but he can control how often he strikes people out, how often he walks people, and how often he gives up ground balls, line drives, and fly balls. If those balls in play behaved the way they should, the pitcher should be getting this percentage of batters out.
I will address the flaws I have found with it. As important as getting people out is, what happens in the plate appearances that don’t end in outs is sometimes almost as important. This stat only deals in batting average on balls in play, but wOBA is very important too. Fly balls are more likely to be outs than ground balls, but the wOBA on fly balls is over 100 points higher. Additionally, instead of ground balls, line drives, and fly balls, I’d prefer to use soft, medium, and hard contact, because that is a truer test of pitcher skill. However, I did not have league-average batting averages on soft, medium, and hard contact at my disposal (if someone does, please contact me, like I said). So what I have for now will do, and this batted ball data is still a good measure. I set out to calculate what percentage of batters a pitcher should be getting out, and that is exactly what I found. So while it’s not perfect, it has its use, and it’s something to build on.
And build on I did. While the out percentage is nice, it doesn’t give us a measure like ERA or FIP or xFIP, that tells us how many runs a pitcher should be giving up. So using the data I used to calculate exOUT%, I present to you exRP27 (expected runs per 27 outs, a stupid name for a hopefully not stupid stat).
The basis for this stat is this data from FanGraphs: “Line drives are death to pitchers, while ground balls are the best for a pitcher. In numerical terms, line drives produce 1.26 runs/out, fly balls produce 0.13 R/O, and ground balls produce only 0.05 R/O.” (I don’t know how this was calculated, or what time period it covers, but this is what I had to work with.) We don’t have these figures for soft, medium, and hard contact, so again I’m sticking with ground balls, line drives, and fly balls.
All right, so what I am going to do using this stat and the pitcher’s K%, BB%, GB%, LD%, and FB% is see how many runs the pitcher should be allowing over 27 outs, and then adjust it to get it on a scale similar to ERA, FIP, and xFIP.
Keeping Clayton Kershaw as our example, let’s take a look.
Kershaw’s K% is 32.3. Multiply this (as a decimal, .323) by 27 (for outs in a game) and we get 8.721 K’s, so 0 runs so far, because a K will never produce a run.
Now GB%. His exgbOUT% is 25.2335424; multiply this (again as a decimal) by 27 and we get 6.8 (ish; the final number will be exact via the Excel doc). Multiply this by .05 (the runs per GB out) and we get .34 runs.
LD%: his exldOUT% is 4.193784; multiply by 27 and get 1.13232168, then multiply this by 1.26 for LD runs/out and we get 1.43 runs.
FB%: his exfbOUT% is 13.3469317; multiply by 27 to get 3.6, then multiply that by .13 and you get .47 runs.
Add up all these exRUNS and Kershaw’s total is 2.24. However, we can’t stop here, because the number of expected outs he’s recorded is only about 20.3 (8.7 + 6.8 + 1.1 + 3.6, rounded). So scale this 20.3 (or whatever the pitcher’s exOUTS is) up to 27 by multiplying by 27/20.3, and then multiply his exRUNS by that same factor. For Kershaw you end up with 2.97 exRP27. The league average would be 3.78. Last year’s average ERA/FIP/xFIP was 3.74, but when I adjust everything to that, everyone’s exRP27 just goes down slightly (Kershaw’s from 2.97 to 2.94). I want it to be on a more realistic scale, where everyone’s totals are lower and a really good exRP27 is comparable to a really good FIP, like in the low 2s.
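The raw exRP27 calculation can be sketched in a few lines of Python. The function name and dictionary keys are hypothetical, the runs-per-out values are the FanGraphs figures quoted above, and the fly ball input is the 13.3469317 exfbOUT% that the Excel doc produced in the exOUT% walkthrough.

```python
OUTS_PER_GAME = 27

# Runs per out by out type; strikeouts are assumed to never produce a run.
RUNS_PER_OUT = {"k": 0.0, "gb": 0.05, "ld": 1.26, "fb": 0.13}


def raw_ex_rp27(ex_out_pcts):
    """Return (exRP27, exRUNS, exOUTS) from expected-out percentages (0-100 scale).

    ex_out_pcts maps 'k', 'gb', 'ld', 'fb' to the share of PAs expected to end
    in that kind of out. Hypothetical helper for illustration only.
    """
    # Expected outs of each type over 27 plate appearances.
    outs = {kind: pct / 100.0 * OUTS_PER_GAME for kind, pct in ex_out_pcts.items()}
    ex_runs = sum(outs[kind] * RUNS_PER_OUT[kind] for kind in outs)
    ex_outs = sum(outs.values())
    # Scale the expected outs up to a full 27 and the runs by the same factor.
    scale = OUTS_PER_GAME / ex_outs
    return ex_runs * scale, ex_runs, ex_outs


kershaw = {"k": 32.3, "gb": 25.2335424, "ld": 4.193784, "fb": 13.3469317}
rp27, ex_runs, ex_outs = raw_ex_rp27(kershaw)
print(round(ex_runs, 2), round(ex_outs, 1), round(rp27, 2))
```

Carrying full decimals, this comes out within about a hundredth of the 2.97 quoted above; small differences like that are presumably down to rounding along the way.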
I don’t know what the statistically correct way to do this is, but here is what I did to make it work. I calculated what his “ERA” would be by multiplying his exRUNS by 9 and then dividing that by his exOUTS. His was .99; the league average was 1.26. I then did .99/1.26 to get .78 or so, multiplied that by his exRP27, and got 2.34. I felt like this was more realistic and in line with his ERA/FIP/xFIP. Obviously it can’t be the same, because they measure different things, but it’s in the same area. The same is done for all pitchers, though of course not everyone gets multiplied by .78. The league average remains 3.78, between last season’s and this season’s average for ERA/FIP/xFIP.
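Here is a short Python sketch of that scaling step, using the rounded Kershaw figures from the text (2.24 exRUNS, 20.3 exOUTS, 2.97 raw exRP27) and the 1.26 league “ERA”. The function name and parameter names are made up for illustration.

```python
def adjusted_ex_rp27(ex_runs, ex_outs, raw_rp27, league_era_like=1.26):
    """Scale a raw exRP27 by the pitcher's "ERA"-style figure relative to the league's.

    Hypothetical helper: returns (adjusted exRP27, pitcher "ERA", scaling factor).
    """
    era_like = ex_runs * 9 / ex_outs        # exRUNS times 9, divided by exOUTS
    factor = era_like / league_era_like     # below 1 for better-than-average pitchers
    return raw_rp27 * factor, era_like, factor


adjusted, era_like, factor = adjusted_ex_rp27(ex_runs=2.24, ex_outs=20.3, raw_rp27=2.97)
print(round(era_like, 2), round(factor, 3), round(adjusted, 2))
```

One property worth noticing: a pitcher whose “ERA” equals the league’s 1.26 gets a factor of exactly 1, which is why the league average stays at 3.78 while better pitchers are pulled lower and worse pitchers are pushed higher.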
Here is the leaderboard for that (S column):
I really like this stat a lot, and feel like it does what I wanted to accomplish: figure out how many runs a pitcher should allow per 27 outs given his K%, BB%, GB%, LD%, FB%, and the notion that balls in play will behave the way they normally do, as anything else is likely luck and not indicative of the pitcher’s performance.
I look at Sonny Gray as someone this stat is perfect for. His ERA is outstanding at 2.04, but his FIP is 3.00, his xFIP is 3.47, and his SIERA is 3.50. The problem, at least with FIP and xFIP, is that they ignore what happens when the ball is in play. He doesn’t strike out too many people, he has a good but not spectacular BB%, and he’s given up 10 home runs, a fair amount, so this hurts his FIP and whatnot. However, instead of saying “well, he will regress, look at his FIP/xFIP/SIERA,” this stat looks at why he’s having this success, and it has to do with the balls in play, which are getting ignored. Gray’s LD% is just 14.6! That is really good! Second best of the 90 pitchers I did this for. And his GB% is 54%, 9th best, also really good. The pitcher does have control over the type of contact he allows, and the fact that Gray is producing a ton of ground balls and very few line drives is why he’s been so successful. His 2.34 exRP27 suggests that he has not been quite as good as his 2.04 ERA, but he’s not as far off as the other stats suggest.
Obviously exRP27 is far from perfect, and it is in no way supposed to replace FIP/xFIP/SIERA, but it is something to look at alongside them. I am a big believer in aggregation, so I think that averaging some combination of these four stats, or all of them, is an even better way to evaluate a pitcher. We’ve got more data than ever, so it makes sense to use it; exRP27 and exOUT% are just more examples of using this data to better evaluate pitchers.
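As a quick illustration of the aggregation idea, here is a Python sketch that averages the four estimators using Sonny Gray’s numbers from above. Equal weights are purely an assumption here; nothing in this post settles how the stats should be weighted, and the function name is made up.

```python
from statistics import mean


def blended_estimate(estimates):
    """Equal-weight average of several run estimators (an illustrative assumption)."""
    return mean(estimates.values())


# Sonny Gray's figures as quoted in this post.
gray = {"FIP": 3.00, "xFIP": 3.47, "SIERA": 3.50, "exRP27": 2.34}
blended = blended_estimate(gray)
print(round(blended, 2))
```

Dropping or adding an entry in the dictionary is all it takes to try a different combination of the four.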
I hope you guys enjoyed this. If you have any feedback, please comment or contact me. Next I will be looking at exWOBA against for pitchers using similar data, and exWOBA for batters using the same data from the hitter’s side.