Moving past DIPS

Defense Independent Pitching Stats (DIPS) theory basically states that pitchers have little control over what happens to balls that are put in play. Since Voros McCracken proposed it, there has been much controversy about the degree to which pitchers can actually control balls in play (BIP) by doing things like throwing knuckleballs and inducing pop-ups. However, less attention has been paid to how DIPS has been applied and translated into DIPS ERA and, later, FIP.

What the past decade of debate has shown is that there is actually very little variation among pitchers in their ability to control balls in play. For example, Tim Wakefield, the premier knuckleballer of our age, has a career BABIP of .275, compared to the average of (roughly) .290. Over 8,933 career BIP, that works out to:

(.290-.275) * 8933 = 133.995

That gives us about a quarter of a hit per game, or 7.9 hits a season, for a knuckleballer like Wakefield. For his career, if you estimate about .7 runs per hit, Wakefield’s ability to prevent hits on balls in play has lowered his ERA by roughly, oh, .30. (This is all back-of-the-envelope math meant mostly for illustration; we could obviously do a lot better than this but it works for a rough estimate.)
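The arithmetic above can be sketched in a few lines. The BABIP figures, the BIP count and the .7 runs per hit all come from the text; the innings total is not given there, so the 3,000 used below is only a round placeholder to show how runs turn into ERA.

```python
# Back-of-the-envelope estimate of hits saved on balls in play.
LEAGUE_BABIP = 0.290      # rough league average quoted in the text
WAKEFIELD_BABIP = 0.275   # Wakefield's career BABIP quoted in the text
CAREER_BIP = 8933         # career balls in play quoted in the text
RUNS_PER_HIT = 0.7        # rough run value of a hit used in the text


def hits_saved(league_babip, pitcher_babip, balls_in_play):
    """Hits prevented relative to a league-average BABIP."""
    return (league_babip - pitcher_babip) * balls_in_play


def era_change(runs_saved, innings):
    """Convert runs saved into an ERA difference (runs per nine innings)."""
    return 9.0 * runs_saved / innings


hits = hits_saved(LEAGUE_BABIP, WAKEFIELD_BABIP, CAREER_BIP)
runs = hits * RUNS_PER_HIT

# ASSUMPTION: 3,000 innings is a round placeholder, not a figure from the text.
ASSUMED_INNINGS = 3000
print(f"hits saved: {hits:.1f}")
print(f"runs saved: {runs:.1f}")
print(f"ERA change: {era_change(runs, ASSUMED_INNINGS):.2f}")
```

With that placeholder the ERA change comes out a little under .30, in line with the rough figure above; a different innings total moves it proportionally.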

Which, of course, isn’t nothing. But Wakefield is one of the most extreme examples of a pitcher being able to affect his BABIP. And even the gap between his ability to control BIP and the average pitcher’s could be small compared to the issues involved in translating BABIP to runs.

Just along for the ride

It’s important to recognize what we bought into along with DIPS theory. McCracken at the time published a method of estimating a pitcher’s defense-independent ERA that involved reconstructing the pitcher’s batting line using the league-average BABIP in place of his own, and then using Extrapolated Runs to convert that line into runs.

Voros and others have revised this work numerous times; the simplest and most popular implementation of a DIPS measure of estimated ERA is FIP.

What we all bought into along with McCracken’s theory on defense was, essentially, Bill James’ Component ERA, typically abbreviated as ERC.

Almost every DIPS-like measure of performance has resorted to the use of some sort of component ERA to figure out a pitcher’s defense-independent performance. For some reason, most of the controversy surrounding the use of DIPS has focused on Voros’s conclusions on balls in play, when really it’s the use of component ERAs that warrants further examination. So what problems have we (largely without realizing or considering it) brought into our DIPS analysis with component ERAs?

Linearity.
Most (but not all) component ERAs are linear. (ERC itself isn’t, I should note.) FIP is certainly linear. What I mean is this: FIP treats, for instance, a home run allowed by Pedro Martinez as having the same run value as one allowed by Glendon Rusch. This simply isn’t the case; Pedro allows fewer baserunners and therefore fewer runs per home run. This artificially “caps” the high and low end of the FIP range, making it narrower than the actual range of performance of major league pitchers.
Situational pitching.
Guy bad at pitching out of the stretch? Not accounted for. Able to dial up his fastball a bit and get an extra strikeout in a crucial situation? Not included. In other words, component ERA measures treat pitchers as though they all approach situations exactly the same.
Sequencing.
In real life, it matters if a guy gives up a walk before a homer or a homer before a walk; in component ERA measures they all look the same.
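To make the linearity point concrete, here is a minimal sketch using the commonly cited form of FIP. The additive constant is a league-and-season scaling term, so the 3.10 below is only a placeholder, and both stat lines are hypothetical.

```python
def fip(hr, bb, k, ip, constant=3.10):
    """Commonly cited FIP form: (13*HR + 3*BB - 2*K) / IP + constant.

    ASSUMPTION: the constant varies by league and season; 3.10 is only
    a placeholder here.
    """
    return (13 * hr + 3 * bb - 2 * k) / ip + constant


def marginal_hr_value(hr, bb, k, ip):
    """Change in FIP from allowing one more home run."""
    return fip(hr + 1, bb, k, ip) - fip(hr, bb, k, ip)


# Two hypothetical 200-inning seasons that differ only in walks allowed.
few_walks = marginal_hr_value(hr=15, bb=30, k=200, ip=200)
many_walks = marginal_hr_value(hr=15, bb=100, k=200, ip=200)

print(f"extra HR, few walks:  {few_walks:+.3f}")
print(f"extra HR, many walks: {many_walks:+.3f}")
```

Both lines print the same increment, because the home run term never interacts with the walk or strikeout terms. On the field, the pitcher who walks more batters has more men on base when the home run is hit.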

One could of course argue that there is an element of “luck” (or for you pedants out there, “observed variation around an estimated level of true talent performance”) to sequencing and situational pitching. But that notion has essentially come along for the ride with DIPS theory, and there’s nothing in McCracken’s research to suggest that they’re any more subject to “luck” than strikeout rates or walk rates or home run rates.

Pitching to the situation

Let’s examine one aspect of pitching not addressed by DIPS theory and ignored by component ERAs: situational pitching. We’re going to study performance between 1989 and 1999, which is the longest period for which freely-available play-by-play data is detailed enough for this kind of study. Let’s look at the league averages to start:

RUNNERS       PA    BB     K    HR    GB    FB    LD    PU
EMPTY    1002156  0.09  0.16  0.03  0.45  0.25  0.21  0.09
FIRST     519630  0.08  0.15  0.03  0.47  0.24  0.20  0.08
LOAD       44236  0.07  0.17  0.03  0.45  0.26  0.20  0.09
SCORE     262142  0.17  0.16  0.02  0.48  0.24  0.20  0.09
ALL      1828164  0.10  0.16  0.02  0.46  0.25  0.20  0.09

For the sake of clarity I have compressed the 24 distinct base-out states into only four states, ignoring the number of outs completely. This is for illustrative purposes, and may not reflect the correct way to group these for substantive analysis.

The first column represents the runners on base:

  • Empty means there are no baserunners.
  • Loaded means, well, the bases are loaded.
  • First indicates any situation where there is a runner on first except for when the bases are loaded.
  • Score indicates any other situation – that is, when there are runners in scoring position but first base is open.
  • All refers to all situations.

For these purposes I have included hit by pitch and intentional walks in BB. BB, K and HR are per plate appearance, and the batted ball types (ground balls, fly balls, line drives and popups) are per batted ball. Now, in the interest of clarity, let’s look at the figures divided by the overall average; in other words, what a pitcher does in a given situation relative to what he does in all situations:

RUNNERS       PA    BB     K    HR    GB    FB    LD    PU
EMPTY    1002156  0.91  1.03  1.04  0.98  1.02  1.01  1.01
FIRST     519630  0.82  0.93  1.03  1.02  0.97  0.99  0.98
LOAD       44236  0.78  1.05  1.05  0.97  1.04  1.01  1.06
SCORE     262142  1.74  1.01  0.79  1.03  0.96  0.97  1.00

That should make it clearer how pitchers (and the hitters they face, it should be noted) change their approach based on the situation. Look at the dramatic differences in walk rate, for instance: walks drop by about a fifth with the bases loaded and jump by roughly three-quarters when runners are in scoring position and first base is open.
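The relative figures are just each situational rate divided by the overall rate for the same stat. Here is a small sketch using the rounded league rates printed earlier. Because those inputs are rounded to two decimals, the results differ slightly from the published relative table, which was presumably computed from unrounded rates. HR is left out because its rates are too small to survive that rounding.

```python
# League rates by base state, copied from the rounded table in the text.
LEAGUE = {
    "EMPTY": {"BB": 0.09, "K": 0.16, "GB": 0.45},
    "FIRST": {"BB": 0.08, "K": 0.15, "GB": 0.47},
    "LOAD":  {"BB": 0.07, "K": 0.17, "GB": 0.45},
    "SCORE": {"BB": 0.17, "K": 0.16, "GB": 0.48},
    "ALL":   {"BB": 0.10, "K": 0.16, "GB": 0.46},
}


def relative_rates(table, baseline="ALL"):
    """Divide each situation's rate by the same stat's overall rate."""
    base = table[baseline]
    return {
        state: {stat: rate / base[stat] for stat, rate in rates.items()}
        for state, rates in table.items()
        if state != baseline
    }


for state, rates in relative_rates(LEAGUE).items():
    cells = "  ".join(f"{stat} {value:.2f}" for stat, value in rates.items())
    print(f"{state:<6}{cells}")
```

The same function works on a single pitcher's rates, which is how the individual tables are framed: each pitcher is compared with his own overall line, not with the league.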

Now let’s look at some individual pitchers, to see how they might approach situations differently. I have put the actual rates for these pitchers in the “All” group; the rest of the figures are the pitcher’s performance relative to himself, not the league. That’s an important distinction. And again, this is only from 1989 to 1999. Also, a caution—this is exploratory analysis, just a casual stroll through the data. Please don’t become too attached to one particular data point or one particular pitcher.

Let’s start with Pedro Martinez:

RUNNERS    PA    BB     K    HR    GB    FB    LD    PU
EMPTY    3361  0.98  1.00  1.14  0.96  1.04  0.99  1.08
FIRST    1362  0.79  0.94  0.56  1.06  0.93  1.05  0.84
LOAD       85  1.11  1.34  2.42  1.02  0.78  1.59  0.40
SCORE     669  1.51  1.08  0.99  1.09  0.96  0.87  0.99
ALL      5477  0.08  0.28  0.02  0.43  0.26  0.20  0.11

There’s certainly a lot of variance here, compared to what we saw for the league average. Some of this is obviously noise, especially in the loaded group; there were only 85 plate appearances in the sample with the bags juiced. But there are still some interesting things going on here. One thing to note is that Pedro seems to be getting a lot more grounders with runners on. This is important; let’s compare the value of the various types of outs when there are runners on or not:

EVENT      EMPTY  MEN_ON
Strikeout  -0.18   -0.42
Flyout     -0.19   -0.34
Groundout  -0.19   -0.41
Lineout    -0.19   -0.38
Popup      -0.19   -0.42

With the bases empty, it simply doesn’t matter what sort of an out you get; when you start adding baserunners, it starts to matter a great deal what kind of outs you are getting. If Pedro is getting more of his ground balls with men on, then his ground balls are more valuable than the average pitcher’s because he is getting them at more opportune moments.
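One way to see the effect of timing is to blend the two run values by how often an out type arrives with men on base. The run values below are the ones in the table; the two men-on shares are made up purely to show the mechanics and do not describe any real pitcher.

```python
# Run values of outs by situation, copied from the table in the text.
OUT_VALUES = {
    "Strikeout": {"EMPTY": -0.18, "MEN_ON": -0.42},
    "Flyout":    {"EMPTY": -0.19, "MEN_ON": -0.34},
    "Groundout": {"EMPTY": -0.19, "MEN_ON": -0.41},
    "Lineout":   {"EMPTY": -0.19, "MEN_ON": -0.38},
    "Popup":     {"EMPTY": -0.19, "MEN_ON": -0.42},
}


def blended_out_value(event, share_men_on):
    """Average run value of an out type given its share with men on."""
    values = OUT_VALUES[event]
    return (1 - share_men_on) * values["EMPTY"] + share_men_on * values["MEN_ON"]


# ASSUMPTION: both shares are hypothetical, chosen only for illustration.
typical = blended_out_value("Groundout", share_men_on=0.40)
clustered = blended_out_value("Groundout", share_men_on=0.50)
print(f"typical:   {typical:.3f} runs per groundout")
print(f"clustered: {clustered:.3f} runs per groundout")
```

The pitcher whose groundouts cluster with men on base gets a more negative (better) average value per groundout, even though the two pitchers could have identical overall ground ball rates.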

Now let’s compare Pedro to a very different sort of pitcher, a guy who doesn’t get a lot of strikeouts, like Kirk Rueter:

RUNNERS    PA    BB     K    HR    GB    FB    LD    PU
EMPTY    2215  0.82  0.98  0.99  0.97  1.02  1.05  0.99
FIRST    1045  0.88  1.01  1.29  1.03  1.01  0.95  0.92
LOAD       57  0.78  1.70  0.64  0.98  1.44  0.60  0.75
SCORE     446  2.23  0.96  0.41  1.09  0.82  0.89  1.29
ALL      3763  0.07  0.12  0.03  0.46  0.25  0.20  0.09

Again, we can’t conclude too much from samples this small. But it certainly looks like Rueter likes to pitch around guys when he has runners on and first base open. And he goes looking for more groundouts and popups in those situations as well; as we just established, those are outs that are much more valuable than others when men are on base.
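To put a rough number on how noisy the bases-loaded rows are, here is a sketch of the binomial standard error of a strikeout rate over that many plate appearances. It assumes each plate appearance is an independent trial at the pitcher's overall strikeout rate, which is a simplification rather than anything established in this article, and it ignores the uncertainty in the overall rate itself.

```python
import math


def rate_standard_error(rate, plate_appearances):
    """Binomial standard error of a per-PA rate."""
    return math.sqrt(rate * (1 - rate) / plate_appearances)


def relative_standard_error(rate, plate_appearances):
    """The same standard error expressed as a fraction of the rate."""
    return rate_standard_error(rate, plate_appearances) / rate


# Overall strikeout rates and bases-loaded PA counts quoted in the tables.
samples = {
    "Pedro Martinez": {"k_rate": 0.28, "loaded_pa": 85},
    "Kirk Rueter": {"k_rate": 0.12, "loaded_pa": 57},
}

for name, s in samples.items():
    rel_se = relative_standard_error(s["k_rate"], s["loaded_pa"])
    print(f"{name}: +/- {rel_se:.2f} on the relative K rate (one standard error)")
```

Under those assumptions, one standard error is a large fraction of the rate itself in samples this small, so a relative strikeout rate well above 1.00 with the bases loaded is not strong evidence of anything on its own.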

What we need to do now is answer two key questions:

  • How much of situational pitching is skill?
  • How can we use a pitcher’s situational tendencies to predict his ERA?

That, unfortunately, will have to wait.

References & Resources

The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at www.retrosheet.org.

I’ve studied the accuracy of various component ERA systems before, here and here.

Since I know I’ll be asked, here’s Greg Maddux:

RUNNERS     PA    BB     K    HR    GB    FB    LD    PU
EMPTY     6356  0.82  1.06  0.90  1.01  0.94  1.03  0.95
FIRST     2619  0.65  0.87  1.27  0.98  1.08  0.98  1.00
LOAD       150  0.72  1.00  1.71  0.91  1.23  0.82  1.85
SCORE     1438  2.47  0.99  0.89  0.98  1.11  0.93  1.16
ALL      10563  0.06  0.18  0.01  0.60  0.16  0.19  0.06


Nutlaw

I’m quite intrigued by this approach, Colin. Great read.

Spencer Hamblen

Why doesn’t the inclusion of IBB throw the numbers off, especially since they only (generally) occur in one of the 4 situations?

Peter Jensen
Colin – This is a wonderful example of how the presentation of data can lead to false assumptions. “One thing to note is that Pedro seems to be getting a lot more grounders with runners on. This is important;…” But do the numbers really show that Pedro is getting a lot more grounders with runners on?  His overall GB rate is .43 per hit ball, the league’s overall rate is .46 per hit ball.  So all that the data is showing is that Pedro gets ground balls at about the league average rate per hit ball when there are men…
Nick

This is very interesting, Colin.  I have always thought that a pitcher’s RA more accurately reflects his ability over a *very* large sample size than defense independent statistics; however, most of the time there is too much noise.  I would love to see an attempt to smooth that out, and I can’t think of anyone better than you to do so!

On a side note, I always thought that FIP was at least somewhat dynamic.  Doesn’t the presence of K’s and BB’s as a direct modifier of HR’s in the formula, at least imply some attempt to create a non-linear model?

Colin Wyers

Nick, if you review the FIP equation again, you should notice that the K, BB and HR terms don’t interact with each other at all; they’re purely additive with each other. You could put in 0 walks or 1,000 walks, and the value of a HR in FIP won’t budge an inch.

And thanks for the kind words; I’ll try not to disappoint.

Voros McCracken
Colin Wyers
Peter, I really don’t think anything you say disproves the basic conclusion. Let’s use Maddux’s walk rate for a second. The value of a walk differs significantly depending on whether or not there is a runner on first. Let’s use this chart just as an example: http://www.tangotiger.net/customlwts.html The 5 RPG chart should be close enough for illustration purposes. A walk is worth 0.327, compared to an IBB’s worth of 0.198 or a HBP’s value of 0.353. For the sake of illustration right now, let’s say that a walk with a man on first is worth .15 more runs than a…
Peter Jensen
Colin – Now you have totally confused me.  I don’t even see a Wakefield example and I never referred to Maddux.  I also read the article again and I can’t find the “basic conclusion” that you are saying I am trying to disprove.  So please state it again for dumb old me.  And why are you throwing linear weights at me when nothing I commented on had anything to do with linear weights?  But now that you bring it up, I am not sure why you have a chart of linear weights for different types of outs.  If your thesis…
Dave Evans

I also think unearned runs should be accounted for. A strikeout pitcher does not let his defense commit as many errors as a contact pitcher. I would like to see how many fewer unearned runs a high-K, high-FB pitcher (thus few balls in play) gives up compared to a low-K, high-BB, high-GB pitcher.

Micke MCd

Fight!  Fight!  Fight!

Jon

Voros, how do you figure out what A, B, C, and D are in that link?  Do you multiply the various events by the constants in the grid?  I’m guessing that’s the case, but it isn’t clear.

David
While DIPS and FIP might have this issue, there’s one major issue that I’ve always had with FIP – and many other pitching metrics. That is, they factor in K/9 IP rather than K/PA.  This allows pitchers who allow more BIP to have the same FIP ERA as pitchers with more K/PA.  Conversely, ‘Prospectus’ has a stat (which they don’t utilize regularly, for some odd reason), called QERA.  All the stat does is look at GB%, K/PA, and BB/PA.  Many of the same factors as FIP, DIPS, and so many others, except they calculate it per plate appearance. (If…
Matthew Cornwell

Colin, you mention that Maddux should be credited an extra 9 runs prevented (vs. average?) from 1989-1999 because of the disproportionate amount of his walks being given up in lower run-costing situations.

Are you claiming this is significant?

Because he prevented over 30 runs vs. average over the same period due to preventing hits on BIP (compared to his teammates), something you seem to be downplaying earlier.

Kampfer

With a sufficient sample size, situational pitching should be neutralized.

Adam Guttridge
Colin, This is good work… even if it doesn’t end with solid, actionable conclusions, exploring the nuances can lead to insight that makes progress. As per linearity, I make a pretty simple adjustment for this; when calculating WAR, give pitchers credit for the fact that they influence their own run environment. It’s like non-SB/CS baserunning when evaluating a hitter; it never makes a huge difference (5-6 runs in the extreme cases), but it’s certainly worth accounting for, and for a tiny minority of players, the difference is significant. And I’d strongly suspect that with the 3 effects you mention, linearity…
Matthew Cornwell
Written in Jan. 2007: I looked at every Hall of Fame pitcher that had available splits as well as 15 guys not eligible or not elected (John, Kaat, Blyleven, Morris, Tiant, Stieb, Cone, Maddux, Clemens, Glavine, Johnson, Martinez, K. Brown, Schilling, and Mussina) to see how many of them had allowed more BB’s with men on than with nobody on.  Here is what I found… There were only 7 guys who gave up more walks with runners on base than without (and their % more of BB’s w/RO)… Robin Roberts 28% Tom Glavine 25% Curt Schilling 15% Greg Maddux 14% Tommy John…