﻿ Beyond DIPS | The Hardball Times

# Beyond DIPS

The theory of Defense Independent Pitching Statistics can be traced to this famous line by Voros McCracken in Baseball Prospectus 2001:

There is little if any difference among major-league pitchers in their ability to prevent hits on balls hit in the field of play.

This was backed up by year-to-year correlation testing as well as a healthy dose a common sense. Voros then proceeded to classify each event (walk, hit, strikeout, etc.) by whether or not it was influenced by defense. The Three True Outcomes (strikeouts, walks, home runs… and hit batsmen) were considered defense independent, while anything to do with hits were not.

Years later, after much recycling of Voros’ ideas and formulas, the basic theory of DIPS remains. From Tom Tango’s FIP, which only uses the Three True Outcomes and is so simple you can literally calculate it on the back of a napkin, to slightly more complicated formulas like Graham MacAree’s tRA (which uses batted ball numbers) and David Gassko’s Pitching Runs Created, only stats which are deemed independent of defense are considered.

However, DIPS, and all of its decedents, are only designed to model how many runs a pitcher would have given up if defense (and timing) were taken out of the equation. They are not meant to isolate the pitcher’s performance, only to eliminate the defensive performance. In my opinion, this a concept that is often overlooked by many-a-good analysts and dedicated sabermetric disciples out there.

To assume that the only stats that are out of a pitcher’s control are his hit rate on balls in play, otherwise known as BABIP, and the timing of his events is a mistake. Consider all of the factors that go into each pitch that are completely out of a pitcher’s control:

• The batter
• The umpire
• The defense
• The environment (ballpark, wind, etc)

In fact, the only thing a pitcher has complete control of is the inherent attributes of the pitch as it reaches the plate (and even then there are environmental factors to consider). For a long time, we’ve assumed that strikeouts, walks, batted ball numbers, etc. are very much in a pitcher’s control. When you think about that, it is a ludicrous assumption. The batter has at least equal control over the outcome of each pitch as the pitcher does, and the presence of the umpire makes it likely that the pitcher has less than 50 percent control over the outcome of each pitch.

With Pitch f/x data available now for over 1.5 million pitches in the majors since 2007, we can begin to investigate the degree of control that pitchers have over the outcomes of each pitch. My recent research on Jarrod Washburn and A.J. Burnett, suggests, in my opinion, that pitchers have far less control over even their defense-independent stats than we have thought. However, those studies were far from conclusive. Today, I hope to bring a bigger hammer to the table to try to prove my thesis.

### OK, it’s more of an allen wrench

First, let’s consider all pitches with the following characteristics:

• From a RHP to a RHH
• Between 91 and 93 MPH
• Between -3 and -7 inches of horizontal spin deflection (movement)
• Between 8 and 12 inches of vertical spin deflection (movement)
• Exploring Extreme Ballparks Past
A tour through some of baseball's most extreme climes.
• On a 2-1 count
• With the previous pitch type being another fastball
• With nobody on base

Since 2007, there have been 536 such pitches thrown in the major leagues that were captured by the Pitch f/x cameras. If you’ve read Josk Kalk’s two part series on the “Anatomy of a league average pitcher” (and if you haven’t, go read it now!), you’ll notice that this is a pretty generic pitch type in terms of it’s attributes (velocity and spin deflection) and it’s environment (count, batter/pitcher hand, baserunners).

Now let’s break those pitches down by their location in the strikezone. To do that, I first adjusted the vertical height of each pitch in the sample by the top and bottom strikezone estimates for each batter provided by the Gameday operators. I then split the strike zone into nine equal parts. As you can see on the right, while each of the zones are bigger than one might like, they are, at the very least, distinct.

Most importantly, using those zones allows for a comparison of pitches that are pretty homogeneous. They each have similar velocity, movement, environment, and location. There are obviously slight differences, but nothing too significant. Given that, you’d expect the outcomes of each of those pitches to be pretty static if pitchers really had a significant amount of control over the outcomes of each pitch. In other words, you’d expect little variation in what actually happens after the pitch leaves the pitcher’s hand. But is that really the case?

If you take a look at the pitches down and center, the dark blue ones on the batter graph, there were 34 pitches thrown. Three of them were balls, five of them were called strikes, none were swinging strikes, 10 were foul balls, zero were singles, three were doubles, zero were triples, three were home runs and 10 were in play outs. That’s a lot of variation. Just based on random chance, the pitch could either be, say, put in play for an out, or hit for a home run. The difference in value between those two plays is HUGE. Using John Walsh’s Run Values (helpfully cataloged here by THT’s own Harry Pavlidis), the difference between a home run and an out on a 2-1 count is 1.68 runs.

Of course, I cherry picked the one location vector with the most variance, but most of the other ones have a similar amount of variance. If you click that link, you can see the distribution of pitch outcomes based on location. The little white bars represent when the run value of the outcome becomes negative (good for the pitcher). If you check out the distribution of run values by location vector, the picture is pretty amazing:

As you can see, while the distributions all peak around -.1 run value, in almost all cases, there is a significant chance that the outcome of the pitch ends up being much worse, or better, than that for the pitcher. The only location vector with a decent amount of parity is the one for away and down (green). In more than 90 percent of the time, in this subset of pitches, the outcome is either a ball or a non-in play strike.

The problem with the above example is that I’m dealing with a very small subset of pitches, only 40-50 in each location vector. Common sense suggests that the outliers will even out over a large sample size, and similar pitches will have similar outcomes for the most part. However, that’s something we simply can’t investigate further at this point due to the sample size restrictions in creating homogeneous pitch types.

Another thing we can look at is called balls and strikes by the umpire. Josh Kalk, John Walsh and Jonothan Hale have already examined how accurate umpires are in great detail, and found that while they were generally very good, they did make their share of errors. Earlier this year, I took a look at how those errors were distributed over individual pitches, and found that there was a lot variance.

### So what now?

This study is hardly conclusive, but I think that, along with my previous two articles on Washburn and Burnett, it does show that pitchers have a lot less control over the outcomes of pitches than previously believed. Even stats that aren’t affected by defense at all are heavily affected by other factors, such as the batter and umpire. The next frontier of pitching stats, in my opinion, should be exploring ways to isolate the performance of pitchers from that of those other factors. As of now, the only way I can think of doing that is by using Pitch f/x data, and to look at the inherent qualities of each pitch and not care about the actual outcome. While any kind of comprehensive Pitch f/x pitching metric is far from being ready, I’d like to explore some possible ways in which one could be created.

## Binning

This involves identifying the key characteristics of each pitch, and creating a number of unique combination of such pitches. For example, you could look at all pitches thrown from a RHP to a RHH in a certain count with a given range of velocity, spin deflection and location coordinates, and figure out the average run value of such pitches. Repeat that for every combination of pitch attributes there are. This is similar to the methodology behind UZR, and is a very “clean” process. With enough data, this, in my opinion, would be an excellent way to develop a Pitch f/x based pitching metric.

The problem is that we simply don’t have enough data right now. There are simply too many combination’s of pitch attributes (count, batter hand, pitcher hand, speed, movement, location, previous pitch type) to get reliable samples for most of the bins. That leads to wacky things like, with all else being equal, a 91-93 MPH fastball being more effective than a 93-95 MPH fastball. Obviously that can’t be true, and it serves to show that we simply don’t have enough data to do comprehensive binning.

## Regression

This is a good method to use if you have sample size issues in some bins – which we do. Regression looks at the overall estimated relationship of datapoints, rather than each individual set of points. Regression usually does a pretty good job of neutralizing sample size oddities. Some very good work on regression and run values has been done Jeremy Greenhouse and Chris Moore at Baseball Analysts.

Jeremy used what’s known as a LOESS regression to estimate the expected run value of a given pitch given its velocity and movement. This is a very powerful technique, but it has its flaws as well. For one, the variables in Pitch f/x are often so random, following no discernible pattern, that a regression technique might not be able to give enough of a range in value. Furthermore, LOESS does not produce a closed form equation, making it very hard to apply to all pitchers. And like with the binning method, sample size oddities can skew the results of given pitch types – especially for lefty-on-lefty matchups, pitchers with strange release points and less populated counts. I’m not sure if LOESS is strong enough to compensate for all of that, but my inclination is that it is not.

## Other problems

Even if we had enough data for the binning method or enough flexibility for the regression method, there would still be massive problems in creating a Pitch f/x based pitching metric. For one, those methods would simply estimate the value of each pitch, not of the eventual outcome of each at-bat. Obviously, the way in which pitches are sequenced together has a huge effect on the eventual outcome of at-bats, and to treat each pitch as its own separate and independent entity would be a mistake. Furthermore, there are aspects of pitching that aren’t quantified by Pitch f/x data. Deception, which involves a player’s windup, his time to the plate, his glove, etc. are not going to be captured by a Pitch f/x based metric, and potentially have a big impact on the expected value of a pitch.

### Conclusion

I feel as though this article has raised more questions that it has answered. If you accept my thesis that pitchers only have limited control over what happens after the pitch leaves their hand and the experiments that I set up to prove that, you should accept that one of the biggest potential advancements in sabermetrics will be a Pitch f/x based pitching stat. The question is whether or not we will have to capacity to develop such a stat, and whether or not the additions over current DIPS metrics (the fact that it won’t rely on outcomes which are influenced by the batter and the umpire) will outweigh the subtractions (will have a tough time handling sequencing of pitches and other pitch attributes that aren’t captured by Pitch f/x).

I feel it will be a worthwhile endeavor for one to go about creating a Pitch f/x based pitching metric, and while I am no closer to creating one than when I started this article, I hope that it will be an important part of new research in 2010.

References & Resources
A great article written by Mike Fast in the beginning of 2009 suggested that pitchers have more control over the outcomes of balls in play than DIPS gave them credit for. He also used Pitch f/x data, but looked primarily at where the balls were put in play. That research doesn’t necessarily contradict my own; however, it’s obviously related, and Mike has a great summary of the DIPS timeline at the bottom of the post.

I’m still thinking of experiments like in the “OK, so it’s more of an allen wrench” section to set up to try to show how much variance is expected in the outcomes of pitches. I’ve been racking my brains for a couple of weeks, but I can’t really think of another way to approach that problem. If anyone has suggestions, feel free to sound off in the comments section or email me, and I’ll try to investigate them.

Something else that I feel I should have mentioned in the article, but don’t really have a place to put it, is that a Pitch f/x based metric could be amazingly useful for rookies or pitchers coming back from injuries. Given that a Pitch f/x based metric would take out almost all of the noise associated with traditional stats, the sample size requirements for predicting future performance would likely be far less than for the current stats out there.

Guest
Peter Jensen
Nick – I think your research is impeccable, and your charts are informative, but I fear that even if you had the amount of Pitch f/x data needed to adequately study this question that your answers wouldn’t be very informative.  You are basically asking how well a pitcher throws, not how well he pitches.  Pitching is a different skill than just the ability to throw well, although throwing well is a necessary component of pitching well.  But pitching well means adapting to the batters you are facing, the home umpire you have drawn and the park that you are pitching… Read more »
Guest
Scottwood

“The fact that it won’t rely on outcomes which are influenced by the batter and the umpire”

Essentially, we’d be creating a stat that tries to do what scouts are always supposed to do.  Make observations about a player without getting stuck on what his outcomes were. I like it.

Guest
Nick Steiner
Peter – Thanks for the kind words.  But pitching well means adapting to the batters you are facing, the home umpire you have drawn and the park that you are pitching in and still having good outcomes from the pitches that you throw I agree with this 100%, and it was something I had meant to mention in the “Other problems” section.  I agree that the actual outcomes include other information than just how good the pitch is – a ton of it (all the things you mentioned + sequencing + time to plate) – the question is whether or… Read more »
Guest
Nick Steiner

Scottwood – yes exactly!  It would be like a digitized, unbiased, form of scouting… that just might not pick up as much as a good scout will.

Guest
Peter Jensen
”…the question is whether or not you subtract more noise than skill by just looking at the attributes of the pitch.  I don’t really know the answer to that.  I’ve been trying to think of an experiment to set up to get the answer, but nothings come to me yet.” And I doubt you ever will.  It is not always possible to reduce real life events to experimental analysis.  And I really don’t like the use of “noise” to describe variations in results.  In this case you have variation caused by small sample size, variation caused by factors within the… Read more »
Guest
Jimbo
I agree with everything Peter said, and still think it is possible to quantify the “it” factor. Where the difficulty (and opportunity) seems to be is understanding how the DATA align with the GAME. For example, wouldn’t you want to ‘score’ a pitcher lower if his changeup release point is different from his fastball release point? That could be a data-driven aspect of measurement, but how it impacts pitcher quality depends on the actual experience of the game. Many people know that a great changeup throws off a hitter’s timing. The more they think a fastball is coming, the more… Read more »
Guest
Jimbo
Just read the umpire effectiveness article. Great piece! I’m with those who favor a computerized zone. I’m also with those who want the human umpire. Why not put a green light/red light indicator inside their mask? They’d be only ones to see it, and wouldn’t in any way replace them just free them up. Calls could be made just as quickly…by a human. I think the home plate ump has far too much impact on the teams competing. If he was “given” balls and strikes, he’d still need to monitor balks, batter position, call check swings, hit batters, and so… Read more »
Guest
Nick Steiner

So Peter, do you think something like Harry Pavlidis’ xRV100’s, adjusted for batter, umpire and ballpark would best thing that could currently be implemented?

Jimbo – I agree that it would take a lot more working with Pitch f/x to really understand how the matchups work, but I do think we are getting closer – slowly but surely.  Also, I like the fact that umpires make errors – it makes analyzing Pitch f/x more fun!

Guest
Peter Jensen

Nick – I don’t think it provides much useful information assign run values to individual pitches based on their characteristics.  It can be fun, but I don’t think it adds anything to our knowledge of a pitcher’s skill.  The only thing that counts about a pitches is how well hitters can hit them so looking at them in the aggregate is best.  I am writing an article that I will submit to THT soon about this.

Guest
Peter Jensen

“However, DIPS, and all of its decedents…

I hate to be the language police, or have a lot of pitching metrics died recently?

Guest
Nick Steiner

That should be descendants.

Guest
Nick Steiner

Peter – Harry’s xRV100 are like DIPS for each individual pitch.  They assign a run value to each pitch based on it’s count, and the outcome of the pitch (except that xRV100’s, unlike normal RV100’s substitute batted ball linear weights for actual outcome linear weights).  I personally think that is a nice middle ground between only using at bat outcomes, and not using any outcomes at all.

I’ll have to test out how RV100’s and xRV100’s predict future ERA compared to stats like FIP or tRA.

Guest
Peter Jensen

(except that xRV100’s, unlike normal RV100’s substitute batted ball linear weights for actual outcome linear weights).

I thought that was RV100E and that he only used the League Average Batted Ball linear weights for flies and liners.  Is xRV100 something different or is it a new name for RV100E?

Guest
Nick Steiner

Yes, xRV100 is referring to RV100E, my mistake.

Guest
Peter Jensen
“I don’t understand why MGL and Peter are so hung up on the art of pitching and how pitchf/x doesn’t capture this…” Jeremy – If that is the impression that you got from carefully reading my comments I am disappointed that I have not been able to communicate my position better. First, I am quite happy with Pitch f/x as it is.  It is a wonderful innovation and provides us with valuable data that we did not have before and could not obtain in any other way.  There have been many exciting and valuable studies done using this data. Second,… Read more »
Guest
Jeremy Greenhouse
MGL and Peter, thanks for clarifying. MGL, that is what I meant by smoothing. I guess if the zones are large enough, smoothing doesn’t matter. Nick, with the granularity of the pitchf/x data, of course we want to smooth it out. That’s all I’m trying to do with local regression. It takes all those bins you’re talking about, and tries to make sense of them. Peter, I apologize. I think I misrepresented you. I see what you’re saying about rv100E = xFIP. Knowing the rv100E/xFIP for the average 98 MPH fastball or curveball with 10 inches of drop I find… Read more »
Guest
Harry Pavlidis

All batted ball results in rv100E are based on league averages

Guest
MGL
We’ve discussed this before, but the problem with focusing on the characteristics of the pitch rather than the outcome, to determine the value of each pitch, and hence, all pitches, for any particular pitcher is this: 1) That leaves out “deception.”  Apparently, for example, an 87 mph fastball from Sid Fernandez was a lot more effective than the same 87 mph fastball thrown by a generic pitcher, presumably because El Sids’ delivery was so funky and he hid the ball so well. 2) The percentage of each type of pitch thrown (in that particular situation) HUGELY influences the value of… Read more »
Guest
Nick Steiner
I mentioned deception in the original article MGL: Furthermore, there are aspects of pitching that aren’t quantified by Pitch f/x data. Deception, which involves a player’s windup, his time to the plate, his glove, etc. are not going to be captured by a Pitch f/x based metric, and potentially have a big impact on the expected value of a pitch. And I agree that the game theory aspect, and “pitching to the batter and umpire” things that Peter brought up, are also going to potentially have a huge impact on the value of individual pitches.  The question, which I mentioned… Read more »
Guest
Jeremy Greenhouse
I don’t understand why MGL and Peter are so hung up on the art of pitching and how pitchf/x doesn’t capture this and that. Pitchf/x is what it is, and nothing is going to give you a perfect metric, but that shouldn’t stop anyone from trying to use bits and pieces of pitchf/x data to capture bits and pieces of pitching ability. Not everything has to be all-encompassing ERA. We can have strikeout rate and walk rate and fastball velocity and pitch type run values and all that stuff to paint a broader picture. Nick, I don’t follow your argument… Read more »
Guest
RZ

Nice article Nick.

Somehow I think of this when finding the perfect pitching machine:nanomchines.

Guest
MGL
Jeremy, if you mean “smoothing” with respect to the batted ball values and the “catch rates” for the various zones, the answer is, “no.”  I could have used some kind of “smoothing” function or whatever you want to call it, and that would have been better, and would have allowed me to use more “zones” or no zones at all (if I created a function for the catch rates and batted ball values based on the x/y coordinates of each type of batted ball), but for various reasons I did not. I used fairly large zones such that smoothing was… Read more »
Guest
Nick Steiner

Jeremy – I don’t know nearly as much about LOESS smoothing as you and Dave Allen do.  I would appreciate your thoughts about how well it could be used for a Pitch f/x based metric, and if it would be better or worse than the binning method or some other method.

Guest
MGL
There is actually one “level” deeper that you can go, in terms of trying to isolate luck from skill: That is the actual location of each pitch as compared to the intended location.  For example, let’s say that a pitcher is intending to throw a certain pitch low and away, based on the location of the catcher and/or his mitt, which we can get from watching the video, or from the “super” pitch f/x which includes the location of the target (the catcher and/or his mitt).  Now, his actual location is going to be some scatter-plot surrounding that intended location,… Read more »