## Quantifying “Good” and “Bad” Pitches

I found Jeff’s recent post on Jake Arrieta fascinating, because he goes into a game and pulls out Arrieta’s eight worst pitches from that game. This is something I’d never really thought deeply about before. We all know what bad pitches look like, right? An 0-2 fastball down the heart of the plate, a hanging slider, a pitch in the dirt on a full count, sure. But can we quantify this? Is there a way to say mathematically (in a way that makes some sort of sense) whether one pitch was better than another? Follow me beyond the jump and I’ll share some thoughts about how we might do this.

Consider a batter who has just stepped to the plate. Can we talk about his “expected OPS” in this situation? Sure we can — we can say, on average, what a major league baseball player’s OPS will be, so we can assign that to the player. The league average OPS this year is somewhere around .700, so we can say that we “expect” a hitter to have a .700 OPS in that plate appearance (which is of course impossible — but this is “on average”, remember). Now suppose we have that same hitter in a 1-0 count. This tips the scales a bit toward the hitter, and we can find out what the expected OPS is in this situation; let’s say it’s something like .720. On the other hand, the hitter might get into a 0-1 hole, which would lower the expected OPS to something like .650.

The point is, at each stage, we can assign what we might like to call a prior expectation for what the batter is going to do in a given at-bat, and then when new information comes in, we update that expectation. More to the point of this article, the pitcher is making good pitches *if he is driving this expected OPS down*.

Let’s now fix the parameters of a pitch. We have the handedness of the batter and the pitcher, the type of pitch thrown, and where the pitch ended up. The first two are known to the pitcher and he can control (to some extent) the last two. How do we determine if it was a good pitch? Here are the basic parameters of what I’m proposing:

- Take the expected OPS of the situation before the pitch (prior expected OPS)
- Determine the probability p_i of the possible outcomes of the pitch
- Determine the expected OPS o_i of the batter given each possible outcome
- Take the sum over p_i * o_i for all i, and call it the updated (posterior expected) OPS.
- Assign to the pitch a value equal to the prior expected OPS minus the posterior expected OPS.
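
In symbols, the steps above could be written as follows. The labels $O_{\text{prior}}$, $O_{\text{post}}$, and $V$ are shorthand introduced here for the prior expected OPS, the posterior expected OPS, and the pitch value; $p_i$ and $o_i$ are as defined in the list.

$$
O_{\text{post}} = \sum_i p_i \, o_i, \qquad V = O_{\text{prior}} - O_{\text{post}}
$$

With this sign convention, a positive $V$ means the pitch lowered the batter’s expected OPS, which is good for the pitcher.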

This might sound complicated, so let’s take an example. Suppose we have a righty pitcher and a righty hitter, and a 0-0 count. Let’s assign the hitter the .700 prior OPS from above (again, the actual number will be different once we actually run the numbers). Now let’s say the pitch was a fastball more or less right down the middle. For this pitch and any other, there are basically five possible outcomes — assume for this pitch we have calculated their probabilities as follows:

- Called or swinging strike: 40%
- Ball in play: 30%
- Foul ball: 20%
- Ball: 10%
- HBP: 0%

If the pitch is a strike (or fouled off), the batter will be in a 0-1 count, meaning the expected OPS will be .650 by the approximation above. If it’s a ball, the count will be 1-0, raising the expected OPS to .720. If the ball is put in play (I haven’t gone over this case yet, but we should be able to calculate the expected OPS of this particular ball in play from historical data), let’s say the expected OPS is .900. This would result in a posterior expected OPS of

0.4 * .650 + 0.3 * .900 + 0.2 * .650 + 0.1 * .720 = 0.732

Since the posterior expected OPS is greater than the prior OPS, this was not a particularly good pitch, and it gets a value of 0.700 - 0.732 = -0.032.
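
As a quick check on the arithmetic, here is a minimal Python sketch of the same calculation. The function name `pitch_value` and the outcome labels are made up for illustration, and the probabilities and OPS figures are the placeholder numbers from the example, not measured values.

```python
def pitch_value(prior_ops, outcomes):
    """Return (posterior expected OPS, pitch value).

    outcomes maps an outcome label to a pair:
    (probability of that outcome, expected OPS after that outcome).
    """
    total_p = sum(p for p, _ in outcomes.values())
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError(f"probabilities sum to {total_p}, expected 1")
    posterior = sum(p * ops for p, ops in outcomes.values())
    # Positive value = the pitch lowered the batter's expected OPS.
    return posterior, prior_ops - posterior


# Placeholder numbers from the worked example (RHP vs. RHB, 0-0 count,
# fastball down the middle). None of these are real measurements.
example = {
    "strike": (0.40, 0.650),   # called or swinging: count goes to 0-1
    "in_play": (0.30, 0.900),  # assumed expected OPS for this ball in play
    "foul": (0.20, 0.650),     # count also goes to 0-1
    "ball": (0.10, 0.720),     # count goes to 1-0
    "hbp": (0.00, 0.0),        # probability zero here, so the OPS is a dummy
}

posterior, value = pitch_value(0.700, example)
print(f"posterior expected OPS: {posterior:.3f}")  # 0.732
print(f"pitch value: {value:+.3f}")                # -0.032
```

The check that the probabilities sum to 1 is there because, once the outcome probabilities come from binned historical data, it is an easy thing to get wrong.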

All right, this is the framework of what I am proposing. As written above, I don’t think it would even take too long to run these numbers and start quantifying good and bad pitches. But before I do, I need to ask some questions of the sages here at FanGraphs, such as:

- Has this already been done?
- Is this even worth doing?
- OPS is not the best stat for doing this with. Something along the lines of linear weights would be better. Suggestions?
- What should I consider when making these models?
- Is anyone interested in helping with R coding or PITCHf/x data for this?

Thanks for reading.


This seems to measure how good a pitch is based on its result, which is only one part of the equation, since the batter has a big say in such things. Could something be done about that, maybe classifying a pitch based on the average batter’s OPS for that particular pitch type, location, and count? (You could still use the prior/posterior approach, but just with the average batting event.)

“Could something be done about that, maybe classifying a pitch based on the average batter’s OPS for that particular pitch type, location, and count?”

Yes, that was exactly the plan. Sorry if I wasn’t clear on that point.

Bad news – it would be tough to truly evaluate any given pitch without factoring in the effect of sequencing. The belt-high fastball right over the plate might be an ok pitch if it follows two straight off-speed pitches down and away, but it would be gravy if it followed another similar fastball.

Good news – folks have been looking at a way to evaluate sequencing, and this might be the way to do it. If you customize the prior expected OPS (or wOBA or whatever) to the hitter’s expected outcome for ANY pitch that might be thrown, then you can create a value for that pitch using your approach (as modified by Steven above).

However, you could also look back to the pitches that were thrown in any given plate appearance, and see what the average value of that particular pitch (type of pitch, location, velocity) was for that hitter for ALL his plate appearances, and compare the result of the plate appearance to the sum of the expected “pitch values” for the actual pitches that were thrown. The difference would be the value of “sequencing”, planned or accidental, for that plate appearance.
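
One way to read the comparison described above, as a rough sketch: sum the context-free expected values of the pitches actually thrown, and treat whatever the plate-appearance result adds beyond that as the sequencing value. The function name, the sign convention (positive favors the pitcher), and the numbers are all hypothetical.

```python
def sequencing_value(pa_result_value, expected_pitch_values):
    """Residual of the plate appearance's actual value over the
    summed per-pitch expected values for the pitches thrown."""
    return pa_result_value - sum(expected_pitch_values)


# Hypothetical three-pitch plate appearance. All values are in the same
# units (change in expected OPS or wOBA, positive = good for the pitcher).
expected = [0.030, -0.020, 0.015]  # average value of each pitch in isolation
actual = 0.060                     # value of how the plate appearance ended

print(f"sequencing value: {sequencing_value(actual, expected):+.3f}")
```

Note that this residual would also absorb plain luck on any single plate appearance, so it would presumably only mean something when averaged over many of them.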

You’re a bright guy, tz. You don’t happen to live in St. Paul, Minnesota — first name Tim — by any chance?

It’s true that this approach completely discounts the value of sequencing. While I agree that sequencing is important, I think it is more of a marginal value added to a pitch than the main driving value behind the pitch. In other words, I feel that a bad pitch is a bad pitch, whether or not it’s set up.

Here’s an example where we might see a big difference in these approaches. Let’s say a pitcher quickly gets ahead of a hitter 0-2, then throws a fastball up and in to set up the slider down and away. This approach will likely think that the 0-2 pitch was not a good pitch, as it brought the count from a very favorable one to a slightly less favorable one. The pitch sequencing argument is that the marginal value added from setting up the slider offsets the value lost from going to a less favorable count. In any case, I think it’s close enough that it’s worth following through with this approach and then adding on sequencing later as an additional parameter.

How do you then address the decision of the next pitch in the sequencing? Would you not then have to look back at the previous pitch to determine if the pitch was properly sequenced, dynamically affecting the value of the prior pitch?

For example, let’s say Clayton Kershaw throws an 0-2 fastball up to change the batter’s eye level. He then comes back with a fastball intended to be down but stays up belt high, as opposed to burying a slider or curve. Does the effectiveness of the “waste pitch” change in the context of what the next pitch in the sequence was?

Just an idea, but a very insightful analysis you bring up.

I actually started looking at this the other day. I was experimenting with PITCHf/x data in SQL and making tables for the frequency of each outcome (Ball, Foul, In play, Swinging Strike, Called Strike, HBP) for specific parameters such as pitch type, speed, break, and location. I didn’t get very far, but it sounds similar to what you are doing. I’d definitely be willing to help with this.
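
For anyone who wants to try the frequency-table idea outside of SQL, here is a small Python sketch of the same kind of table. It is an assumption about how such a table might be built, not the code described above: the tuple layout, the half-foot location bins, and the sample rows are all invented for illustration.

```python
from collections import Counter, defaultdict


def outcome_frequencies(pitches, grid=0.5):
    """Estimate P(outcome | pitch type, location bin) from raw counts.

    pitches: iterable of (pitch_type, px, pz, outcome) tuples, where px and
             pz are the horizontal and vertical plate locations in feet.
    grid:    bin width in feet (hypothetical; finer bins mean thinner data).
    """
    counts = defaultdict(Counter)
    for pitch_type, px, pz, outcome in pitches:
        key = (pitch_type, int(px // grid), int(pz // grid))
        counts[key][outcome] += 1
    return {
        key: {o: n / sum(c.values()) for o, n in c.items()}
        for key, c in counts.items()
    }


# Made-up rows, purely to show the shape of the output.
sample = [
    ("FF", 0.1, 2.5, "called_strike"),
    ("FF", 0.2, 2.6, "in_play"),
    ("FF", 0.3, 2.7, "called_strike"),
    ("FF", 0.4, 2.9, "foul"),
    ("SL", 1.2, 1.1, "ball"),
]

freqs = outcome_frequencies(sample)
print(freqs[("FF", 0, 5)])
```

Each inner dictionary is the set of p_i values for one pitch type and location bin. Adding count, handedness, speed, or break to the key works the same way, at the cost of fewer pitches per bin.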

Cool! The main thing I’m worried about is what the model will think about HBP. For instance, it may think a pitch 10 feet inside has a high chance of being an HBP (whereas in reality of course it would be way behind the hitter). If you have more PITCHf/x data than I do (this is very likely) then just adding that would be very helpful indeed.

This reminds me of that Pitch Values section, except more specific, with pitch location. So I definitely think it is worth pursuing, perhaps with wOBA over OPS. Incorporating context as other commenters have mentioned will make it a truer measure of a pitch’s quality, but this would be interesting nonetheless.

wOBA is good. It has to be something that corresponds better to actual runs than OPS, for sure.

As far as comparing it to Pitch Values … yes and no. The approach as I’ve outlined it above doesn’t differentiate between, say, a Yu Darvish slider and an Edwin Jackson slider, as long as they’re in the same spot in the same count. It may be worth adding in information about pitch speed and break but I’m worried the data will get awfully thin. On the other hand, if we’re only going to use this to judge the quality of pitches by a single pitcher, then this information isn’t all that important …