## Exploring Pitching Tendencies

When discussing why a certain pitcher is effective, baseball people will most often cite stuff and command. Since these two attributes are easy to detect simply by watching the game, it’s understandable why they are so heavily cited. One thing that is often overlooked in understanding the successes or failures of pitchers is how they attack hitters. Turn on any game and you’ll undoubtedly hear an announcer say something like, “I’ll tell you what Jim, that high and tight fastball is setting up a slider down and away,” or “that changeup slowed his bat down, let’s see him try and zip a fastball by him.” It always struck me as odd: if the guidelines for selecting a pitch are that standardized, I can hardly see how following them would be a wise choice. Outside of facing Mariano Rivera, a hitter who knows what pitch is coming has a distinct advantage.

One can approach this subject from dozens of different angles. The first question I set out to answer is: Do pitchers exhibit any obvious tendencies to use a certain pitch based on what they previously threw?

The most basic way to answer this question is to look at the distribution of pitches that immediately follows each individual pitch type. If “guidelines” for pitch selection did not exist, then we might expect that the distribution following each pitch type would match the distribution of total pitches thrown. For example, Justin Verlander threw 57% fastballs, 18% curveballs, 16% changeups and 8% sliders this year. If Verlander did not show tendencies in how he selected a pitch based on his previous pitch, then the distribution of pitches thrown after a fastball should match the 57%/18%/16%/8% distribution that Verlander exhibited over the full season. The same would hold true for pitches after curveballs, changeups, and sliders.
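The comparison in the paragraph above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the study: it assumes `pitches` is a single unbroken list of pitch-type labels (for example, one at bat), and the function names are hypothetical. `season_mix` gives the overall distribution and `mix_after` gives the distribution of pitches that immediately follow each type; with no sequencing tendencies, each row of `mix_after` should roughly match `season_mix`.

```python
from collections import Counter, defaultdict


def season_mix(pitches):
    """Share of each pitch type across all pitches in the list."""
    counts = Counter(pitches)
    total = sum(counts.values())
    return {pitch: n / total for pitch, n in counts.items()}


def mix_after(pitches):
    """For each pitch type, the share of each type thrown immediately after it."""
    follow = defaultdict(Counter)
    for prev, nxt in zip(pitches, pitches[1:]):
        follow[prev][nxt] += 1
    result = {}
    for prev, counts in follow.items():
        total = sum(counts.values())
        result[prev] = {pitch: n / total for pitch, n in counts.items()}
    return result


# Hypothetical toy sequence, not real data.
seq = ["FB", "CU", "FB", "FB", "CH", "FB"]
print(season_mix(seq))       # FB is 4 of the 6 pitches
print(mix_after(seq)["FB"])  # CU, FB and CH each follow a fastball once
```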

I looked at every pitch thrown by pitchers with at least 100 innings in 2011 (excluding Tim Wakefield and R.A. Dickey) and tallied the percentages for each two-pitch window within each at bat. To be clear, this means that the first pitch of a new at bat was not paired with the last pitch of the previous at bat. This matters because we don’t want to introduce bias from facing two different batters. Since Pitch f/x is not perfect in its pitch recognition, I grouped all pitches into four categories to reduce classification error. Four-seamers, two-seamers and sinkers were classified as fastballs. Screwballs and knucklecurves were grouped with regular curveballs. Splitters and forkballs were combined into changeups. Finally, cutters were grouped with sliders. I also accounted for pitchouts and other intentional balls.
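One way to implement the grouping and the within-at-bat window is sketched below. It assumes MLBAM-style PITCHf/x pitch-type codes (`FF`, `FT`, `SI`, `FA`, `CU`, `KC`, `SC`, `CH`, `FS`, `FO`, `SL`, `FC`, with `PO` and `IN` for pitchouts and intentional balls); check these against your own data source. It also assumes that accounting for intentional balls means dropping them and that any code outside the four groups is skipped. Both are assumptions, not the author's stated method, and `tally_pairs` is a hypothetical name.

```python
from collections import Counter

# Assumed PITCHf/x-style codes mapped to the four groups described above.
GROUP = {
    "FF": "FB", "FT": "FB", "SI": "FB", "FA": "FB",  # four-seamers, two-seamers, sinkers
    "CU": "CU", "KC": "CU", "SC": "CU",              # curveballs, knucklecurves, screwballs
    "CH": "CH", "FS": "CH", "FO": "CH",              # changeups, splitters, forkballs
    "SL": "SL", "FC": "SL",                          # sliders, cutters
}


def tally_pairs(at_bats):
    """Count (previous, next) pitch-group pairs, never crossing at-bat boundaries.

    at_bats: iterable of lists of raw pitch-type codes, one list per at bat.
    Codes not in GROUP (including pitchouts "PO" and intentional balls "IN")
    are dropped, so the pitches on either side of a dropped pitch become adjacent.
    """
    pairs = Counter()
    for codes in at_bats:
        grouped = [GROUP[c] for c in codes if c in GROUP]
        for prev, nxt in zip(grouped, grouped[1:]):
            pairs[(prev, nxt)] += 1
    return pairs


# Hypothetical at bats: the cutter is grouped with sliders, and no pair links
# the last pitch of the first at bat to the first pitch of the second.
print(tally_pairs([["FF", "SL", "FC"], ["FS", "FF"]]))
```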

The chart above shows these four groups of pitch types clustered together, along with the percentages at which they were thrown. The color of each bar within a cluster indicates the situation in which that pitch type was thrown. For example, the dark blue bars show the percentage of the associated pitch type over the entire season, and the red bars show the percentage of that type thrown after fastballs. If pitchers did not show tendencies based on the pitch they had just thrown, each bar in a cluster would be of comparable height. What we do see is that after throwing a slider, curveball, or changeup, pitchers were about twice as likely to throw another slider, curveball or changeup, respectively, as under any other circumstance. To put it another way, pitchers really like to double up on their breaking balls and off-speed offerings.

My best explanation of this would be that hitters are generally told that pitchers will mix up their pitch selection. Since pitchers know this, they try to exploit it by repeating the same off-speed offering that hitters have been trained to believe is rare. How would you interpret these results?

In future posts, I’ll take a look at the most interesting individual cases among the pitchers studied and see how different approaches relate to performance.

UPDATE:

The data above can be conceptualized as if every pitch in the study was thrown by one pitcher. This means that each pitcher represents just a section of the conglomerate pitcher’s season. Perhaps a better way to represent the data is shown below.

These plots were generated by taking each individual pitcher and finding the ratio between the percentage of a certain pitch he threw after the designated pitch and the percentage at which he threw that pitch over his entire season. Returning to the example above, if Verlander’s distribution of pitches after fastballs matched the 57%/18%/16%/8% distribution he showed over the entire season, then the ratios for pitches after fastballs would be one for fastballs, one for sliders, one for curveballs and one for changeups. Therefore, the higher the ratio, the more frequently that pitch is thrown after the designated pitch relative to the pitcher’s season rate. Each pitcher’s ratios were clustered by pitch and plotted in the same pitcher order.
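The ratio described above is P(next | previous) divided by the pitcher's season share of the next pitch. A minimal sketch for one pitcher, assuming `pair_counts` holds his within-at-bat (previous, next) counts and `pitch_counts` his season totals per group (both names are hypothetical). The conditional share is normalized by the number of pairs that start with each pitch rather than by that pitch's season total, because the last pitch of an at bat has no follow-up pitch.

```python
def sequence_ratios(pair_counts, pitch_counts):
    """Ratio of P(next | prev) to the pitcher's season share of `next`.

    A ratio of 1 means `next` is thrown after `prev` exactly as often as over
    the whole season; pairs never observed are omitted (their ratio would be 0).
    """
    season_total = sum(pitch_counts.values())
    pairs_from = {}  # number of pairs that start with each pitch type
    for (prev, _), n in pair_counts.items():
        pairs_from[prev] = pairs_from.get(prev, 0) + n
    ratios = {}
    for (prev, nxt), n in pair_counts.items():
        conditional = n / pairs_from[prev]
        season_share = pitch_counts[nxt] / season_total
        ratios[(prev, nxt)] = conditional / season_share
    return ratios


# Hypothetical pitcher: 60% fastballs, 40% sliders over the season.
pitch_counts = {"FB": 60, "SL": 40}
pair_counts = {("FB", "FB"): 30, ("FB", "SL"): 20, ("SL", "FB"): 10, ("SL", "SL"): 20}
print(sequence_ratios(pair_counts, pitch_counts))
```

In this toy case fastballs are followed by the season mix (ratios of 1), while a slider follows a slider at about 1.67 times the pitcher's season slider rate.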

If every pitcher showed identical distributions regardless of what he previously threw, all of the bars on each plot would be one. Again, we see that the majority of pitchers favor doubling up on their off-speed offerings and show few tendencies to favor a pitch when working off a fastball.


From this chart we can clearly see that throwing batters consecutive sliders is equivalent to flipping them the bird.

I would interpret this as pitchers trying to exploit a weakness. Pitcher knows that the hitter is weak against sliders, so he throws him a lot of sliders, sometimes two or three in a row.

Sliders suck.

Signed,

Delmon Young

Signed,

Most Hitters in Baseball History

I think the enrichment in same-pitch doubles comes from pitchers’ repertoires. Verlander doesn’t throw many sliders, so if they were randomly distributed, the slider-slider combo would be rare.

Similarly, pitchers who prefer their slider would be more likely to throw slider, slider than curve, curve. I think you could test this by looking at individual pitchers.

This.

I was shocked that the article didn’t address this.

Perhaps the batter in question is thought to be vulnerable to that pitch, so if at first he doesn’t succeed, try again.

I think there may be an issue with the magnitude of your findings. Not all pitchers have four pitches, so a pitcher’s distribution after a particular off-speed pitch can only include the pitches he knows how to throw. Thus, the curveball percentage after a curveball is a lot higher than usual, because (other than a fastball) it is the only pitch he is guaranteed to have. Lots of guys only have 3 pitches, and some guys only use 2.

Guys with 100 IP?

Agreed.

Agreed

Did you account for the fact that if a pitcher throws a slider, then we know that he for sure has that pitch in his arsenal? Not every pitcher throws each of the three off-speed pitches with regularity. I think this would introduce a bias to your results above.

My pony is too slow. Tyranitar got it first.

I’m with Jesse. That was my immediate interpretation. Most of the time pitchers pitch to a batter’s weakness(es) because that provides the highest chance of getting the batter out. Only occasionally will a pitcher pitch to a batter’s strength and then only because the pitcher is reasonably certain that the batter is not expecting it.

Wow, Josh, thanks for all the work. It was well worth it, as it led to a surprising and interesting conclusion.

Isn’t the obvious explanation that your analysis failed to control for each pitcher’s mix of pitches? Take Carlos Marmol for example. He threw 64% sliders this year. If he had a 36/64 split regardless of sequence, we would expect a slider to follow a slider 64% of the time. We would also expect a change up or curveball to follow a slider 0% of the time simply because he doesn’t throw those pitches.

On the other end of the spectrum we have Roy Halladay, who throws only fastballs (83%, including fastballs, cutters & splitters) and curveballs (17%). If his sequencing is random, he would follow curveballs with curveballs 17% of the time but would follow curveballs with sliders or changeups 0% of the time.

Thus, pitchers who throw a lot of sliders are likely to follow sliders with sliders, and the same holds for any other pitch. Unless I am missing something, your analysis is skewed such that the probability of following any pitch with that same pitch is high for reasons completely different than the ones discussed in the text of the article. Your analysis would only work if all pitchers have the same mix of pitches and the same frequencies with which each pitch is thrown.
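The repertoire effect described in this comment can be checked with a small calculation. The sketch below assumes every pitcher sequences completely at random, so that for each pitcher P(next | previous) equals his own season share of the next pitch, and pools the expected pairs the way the first chart pools all pitches. The two pitchers use the Marmol-like and Halladay-like mixes quoted above; the equal pitch totals are hypothetical, and at-bat boundaries are ignored for simplicity.

```python
def pooled_rates(pitchers, prev, nxt):
    """Pooled P(nxt | prev) and pooled season share of `nxt`, assuming every
    pitcher picks each pitch independently from his own season mix.

    pitchers: list of (pitch_total, mix) where mix maps pitch type -> share.
    """
    pairs_from_prev = pairs_prev_nxt = 0.0
    all_pitches = nxt_pitches = 0.0
    for total, mix in pitchers:
        # Approximate number of pairs that start with `prev` for this pitcher.
        follows_prev = total * mix.get(prev, 0.0)
        pairs_from_prev += follows_prev
        pairs_prev_nxt += follows_prev * mix.get(nxt, 0.0)  # random sequencing assumed
        all_pitches += total
        nxt_pitches += total * mix.get(nxt, 0.0)
    return pairs_prev_nxt / pairs_from_prev, nxt_pitches / all_pitches


# Hypothetical equal workloads with the mixes quoted in the comment above.
pitchers = [
    (1000, {"FB": 0.36, "SL": 0.64}),  # Marmol-like
    (1000, {"FB": 0.83, "CU": 0.17}),  # Halladay-like
]
after_slider, season_slider = pooled_rates(pitchers, "SL", "SL")
print(round(after_slider, 2), round(season_slider, 2))  # 0.64 0.32
```

Even with no sequencing tendency at all, the pooled slider-after-slider rate comes out at double the pooled slider share, because only slider throwers contribute pitches that follow a slider.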

I would think that using all pitchers with over 100 IP would help balance this, and also exclude some relievers (including Marmol) who only really use two pitches. But in any case, a pitcher who throws 64% sliders is going to be balanced by the dozens of pitchers with a more ‘regular’ repertoire. And since he is comparing the frequency of a pitch to the average of pitchers tendencies to throw it in any count, then a pitcher who throws tons of sliders is already accounted for in the average. If he caused a spike in the slider-slider column, it would mean he was throwing even more sliders than he would normally, after he threw a previous slider.

Also, he did say he’s going to look at individual cases next, since there’s always a risk of skewing the numbers. We’re just not at that point yet.

I’m open-minded about being convinced otherwise but I think your intuition on the math is incorrect. Allow me to try to clarify what I suspect is a bias in this analysis:

Pitchers that throw sliders are the *only* pitchers who can be included in a calculation of “how often does pitch __ follow a slider?” Since, by definition, all of those pitchers do throw sliders, it is likely that, for those pitchers (who, again, are the only ones included in the calculation of which pitches follow sliders) a slider will follow a slider. Many pitchers who throw sliders do not throw curveballs and therefore will never follow a slider with a curveball.

Again, if I am wrong I would love to hear a convincing argument why. I would bet that if this data were looked at for individual pitchers there would be far less of a tendency to see a disproportionate rate of consecutive same-pitch breaking balls, thus proving the bias in the current analysis.

But the problem with this is, there are a lot of 3-pitch starters like Max Scherzer or AJ Burnett who are essentially 2-pitch pitchers depending on the hand of the batter (FB-CB or FB-SL to righties, FB-CH to lefties).

So basically, even with starters you’re going to see a sample bias based on repertoire.

I agree, the underlying math may be incorrect and therefore any conclusion reached would not be properly supported.

Also, I’m not sure grouping pitches together is wise. A pitcher with a sinker, for example, would use his sinker differently than his fastball.

Keep in mind the way a pitcher attacks a lineup as a whole, not just a particular batter. Many pitchers tend to throw primarily fastballs the first time through a lineup and then mix in their off-speed pitches the second and third times through. This may account for some of the distribution of the two-pitch sequences.

Possible source of bias (unsure how much): some hitters are poor against certain pitches and will see more of them if scouted properly and pitched accordingly. Still, the variance appears much larger than I would expect from this effect.

Also, many pitchers throw first pitch fastballs. This may affect the data as well. It would be nice to see a distribution of pitch type for the first pitch to batters.

While this is interesting, I’m excited to see the individual cases since they will offer much more insight into the question you pose.

It might be that I throw my A+ pitch twice in a row if I got to a two-strike count the first time.

I think whether the pitch resulted in a ball or a strike matters here too.

Agreed. I’d also love to see some analysis of pitch location, since it’s commonly asserted that pitchers will “change the batter’s eye level” from low to high or vice versa, or throw at the outside corner after pushing the batter off the plate.

I also think location analysis will be more valid for pitchers with good control, since A.J. Burnett might be trying to throw an outside strike after an inside pitch, and just fail utterly to hit his location. In fact, Mariano Rivera might be a great subject for analysis by pure location, since his pitch type rarely varies and his control is usually excellent.

I kind of wonder if you’d be better off with some sort of Markov chain analysis of this. Those things are basically built to model this sort of transition data: you could model the transitions between pitch selections. I would hazard a guess, though, that the batter faced has a lot to do with pitch selection overall, so some data about hitters might also be valuable for modeling this.

If you just control for how likely the pitcher in question is to throw that particular pitch overall in a regression model, you can predict what type of pitch he throws based on sequence without a full Markov chain analysis. So run a multinomial logit model with the four pitch types as the outcome categories, control for the pitcher’s overall percentage of each pitch type across the whole season, and then add what he threw last. Those results are the ones you are looking for. You could also look at whether different types of pitchers use different patterns.
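As a simpler stand-in for the multinomial logit suggested here (it is not the same model), one can pool the observed pairs and compare them with the counts expected if each pitcher followed every pitch with his own season mix. A cell whose observed/expected ratio is well above 1 is repeated more often than repertoire alone predicts. The function name and input format are hypothetical.

```python
from collections import Counter


def observed_vs_expected(pitchers):
    """Pool observed (prev, next) counts and repertoire-adjusted expected counts.

    pitchers: list of (pair_counts, pitch_counts) dicts, one tuple per pitcher.
    Expected count for a pitcher = (pairs starting with prev) * (his season share of next).
    """
    observed, expected = Counter(), Counter()
    for pair_counts, pitch_counts in pitchers:
        season_total = sum(pitch_counts.values())
        pairs_from = Counter()
        for (prev, nxt), n in pair_counts.items():
            observed[(prev, nxt)] += n
            pairs_from[prev] += n
        for prev, n_from in pairs_from.items():
            for nxt, n_nxt in pitch_counts.items():
                expected[(prev, nxt)] += n_from * n_nxt / season_total
    return observed, expected


# Hypothetical pitcher counts, not real data.
one_pitcher = (
    {("FB", "FB"): 30, ("FB", "SL"): 20, ("SL", "FB"): 10, ("SL", "SL"): 20},
    {"FB": 60, "SL": 40},
)
obs, exp = observed_vs_expected([one_pitcher])
print(obs[("SL", "SL")], exp[("SL", "SL")])  # 20 12.0
```

Because the baseline is built pitcher by pitcher before pooling, a slider-heavy reliever inflates both the observed and the expected slider-slider counts instead of only the observed one.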

This analysis IS a Markov chain analysis.

It estimates the transition probabilities of a Markov chain in which each pitch depends only on the previous pitch.

Since there are more fastballs thrown, more sliders will still come after fastballs than come after sliders. What you have shown is a pitcher is more likely to use his best off speed offering twice in a row than go to a less effective off speed pitch.

Game theorist to the rescue:

http://cheaptalk.org/2011/10/19/serial-correlation-in-baseball-pitch-sequences/

The analysis needs to be done separately for vs L and vs R. Pitchers have vastly different pitch mixes vs the two sides. You need to compare how often a pitcher threw a pitch with how often he throws the pitch against that handedness.
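A minimal sketch of that split for a single pitcher, assuming each at bat is recorded as the batter's side (`"L"` or `"R"`) plus the already-grouped pitch sequence. The ratios are the same conditional-over-season ratios used in the post, but each side's ratio uses that side's own pitch mix as the baseline. The function name and input format are hypothetical.

```python
from collections import Counter, defaultdict


def ratios_by_hand(at_bats):
    """Conditional-over-season ratios computed separately for each batter side.

    at_bats: iterable of (stand, pitches) for ONE pitcher, where stand is "L" or
    "R" and pitches is the grouped pitch sequence of that at bat.
    Returns {stand: {(prev, next): ratio}}, each using that side's own mix.
    """
    pair_counts = defaultdict(Counter)
    pitch_counts = defaultdict(Counter)
    for stand, pitches in at_bats:
        pitch_counts[stand].update(pitches)
        pair_counts[stand].update(zip(pitches, pitches[1:]))  # within the at bat only
    result = {}
    for stand, pairs in pair_counts.items():
        season_total = sum(pitch_counts[stand].values())
        pairs_from = Counter()
        for (prev, _), n in pairs.items():
            pairs_from[prev] += n
        result[stand] = {
            (prev, nxt): (n / pairs_from[prev]) / (pitch_counts[stand][nxt] / season_total)
            for (prev, nxt), n in pairs.items()
        }
    return result


# Hypothetical at bats: sliders to a righty, a changeup to a lefty.
example = [("R", ["FB", "SL", "SL"]), ("L", ["FB", "CH", "FB"])]
print(ratios_by_hand(example))
```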

No one has talked about how the count affects this analysis. Sliders, changeups and curveballs that get the pitcher to 0-2 or 1-2 could be very likely to be thrown again, since pitchers ahead in the count are likely to throw “junk” out of the strike zone multiple times. What about looking at the subset of pitches thrown at 0-0, 0-1, and 1-0, where selection may be closer to random?
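The subset proposed here could be built by keeping only pairs whose first pitch was thrown in an early count. The sketch assumes each pitch is stored with the count before it was thrown, as (pitch type, balls, strikes); the names are hypothetical. Keying on the first pitch of the pair rather than the second is a choice, not something the comment specifies, but it is the only way 0-0 can appear, since that count exists only before the first pitch of an at bat.

```python
from collections import Counter

EARLY_COUNTS = {(0, 0), (0, 1), (1, 0)}  # (balls, strikes) before the pitch


def early_count_pairs(at_bats):
    """Count (prev, next) pairs whose first pitch came in an early count.

    at_bats: iterable of at bats, each a list of (pitch_type, balls, strikes)
    tuples, where balls/strikes is the count before that pitch was thrown.
    """
    pairs = Counter()
    for pitches in at_bats:
        for (prev, balls, strikes), (nxt, _, _) in zip(pitches, pitches[1:]):
            if (balls, strikes) in EARLY_COUNTS:
                pairs[(prev, nxt)] += 1
    return pairs


# Hypothetical at bat: the slider-to-fastball pair starting at 0-2 is left out.
at_bat = [("FB", 0, 0), ("SL", 0, 1), ("SL", 0, 2), ("FB", 0, 2)]
print(early_count_pairs([at_bat]))
```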