Curveball command

One night in Brooklyn (Johnny Sain) threw 32 straight curveballs from 32 different directions: underhand, overhand, sidearm, three-quarters, behind his back almost. The whole world knew what was coming. Didn’t matter. We haven’t hit one of them yet. He could drop a curve in a coffee mug.
Rex Barney

Several PITCHf/x analysts have tackled the issue of measuring pitchers’ ability to locate the ball where they intend.
Mike Fast in his essay on Cliff Lee’s 2008 turnaround (see the 2009 Hardball Times Annual) wrote that “with a little extra work … we can come close, at least qualitatively if not quantitatively, to assessing a pitcher’s command of his pitches.”
Dave Allen explored the matter by looking at charts of Mariano Rivera’s cutter. Jeremy Greenhouse first approached the topic last June and published a list of the best at locating pitches in the 2011 Hardball Times Annual.

One obstacle on the road toward a perfect measure of command is that we can’t know the intended target of a pitch, and I believe we never will. We’ll get close when (if?) we have data on catchers’ glove positioning (see Nick Steiner’s follow-up on Jeremy’s article in the Annual), but that won’t be the end-all-debates solution.
Many catchers nowadays keep moving until the last moment to keep hitters from peeking at their position. And, especially on breaking pitches, the position of the glove might just be a reference point; i.e., the pitcher aims at that target in order to deliver the ball to a different spot (lower, for example, on a curve).

Thus, data on catchers’ glove positioning may be at best an acceptable proxy for intended location (as Mike Fast has mentioned in the past, if tracking the backstops’ feet is an easier task, that would work just as well, if not better).

However, all those difficulties do not entirely prevent us from measuring command. In the following lines I’ll try to add my contribution, which is heavily based on the excellent work of those who preceded me.

Let’s start with a chart, showing John Lackey’s curveball locations (data from 2009 and 2010).
image

As Harry Pavlidis first noted, the scatter plot of pitch locations of a given pitcher often reveals the arm angle of that pitcher.

Here’s another one, relative to Matt Cain’s curves.
image

One difference between Lackey’s and Cain’s charts is that the latter is—well, fatter. If we suppose Cain doesn’t throw his curves around on purpose, the fatter scatter plot is an indication of a lower ability to hit the spots.

Mike Fast, while working on the 2009 Annual article, noticed “(Cliff) Lee threw each pitch type to only one location, almost without fail.” Curveballs have not been chosen at random for this example. While many of the other pitch types are more effective when delivered on the black, what’s important for deuces is vertical location. You can see that in one of my first articles.
Thus the assumption that pitchers do not intentionally fatten their location scatter plot on curveballs can be accepted for the moment.

How can we measure pitchers’ ability to hit their spots? Ideally we need to rotate the scatter plot so that the dots are somewhat vertically aligned, instead of roughly aligned along the delivery angle. In this way the rotated reference system would shift from horizontal location vs. vertical location to location perpendicular to the delivery angle vs. location parallel to the delivery angle. A measure of the variation along the new horizontal axis (for example the standard deviation) would capture the lateral command.

Let’s go step by step.

1. Rotation

Principal Components Analysis (PCA) is an advanced statistical method. It consists of a linear transformation of the data that chooses a new coordinate system (a rotation of the original axes) for the data set such that the new axes coincide with directions of maximum variation of the original observations.

Okay, let’s get ungeeky. Given the above Lackey chart, PCA finds a new axis as shown below.

image

I hope the above image is worth more than the 50 preceding words.

If the new axis (the first principal component) is chosen as the y-axis, then we have the desired new reference system: the x-axis (the second principal component) is location perpendicular to the delivery angle, while the y-axis is location parallel to the delivery angle.
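The rotation can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the charts: the function names, the synthetic data and the 60-degree angle are all assumptions made for the example. With only two dimensions, PCA reduces to finding the major axis of the 2x2 covariance matrix, which has a closed form.

```python
import math
import random


def principal_axis_angle(xs, zs):
    """Angle (radians, measured from the horizontal axis) of the first
    principal component of 2-D points given as two coordinate lists."""
    n = len(xs)
    mx, mz = sum(xs) / n, sum(zs) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    szz = sum((z - mz) ** 2 for z in zs) / (n - 1)
    sxz = sum((x - mx) * (z - mz) for x, z in zip(xs, zs)) / (n - 1)
    # Closed form for the major axis of a 2x2 covariance matrix.
    return 0.5 * math.atan2(2.0 * sxz, sxx - szz)


def rotate(xs, zs):
    """Return (lateral, upright, angle): coordinates across and along the
    first principal component, after centering on the mean location."""
    theta = principal_axis_angle(xs, zs)
    n = len(xs)
    mx, mz = sum(xs) / n, sum(zs) / n
    c, s = math.cos(theta), math.sin(theta)
    upright = [(x - mx) * c + (z - mz) * s for x, z in zip(xs, zs)]
    lateral = [-(x - mx) * s + (z - mz) * c for x, z in zip(xs, zs)]
    return lateral, upright, theta


# Synthetic "curveballs" (made-up numbers): scattered mostly along a line
# tilted 60 degrees from horizontal, with less scatter across it.
rng = random.Random(0)
true_angle = math.radians(60)
xs, zs = [], []
for _ in range(2000):
    along, across = rng.gauss(0, 1.0), rng.gauss(0, 0.5)
    xs.append(along * math.cos(true_angle) - across * math.sin(true_angle))
    zs.append(2.0 + along * math.sin(true_angle) + across * math.cos(true_angle))

lateral, upright, theta = rotate(xs, zs)
print(f"estimated delivery angle: {math.degrees(theta):.1f} degrees")
```

After the rotation the two new coordinates are uncorrelated, which is what makes it reasonable to measure the spread along each of them separately.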


2. Variability

The following histograms (distribution along the second principal component) show the lateral command (from now on short for location perpendicular to the delivery angle) of Lackey’s and Cain’s curves (they are limited to 2010 data against right-handed batters).

image
image

The good news is that both are symmetrical with a single central peak; i.e., roughly bell-shaped. That’s another point in favor of the pitchers-aim-at-the-same-horizontal-spot-on-curveballs hypothesis.

Cain’s histogram is wider, indicating higher dispersion around the mean (that is, the preferred target). The standard deviation for his lateral location is .627, while it’s .517 for Lackey’s.
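The standard deviation along each principal component is the square root of the corresponding eigenvalue of the covariance matrix, so both spreads can be computed without explicitly rotating the points. The sketch below assumes the pitch locations are available as two lists of plate coordinates, and that the scatter is longer along the delivery angle than across it. The function name and the synthetic data are illustrative, not the code used for the article.

```python
import math
import random


def command_values(xs, zs):
    """Return (lateral_sd, upright_sd): standard deviations along the
    second and first principal components of the pitch locations."""
    n = len(xs)
    mx, mz = sum(xs) / n, sum(zs) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    szz = sum((z - mz) ** 2 for z in zs) / (n - 1)
    sxz = sum((x - mx) * (z - mz) for x, z in zip(xs, zs)) / (n - 1)
    # Eigenvalues of [[sxx, sxz], [sxz, szz]] in closed form.
    mid = (sxx + szz) / 2.0
    half_gap = math.hypot((sxx - szz) / 2.0, sxz)
    upright_sd = math.sqrt(mid + half_gap)            # larger eigenvalue
    lateral_sd = math.sqrt(max(mid - half_gap, 0.0))  # smaller eigenvalue
    return lateral_sd, upright_sd


# Made-up locations: spread 1.0 along a 60-degree line, 0.5 across it.
rng = random.Random(1)
angle = math.radians(60)
xs, zs = [], []
for _ in range(5000):
    along, across = rng.gauss(0, 1.0), rng.gauss(0, 0.5)
    xs.append(along * math.cos(angle) - across * math.sin(angle))
    zs.append(2.0 + along * math.sin(angle) + across * math.cos(angle))

lateral_sd, upright_sd = command_values(xs, zs)
print(f"lateral: {lateral_sd:.3f}  upright: {upright_sd:.3f}")
```

A lower lateral value means a thinner scatter plot, which is the sense in which Lackey’s histogram is narrower than Cain’s.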

3. Results

Applying the algorithm to the regular curveballers (included in this analysis are the pitchers who have thrown at least 400 curveballs both in 2009 and 2010, according to MLBAM classifications), we get the following list for 2009 and 2010 combined. Lower values indicate better lateral command of the bender. Values have been separately calculated according to batter handedness, then recombined in a single value.

                   lateral
           player  command
     Roy Halladay  0.51
       Barry Zito  0.51
      John Lackey  0.51
        Dan Haren  0.51
      Paul Maholm  0.52
  Felix Hernandez  0.53
        Zach Duke  0.53
     Jason Hammel  0.55
    Sean Marshall  0.55
  Chris Carpenter  0.56
   Javier Vazquez  0.56
       Randy Wolf  0.56
   Bronson Arroyo  0.57
       Jon Lester  0.57
     Gio Gonzalez  0.58
  Adam Wainwright  0.59
      Gavin Floyd  0.59
 Justin Verlander  0.59
  Wandy Rodriguez  0.60
  Yovani Gallardo  0.60
     Joe Saunders  0.61
       Roy Oswalt  0.63
        Matt Cain  0.63
    James Shields  0.63
     Tim Lincecum  0.67
     Ricky Romero  0.67

Beyond the sexy list, the analysis spat out bell-shaped histograms for nearly all the pitchers considered. Another encouraging result is that pitchers show very similar lateral command values against right-handed and left-handed batters. The values are also stable in consecutive seasons, although the year-to-year correlation seems (remember, it’s just a 26-pitcher data set) a bit lower than the righties-versus-lefties one, as I feel it should be.
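The text says the values were calculated separately by batter handedness and then recombined, without saying how. One plausible way to do it, offered here purely as an assumption, is a pooled standard deviation that weights each split by its number of pitches. The function name and the input numbers below are hypothetical.

```python
import math


def pooled_sd(groups):
    """Combine per-group standard deviations into one value.

    groups: iterable of (sd, n_pitches) pairs, e.g. one pair for pitches
    to right-handed batters and one for pitches to left-handed batters.
    """
    groups = list(groups)
    numerator = sum((n - 1) * sd ** 2 for sd, n in groups)
    denominator = sum(n - 1 for _, n in groups)
    return math.sqrt(numerator / denominator)


# Made-up inputs, not values from the article.
combined = pooled_sd([(0.50, 300), (0.60, 300)])
print(round(combined, 3))
```

Pooling this way keeps the two handedness splits centered on their own preferred targets, so a pitcher who aims at different spots against lefties and righties is not penalized for it.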

4. Repeat the process for upright command

I have tackled lateral command first, because it was easier to accept the pitchers-aim-at-the-same-spot hypothesis for it.
However, since we have seen that it’s vertical location that is important for Uncle Charlies, let’s try to see if the whole process holds for upright command (from now on short for location along the delivery angle) as well.

Here’s another scatter plot comparison, featuring curveballs by Roy Halladay and Felix Hernandez (2009 and 2010 data).

image
image

This time the charts are more or less equally “fat,” but the one picturing the King’s curves is way “longer” than the one of Doc’s.
The standard deviation on the first principal component (the new axis along the delivery angle) is 1.19 for Hernandez and 0.99 for Halladay. Here are the corresponding histograms (again limited to 2010 data against righties).

image
image

Again, they clearly have a single peak and are fairly symmetrical, and the histograms of all the pitchers listed before share more or less the same traits. So here, too, the hypothesis that pitchers aim roughly at a single spot when delivering curves is acceptable.

Now, some more ranking.

                   upright
           player  command
   Bronson Arroyo  0.92
        Zach Duke  0.94
        Dan Haren  0.95
     Roy Halladay  0.99
   Javier Vazquez  1.00
    Sean Marshall  1.01
  Adam Wainwright  1.04
  Wandy Rodriguez  1.04
      John Lackey  1.05
 Justin Verlander  1.07
    James Shields  1.07
        Matt Cain  1.08
       Jon Lester  1.12
       Barry Zito  1.13
     Tim Lincecum  1.14
     Jason Hammel  1.15
      Paul Maholm  1.16
     Joe Saunders  1.17
     Gio Gonzalez  1.17
       Roy Oswalt  1.19
  Felix Hernandez  1.19
       Randy Wolf  1.21
      Gavin Floyd  1.24
  Yovani Gallardo  1.26
  Chris Carpenter  1.29
     Ricky Romero  1.47

Upright command showed a higher year-to-year correlation than lateral command, and there’s also some relation between the two.
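The relation between the two measures can be checked roughly from the published numbers. The sketch below pairs the values from the two tables above and computes a Pearson correlation. It uses only the rounded two-year values printed in this article, so it is a coarse check, and it cannot reproduce the year-to-year correlations, which need the per-season values.

```python
import math

# (lateral, upright) pairs copied from the two tables, 2009-2010 combined.
COMMAND = {
    "Roy Halladay": (0.51, 0.99), "Barry Zito": (0.51, 1.13),
    "John Lackey": (0.51, 1.05), "Dan Haren": (0.51, 0.95),
    "Paul Maholm": (0.52, 1.16), "Felix Hernandez": (0.53, 1.19),
    "Zach Duke": (0.53, 0.94), "Jason Hammel": (0.55, 1.15),
    "Sean Marshall": (0.55, 1.01), "Chris Carpenter": (0.56, 1.29),
    "Javier Vazquez": (0.56, 1.00), "Randy Wolf": (0.56, 1.21),
    "Bronson Arroyo": (0.57, 0.92), "Jon Lester": (0.57, 1.12),
    "Gio Gonzalez": (0.58, 1.17), "Adam Wainwright": (0.59, 1.04),
    "Gavin Floyd": (0.59, 1.24), "Justin Verlander": (0.59, 1.07),
    "Wandy Rodriguez": (0.60, 1.04), "Yovani Gallardo": (0.60, 1.26),
    "Joe Saunders": (0.61, 1.17), "Roy Oswalt": (0.63, 1.19),
    "Matt Cain": (0.63, 1.08), "James Shields": (0.63, 1.07),
    "Tim Lincecum": (0.67, 1.14), "Ricky Romero": (0.67, 1.47),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


lateral = [pair[0] for pair in COMMAND.values()]
upright = [pair[1] for pair in COMMAND.values()]
r = pearson(lateral, upright)
print(f"lateral vs. upright correlation, {len(COMMAND)} pitchers: {r:.2f}")
```

With only 26 pitchers, a single extreme pair (Ricky Romero is last on both lists) can move the coefficient noticeably, so the result is best read as a direction rather than a precise figure.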

Comments

How can all this stuff really be useful other than for creating command rankings?
It could be employed to monitor command improvements. Adam Wainwright, for example, improved both his lateral (from 0.62 to 0.56) and his upright command (from 1.06 to 1.02) from 2009 to 2010.
A check on whether better command of a pitch results in higher efficacy for that pitch (as measured by pitch run value) is surely required. Separate analyses of lateral and upright command would be advisable.
Another field that could be explored is whether delivery angle plays a role in a pitch’s effectiveness: Are pitchers with a lower angle allowed poorer command because their mistakes do not end up in the most dangerous locations?
Finally, at a micro level, it might be employed to suggest where a pitcher should aim his throws, based on the hitter’s weaknesses (everybody is heatmap crazy lately!) and the pitcher’s command (both lateral and upright) and delivery angle.

Issues

One big issue with this analysis is that the PITCHf/x system doesn’t have a perfect and consistent calibration, as other analysts have shown (see Mike Fast’s work during last year’s LCS to get a feel for how far off the system can be on a given night). Thus, if a pitcher played in games with extreme miscalibrations, he would look wilder than he is. This issue spurred me to build an algorithm that calculates correction factors for every game, and the locations used for this article have been corrected accordingly (for an explanation of the correction algorithm, just stay tuned).

Another problem is that on some pitch types you are more likely to find scatter plots like this one, featuring two distinct targets, for which a more elaborate algorithm is needed.

image

Well, not exactly like this one, because you’ll hardly find someone hitting his spots like Mariano Rivera with his bread-and-butter cutter. We’ll tackle this issue before long as well.
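The article does not say what the more elaborate algorithm will be. One conceivable first step, shown here strictly as an assumption, is to split the pitches into two groups with a plain k-means pass (k=2) and then measure the spread within each group separately. The function name and the synthetic locations are made up for the illustration.

```python
import math
import random


def two_means(points, iterations=20):
    """Split 2-D points into two clusters with plain k-means (k=2).

    Returns (labels, centers); label 0 is the cluster that starts from the
    leftmost point, label 1 the one that starts from the rightmost point.
    """
    # Deterministic start: the leftmost and rightmost points.
    centers = [min(points), max(points)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to the nearer center.
        labels = [
            0 if math.dist(p, centers[0]) <= math.dist(p, centers[1]) else 1
            for p in points
        ]
        # Move each center to the mean of its members.
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centers[k] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return labels, centers


# Made-up pitch locations around two targets, one on each side of the plate.
rng = random.Random(2)
points = [(rng.gauss(-0.7, 0.2), rng.gauss(2.5, 0.2)) for _ in range(300)]
points += [(rng.gauss(0.7, 0.2), rng.gauss(2.5, 0.2)) for _ in range(300)]

labels, centers = two_means(points)
print("estimated targets:", [(round(x, 2), round(z, 2)) for x, z in centers])
```

Once the pitches are labeled, the rotation and standard deviation steps described earlier could be applied to each group on its own. Real pitch data with overlapping targets would need something more careful than this sketch.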

References & Resources
Mike Fast: The Cliff Lee Turnaround – The Hardball Times Baseball Annual 2009
Dave Allen: Measuring a Pitcher’s Ability to Locate a Pitch – The Baseball Analysts
Jeremy Greenhouse: Spitballing on Command – The Baseball Analysts
Jeremy Greenhouse and Nick Steiner: Scouting by Numbers – The Hardball Times Baseball Annual 2011


Peter Jensen
Another terrific article, Max. I wonder how much a pitcher’s use of deception plays a part in the choice of locations. A pitcher may be choosing a pitch type and pitch location to have that pitch mimic another pitch without much concern for where the pitch actually ends up, since the success of the pitch will depend on how well it deceives the batter. That might defeat your analysis somewhat. This is one area where teams have a clear advantage over independent analysts. While we can only infer a pitcher’s command from in-game performance, they can ask a pitcher…
Dave Studeman

Fantastic, Max.

Jeremy Greenhouse

Very cool, Max. Wish I knew more about principal component analysis or clustering.

Derek
The underlying premise here is that vertical movement is all that matters (e.g., with good vertical movement, the hitters cannot hit it even if they know it is coming). Then why not simply calculate (strikes + balls located low and away) / total curves attempted? I know that it would not seem nearly as “advanced” as throwing out the term principal components analysis, but anyone who understands PCA knows that this simple fraction may make more sense as a command metric. The problem with this method is that pitches that are high and heading toward the batter’s ear (clear mistakes that…
Max Marchi

Peter,
Scientific Baseball LLC probably has the best setting to put this kind of analysis to the test.

Max Marchi
Derek, what you propose is to see whether a pitcher throws the ball where it’s least dangerous, no matter if he intended to hit that spot. Many other analysts have already done that. What I’m trying to see is whether a pitcher can locate the ball where he wants, no matter if it’s a good place. If a pitcher wants to throw his curves in the fat part of the plate and letters high, and hits that spot every time, he has great command of his curve; too bad he aims it at bad spots. This particular pitcher doesn’t need to…
Derek Neal
Max: The premise is that the best pitchers in the world have catchers who are not stupid. Thus, unless one assumes that the vertical movement is the dominant (or even only) factor determining the batter’s success at hitting the pitch hard, one would tend to assume that balls left in the middle of the plate were more likely to be deviations from their intended target than balls placed low and right on the corner. I am also willing to assume that curve balls at the ear are less likely to be hitting an intended spot (since batters never swing and…
Max Marchi
Derek: A clarification first. The centre of the histograms, i.e. the highest bar, does not represent the middle of the strike zone, but rather the place where most of the pitches are located for one particular pitcher. That is, it might be down the middle if a pitcher constantly delivers down the middle, or low and away if the pitcher nibbles that corner on most of his offerings. What you are saying makes perfect sense and doesn’t necessarily conflict with the findings in the article. Let me try to rephrase my previous reply. Say you aim 100 darts at the same target (center of the dartboard).…
Derek
I assumed that 0 on the horizontal axis was for a second principal component of zero (i.e., I assumed that the verbal description above the chart was correct and the label was wrong). Thus, the histogram is providing the density function for the perpendicular distance (second principal component) from the bold line (what you call delivery angle). My problem is that the second principal component is zero anywhere on that line because along that line the first component is giving a perfect fit, and ON THAT SAME LINE, many of the points are good pitches and many are bad pitches.…
Max Marchi
Derek, I presented two measures, one on the first and one on the second principal component. Thus, if the pitch is along the delivery angle, it is considered to be well located “laterally”, but it might be badly located “upright”. The combination of the two should give a better idea of the pitcher’s command. Thank you for all your thoughtful comments. I’m working on some ideas I got from reading them, including something on the single spot issue (common sense says you’re right, so the burden of proof is on my side). Surely, as Peter suggested in the first comment,…