Archive for October, 2013

The Best Case for Bryce

Happy 21st birthday, Bryce Harper!

In two seasons to date, Harper has posted a 128 wRC+ while hitting .272/.353/.481 in 1094 plate appearances.

Steamer projects Harper to hit .266/.347/.464 as a 21-year-old, which would make for a 125 wRC+. But if Harper posts a lower batting average, OBP, and slugging than he did in either of his first two years, I imagine that would be a major disappointment, not just for fans of the Washington Nationals, but for fans of the sport of baseball. And also, perhaps mostly, for the player himself. (But at least the projections have him down for a career-high 23 home runs.)

Changing gears for a moment, how about that Mike Trout. You may have heard, but some people thought he was the American League’s most valuable player after he hit .326/.399/.564 in 639 plate appearances in 2012. Then he somehow got even better as a hitter in 2013, posting a .323/.432/.557 line in 716 PA.

But when Trout was 19, he hit .220/.281/.390 in a 40-game, 135-PA cameo in 2011. Harper would crush that line as a 19-year-old rookie in 2012. Then, of course, Trout’s age-20 season set an impossible standard that Harper had about a 3.4×10^9 percent chance of surpassing, if we’re being optimistic.

Because of the one-year age difference, had Trout just ended up reasonably good rather than ridiculously great, he might have served as a decent guide for how Harper could develop. Sort of a one-year advance copy. But Trout’s 2013 season confirmed that he is ridiculously great, so that idea is out the window for now.

What about other players who got their starts as teenagers? According to the Baseball-Reference.com similarity scores, Harper through his age 20 season has posted numbers most similar to Tony Conigliaro (956), Ken Griffey (954) and Mickey Mantle (954). All three of these players debuted in their age 19 seasons.

Mantle was already a great hitter when he was 20, posting a .311/.394/.530 line in 626 PA (158 wRC+), but the other two players set more worldly, but still great-for-20, lines: Griffey a .300/.366/.481 (666 PA, 132 wRC+) and Conigliaro a .269/.338/.512 (586 PA, 131 wRC+).

Harper’s wRC+ in 2013 was 137, slightly better than either Griffey or Conigliaro, but he only put in 497 plate appearances. Still, the three players had awfully similar age-20 seasons.

When he turned 21, Conigliaro’s effectiveness decreased to a 123 wRC+ and .265/.330/.487, before a recovery when he turned 22 (144 wRC+, .287/.341/.519, 389 PA) prior to the disaster that occurred on August 18, 1967, when he was hit in the face by a pitch.

Griffey’s improvement was steadier, as he posted a .327/.399/.527 line when he was 21 and a .308/.361/.535 one at 22 years old, with wRC+ marks of 148 and 145, before experiencing his first two 160 wRC+ seasons the next two years.

One more player I want to talk about in this context is Giancarlo Stanton. He fiddled around in A+ and AA when he was 19, because the universe doesn’t just up and grant every great talent the ability to hit Major League pitching as a teenager. Stanton instead debuted in his age 20 season and hit .259/.326/.507 (118 wRC+ in 396 PA) before hitting 34 home runs in his age 21 season with a 141 wRC+ and a .262/.356/.537 line in 601 PA.

So where the heck are we now? I just shared a lot of names and players and numbers and slashes, but none belong to Bryce Harper. He’ll have a heck of a lot more to do with his development than Mickey Mantle’s ghost.

I think the record shows, however, that players who hit well when they are 19 and 20 generally don’t stagnate at 21. The projected line from the beginning of this post still seems low.

To conclude, here is a possible range of outcomes for Bryce Harper in 2014:

Worst-case: His health remains an issue. His stats end up about as projected…or worse.

Mid-case #1: He actually gets healthy but still faces a Conigliaro-like decline between his age 20 and age 21 seasons. (Although, Conigliaro’s decline still left him hitting at a darned good level.)

The Steamer projection is somewhere between this and the prior case.

Mid-case #2: Ken Griffey. Don’t let the version of Ken Griffey from his mid-20s in the mid-90s, the version who hit 56 home runs in consecutive seasons, interfere with the classification of this as a “mid-case.” A 10-20 point jump in Harper’s wRC+, as Griffey experienced when he turned 21, would be a welcome development and continue Harper on his perennial all-star path.

Best-case: Mike Trout. I might have skipped a couple mid-cases, but let’s get back to Trout. It’s going to always get back to Trout, I think, for years when we have conversations like this. But if Trout could struggle when he was 19–unlike Harper, Mantle, Conigliaro, Griffey (sorry Stanton)–and then explode when he turned 20, why can’t the other once-in-a-generation talent of this generation experience a similar jump? (Please allow me a “why can’t” when talking about best-case scenarios.) It wouldn’t be a change from bad to great, but good to unfathomable, and it would come a year later, but maybe instead of having Griffey’s age-20 season and Griffey’s age-21 season, Harper can skip right to Trout or Mantle’s age-21 season.

The “Griffey-Griffey” path is still a more realistic hope for those looking for Harper to exceed the computed expectations set by Steamer. I don’t think a 150 wRC+ is out of reach, but even a 140 or 145 wRC+ or so would be a nice continuation for Harper’s career.


#KillTheWin, Postseason Style

Adam Wainwright pitched a decent game Monday night in Game 3 of the NLCS, throwing 7 innings, giving up 6 hits and no walks, and striking out 5. He had a game score of 62, usually a sign of a well-pitched game, yet he ended up with the loss because the Cardinals offense chose to take the night off. Brian Kenny (@MrBrianKenny) of the MLB Network started a movement called KillTheWin, his quixotic effort to have the win eliminated as a baseball statistic. I wrote a couple of posts at my blog, Beyond The Scorecard, because I thought it was an interesting idea and a fun issue to research; the links are at the end of this post. But Wainwright’s game got me thinking: how often in the postseason is a pitcher not justly rewarded for a good effort?
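For readers who want to check a game score by hand, here is a minimal Python sketch of Bill James's original Game Score formula. The post does not say which version of game score was used, and it does not list Wainwright's runs allowed, so the two earned runs below are an assumption brought in from outside the post. With that input the formula reproduces the 62 quoted above.

```python
def game_score(outs, hits, earned_runs, unearned_runs, walks, strikeouts):
    """Bill James's original Game Score for a starting pitcher."""
    innings_completed = outs // 3
    score = 50                                   # every start begins at 50
    score += outs                                # +1 per out recorded
    score += 2 * max(0, innings_completed - 4)   # +2 per inning completed after the 4th
    score += strikeouts                          # +1 per strikeout
    score -= 2 * hits                            # -2 per hit
    score -= 4 * earned_runs                     # -4 per earned run
    score -= 2 * unearned_runs                   # -2 per unearned run
    score -= walks                               # -1 per walk
    return score


# 7 IP (21 outs), 6 H, 0 BB, 5 K come from the post.
# earned_runs=2 and unearned_runs=0 are ASSUMED; the post does not state them.
wainwright = game_score(outs=21, hits=6, earned_runs=2,
                        unearned_runs=0, walks=0, strikeouts=5)
print(wainwright)
```

The function name and argument names are illustrative only.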

As the use of starting pitchers has changed over time, the win has become a far less effective metric in judging pitcher effectiveness. I don’t remember how I stumbled across using a game score of 60 as my marker of effectiveness (probably at Kenny’s suggestion) and like any other single number it’s not the entire story of a pitching performance, but it grants the opportunity to separate pitching effectiveness from a lack of offensive production or bad defense. Including Monday’s game there have been 1,393 postseason games played since 1903, meaning there have been 2,786 starts in postseason history–this chart shows the breakdown of wins, losses and no-decisions for those starters in that time frame:

In the postseason, starting pitchers won almost 36% of their starts. This covers the entire spectrum of postseason play, from the games in the early 1900s when a pitcher typically finished what he started all the way to examples like Saturday where Anibal Sanchez was removed after 6 innings (and 116 pitches)…and throwing a no-hitter. Different times, to be sure. With this context, this chart shows how often a pitcher who had a game score of 60 or greater was credited with the win:

Definitely an improvement over the general trend, but still, a pitcher who pitches well enough to attain a game score of 60 or greater has done all he can: he’s given up few hits and walks and struck out a decent number of hitters. In short, he’s kept runners off base, the primary job of a pitcher, and almost 35% of the time he has nothing to show for it or, even worse, is tagged with a loss. This chart shows these numbers since the playoffs were expanded in 1969:

The introduction of relievers definitely hurt the cause of these starting pitchers, with almost 40% of pitchers who threw very good games not receiving a win. On the flip side, it is gratifying to see that only around 9% of wins go to pitchers who were the beneficiaries of being on the right side of 13-12 scores or games along those lines–justice exists somewhere. This last chart shows the record by game score stratification:

Who was that unlucky pitcher with a game score greater than 90 who received the loss? Nolan Ryan in Game 5 of the 1986 NLCS.

The 10-15 regular readers of my blog hopefully are aware that I typically write with my tongue firmly lodged in my cheek, and the win is so entrenched in baseball lore that removing it as a point of discussion simply won’t happen, but that doesn’t mean it has to receive the emphasis it does. When we have the wealth of data that sites like FanGraphs place at our fingertips, we don’t have to rely on a metric formed at the inception of organized baseball that is a relic today, particularly one that doesn’t give an accurate portrayal of pitching performance around 35% of the time. Kill The Win? Maybe not, but we can certainly de-emphasize it.

#KillTheWin blog posts:

The first one, which lays out definitions and rationale

The second one, which expands it

A final one, an exercise in absurdity


Merkle’s Boner and False Imprisonment

Talcott v. National Exhibition Co., 144 A.D. 337, 128 N.Y.S. 1059 (2 Dept., 1911)

What was Merkle’s Boner?

On September 23, 1908 the Chicago Cubs played the New York Giants at the famed Polo Grounds.  Al Bridwell came to bat with two outs and the game tied 1-1 in the bottom of the ninth.  He laced a single to the outfield and the runner on third trotted home, thinking he had just scored the winning run.  The Cubs second baseman Johnny Evers, of the famed “Tinker to Evers to Chance” double play combination and a future Hall of Fame inductee, however, called for the ball from the outfield because Fred Merkle, the Giants runner on first, had not touched second base.  Although there is controversy regarding whether Evers got the actual ball back, the umpire ruled Merkle out at second and due to the force, the apparent winning run was erased.

As was common at the time, the fans at the Polo Grounds would walk across the field after the game to exit the ballpark.  By the time the play was decided and the winning run nullified, however, the fans believing the Giants had won were already streaming across the field and it was impossible to resume the game before the game was called on account of darkness.

On October 6, 1908, the National League Board of Directors made its final ruling that because Merkle had failed to reach second, the force rule was applied correctly and the game was a tie.  At the end of the season, the Cubs and Giants were tied for first place and a makeup game was needed to determine which team would play in the World Series.  This game was played on October 8, 1908 at the Polo Grounds and reportedly drew 40,000 people, the largest crowd ever to have attended a single baseball game at the time.

The Cubs won this game over the Giants and went on to beat the Tigers 4-1 in the World Series, their last World Series victory.

The play that forced the makeup game was dubbed “Merkle’s Boner” and Fred Merkle was tagged with the nickname “Bonehead.”  Years later, Merkle admitted that he never touched second base but claimed he had been assured by umpire Bob Emslie that the Giants had won.  Despite a solid 16-year Major League career, including four seasons with the Cubs, Merkle was never able to shake the stigma of the play.

What does Merkle’s Boner have to do with this case?

As a result of the play and the October 6th mandate for the makeup game, the Polo Grounds played host to the makeup game on October 8, 1908.  This game was “of very great importance to those interested in such games, and a vast outpouring of people were attracted to it.”  On the morning of the game, the ticket booths at the Polo Grounds were inundated with people trying to secure reserved seats for that afternoon’s game.

Plaintiff Fredrick Talcott, Jr. went to the ballpark intending to buy tickets for the game and entered an “inclosure” where the ticket booths were located.  After finding that the tickets were sold out, he tried to leave the inclosure along with a great number of people also trying to exit at the same time.  As he attempted to leave, however, ballpark attendants prevented his exit and he was “detained in the inclosure for an hour or more, much to his annoyance and personal inconvenience.”  Mr. Talcott brought this lawsuit seeking damages for false imprisonment.  He further claimed to have been pushed by the defendant’s “special policemen.”

The Giants countered that plaintiff simply could have used one of the other exits available.  Mr. Talcott alleged, however, that he was not aware of any other exits to the inclosure and none were pointed out to him.

Who won?

The case went to a jury trial and Mr. Talcott was awarded $500 in damages (approximately $12,000 today) with judgment entered on May 19, 1910.

The Giants appealed but the appellate court affirmed the judgment in favor of Mr. Talcott.

Why?

The jury found that plaintiff’s detention was unwarranted.  The appellate court agreed with this finding, ruled that the award was not excessive and found no reason to interfere with the jury’s verdict.

Additionally, the court found that Mr. Talcott was not required to demonstrate that he incurred any special or actual damages as a result of the detention.


Pitching Sinks

Pitch sequencing is a complicated topic of study. Given the previous pitch(es) to a batter, the next pitch may depend on factors such as the game-based information (e.g., count, number of outs, runners on base); the previous pitch(es), including their location, type, and batter’s response to them; and the scouting report against the batter as well as the repertoire of the pitcher. In order to approach pitch sequencing from an analytical perspective, we need to first simplify the problem. This may involve making several assumptions or just choosing a single dimension of the problem to work from. We will do the latter and focus only on the location of pitches at the front of the strike zone. Since we are interested in pitch sequencing, we will consider at-bats where at least two pitches were thrown to a given batter. The idea is to use this information to generate a simple model to indicate, given the previous pitch, where the next pitch might be located.

We can start with examining the distance between pitches, regardless of the location of the initial pitch. If this data, for a given pitcher, is plotted in a histogram, the spread of the data appears similar to a gamma distribution. Such a distribution can be characterized many ways, but for our purposes, we will use the version which utilizes parameters k and theta, where k is the shape parameter and theta is the scale parameter. With a collection of distances between pitches in hand, we can fit the data to a gamma distribution and estimate the values of k and theta. As an example, we have the histogram of C.J. Wilson’s distances between pitches within an at-bat from 2012 overlaid with the gamma distribution where the values of k and theta are chosen via maximum likelihood estimation.
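As a rough illustration of the fitting step just described, the sketch below computes distances between consecutive pitches and fits a gamma distribution by maximum likelihood with SciPy. The distances here are synthetic stand-ins drawn from a gamma distribution with made-up parameters, not C.J. Wilson's data, and the helper name `pitch_distances` is hypothetical. Fixing the location parameter at zero is an assumption so that only k and theta are estimated.

```python
import numpy as np
from scipy import stats


def pitch_distances(x, z):
    """Distances (feet) between consecutive pitches of ONE at-bat.

    x, z: horizontal and vertical locations in pitch order.
    """
    return np.hypot(np.diff(x), np.diff(z))


rng = np.random.default_rng(0)
# Synthetic stand-in for a season of pitch-to-pitch distances (feet).
# shape=3.0 and scale=0.5 are made-up values, NOT estimates from real data.
distances = rng.gamma(shape=3.0, scale=0.5, size=2000)

# Maximum likelihood fit; floc=0 pins the location so only k and theta vary.
k_hat, loc, theta_hat = stats.gamma.fit(distances, floc=0)
print(f"k = {k_hat:.2f}, theta = {theta_hat:.2f}, "
      f"fitted mean = {k_hat * theta_hat:.2f} ft")
```

The mean of a gamma distribution is k times theta, which gives a quick sanity check of the fitted parameters against the sample mean.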

Author’s note: I started working on this quite a few weeks ago and so, at the time, the last complete set of data available was 2012. So rather than redo all of the calculations and adjust the text, I decided to keep it as-is since the specific data set is not of great importance in explaining the method. I will include the 2013 data in certain areas, denoted by italics.

[Figure: Wilson Gamma (WilsonGamma.jpeg)]

While this works for the data set as a whole, this distribution will not be too useful for estimating the location of a subsequent pitch, given an initial pitch. One might expect that for pitches in the middle of the strike zone, the distribution would be different than for pitches outside the strike zone. To take this into account, we can move from a one-dimensional model to a two-dimensional one. Also, instead of using pitch distance, we are going to use average pitch location, since this will include directional information as well. To start, we will divide the area at the front of the strike zone into a grid of three-inch by three-inch squares. We choose this discretization because the diameter of a baseball is approximately three inches and therefore seems to be a reasonable reference length. The domain we consider will be from the ground (zero feet) to six feet high, and three feet to the left and right of the center of home plate (from the catcher’s perspective).

We will refer to pairs of sequential pitches as the “first pitch” and the “second pitch”. The first pitch is one which has a pitch following it in a single at-bat. This serves as a reference point for the subsequent pitch, labeled as the “second pitch”. Adopting this terminology, we find all first pitches and assign them to the three-inch by three-inch square which they fall in on the grid. Then for each square, we take its first pitches and find the vector between them and their associated second pitches (each vector points from the first pitch to the second pitch). We then average the components of the vectors in each square to provide a general idea of where the next pitch is headed for the first pitches in that square.
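The binning and averaging described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `average_vectors`, the dictionary layout, and the three sample pitch pairs are all made up, and the grid is assumed to be aligned so that cell edges fall on multiples of 0.25 feet (three inches).

```python
import numpy as np

CELL = 0.25  # three inches, expressed in feet


def average_vectors(first_xz, second_xz, cell=CELL):
    """Return {(cell centre x, cell centre z): (mean dx, mean dz, n pitches)}."""
    first = np.asarray(first_xz, dtype=float)
    second = np.asarray(second_xz, dtype=float)
    disp = second - first                      # vectors from first to second pitch
    idx = np.floor(first / cell).astype(int)   # grid index of each first pitch
    out = {}
    for key in {tuple(i) for i in idx}:
        mask = np.all(idx == key, axis=1)      # first pitches in this square
        centre = tuple((np.array(key) + 0.5) * cell)
        mean = disp[mask].mean(axis=0)         # component-wise average
        out[centre] = (mean[0], mean[1], int(mask.sum()))
    return out


# Made-up (x, z) locations in feet, purely for illustration.
first = [(-0.30, 2.10), (-0.40, 2.20), (0.10, 3.40)]
second = [(0.20, 2.60), (-0.90, 1.70), (0.00, 2.40)]
for centre, (dx, dz, n) in sorted(average_vectors(first, second).items()):
    print(centre, round(dx, 3), round(dz, 3), n)
```

In this toy input the two pitches in the square centered at (-0.375, 2.125) are followed by moves in opposite directions, so their average vector is essentially zero, while the single pitch at the top of the zone is followed by a pitch about a foot lower.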

In areas where the magnitude of the average vector is small, the location of the next pitch can be called isotropic, meaning there is no preferred direction. This is because average vectors of small magnitude are likely going to be the result of the cancellation of vectors of similar magnitude in all directions (from the histogram, the average distance between pitches was approximately 1.5 feet with most lying between 0.5 and 2.5 feet apart). One can create contrived examples where, say, all pitches are oriented either left or right and so there would be two preferred directions rather than isotropy, but these cases are unlikely to show up at locations with a reasonable amount of data, such as in the strike zone. In areas where the average vector has a large magnitude, the location of the next pitch can be called anisotropic, indicating there is some preferred direction(s). Here, the large magnitude of the average vector is due to the lack of cancellation in some direction. For illustrative purposes, we can look at one example of an isotropic location and one of an anisotropic location. First, for the isotropic case:

[Figure: Wilson Isotropic (WilsonIsotropic.jpeg)]

In this plot, the green outline indicates the square containing the first pitches and the red arrows are the vectors between the first and second pitches. The blue arrow in the center of the green square is the average vector. For the grid square centered at (-0.375,2.125), we have a fairly balanced, in terms of direction and distance, distribution of pitches. Therefore the average vector is small in magnitude. In other cases, we will have the pitches more heavily distributed in one direction, leading to an anisotropic location:

[Figure: WilsonNematic.jpeg]

As opposed to the previous case, there is a distinct pattern of pitches up from the position (-0.125,1.625), which is shown by the average vector having a substantially larger magnitude. This is due to most of the vectors having a large positive vertical component. Running over the entire grid where at least one pitch had a pitch following it, we can generate a series of these average vectors, which make up a vector field. In order to make the vector field plot more legible, we remove the component of magnitude from the vector, normalizing them all to a standard length, and instead assign the length of the vector to a heat map which covers each grid square.

[Figure: WilsonCPVectorField.jpeg]

For the 2013 data set:

[Figure: Wilson Vector Field 2013 (WilsonVectorField2013.jpeg)]

By computing these vectors over the domain, we are able to produce a vector field, albeit incomplete. Computing this vector field based on empirical data also lends itself to outliers influencing the average vectors as well as problems with small sample size. We can attempt to handle these issues and gain further insight by finding a continuous vector field to approximate it. To do this, we will begin with a function of two variables, to which we can apply the gradient operator to produce a gradient field. We can zoom in near the strike zone to get a better idea of what the data looks like in this area:

[Figure: WilsonSZVector.jpeg]

Note that as we move inward, toward the middle of the strike zone, the magnitude of the average vector shrinks. In addition, the direction of all vectors seems to be toward a central point in the strike zone. Based on these observations, we choose a function of the form

P(x,z) = (1/2)c_x(x - x_0)^2 + (1/2)c_z(z - z_0)^2.

The x-variable is the horizontal location, in feet, and z the vertical location. This choice of function has the property that there is a critical point for P and when the gradient field is calculated, all vectors will radially point toward or away from this critical point. The constants in the equation of this paraboloid are (x_0,z_0), the critical point (in our case, it will be a maximum), and (c_x,c_z), which are, for our purposes, scaling constants (this will be clear once we take the gradient). The gradient of function P is

grad(P) = [c_x(x - x_0), c_z(z - z_0)].

Then c_x and c_z are constants that scale the distances from the x- and z-locations to the critical point to determine the vector associated with point (x,z). Note that grad(P)(x_0,z_0) = [0,0]. In fact, we will give this point a special name for future reference: the pitching sink. For vector fields, a non-mathematical description of a sink is a point where, locally, all vectors point toward (if one imagines these vectors to be velocities, then the sink would be the point where everything would flow into, hence the name). This point is, presumably, the location where we have the least information about the direction of the next pitch, since there is no preferred direction. Again using Wilson’s data as an example:

[Figure: Wilson Gradient Field (WilsonCPGradient.jpeg)]

For the 2013 data set:

[Figure: Wilson Grad Field 2013 (WilsonGradField2013.jpeg)]

The gradient field is fit to the average vectors using linear least squares minimization for the x- and z-components. This produces estimates for c_x, c_z, x_0, and z_0. For the original vector field, if we are interested in the location where the average vector is smallest in magnitude (or the location where there is the least bias in terms of direction of the next pitch), we are limited by the fact that we are using a discretized domain and therefore can only have a minimum location at a small, finite number of points.
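Because the two components of grad(P) decouple, the fit can be sketched as two ordinary straight-line regressions: the x-component c_x(x - x_0) equals c_x x + b_x with b_x = -c_x x_0, and likewise for z. The code below is a minimal sketch under that reading. The function name `fit_gradient_field` is hypothetical, the input is a noise-free synthetic field generated from the Wilson 2012 parameters quoted in this post (not the actual average vectors), and whether the original fit weighted squares by pitch count is not stated, so no weighting is applied here.

```python
import numpy as np


def fit_gradient_field(x, z, u, v):
    """Least-squares fit of u ~ c_x (x - x_0) and v ~ c_z (z - z_0).

    x, z: square centres; u, v: components of the average vectors.
    Returns (c_x, c_z, x_0, z_0).
    """
    c_x, b_x = np.polyfit(x, u, 1)   # u = c_x * x + b_x, with b_x = -c_x * x_0
    c_z, b_z = np.polyfit(z, v, 1)   # v = c_z * z + b_z, with b_z = -c_z * z_0
    return c_x, c_z, -b_x / c_x, -b_z / c_z


# Synthetic, noise-free field built from the 2012 values quoted in the text.
true_cx, true_cz, true_x0, true_z0 = -0.925, -1.055, -0.163, 2.243
gx, gz = np.meshgrid(np.arange(-1.375, 1.5, 0.25), np.arange(1.125, 3.5, 0.25))
x, z = gx.ravel(), gz.ravel()
u = true_cx * (x - true_x0)
v = true_cz * (z - true_z0)

c_x, c_z, x_0, z_0 = fit_gradient_field(x, z, u, v)
print(f"sink = ({x_0:.3f}, {z_0:.3f}), scaling = ({c_x:.3f}, {c_z:.3f})")
```

With real average vectors the recovered values would not match any "true" parameters exactly; the synthetic input is used only to show that the fit returns the sink and scaling constants it was built from.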

One advantage to this method is that it produces a minimum that comes from a continuous domain and so we will be able to get unique minimums for different pitchers. Another piece of information that can be gleaned from this approximation is the constants, c_x and c_z. If c_x is large in magnitude, there may be a large east-west dynamic to the pitcher’s subsequent pitch locations. For example, if a first pitch is in the left half of the strike zone, the next pitch may have a proclivity to be in the right half and vice versa. A similar statement can be made about c_z and north-south dynamics. Alternatively, if c_x is small in magnitude, then less information is available about the direction the next pitch will be headed. For Wilson, the constants obtained from the best fit approximation are a pitching sink of (-0.163,2.243) and scaling constants (-0.925,-1.055).

For C.J. Wilson’s 2013 season, we have the sink at (-0.109,2.307) and scaling constants (-0.902,-0.961), so the values are relatively close between these two seasons.

We can now obtain this set of parameters for a large collection of pitchers. For each pitcher, we can find the vector field based on the data and then find the associated gradient field approximation. We can then extract the scaling constants and the pitching sink. We can run this on the most recent complete season (2012, at the start of this research) for the 200 pitchers who threw the most pitches that year and look at the distribution of these parameters.

[Figure: TwoKSinks.jpeg]

The sinks cluster in a region roughly between 1.75 and 2.75 feet vertically and -0.5 and 0.5 feet horizontally. This seems reasonable, since we would not expect this location to be near the edge or outside of the strike zone. Similarly, we can plot the scaling constants:

[Figure: TwoKScales.jpeg]

The scaling constants cluster in a region of roughly -1 to -0.8 vertically and -0.9 to -0.7 horizontally.

One problem that arises from this method is that since we are averaging the data, we are simplifying the analysis at the cost of losing information about the distribution of second pitches. Therefore, we can take a different approach to try to preserve that information. To do so, at a grid location, we can calculate several average vectors in different directions, instead of one, which will keep more of the original information from the data. This can be accomplished by dividing the area around a given square radially into eight slices and calculating the average in each octant.
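A minimal sketch of the octant averaging, using only the standard library, might look like the following. The text does not say where the octant boundaries sit, so the convention here (octant 0 spans 0 to 45 degrees measured counterclockwise from the positive x-axis) is an assumption, as are the function name `octant_averages` and the sample vectors.

```python
import math
from collections import defaultdict


def octant_averages(vectors):
    """Group displacement vectors into eight 45-degree slices and average each.

    Returns {octant index: (mean dx, mean dz, count)} for nonempty octants only.
    """
    groups = defaultdict(list)
    for dx, dz in vectors:
        angle = math.atan2(dz, dx) % (2 * math.pi)   # 0 .. 2*pi from the +x axis
        octant = int(angle // (math.pi / 4)) % 8     # ASSUMED boundary convention
        groups[octant].append((dx, dz))
    return {
        k: (sum(d[0] for d in g) / len(g), sum(d[1] for d in g) / len(g), len(g))
        for k, g in sorted(groups.items())
    }


# Made-up first-to-second-pitch vectors (feet) for a single square.
sample = [(1.0, 0.5), (0.8, 0.2), (-1.0, -1.2), (0.1, 1.0)]
for octant, (dx, dz, n) in octant_averages(sample).items():
    print(octant, round(dx, 3), round(dz, 3), n)
```

Keeping the count alongside each average preserves the frequency information that a single average vector per square discards.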

However, since each nonempty square may contain anywhere from one to upwards of thirty plus pitches, using octants spreads the data too thin. To better populate the octants, we can find pitchers with similar data and add that to the sample. To do this, we will go back to the aforementioned average vectors and use them as a means of comparison. At a given square, with a pitcher in mind whose data we wish to add to, we can compute the average vector for a large collection of other pitchers, compare average vectors, and add the data from those pitchers whose vector is most similar to the pitcher of reference. In order to do this, we first need a metric. Luckily, we can borrow and adapt one available for comparing vector fields:

M(u,v) = w exp(-| ||u|| - ||v|| |) + (1-w) exp(-(1 - <u,v>/(||u|| ||v||)))

Here, u and v are vectors, and w is a weight for setting the importance of matching the vector magnitudes (left) and the vector directions (right). For the calculations to follow, we take w = 0.5. The term multiplying w on the left is an exponential function whose argument is the negative of the absolute value of the difference in the vector magnitudes. Note that when ||u|| = ||v||, the term on the left reduces to w. As the magnitudes diverge, the term tends toward zero. The term multiplying (1-w) is an exponential function whose argument is the negative of the quantity 1 minus the dot product of u and v divided by the product of their magnitudes. When u and v have the same direction, <u,v>/(||u|| ||v||) = 1, and the exponent as a whole is zero. When u and v are anti-parallel, <u,v>/(||u|| ||v||) = -1 and the exponent is -2, so the exponential multiplying (1-w) is exp(-2), approximately 0.135, which is close to zero. So when u = v, M(u,v) = 1, and when u and v are dissimilar in magnitude and/or direction, M(u,v) is closer to zero.
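The metric is short enough to write out directly. The sketch below is one way to code it in Python for two-dimensional vectors; the function name `similarity` is hypothetical, and the function assumes neither vector has zero length, since the direction term is undefined in that case.

```python
import math


def similarity(u, v, w=0.5):
    """M(u, v) for 2-D vectors: w * magnitude term + (1 - w) * direction term."""
    norm_u = math.hypot(u[0], u[1])
    norm_v = math.hypot(v[0], v[1])
    magnitude_term = math.exp(-abs(norm_u - norm_v))
    # Cosine of the angle between u and v; undefined if either norm is zero.
    cosine = (u[0] * v[0] + u[1] * v[1]) / (norm_u * norm_v)
    direction_term = math.exp(-(1.0 - cosine))
    return w * magnitude_term + (1.0 - w) * direction_term


same = similarity((1.0, 0.0), (1.0, 0.0))       # identical vectors
opposite = similarity((1.0, 0.0), (-1.0, 0.0))  # equal length, anti-parallel
print(round(same, 3), round(opposite, 3))
```

One consequence worth noting: with w = 0.5, two anti-parallel vectors of equal length still score 0.5 + 0.5 exp(-2), about 0.568, because the magnitude term is at its maximum. The score only approaches its floor of 0.5 exp(-2), about 0.068, when the vectors are anti-parallel and their lengths are far apart.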

We now have a means of comparing the data from different pitchers to better populate our sample. To demonstrate this, we will again use C.J. Wilson’s data. First, we will run this method at a point near his sink: (-0.125,2.125). Since we will have up to eight vectors, we can fit an interpolating polynomial in between their heads to get an idea of what is happening for the full 360 degrees around the square. The choice of interpolating polynomial in this case will be a cubic spline function. This will give a smooth curve through the data without large oscillations. Working with only Wilson’s data, which is made up of 30 pitches, this looks like:

[Figure: WilsonVector.jpeg]

The vectors are spread out in terms of direction, but one vector which extends outside the lower-left quadrant of the plot leads to the cubic spline (light blue curve) bulging to the lower left of the strike zone. Otherwise, the cubic spline has some ebb and flow, but is of similar average distance all around.
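The closed curve through the vector heads can be approximated with a periodic cubic spline. The post does not say how the spline was parameterized, so the sketch below makes an assumption: the heads are taken in angular order and interpolated parametrically against their index, using SciPy's `CubicSpline` with a periodic boundary condition. The function name `closed_spline` and the eight vector lengths are made up for illustration.

```python
import numpy as np
from scipy.interpolate import CubicSpline


def closed_spline(heads):
    """Periodic cubic spline through 2-D points given in angular order."""
    pts = np.asarray(heads, dtype=float)
    pts = np.vstack([pts, pts[:1]])     # repeat the first point to close the curve
    t = np.arange(len(pts))             # ASSUMED parameterization: point index
    return CubicSpline(t, pts, axis=0, bc_type="periodic")


# Hypothetical vector heads, one per octant (feet, relative to the first pitch).
angles = np.deg2rad(np.arange(22.5, 360, 45))
lengths = np.array([1.2, 1.0, 1.1, 1.6, 1.9, 1.3, 1.0, 1.1])
heads = np.column_stack([lengths * np.cos(angles), lengths * np.sin(angles)])

spline = closed_spline(heads)
curve = spline(np.linspace(0, 8, 200))  # 200 points tracing the closed curve
print(curve.shape)
```

The periodic boundary condition makes the curve and its first two derivatives match where the ends meet, so the curve closes smoothly. A long vector in one direction, like the 1.9-foot one here, pulls the curve outward on that side in the same way the outlying vector does in Wilson's plot.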

[Figure: WilsonOctant.jpeg]

When we remove the vectors and replace them with the average vector of each octant (red vectors), we have a better idea of where the next pitch might be headed. We also color-code the spline to keep the data about the frequency of the pitches in each octant. Red indicates areas where the most pitches were subsequently thrown and blue the least. We see that the vectors are longer to the left and, based on the heat map on the spline, more frequent. However, a few short or long vectors in areas that are otherwise data-deficient will greatly impact the results. Therefore, we will add to our sample by finding pitchers with similar data in the square. We will compute the value of M between Wilson at that square and the top 200 pitchers in terms of most pitches thrown for the same season.

For Wilson, the top five comparable pitchers in the square (-0.125,2.125), with the value of M in parentheses, are Liam Hendriks (0.995), Chris Young (0.986), A.J. Griffin (0.947), Kyle Kendrick (0.943), and Jonathan Sanchez (0.923). Recall that this considers both average vector length and direction. Adding this data to the sample increases its size to 94 pitches.

[Figure: WilsonetalVector.jpeg]

For this plot, the average vector (the blue vector in the center of the cell) is similar to that of Wilson’s solo data. However, since the number of pitches has essentially tripled, the plot has become hard to read. To get a better idea of what is going on, we can switch to the average vector per octant plot:

[Figure: WilsonetalOctant.jpeg]

Examining this plot, most of the average vectors are in the range of 1-1.5 feet. The shape of the interpolation is square-like and seems to align near the edge of the strike zone, extending outside the zone, down and to the left.

We can also run this at points nearer to the edge of the strike zone. On the left side of the strike zone, we can work off of the square centered at (-0.875,2.375) (note that we drop the plots of the original data in favor of the plots for the octants).

[Figure: WilsonLeftSideOriginal.jpeg]

For the original sample, the dominant direction (where most of the vectors are pointed, indicated by the red part of the spline) is to the right, with an average distance of one to two feet in all directions. Now we will add in data based on the average vectors, increasing our sample from 15 to 97 pitches.

[Figure: WilsonLeftSide.jpeg]

For the larger sample, the spline, which is almost circular, has average vectors approximately 1 to 1.5 feet in length. The preferred directions are to the right (into the strike zone) and downward (below the left edge of the strike zone). Also note that comparing the two plots, the vectors in the areas where there are the most pitches in the original sample (between three and six o’clock) have average vectors that retain a similar length and direction.

[Figure: WilsonRightSideOriginal.jpeg]

Switching sides of the strike zone, we can examine the data related to the square centered at (0.875,2.375). For the original sample, the dominant direction is to the left with little to no data oriented to the right. Since there are octants that contain no data, we get a pinched area of the cubic spline. This is due to the choice of how to handle the empty octants. We choose to set the average distance to zero and the direction to the mean direction of the octant. This choice leads to pinching of the curve or cusps in these areas. Another choice would be to remove this octant from the sample and do the interpolation with the remaining nonempty octants.

[Figure: WilsonRightSide.jpeg]

Adding data to this sample increases it from 9 pitches to 67, and the average vector and spline jut out on the right side due to a handful of pitches oriented further in this direction (this is evident from the blue color of the spline). In the areas where most of the subsequent pitches are located, the spline sits near the left edge of the strike zone. Again, the average vectors in the red area of the spline maintain a similar length and direction.

 photo WilsonTopSideOriginal.jpeg

Moving to the top of the strike zone, we choose the square centered at (0.125,3.375). The original plot for this square contains 11 pitches, and no second pitches are oriented upward. There are only four non-zero vectors for the spline, and the dominant direction is down and to the left.

 photo WilsonTopSide.jpeg

In this square, the sample changes from 11 to 72 pitches by adding similar data. Note the cusp that occurs at the top since we are missing an average vector there. Unsurprisingly, at the top of the strike zone, the preferred direction for the subsequent pitch is downward, and as we rotate away from this direction, the number of pitches in each octant drops.

 photo WilsonBottomSideOriginal.jpeg

Finally, along the bottom of the strike zone, we choose (0.125,1.625). Starting with 27 pitches produces five average vectors, with the dominant direction being up and to the left.

 photo WilsonBottomSide.jpeg

With the additional data from other pitchers, the number of pitches moves up to 87. The direction with the most subsequent pitches is up and to the left. In areas where we have the most data in the original sample (the red spline areas), the average vectors and splines are most alike.

There are several obvious drawbacks to this method. For the model fitting, we have some points in the strike zone with 30+ pitches and as we move away from the strike zone, we have less and less data for computing the averages. However, as we move away, the general behavior becomes more predictable: the next pitch will likely be closer to the strike zone. So the small sample should have less of a negative effect for points far away. This is also a potential problem since we use these, in some cases, small samples to calculate the average vector in each square, which is used as a reference point for adding data to the sample. It may be better to use the vector from the gradient field for comparison since it relies on all of the available data to compute the average vector (provided the gradient field approach is a decent model).

Another problem is that in computing the average vector, we are not taking into account the distribution of the vectors. The same average vector can be formed from many different combinations of vectors. However, based on the limited data presented above, adding to the sample, using M and the average vectors, does not seem to have a large effect on octants where there is the most data in the original sample. These regions, even with more data, tend to retain their shape. These are also the areas that are going to contribute most to the average vector that is used for comparison, so this seems like a reasonable result.

A smaller problem that shows up near the edge of the zone is that we still occasionally, even after adding more data, get directions with only one or two pieces of data and this causes some of the aberrant behavior seen in some of the plots, characterized by bulges in blue areas of the spline. One solution to this would be to only compute the average vector in that octant if there were more than some fixed number of pitches in that direction. Otherwise, we could set the average vector to zero and the direction to the mean direction in that octant.
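A minimal Python sketch of the octant bookkeeping described above, covering both the empty-octant rule and the proposed minimum-count threshold. Several details are assumptions, not taken from the analysis itself: octant k is taken to cover angles from 45k to 45(k+1) degrees measured counterclockwise from the positive x-axis, the average vector is taken to be the component-wise mean, and the name `octant_averages` and its `min_count` parameter are hypothetical. The spline interpolation step is left out.

```python
import math
from collections import defaultdict

def octant_averages(vectors, min_count=1):
    """Return eight (length, angle) pairs, one per octant.

    vectors: iterable of (dx, dy) displacements from one pitch to the next.
    min_count: octants holding fewer vectors than this are treated as empty.
    """
    width = math.pi / 4  # each octant spans 45 degrees (assumed layout)
    buckets = defaultdict(list)
    for dx, dy in vectors:
        if dx == 0 and dy == 0:
            continue  # a zero vector has no direction to bin
        angle = math.atan2(dy, dx) % (2 * math.pi)
        buckets[int(angle // width) % 8].append((dx, dy))

    result = []
    for k in range(8):
        members = buckets.get(k, [])
        if len(members) < min_count:
            # Empty or too-sparse octant: zero length, mid-octant direction.
            result.append((0.0, (k + 0.5) * width))
            continue
        mean_dx = sum(v[0] for v in members) / len(members)
        mean_dy = sum(v[1] for v in members) / len(members)
        result.append((math.hypot(mean_dx, mean_dy),
                       math.atan2(mean_dy, mean_dx) % (2 * math.pi)))
    return result

# Two vectors up and to the right, one lone vector pointing almost straight down.
sample = [(1.0, 0.5), (2.0, 1.0), (0.1, -1.0)]
default_rule = octant_averages(sample)            # lone vector keeps its own octant
thresholded = octant_averages(sample, min_count=2)  # lone vector's octant is zeroed
```

With `min_count=1` this reproduces the rule used for the plots (only truly empty octants are zeroed). Raising it suppresses the one- or two-pitch octants that cause the bulges, at the cost of more cusps.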

Obviously, an analysis of one pitcher over a small collection of squares in the grid does not a theory make. It is possible to examine more pitchers, but because the analysis must be done visually, it will be slow and imprecise. Based on these limited results, there may be potential if the process can be condensed. The pitching sink approach gives an idea of where the next pitch may be headed. As we move toward the sink, we have less information on where the next pitch is headed since near this point, the directions will be somewhat evenly distributed. As we move toward the edge of the strike zone, we get a clearer picture of where the next pitch is headed if only for the reason that it seems unlikely that the next pitch will be even further away.

While this model seems reasonable in this case, there may be cases where a more general model is needed to fit with the behavior of the data. To recover more accurate information on the location of the next pitch, we can switch to the octant method. Since some areas with this method will have very small samples, we can pad out the data via comparison of the average vectors. This seems to do well at filling out the depleted octants and retains many of the features of the average vectors in the most populated octants of the original samples. At this point, both these models exist as novelties, but hopefully with a little more work and analysis, they can be improved and simplified.


Bronson Arroyo and His Future

The loss to the Pirates, the recent removal of Dusty Baker, and the upcoming free agency of Shin-Soo Choo have overshadowed Bronson Arroyo and his status with the Reds. It seems that if there is one player who never receives enough attention, it is him. But while the baseball world may not seem to realize that he is a free agent, there is no doubt that Walt Jocketty and his staff are very much aware of the 36-year-old starter’s expired contract.

Bronson Arroyo, with the exception of 2011, has been not only one of the Reds’ best starters, but one of the most consistent pitchers in baseball. He has not been a Cy Young candidate and he is not the ace of the Reds by any means. But the one thing that cannot be denied is his innings pitched per year. Since joining the Reds, he has thrown over 1600 innings and has averaged about 211.1 innings pitched per year. His totals have started to dip recently, but throwing 202 innings in each of the past two seasons shows that despite his age, he still has his durability. He has managed to avoid the DL in his career, which is something to be marveled at. Every year that he has pitched with the Reds, he has started at least 32 games and averaged 6-7 innings per start. This kind of reliability is something to be desired out of a starter in this day and age, when every season brings at least one Tommy John surgery or one pitcher on a strict innings limit.

One of the things that allows Arroyo to be so durable is that he does not waste pitches out of the zone. His goal is to go right at the hitters. This season, he was fifth in the majors in walks per nine with 1.51. During his tenure with the Reds (2006-2013), he has averaged 2.31 BB/9, which is good for 14th among pitchers who have thrown at least 1000 IP during that time frame. He seems to be trying to improve those numbers, as his BB/9 has been 1.54 over the past two seasons. His approach indicates that he refuses to beat himself by giving up the free pass (which helps, seeing as how he does not strike out a lot of batters and does tend to give up home runs).

Bronson Arroyo has made himself a very good pitcher through great durability and his ability to change speeds. Last season his fastball averaged 87 mph and his curveball averaged about 70 mph. The change of speeds helps him keep most batters off balance because they have no idea what speed or arm slot is coming. While I was watching a Reds game, one of the guests in the booth said that he would rather face a pitcher like Aroldis Chapman because he knows what speed and arm slot to expect most of the time. Chapman throws his fastball about 85.4% of the time and his off-speed pitch (slider) about 14.6% of the time. Once the batter stands in the batter’s box, he can expect to see that heater the majority of the time. Arroyo threw his fastball (or sinker) 44.1% of the time last season. That leaves 55.9% of the time that he threw one of his three other off-speed pitches, which range anywhere from 70 mph to 77.6 mph.

Despite the fact that Arroyo is such a good pitcher, it is unlikely that he will return to the Reds. The Reds, I’m sure, would like nothing more than to have him back. The problem is that the Reds are going to have a full rotation and none of the other pitchers are going to the bullpen any time soon. Tony Cingrani has emerged as a phenomenal young left-handed starter who has earned a starting spot. Homer Bailey and Mat Latos have proven to be durable aces who on their best day can match up with anyone and shut down the best of offenses, even in Great American Ball Park. Mike Leake probably would have been sent to the bullpen to make room for Arroyo, but because of his great bounce-back season, he has re-solidified his spot in the rotation as well. Cueto could be an option for the bullpen because of his long list of injuries, but when healthy, he is one of the best pitchers in the game. The Reds also have several very talented pitching prospects in the minors in Robert Stephenson, Daniel Corcino, and Nicholas Travieso, who are just waiting for an excuse to be called up to the majors. And because of Arroyo’s proven track record, it is all but guaranteed that he will not be sent to the bullpen.

If you take away anything from these past few paragraphs, it should be that Arroyo is a solid and dependable starter. Maybe on certain teams (I’m looking at you, Houston) he could be an ace, but on most teams he will be a solid middle-to-back-of-the-rotation starter. His tendency to give up home runs could be cured in a more pitcher-friendly ballpark, but it is unlikely that the problem will go away altogether. He is a good pitcher who might get his 3 years and 30+ million dollars somewhere, but he will not find it in Cincinnati. Cincinnati is a mid-market team that is going to have to worry about signing Homer Bailey, Mat Latos, and Tony Cingrani in the future, and it has already spent a lot of money to keep Jay Bruce and Joey Votto locked up for the long haul. The Reds’ pitching depth allows them to look elsewhere to spend the money they would need to re-sign Arroyo. Perhaps they could use it to get La Russa out of retirement . . .


A Review of Lineup Optimization in 2013: AL

Warning: Very long post ahead.

At some point in time, maybe you’ve complained about the lineup your favorite team’s manager used. Maybe you’ve heard of or considered the concept of lineup optimization. Maybe you’ve heard that an optimized lineup, over the course of a full season, wouldn’t make that big of a difference.

It really doesn’t, but that doesn’t make it any less interesting.

In elementary school I spent precious class time attempting to optimize kickball lineups. I suppose that was my first foray into the world of sabermetrics and general baseball nerdiness.

Now, I tend to visit BaseballPress on a daily basis to check the lineups of every team, just because. Even more now, I am writing a long post regarding lineup optimization in the MLB.

Sky Kalkman wrote a great piece on his interpretation of The Book’s findings on lineup optimization. He summed it up with this:

“…we want to know how costly making an out is by each lineup position, based on the base-out situations they most often find themselves in, and then weighted by how often each lineup spot comes to the plate. Here’s how the lineup spots rank in the importance of avoiding outs:

#1, #4, #2, #5, #3, #6, #7, #8, #9

So, you want your best three hitters to hit in the #1, #4, and #2 spots. Distribute them so OBP is higher in the order and SLG is lower. Then place your fourth and fifth best hitters, with the #5 spot usually seeing the better hitter, unless he’s a high-homerun guy. Then place your four remaining hitters in decreasing order of overall hitting ability, with basestealers ahead of singles hitters.”
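One way to read that summary as a procedure is sketched below in Python. It is a rough approximation of the quoted rules of thumb, not The Book’s method and not the Baseball Musings tool. Ranking hitters by OBP + SLG is an assumption made here, the high-homerun and basestealer tiebreakers are left out because this post lists only OBP and SLG, and the name `heuristic_order` is hypothetical. The sample input is the Red Sox regulars with the 2013 OBP and SLG figures listed in this post.

```python
def heuristic_order(hitters):
    """hitters: nine (name, obp, slg) tuples. Returns names in batting order."""
    # Assumption: overall hitting ability is approximated by OBP + SLG.
    ranked = sorted(hitters, key=lambda h: h[1] + h[2], reverse=True)
    top3, next2, rest = ranked[:3], ranked[3:5], ranked[5:]

    # Best three hitters fill #1, #2 and #4: the highest OBP leads off,
    # and of the other two the higher SLG bats fourth.
    by_obp = sorted(top3, key=lambda h: h[1], reverse=True)
    leadoff = by_obp[0]
    second, cleanup = sorted(by_obp[1:], key=lambda h: h[2])

    order = [None] * 9
    order[0], order[1], order[3] = leadoff, second, cleanup
    # Fourth- and fifth-best hitters: the better one bats #5, the other #3.
    order[4], order[2] = next2[0], next2[1]
    # Remaining four hitters fill #6-#9 in decreasing order of ability.
    order[5:] = rest
    return [h[0] for h in order]

red_sox = [
    ("Ellsbury", .355, .426), ("Victorino", .351, .451), ("Pedroia", .372, .415),
    ("Ortiz", .395, .564), ("Napoli", .360, .482), ("Nava", .385, .445),
    ("Saltalamacchia", .338, .466), ("Middlebrooks", .271, .425), ("Drew", .333, .443),
]
print(heuristic_order(red_sox))
```

For this input the sketch puts Ortiz first and Nava second, then Victorino, Napoli and Saltalamacchia. The order resembles the tool’s optimized Red Sox lineup without matching it, which is expected given how much the shortcut leaves out.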

Following the conclusion of the Major League Baseball regular season, I took to the task of finding each team’s most common starters and lineups, hypothetically optimizing them and comparing the results by which team theoretically cost themselves the most runs by straying from optimization.

I sorted each team’s hitters by plate appearances, made sure there was a representative of every position and used Baseball-Reference’s batting order archive to find the most common order those eight/nine players appeared in to find each team’s hypothetical “most common” lineup.
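For anyone who wants to repeat that step, here is a small Python sketch of the selection logic. It is an illustration under assumptions, not the exact process used for this post: the data layout, the function names `pick_regulars` and `most_common_order`, and the toy players are all made up, and the example uses three lineup spots instead of nine to stay short.

```python
from collections import Counter

def pick_regulars(players):
    """players: list of (name, position, plate_appearances).
    Keeps the player with the most plate appearances at each position."""
    best = {}
    for name, pos, pa in players:
        if pos not in best or pa > best[pos][1]:
            best[pos] = (name, pa)
    return {name for name, _ in best.values()}

def most_common_order(game_orders, regulars):
    """Most frequent batting order among games started by exactly the regulars."""
    wanted = frozenset(regulars)
    counts = Counter(tuple(o) for o in game_orders if frozenset(o) == wanted)
    return counts.most_common(1)[0][0] if counts else None

# Toy data: made-up names, three lineup spots.
players = [("A", "C", 500), ("B", "C", 200), ("D", "1B", 600), ("E", "SS", 450)]
games = [("D", "A", "E"), ("A", "D", "E"), ("D", "A", "E"), ("D", "B", "E")]
regulars = pick_regulars(players)
print(most_common_order(games, regulars))  # ('D', 'A', 'E')
```

A real version would need to handle players who split time between positions, which this sketch ignores.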

Then I plugged that lineup into Baseball Musings’ lineup optimization tool, along with each player’s 2013 OBP and SLG, to find the optimized lineup for each team.

It’s far from a perfect science, especially with teams like Oakland who often change their lineup by utilizing platoons, but it’s good enough and I wanted an opportunity to tell people much smarter and more qualified than me how to better do their job.

Behold, the results (where rpg is runs per game, season difference is the number of runs “lost” over a full season by using the common lineup instead of the optimized lineup, and rank runs from the most optimized lineup to the least optimized):
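The season difference figures can be reproduced with a few lines of Python. The 162-game multiplier is an assumption inferred from the listed numbers (for example, 5.448 minus 5.547, times 162, gives -16.038), and the function name is hypothetical. Only three teams are included here to keep the sketch short.

```python
GAMES_PER_SEASON = 162  # assumption: inferred from the figures listed in this post

def season_difference(common_rpg, optimized_rpg, games=GAMES_PER_SEASON):
    """Runs 'lost' over a season by using the common lineup (negative = runs lost)."""
    return round((common_rpg - optimized_rpg) * games, 3)

# (common rpg, optimized rpg) as listed for each team in this post.
teams = {
    "Red Sox": (5.448, 5.547),
    "Tigers": (5.375, 5.510),
    "Astros": (4.133, 4.176),
}

diffs = {team: season_difference(c, o) for team, (c, o) in teams.items()}
# Most optimized first: the smallest loss is the value closest to zero.
ranking = sorted(diffs, key=diffs.get, reverse=True)
for team in ranking:
    print(team, diffs[team])
```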

Boston Red Sox

Common rpg: 5.448. Optimized rpg: 5.547. Season difference: -16.038 runs.

Rank: 10th AL, 24th overall

Common lineup     OBP   SLG    |   Optimized lineup  OBP   SLG
CF Ellsbury       .355  .426   |   LF Nava           .385  .445
RF Victorino      .351  .451   |   DH Ortiz          .395  .564
2B Pedroia        .372  .415   |   RF Victorino      .351  .451
DH Ortiz          .395  .564   |   1B Napoli         .360  .482
1B Napoli         .360  .482   |   C Saltalamacchia  .338  .466
LF Nava           .385  .445   |   SS Drew           .333  .443
C Saltalamacchia  .338  .466   |   CF Ellsbury       .355  .426
3B Middlebrooks   .271  .425   |   3B Middlebrooks   .271  .425
SS Drew           .333  .443   |   2B Pedroia        .372  .415

Oh, man. Off to a rocky start. Bear with me, folks, they aren’t all this jarring. This is probably the wackiest one that got spit out. The Red Sox obviously would never hit Dustin Pedroia ninth. The Book likes the nine-hitter to be a high OBP, low SLG guy so the top-of-the-order hitters have guys on base when they come to bat. And to be fair, Dustin Pedroia pretty much had the batting profile of a slap-hitter this season. He had the lowest SLG on the team, and his ISO puts his power production below guys like Brandon Crawford and Chris Denorfia. While his great .372 OBP is likely being put to waste in this lineup, Pedroia’s 2013 numbers fit the bill of an optimal #9 hitter when the rest of the lineup is this good.

Tampa Bay Rays

Common rpg: 4.689. Optimized rpg: 4.779. Season difference: -14.580 runs.

Rank: 8th AL, 17th overall

Common lineup     OBP   SLG    |   Optimized lineup  OBP   SLG
CF Jennings       .334  .414   |   2B Zobrist        .354  .402
DH Joyce          .328  .419   |   RF Myers          .354  .478
2B Zobrist        .354  .402   |   DH Joyce          .328  .419
3B Longoria       .343  .498   |   1B Loney          .348  .430
1B Loney          .348  .430   |   3B Longoria       .343  .498
RF Myers          .354  .478   |   CF Jennings       .334  .414
LF Johnson        .305  .410   |   SS Escobar        .332  .366
C Molina          .290  .304   |   LF Johnson        .305  .410
SS Escobar        .332  .366   |   C Molina          .290  .304

I always like Joe Maddon’s lineups. He mixes things up a lot and isn’t afraid to push the envelope. He’s batted catchers high in the order. He’s led off Ben Zobrist, an excellent – but unconventional – leadoff hitter. For a while this year he batted Evan Longoria second, which is quite smart and probably never would have been considered a decade ago. However, Desmond Jennings isn’t an ideal leadoff hitter with a .330 career OBP, and Matt Joyce’s .252 BABIP left him with the lowest OBP of his career. Zobrist is the Rays’ best leadoff hitter and Wil Myers, arguably the Rays’ most productive hitter, should be higher in the order.

Baltimore Orioles

Common rpg: 4.724. Optimized rpg: 4.814. Season difference: -14.580 runs.

Rank: 9th AL, 19th overall

Common lineup     OBP   SLG    |   Optimized lineup  OBP   SLG
LF McLouth        .329  .399   |   LF McLouth        .329  .399
3B Machado        .314  .432   |   1B Davis          .370  .634
RF Markakis       .329  .356   |   SS Hardy          .306  .433
CF Jones          .318  .493   |   CF Jones          .318  .493
1B Davis          .370  .634   |   3B Machado        .314  .432
C Wieters         .287  .417   |   DH Flaherty       .293  .390
SS Hardy          .306  .433   |   2B Roberts        .312  .392
DH Flaherty       .293  .390   |   C Wieters         .287  .417
2B Roberts        .312  .392   |   RF Markakis       .329  .356

Chris Davis started the season out as the Orioles’ #5 hitter, because no one yet knew he would transform into some sort of robot humanoid. Once the transformation was well underway, Buck Showalter continued batting Davis fifth and a struggling Nick Markakis third, likely because of “if it ain’t broke, don’t fix it” and the idea that moving a hot batter to a different spot in the order could somehow throw him out of his groove. It took the Orioles until the middle of August to move Davis into the three-hole, and by then Davis’ low spot in the order relative to his production had likely cost them a handful of runs. Given the disparity of his OBP compared to his teammates, he’s even better suited for the two-hole.

New York Yankees

Common rpg: 3.978. Optimized rpg: 4.077. Season difference: -16.038 runs.

Rank: 11th AL, 25th overall

Common lineup     OBP   SLG    |   Optimized lineup  OBP   SLG
CF Gardner        .344  .416   |   CF Gardner        .344  .416
RF Suzuki         .297  .342   |   2B Cano           .383  .516
2B Cano           .383  .516   |   1B Overbay        .295  .393
DH Hafner         .301  .378   |   DH Hafner         .301  .378
LF Wells          .282  .349   |   SS Nunez          .307  .372
1B Overbay        .295  .393   |   RF Suzuki         .297  .342
SS Nunez          .307  .372   |   3B Nix            .308  .311
3B Nix            .308  .311   |   LF Wells          .282  .349
C Stewart         .293  .272   |   C Stewart         .293  .272

This isn’t the Yankees lineup we’re used to after the last couple months, or really after the last decade. But as Yankees fans well know, it is the lineup we saw for the majority of the season. Sorry you had to see this again, Yankees fans. The Yankees did hit Robinson Cano in his more deserved second position for a period of time, but it was basically out of necessity as they had no other real hitters to work with. Instead, Ichiro Suzuki spent the majority of the time in the two-hole seemingly on reputation alone, despite being the third-worst candidate for the spot on a team full of Jayson Nixes, Eduardo Nunezes and Lyle Overbays.

Toronto Blue Jays

Common rpg: 4.791. Optimized rpg: 4.914. Season difference: -19.926 runs.

Rank: 14th AL, 29th overall

Common lineup     OBP   SLG    |   Optimized lineup  OBP   SLG
SS Reyes          .353  .427   |   SS Reyes          .353  .427
LF Cabrera        .322  .360   |   DH Encarnacion    .370  .534
RF Bautista       .358  .498   |   CF Rasmus         .338  .501
DH Encarnacion    .370  .534   |   RF Bautista       .358  .498
1B Lind           .357  .497   |   1B Lind           .357  .497
C Arencibia       .227  .365   |   3B Lawrie         .315  .397
CF Rasmus         .338  .501   |   LF Cabrera        .322  .360
2B Izturis        .288  .310   |   C Arencibia       .227  .365
3B Lawrie         .315  .397   |   2B Izturis        .288  .310

Like the Rays and Yankees, the Blue Jays experimented for a bit this season and batted Jose Bautista #2. Like the Rays and Yankees, this was very smart. Like the Rays and Yankees, they inexplicably stopped their experiment and reverted to a more traditional lineup. Melky Cabrera was not a good hitter this year, yet there he sits in the most important spot of our hypothetical lineup, while the Blue Jays have three great #2 candidates in Edwin Encarnacion, Bautista and even Adam Lind, who was basically “slow Jose Bautista” this season. Burying Colby Rasmus’ .501 SLG in the seven-hole also didn’t help. And, no, that isn’t a typo. J.P. Arencibia really finished with a .227 OBP this year.

Detroit Tigers

Common rpg: 5.375. Optimized rpg: 5.510. Season difference: -21.870 runs.

Rank: 15th AL, 30th overall

Common lineup     OBP   SLG    |   Optimized lineup  OBP   SLG
CF Jackson        .337  .417   |   DH Martinez       .355  .430
RF Hunter         .334  .465   |   3B Cabrera        .442  .636
3B Cabrera        .442  .636   |   2B Infante        .345  .450
1B Fielder        .362  .457   |   1B Fielder        .362  .457
DH Martinez       .355  .430   |   SS Peralta        .358  .457
LF Dirks          .323  .363   |   CF Jackson        .337  .417
SS Peralta        .358  .457   |   C Avila           .317  .376
C Avila           .317  .376   |   RF Hunter         .334  .465
2B Infante        .345  .450   |   LF Dirks          .323  .363

OK, this one is actually kind of genius. Although OBP is far more important than speed with regard to a leadoff hitter, speed still kind of matters. You probably don’t want your slowest player batting leadoff, especially if you have a burner in the two or three spot. But the Tigers already have the slowest team in baseball, by far, and Miguel Cabrera is their ideal two-hitter. Since Victor Martinez won’t be holding Miggy up on the basepaths, putting his .355 OBP in front of Miggy is actually really smart, especially considering Miggy hits a first-inning homer like half the time anyway. Austin Jackson’s baserunning ability is better suited towards the bottom of the lineup, ahead of singles hitters like Alex Avila, Torii Hunter and Andy Dirks. Because of this wildly unconventional lineup, the Tigers ranked last in the study, and I would really love to see this lineup actually get played out.

Cleveland Indians

Common rpg: 4.456. Optimized rpg: 4.509. Season difference: -8.586 runs.

Rank: 2nd AL, 7th overall

Common lineup     OBP   SLG    |   Optimized lineup  OBP   SLG
CF Bourn          .316  .360   |   C Santana         .377  .455
1B Swisher        .341  .423   |   2B Kipnis         .366  .452
2B Kipnis         .366  .452   |   LF Brantley       .332  .396
C Santana         .377  .455   |   1B Swisher        .341  .423
LF Brantley       .332  .396   |   SS Cabrera        .299  .402
SS Cabrera        .299  .402   |   DH Reynolds       .307  .373
DH Reynolds       .307  .373   |   RF Stubbs         .305  .360
3B Aviles         .282  .368   |   3B Aviles         .282  .368
RF Stubbs         .305  .360   |   CF Bourn          .316  .360

As an Indians fan who was constantly frustrated by Terry Francona’s lineups, I was surprised by their rank in the study. However, the Indians’ problem was more with player selection than with lineup order, and player selection isn’t reflected in the study. The Indians’ best statistical hitter, Ryan Raburn, amassed only 277 PA’s and didn’t make the cut. Yan Gomes, the Indians’ second-best hitter, eventually began receiving his well-deserved playing time, but still finished with just 322 PA’s and missed the cut. To start the season, the Indians buried Carlos Santana’s great OBP in the six-hole and wouldn’t move Asdrubal Cabrera’s putrid OBP out of the top of the order. But Francona fixed his mistake early enough for it not to be reflected in the year-end most common lineup. And in that lineup, the Indians did a good job by having their top five hitters be their highest OBP guys. Michael Bourn was not the leadoff hitter the Indians thought they were signing, and was actually a pretty bad one with a .316 OBP. Santana and Jason Kipnis are much more deserving choices to lead off, though in real life I would likely flip-flop them, considering speed.

Kansas City Royals

Common rpg: 4.094. Optimized rpg: 4.204. Season difference: -17.820 runs.

Rank: 13th AL, 27th overall

Common lineup     OBP   SLG    |   Optimized lineup  OBP   SLG
LF Gordon         .327  .422   |   DH Butler         .374  .412
1B Hosmer         .353  .448   |   1B Hosmer         .353  .448
DH Butler         .374  .412   |   LF Gordon         .327  .422
C Perez           .323  .433   |   C Perez           .323  .433
CF Cain           .310  .348   |   RF Lough          .311  .413
3B Moustakas      .287  .364   |   SS Escobar        .259  .300
RF Lough          .311  .413   |   CF Cain           .310  .348
2B Getz           .288  .273   |   3B Moustakas      .287  .364
SS Escobar        .259  .300   |   2B Getz           .288  .273

Despite performing poorly in the study, the Royals’ two lineups were actually pretty close, and theoretically they could have earned themselves a handful more runs by simply swapping Billy Butler and Alex Gordon’s spots in the lineup. I have always loved Gordon as an unconventional leadoff hitter, but this season he stopped taking walks and stopped getting hits on 35% of his balls in play, leading to a pedestrian .327 OBP after posting marks of .376 and .368 the last two seasons. Butler had a weird year, too, as he started walking all the time and lost all his power, posting a lower isolated slugging percentage than David Lough. But Butler is one of the slowest players in baseball and Eric Hosmer is a pretty good baserunner, especially for a first baseman, so swapping their spots in the optimized lineup might make more sense.

Minnesota Twins

Common rpg: 4.301. Optimized rpg: 4.379. Season difference: -12.636 runs.

Rank: 5th AL, 12th overall

Common lineup     OBP   SLG    |   Optimized lineup  OBP   SLG
2B Dozier         .312  .414   |   LF Willingham     .342  .368
C Mauer           .404  .476   |   C Mauer           .404  .476
LF Willingham     .342  .368   |   DH Doumit         .314  .396
1B Morneau        .315  .426   |   1B Morneau        .315  .426
DH Doumit         .314  .396   |   2B Dozier         .312  .414
3B Plouffe        .309  .392   |   3B Plouffe        .309  .392
RF Arcia          .304  .430   |   SS Florimon       .281  .330
CF Thomas         .290  .307   |   RF Arcia          .304  .430
SS Florimon       .281  .330   |   CF Thomas         .290  .307

Blame slugging percentage for this one. Joe Mauer should really be the Twins leadoff hitter. But, since slugging percentage is flawed in its attempt to represent power by including singles – something Mauer hits a ton of – Mauer has over 100 points of SLG on Josh Willingham, leading the generator to believe Willingham is a more ideal leadoff hitter despite Mauer’s .404 OBP. We all know that Willingham is more of a power hitter than Mauer, which is why we should always use ISO to measure power, where Willingham edges Mauer .159 to .156 even on a down season. Other than the mistake of batting Brian Dozier leadoff, though, the Twins real-life lineup does a pretty great job, with their OBPs falling in descending order after Dozier. If this lineup generator used ISO instead of SLG like I wish it would, flip-flopping Mauer and Willingham at the top would likely be the optimal order for the Twins.

Chicago White Sox

Common rpg: 3.950. Optimized rpg: 4.030. Season difference: -12.960 runs.

Rank: 6th AL, 14th overall

Common lineup     OBP   SLG    |   Optimized lineup  OBP   SLG
CF De Aza         .323  .405   |   CF De Aza         .323  .405
SS Ramirez        .313  .380   |   RF Rios           .328  .421
RF Rios           .328  .421   |   3B Gillaspie      .305  .390
1B Dunn           .320  .442   |   1B Dunn           .320  .442
DH Konerko        .313  .355   |   LF Viciedo        .304  .426
3B Gillaspie      .305  .390   |   SS Ramirez        .313  .380
LF Viciedo        .304  .426   |   DH Konerko        .313  .355
2B Keppinger      .283  .317   |   C Flowers         .247  .355
C Flowers         .247  .355   |   2B Keppinger      .283  .317

Alejandro De Aza isn’t a great leadoff hitter with a .323 OBP, but when you have the fourth-worst team OBP in baseball, .323 will do. The main problem with the White Sox order is their two-hole, as is the problem with most MLB lineups. Alexei Ramirez’s offensive skill set is basically the exact one that MLB managers are beginning to move away from in the two-hole, with his .313 OBP, complete disappearance of power and newfound penchant for stealing bases. Contrary to conventional wisdom, good base stealers are better suited for the sixth and seventh spots in the lineup. Risking outs with your best hitters at the plate, who are more likely to drive you in with extra-base hits anyway, is not a good idea. With that aside, the White Sox did well by choosing the correct leadoff hitter and keeping their worst hitters at the bottom of the order.

Oakland Athletics

Common rpg: 4.933. Optimized rpg: 4.989. Season difference: -9.072 runs.

Rank: 3rd AL, 8th overall

Common lineup     OBP   SLG    |   Optimized lineup  OBP   SLG
LF Crisp          .335  .444   |   3B Donaldson      .384  .499
SS Lowrie         .344  .446   |   SS Lowrie         .344  .446
CF Cespedes       .294  .442   |   DH Smith          .329  .391
1B Moss           .337  .522   |   1B Moss           .337  .522
3B Donaldson      .384  .499   |   C Norris          .345  .409
DH Smith          .329  .391   |   CF Cespedes       .294  .442
RF Reddick        .307  .379   |   LF Crisp          .335  .444
C Norris          .345  .409   |   RF Reddick        .307  .379
2B Sogard         .322  .364   |   2B Sogard         .322  .364

Surprise! The Oakland Athletics scored well in a SABR-slanted study. And this doesn’t even take into account how well the A’s optimize their lineup on a daily basis by correctly utilizing platoons. But either way, in this theoretical lineup, the A’s do a good job by getting their second and fourth hitters correct. Though breakout player and MVP candidate Josh Donaldson is better suited to lead off, Coco Crisp was still a good option. And whether incidental or not, Yoenis Cespedes’ low OBP in the three-hole doesn’t hurt them too much, as OBP isn’t as important in the three-hole as conventional wisdom would tell you. The A’s do well in this study with the lineup provided for them, and do even better in real life by putting the right guys on the field every day.

Texas Rangers

Common rpg: 4.481. Optimized rpg: 4.582. Season difference: -16.362 runs.

Rank: 12th AL, 26th overall

Common lineup     OBP   SLG    |   Optimized lineup  OBP   SLG
2B Kinsler        .344  .413   |   2B Kinsler        .344  .413
SS Andrus         .328  .331   |   3B Beltre         .371  .509
RF Cruz           .327  .506   |   1B Moreland       .299  .437
3B Beltre         .371  .509   |   RF Cruz           .327  .506
C Pierzynski      .297  .425   |   C Pierzynski      .297  .425
1B Moreland       .299  .437   |   CF Martin         .313  .385
LF Murphy         .282  .374   |   DH Profar         .308  .336
DH Profar         .308  .336   |   LF Murphy         .282  .374
CF Martin         .313  .385   |   SS Andrus         .328  .331

Ian Kinsler is another guy who doesn’t scream “prototypical leadoff hitter,” basically in the sense that he’s not a speed-first centerfielder, but he is a pretty great one and easily the Rangers’ best option. So you have to give them credit for sticking with him instead of going to the more conventional, “easy” choice of Elvis Andrus or Leonys Martin. However, the Rangers lose a lot of value by keeping the speedy Andrus in the two-hole, a spot he really wasn’t suited for this season with a career-worst .328 OBP. Adrian Beltre is the perfect fit for the Rangers’ #2 hitter, and Andrus is better suited for the bottom of the order. With Andrus’ basestealing abilities, I think it would be wiser to switch his spot with Jurickson Profar’s in this optimized lineup, giving Andrus the opportunity to attempt steals with the 8th and 9th hitters up, rather than the 1st and 2nd.

Los Angeles Angels of Anaheim

Common rpg: 4.864. Optimized rpg: 4.945. Season difference: -13.122 runs.

Rank: 7th AL, 15th overall

Common lineup     OBP   SLG    |   Optimized lineup  OBP   SLG
LF Shuck          .331  .366   |   C Iannetta        .358  .372
CF Trout          .432  .557   |   CF Trout          .432  .557
1B Pujols         .330  .437   |   RF Hamilton       .307  .432
RF Hamilton       .307  .432   |   1B Pujols         .330  .437
DH Trumbo         .294  .453   |   2B Kendrick       .335  .439
2B Kendrick       .335  .439   |   SS Aybar          .301  .382
3B Callaspo       .324  .347   |   LF Shuck          .331  .366
C Iannetta        .358  .372   |   DH Trumbo         .294  .453
SS Aybar          .301  .382   |   3B Callaspo       .324  .347

This one is similar to Detroit’s, but unlike Detroit’s, this one probably only works in theory. When Miggy is batting second and the entire team is slower than molasses in an igloo, I think you can get by with a slow-running, high-OBP guy like Victor Martinez leading off. When your #2 hitter is Mike Trout, you’re probably costing yourself extra bases on would-be Trout doubles and triples by having Chris Iannetta on first in front of him, likely having just drawn a leadoff walk. If Iannetta weren’t so slow and Trout weren’t so fast, Iannetta would actually be a pretty great leadoff hitter. Of all players with 350+ PA this season, only Joey Votto posted a higher BB% (18.6) than Iannetta (17.0). The Angels did do the right thing by putting Trout where he belongs in the two-hole, though. A .432 OBP is great for leadoff, but when you hit for more power than Giancarlo Stanton and Adam Dunn, some of those extra base hits go to waste leading off.

Seattle Mariners

Common rpg: 4.382. Optimized rpg: 4.439. Season difference: -9.234 runs.

Rank: 4th AL, 9th overall

Common lineup     OBP   SLG    |   Optimized lineup  OBP   SLG
SS Miller         .318  .418   |   3B Seager         .338  .426
2B Franklin       .303  .382   |   DH Morales        .336  .449
3B Seager         .338  .426   |   RF Saunders       .323  .397
DH Morales        .336  .449   |   1B Smoak          .334  .412
LF Ibanez         .306  .487   |   LF Ibanez         .306  .487
1B Smoak          .334  .412   |   2B Franklin       .303  .382
RF Saunders       .323  .397   |   C Zunino          .290  .329
C Zunino          .290  .329   |   SS Miller         .318  .418
CF Ackley         .319  .341   |   CF Ackley         .319  .341

The Mariners began the season with Dustin Ackley at second base and Brendan Ryan at shortstop. By the beginning of June, Ackley had hit himself back to AAA and not much later, the Mariners cut ties with Ryan’s offensive deficiencies in favor of rookies Brad Miller and Nick Franklin. Both held their own with the bat from the get-go, earning themselves the top two spots in the Mariners’ everyday lineup. However, despite holding their own, neither is really a top-of-the-order hitter, with sub-.320 OBPs and just average power. Better suited for the top spots are the Mariners’ best player, Kyle Seager, and most productive hitter, Kendrys Morales. Still, the Mariners performed well in the study, likely due to the similar profiles of most of their hitters.

Houston Astros

Common rpg: 4.133. Optimized rpg: 4.176. Season difference: -6.966 runs.

Rank: 1st AL, 3rd overall

2013 lineup    OBP   SLG    Optimal lineup  OBP   SLG
LF Grossman    .332  .370   LF Grossman     .332  .370
2B Altuve      .316  .363   C Castro        .350  .485
C Castro       .350  .485   2B Altuve       .316  .363
1B Carter      .320  .451   1B Carter       .320  .451
DH Pena        .324  .350   3B Dominguez    .286  .403
RF Martinez    .272  .378   CF Barnes       .289  .346
CF Barnes      .289  .346   DH Pena         .324  .350
3B Dominguez   .286  .403   RF Martinez     .272  .378
SS Villar      .321  .319   SS Villar       .321  .319

The Astros place third in the study basically by default. It’s not hard to identify your best players and construct a near-optimal lineup when you’ve only got two league-average bats. Just put your best hitter, Jason Castro, in the two-hole, bat Chris Carter fourth to drive in runs, and lead off your next-highest OBP guy, who believe it or not is one “Robbie Grossman.” The rest basically doesn’t matter because none of them are very good. Robbie Grossman is actually the most deserving leadoff batter on a real team in the Major League of Baseball. #Astros
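
The "Season difference" figures quoted for each team look like the per-game gap between the common and optimized lineups scaled up to a full season. The sketch below assumes a 162-game season (the schedule length is not stated in the text, but it reproduces both quoted figures); the function name is made up for illustration.

```python
# Assumption: season difference = (common rpg - optimized rpg) * 162 games.
GAMES_IN_SEASON = 162

def season_difference(common_rpg, optimized_rpg, games=GAMES_IN_SEASON):
    """Runs left on the table over a season by using the common lineup."""
    return (common_rpg - optimized_rpg) * games

# Runs-per-game figures as quoted for the two teams above.
mariners = season_difference(4.382, 4.439)
astros = season_difference(4.133, 4.176)

print(round(mariners, 3))  # -9.234
print(round(astros, 3))    # -6.966
```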

Coming soon: Part 2, with National League lineups and conclusion.


The Best and Worst Four-Seam Fastballs of 2013

Introduction

What is the best pitch of all-time?  Is it Mariano Rivera’s cutter?  Is it Randy Johnson’s slider?  Is it Walter Johnson’s fastball?  I do not know.  What I do know is that this question is nearly impossible to answer, so let’s simplify things a little.  What was the best pitch thrown during the 2013 regular season?  On a rate basis, PITCHf/x would lead us to believe that the best pitch thrown by a qualifying pitcher was Yovani Gallardo’s cutter with a wFC/c of 4.95.  In other words, for every 100 cutters thrown by Gallardo, he saved 4.95 runs above a pitcher who throws an “average” cutter.  What does this really mean though?  This system of calculation is based off the changes in run expectancy due to the outcome of each pitch, which is extremely complicated and tedious to calculate.  I felt that there had to be a simpler way to quantify the quality of a pitch. 

Background

Back in August, I posted an article entitled “Baseball’s Most Extreme Pitches from Starters, So Far” that posited the idea of total bases per hit allowed.  In other words, I wanted to look at who was getting hit the hardest.  Now, it was rightly suggested in the comments that this wasn’t the greatest way to determine a pitch’s quality.  For example, let’s look at the following two extremely hypothetical examples.  One pitcher throws his fastball exactly 100 times.  In those 100 pitches, he throws 99 of them for strikes.  On the 100th pitch, he gives up a home run.  Now, by looking at TB/H, this pitch has a rating of 4.00, which is the worst possible rating.  However, he only gave up 0.04 total bases per pitch, which is excellent.  By comparison, the second pitcher throws exactly 100 fastballs as well.  He gives up 100 singles.  By TB/H, his fastball has a rating of 1.00, which is significantly better than the first pitcher.  However, he gave up 1.00 total bases per pitch, which is awful.  If a pitcher gave up a base runner each time he threw a pitch, he probably would cease throwing that pitch very quickly. 

That got me thinking that total bases per pitch may be a much better way to determine the quality of a pitch, but this method has glaring problems of its own. For example, 100 balls thrown in 100 pitches would yield a value of 0.00 total bases per pitch. Clearly, a pitcher’s ability (or inability) to throw a pitch for a strike needed to be incorporated as well.
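
To make the hypothetical cases above concrete, here is a small sketch that computes both rates for the three imaginary pitchers. The function names are illustrative only, and TB/H is returned as None when no hits were allowed, since the ratio is undefined there.

```python
def tb_per_hit(total_bases, hits):
    """Total bases per hit allowed; None when no hits were allowed."""
    return total_bases / hits if hits else None

def tb_per_pitch(total_bases, pitches):
    """Total bases allowed per pitch thrown."""
    return total_bases / pitches

# (label, pitches, hits, total bases) for the hypothetical pitchers
examples = [
    ("99 strikes, 1 home run", 100, 1, 4),
    ("100 singles", 100, 100, 100),
    ("100 balls", 100, 0, 0),
]

for label, pitches, hits, total_bases in examples:
    print(label, tb_per_hit(total_bases, hits), tb_per_pitch(total_bases, pitches))
```

The third row is the problem case: the pitch never reaches the strike zone, yet TB/P scores it a perfect 0.0.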

Proposed Solution

To try and solve the problems suggested above, I propose the following simple formula:

adjTB/P = [1B + 2*2B + 3*3B + 4*HR + xBB] / Pitches

where,

xBB = Balls/4

With that said, I know some pitches are thrown out of the strike zone intentionally (i.e. the waste pitch).  At the end of the day, a waste pitch only puts you one step closer to walking a batter and adds one pitch to the pitch count.  Every coach would prefer their starter to throw a Maddux each time out, so efficiency is the name of the game.  In order to test this formula, let’s look at a sample calculation.

According to Baseball Prospectus and their PITCHf/x leaderboards, A.J. Burnett threw 614 four-seam fastballs this regular season.  On those 614 pitches, he allowed 10 singles, nine doubles, five home runs, and had 202 of those pitches called balls.  Burnett allowed 58 total bases and 50.5 xBB.  Doing some quick arithmetic, he allowed 0.1767 adjTB/P. 
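
The sample calculation can be sketched in a few lines. The function names are hypothetical. Note that the hit counts as listed (10 singles, nine doubles, five home runs) add up to 48 total bases rather than 58, so this sketch takes the stated 58 total bases as its input instead of guessing which count is off.

```python
def adj_tb_per_pitch_from_totals(total_bases, balls, pitches):
    """adjTB/P = (total bases + xBB) / pitches, with xBB = balls / 4."""
    x_bb = balls / 4
    return (total_bases + x_bb) / pitches

def adj_tb_per_pitch(singles, doubles, triples, home_runs, balls, pitches):
    """Same formula, starting from the individual hit types."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return adj_tb_per_pitch_from_totals(total_bases, balls, pitches)

# Burnett's four-seam fastball, using the totals stated in the text:
# 58 total bases, 202 balls, 614 pitches.
burnett = adj_tb_per_pitch_from_totals(total_bases=58, balls=202, pitches=614)
print(round(burnett, 4))  # 0.1767
```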

At first glance, I’m sure your reaction is similar to my initial reaction.  Okay, so what does that mean?  On its face, a correct response may contain the words “I’m not really sure”.  If we look at the summation of each four-seam fastball thrown by starters this year, we find that the league allowed 0.1800 adjTB/P, so A.J. Burnett threw a slightly above average four-seam fastball this year.  To come to that conclusion though, you’d have to know both a player’s rate and the league rate.  We can present this information in a much nicer and easier to understand way. 

To do this, I decided to turn to the old standby from every scout in baseball, the 20-80 scale. As you’re probably well aware, the 20-80 scale attempts to rate a player’s skills numerically. 50 is average. 60 represents exactly one standard deviation above average. 30 represents exactly two standard deviations below average, and so on and so forth. By taking the weighted standard deviation of the data set, we can determine how many standard deviations above or below average a certain pitch is. Looking at the full season data, the weighted standard deviation for four-seam fastballs is 0.0262 adjTB/P. Another quick calculation tells us that A.J. Burnett rated as 0.13 standard deviations above average. Converting that to the 20-80 scale, Burnett’s four-seam fastball gets a rating of 51. At a glance, the 51 rating makes much more sense than 0.1767 adjTB/P, which helps solve one of our problems.
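
The conversion to the 20-80 scale can be sketched as follows. The league mean (0.1800) and weighted standard deviation (0.0262) are taken as given from the text, and the function names are illustrative. Because a lower adjTB/P is better, the sign is flipped so that "above average" means fewer bases allowed.

```python
def z_score(adj_tb_p, league_mean, league_sd):
    """Standard deviations better than league average (lower adjTB/P is better)."""
    return (league_mean - adj_tb_p) / league_sd

def scout_rating(adj_tb_p, league_mean, league_sd):
    """20-80 scale: 50 is average, each 10 points is one standard deviation."""
    return 50 + 10 * z_score(adj_tb_p, league_mean, league_sd)

# A.J. Burnett's four-seam fastball against the league figures in the text.
z = z_score(0.1767, league_mean=0.1800, league_sd=0.0262)
rating = scout_rating(0.1767, league_mean=0.1800, league_sd=0.0262)

print(round(z, 2))    # 0.13
print(round(rating))  # 51
```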

Results

Now that we understand how to calculate the values and what they mean, let’s look at whose four-seam fastballs really excelled and whose were really problematic. To qualify for the full season, 600 total four-seam fastballs had to be thrown. This gave me 103 qualified starting pitchers. The Top 10 qualified starters were:

Rank  Pitcher            Rating
1     Lance Lynn         66
2     Anibal Sanchez     65
3     Matt Harvey        65
4     Zack Greinke       65
5     Jonathon Niese     62
6     Hector Santiago    62
7     Bartolo Colon      62
8     Madison Bumgarner  62
9     Clayton Kershaw    61
10    C.J. Wilson        60

For comparison, the Bottom 10 qualified starters were:

Rank  Pitcher            Rating
94    Ervin Santana      43
95    Ricky Nolasco      42
96    Jeremy Hellickson  42
97    Jason Vargas       40
98    Scott Diamond      40
99    Tim Lincecum       37
100   John Danks         35
101   Josh Johnson       35
102   Tom Koehler        34
103   Justin Grimm       31

On a monthly basis, a minimum of 100 four-seam fastballs had to be thrown.  The best and worst pitches each month this season were:

Month        Best pitcher    Rating  Worst pitcher  Rating
March-April  Anibal Sanchez  66      Brett Myers    23
May          Jose Quintana   67      Burch Smith    23
June         Tim Hudson      65      Dylan Axelrod  30
July         Anibal Sanchez  71      Justin Grimm   24
August       Rick Porcello   66      Andre Rienzo   20
September    Lance Lynn      68      John Danks     22

Only three starters qualified as above average in each month of the regular season.  Their monthly ratings are shown below.  No starter qualified as below average in each month this season. 

Pitcher          March-April  May  June  July  August  September
C.J. Wilson      53           51   61    57    64      55
Clayton Kershaw  56           56   52    58    65      60
Lance Lynn       63           62   58    55    53      68

I plan to continue this study by analyzing both other pitch types and relievers.  Baseball Prospectus provides data for the following pitches: four-seam fastball, sinker, cutter, splitter, changeup, curveball, slider, screwball, and knuckleball.  At the completion of all the pitch types, I’ll post the ratings for complete repertoires as well.  If well-received, I’ll try and provide monthly updates as next season rolls along.      


wRC for Pitchers and Koji Uehara’s Dominance

wRC is a very useful statistic.  On the team level, it can be used to predict runs scored fairly accurately (r^2 of over .9).  It can also be used to measure how much a specific player has contributed to his team’s offensive production by measuring how many runs he has provided on offense.  But it is rarely used for pitchers.

Pitching statistics are not so much based on linear weights and wOBA as they are on defense-independent stats. I think defense-independent stats are fine things to look at when evaluating players, and they can provide lots of information about how a pitcher really performed. But while pitcher WAR is based off of FIP (at least on FanGraphs), RA9-WAR is also sometimes looked at. Now, if the whole point of using linear weights for batters is to eliminate context and the production of teammates, then why not do the same for pitchers? True, pitchers, especially starters, usually get themselves into bad situations, unlike hitters, who can’t control how many outs there are or who’s on base when they come up. But oftentimes pitchers aren’t better in certain situations, as evidenced by the inconsistency of stats such as LOB%. So why not eliminate context from pitcher evaluations and look at how many runs they should have given up based on the hits, walks, and hit batters they allowed?

To do this, I needed to go over to Baseball-Reference, as FanGraphs doesn’t have easy-to-manipulate wOBA figures for pitchers.  Baseball-Reference doesn’t have any sort of wOBA stats, but what they do have is the raw numbers needed to calculate wOBA.  So I put them into Excel, and, with 50 IP as my minimum threshold, I calculated the wOBA allowed – and then converted that into wRC – for the 330 pitchers this year with at least 50 innings.

Next, I calculated wRC/9 the same way you would calculate ERA (or RA/9).  This would scale it very closely to ERA and RA/9, and give us a good sense for what each number actually means.  (The average wRC/9 with the pitchers I used was 3.95; the average RA/9 for the pitchers I used was 3.96).  What I found was that the extremes on both sides were way more extreme (you’ll see what I mean soon), but overall it correlated to RA/9 fairly closely (the r^2 was .803).
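
As a rough sketch of the per-nine scaling, the snippet below converts innings written in box-score notation (where ".1" and ".2" mean one and two outs, not tenths) and scales a run total to nine innings. The function names are illustrative, and the wOBA-to-wRC step itself is not reproduced here. The check uses Koji Uehara's 10 runs allowed in 74.1 innings, which should land on his 1.21 RA/9.

```python
def innings_to_float(ip_text):
    """Convert box-score innings such as '74.1' (74 and 1/3) to a real number."""
    whole, _, outs = str(ip_text).partition(".")
    outs = int(outs or 0)
    if outs not in (0, 1, 2):
        raise ValueError(f"not box-score innings notation: {ip_text!r}")
    return int(whole) + outs / 3

def per_nine(runs, ip_text):
    """Scale a run total (actual runs or wRC) to a nine-inning rate."""
    return 9 * runs / innings_to_float(ip_text)

print(round(per_nine(10, "74.1"), 2))  # 1.21
```

Passing the innings as text avoids the trap of treating 74.1 as 74 and one tenth.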

Now, for the actual numbers:

Pitcher wRC/9 IP
Koji Uehara 0.08 74.1
Tanner Roark 1.04 53.2
Joe Nathan 1.08 64.2
Greg Holland 1.17 67
Alex Torres* 1.24 58
Craig Kimbrel 1.41 67
Luis Avilan* 1.42 65
Neal Cotts* 1.43 57
Mark Melancon 1.52 71
Kenley Jansen 1.55 76.2
Clayton Kershaw* 1.59 236
Paco Rodriguez* 1.60 54.1
Luke Hochevar 1.65 70.1
Matt Harvey 1.69 178.1
Tyler Clippard 1.69 71
Jose Fernandez 1.80 172.2
Tony Watson* 1.89 71.2
J.P. Howell* 1.94 62
Bobby Parnell 2.00 50
Clay Buchholz 2.04 108.1
Glen Perkins* 2.09 62.2
Justin Wilson* 2.13 73.2
David Carpenter 2.13 65.2
Casey Janssen 2.15 52.2
Sean Doolittle* 2.16 69
Brandon Kintzler 2.17 77
Aroldis Chapman* 2.24 63.2
Luke Gregerson 2.29 66.1
Steve Cishek 2.30 69.2
Joaquin Benoit 2.31 67
Max Scherzer 2.32 214.1
Madison Bumgarner* 2.35 201.1
Sonny Gray 2.39 64
David Robertson 2.42 66.1
Jean Machi 2.44 53
Dane De La Rosa 2.46 72.1
Tyler Thornburg 2.56 66.2
Drew Smyly* 2.58 76
Jason Grilli 2.59 50
Stephen Strasburg 2.60 183
Danny Farquhar 2.64 55.2
Michael Wacha 2.66 64.2
Joel Peralta 2.67 71.1
Brett Cecil* 2.68 60.2
Brad Ziegler 2.69 73
Johnny Cueto 2.69 60.2
Tommy Hunter 2.69 86.1
Addison Reed 2.69 71.1
Bryan Shaw 2.72 75
Casey Fien 2.73 62
Mariano Rivera 2.77 64
Sergio Romo 2.81 60.1
Hisashi Iwakuma 2.81 219.2
Jose Veras 2.81 62.2
Cliff Lee* 2.81 222.2
Darren O’Day 2.82 62
Tanner Scheppers 2.85 76.2
Trevor Rosenthal 2.87 75.1
Yu Darvish 2.87 209.2
Adam Wainwright 2.88 241.2
Anibal Sanchez 2.88 182
Mike Dunn* 2.89 67.2
Jeanmar Gomez 2.90 80.2
Brian Matusz* 2.94 51
Charlie Furbush* 2.96 65
J.J. Hoover 2.97 66
Francisco Liriano* 2.98 161
Grant Balfour 2.99 62.2
Alfredo Simon 2.99 87.2
Jonathan Papelbon 3.04 61.2
Jesse Chavez 3.04 57.1
Tyson Ross 3.07 125
Gerrit Cole 3.07 117.1
A.J. Ramos 3.07 80
Craig Breslow* 3.07 59.2
Tom Wilhelmsen 3.07 59
Andrew Cashner 3.08 175
Chris Sale* 3.10 214.1
Felix Hernandez 3.10 204.1
Vin Mazzaro 3.10 73.2
Zack Greinke 3.11 177.2
Jim Henderson 3.12 60
Matt Albers 3.13 63
Sam LeCure 3.14 61
Anthony Swarzak 3.16 96
Jerry Blevins* 3.16 60
Henderson Alvarez 3.16 102.2
LaTroy Hawkins 3.17 70.2
Tony Cingrani* 3.17 104.2
Mike Minor* 3.18 204.2
Jordan Zimmermann 3.18 213.1
Tim Stauffer 3.21 69.2
Travis Wood* 3.21 200
Edward Mujica 3.21 64.2
Alex Cobb 3.22 143.1
Rex Brothers* 3.23 67.1
Justin Masterson 3.24 193
David Price* 3.24 186.2
Santiago Casilla 3.26 50
Ryan Cook 3.26 67.1
Brett Oberholtzer* 3.26 71.2
Bartolo Colon 3.27 190.1
A.J. Burnett 3.29 191
Danny Salazar 3.30 52
Josh Collmenter 3.31 92
Nate Jones 3.31 78
Chad Gaudin 3.33 97
Jamey Wright 3.33 70
Joe Smith 3.33 63
Homer Bailey 3.33 209
Marco Estrada 3.35 128
Hyun-jin Ryu* 3.36 192
Anthony Varvaro 3.36 73.1
Chad Qualls 3.38 62
Tim Hudson 3.38 131.1
Jarred Cosart 3.41 60
Scott Rice* 3.41 51
Chris Archer 3.42 128.2
Jake McGee* 3.43 62.2
Ervin Santana 3.48 211
Will Harris 3.48 52.2
Aaron Loup* 3.48 69.1
Yoervis Medina 3.50 68
Fernando Rodney 3.51 66.2
Huston Street 3.51 56.2
Burke Badenhop 3.51 62.1
Patrick Corbin* 3.53 208.1
Mat Latos 3.53 210.2
Ryan Webb 3.54 80.1
Jered Weaver 3.54 154.1
Rafael Soriano 3.56 66.2
Bruce Chen* 3.56 121
Scott Feldman 3.57 181.2
Shelby Miller 3.57 173.1
Alex Wood* 3.58 77.2
Matt Cain 3.59 184.1
Gio Gonzalez* 3.60 195.2
Craig Stammen 3.61 81.2
Hiroki Kuroda 3.62 201.1
Matt Moore* 3.62 150.1
Ryan Pressly 3.64 76.2
Dan Straily 3.64 152.1
A.J. Griffin 3.68 200
James Shields 3.68 228.2
Adam Ottavino 3.68 78.1
Pedro Strop 3.68 57.1
Cody Allen 3.68 70.1
Alexi Ogando 3.72 104.1
Jhoulys Chacin 3.73 197.1
Kyle Lohse 3.74 198.2
Jake Peavy 3.74 144.2
Cole Hamels* 3.76 220
Nathan Eovaldi 3.76 106.1
Carlos Torres 3.76 86.1
Andrew Albers* 3.78 60
Ricky Nolasco 3.80 199.1
Robbie Erlin* 3.80 54.2
Ross Ohlendorf 3.82 60.1
Dale Thayer 3.82 65
Jarrod Parker 3.85 197
Jose Quintana* 3.86 200
John Lackey 3.86 189.1
Julio Teheran 3.87 185.2
Cesar Ramos* 3.88 67.1
Ernesto Frieri 3.88 68.2
Steve Delabar 3.91 58.2
Ivan Nova 3.91 139.1
Matt Belisle 3.91 73
Ubaldo Jimenez 3.92 182.2
Kris Medlen 3.93 197
Wandy Rodriguez* 3.94 62.2
Kelvin Herrera 3.95 58.1
Justin Verlander 3.97 218.1
Garrett Richards 3.97 145
Charlie Morton 3.97 116
Matt Lindstrom 3.97 60.2
Tom Gorzelanny* 3.97 85.1
Jared Burton 3.97 66
Jeff Locke* 3.99 166.1
C.J. Wilson* 4.00 212.1
Tim Collins* 4.00 53.1
Seth Maness 4.00 62
Matt Garza 4.03 155.1
David Hernandez 4.03 62.1
Lance Lynn 4.04 201.2
Rick Porcello 4.04 177
Miguel Gonzalez 4.04 171.1
Carlos Villanueva 4.04 128.2
Derek Holland* 4.04 213
Robbie Ross* 4.05 62.1
Jim Johnson 4.05 70.1
Kevin Gregg 4.06 62
J.C. Gutierrez 4.08 55.1
Bryan Morris 4.09 65
Mike Leake 4.09 192.1
Joe Kelly 4.11 124
Zack Wheeler 4.11 100
Jon Lester* 4.12 213.1
Taylor Jordan 4.13 51.2
Bronson Arroyo 4.14 202
Tim Lincecum 4.15 197.2
Eric Stults* 4.17 203.2
Chris Tillman 4.18 206.1
Doug Fister 4.19 208.2
Junichi Tazawa 4.20 68.1
Corey Kluber 4.22 147.1
Logan Ondrusek 4.23 55
Jaime Garcia* 4.25 55.1
Tyler Lyons* 4.25 53
Jorge De La Rosa* 4.27 167.2
Yovani Gallardo 4.28 180.2
Wade Miley* 4.29 202.2
R.A. Dickey 4.30 224.2
James Russell* 4.30 52.2
Tyler Chatwood 4.32 111.1
Sam Deduno 4.33 108
Andy Pettitte* 4.35 185.1
Michael Kohn 4.37 53
Josh Outman* 4.38 54
Dillon Gee 4.38 199
Martin Perez* 4.39 124.1
Jake Arrieta 4.39 75.1
Shawn Kelley 4.39 53.1
Drew Storen 4.41 61.2
Preston Claiborne 4.42 50.1
Tommy Milone* 4.45 156.1
Wily Peralta 4.46 183.1
Scott Kazmir* 4.46 158
Felix Doubront* 4.54 162.1
Jeff Samardzija 4.55 213.2
Shaun Marcum 4.56 78.1
Dan Haren 4.58 169.2
Alfredo Figaro 4.58 74
Troy Patton* 4.60 56
Hector Rondon 4.62 54.2
Oliver Perez* 4.62 53
Trevor Cahill 4.63 146.2
Wei-Yin Chen* 4.63 137
Todd Redmond 4.64 77
Zach McAllister 4.64 134.1
Jonathon Niese* 4.65 143
Tom Koehler 4.65 143
Ronald Belisario 4.66 68
Jeremy Hefner 4.66 130.2
Jacob Turner 4.68 118
Kyle Kendrick 4.68 182
Chris Rusin* 4.70 66.1
Brandon McCarthy 4.70 135
Freddy Garcia 4.70 80.1
Randall Delgado 4.70 116.1
Wilton Lopez 4.72 75.1
Mark Buehrle* 4.73 203.2
T.J. McFarland* 4.74 74.2
J.A. Happ* 4.79 92.2
Jason Vargas* 4.80 150
David Phelps 4.81 86.2
Brian Duensing* 4.82 61
Hector Santiago* 4.84 149
CC Sabathia* 4.85 211
Nick Tepesch 4.88 93
Jeremy Hellickson 4.89 174
Wesley Wright* 4.93 53.2
Chris Capuano* 4.95 105.2
Donovan Hand 4.97 68.1
Jerome Williams 4.99 169.1
Adam Warren 5.01 77
Paul Maholm* 5.04 153
Jeremy Guthrie 5.08 211.2
Jonathan Pettibone 5.08 100.1
John Danks* 5.09 138.1
George Kontos 5.10 55.1
Edwin Jackson 5.10 175.1
Ian Kennedy 5.14 181.1
Brad Peacock 5.15 83.1
Bud Norris 5.16 176.2
Erik Bedard* 5.17 151
Travis Blackley* 5.18 50.1
Ryan Dempster 5.19 171.1
Kevin Correia 5.19 185.1
Erasmo Ramirez 5.20 72.1
Roberto Hernandez 5.20 151
Kevin Slowey 5.20 92
Aaron Harang 5.24 143.1
Jason Marquis 5.25 117.2
Jake Westbrook 5.27 116.2
Juan Nicasio 5.29 157.2
Heath Bell 5.35 65.2
Josh Roenicke 5.35 62
Esmil Rogers 5.38 137.2
John Axford 5.42 65
Mike Pelfrey 5.43 152.2
John Lannan* 5.45 74.1
Andre Rienzo 5.46 56
Ross Detwiler* 5.54 71.1
Jason Hammel 5.55 139.1
Stephen Fife 5.63 58.1
Edinson Volquez 5.65 170.1
Dallas Keuchel* 5.68 153.2
Jordan Lyles 5.70 141.2
Phil Hughes 5.71 145.2
Tommy Hanson 5.74 73
Luis Mendoza 5.79 94
Jeremy Bonderman 5.82 55
Brandon League 5.82 54.1
Roy Halladay 5.85 62
Chris Perez 5.94 54
Scott Diamond* 6.01 131
Ryan Vogelsong 6.04 103.2
Wade Davis 6.05 135.1
Justin Grimm 6.10 98
Paul Clemens 6.14 73.1
Lucas Harrell 6.23 153.2
Jeff Francis* 6.39 70.1
Brandon Morrow 6.39 54.1
Joe Saunders* 6.39 183
Jon Garland 6.40 68
Josh Johnson 6.45 81.1
Mike Gonzalez* 6.50 50
Wade LeBlanc* 6.54 55
Brandon Maurer 6.58 90
Barry Zito* 6.63 133.1
Carter Capps 6.64 59
Dylan Axelrod 6.82 128.1
Kyle Gibson 6.92 51
Joe Blanton 7.00 132.2
Clayton Richard* 7.14 52.2
Alex Sanabia 7.29 55.1
Tyler Cloyd 7.40 60.1
Philip Humber 7.62 54.2
Pedro Hernandez* 7.68 56.2
Average 3.95 110.2

The first thing that jumps out right away is that Koji Uehara had a wRC/9 of 0.08. In other words, if that were his ERA, he would give up one earned run in about 12 complete-game starts if he were a starter, which is ridiculous. The second thing that jumps out is that most of the top performers are relievers – in fact, 12 out of the top 13 had fewer than 80 innings, with the only exception being Clayton Kershaw. Also, the worst pitchers by wRC/9 had a wRC/9 much higher than their ERA or RA/9. Pedro Hernandez, for example, had a wRC/9 of 7.68, and there were 6 pitchers at 7.00 or higher. Kershaw actually has a wRC/9 that is lower than his insane RA/9, so maybe he’s even better than his fielding-dependent stats give him credit for.

But wait!  There’s more!  The reason we have xFIP is because HR/FB rates are very unstable.  So let’s incorporate that into our wRC/9 formula and see what happens (we’ll call this one xwRC/9):

Pitcher xwRC/9 IP
Koji Uehara 0.06 74.1
Paco Rodriguez* 1.13 54.1
Luke Hochevar 1.25 70.1
Tyler Clippard 1.25 71
Craig Kimbrel 1.51 67
Kenley Jansen 1.63 76.2
Aroldis Chapman* 1.68 63.2
Greg Holland 1.69 67
Casey Fien 1.88 62
Joe Nathan 2.06 64.2
Tanner Roark 2.06 53.2
Neal Cotts* 2.12 57
Clayton Kershaw* 2.13 236
Max Scherzer 2.17 214.1
Huston Street 2.18 56.2
Jose Fernandez 2.23 172.2
Alex Torres* 2.26 58
Yu Darvish 2.28 209.2
Glen Perkins* 2.29 62.2
Matt Harvey 2.32 178.1
Tony Watson* 2.35 71.2
Stephen Strasburg 2.35 183
Mark Melancon 2.36 71
Johnny Cueto 2.38 60.2
David Carpenter 2.39 65.2
Luis Avilan* 2.41 65
Justin Wilson* 2.48 73.2
Tommy Hunter 2.49 86.1
Joaquin Benoit 2.50 67
J.P. Howell* 2.51 62
David Robertson 2.52 66.1
Madison Bumgarner* 2.54 201.1
Hisashi Iwakuma 2.56 219.2
Tony Cingrani* 2.57 104.2
Jason Grilli 2.66 50
Darren O’Day 2.67 62
Jose Veras 2.68 62.2
Marco Estrada 2.70 128
Casey Janssen 2.71 52.2
Travis Wood* 2.76 200
Sonny Gray 2.80 64
Grant Balfour 2.81 62.2
Clay Buchholz 2.81 108.1
Danny Salazar 2.81 52
Cliff Lee* 2.81 222.2
Steve Cishek 2.83 69.2
Sean Doolittle* 2.83 69
Jim Henderson 2.83 60
Carlos Torres 2.84 86.1
Edward Mujica 2.85 64.2
Kelvin Herrera 2.86 58.1
Brett Cecil* 2.87 60.2
Jake McGee* 2.89 62.2
Mariano Rivera 2.89 64
Joel Peralta 2.89 71.1
Ernesto Frieri 2.93 68.2
Michael Wacha 2.95 64.2
Anibal Sanchez 2.95 182
Luke Gregerson 2.98 66.1
Brandon Kintzler 2.99 77
Tim Stauffer 2.99 69.2
Tanner Scheppers 2.99 76.2
Brad Ziegler 2.99 73
Alex Cobb 3.05 143.1
Dane De La Rosa 3.05 72.1
Addison Reed 3.06 71.1
Travis Blackley* 3.08 50.1
Jerry Blevins* 3.09 60
Bobby Parnell 3.09 50
Freddy Garcia 3.11 80.1
Jeanmar Gomez 3.13 80.2
Ervin Santana 3.17 211
Jean Machi 3.19 53
Trevor Rosenthal 3.20 75.1
J.J. Hoover 3.20 66
Chris Archer 3.20 128.2
Sergio Romo 3.20 60.1
Alfredo Figaro 3.21 74
Drew Smyly* 3.22 76
Alfredo Simon 3.23 87.2
Jonathan Papelbon 3.24 61.2
Charlie Furbush* 3.24 65
Mike Dunn* 3.26 67.2
Wandy Rodriguez* 3.26 62.2
Tyson Ross 3.27 125
Justin Masterson 3.27 193
Felix Hernandez 3.29 204.1
Mike Minor* 3.32 204.2
Rex Brothers* 3.33 67.1
Homer Bailey 3.33 209
Adam Wainwright 3.34 241.2
David Hernandez 3.34 62.1
Bryan Shaw 3.34 75
John Lackey 3.35 189.1
Danny Farquhar 3.36 55.2
Randall Delgado 3.37 116.1
Chris Sale* 3.37 214.1
LaTroy Hawkins 3.38 70.2
Chad Qualls 3.40 62
Jordan Zimmermann 3.41 213.1
Matt Cain 3.43 184.1
A.J. Griffin 3.45 200
Zack Greinke 3.45 177.2
Joe Smith 3.45 63
Burke Badenhop 3.46 62.1
Chris Tillman 3.47 206.1
Andrew Cashner 3.47 175
David Price* 3.49 186.2
Scott Feldman 3.49 181.2
Miguel Gonzalez 3.49 171.1
Francisco Liriano* 3.50 161
Nate Jones 3.51 78
Shelby Miller 3.51 173.1
Bronson Arroyo 3.52 202
Jake Peavy 3.52 144.2
Ross Ohlendorf 3.53 60.1
Tim Hudson 3.53 131.1
Logan Ondrusek 3.54 55
Yoervis Medina 3.54 68
Kyle Lohse 3.55 198.2
Tom Gorzelanny* 3.56 85.1
R.A. Dickey 3.58 224.2
Dale Thayer 3.59 65
Sam LeCure 3.60 61
Josh Collmenter 3.60 92
Aaron Loup* 3.61 69.1
Jesse Chavez 3.62 57.1
Hyun-jin Ryu* 3.62 192
A.J. Burnett 3.62 191
Brian Matusz* 3.62 51
Gerrit Cole 3.63 117.1
Bryan Morris 3.64 65
Pedro Strop 3.66 57.1
Patrick Corbin* 3.71 208.1
Hiroki Kuroda 3.72 201.1
Matt Moore* 3.74 150.1
Brett Oberholtzer* 3.75 71.2
Dan Straily 3.75 152.1
Julio Teheran 3.76 185.2
Alexi Ogando 3.76 104.1
Anthony Swarzak 3.76 96
Shawn Kelley 3.77 53.1
Jered Weaver 3.79 154.1
Ryan Webb 3.81 80.1
Jaime Garcia* 3.82 55.1
Gio Gonzalez* 3.82 195.2
Matt Albers 3.83 63
Kris Medlen 3.84 197
Matt Garza 3.86 155.1
Jamey Wright 3.86 70
Craig Breslow* 3.88 59.2
Cody Allen 3.88 70.1
Preston Claiborne 3.89 50.1
Cole Hamels* 3.91 220
Rafael Soriano 3.91 66.2
A.J. Ramos 3.92 80
Bruce Chen* 3.93 121
Santiago Casilla 3.93 50
Todd Redmond 3.94 77
Rick Porcello 3.94 177
Bartolo Colon 3.95 190.1
Dan Haren 3.99 169.2
John Danks* 3.99 138.1
Craig Stammen 4.00 81.2
Tyler Thornburg 4.00 66.2
Fernando Rodney 4.00 66.2
Chad Gaudin 4.01 97
Will Harris 4.01 52.2
Tommy Milone* 4.01 156.1
James Russell* 4.01 52.2
Jarred Cosart 4.02 60
Robbie Erlin* 4.02 54.2
Troy Patton* 4.03 56
Scott Rice* 4.03 51
James Shields 4.03 228.2
Mike Leake 4.05 192.1
Jared Burton 4.05 66
Ubaldo Jimenez 4.05 182.2
Seth Maness 4.05 62
Jeremy Hefner 4.06 130.2
Vin Mazzaro 4.06 73.2
Tim Lincecum 4.07 197.2
Mat Latos 4.08 210.2
Junichi Tazawa 4.10 68.1
Eric Stults* 4.10 203.2
Garrett Richards 4.12 145
Adam Ottavino 4.12 78.1
Zack Wheeler 4.13 100
Andrew Albers* 4.15 60
Carlos Villanueva 4.16 128.2
Andre Rienzo 4.16 56
Jeff Samardzija 4.18 213.2
Jake Arrieta 4.20 75.1
Tom Wilhelmsen 4.21 59
Jim Johnson 4.21 70.1
Brad Peacock 4.22 83.1
Corey Kluber 4.22 147.1
Heath Bell 4.22 65.2
Wade Miley* 4.25 202.2
Michael Kohn 4.25 53
Martin Perez* 4.26 124.1
Ricky Nolasco 4.26 199.1
Matt Belisle 4.27 73
Charlie Morton 4.27 116
Jon Lester* 4.27 213.1
Scott Kazmir* 4.27 158
Roberto Hernandez 4.28 151
Jarrod Parker 4.28 197
Justin Verlander 4.29 218.1
Derek Holland* 4.31 213
Henderson Alvarez 4.31 102.2
Ryan Cook 4.32 67.1
Cesar Ramos* 4.33 67.1
Ivan Nova 4.33 139.1
Jeff Locke* 4.34 166.1
Andy Pettitte* 4.35 185.1
Ryan Pressly 4.36 76.2
Yovani Gallardo 4.36 180.2
Donovan Hand 4.36 68.1
Dillon Gee 4.38 199
Drew Storen 4.39 61.2
Alex Wood* 4.39 77.2
Tyler Lyons* 4.40 53
Nathan Eovaldi 4.41 106.1
Kevin Gregg 4.42 62
Wesley Wright* 4.43 53.2
Jose Quintana* 4.43 200
Anthony Varvaro 4.44 73.1
Steve Delabar 4.44 58.2
Jason Marquis 4.46 117.2
Oliver Perez* 4.48 53
Wily Peralta 4.48 183.1
Joe Kelly 4.49 124
Lance Lynn 4.49 201.2
J.C. Gutierrez 4.53 55.1
Roy Halladay 4.54 62
Jhoulys Chacin 4.54 197.1
C.J. Wilson* 4.55 212.1
Chris Rusin* 4.56 66.1
Erasmo Ramirez 4.56 72.1
Doug Fister 4.58 208.2
Aaron Harang 4.59 143.1
Hector Rondon 4.60 54.2
CC Sabathia* 4.60 211
T.J. McFarland* 4.62 74.2
Jeremy Hellickson 4.62 174
Sam Deduno 4.64 108
Nick Tepesch 4.64 93
Ian Kennedy 4.65 181.1
Wei-Yin Chen* 4.68 137
Robbie Ross* 4.68 62.1
Chris Perez 4.69 54
Jerome Williams 4.69 169.1
Trevor Cahill 4.70 146.2
Adam Warren 4.71 77
Hector Santiago* 4.75 149
Taylor Jordan 4.77 51.2
Ryan Dempster 4.79 171.1
Esmil Rogers 4.80 137.2
John Axford 4.80 65
Tim Collins* 4.81 53.1
Jeremy Guthrie 4.81 211.2
Tom Koehler 4.83 143
Matt Lindstrom 4.84 60.2
Felix Doubront* 4.86 162.1
Jorge De La Rosa* 4.89 167.2
Jason Vargas* 4.89 150
Paul Clemens 4.95 73.1
J.A. Happ* 4.95 92.2
Erik Bedard* 4.96 151
Paul Maholm* 4.97 153
Josh Outman* 4.99 54
Jacob Turner 5.00 118
Tyler Chatwood 5.00 111.1
Shaun Marcum 5.00 78.1
George Kontos 5.03 55.1
Jason Hammel 5.04 139.1
Brandon McCarthy 5.06 135
Zach McAllister 5.06 134.1
Brandon Morrow 5.13 54.1
Jonathon Niese* 5.17 143
Brandon League 5.17 54.1
David Phelps 5.18 86.2
Chris Capuano* 5.18 105.2
Clayton Richard* 5.21 52.2
Carter Capps 5.21 59
Ronald Belisario 5.26 68
Wilton Lopez 5.27 75.1
Dallas Keuchel* 5.28 153.2
Jonathan Pettibone 5.28 100.1
Juan Nicasio 5.34 157.2
Stephen Fife 5.34 58.1
Edwin Jackson 5.36 175.1
Mike Gonzalez* 5.39 50
Kevin Slowey 5.40 92
Josh Johnson 5.42 81.1
Phil Hughes 5.42 145.2
Mark Buehrle* 5.45 203.2
Bud Norris 5.46 176.2
Brian Duensing* 5.51 61
Josh Roenicke 5.52 62
Jeff Francis* 5.62 70.1
Scott Diamond* 5.64 131
Jordan Lyles 5.65 141.2
Justin Grimm 5.66 98
Tommy Hanson 5.67 73
Kevin Correia 5.67 185.1
Edinson Volquez 5.69 170.1
Lucas Harrell 5.72 153.2
Joe Blanton 5.73 132.2
Brandon Maurer 5.80 90
John Lannan* 5.85 74.1
Ryan Vogelsong 5.85 103.2
Jeremy Bonderman 5.87 55
Luis Mendoza 5.88 94
Kyle Kendrick 5.90 182
Jake Westbrook 5.93 116.2
Mike Pelfrey 5.95 152.2
Dylan Axelrod 6.11 128.1
Jon Garland 6.21 68
Wade Davis 6.22 135.1
Ross Detwiler* 6.24 71.1
Joe Saunders* 6.29 183
Alex Sanabia 6.62 55.1
Barry Zito* 6.63 133.1
Wade LeBlanc* 6.65 55
Kyle Gibson 6.70 51
Philip Humber 7.19 54.2
Pedro Hernandez* 7.32 56.2
Tyler Cloyd 7.73 60.1
Average 3.99 110.2

Not a huge difference, although we do see Uehara’s number go down, which is incredible, and Tanner Roark’s – the second-best pitcher by wRC/9 – nearly double. Also, Tyler Cloyd becomes worse, and is now the worst pitcher by almost half a run per nine innings. Kershaw’s wRC/9 goes up by a considerable amount, so much so that his xwRC/9 is now higher than his RA/9. All in all, however, xwRC/9 actually has a weaker correlation with RA/9 (an r^2 of .638) than wRC/9 does, so it isn’t as useful.

Now, logically, the people who outperformed their wRC/9 the most would have high strand (LOB) rates, and vice-versa. So let’s look at the ten pitchers who outperformed their wRC/9 the most and the ten who underperformed it the most. The ones who underperformed:

Pitcher IP LOB% RA/9 wRC/9 RA/9 – wRC/9
Danny Farquhar 55.2 58.50% 4.69 2.64 2.05
Charlie Furbush 65 64.40% 4.57 2.96 1.61
Casey Fien 62 69.40% 4.06 2.73 1.33
Andrew Albers 60 60.40% 5.10 3.78 1.32
Nate Jones 78 62.90% 4.62 3.31 1.31
Joel Peralta 71.1 70.20% 3.91 2.67 1.24
Addison Reed 71.1 68.90% 3.91 2.69 1.22
Tom Wilhelmsen 59 69.90% 4.27 3.07 1.20
Jesse Chavez 57.1 66.90% 4.24 3.04 1.19
Koji Uehara 74.1 91.70% 1.21 0.08 1.13

We can see that everyone here – except for Koji Uehara, who had the fourth-highest LOB% out of all pitchers with 50 innings – is below the league average of 73.5%. Only Uehara and Joel Peralta are above 70%. Clearly, a low LOB% makes you allow many more runs than you should. But what about Koji Uehara? How did he allow all those runs (10, which is not a lot, but his RA/9 was still way higher than his wRC/9) while stranding so many baserunners and allowing so few damaging hits? If you know, let me know in the comments, because I have no idea.

Now for the people who outperformed their wRC/9:

Pitcher IP LOB% RA/9 wRC/9 RA/9 – wRC/9
Rex Brothers 67.1 88.80% 2.14 3.23 -1.09
Donovan Hand 68.1 81.90% 3.82 4.97 -1.15
Stephen Fife 58.1 78.40% 4.47 5.63 -1.16
Jarred Cosart 60 85.90% 2.25 3.41 -1.16
Heath Bell 65.2 82.70% 4.11 5.35 -1.23
Chris Perez 54 82.30% 4.50 5.94 -1.44
Mike Gonzalez 50 80.30% 5.04 6.50 -1.46
Seth Maness 62 84.50% 2.47 4.00 -1.53
Adam Warren 77 84.70% 3.39 5.01 -1.62
Alex Sanabia 55.1 77.40% 5.37 7.29 -1.93

Just what you would expect:  high LOB%’s from all of them (each is above the league average).  Stephen Fife and Alex Sanabia are the only ones below 80%.
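
The comparison in the two tables can be sketched as a simple check: a pitcher whose RA/9 sits above his wRC/9 should have a below-average LOB%, and the reverse for a pitcher whose RA/9 sits below it. The rows are copied from the tables above; the function names are illustrative. Because the gaps here are recomputed from rounded figures, they can differ from the published column by 0.01.

```python
LEAGUE_LOB_PCT = 73.5  # league-average LOB% quoted in the text

# (pitcher, LOB%, RA/9, wRC/9), copied from the tables above
rows = [
    ("Danny Farquhar", 58.5, 4.69, 2.64),
    ("Koji Uehara", 91.7, 1.21, 0.08),
    ("Rex Brothers", 88.8, 2.14, 3.23),
    ("Alex Sanabia", 77.4, 5.37, 7.29),
]

def gap(ra9, wrc9):
    """Positive when the pitcher allowed more runs than wRC/9 suggests."""
    return ra9 - wrc9

def fits_lob_pattern(lob_pct, ra9, wrc9, league_lob=LEAGUE_LOB_PCT):
    """True when the sign of the gap matches the LOB% explanation."""
    underperformed = gap(ra9, wrc9) > 0
    return underperformed == (lob_pct < league_lob)

for name, lob_pct, ra9, wrc9 in rows:
    print(name, round(gap(ra9, wrc9), 2), fits_lob_pattern(lob_pct, ra9, wrc9))
```

Uehara is the only row of the four that fails the check, which is the puzzle raised above.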

So what does this tell us? I think it’s a better way to evaluate pitchers than runs or earned runs allowed since it eliminates context: a pitcher who gives up a home run, then a single, then three outs is not necessarily better than one who gives up a single, then a home run, then three outs, but the statistics will tell you he is. It might not be as good an evaluator as FIP, xFIP, or SIERA, but for a fielding-dependent statistic, it might be as good as you can find.

Note: I don’t know why the pitchers with asterisks next to their names have them; I copied and pasted the stats from Baseball-Reference and didn’t bother going through and removing the asterisks.


A Different Way to Look at Strikeout Ability

Mike Podhorzer has looked into the relationship between a batter’s average fly ball distance and his HR/FB ratio, and has found results that will allow others to more accurately project a hitter’s home run totals from year to year.

This got me thinking, which can be good or bad, but in this case the author’s labor produced a fruitful return. While a hitter’s HR/FB ratio can fluctuate indiscriminately from year to year, Podhorzer has shown that a batter’s average fly ball distance is a better indication of his true-talent power production. In the same light, my study looks at how a pitcher’s swinging strike rate (SwStr%) is a better indication of his strikeout potential than K/9.

My assumption was that K/9 and SwStr% have a strong relationship. But how strong is it? To find out, I took all qualified starter seasons from 2003 to 2013, which gave me a sample of 933 pitcher seasons, and ran a correlation between their SwStr% and their K/9. The results showed a strong positive correlation between SwStr% and K/9, to the tune of a .807 correlation coefficient and a .65 r^2.

[Figure: scatter plot of SwStr% (x-axis) against K/9 for qualified starter seasons, 2003-2013]

What is important to note is that there are very few pitchers present in the sample with a SwStr% above 13%, which may be symptomatic of something larger. Getting batters to swing and miss is difficult. The more often you can get a batter to swing and miss, the more valuable you are as a pitcher. As a result, the higher the SwStr%, the smaller the sample size becomes. For example, Johan Santana (2004) and Kerry Wood (2003) are the two lone dots farthest to the right on the graph, with SwStr% marks above 15%. Wow.

Once the relationship between SwStr% and K/9 became unmistakable, I calculated what a particular SwStr% translates into, as far as K/9, with the formula Y=68.473*x+0.8435, where x is the SwStr% and Y is the expected K/9, and got this chart:
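
The formula is easy to apply directly. This sketch assumes x is SwStr% written as a decimal fraction (0.10 for 10%), which is the only reading that produces believable K/9 values; the function name is made up for illustration.

```python
SLOPE = 68.473      # coefficients from the fitted line in the text
INTERCEPT = 0.8435

def expected_k9(swstr):
    """xK/9 for a swinging strike rate given as a fraction (0.10 == 10%)."""
    return SLOPE * swstr + INTERCEPT

for swstr in (0.06, 0.08, 0.10, 0.13):
    print(f"{swstr:.0%} SwStr% -> {expected_k9(swstr):.2f} xK/9")
```

Under that assumption, each extra percentage point of SwStr% is worth roughly 0.68 strikeouts per nine innings.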

[Chart: K/9 implied by the formula at various SwStr% values]

The next step is to take what we have discovered and apply it to a sample. The chart below shows each qualified pitcher for 2013 along with his SwStr%, xK/9, K/9, and K/9-xK/9. xK/9 is what we would expect a pitcher’s K/9 to be based on his SwStr%, and K/9-xK/9 shows us how much a pitcher over-performed or under-performed his SwStr% and xK/9. The first set of ten names are the pitchers who outperformed their xK/9 the most, and the second set of ten names are the pitchers who underperformed their xK/9 the most.

Screen shot 2013-10-03 at 2.44.59 PM
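A rough sketch of how such a leaderboard could be built follows. The three rows are placeholders I made up to show the mechanics, not real 2013 pitcher lines, and `k9_gap` is a hypothetical helper name; SwStr% is again assumed to be entered as a fraction.

```python
SLOPE, INTERCEPT = 68.473, 0.8435  # regression line quoted in the article

def k9_gap(swstr, k9):
    """Actual K/9 minus xK/9; positive means the pitcher beat his SwStr%."""
    return k9 - (SLOPE * swstr + INTERCEPT)

# Placeholder rows (name, SwStr% as a fraction, K/9). NOT real 2013 data.
pitchers = [
    ("Pitcher A", 0.080, 8.5),
    ("Pitcher B", 0.110, 7.0),
    ("Pitcher C", 0.095, 7.4),
]

# Biggest over-performers first, biggest under-performers last.
ranked = sorted(pitchers, key=lambda p: k9_gap(p[1], p[2]), reverse=True)
for name, swstr, k9 in ranked:
    print(f"{name}: K/9 - xK/9 = {k9_gap(swstr, k9):+.2f}")
```

Taking the first and last ten rows of the sorted list would give the two groups described above.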

The results show that Ubaldo Jimenez, Yu Darvish, and Jose Fernandez are the pitchers who outperformed their xK/9 the most in 2013. These three pitchers also have a great amount of deception and/or command (deception in Jimenez’s case, because no one has ever called Ubaldo a control artist). And while they may have outperformed their true talent in 2013 to an extent (they all had remarkable years), maybe that deception and control, which SwStr% does not take into account, leads to fewer swings by batters and more pitches taken for strikes, as opposed to swung at for strikes.

Perhaps xK/9 is more helpful when we look at pitchers who underperformed their SwStr%, like Jarrod Parker and Kris Medlen. Both of these pitchers had down years compared to what their projections suggested, but their xK/9s seem to be optimistic about their futures. Parker showed a 0.18 improvement in his K/9 from the first half to the second half of the season, while Medlen showed almost a full point of improvement, going from a 6.81 K/9 in the first half to a 7.67 K/9 in the second half.

While xK/9 may miss something (deception and command) when it comes to pitchers who outperform their SwStr%, it seems to find a reason for optimism when it comes to pitchers like Kris Medlen and Jarrod Parker, who have underperformed their SwStr% and strikeout potential.

Devon Jordan is obsessed with statistical analysis, non-fiction literature, and electronic music. If you enjoyed reading him, follow him on Twitter @devonjjordan.


Why Colby Rasmus Should be Considered One of the Game’s Great CFs

When Colby Rasmus was dealt to the Blue Jays from the St. Louis Cardinals in a blockbuster trade on July 27, 2011, there were mixed emotions in Toronto regarding the deal. On the one hand, he was (and arguably still is) just a few years removed from being a blue-chip, five-tool prospect with power and plus defense. On the other, there was the much-publicized family feud with then-Cards manager Tony La Russa, the seemingly lethargic attitude at bat, in the field and in media interviews (a reputation unaided by his laid-back Southern drawl), the strikeouts, and most of all, the unshakeable stigma of not living up to his foretold potential.

The overwhelming consensus, both when the deal was struck and afterward, was one of relative indifference, and with good reason. After coming over from St. Louis in 2011 he didn’t exactly set the world on fire in a 35-game stint with the Blue Jays (.173/.201/.316). His slash line from last year (.223/.289/.400) seemed to be building on 2011, and the mounting strikeouts failed to endear him to a disgruntled Blue Jays fan base hungry for something to cheer about. With his average at .225 on April 28, this year looked to be more of the same. It has become increasingly clear, however, that something has changed this season. Before being sidelined by an oblique injury, Colby was putting together an impressive season, despite receiving next to no credit for it. He hadn’t missed a step after coming off the DL either, homering in each of his first four games back and becoming just the 10th different Blue Jay to do so. Unfortunately, his return lasted just six games before Rasmus was sidelined again by an errant Anthony Gose warm-up throw to the face.

Let’s start with the traditional measures: a .276 average, 22 home runs, and 66 RBI to go along with an .840 OPS in 118 games. He is now one of only four Blue Jays centre fielders all-time to hit 20 bombs in back-to-back seasons (the others are Vernon Wells, Jose Cruz Jr., and Lloyd Moseby). He will finish one off his career high of 23 home runs and nine off his best mark of 75 RBI, both set last season. These totals would obviously be higher had Rasmus not missed a month with an oblique strain and even more time because of his facial injury. As of September 19 (before going down for a second time and after missing a month), he was near the top of many major statistical categories for AL centre fielders: second in home runs (22), slugging (.507), and OPS (.845); third with 66 runs driven in; and fourth with 49 extra-base hits. Last year, Colby went 24.2 plate appearances between home runs; this season he was at 20.8, which practically equates him with Baltimore star Adam Jones (who went deep every 20.9 plate appearances). Colby’s 2013 home run rate per game is also superior to those of fellow centre fielders Carlos Gomez, Mike Trout, Shin-Soo Choo, and Andrew McCutchen: he went deep every 6.6 games in 2012 and every 5.4 this year. Even with the time he’s missed, only Jones, McCutchen, Trout, and Gomez have driven in more runs as centre fielders in Major League Baseball, and Rasmus’s .840 OPS is surpassed only by Trout, McCutchen, and Choo. Those are impressive stats, and equally impressive company to be grouped with.
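The rate stats above are simple division, and a quick sketch makes them easy to check. The helper name `per_home_run` is mine; the inputs are the 22 home runs, 118 games, and 20.8 plate appearances per homer quoted in the paragraph. The plate appearance total is only implied by that quoted rate, not stated in the text.

```python
def per_home_run(opportunities, home_runs):
    """Average number of games (or plate appearances) per home run."""
    return opportunities / home_runs

# 2013 totals quoted above: 22 home runs in 118 games.
games_per_hr = per_home_run(118, 22)
print(f"{games_per_hr:.1f} games per home run")

# Working backwards from the quoted 20.8 PA per home run gives an
# approximate (implied, not stated) plate appearance total.
implied_pa = 20.8 * 22
print(f"about {implied_pa:.0f} plate appearances")
```

The first figure rounds to the 5.4 games per home run cited above.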

Even given these numbers, there are always those who will refuse to judge a player’s worth and contribution without the use of sabermetrics, so in fairness that side must be investigated as well. I cannot pretend to understand the drawn-out calculations, though I understand what the numbers mean. I will start with Baseball-Reference’s WAR data as summarized by ESPN. Colby Rasmus has a wins above replacement of 4.8, fifth best of any centre fielder in baseball. Simply put, the number is great; to put it in perspective, he trails just Trout, Gomez, McCutchen, and Jacoby Ellsbury. He is ahead of players considered league-wide to be great, or at least above average: Adam Jones, Shin-Soo Choo, Austin Jackson, Desmond Jennings, Andre Ethier, Matt Kemp (in limited time), Michael Bourn, Denard Span, and Curtis Granderson (in limited time, and possibly over the hill, I know), to name a few prominent ones. FanGraphs also puts Rasmus at 4.8 WAR, and under its rating scale either figure would qualify him as an All-Star (a player with 4-5 WAR is deemed All-Star-worthy).

As we have seen, Rasmus obviously brings quite a bit to the table offensively, but what about defensively? What if I were to suggest that he has a better defensive WAR and range factor than Mike Trout? Or that there are only three players with over 100 starts in centre (Leonys Martin, Ellsbury, and Gomez) who have a greater dWAR than Rasmus? And only three with a better range factor? These are all, in fact, true statements. He sits at 1.6 dWAR compared to -0.8 for Trout and has a 2.77 range factor compared to Trout’s 2.61. Obviously Trout’s oWAR (10.1) and WAR (9.2) are off-the-charts good, and this is not an attempt to bolster Colby Rasmus at the expense of Mike Trout. But a point needs to be made, so bear with me. Trout’s dWAR was 2.2 last year in 108 starts in centre and, as mentioned, it is -0.8 this year in 106 starts. His range factor was 2.7 last year and 2.61 this season. He had 268 total chances in 886 innings in 2012; in 2013, he had only 273 in 937 innings in centre field. He is less valuable defensively to the Angels, apparently has less range, and is getting to fewer balls per inning.

2013 Mike Trout vs. Colby Rasmus

Player         WAR    dWAR   oWAR   Range Factor
Mike Trout     9.2    -0.8   10.1   2.61
Colby Rasmus   4.8     1.6    3.5   2.77
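To see why Trout's chances-and-innings totals point to less range, it helps to put them on a per-nine-innings footing. The sketch below does that with the figures quoted above. Note the assumption: range factor is conventionally built from putouts plus assists, while total chances also counts errors, so these results only approximate the 2.7 and 2.61 range factors cited. The function name is my own.

```python
def chances_per_nine(total_chances, innings):
    """Fielding chances handled per nine innings played."""
    return 9 * total_chances / innings

# Trout's centre field totals as quoted in the article.
trout_2012 = chances_per_nine(268, 886)
trout_2013 = chances_per_nine(273, 937)
print(f"2012: {trout_2012:.2f}  2013: {trout_2013:.2f}")
```

Even though the raw total went up by five chances, the per-nine rate dropped from roughly 2.72 to roughly 2.62 because he played about 50 more innings.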

 

However, a crucial point remains: Trout made a name for himself (and rightfully so) last year as an elite defensive player to complement his superb offensive skills. His reputation as a defensive wizard has stuck with him into this season, and there has been no mention of any defensive regression. Instead he is heralded as a possible MVP candidate despite the fact that the Angels will miss the postseason, as they did last year. And just as Trout’s reputation as an above-average fielder has outlasted his performance (at least through the end of 2013), the opposite has been true for Rasmus. His status as an underachieving strikeout machine has overshadowed his amazing progression as an all-around player. Consider the power, the average, the runs driven in, and the OPS, combined with the much-improved wins above replacement numbers (overall, offensive, and defensive). His overall WAR of 4.8 is a career high by over one full win (3.6 in 2010). Defensively he has improved every season since 2010 and now sits at 1.6 dWAR. Offensively he is at 3.5 wins above replacement and has improved by at least two wins every year in that category since becoming a Blue Jay.

Colby Rasmus as a Blue Jay

Year   WAR    dWAR   oWAR
2011   -1.0   -0.0   -0.9
2012    1.7    1.0    1.1
2013    4.8    1.6    3.5

 

I think it is safe to say that he has become a more well-rounded player but, more importantly, he is on an upward trajectory. Conversely, take the highly coveted, soon-to-be free agent Shin-Soo Choo, who at age 30 is seemingly regressing defensively (a career-worst -1.9 dWAR in each of the last two years). His offensive numbers are impressive, don’t get me wrong, but it remains to be seen how much longer he can be an effective outfielder. A .424 on-base percentage with 20+ homers is nice, but Baseball-Reference reveals that his WAR (4.0) is still lower than Rasmus’s (4.8), with the latter seemingly on an upswing. I do think Choo is good, but it is all but certain that he will be overpaid, and by comparison Colby Rasmus will look like a far better option.

I believe I have put forth at least a half-decent argument that Colby Rasmus is extremely valuable and even elite. I argue that his numbers rival the best in the game at his position and that he should get a little more credit for his impressive body of work. Some would point out that I have not examined his numbers from all possible perspectives, which I plan to do now using various data from FanGraphs. A comparison of this season with his 2011 and 2012 campaigns reveals ominous similarities. He struck out 29.5% of the time in 2013, which is actually up from 23.8% last year, and his walk rate rose by an insignificant 0.6 percentage points (he still walks only 8.1% of the time). His BB/K ratio is also down to 0.27 from 0.32 in 2012 and 0.43 in 2011. He swung at 29.3% of pitches out of the zone in 2013 compared to 31.8% last year, and while that perhaps shows a bit more patience, this season’s number equals his career average exactly. He has swung at basically the same share of pitches inside the zone this year and last, and 2013’s mark of 67.2% is slightly below his career average of 70.6%. As for contact: while his contact rate on pitches inside the zone is almost exactly the same as in 2012, his contact rate on pitches outside the strike zone fell to 55.4% from 62.2% last season. So is he simply getting lucky by swinging and missing more often, thereby avoiding weak outs and getting a shot at the next pitch? There may be some truth to that considering (as we have seen) that he swings at almost the same share of pitches out of the zone as last year. On the other hand, he did strike out more in 2013 than in 2012, which may discount the luck idea. The main point here is that there does not seem to be much deviation between this year and the two preceding it, so there must be another explanation for his success.
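The BB/K figures can be sanity-checked directly from the walk and strikeout rates, since both rates share the same plate appearance denominator. This is a quick sketch using only numbers quoted above; the 2012 walk rate is not stated outright, so it is derived here from the 0.6-point increase, and the helper name is mine.

```python
def bb_per_k(bb_pct, k_pct):
    """BB/K ratio from walk and strikeout rates (same PA denominator)."""
    return bb_pct / k_pct

# 2013 rates as quoted in the article.
bbk_2013 = bb_per_k(8.1, 29.5)

# 2012: walk rate implied by the quoted 0.6-point rise, strikeout rate as quoted.
bbk_2012 = bb_per_k(8.1 - 0.6, 23.8)

print(f"2013 BB/K: {bbk_2013:.2f}  2012 BB/K: {bbk_2012:.2f}")
```

Both results line up with the 0.27 and 0.32 ratios cited, so the decline in BB/K is driven almost entirely by the jump in strikeouts.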

Based on these findings, one might think Rasmus would have had a year in 2013 similar to 2012 and 2011. But, as we have clearly seen, the numbers do not bear this out. So what is different? BABIP. Rasmus has the worrisome distinction of having an unusually high batting average on balls in play. BABIP can have a profound effect on a player’s batting average, and a player with an unusually high or low BABIP will likely regress toward his career rate the following season. Proponents of sabermetrics will also point out that a very high BABIP may suggest a fluky season. As for Rasmus, his batting average on balls in play was .356 this season compared to .259 last year and .267 in 2011. During his breakout 2010 campaign, it was .354. These are not small discrepancies. He hit .276 both this year and in 2010, and .223 and .225 last year and in 2011, respectively. There is a definite link between his BABIP and his batting average. He has averaged .298 over his career in that department, which is considered normal.

So for the most part, he has been either well above or well below it throughout his five years in the big leagues. Is he just having an especially good year? We won’t know until next season whether he will regress, but there are a few reasons to think he will be fine. Given his ability, his 2010 and 2013 numbers are closer to what people expect than the years in between. Maybe a .356 batting average on balls in play isn’t outrageously high, and maybe 2012 was the fluky year. This season Rasmus hit a greater percentage of balls in play for line drives (22.0%) than ever before in his career (average: 19.5%). Also, more of his fly balls left the yard (17.3% this season, up from 13.2% last year). So maybe he is hitting the ball harder, and a few extra fly balls are hanging up just long enough to clear the fence. ESPN’s Home Run Tracker, however, considers just three of his home runs to have had “Just Enough” distance, while the other 19 were no-doubters or cleared the wall by “Plenty”. Another interesting point is that Mike Trout’s batting average on balls in play over the last two years is .379. Will he be able to keep that up? It is as much a question for Trout as it is for Rasmus.
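For readers less familiar with the stat at the centre of this discussion, BABIP is conventionally computed as (H - HR) / (AB - K - HR + SF). Here is a minimal sketch of that formula. The stat line passed in is a made-up example for illustration only, not Rasmus's actual totals, which are not given in this article.

```python
def babip(hits, home_runs, at_bats, strikeouts, sac_flies):
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    return (hits - home_runs) / balls_in_play

# Hypothetical stat line, for illustration only.
example = babip(hits=150, home_runs=20, at_bats=550, strikeouts=120, sac_flies=5)
print(f"BABIP: {example:.3f}")
```

Because strikeouts and home runs are removed from the denominator, a high-strikeout hitter has fewer balls in play, and each extra hit that drops in moves the figure more.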

This analysis is of course not definitive; it merely offers an alternative to the fluke theory. It is possible that Rasmus can repeat his stellar 2013 season. One thing is clear though: this year, he was up there with the best centre fielders in the business. This was shown using traditional measures as well as new-age sabermetrics. He was near the top in most significant offensive and defensive categories, and had he not been hurt he would have set career highs and perhaps received a little more (and well-deserved) credit. He flew under the radar, and it’s unfortunate that he is not appreciated as he should be. If he has a good 2014, I believe he will finally shake the lackadaisical, under-achieving, strikeout-machine stigma and instead be seen as a quietly confident, budding star with an ability to hit for average and power to go along with graceful and effortless defense.