## The Case for No Starting Pitchers in the National League

I’ve watched many a baseball game over my lifetime (that’s 50+ years), and I’ve cringed every time I see a National League manager send his starting pitcher up to bat any time prior to the seventh inning. Especially with runners on base! Doesn’t he know that pitchers can’t hit? Doesn’t he know that if he would just pinch-hit for the lame-batting starter he’d improve his team’s chances of winning?

So, after years of pondering this problem for five seconds at a time every couple of days, I decided to see if I could build a solid quantitative case for never letting a pitcher come to the plate for a National League team (obviously this is not an issue for the American League with their designated hitters). How would this change the look of the team’s pitching staff? And more importantly, how many more games would a team expect to win in a season if they adopted a “pitchers never bat” strategy?

The answer to the first question is pretty easy. The staff would “look” different. There would be no more “starting pitchers.” A team’s pitching staff would consist only of “relievers.” Sure, one of the “relievers” would throw the first pitch of the game and could technically be called a “starter,” but given that he’ll be taken out of the game as soon as his spot in the batting line-up comes up, he’s effectively a “reliever,” just like the other 10 or 11 guys on the staff.

Now, the conventional wisdom would say that the current starting pitchers, especially the “aces,” get in a groove, and can give you six or seven solid innings. Why would anyone take them out of the game in the second or third inning? Well, let’s do a “cost-benefit” analysis and see if we can make a case for “The Pitchers Never Bat” strategy.

Key Components of the Case:

The two primary components of the analysis are 1) how many more runs would a team expect to score in a season by pinch-hitting for every pitcher, and 2) how many more runs would a team expect to give up in a season because their starting pitchers are no longer going six, seven, or more innings in an outing? Or, maybe the team adopting such a strategy would actually give up FEWER runs per year by giving up on the century-old strategy of planning for the starting pitcher to pitch deep into the game.

A third component of the analysis could include the benefit of being able to choose from any of the team’s entire staff (probably 11 or 12 pitchers) and use only the ones that look like they’ve got their “stuff” while warming up before the game, instead of sticking with the “starter” who is scheduled to pitch today because it’s his turn in the “rotation.”

A fourth component of the analysis could include the benefit a team could achieve because the other team can no longer stack their starting batting order with a lot of lefties (to face a right-handed starter), or with a lot of righties (to face a left-handed starter), because the team with no “starters” will pinch-hit for their first pitcher after one, two, or three innings. So, in total, the “handedness battle” tilts slightly more in favor of the team implementing the new strategy.

A fifth component could include the cost (or benefit) of reducing the size of the pitching staff by one or two, and adding one or two more everyday players, who would be needed to pinch-hit in the early innings.

A sixth component could be an added benefit that batters will not be able to get “used to” a pitcher by seeing them multiple times in a single game. Under the new strategy batters will see each pitcher once, or, at most, twice in a game.

I’m going to focus on the two primary components above, and leave the lesser components alone for now. Perhaps others can weigh in on how to quantify the potential impacts of these changes.

Component #1: How much more offense will the “Pitchers Never Bat” strategy create?

This is the easiest of the components to quantify. I will use the wOBA (weighted On Base Average) statistic as defined and measured by FanGraphs to evaluate this component. Let’s start with some basic information and rules-of-thumb.

Using data from the National League for the 2015 season, I find that pinch-hitters had a wOBA of .275 across the entire league, while pitchers, when batting, had a wOBA of just .148 across the entire league. The difference in wOBA between pinch-hitters and pitchers is .127 (that’s .275 minus .148). Note that all position players in the NL combined for an average wOBA of .318 in 2015. I’m assuming that our new pinch-hitters won’t get anywhere near that figure, but will be comparable to the 2015 pinch-hitters, who came in way lower, at .275.

Now, let’s assume we can replace every pitcher’s plate appearance (PA) with a pinch-hitter. This improvement of .127 in wOBA needs to be applied 336 times per season, because that was the average number of times that a National League team sent their pitchers up to the plate in 2015. And lastly, we need to know two rules of thumb from FanGraphs that are needed to complete the analysis of the first component: 1) every additional 20 points in wOBA is expected to result in an additional 10 runs per 600 plate appearances, and 2) every 10 additional runs a team expects to score in a season translates into one additional win per year. OK – so, let’s do the math:

If 20 additional points of wOBA translates into 10 runs per 600 PA, then our new pinch-hitters who are now batting for pitchers will provide the team with 63.5 incremental runs per 600 PA (which equals 127 / 20 * 10). And since these pinch-hitters will be coming to the plate 336 times, not 600 times, we need to scale the 63.5 incremental runs per 600 PA down to 35.6 incremental runs per season (which is 336 / 600 * 63.5).

Finally, the last step is to take our 35.6 incremental runs per season and translate that into incremental wins per year using the rule-of-thumb that ten runs equates to one win. Therefore, our 35.6 extra runs results in an expected 3.6 incremental wins per year. That’s a decent-sized pick-up in expected wins.
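The arithmetic above can be reproduced in a few lines. This is only a sketch of the two rules of thumb as stated in the text; the function and constant names are made up for illustration, and the inputs are the 2015 NL figures quoted above.

```python
# Inputs quoted in the article (2015 National League).
PINCH_HITTER_WOBA = 0.275
PITCHER_WOBA = 0.148
PITCHER_PA_PER_TEAM = 336

# Rules of thumb quoted in the article.
RUNS_PER_20_WOBA_POINTS_PER_600_PA = 10
RUNS_PER_WIN = 10


def incremental_runs(woba_gain, plate_appearances):
    """Convert a wOBA gain into runs over a given number of plate appearances."""
    woba_points = woba_gain * 1000  # .127 of wOBA is 127 "points"
    runs_per_600_pa = woba_points / 20 * RUNS_PER_20_WOBA_POINTS_PER_600_PA
    return runs_per_600_pa * plate_appearances / 600


woba_gain = PINCH_HITTER_WOBA - PITCHER_WOBA
runs = incremental_runs(woba_gain, PITCHER_PA_PER_TEAM)
wins = runs / RUNS_PER_WIN

print(f"Incremental runs per season: {runs:.1f}")
print(f"Incremental wins per season: {wins:.1f}")
```

Swapping in a different pinch-hitter wOBA or PA count shows how sensitive the 3.6-win estimate is to those two inputs.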

OK, so now, what about the pitching staff? Will replacing the conventional pitching staff with a staff consisting of no starters and all relievers cause the runs allowed to increase, and if so, by how much? Enough to offset our 3.6 extra wins that we just picked up on offense?

Component #2: How many more runs will pitchers give up using the “Pitchers Never Bat” strategy?

Imagine, for the moment, that a GM is to build his pitching staff from scratch. (We’ll worry about how to transition from a conventional staff to an all-reliever staff later.) And let’s just assume he’ll pick just 11 pitchers. (Most NL teams use 12-man staffs while some use 13, so that will give the team one or two additional position players.) Currently, starting pitchers typically throw 160-200 innings per season, and relievers tend to throw 50-80 innings per season. But with the new all-reliever strategy, and using only 11 pitchers, each of our new guys will need to average around 130 innings each, with perhaps some pitching as much as 160, and some as low as 100 innings per year. So, the GM is looking for 11 guys who can each contribute 100-160 innings per season. Each outing will be for about one to three innings for each pitcher. How will they fare?
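As a rough check on the “around 130 innings each” figure, here is a sketch of the workload arithmetic. It assumes a 162-game schedule at nine innings per game (ignoring extra innings and unplayed bottom-of-the-ninth halves), and the two-inning average outing is a hypothetical midpoint of the one-to-three-inning range above.

```python
GAMES_PER_SEASON = 162
INNINGS_PER_GAME = 9           # assumption: no extra innings, no skipped half-innings
STAFF_SIZE = 11                # the all-reliever staff proposed in the text
ASSUMED_INNINGS_PER_OUTING = 2 # hypothetical midpoint of the 1-3 inning range

total_innings = GAMES_PER_SEASON * INNINGS_PER_GAME
average_innings = total_innings / STAFF_SIZE
average_outings = average_innings / ASSUMED_INNINGS_PER_OUTING

print(f"Team innings to cover: {total_innings}")
print(f"Average innings per pitcher: {average_innings:.1f}")
print(f"Average outings per pitcher: {average_outings:.0f}")
```

Under these assumptions each pitcher would appear in roughly 40% of the team's games, which is worth keeping in mind when judging whether the workload is realistic.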

Let’s look at the National League’s pitchers for 2015. Starting pitchers had an aggregate WHIP (Walks Plus Hits per Inning Pitched) of 1.299, while relievers, in total, recorded an identical WHIP of 1.299. So my takeaway from this is that the average starter was just as good (or bad) as the average reliever. From this, I am going to take a leap of faith, and assume that a staff of 11 new-style relievers could be expected to perform equivalently. (And that doesn’t even factor in some of the lesser elements of the new strategy, as mentioned above, such as Components 3 and 4 of the analysis.)

From this, albeit simplified, evaluation of Component #2, I estimate that a team moving to an all-reliever pitching staff will have an expected change in Runs Allowed of zero, and therefore the change will neither offset, nor supplement, the offensive benefit evaluated in Component #1.

Conclusion and Final Thoughts

In summary, using the two primary components of my analysis, I estimate that adopting a “Pitchers Never Bat” strategy in the National League (a.k.a. an “All Reliever Pitching Staff” strategy) will improve a team’s offense by an expected 36 runs per year, which will increase the team’s expected win total by 3.6 games. I estimate that the impact on runs allowed will be near zero. Some lesser elements, Components #3 through #6, could also add some additional value to the strategy.

Implementing the strategy does not necessarily need to be a complete, 100% adoption of the “pitchers never bat” rule. Modifications can be made. Perhaps a pitcher is doing well through two innings and comes to bat with two out and no one on base. In this case the manager could let the pitcher bat, so that he can stay in and pitch another two or three innings. This would change the name of the strategy to something like the “Pitchers Very, Very Rarely Bat” strategy.

As far as transitioning to an all-reliever staff from a conventional staff, it could be done over time, or only in part, such that a team could maintain, say, its two top aces, and complement them with eight or nine relievers. This way, the aces could pitch as they do now, going six-plus innings, every fifth day, while limiting the “Pitchers Never Bat” strategy to the three out of the five days when the two starters are resting.

Finally, let’s try to put a dollar value on this new strategy. The guys at FanGraphs, and other places, have tried to estimate how much teams are willing to pay for each additional win. Without going into all the various estimates and approaches at trying to answer that question, let’s just go with a simple $8 million per win. I’m sure it could be argued to be more or less, but let’s just put $8 million out there as a base case. If that’s true, a 3.6-win strategy, such as the “Pitchers Never Bat” strategy, is worth about $29 million per year. Go ahead and implement the strategy now, and, if it takes, say, three years before any of the other NL teams catch on, you’ve just picked up a cool $87 million (3 * $29 million).
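The dollar figures follow directly from the win estimate. A minimal sketch, using the $8 million per win base case from the text (the function name and its parameters are illustrative, not anything FanGraphs publishes):

```python
def strategy_value_millions(extra_wins, dollars_per_win_millions=8.0, years=1):
    """Value of extra wins, in millions of dollars, at a flat price per win."""
    return extra_wins * dollars_per_win_millions * years


annual = strategy_value_millions(3.6)
three_year_rounded = 3 * round(annual)              # the article rounds to $29M first
three_year_exact = strategy_value_millions(3.6, years=3)

print(f"Annual value: ${annual:.1f}M (about ${round(annual)}M)")
print(f"Three-year value: ${three_year_rounded}M rounded, ${three_year_exact:.1f}M unrounded")
```

The small gap between the rounded and unrounded three-year totals comes only from rounding the annual figure before multiplying.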

And if the other components of the analysis (#3 through #6) are quantified and it can be determined that they add another 0.5 wins per year, which I think is quite doable, then we can get the total up to 4.1 wins per year, for a value of $33 million per year, or just around a cool $100 million over the first three years. And that’s how you make $100 million without really trying!

## Predicting the Next 300-Game Winner

With the special attention pitchers receive today, such as pitch counts, innings limits, as well as the host of PITCHf/x data that can notify teams of when a pitcher is fatigued, it seems like the days of 300-game winners have come and gone. And for the most part, this is true. We’ve seen pitchers be shut down during their earlier years to prevent injuries, such as the Nationals keeping a close eye on Stephen Strasburg.

When we think of 300 wins, the math isn’t that hard. It’s some combination of 15+ seasons of 15+ wins over an entire career (20 seasons of 15 wins apiece, for example). Let’s dive in to what further breaks down these pitchers.

I gathered data on pitchers who finished their careers after 1980 as well as pitchers younger than that; I did this to avoid looking at pitchers such as Cy Young who are a little tough to compare to the modern day, with rule changes and the different run-scoring environments. In my query, I looked at pitchers with at least 250 wins. This gave me more data, and since 250-win pitchers are reasonably close to 300, it will allow me to get at what exactly creates a pitcher of this caliber. My list included 19 names:

Greg Maddux, Roger Clemens, Steve Carlton, Nolan Ryan, Don Sutton, Phil Niekro, Gaylord Perry, Tom Seaver, Tom Glavine, Randy Johnson, Tommy John, Bert Blyleven, Fergie Jenkins, Jim Kaat, Mike Mussina, Jamie Moyer, Jim Palmer, Andy Pettitte

Some of these guys were absolute iron men, pitching over 5000 innings in their career. Maddux did this, as well as Carlton, Ryan, and Sutton.
Most of this group barely reached 12 wins per season, showing that they reached the 300-club with longevity, not necessarily dominance. The other guys on this list, by default, either had higher win totals or pitched forever, but without racking up a ton of innings (Kaat, Moyer). Surprisingly, or perhaps not, only four of the 19 pitchers did not pitch for 20 seasons, so again, dominance might not be the key factor; longevity is.

I then looked at where these pitchers stood at age 30. Age 30 seems to be about a halfway point, but the data indicates otherwise. In fact, only three of these 19 pitchers had at least 150 wins at 30. This again drives home the point that these pitchers did not necessarily have to be untouchable every single year they pitched; they just had to stay healthy and pitch for a long, long time. At the same time, the average pitcher on this list had 115 wins at 30, so they did need to have a productive youth in terms of racking up wins.

Here is a table displaying the careers of our 19 pitchers:

The amazing part, at least in my opinion, is that these pitchers almost seemed to get better with age, at least in terms of wins. I know that wins is not a good stat for tracking the effectiveness of pitchers, but since we are talking about the 300-win club, it is what we have in front of us. Anyway, 17 of these 19 pitchers had more wins after 30 than they did before. Again, this hammers home the idea that longevity and durability are more important than complete dominance. Yes, you have to be a good, if not great, pitcher, but you also have to stay healthy.

So when looking at current pitchers that possibly have a chance at 300, I filtered through active pitchers fulfilling a few different qualifications. First, the pitcher must average at least 190 innings pitched per year, including years of injuries (this helps get at longevity and durability).
Second, the pitcher must average at least 12 wins per year. I came up with a group of pitchers who were close to matching these requirements. From this list of 14 pitchers, I think eight or so have the best chance of eclipsing 300.

Here is a table of possible contenders:

This list includes: Clayton Kershaw, Chris Sale, Justin Verlander, Madison Bumgarner, David Price, Rick Porcello, Jon Lester, and Felix Hernandez. CC Sabathia, although at 223 career wins, does not make this list, since I don’t think he has 5-8 more seasons of decent pitching in front of him. I will go into each pitcher in more detail to describe what each pitcher needs to do to have a chance.

I’m going to start with Lester. Lester is currently at 146 wins, with 2003 regular-season innings pitched. He has been great through his first 11 seasons, in nine of which he was a full-time starter. In those nine seasons, he failed to pitch 200 innings just once, when he posted 191.2 innings pitched. He has been an iron man, and at age 32, the recipe is simple. He just needs to stay healthy and he needs his game to age well. This is going to be a repetitive theme, but to be honest, that’s what we would expect. Things helping Lester? Well, playing for the Cubs is one. Not only do they have a great defense, but they also create great run support, which can help Lester pick up a lot of wins. He was 19-5 this past year, matching his career high in wins, set in Boston in 2010.

Now on to Justin Verlander. After an injury-riddled 2015, Verlander was great this year, posting a 16-9 record and an ERA of 3.04 (FIP of 3.48). Currently, he sits at 173 wins and is 33 years old. I mentioned his injury struggles in 2015. He only pitched 133 innings. In his 11 years as a full-time starter, that was only the second time he failed to reach 200 innings pitched.
People may worry that Verlander is starting to lose his velocity, which could diminish his effectiveness, but in 2016, he struck out batters at a career-high rate and also had a career-best strikeout-to-walk ratio. Verlander is back with the elite, and if he can avoid injury trouble, he deserves to be in the discussion for a possible 300-win flirtation.

I’ll now move on to Clayton Kershaw. Kershaw has been the best pitcher in baseball for the past five years, and has struggled with injuries only this past year, when he hit the DL with back issues. He still picked up 12 wins, and looked like peak Kershaw when he came back. Kershaw continues to strike out hitters and not allow walks, and in his shortened 2016, he posted a career-best FIP. Kershaw currently sits at 126 wins, and is 28 years old, in the middle of his prime. I think there are two factors that could keep Kershaw from getting close. The first one is his back. The Dodgers shut Kershaw down for half the year, and hopefully it heals, but if it is one of those lingering injuries that can also affect his timing and delivery as well as his overall health, his game won’t age well enough for him to hit 300. The second is run support; he should be getting more wins. I’m not sure this will be a big factor now that the Dodgers have Andrew Friedman at the helm, but if he cannot get the run support he needs, that could lead to two or three fewer wins every year.

Chris Sale is next. Sale sits at 74 wins and is 27. He has some work to do. He has been relatively healthy, however, over his five full years as a starter. I think the best bet for Sale is to get out of Chicago, or at least the White Sox, and get on a team that can give him some good defense and offense. His win totals just aren’t high enough, but he is young enough that if he finds a new team and can age well, he might be able to hit 250.

I’ll do Bumgarner next. He really hasn’t had any injury trouble in his six years as a full-time starter.
He is 27 and has 100 wins. He is a little harder to project, but I would say he’s got a better shot than Sale. I mean, he is already at 100 and only 27. Kershaw might have a leg up on him, but MadBum has been able to stay healthy. To be honest, Kershaw had been healthy too before this year, which somewhat shows that pitching 20 full seasons does not happen too often. Anyway, Bumgarner hasn’t quite been as dominant as some of the other names on this list, but he has been very good, and has stayed healthy. He is on a solid team with a good defense. The conditions are right; he just needs to age well and stay healthy. I still like Kershaw’s odds a little more, but Bumgarner’s are not far behind.

Now I’ll move on to David Price. Price is 31, has 121 wins, and has pitched relatively healthy for seven full seasons. He is on the Red Sox now, and although their poor defense won’t help some of his pitching metrics, they should give him the run support he needs. He wasn’t terrible this year; I have a feeling people think he fell off the map. He had 17 wins, an ERA of 3.99 and a FIP of 3.60. His ERA and FIP were at career highs, but the FIP really wasn’t too far off what we’d expect. I’d credit the higher ERA to playing in Fenway with not the best defense behind him. Price may not be as dominant as he once was, but the Red Sox should give him support. He might be a little behind pace, but he could be the next CC Sabathia or Mike Mussina, where upon retirement, we say, “I didn’t realize he had 260 wins!” For the record, I doubt CC gets there, but the point is that if Price can stay healthy and moderately effective on a team that will support him, he may be able to move up in the wins chart. Will he hit 300? I don’t see it, but realistically, I’m not sure any of these guys will.

Now I’ll move on to the other Red Sox pitcher on this list: Rick Porcello.
Porcello had a modest beginning in Detroit, but his FIP always seemed to outperform his ERA, so he has that going for him. Porcello is only 27 and somehow has 107 wins already. Although he is on the Red Sox, who can support him, Porcello really hasn’t been able to stay healthy over his career, and has eclipsed 200 innings pitched in a season only twice: 2014 in Detroit, and this past season in Boston. Still, he is young, and if he can hang around awhile, he might be able to pick up 100 wins or more if he can stay decent on an offensive team. Again, he doesn’t need to contend for the Cy Young, but he has to stay relatively effective, so he keeps his starting spot and racks up wins.

Finally, I move on to my dark horse, King Felix Hernandez. Felix is only 30, but has been a full-time starter for 11 years. He sits at 154 wins. I feel like as a baseball community, we tend to forget about Felix. He has been very durable, although he hit the DL this past season by injuring his calf when celebrating a win. But hey, forgive the guy; he plays in Seattle, which hadn’t given him much help until recently. He is my dark horse on the list. He now plays on a good Seattle team, so he should be able to pick up wins. He might not be as good as he once was, but if he can stay effective, he has the best chance of anyone on this list. He can age well, he has stayed healthy, and he now plays on a winning team. The conditions are there, and I think he has the best shot of anyone on this list.

Realistically, if I had to choose between none of them winning 300 or one of them winning, that would be a much harder choice than picking one out of the group. Realistically, do I think any of these guys have a shot? Sure, but a shot is a lot different than actually getting there. Who knows, maybe one of these guys will age well and will stay healthy. Your guess may be as good as mine.
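To put the eight candidates side by side, here is a small sketch that asks how many more seasons each would need to reach 300. The ages and win totals are the ones quoted above; the flat 15 wins per season is a hypothetical pace chosen for illustration, not a projection.

```python
import math

TARGET_WINS = 300
ASSUMED_WINS_PER_SEASON = 15  # hypothetical constant pace, not a forecast

# (age, career wins) as quoted in the article
candidates = {
    "Jon Lester": (32, 146),
    "Justin Verlander": (33, 173),
    "Clayton Kershaw": (28, 126),
    "Chris Sale": (27, 74),
    "Madison Bumgarner": (27, 100),
    "David Price": (31, 121),
    "Rick Porcello": (27, 107),
    "Felix Hernandez": (30, 154),
}


def seasons_to_target(current_wins, wins_per_season=ASSUMED_WINS_PER_SEASON):
    """Whole seasons needed to reach TARGET_WINS at a constant pace."""
    remaining = max(TARGET_WINS - current_wins, 0)
    return math.ceil(remaining / wins_per_season)


for name, (age, wins) in sorted(
    candidates.items(), key=lambda item: seasons_to_target(item[1][1])
):
    seasons = seasons_to_target(wins)
    print(f"{name}: {seasons} more seasons, reaching 300 at about age {age + seasons}")
```

Even at a steady 15 wins a year, every pitcher on the list would have to keep starting into his 40s, which is the longevity point made above.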
## Hardball Retrospective – What Might Have Been – The “Original” 1902 Orphans

In “Hardball Retrospective: Evaluating Scouting and Development Outcomes for the Modern-Era Franchises”, I placed every ballplayer in the modern era (from 1901-present) on their original team. I calculated revised standings for every season based entirely on the performance of each team’s “original” players. I discuss every team’s “original” players and seasons at length along with organizational performance with respect to the Amateur Draft (or First-Year Player Draft), amateur free agent signings and other methods of player acquisition. Season standings, WAR and Win Shares totals for the “original” teams are compared against the “actual” team results to assess each franchise’s scouting, development and general management skills.

Expanding on my research for the book, the following series of articles will reveal the teams with the biggest single-season difference in the WAR and Win Shares for the “Original” vs. “Actual” rosters for every Major League organization. “Hardball Retrospective” is available in digital format on Amazon, Barnes and Noble, GooglePlay, iTunes and KoboBooks. The paperback edition is available on Amazon, Barnes and Noble and CreateSpace. Supplemental Statistics, Charts and Graphs along with a discussion forum are offered at TuataraSoftware.com.

Don Daglow (Intellivision World Series Major League Baseball, Earl Weaver Baseball, Tony LaRussa Baseball) contributed the foreword for Hardball Retrospective. The foreword and preview of my book are accessible here.
# Terminology

OWAR – Wins Above Replacement for players on “original” teams

OWS – Win Shares for players on “original” teams

OPW% – Pythagorean Won-Loss record for the “original” teams

AWAR – Wins Above Replacement for players on “actual” teams

AWS – Win Shares for players on “actual” teams

APW% – Pythagorean Won-Loss record for the “actual” teams

# Assessment

The 1902 Chicago Orphans

OWAR: 37.4 OWS: 280 OPW%: .527 (74-66)

AWAR: 29.9 AWS: 203 APW%: .496 (68-69)

WARdiff: 7.5 WSdiff: 77

The 1902 “Original” Orphans finished in third place, ten games behind the Reds. Bill Bradley (.340/11/77) thrived against opposing hurlers, notching career-bests in base hits (187), runs scored (104), doubles (39), home runs and batting average. “Bad” Bill Dahlen drilled 25 two-baggers and swiped 20 bags. Danny Green delivered a .302 BA and pilfered 35 bases. Jimmy “Pony” Ryan slashed 32 two-base knocks and produced a .320 BA. Johnny “Noisy” Kling succeeded on 25 stolen base attempts. Jimmy “Rabbit” Slagle executed 41 thefts and supplied a .315 BA for the “Actual” Orphans.

Bill Dahlen rated twenty-first among shortstops in “The New Bill James Historical Baseball Abstract” top 100 player rankings. “Original” Orphans teammates registered in the “NBJHBA” top 100 rankings include Frank Chance (25th-1B), Johnny Evers (25th-2B), Jimmy Ryan (26th-CF), Joe Tinker (33rd-SS), Bill Bradley (46th-3B), Johnny Kling (48th-C) and Tom Daly (55th-2B). “Actuals” second-sacker Bobby Lowe placed fifty-sixth.
STARTING LINEUP

| Original 1902 Orphans | POS | OWAR | OWS | Actual 1902 Orphans | POS | OWAR | OWS |
|---|---|---|---|---|---|---|---|
| Jimmy Ryan | LF/CF | 3.26 | 18.59 | Jimmy Slagle | LF | 5.11 | 22.25 |
| Davy Jones | CF | 2.67 | 13.4 | Davy Jones | CF | 2.67 | 13.4 |
| Danny Green | RF | 3.52 | 20.73 | John Dobbs | RF/CF | 0.8 | 8.31 |
| Frank Chance | 1B | 2.66 | 12.37 | Frank Chance | 1B | 2.66 | 12.37 |
| Tom Daly | 2B | -1.87 | 10.46 | Bobby Lowe | 2B | 0.79 | 10.24 |
| Bill Dahlen | SS | 4.65 | 21.9 | Joe Tinker | SS | 3.31 | 16.58 |
| Bill Bradley | 3B | 5.38 | 25.61 | Charlie Dexter | 3B | -0.47 | 4.12 |
| Johnny Kling | C | 2.47 | 17.06 | Johnny Kling | C | 2.47 | 17.06 |

BENCH

| Original 1902 Orphans | POS | OWAR | OWS | Actual 1902 Orphans | POS | OWAR | OWS |
|---|---|---|---|---|---|---|---|
| Charlie Irwin | 3B | 0.74 | 17.4 | Dusty Miller | LF | -0.25 | 3.95 |
| Joe Tinker | SS | 3.31 | 16.58 | Art Williams | RF | -0.33 | 2.46 |
| Harry Wolverton | 3B | 0.41 | 10.43 | Larry Schlafly | RF | 0.46 | 2.15 |
| Frank Isbell | 1B | -0.32 | 9.1 | Bunk Congalton | RF | -0.99 | 1.55 |
| Art Nichols | 1B | 0.09 | 8.68 | Johnny Evers | 2B | -0.17 | 1.27 |
| Malachi Kittridge | C | 0.59 | 8.44 | Hal O’Hagan | 1B | -0.06 | 1.09 |
| Duke Farrell | C | -0.06 | 5.46 | Jack Hendricks | RF | 0.19 | 0.91 |
| Dusty Miller | LF | -0.25 | 3.95 | Germany Schaefer | 3B | -2.27 | 0.71 |
| Art Williams | RF | -0.33 | 2.46 | Sammy Strang | 3B | 0.07 | 0.42 |
| Larry Schlafly | RF | 0.46 | 2.15 | Jim Murray | RF | -0.52 | 0.27 |
| Zaza Harvey | RF | 0.15 | 1.58 | Mike Jacobs | SS | -0.15 | 0.18 |
| Bunk Congalton | RF | -0.99 | 1.55 | Mike Lynch | CF | -0.34 | 0.14 |
| Johnny Evers | 2B | -0.17 | 1.27 | Snapper Kennedy | CF | -0.06 | 0.14 |
| Germany Schaefer | 3B | -2.27 | 0.71 | Ed Glenn | SS | -0.08 | 0.1 |
| Jim Murray | RF | -0.52 | 0.27 | Mike Kahoe | C | -0.11 | 0.09 |
| Mike Jacobs | SS | -0.15 | 0.18 | Pete Lamer | C | -0.06 | 0.07 |
| Mike Lynch | CF | -0.34 | 0.14 | Dad Clark | 1B | -0.31 | 0.05 |
| Snapper Kennedy | CF | -0.06 | 0.14 | Chick Pedroes | RF | -0.1 | 0.03 |
| Jim Delahanty | RF | -0.14 | 0.09 | R.E. Hillebrand | RF | -0.06 | 0.01 |
| Pete Lamer | C | -0.06 | 0.07 | Joe Hughes | RF | -0.05 | 0 |
| Dad Clark | 1B | -0.31 | 0.05 | | | | |
| Chick Pedroes | RF | -0.1 | 0.03 | | | | |
| R.E. Hillebrand | RF | -0.06 | 0.01 | | | | |
| Joe Hughes | RF | -0.05 | 0 | | | | |

Jack W. Taylor (23-11, 1.29) paced the National League in ERA, shutouts (8) and WHIP (0.953). Mal “Kid” Eason contributed 10 victories with a 2.76 ERA and Carl Lundgren (9-9, 1.97) completed 17 of 18 starts during his rookie campaign.
Jock Menefee (12-10, 2.42) and Pop Williams (11-16, 2.49) rounded out the rotation for the “Actuals”.

ROTATION

| Original 1902 Orphans | POS | OWAR | OWS | Actual 1902 Orphans | POS | OWAR | OWS |
|---|---|---|---|---|---|---|---|
| Jack Taylor | SP | 7.47 | 31.24 | Jack Taylor | SP | 7.47 | 31.24 |
| Mal Eason | SP | 0.55 | 12.06 | Jock Menefee | SP | 1.82 | 14.41 |
| Carl Lundgren | SP | 0.89 | 10.79 | Pop Williams | SP | 0.7 | 13.84 |
| Tom Hughes | SP | 1.4 | 9 | Carl Lundgren | SP | 0.89 | 10.79 |

BULLPEN

| Original 1902 Orphans | POS | OWAR | OWS | Actual 1902 Orphans | POS | OWAR | OWS |
|---|---|---|---|---|---|---|---|
| Jim St.Vrain | SP | 0.51 | 5.85 | Jim St.Vrain | SP | 0.51 | 5.85 |
| Bob Rhoads | SP | -1.48 | 3.4 | Bob Rhoads | SP | -1.48 | 3.4 |
| Jack Katoll | SP | -1.74 | 3.04 | Frank Morrissey | SP | 0.05 | 2.12 |
| Alex Hardy | SP | -0.29 | 1.16 | Mal Eason | SP | 0.13 | 1.41 |
| Fred Glade | SP | -0.49 | 0.27 | Alex Hardy | SP | -0.29 | 1.16 |
| | | | | Jim Gardner | SP | -0.1 | 1.01 |
| | | | | Fred Glade | SP | -0.49 | 0.27 |

Notable Transactions

Bill Bradley

Before 1901 Season: Jumped from the Chicago Orphans to the Cleveland Blues.

Bill Dahlen

January 25, 1899: Traded by the Chicago Orphans to the Baltimore Orioles for Gene DeMontreville.

March 11, 1899: Assigned to the Brooklyn Superbas by the Baltimore Orioles.

Danny Green

Before 1902 Season: Jumped from the Chicago Orphans to the Chicago White Sox.

Jimmy Ryan

Before 1902 Season: To the Washington Senators in unknown transaction.

Charlie Irwin

July 11, 1901: Released by the Cincinnati Reds.

July 12, 1901: Signed as a Free Agent with the Brooklyn Superbas.

# Honorable Mention

The 1966 Chicago Cubs

OWAR: 43.3 OWS: 235 OPW%: .510 (83-79)

AWAR: 27.1 AWS: 176 APW%: .364 (59-103)

WARdiff: 16.2 WSdiff: 59

The “Original” 1966 Cubs placed fourth with a record north of .500 yet fifteen games off the pace of the Giants. Ron Santo (.312/30/94) merited Gold Glove honors for the third straight season and paced the circuit with 95 bases on balls and a .412 OBP. Lou Brock aka “The Franchise” tallied 94 runs and topped the National League with 74 stolen bases. “Sweet Swingin’” Billy L. Williams socked 29 long balls and registered 100 runs scored.
Al “Red” Worthington (2.46, 16 SV) fashioned a 1.018 WHIP and secured the late-inning leads. Ernie “Mr. Cub” Banks contributed 23 two-baggers and a .272 BA. Ken Holtzman collected 11 victories while furnishing an ERA of 3.79 in his inaugural season.

# On Deck

What Might Have Been – The “Original” 1921 Tigers

# References and Resources

Baseball America – Executive Database

Baseball-Reference

James, Bill. The New Bill James Historical Baseball Abstract. New York, NY.: The Free Press, 2001. Print.

James, Bill, with Jim Henzler. Win Shares. Morton Grove, Ill.: STATS, 2002. Print.

Retrosheet – Transactions Database

The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at “www.retrosheet.org”.

Seamheads – Baseball Gauge

Sean Lahman Baseball Archive

## The Giants Don’t Need an Overhaul, But an Upgrade

The Giants opened their 2016 campaign with a 57-33 record before the All-Star break, then finished 87-75. There were plenty of failings in the second half of the season, but ultimately the bullpen led the Giants to their fate. In the first half of the season the combined ERA of the bullpen was 2.27, with 26 saves and a K/9 of 9.7. This being said, they had 42 save opportunities, which means they blew a save 38% of the time. In the second half of the season they combined for a 2.85 ERA, with 17 saves and a K/9 of 8.4. They blew 13 saves in 30 opportunities during the second half, which means they blew a save 43% of the time.

The bullpen was heavily criticized in the second half of the season due to the team’s inability to replicate the same win rate they saw in the first half. However, the bullpen was only slightly better in the first half than it was in the second half. To me, the Giants were in dire need of acquiring a threat in the bullpen before the trade deadline approached.
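The two blown-save percentages can be checked with a short sketch. It assumes, as the text does, that every save opportunity ends in either a save or a blown save; the function name is just illustrative.

```python
def blown_save_rate(saves, opportunities):
    """Share of save opportunities that were not converted."""
    blown = opportunities - saves
    return blown / opportunities


first_half = blown_save_rate(saves=26, opportunities=42)
second_half = blown_save_rate(saves=17, opportunities=30)

print(f"First half: {first_half:.0%} blown")
print(f"Second half: {second_half:.0%} blown")
```

The gap between the halves is about five percentage points, which fits the claim that the first-half bullpen was only slightly better.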
They went after Will Smith, who came in to the Giants’ pen with a 2.12 ERA, 7.9 K/9 and three blown save opportunities. With the Giants he had an ERA of 2.94, a 12.8 K/9 and a blown save. He was not able to convert a save all season, and although he proved to be a nice piece in the bullpen in hold situations, he was not a guy who could come into the 9th inning and dominate the game.

In the postseason the Giants were 0-for-2 in save situations and, in their final game against the Cubs, their bullpen collapse was maybe the worst the league has ever seen in the playoffs. However, their rookie Ty Blach came in for 3.2 innings of relief during the postseason and did not allow an earned run. He looked promising at the end of the regular season and pitched well in high-pressure situations during October baseball. It was surprising to see him and Santiago Casilla sit out their final game, as they watched their bullpen give up four runs in the 9th. Furthermore, we saw Clayton Kershaw close the Dodgers’ final game against the Nationals to move on to the NLCS. It would have been interesting to see what kind of performance Madison Bumgarner could have shown the Cubs’ batters in that final inning.

Finally, with the veteran relievers Javier Lopez, Sergio Romo and Casilla needing new contracts for the 2017 campaign, and the Giants in need of finding someone who can come into a 9th inning and pose a legitimate threat, it will be interesting to see what the team does in the offseason to improve their bullpen. Here are my top five predictions for the Giants’ next closer.

#1: Kenley Jansen: It is unlikely that Aroldis Chapman will be looking for a new home this offseason, as he looks comfortable in Chicago and will have a hard time finding a team with that amount of talent. Jansen, however, may flee from the aging Dodgers, especially if someone is willing to pay. The Giants will have a bit of salary space to work with and would benefit greatly from this signing.
#2: Mark Melancon: Although Melancon is a few steps below the elite Jansen and Chapman, he showed he can work a 9th inning as well as anyone this season. He may be a bit more team-friendly as far as salary space, and that may be intriguing to the Giants who will be looking to add a heavy-hitting left fielder.

#3: Jonathan Papelbon: Papelbon was replaced by Melancon for the Nationals’ closing position in the second half of the 2016 season. He had a great first half, and showed he is capable of being a dominant closer in the MLB. However, his fight with Bryce Harper in 2015 and his rough second half of the season may make him a risky candidate. This may lower his cost and if the Giants are unable to sign Jansen or Melancon, they would be smart to see what Papelbon could do for their bullpen.

#4: Derek Law: Derek Law debuted in 2016 and had a pretty good campaign. With a 2.13 ERA in 55 innings of relief, he may have a shot at being the Giants’ closer. However, it would be unlikely for him to start the 2017 season off as the Giants’ closer, unless they are unable to sign someone to fill that duty this offseason. He is an unlikely candidate, but if he can improve from his 2016 season, there is no reason he would not be able to become a legitimate MLB closer.

#5: Aroldis Chapman: Chapman will likely return to the Cubs, especially if they make it to the World Series this October. However, he has been on three teams in the past two years, and if the Giants are able to show him more money than the Cubs, they might be able to acquire the hard-throwing lefty. If they do, they might lose the power they need to fill left field but they would come into the 2017 season looking stronger than they did a season ago.

## The Non-Decline and Fall of the San Francisco Giants

The Chicago Cubs, hinting that this year they may have magick stronger than The Goat, recently brought the San Francisco Giants’ even-year playoff dominance to an end.
It was an offensively offensive series; add the two teams’ OPS together and you’re just 100 points better than David Ortiz. The low-velocity Giants staff struck out a batter an inning, and both lineups walked at a lower rate than the unwalkable Royals.

My working theory was that this series represented the final demise of the already waning power of the current edition of the Giants, and that the next chart-topping version of Big Head Bruce and the Monsters would have mostly new musicians. Turns out that this theory is only partially correct. Your 2016 San Francisco Baseball Giants were actually a little better than the world-beating 2014 squad, at least when resort is had to statistics:

| Stat | 2016 (MLB rank) | 2014 (MLB rank) |
| --- | --- | --- |
| Position Player fWAR | 26.7 (4) | 23.0 (9) |
| SP fWAR | 15.0 (5) | 10.1 (21) |
| RP fWAR | 2.1 (22) | 1.4 (24) |
| Position Player wRC+ | 98 (t12) | 99 (9) |
| SP FIP- | 96 (t7) | 104 (19) |
| RP FIP- | 97 (20) | 98 (18) |
| Run differential/game | +0.51 | +0.31 |

Let’s pause a minute to consider the bullpen numbers, which are the very essence of “meh” both years. The Giants have had the reputation of having a good, cheap bullpen. It’s certainly cheap: Sergio Romo is the plutocrat of the unit at a relatively unimposing $9 million. But “good” is more of a stretch; the Giants relievers have delivered value pretty much consistent with what they’ve been paid.

Some commentators have carpeted Bochy for his bullpen usage during the NLDS, but (perhaps because I’m not actually a Giants fan) I take a longer view. The miscellaneous roadies Big Head Bruce has had to work with will hardly make anyone forget The Nasty Boys, but he has often been able to squeeze value out of them when it’s mattered most. In order to maximize value out of this motley crue (I’m in town all week — try the garlic fries) Bochy has had to be very active in the late innings, and the more decisions any manager has to make, the more that will go wrong.

Giants general manager Brian Sabean has correctly recognized that in Bruce Bochy he employs one of the best tacticians in the game today. Sabean has maximized the value of this skill by handing Bochy a collection of misfit bullpen toys and saying “here, you figure this out.” On most nights Bochy does, but every once in a while he fails, as happened in the star-crossed six-pitcher 9th in Game 4. If you want to see what a bullpen meltdown looks like in graphic form, here it is. (Younger or more sensitive Giants fans are advised not to click on that link.)

My guess is that Bochy has had a few other bad bullpen nights, but most of those have happened when the East Coast was already asleep. When you happen to have a bad night nationwide, people may be a little too inclined to draw definitive conclusions. (I do not cut Buck Showalter this kind of slack. Bochy has a bunch of semi-interchangeable parts that present numerous non-obvious choices. Buck doesn’t.)

But back to our regularly scheduled program: the 2016 Giants were, by most measures, a better squad than the 2014 one. This is a roster that’s peaking, and perhaps fell victim to what will soon be a storied Cubs team, or (more prosaically) to the bad luck inherently possible in a short series. So the Giants can look forward to an extended run of playoff contention!

Or not. The Giants are heading in full sail toward the dragon-pocked part of the map. This is an old team: the Giants have the sixth-oldest set of position players in the majors and the oldest pitching staff. They have just two regular players under 27, Madison Bumgarner (still just 26) and Joe Panik (25). To borrow a Casey Stengel line, in 15 years Bumgarner may be in the Hall of Fame. In 15 years, Joe Panik will be 40.

The Giants’ farm will provide little aid. Their system has just two MLB top-100 prospects, with the best being the positionless Christian Arroyo at #79 (though the excellent Bernie Pleskoff is less hostile to his defense than I am). Austin Slater isn’t in the top 100, but he raked at AAA at age 23 with good plate discipline, so he may be able to fill the outfield spot Angel Pagan is likely to vacate.

On the bright side, the contracts of Jake Peavy and Pagan expire this year, taking $26 million off the books. Romo and Santiago Casilla will be departing for broadcasting careers as well, taking $15 million more of liabilities with them. The Giants need one or two outfielders and starting pitching, but especially with respect to the latter, next year’s free-agent class would make a cow laugh. The 2018 list is a better one, but between now and both free-agent classes likely interposes a new collective bargaining agreement, so there’s enough fog to compel Sabean to operate his lights on low beam.

And the competition isn’t sitting still. Regardless of how the hated Los Angeles Dodgers fare in the NLCS, they are poised to compete for a while. The Rockies have an exciting core of young talent, even if casual Rox fans despair of the team at the moment. The Outlaw A.J. Preller merits a blog post all his own (say, there’s an idea!), and while the Padres seem to have a bit of transmission loss between talent and wins, some improvement there is possible as well, especially if Tyson Ross can make a successful return from thoracic outlet surgery. (What? You say there’s another team in the NL West? Hmm … I’ll research that and get back to you.)

So the Giants may be stalling or even slipping backward in a division where at least two of the teams are making progress. The Giants have a good but mostly older core which could use the kind of help that free agency and prospect trades are unlikely to provide in 2017. So 2016 may indeed be the last gasp of this once-in-a-while mighty franchise, at least for the moment. Sabean has pulled a whole warren of rabbits out of his hat during his long tenure, but in 2017 he’s going to have to dig deep.

Perhaps there will be a powerful goat looking for work …

## Dr. Hendricks and Mr. Gray

Randomness and circumstances are important driving forces in everything that happens in the world. Although they usually work hand in hand with our own actions and decisions, they have the ability to pick you up when you hit the jackpot at the casino, or throw you down when your car gets crushed by a falling tree (hopefully you’re comfortably sleeping in your bed when that happens). They can also be the difference between a pitcher having an average season on the mound, and having an outstanding one. Such is the case with the seasons Jon Gray and Kyle Hendricks had this year.

I’m not going to make the argument that these two pitchers performed equally well this season, with the main differences being random chance and circumstances, because they didn’t. Hendricks was the better pitcher; it just wasn’t the 2.48-run difference their ERAs show. The similarities between the two performances can be summarized in basically two stats. If we take a look at xFIP and SIERA (two important ERA estimators available here at FanGraphs), Hendricks’ numbers of 3.59 and 3.70, respectively, are eerily similar to Gray’s 3.61 and 3.72. From there on, however, the numbers separate abruptly.

Much like Dr. Jekyll and Mr. Hyde represent the good and the bad within a person, Hendricks’ and Gray’s seasons represent two sides of the same coin. On the one hand, circumstantial factors and good fortune turned Hendricks’ very good performance into a historic season, while a different set of circumstances and some bad fortune turned Gray’s good performance into merely an average one. In this piece, we’ll take a look at the factors that influenced these diametrically opposed results.

I’ll start by saying that Kyle Hendricks had a remarkable and impressive season. He had an average strikeout rate (8.05 K/9), didn’t walk many batters (2.08 BB/9), and allowed very few longballs (0.71 HR/9). That added up to a really good 3.20 FIP, which ranked 4th in the majors. His ERA, however, ended up all the way down at 2.13, a whopping 1.07 runs lower than his FIP. That is a big difference, but it’s not all that uncommon: nearly 2% of individual seasons by starters in the history of the game have had an E-F (ERA minus FIP) of -1.07 or lower. Nonetheless, that difference is hardly sustainable through multiple seasons. In major-league history, out of 2259 pitchers with at least 500 innings pitched, only two had a career E-F below -1.00, and both of them were full-time relievers (in case you’re curious, they are Alan Mills and Al Levine).

On the other side of the spectrum, Jon Gray also had a very solid season. He had an outstanding 9.91 strikeouts per 9 innings (that ranked him 9th among qualifying starters), an average walk rate of 3.16 BB/9, and a solid home-run rate (0.94 HR/9), lower than league average despite pitching half of his innings at Coors Field. His performance was good enough for a 3.60 FIP, but his actual ERA rocketed to 4.61. This 1.01 positive difference is just as unusual as Hendricks’ negative one, as about 2% of individual seasons throughout history have resulted in differences of 1.01 or higher. For visualizing purposes, here’s a table summarizing both pitchers’ numbers.
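
As a quick sanity check on the numbers quoted so far, the E-F gap for each pitcher can be recomputed in a few lines. This is only a sketch: the dictionary layout and the helper name `era_minus_fip` are illustrative choices, and every value is one already stated in the text.

```python
# 2016 season lines exactly as quoted in the text; nothing here is new data.
seasons = {
    "Kyle Hendricks": {"K/9": 8.05, "BB/9": 2.08, "HR/9": 0.71, "FIP": 3.20, "ERA": 2.13},
    "Jon Gray":       {"K/9": 9.91, "BB/9": 3.16, "HR/9": 0.94, "FIP": 3.60, "ERA": 4.61},
}

def era_minus_fip(line):
    """E-F: negative means fewer runs allowed than FIP estimated."""
    return round(line["ERA"] - line["FIP"], 2)

e_minus_f = {name: era_minus_fip(line) for name, line in seasons.items()}
era_gap = round(seasons["Jon Gray"]["ERA"] - seasons["Kyle Hendricks"]["ERA"], 2)

for name, line in seasons.items():
    print(f"{name:15s} ERA {line['ERA']:.2f}  FIP {line['FIP']:.2f}  E-F {e_minus_f[name]:+.2f}")
print(f"ERA gap between the two: {era_gap:.2f}")
```

The two E-F values come out on opposite sides of zero by almost the same amount, which is the whole premise of the comparison.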

So the question still remains: what were the determining factors in these two pitchers having such a massive difference in results? Let’s dive right into it.

First of all, I decided to look at the correlation factors between E-F and a wide array of pitching stats, using data from every pitcher in MLB history with 500+ innings. As a general rule of thumb, a correlation factor between 0.40 and 0.69 (in absolute value) indicates a strong relationship between the two variables. The following table shows the stats that had at least a 0.40 correlation factor with E-F:

Welp, that’s a pretty lame table. Keep in mind, I analyzed correlations for stats as varied as pitch-type percentages, pitch-type vertical and horizontal movements, and Soft, Medium, and Hard-hit rates, as well as K, BB, and HR per 9, or HR/FB%. None of those had even a moderate relationship with E-F. So let’s stick with the stats presented on the table.

The first two stats are really no surprise. FIP basically assumes league-average BABIP and LOB% to estimate what a pitcher’s ERA should look like. So, if a pitcher has a high BABIP, FIP is going to estimate a lower ERA than the actual one, resulting in a higher E-F; thus the positive correlation. On the other hand, if a pitcher has a higher LOB%, he’ll allow fewer runs than his FIP would suggest, resulting in a lower E-F. This explains the negative correlation shown in the table. The last stat, however, came as a real surprise, at least for me. ERA seems to be positively correlated with E-F, which means that pitchers with higher ERA tend to have higher E-F than pitchers with lower ERA.
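
For background, here are the commonly published definitions of the three stats. They are not spelled out in this article, so treat them as reference material rather than part of the analysis; $C_{\text{FIP}}$ stands for the yearly league constant that puts FIP on the ERA scale.

$$
\text{FIP} = \frac{13\,\text{HR} + 3\,(\text{BB} + \text{HBP}) - 2\,\text{K}}{\text{IP}} + C_{\text{FIP}}
$$

$$
\text{BABIP} = \frac{\text{H} - \text{HR}}{\text{AB} - \text{K} - \text{HR} + \text{SF}}
\qquad
\text{LOB\%} = \frac{\text{H} + \text{BB} + \text{HBP} - \text{R}}{\text{H} + \text{BB} + \text{HBP} - 1.4\,\text{HR}}
$$

FIP only uses home runs, walks, hit batters and strikeouts, so whatever happens on balls in play (BABIP) or in the sequencing of baserunners (LOB%) never enters it. That is why those two stats end up explaining the gap between ERA and FIP.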

The next logical step would be to determine which factors, if any, explain BABIP and/or LOB% among pitchers. Using the same pitching stats as in the previous step, I ran correlations with BABIP and LOB% separately. The following table shows the stats that had a strong (0.40 to 0.69) or moderate (0.30 to 0.39) relationship.
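
For readers who want to repeat this kind of screening, here is a minimal sketch of a Pearson correlation plus the rule-of-thumb labels used in this piece. The function names and the toy numbers are hypothetical. The labels outside the 0.30 to 0.69 range are my assumption, since the text does not name those bands.

```python
import math

def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def strength(r):
    """Label |r| with the rule of thumb quoted in the text."""
    size = abs(r)
    if size >= 0.70:
        return "very strong"  # assumption: band not named in the text
    if size >= 0.40:
        return "strong"
    if size >= 0.30:
        return "moderate"
    return "weak"             # assumption: band not named in the text

# Hypothetical toy sample, NOT the 500+ inning data set used in the article.
toy_babip = [0.280, 0.290, 0.300, 0.310, 0.320]
toy_e_minus_f = [-0.30, 0.10, -0.05, 0.20, 0.25]
r = pearson_r(toy_babip, toy_e_minus_f)
print(f"r = {r:+.2f} ({strength(r)})")
```

Taking the absolute value before labelling matters here, because a factor of -0.42 is treated as just as strong as one of +0.42.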

As was the case in the first table, both of these stats are correlated strongly with E-F, showing factors of 0.58 and -0.42, respectively. It doesn’t come as a shock, either, that they are strongly correlated with each other. The negative correlating factor (-0.42) indicates, as you would expect, that a high BABIP leads to a low LOB%, and vice versa. On the BABIP side, a strong positive relationship with ERA is almost too obvious, as more balls in play falling for hits leads to more runs being scored. Also, since fly balls in play (not counting home runs) turn more often into outs than ground balls do, it makes sense that BABIP holds a negative relationship with the former, and a positive one with the latter. This fact, however, goes against a somewhat popular belief that ground-ball pitchers tend to have lower BABIPs.

The factors that correlate to LOB% are more interesting. The first one is not unexpected: a higher strikeout rate seems to lead to more runners getting stranded, and that’s a pretty easy concept to wrap your head around. The second one, however, is really mind-boggling, and I really can’t say I can find a reasonable explanation for it. It indicates that the higher the home-run rate allowed by a pitcher, the more runners are going to be left on base. It is quite possible that this is just a spurious correlation, having no causality at all. Finally, the last factor listed on the table is very interesting and useful in this particular case. It suggests that high percentages of soft contact lead to higher LOB%. We’ll get to that later on in this article.

So let’s go back to our pitchers and check if any of this makes sense. We know that E-F is mainly affected by BABIP and LOB%. Hendricks and Gray had very different numbers in these two stats. The Cubs’ righty had a .250 BABIP and a LOB% of 81.5, while the Rockies’ fireballer had .308 and 66.4%. Considering that the league averages were .298 and 72.9%, respectively, we can say that Hendricks did considerably better than average, while Gray did just the opposite. So far so good, right? These facts go a long way towards explaining the differing outcomes. However, BABIP and LOB% aren’t exactly pitcher-dependent; in fact, they’re the marquee stats for the generic term “luck.”

Looking at the stats from the second table, few of them help in figuring this out. High strikeout rates, for example, are supposed to increase LOB%, but Gray still managed a really low 66.4% despite a 9.91 K/9. On the other hand, Hendricks’ 81.5% LOB ranked 5th among qualified starters, even though his strikeout rate of 8.05 was right around league average. Similarly, groundball percentage is shown to have a positive correlation with BABIP. Nonetheless, Hendricks’ higher-than-average rate of 48.4% (league average was 44.7%) resulted in a ridiculously low BABIP of .250, while Gray’s below-average rate of 43.5% came with a .308 BABIP. Almost the same thing happens when you look at the fly-ball rates.

The only factor from that second table that does make sense in these particular examples is soft-contact rate. Hendricks ranked 1st in this regard among qualified starters, with an impressive 25.1% (league average was 18.8%), while Gray had a below-average rate of 17.8%, which ranked him 50th out of 73 qualified starters. This stat is very much pitcher-dependent, and it does help explain some of the differences in LOB%. It has, however, a moderate relationship with LOB%, as evidenced by its factor of -0.37. Is that enough to account for the massive difference in the results? Intuitively, I’ll say no. There is one more factor, however, that we haven’t even discussed yet.

FIP stands for Fielding Independent Pitching, so the very thing that FIP is trying to subtract from the equation might hold the key to answering our question. Defensive performances can heavily influence the outcome of the game, and make up a big chunk of what we generally call “luck” in a pitcher’s final results. In order to have a numerical confirmation of this idea, I looked at the correlations between teams’ yearly defensive component of WAR and its staff’s BABIP, LOB%, and E-F. The data I used for this exercise was every individual team season from 1989 (the first year in which play-by-play data contained information on hits and outs location) to 2016.

We can see here that a team’s defense has a strong correlation with all three of the stats, especially E-F. Higher values of the defensive component of WAR lead to lower BABIP, higher LOB%, and lower E-F, just as you would expect.

Saying that the Cubs had a great defensive performance this year is an understatement. Not only was it the best defense in 2016 by a bunch, it was also the best defense of the last 17 years, according to FanGraphs’ defensive component of WAR. Of the 814 individual team seasons played in MLB since 1989, this year’s Cubs rank 8th. That’ll put a serious dent in opponents’ BABIP. In fact, the Cubs’ average on balls in play of .255 (yes, that is the whole pitching staff’s BABIP) is the absolute lowest since the ’82 Padres. Oh, and also the Cubs pitching staff’s LOB% of 77.5% is tied for 2nd highest since 1989. All of this adds up to a team E-F of -0.62. Wow. Just wow.

The Rockies defense, on the other hand, wasn’t bad, but it also wasn’t great. According to FanGraphs, it was 17.9 runs above average, which ranked 12th in MLB. Again, that’s really not bad at all, just miles away from the 115.5 runs above average the Cubs had. The Rockies’ staff as a whole had a .317 BABIP, and a 68.0% LOB%; not unexpected from a team that plays half their games at altitude. Still, both of these values are worse than league average, resulting in a team E-F of 0.54.

All in all, Kyle Hendricks still had a better season than Jon Gray, and people will remember the 2.13 ERA and not the 4.61. This analysis just puts it a little bit more in perspective, and helps shed some light on the little details that make big differences in the course of a long season.

The old football adage says that “defense wins championships.” That doesn’t really apply to baseball, but in the future, when I think back to the 2016 Cubs, I’ll definitely think about their defense.

## 2016 ALCS Game One: Batter vs. Pitcher Stats

The FanGraphs Twitter page tweeted out a bingo card for Game One of the ALCS. As I looked through it, I thought it was a terrific idea by Michelle Jay and a fun way to follow the game that night. I was going to play along, but then I had another idea. Some slots were much more likely to happen, such as the “Pitcher v hitter stats are mentioned” slot. I figured I would let somebody else receive a t-shirt and just count up exactly how many times the TBS broadcast team mentioned batter vs. pitcher stats. We all know announcers love doing this, and we all know that it’s pretty useless for predicting the outcome of that particular at-bat. I just thought it would be cool to experiment and see how many times they actually mentioned these stats.

First, I’ll just go over the final numbers for batter vs. pitcher stats. There were 65 batters in this game, and batter vs. pitcher stats were either mentioned by the announcers or shown on a graphic for eight of those batters. There were two separate times where they showed a graphic and then mentioned the stats later in the plate appearance, or vice versa. Four of the eight instances occurred when the Jays were hitting against Corey Kluber, three of the eight came when Andrew Miller was pitching, and the last one came when Marco Estrada was on the mound. It’s interesting that they would mention those stats more often when a reliever is pitching, considering the sample size is sure to be even smaller against relievers, rather than starters.

For fun, I marked each occurrence and tried to quickly type out how the announcer mentioned these stats:

1. Top 1, Josh Donaldson vs. Corey Kluber: “He’s got some pretty good numbers, 6 for 16 with a jack, so he sees him well” -Cal Ripken
2. Top 1, Russell Martin vs. Corey Kluber: “Martin is only 2 for 10 in his career against Kluber, both home runs…in fact, two of his last seven off Kluber have been home runs” -Ernie Johnson (graphic added later in the plate appearance reading “2 for last 7 off Kluber with 2 HR”)
3. Top 2, Michael Saunders vs. Corey Kluber: “Saunders steps in, he’s 3 for 8 in his career against Kluber, and he fouls it off” -Ernie Johnson
4. Top 6, Michael Saunders vs. Corey Kluber: “Saunders with his two hits, now 5 for 10 off Kluber” -Ron Darling
5. Bottom 6, Jason Kipnis vs. Marco Estrada: graphic shown reading “0 for 7 4 K VS ESTRADA”
6. Top 7, Melvin Upton Jr. vs. Andrew Miller: “Upton’s got some numbers against Miller, 5 for 12 with three home runs” -Ron Darling (“That is some numbers” -Cal Ripken)
7. Top 8, Edwin Encarnacion vs. Andrew Miller: “Encarnacion in his last six at-bats against Miller a couple of home runs and a double” -Ernie Johnson
8. Top 8, Jose Bautista vs. Andrew Miller: graphic shown reading “.286 (2 for 7) 1 HR 2 BB VS MILLER” (later in the plate appearance: “One of the two hits that Bautista has off Miller…long ball” -Ron Darling)
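
The log above is small enough to tally by hand, but a short sketch makes the breakdown easy to re-check. The list simply restates the eight entries, and the variable names are illustrative.

```python
from collections import Counter

# (batter, pitcher) for each of the eight logged mentions, in order.
mentions = [
    ("Josh Donaldson", "Corey Kluber"),
    ("Russell Martin", "Corey Kluber"),
    ("Michael Saunders", "Corey Kluber"),
    ("Michael Saunders", "Corey Kluber"),
    ("Jason Kipnis", "Marco Estrada"),
    ("Melvin Upton Jr.", "Andrew Miller"),
    ("Edwin Encarnacion", "Andrew Miller"),
    ("Jose Bautista", "Andrew Miller"),
]
TOTAL_BATTERS = 65  # batters in the game, per the text

by_pitcher = Counter(pitcher for _, pitcher in mentions)
mention_rate = len(mentions) / TOTAL_BATTERS

print(dict(by_pitcher))
print(f"{mention_rate:.1%} of batters came with a batter vs. pitcher stat")
```

That works out to roughly one batter in eight.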

I’m not trying to knock these announcers by saying that they’re not good at what they do or anything. I would be a terrible announcer. I just think these stats are pretty useless and it was interesting to see how many times they actually mentioned them during a game. Mike Petriello pointed out on Twitter an example of why these numbers aren’t good to look at.

This would be kind of fun to track during the regular season for the really good ones, such as “so and so: 1 for 2 (.500), single career vs. so and so.” Maybe this can be a new metric or something, bpBAAR (batter pitcher Baseball Announcer Above Replacement).

## Clustering Pitchers With PITCHf/x

At any point, feel free to scroll down to the bottom to see some of the tables of pitcher clusters.

## Clustering Pitches

Clustering individual pitches using data from PITCHf/x is a fairly simple task. All you need to do is pick out the important attributes that you believe define a pitch (velocity, movement, etc.) and use a clustering algorithm, such as K-Means clustering.

With K-Means clustering, you decide what K (the number of clusters) should be. For my analysis, I chose K to be 500 (rather arbitrarily). Different pitch clusters can represent the same type of pitch (e.g., a fastball) but with varying attributes. For example, clusters 50 and 100 might both correspond to fastballs, but cluster 50 might be a typical Chris Young fastball whereas cluster 100 might be a typical Aroldis Chapman fastball.
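
To make the idea concrete, here is a bare-bones version of K-Means (Lloyd’s algorithm) in plain Python. It is a sketch under loud assumptions: the six pitches and their three features (speed, horizontal movement, vertical movement) are made up, K is 2 rather than 500, and a real run would use a library implementation on standardized features.

```python
import math
import random

def kmeans(points, k, n_iter=50, seed=0):
    """Lloyd's algorithm on a list of feature tuples; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k observed pitches
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each pitch goes to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its assigned pitches.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # leave an empty cluster's center alone
        if new_centers == centers:  # nothing moved, so the assignment is stable
            break
        centers = new_centers
    return centers, labels

# Hypothetical pitches: (start_speed, pfx_x, pfx_z). Values are invented.
pitches = [
    (95.1, -4.0, 9.8), (94.6, -3.6, 10.1), (96.0, -4.4, 9.5),  # fastball-like
    (78.2, 6.1, -7.9), (77.5, 5.7, -8.3), (79.0, 6.4, -7.5),   # curveball-like
]
centers, labels = kmeans(pitches, k=2)
print(labels)
```

The algorithm only returns cluster numbers. Deciding that one group is “fastballs” and the other “curveballs” is still up to whoever reads the cluster centers.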

One important point to remember is that you, the analyst, must decide what the clusters represent. By looking at attributes of the pitches in a given cluster, you might identify the cluster as “lefty changeups” or “submariner fastballs” (which is actually a category you will discover).

## The Problem of Clustering Pitchers

We can identify every pitch that a pitcher throws as belonging to a cluster from 1 to 500. Therefore, we know the distribution of pitch clusters for a given pitcher. The difficult problem, however, is how do we compare two pitchers using this information? Let’s say we have two pitchers:

• Pitcher A’s pitches are 50% from cluster 1 and 50% from cluster 200.
• Pitcher B’s pitches are 33% from cluster 1, 33% from cluster 300, and 33% from cluster 139.

The question remains, are Pitcher A and Pitcher B similar pitchers?
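
One way to see why the question is hard is to try the most obvious comparison first. The sketch below scores the two pitch mixes with cosine similarity, which is my choice of a naive baseline and not something used in this analysis. The function name is illustrative, and the 33% shares are written as exact thirds.

```python
import math

# Cluster shares from the example above (33% written as exact thirds).
pitcher_a = {1: 0.50, 200: 0.50}
pitcher_b = {1: 1 / 3, 300: 1 / 3, 139: 1 / 3}

def cosine_similarity(mix_a, mix_b):
    """Cosine similarity between two sparse {cluster_id: share} dictionaries."""
    dot = sum(share * mix_b.get(cluster, 0.0) for cluster, share in mix_a.items())
    norm_a = math.sqrt(sum(s * s for s in mix_a.values()))
    norm_b = math.sqrt(sum(s * s for s in mix_b.values()))
    return dot / (norm_a * norm_b)

naive = cosine_similarity(pitcher_a, pitcher_b)
print(f"naive similarity: {naive:.2f}")
```

The score only credits the shared cluster 1. If clusters 200 and 300 happened to be two nearly identical sliders, this comparison would never notice, which is the gap a model of whole repertoires is meant to fill.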

The problem of clustering pitchers is a more complicated one than clustering pitches because we now have a collection of pitches instead of just individual pitches to compare. In order to cluster pitchers, I use a model that is typically used for topic modeling called Latent Dirichlet Allocation (LDA).

## An Aside on LDA

In LDA for topic modeling, our data is a collection of documents.

Let’s imagine that our collection of documents is articles from the New York Times. There are global topics that govern how these articles are generated. For example, if you think of a newspaper, the topics might be sports, finance, health, politics, etc. Additionally, each article can be a mixture of these topics. We might imagine there is an article in the sports section titled, “Yankees payroll exceeds \$300 million”, which our algorithm may discover is 50% about sports and 50% about finance.

Similar to what is mentioned above, the analyst must figure out what the topics actually are. You do not tell the algorithm that there is a sports topic. You discover that the topic is sports by observing that the most probable words are “baseball”, “Jeter”, “LeBron”, “touchdown”, etc. The algorithm will tell you that a particular document is 50% about topic 1 and 50% about topic 20, but you must ultimately infer what topic 1 and topic 20 are.

I am harping on this point mainly just to mention that there is no magic to these clustering algorithms. An algorithm can cluster data, but it cannot tell you what these clusters mean.

## Relevance of LDA to Pitchers

Anyway, how can this model be used to analyze pitchers? We just need to use our imagination. Instead of a collection of documents, we now have a collection of pitcher seasons. Whereas each document is made up of a collection of words, each pitcher season is made up of a collection of pitches. We have already discretized each pitch using K-Means clustering in order to create our own “dictionary” of pitches. In our baseball model, we imagine that each pitcher is a mixture of repertoires, whereas in topic modeling, each document was a mixture of topics. We can then cluster pitchers together by figuring out who has the most similar repertoires.
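
The document analogy can be sketched directly: each pitcher season becomes a “bag of pitches”, a vector counting how often each of the 500 clusters was thrown. The pitch logs and the helper name `bag_of_pitches` below are hypothetical.

```python
from collections import Counter

# Hypothetical logs: (pitcher, year) -> cluster id of every pitch thrown.
pitch_logs = {
    ("Pitcher A", 2014): [1, 200, 1, 200, 1, 200],
    ("Pitcher B", 2014): [1, 300, 139, 300, 139, 1],
}

def bag_of_pitches(cluster_ids, n_clusters=500):
    """Count vector for one pitcher season; position i holds cluster i + 1."""
    counts = Counter(cluster_ids)
    return [counts.get(c, 0) for c in range(1, n_clusters + 1)]

documents = {season: bag_of_pitches(ids) for season, ids in pitch_logs.items()}

season_a = documents[("Pitcher A", 2014)]
print(len(season_a), sum(season_a), season_a[0], season_a[199])
```

Stacking these vectors gives the same kind of document-by-word count matrix that topic models take as input, with clusters playing the role of words.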

## Nitty Gritty Details

If you are not interested in getting into the nitty gritty details, feel free to skip ahead to the next section to just see the cluster groupings.

• Data used is from 2007-2014.
• The dictionary of pitches (500 clusters) was created by running K-Means using all of the pitches from 2014. The choice of 2014 is arbitrary, but I used just one year’s worth of data because I thought it might be a sufficient amount and it was much quicker to run K-Means.
• The PITCHf/x attributes that were used to cluster pitches were start_speed, pfx_x/pfx_z (horizontal/vertical movement), px/pz (horizontal/vertical location), vx0/vz0 (components of velocity).
• For each pitcher from 2007-2014, each pitch was assigned to its closest cluster (determined by distance to the cluster center). I filtered out pitcher seasons in which the pitcher threw fewer than 500 pitches.
• I then ran LDA on pitcher seasons, choosing the number of repertoires (topics) to be 5.
• I used the method from this paper to get a vector representation of each pitcher season. I could have used the inferred repertoire proportions as my vector representations, but for various reasons, this did not produce clusters that were as nice.
• Finally, I ran K-Means (K=100) on these vectors to get clusters of pitchers.
• Whereas in topic modeling, it is often interesting to interpret what the global topics actually are, I am not really interested in what the global “repertoires” are for the model. I am really using LDA as a dimensionality reduction technique to produce smaller vectors (5 vs. 500) that can be clustered together.
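
For anyone curious what the LDA step looks like in code, here is a compact collapsed Gibbs sampler in plain Python. Treat it as a sketch under stated assumptions, not the implementation used for this post. The inference method, the hyperparameters `alpha` and `beta`, and the toy seasons are all my choices. It also returns the inferred repertoire proportions, which the list above says were not the final vector representation.

```python
import random

def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, n_iter=100, seed=0):
    """Collapsed Gibbs sampler for LDA; returns topic proportions per document."""
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]                # pitches in season d on repertoire k
    topic_word = [[0] * vocab_size for _ in range(n_topics)]  # cluster w counts on repertoire k
    topic_total = [0] * n_topics
    assign = []
    for d, doc in enumerate(docs):  # random initial assignment
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assign[d][i]  # remove the pitch's current assignment
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta) / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k  # record the resampled assignment
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return [
        [(count + alpha) / (len(doc) + n_topics * alpha) for count in row]
        for row, doc in zip(doc_topic, docs)
    ]

# Hypothetical seasons as lists of pitch-cluster ids (0-based, 6 clusters).
toy_seasons = [
    [0, 0, 1, 1, 0, 1, 0, 1],
    [0, 1, 0, 1, 1, 0, 0, 1],
    [4, 5, 4, 5, 5, 4, 4, 5],
    [5, 4, 5, 4, 4, 5, 5, 4],
]
theta = lda_gibbs(toy_seasons, n_topics=2, vocab_size=6)
for row in theta:
    print([round(p, 2) for p in row])
```

In the setup described above, the same call would use 5 repertoires over a 500-cluster dictionary, and the resulting short vectors are what get clustered in the final K-Means step.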

## Some Observations

The actual clusters along with some relevant FanGraphs statistics are provided below. Each table is sortable. For brevity, I have only included clusters in which there are 10 or fewer pitchers. Only the first cluster shown (cluster 3) has more than 10 pitchers, which I simply included to demonstrate that a cluster could be quite big.

• As is probably expected, clusters are almost always entirely righties or lefties even though this is not an input to the model.
• Guys with similar numbers of batters faced cluster together. This is by design, as the way I determined the repertoire proportions accounts for the number of times a particular pitch is thrown.
• Sometimes weird clusters can form, such as Cluster 37, which contains both Chapman and Wakefield. Cluster 37 is mostly cohesive with hard-throwing left-handers and I believe Wakefield ends up here simply because he did not fit well into any cluster.
• This is not to say that the algorithm cannot find clusters of knuckleballers. Cluster 14 is all R.A. Dickey from years 2011-2014.
• There are also other clusters that contain exclusively one (or almost one) pitcher. Cluster 8 is 5 Kershaw years and one Hamels year. Cluster 68 is 5 Verlander years. I believe these clusters form partially because their stuff is so good. There are other pitchers who fall into almost exclusively one cluster but who are joined by many other pitchers. Another factor is that they might be able to repeat their mechanics so well that they remain in the same cluster because they are always throwing the same pitch types.
• Clusters of individual pitchers also happen if a pitcher has an incredibly unique style. Justin Masterson has his own cluster because he is such an extreme ground-ball pitcher. Josh Collmenter does as well due to the extreme rise he generates on his “fastball”.
• Cluster 29 contains just Kershaw’s 2014 season and J.A. Happ’s 2009 season. If you do a Ctrl-F for J.A. Happ, he finds himself in some pretty flattering clusters. This is especially interesting because from 2007-2014, he does not have particularly good seasons, but he has been quite good the last two years. This is not to suggest that these clusters can uncover hidden gems, but it’s not fully out of the realm of possibility.
• Most clusters produce quite similar ground-ball percentages. One of the factors that goes into clustering pitches (and therefore pitchers) is horizontal and vertical movement, which play a huge factor in a pitcher’s ability to produce ground-balls.
• Submarine pitchers always end up together. Check out Clusters 9, 60, and 92.

Overall, I think this is pretty interesting stuff. I was honestly surprised that the clusters turned out to be as cohesive as they were. Additionally, besides being a descriptive tool, I have to wonder whether this information can be used for predictive purposes. For example, we often talk about regression to the mean when discussing a player’s performance, whether it be a pitcher or a batter. It is possible that the appropriate mean for many pitchers is the cluster mean that they happen to fall into.
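
Here is a rough sketch of what regressing toward a cluster mean could look like. It uses four Cluster 3 seasons from the table in this post and the standard “add ballast” form of regression to the mean. The ballast of 300 batters faced is a made-up number for illustration, and nothing here has been tested as a projection method.

```python
# Four Cluster 3 seasons from this post's table: (name, year, TBF, GB%).
cluster_3_sample = [
    ("Chris Carpenter", 2009, 750, 55.0),
    ("Josh Beckett", 2011, 767, 40.1),
    ("Michael Pineda", 2011, 696, 36.3),
    ("A.J. Burnett", 2012, 851, 56.9),
]

def shrink_toward(observed, target_mean, sample_size, ballast):
    """Blend an observed rate with a mean; bigger samples are pulled less."""
    weight = sample_size / (sample_size + ballast)
    return weight * observed + (1 - weight) * target_mean

cluster_mean = sum(gb for *_, gb in cluster_3_sample) / len(cluster_3_sample)
BALLAST_TBF = 300  # hypothetical: how much weight the cluster mean gets

for name, year, tbf, gb in cluster_3_sample:
    estimate = shrink_toward(gb, cluster_mean, tbf, BALLAST_TBF)
    print(f"{year} {name:16s} GB% {gb:.1f} -> {estimate:.1f}")
```

Swapping `cluster_mean` for the league mean gives the usual version of the calculation. The open question raised above is which of the two targets predicts better.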

Cluster 3

| Year | Name | Team | TBF | K/9 | BB/9 | HR/9 | GB% | FB% | HR/FB% | WAR |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2009 | Chris Carpenter | Cardinals | 750 | 6.73 | 1.78 | 0.33 | 55.0 | 28.0 | 4.6 | 5.5 |
| 2010 | Hiroki Kuroda | Dodgers | 810 | 7.29 | 2.20 | 0.69 | 51.1 | 32.1 | 8.0 | 4.3 |
| 2010 | Gavin Floyd | White Sox | 798 | 7.25 | 2.79 | 0.67 | 49.9 | 32.1 | 7.6 | 4.1 |
| 2008 | Hiroki Kuroda | Dodgers | 776 | 5.69 | 2.06 | 0.64 | 51.3 | 28.6 | 7.6 | 3.6 |
| 2012 | Doug Fister | Tigers | 673 | 7.63 | 2.06 | 0.84 | 51.0 | 26.7 | 11.6 | 3.4 |
| 2011 | Josh Beckett | Red Sox | 767 | 8.16 | 2.42 | 0.98 | 40.1 | 42.2 | 9.6 | 3.3 |
| 2011 | Michael Pineda | Mariners | 696 | 9.11 | 2.89 | 0.95 | 36.3 | 44.8 | 9.0 | 3.2 |
| 2012 | A.J. Burnett | Pirates | 851 | 8.01 | 2.76 | 0.80 | 56.9 | 24.3 | 12.7 | 3.0 |
| 2013 | Rick Porcello | Tigers | 736 | 7.22 | 2.14 | 0.92 | 55.3 | 23.7 | 14.1 | 2.9 |
| 2008 | Carlos Zambrano | Cubs | 796 | 6.20 | 3.43 | 0.86 | 47.2 | 34.9 | 9.0 | 2.8 |
| 2013 | Andrew Cashner | Padres | 707 | 6.58 | 2.42 | 0.62 | 52.5 | 28.7 | 8.1 | 2.7 |
| 2012 | Jeff Samardzija | Cubs | 723 | 9.27 | 2.89 | 1.03 | 44.6 | 33.1 | 12.8 | 2.7 |
| 2010 | Scott Baker | Twins | 725 | 7.82 | 2.27 | 1.22 | 35.6 | 43.5 | 10.2 | 2.6 |
| 2014 | Kyle Gibson | Twins | 757 | 5.37 | 2.86 | 0.60 | 54.4 | 26.6 | 7.8 | 2.3 |
| 2012 | Tim Hudson | Braves | 749 | 5.13 | 2.41 | 0.60 | 55.5 | 25.2 | 8.3 | 2.1 |
| 2014 | Henderson Alvarez | Marlins | 772 | 5.34 | 1.59 | 0.67 | 53.8 | 24.3 | 9.5 | 2.1 |
| 2008 | Todd Wellemeyer | Cardinals | 807 | 6.29 | 2.91 | 1.17 | 39.3 | 39.8 | 10.6 | 2.0 |
| 2010 | Rick Porcello | Tigers | 700 | 4.65 | 2.10 | 1.00 | 50.3 | 32.1 | 9.9 | 1.7 |
| 2011 | Luke Hochevar | Royals | 835 | 5.82 | 2.82 | 1.05 | 49.8 | 32.2 | 11.5 | 1.7 |
| 2008 | Jason Marquis | Cubs | 738 | 4.90 | 3.77 | 0.81 | 47.6 | 32.5 | 8.3 | 1.7 |
| 2014 | Charlie Morton | Pirates | 666 | 7.21 | 3.26 | 0.51 | 55.7 | 22.8 | 8.8 | 1.6 |
| 2012 | Luis Mendoza | Royals | 709 | 5.64 | 3.20 | 0.81 | 52.1 | 27.1 | 10.6 | 1.5 |
| 2009 | Aaron Cook | Rockies | 675 | 4.44 | 2.68 | 1.08 | 56.5 | 24.7 | 14.2 | 1.4 |
| 2014 | Doug Fister | Nationals | 662 | 5.38 | 1.32 | 0.99 | 48.9 | 34.2 | 10.1 | 1.4 |
| 2010 | Mitch Talbot | Indians | 696 | 4.97 | 3.90 | 0.73 | 47.8 | 35.3 | 7.0 | 1.2 |
| 2008 | Armando Galarraga | Tigers | 746 | 6.35 | 3.07 | 1.41 | 43.5 | 39.7 | 13.0 | 1.2 |
| 2008 | Carlos Silva | Mariners | 689 | 4.05 | 1.88 | 1.17 | 44.0 | 33.3 | 10.4 | 1.2 |
| 2009 | Ross Ohlendorf | Pirates | 725 | 5.55 | 2.70 | 1.27 | 40.6 | 42.1 | 11.1 | 1.2 |
| 2008 | Vicente Padilla | Rangers | 757 | 6.68 | 3.42 | 1.37 | 42.7 | 38.1 | 12.5 | 1.1 |
| 2012 | Luke Hochevar | Royals | 800 | 6.99 | 2.96 | 1.31 | 43.3 | 35.0 | 13.5 | 1.1 |
| 2012 | Derek Lowe | – – – | 640 | 3.47 | 3.22 | 0.63 | 59.2 | 21.0 | 9.1 | 1.0 |
| 2013 | Edinson Volquez | – – – | 777 | 7.50 | 4.07 | 1.00 | 47.6 | 29.6 | 11.9 | 0.9 |
| 2011 | Chris Volstad | Marlins | 719 | 6.36 | 2.66 | 1.25 | 52.3 | 27.7 | 15.5 | 0.7 |
| 2010 | Jeremy Bonderman | Tigers | 754 | 5.89 | 3.16 | 1.32 | 44.7 | 39.2 | 11.4 | 0.7 |
| 2010 | Brad Bergesen | Orioles | 746 | 4.29 | 2.70 | 1.38 | 48.7 | 36.6 | 11.9 | 0.6 |
| 2014 | Hector Noesi | – – – | 733 | 6.42 | 2.92 | 1.46 | 38.0 | 40.6 | 12.7 | 0.3 |
| 2009 | Armando Galarraga | Tigers | 642 | 5.95 | 4.20 | 1.50 | 39.9 | 38.6 | 13.3 | 0.2 |
| 2008 | Kyle Kendrick | Phillies | 722 | 3.93 | 3.30 | 1.33 | 44.3 | 28.7 | 14.0 | 0.1 |
| 2014 | Roberto Hernandez | – – – | 722 | 5.74 | 3.99 | 1.04 | 49.7 | 29.9 | 12.2 | 0.0 |
| 2013 | Lucas Harrell | Astros | 707 | 5.21 | 5.15 | 1.17 | 51.5 | 27.4 | 14.3 | -0.8 |

Cluster 5

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2010 Cliff Lee – – – 843 7.84 0.76 0.68 41.9 40.4 6.3 7.0
2011 Cliff Lee Phillies 920 9.21 1.62 0.70 46.3 32.4 9.0 6.8
2009 Jon Lester Red Sox 843 9.96 2.83 0.89 47.7 34.5 10.6 5.3
2014 Jose Quintana White Sox 830 8.00 2.34 0.45 44.7 33.2 5.1 5.1
2013 Derek Holland Rangers 894 7.99 2.70 0.85 40.8 36.4 8.8 4.3
2012 Matt Moore Rays 759 8.88 4.11 0.91 37.4 42.9 8.6 2.7
2013 Wade Miley Diamondbacks 847 6.53 2.93 0.93 52.0 27.2 12.5 1.8

Cluster 6

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2007 CC Sabathia Indians 975 7.80 1.38 0.75 45.0 36.6 7.8 6.4
2014 Jake McGee Rays 274 11.36 2.02 0.25 38.0 42.9 2.9 2.6
2014 Tyler Matzek Rockies 503 6.96 3.37 0.69 49.7 30.3 8.3 1.7
2013 J.A. Happ Blue Jays 415 7.48 4.37 0.97 36.5 46.0 7.6 1.1
2010 J.A. Happ – – – 374 7.21 4.84 0.82 39.0 43.4 7.4 1.0
2009 Sean West Marlins 467 6.10 3.83 0.96 40.2 40.8 8.0 1.0
2009 Andrew Miller Marlins 366 6.64 4.84 0.79 48.0 30.0 9.3 0.7
2012 Drew Pomeranz Rockies 434 7.73 4.28 1.30 43.9 35.9 13.6 0.7
2013 Jake McGee Rays 260 10.77 3.16 1.15 42.5 38.8 12.9 0.6
2008 Jo-Jo Reyes Braves 512 6.21 4.14 1.43 48.5 31.8 15.5 0.2

Cluster 8

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2013 Clayton Kershaw Dodgers 908 8.85 1.98 0.42 46.0 31.3 5.8 7.1
2011 Clayton Kershaw Dodgers 912 9.57 2.08 0.58 43.2 38.6 6.7 7.1
2012 Clayton Kershaw Dodgers 901 9.05 2.49 0.63 46.9 34.0 8.1 5.9
2010 Clayton Kershaw Dodgers 848 9.34 3.57 0.57 40.1 42.1 5.8 4.7
2009 Clayton Kershaw Dodgers 701 9.74 4.79 0.37 39.4 41.6 4.1 4.4
2010 Cole Hamels Phillies 856 9.10 2.63 1.12 45.4 37.9 12.3 3.5

Cluster 9

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2009 Peter Moylan Braves 309 7.52 4.32 0.00 62.4 19.5 0.0 1.4
2014 Joe Smith Angels 285 8.20 1.81 0.48 59.1 25.9 8.0 1.0
2011 Joe Smith Indians 267 6.04 2.82 0.13 56.6 23.5 2.2 1.0
2009 Brad Ziegler Athletics 313 6.63 3.44 0.25 62.3 19.7 4.4 1.0
2013 Brad Ziegler Diamondbacks 297 5.42 2.71 0.37 70.4 10.8 12.5 0.6
2012 Brad Ziegler Diamondbacks 263 5.50 2.75 0.26 75.5 7.7 13.3 0.6
2012 Joe Smith Indians 278 7.12 3.36 0.54 58.0 24.9 8.3 0.6
2008 Cla Meredith Padres 302 6.27 3.07 0.77 66.8 17.3 15.8 0.3
2010 Peter Moylan Braves 271 7.35 5.23 0.71 67.8 21.3 13.5 -0.3

Cluster 14

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2012 R.A. Dickey Mets 927 8.86 2.08 0.92 46.1 34.1 11.3 5.0
2011 R.A. Dickey Mets 876 5.78 2.33 0.78 50.8 32.9 8.3 2.5
2014 R.A. Dickey Blue Jays 914 7.22 3.09 1.09 42.0 37.6 10.7 1.7
2013 R.A. Dickey Blue Jays 943 7.09 2.84 1.40 40.3 40.5 12.7 1.7

Cluster 16

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2013 Max Scherzer Tigers 836 10.08 2.35 0.76 36.3 44.6 7.6 6.1
2014 Max Scherzer Tigers 904 10.29 2.57 0.74 36.7 41.6 7.5 5.2
2011 Daniel Hudson Diamondbacks 921 6.85 2.03 0.69 41.7 39.1 6.4 4.6
2012 Max Scherzer Tigers 787 11.08 2.88 1.10 36.5 41.5 11.6 4.4
2014 Jeff Samardzija – – – 879 8.28 1.76 0.82 50.2 30.5 10.6 4.1
2014 Lance Lynn Cardinals 866 8.00 3.18 0.57 44.3 36.0 6.1 3.4

Cluster 18

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2008 Brandon Webb Diamondbacks 944 7.27 2.58 0.52 64.4 20.4 9.6 5.5
2013 Justin Masterson Indians 803 9.09 3.54 0.61 58.0 24.2 10.7 3.5
2012 Justin Masterson Indians 906 6.94 3.84 0.79 55.7 25.0 11.4 2.3
2011 Derek Lowe Braves 830 6.59 3.37 0.67 59.0 22.5 10.2 2.1

Cluster 20

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2010 John Danks White Sox 878 6.85 2.96 0.76 45.4 38.9 7.4 4.4
2010 Brian Matusz Orioles 760 7.33 3.23 0.97 36.2 45.0 7.9 3.0
2009 John Danks White Sox 839 6.69 3.28 1.26 44.2 40.9 11.5 2.7
2013 Felix Doubront Red Sox 705 7.71 3.94 0.72 45.6 34.4 7.8 2.2
2014 J.A. Happ Blue Jays 673 7.58 2.91 1.25 40.6 39.5 11.5 1.0

Cluster 24

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2008 CC Sabathia – – – 1023 8.93 2.10 0.68 46.6 31.7 8.8 7.3
2011 CC Sabathia Yankees 985 8.72 2.31 0.64 46.6 30.3 8.4 6.4
2010 David Price Rays 861 8.11 3.41 0.65 43.7 39.6 6.5 4.2

Cluster 29

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2014 Clayton Kershaw Dodgers 749 10.85 1.41 0.41 51.8 29.2 6.6 7.6
2009 J.A. Happ Phillies 685 6.45 3.04 1.08 38.4 42.9 9.5 1.7

Cluster 35

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2014 Chris Young Mariners 688 5.89 3.27 1.42 22.3 58.7 8.8 0.1
2014 Marco Estrada Brewers 624 7.59 2.63 1.73 32.7 49.5 13.2 -0.1

Cluster 36

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2011 Justin Masterson Indians 908 6.58 2.71 0.46 55.1 26.7 6.3 4.2
2010 Justin Masterson Indians 802 7.00 3.65 0.70 59.9 24.9 10.0 2.3

Cluster 37

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2012 Aroldis Chapman Reds 276 15.32 2.89 0.50 37.3 42.9 7.4 3.3
2009 Matt Thornton White Sox 291 10.82 2.49 0.62 46.4 36.3 7.7 2.3
2008 Matt Thornton White Sox 268 10.29 2.54 0.67 53.0 27.4 10.9 1.7
2012 Drew Smyly Tigers 416 8.52 2.99 1.09 39.9 41.3 10.3 1.7
2008 Clayton Kershaw Dodgers 470 8.36 4.35 0.92 48.0 31.3 11.6 1.5
2008 Tim Wakefield Red Sox 754 5.82 2.98 1.24 35.5 48.9 9.1 1.1
2011 Tim Wakefield Red Sox 677 5.41 2.73 1.45 38.4 45.8 10.5 0.2

Cluster 38

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2013 Cliff Lee Phillies 876 8.97 1.29 0.89 44.3 33.3 10.9 5.5
2008 Johan Santana Mets 964 7.91 2.42 0.88 41.2 36.4 9.4 5.3
2010 Jon Lester Red Sox 861 9.74 3.59 0.61 53.6 29.6 8.9 4.8
2012 CC Sabathia Yankees 833 8.87 1.98 0.99 48.2 30.7 12.5 4.7
2008 Jon Lester Red Sox 874 6.50 2.82 0.60 47.5 31.6 7.0 4.1
2013 Hyun-Jin Ryu Dodgers 783 7.22 2.30 0.70 50.6 30.5 8.7 3.6
2014 Wei-Yin Chen Orioles 772 6.59 1.70 1.11 41.0 37.5 10.5 2.4
2010 Jonathan Sanchez Giants 812 9.54 4.47 0.98 41.5 43.7 9.8 2.3
2014 Wade Miley Diamondbacks 866 8.18 3.35 1.03 51.1 28.0 13.9 1.6

Cluster 44

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2011 Cole Hamels Phillies 850 8.08 1.83 0.79 52.3 32.6 9.9 4.9
2008 Cole Hamels Phillies 914 7.76 2.10 1.11 39.5 38.7 11.2 4.8
2008 John Danks White Sox 804 7.34 2.63 0.69 42.8 35.4 7.4 4.8
2009 Cole Hamels Phillies 814 7.81 2.00 1.12 40.4 38.7 10.7 3.9
2014 Danny Duffy Royals 606 6.81 3.19 0.72 35.8 46.0 6.1 1.9
2011 J.A. Happ Astros 698 7.71 4.78 1.21 33.0 44.2 10.2 0.6

Cluster 46

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2010 Roy Halladay Phillies 993 7.86 1.08 0.86 51.2 29.7 11.3 6.1
2013 Lance Lynn Cardinals 856 8.84 3.39 0.62 43.1 34.4 7.4 3.7
2008 Mike Pelfrey Mets 851 4.93 2.87 0.54 49.6 29.6 6.3 3.1
2009 A.J. Burnett Yankees 896 8.48 4.22 1.09 42.8 39.2 10.8 3.0
2010 Roberto Hernandez Indians 880 5.31 3.08 0.73 55.6 30.8 8.3 2.6
2009 Derek Lowe Braves 855 5.13 2.91 0.74 56.3 25.8 9.4 2.5
2010 Derek Lowe Braves 824 6.32 2.83 0.84 58.8 22.6 13.1 2.2

Cluster 49

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2014 Aroldis Chapman Reds 202 17.67 4.00 0.17 43.5 34.8 4.2 2.8
2014 James Paxton Mariners 303 7.18 3.53 0.36 54.8 22.6 6.4 1.2
2013 Rex Brothers Rockies 281 10.16 4.81 0.67 48.8 32.5 9.3 0.9
2012 Antonio Bastardo Phillies 224 14.02 4.50 1.21 27.7 50.0 12.5 0.8
2012 Tim Collins Royals 295 12.01 4.39 1.03 40.9 42.8 11.8 0.7
2012 Christian Friedrich Rockies 377 7.87 3.19 1.49 42.2 34.6 15.4 0.7
2013 Justin Wilson Pirates 295 7.21 3.42 0.49 53.0 30.0 6.7 0.6
2011 Aroldis Chapman Reds 207 12.78 7.38 0.36 52.7 30.8 7.1 0.5
2014 Justin Wilson Pirates 256 9.15 4.50 0.60 51.3 34.4 7.3 0.2
2011 Mike Dunn Marlins 267 9.71 4.43 1.29 38.5 46.0 12.2 -0.2

Cluster 51

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2009 Cliff Lee – – – 969 7.03 1.67 0.66 41.3 36.5 6.5 6.3
2009 CC Sabathia Yankees 938 7.71 2.62 0.70 42.9 37.3 7.4 5.9
2010 CC Sabathia Yankees 970 7.46 2.80 0.76 50.7 34.1 8.6 5.1

Cluster 54

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2014 Hisashi Iwakuma Mariners 709 7.74 1.06 1.01 50.2 28.7 13.2 3.1
2009 Justin Masterson – – – 568 8.28 4.18 0.84 53.6 31.4 10.4 1.5
2014 Justin Masterson – – – 592 8.11 4.83 0.84 58.2 21.6 14.6 0.4

Cluster 58

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2014 David Price – – – 1009 9.82 1.38 0.91 41.2 38.1 9.7 6.0
2014 Jon Lester – – – 885 9.01 1.97 0.66 42.4 37.0 7.2 5.6
2012 Gio Gonzalez Nationals 822 9.35 3.43 0.41 48.2 30.0 5.8 5.0
2011 David Price Rays 918 8.75 2.53 0.88 44.3 36.9 9.7 4.4
2013 Gio Gonzalez Nationals 819 8.83 3.50 0.78 43.9 33.3 9.7 3.2
2011 Gio Gonzalez Athletics 864 8.78 4.05 0.76 47.5 34.1 8.9 3.1
2010 Gio Gonzalez Athletics 851 7.67 4.13 0.67 49.3 35.3 7.4 3.1

Cluster 60

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2011 Brad Ziegler – – – 239 6.79 2.93 0.00 68.6 13.4 0.0 1.0
2007 Cla Meredith Padres 342 6.67 1.92 0.68 72.0 13.6 17.1 1.0
2008 Brad Ziegler Athletics 229 4.53 3.32 0.30 64.7 18.8 6.3 0.5
2013 Joe Smith Indians 259 7.71 3.29 0.71 49.1 30.1 9.6 0.5
2008 Chad Bradford – – – 241 2.58 2.28 0.46 66.5 16.0 9.4 0.4
2012 Cody Eppley Yankees 194 6.26 3.33 0.59 60.3 19.1 11.1 0.3
2008 Joe Smith Mets 271 7.39 4.41 0.57 62.6 17.9 12.5 0.3
2009 Cla Meredith – – – 283 5.10 3.44 0.55 62.9 21.1 8.9 0.2
2010 Brad Ziegler Athletics 257 6.08 4.15 0.59 54.4 26.9 8.2 0.1
2014 Brad Ziegler Diamondbacks 281 7.25 3.22 0.67 63.8 18.9 13.5 0.1

Cluster 68

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2009 Justin Verlander Tigers 982 10.09 2.36 0.75 36.0 42.8 7.4 7.7
2012 Justin Verlander Tigers 956 9.03 2.27 0.72 42.3 35.6 8.3 6.8
2011 Justin Verlander Tigers 969 8.96 2.04 0.86 40.2 42.1 8.8 6.4
2010 Justin Verlander Tigers 925 8.79 2.85 0.56 41.0 40.3 5.6 6.3
2013 Justin Verlander Tigers 925 8.95 3.09 0.78 38.4 38.9 7.8 4.9

Cluster 69

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2008 Manny Parra Brewers 741 7.97 4.07 0.98 51.6 26.6 13.5 2.3
2014 Drew Smyly – – – 618 7.82 2.47 1.06 36.6 43.4 9.5 2.2
2012 J.A. Happ – – – 627 8.96 3.48 1.18 44.0 38.9 11.9 1.9

Cluster 70

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2014 Gerrit Cole Pirates 571 9.00 2.61 0.72 49.2 31.8 9.4 2.3
2009 Luke Hochevar Royals 631 6.67 2.90 1.45 46.6 35.8 13.8 1.0
2012 Joe Kelly Cardinals 457 6.31 3.03 0.84 51.7 27.5 11.0 0.9
2008 Sidney Ponson – – – 612 3.85 3.18 0.93 54.5 26.2 10.9 0.9
2013 Joe Kelly Cardinals 532 5.73 3.19 0.73 51.1 28.2 8.9 0.7
2009 Roberto Hernandez Indians 596 5.67 5.03 1.15 55.2 27.0 13.7 0.0

Cluster 71

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2008 Chris Young Padres 434 8.18 4.22 1.14 21.7 53.4 8.7 1.4
2012 Chris Young Mets 493 6.26 2.82 1.25 22.3 58.2 7.7 1.2
2013 Josh Collmenter Diamondbacks 384 8.32 3.23 0.78 32.7 46.8 6.9 1.0
2012 Josh Collmenter Diamondbacks 375 7.97 2.19 1.30 37.4 43.1 11.5 0.8
2009 Chris Young Padres 336 5.92 4.74 1.42 30.2 51.7 10.0 0.0

Cluster 72

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2014 Madison Bumgarner Giants 873 9.07 1.78 0.87 44.4 35.8 10.0 4.0
2013 Jon Lester Red Sox 903 7.47 2.83 0.80 45.0 35.4 8.3 3.5

Cluster 77

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2011 Josh Collmenter Diamondbacks 621 5.83 1.63 0.99 33.3 47.0 7.7 2.3
2014 Josh Collmenter Diamondbacks 719 5.77 1.96 0.90 38.8 39.9 8.3 1.9

Cluster 78

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2007 Rich Hill Cubs 812 8.45 2.91 1.25 36.0 42.9 11.7 3.1
2014 Tyler Skaggs Angels 464 6.85 2.39 0.72 50.1 30.9 8.7 1.5
2011 Danny Duffy Royals 474 7.43 4.36 1.28 37.5 40.3 11.5 0.5
2010 Manny Parra Brewers 560 9.52 4.65 1.33 47.2 34.5 14.8 0.3

Cluster 79

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2012 David Price Rays 836 8.74 2.52 0.68 53.1 27.0 10.5 5.0
2011 C.J. Wilson Rangers 915 8.30 2.98 0.64 49.3 31.9 8.2 4.9
2010 C.J. Wilson Rangers 850 7.50 4.10 0.44 49.2 33.5 5.3 4.1
2013 C.J. Wilson Angels 913 7.97 3.60 0.64 44.4 33.4 7.2 3.2
2012 Madison Bumgarner Giants 849 8.25 2.12 0.99 47.9 33.3 11.7 3.1
2011 Derek Holland Rangers 843 7.36 3.05 1.00 46.4 33.6 11.0 3.0
2012 Wandy Rodriguez – – – 875 6.08 2.45 0.92 48.0 31.6 10.1 2.5
2014 Jason Vargas Royals 790 6.16 1.97 0.91 38.3 38.7 8.2 2.2
2012 C.J. Wilson Angels 865 7.70 4.05 0.85 50.3 29.9 10.8 2.2

Cluster 85

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2012 Cliff Lee Phillies 847 8.83 1.19 1.11 45.0 36.9 11.8 5.0
2014 Cole Hamels Phillies 829 8.71 2.59 0.62 46.4 31.1 8.2 4.3
2009 Wandy Rodriguez Astros 849 8.45 2.76 0.92 44.9 37.1 9.9 4.1
2012 Wade Miley Diamondbacks 807 6.66 1.71 0.65 43.3 33.7 6.9 4.1
2013 Jose Quintana White Sox 832 7.38 2.52 1.03 42.5 37.4 10.2 3.5
2009 Andy Pettitte Yankees 834 6.84 3.51 0.92 42.9 37.8 8.9 3.4
2012 Wei-Yin Chen Orioles 818 7.19 2.66 1.35 37.1 42.1 11.7 2.3

Cluster 86

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2009 Josh Beckett Red Sox 883 8.43 2.33 1.06 47.2 31.7 12.8 4.2
2010 Max Scherzer Tigers 800 8.46 3.22 0.92 40.3 40.0 9.6 3.7
2014 Nathan Eovaldi Marlins 854 6.40 1.94 0.63 44.8 32.9 6.6 2.9
2012 Lucas Harrell Astros 827 6.51 3.62 0.60 57.2 22.5 9.7 2.8
2013 Jeff Samardzija Cubs 914 9.01 3.29 1.05 48.2 31.4 13.3 2.7
2011 Max Scherzer Tigers 833 8.03 2.58 1.34 40.3 39.5 12.6 2.2
2009 Mike Pelfrey Mets 824 5.22 3.22 0.88 51.3 30.0 9.5 1.7
2011 Roberto Hernandez Indians 833 5.20 2.86 1.05 54.8 26.6 13.0 0.9

Cluster 92

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2014 Steve Cishek Marlins 275 11.57 2.89 0.41 42.7 31.1 5.9 2.0
2007 Sean Green Mariners 304 7.01 4.50 0.26 60.9 18.8 5.1 0.7
2008 Sean Green Mariners 358 7.06 4.10 0.34 63.3 19.5 6.1 0.7
2011 Shawn Camp Blue Jays 292 4.34 2.98 0.41 53.5 25.7 5.2 0.3
2010 Shawn Camp Blue Jays 298 5.72 2.24 1.00 52.0 31.4 11.1 0.2

Cluster 95

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2008 Cliff Lee Indians 891 6.85 1.37 0.48 45.9 35.1 5.1 6.7
2012 Cole Hamels Phillies 867 9.03 2.17 1.00 43.4 35.1 11.9 4.6
2013 Cole Hamels Phillies 905 8.26 2.05 0.86 42.7 36.7 9.1 4.5
2008 Scott Kazmir Rays 641 9.81 4.14 1.36 30.8 48.9 12.0 2.0

Cluster 97

year Name Team TBF K9 BB9 HR9 GB_pct FB_pct HR_FB WAR
2011 Jered Weaver Angels 926 7.56 2.14 0.76 32.5 48.6 6.3 5.7
2009 Jered Weaver Angels 882 7.42 2.82 1.11 30.9 50.4 8.3 3.9
2014 Chris Tillman Orioles 871 6.51 2.86 0.91 40.6 39.3 8.3 2.3
2009 Joe Blanton Phillies 837 7.51 2.72 1.38 40.6 39.5 12.9 2.2
2013 Chris Tillman Orioles 845 7.81 2.97 1.44 38.6 39.8 14.2 1.9

## A Year In xISO

For the type of baseball fan I’ve become — one who follows the sport as a whole rather than focuses on a particular team — 2016 was the season of Statcast. Even for those who watch the hometown team’s broadcast on a nightly basis, exit velocity and launch angle have probably become familiar terms. While Statcast was around last season, it seems fans and commentators alike have really embraced it in 2016.

Personally, I commend MLB for democratizing Statcast data, at least partially, especially when they are under no apparent obligation to do so. I’ve enjoyed the Statcast Podcast this season, but most of all, I’ve benefited from the tools available at Baseball Savant. For it is that tool which has allowed me to explore xISO. I first introduced an attempt to incorporate exit velocity into a player’s expected isolated slugging (xISO). I subsequently updated the model and discussed some notable first half players. Alex Chamberlain was kind enough to include my version of xISO in the RotoGraphs x-stats Omnibus, and I’ve been maintaining a daily updated xISO resource ever since.

Happily for science, all of my 2016 first half “Overperformers” saw ISO declines in the second half, while most of my first half “Underperformers” saw large drops in second half playing time. Rather than focus on individuals, though, let’s try to estimate the predictive value of xISO in 2016.

Yuck. This plot shows how well first-half ISO predicted second-half ISO, compared to how well first-half xISO predicted the same, for 2016 first AND second-half qualified hitters. Both of these are calculated using the model as it was at the All-Star break. There are two takeaways: First-half ISO was a pretty bad predictor of second-half ISO, and first-half xISO was also a pretty bad predictor of second-half ISO. Mercifully though, first-half xISO was a bit better than ISO at predicting future ISO. This is consistent with the findings in my first article, and a basic requirement I set out to satisfy.

Now, an interesting thing happened recently. After weeks of hinting, Mike Petriello unveiled “Barrels”. Put simply, Barrels are meant to be a classification of the best kind of batted balls. Shortly thereafter, Baseball Savant began tabulating total Barrels, Barrels per batted ball (Brls/BBE), and Barrels per plate appearance (Brls/PA). In a way, this is similar to Andrew Perpetua’s approach to using granular batted-ball data to track expected outcomes for each batted ball, except that the Statcast folks have taken only a slice of launch angles and exit velocities to report as Barrels.

By definition, these angles and velocities are those for which the expected slugging percentage is over 1.500, so it would appear that this stat could be a direct replacement for my xISO. Not so fast! First of all, because ISO is on a per at-bat (AB) basis, we definitely need to calculate Brls/AB from Brls/PA. This is not so hard if we export a quick FanGraphs leaderboard. Let’s check how well Brls/AB works in a single-predictor linear model for ISO:

Not too bad. The plot reports both R-squared and adjusted R-squared, for comparison with multiple regression models. I won’t show it, but this is almost exactly the coefficient of determination that my original xISO achieves with the same training data. I still notice a hint of nonlinearity, and I bet we can do better.
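For anyone following along, the two pieces of arithmetic used so far can be sketched as below: converting Brls/PA to Brls/AB with plate-appearance and at-bat totals, and the standard adjustment that turns R-squared into adjusted R-squared. The function names are my own for this sketch, and it assumes Brls/PA is expressed as a fraction (0.06 rather than 6.0).

```python
def barrels_per_ab(brls_per_pa, plate_appearances, at_bats):
    """Convert Barrels per plate appearance into Barrels per at-bat."""
    if at_bats <= 0:
        raise ValueError("at_bats must be positive")
    # Brls/PA * PA recovers total Barrels; dividing by AB rescales the rate
    return brls_per_pa * plate_appearances / at_bats


def adjusted_r_squared(r_squared, n_observations, n_predictors):
    """Penalize R-squared for the number of predictors in the model."""
    penalty = (n_observations - 1) / (n_observations - n_predictors - 1)
    return 1.0 - (1.0 - r_squared) * penalty


# Hypothetical hitter: 600 PA, 540 AB, Brls/PA of 0.06
print(f"Brls/AB: {barrels_per_ab(0.06, 600, 540):.4f}")

# Hypothetical fit: same R-squared, one predictor versus four
print(f"adj. R^2, 1 predictor:  {adjusted_r_squared(0.70, 146, 1):.4f}")
print(f"adj. R^2, 4 predictors: {adjusted_r_squared(0.70, 146, 4):.4f}")
```

The adjustment matters when comparing a one-predictor model with a multi-predictor one, since plain R-squared can only go up as predictors are added.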

Hey now, that’s nice. In terms of adjusted R-squared, we’ve picked up about 0.06, which is not insignificant. The correlation plot also looks better to my eye. So what did I do? As is my way, I added a second-order term, and sprinkled in FB% and GB% as predictors. The latter two are perhaps controversial inclusions. FB% and/or GB% might be suspected to be strongly correlated with Brls/AB, introducing some undesired multicollinearity. While I won’t show the plots, it doesn’t actually turn out to be a big problem in this case. Both FB% and GB% have Pearson correlation coefficients close to 0.5 with Brls/AB (negative correlation in the case of GB%). Here’s the functional form of the multiple regression model plotted above, which was trained on all 2016 qualified hitters:

$\mathrm{xISO} = 2.01179\,(\mathrm{Brls/AB}) + 0.12122\,\mathrm{FB} - 0.08887\,\mathrm{GB} - 4.92145\,(\mathrm{Brls/AB})^2 + 0.09044$
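The regression above is easy to drop into a spreadsheet or script. Here is a minimal sketch using the published coefficients; the function name is mine, and it assumes Brls/AB, FB, and GB are all entered as fractions (0.35 rather than 35), which is the only scale on which these coefficients give ISO-sized results. The sample hitter is hypothetical.

```python
def xiso_from_barrels(brls_per_ab, fb, gb):
    """Barrels-based xISO from the fitted 2016 regression.

    All three inputs are assumed to be fractions between 0 and 1.
    """
    return (
        2.01179 * brls_per_ab
        + 0.12122 * fb
        - 0.08887 * gb
        - 4.92145 * brls_per_ab ** 2
        + 0.09044
    )


# Hypothetical hitter: 8% Barrels per AB, 35% fly balls, 45% ground balls
print(f"xISO: {xiso_from_barrels(0.08, 0.35, 0.45):.3f}")
```

Because of the negative second-order term, the Barrels contribution flattens out as Brls/AB climbs, so the formula should be treated with caution well outside the range of the 2016 qualified hitters it was trained on.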

To be honest, there is something about my first model that I liked better. This version, using Barrels, feels like a bit of a half-measure between Andrew Perpetua’s bucketed approach and my previous philosophy of using only average exit-velocity values and batted-ball mix. My original intent was to create a metric that could be easily calculated from readily available resources, so in that sense, I’m still succeeding. Going forward, I will be calculating both versions on my spreadsheet. I’m excited to see which version serves the community better heading into 2017!

As always, I’m happy to entertain comments, questions, or criticisms.

## Did the Cubs and Giants Have the Best Pitcher-Hitting Series Ever?

With a wild comeback in Game 4 on Tuesday night, the Cubs secured their spot in the NLCS for the second straight season. Considering where the team was just five years ago, this is obviously an impressive achievement. But maybe more impressive is how they reached that second consecutive NLCS. The Cubs scored 17 runs against the Giants in their NLDS showdown, and six of those were driven in by their pitchers! That’s an absurd 35% of the Cubs’ run output coming from the guys who usually do the run prevention.

When Travis Wood hit his incredible home run as a relief pitcher in Game 2, it was the first postseason home run from a pitcher since Joe Blanton took Edwin Jackson deep in Game 4 of the 2008 World Series, and the first postseason home run from a reliever since 1924.

When Jake Arrieta left the yard in the first inning of the very next game, it became the first postseason series with multiple home runs off the bats of pitchers since the 1968 World Series, when Mickey Lolich and Bob Gibson each went deep in a seven-game series. Of course, Lolich and Gibson were rivals, not teammates, making the Wood-Arrieta accomplishment even more impressive, and rarer. In fact, it was only the second time in the history of baseball (per Baseball-Reference Play Index) that two pitchers on the same team hit home runs in the same series. The only other time was in the 1924 World Series, when New York Giants teammates Jack Bentley and Rosy Ryan, both pitchers, homered in Games 3 and 5 of the epic seven-game series. Wood and Arrieta were the only ones to do so in back-to-back games.

* * *

Now, it wasn’t just the Cubs pitchers getting in on the fun. For a while Tuesday night, it looked as though Giants starter Matt Moore was going to be a two-fold hero, shutting down the Cubs offense from the mound and knocking in the first run of the game for the Giants in the bottom of the fourth. While that was the only hit from Giants pitchers in the series, it was still enough to bring the combined hitting totals for the two teams’ pitchers to a .250 batting average and a .625 slugging percentage, while knocking in 23 percent of the total runs scored.

Those are some pretty crazy totals, but are they the best ever?

Using the aforementioned Play Index search of all-time postseason home runs from pitchers, there are 18 different series (including the 2016 NLDS) in which a pitcher homered. In those series, on three occasions, the pitcher who hit the home run was the only pitcher to get a hit in the entire series (1984 Rick Sutcliffe, 1978 Steve Carlton, 1975 Don Gullett). Only twice did pitchers combine for more than the 10 total bases from the Giants and Cubs, and only once did they drive in more than the seven runs (and they never topped the percent of runs driven in). Let’s go to the chart:

Top Team Pitcher Performances in the Playoffs

Year Series Hits AB BA TB SLG RBI Series_runs RBI_pct_of_runs
2016 NLDS 4 16 0.250 10 0.625 7 30 23.33
2008 WS 2 13 0.154 5 0.385 1 39 2.56
2006 NLCS 2 25 0.080 5 0.200 1 55 1.82
2003 NLCS 3 28 0.107 6 0.214 3 82 3.66
1984 NLCS 4 17 0.235 7 0.412 1 48 2.08
1978 NLCS 2 17 0.118 5 0.294 4 38 10.53
1975 NLCS 2 12 0.167 5 0.417 3 26 11.54
1974 WS 4 20 0.200 8 0.400 1 27 3.70
1970 WS 2 25 0.080 5 0.200 4 53 7.55
1970 ALCS 5 18 0.278 10 0.556 6 37 16.22
1969 WS 5 26 0.192 10 0.385 5 24 20.83
1968 WS 5 36 0.139 11 0.306 4 63 6.35
1967 WS 2 30 0.067 8 0.267 2 46 4.35
1965 WS 5 32 0.156 9 0.281 6 44 13.64
1958 WS 7 37 0.189 10 0.270 8 54 14.81
1940 WS 3 39 0.077 7 0.179 2 50 4.00
1926 WS 4 39 0.103 8 0.205 2 52 3.85
1924 WS 8 42 0.190 14 0.333 5 53 9.43
1920 WS 6 39 0.154 9 0.231 3 29 10.34
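The rate columns in the chart follow directly from the counting columns: batting average is hits over at-bats, slugging is total bases over at-bats, and the last column is pitcher RBI as a share of all runs scored in the series. A quick sketch that recomputes them for three of the series (the function name is just for illustration):

```python
def pitcher_rates(hits, at_bats, total_bases, rbi, series_runs):
    """Return (BA, SLG, percent of series runs driven in by pitchers)."""
    batting_average = round(hits / at_bats, 3)
    slugging = round(total_bases / at_bats, 3)
    rbi_share = round(100 * rbi / series_runs, 2)
    return batting_average, slugging, rbi_share


# (hits, AB, TB, RBI, series runs) copied from the chart
series_totals = {
    "2016 NLDS": (4, 16, 10, 7, 30),
    "1970 ALCS": (5, 18, 10, 6, 37),
    "1924 WS": (8, 42, 14, 5, 53),
}

for series, totals in series_totals.items():
    ba, slg, share = pitcher_rates(*totals)
    print(f"{series}: BA {ba:.3f}, SLG {slg:.3f}, {share:.2f}% of runs")
```

Running the same function over any other row is a handy way to compare series of different lengths on a per-at-bat basis.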

After a brief perusal, it’s clear that there are only a few cases in which the pitchers in a series even come close to what we just saw. Let’s take a look at the five best, in ascending order:

1968 World Series

This was one of the three series before the 2016 NLDS in which multiple pitchers hit home runs. In 1968, it was, as noted above, Bob Gibson and Mickey Lolich who homered in the series, one each for the Cardinals and Tigers. The reason this series ranks fifth among the challengers to Cubs-Giants is that those two pitchers were really it. They drove in the only four runs from pitchers in the series (three of the four RBI coming on the two home-run swings), and only one hit came from a non-Gibson/Lolich pitcher.

1969 World Series

Just a year after our first entry in this challenge, the Mets and Orioles played in the first World Series to be preceded by a League Championship Series. The extra-long season didn’t stop the Mets and Orioles pitchers from contributing all over the diamond, however, as they crammed five hits, 10 total bases, and five RBI into just a five-game series. Because of the abbreviated length of the series, this is one of the few series that can challenge the 2016 NLDS in terms of percentages. That being said, the Cubs-Giants pitchers take all three percentage categories, leaving no real room for debate on this one.

1958 World Series

The 1958 series stands out in that it has the highest RBI total for pitchers in any postseason series to date. That was thanks in large part to the top two pitchers for the Braves, Warren Spahn and Lew Burdette, tallying three RBI apiece. Burdette did it with the long ball, while Spahn preferred the death-by-a-thousand-cuts method, tallying his three RBI on four hits in the series. The Yankees got two RBI of their own from Bob Turley, but I’m not quite willing to give these guys the edge over the Cubs-Giants pitchers. The easiest argument for this year’s NLDS is that the Cubs-Giants pitchers tallied as many total bases and only one fewer RBI in three fewer games, as the 1958 World Series went to seven games, while this year’s NLDS went just four.

1924 World Series

Here’s where the challenge gets real stiff. The 1924 World Series is the other series in which two teammates homered as pitchers, the aforementioned Bentley and Ryan of the Giants. This series tops our chart in hits (8) and total bases (14), and is a reasonable choice for best-hitting series from a group of pitchers. I’m still giving the edge to Cubs-Giants in this showdown, though, for really one reason with a couple of different explanations: opportunity. Similar to the 1958 World Series, the 1924 World Series went to seven games, meaning that pitchers had far more games to rack up those hits and total bases. Pitchers were also left in games far longer in the 1920s, and as such, tallied almost three times as many at-bats (42 to 16) as the 2016 NLDS pitchers. When comparing batting average (.250 to .190) and, even more so, slugging percentage (.625 to .333), it becomes clear that this year’s Cubs-Giants pitchers still reign supreme.

1970 ALCS

Here’s our winner: the only series that I believe tops the recently concluded Cubs-Giants NLDS in terms of output from pitchers at the plate. This was an even shorter series than Cubs-Giants, as the Orioles needed only three games to dispatch the Twins, and their pitchers were a good chunk of the reason why. The Orioles used just four pitchers in the series, but all four got hits, combining for all of the offense you see above. (Twins pitchers were 0-for-5 in the series.) Not only did all four get hits, but all three starters got extra-base hits, as Dave McNally, Jim Palmer, and Mike Cuellar (Dick Hall was the reliever) all showed what they were capable of on the other side of the ball. Of course, the very next season, these three starters, along with Pat Dobson, would form just the second-ever set of four 20-game winners on the same team, proving just how awesome the late ’60s and early ’70s Orioles really were. They reign supreme for now, but let’s see how those Cubs starting pitchers do for the rest of the 2016 playoffs.