Archive for Research

Platooning Kolten Wong and Jedd Gyorko

Last week, the Cardinals announced that Kolten Wong would be part of a platoon with Jedd Gyorko. As Mark Saxon noted, Wong did not react well to the news (although he later clarified that he would prefer to stay in St. Louis). Kolten and the Redbirds agreed to a five-year, $25.5-million contract last spring, but the deal that might have boosted the young second baseman’s confidence was followed by a .240/.327/.355 slash line over 121 games.

The reason for this platoon is Gyorko’s bat. He led the Cardinals with 30 home runs in 2016, a total that tied him for 11th in the National League. But is Gyorko so much better offensively that he offsets Wong’s defensive attributes?

Let’s look at this from two different perspectives: in the field and at the plate.

In the Field

Over 2015 and 2016, Wong and Gyorko played more than 1,000 innings at second base, a large enough sample to use in an analysis. Examining Ultimate Zone Rating per 150 games (UZR/150), we see that Kolten and Jedd have scores of 3.1 and 4.2, respectively. So while Gyorko seems to have an advantage here, over the course of a 162-game season the difference is relatively insignificant.

Looking at Def, which measures the number of runs above or below average a player is worth, we see that Wong scores 8.2, while Gyorko scores 5.3. According to FanGraphs Rules of Thumb for interpreting this statistic, both players are between “above average” and “great defenders.” Wong’s advantage here equates to about 1/3 of a win. Again, no significant difference in their fielding abilities.
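The win figures above come from a simple runs-to-wins conversion. Here is a rough sketch, assuming the common rule of thumb of about 10 runs per win (the actual FanGraphs runs-per-win value shifts from season to season) and scaling UZR/150 up to a 162-game season:

```python
# ASSUMPTION: roughly 10 runs per win, a common rule of thumb.
RUNS_PER_WIN = 10.0


def per_150_to_per_162(rate_per_150: float) -> float:
    """Scale a per-150-game rate (such as UZR/150) to a 162-game season."""
    return rate_per_150 * 162 / 150


def runs_to_wins(runs: float, runs_per_win: float = RUNS_PER_WIN) -> float:
    """Convert a run value into approximate wins."""
    return runs / runs_per_win


uzr_150 = {"Wong": 3.1, "Gyorko": 4.2}   # UZR/150, 2015-16, per the article
def_runs = {"Wong": 8.2, "Gyorko": 5.3}  # Def, per the article

# Gyorko's UZR/150 edge, stretched to a full season
uzr_gap_runs = per_150_to_per_162(uzr_150["Gyorko"] - uzr_150["Wong"])
# Wong's edge in Def
def_gap_runs = def_runs["Wong"] - def_runs["Gyorko"]

print(f"UZR/150 gap over 162 games: {uzr_gap_runs:.2f} runs, {runs_to_wins(uzr_gap_runs):.2f} wins")
print(f"Def gap: {def_gap_runs:.1f} runs, {runs_to_wins(def_gap_runs):.2f} wins")
```

Under that assumption, Wong’s 2.9-run Def edge is worth roughly 0.29 wins, and Gyorko’s UZR/150 edge is only about 0.12 wins over a full season.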

If we look at the Inside Edge Fielding statistics from FanGraphs, we see, as a whole, that Kolten makes more difficult plays, but Jedd makes the easier play a greater percentage of the time. For instance, look at the percentage of “unlikely” plays that each player made. An “unlikely” play is a play that is made 10-40% of the time. Kolten made 27% of these plays, while Jedd did not make a single one. At the same time, looking at plays that are “likely,” Kolten made 73% of them, while Jedd made significantly more (88%).

An analysis of these statistics shows us that, in the field, Kolten may make more web gems, but Jedd is the more consistent everyday second baseman. Nevertheless, there is not much separating these two on the defensive end.

At the Plate

At first, this part of the debate seems relatively simple. Gyorko led the team in HRs last year, so he is clearly the much better hitter, right? Let’s take a look. On the surface, the players look very similar: Jedd posted a line of .245/.301/.445, and Kolten produced a line of .254/.323/.375.

One key aspect of a platoon is starting the right-handed hitter against southpaws and vice versa. So let’s look at the zone profiles.

Kolten vs. Righties

Jedd vs. Lefties

Kolten is, far and away, the better hitter in this platoon in terms of average. But Gyorko’s greatest success was his power, right? So let’s look at the slugging zones.

Kolten slugging vs. Righties

Jedd slugging vs. Lefties

Although Jedd has the higher slugging percentage in some zones, Kolten’s zone ratings are only slightly lower and come over a longer period of time. Even with advanced statistics, these two players are very difficult to separate.

By the eye test, Kolten seems to have the advantage in the field, but the statistics tell us that these two players are actually very similar. Likewise, Jedd seems like the better hitter, but again, the statistics tell us that they are very similar. Perhaps there is one thing we can glean from this analysis: Kolten should bat where he can reach base in front of hitters who drive the ball, and Jedd should bat where he can drive runners in.

To return to the question asked at the beginning: should this platoon continue? The statistics say yes. As a younger player who just signed a large extension, Kolten has more upside. However, if we are making a decision for this year rather than the future, the numbers say the platoon should continue, because neither player has separated himself from the other.

Measuring Offensive Efficiency

Runs Created was one of the first sabermetric statistics I took it upon myself to learn about. After all, it was one of the first statistics developed by Bill James himself. I am also pretty sure RC is the formula written on a whiteboard in Moneyball (the most influential Brad Pitt movie I have ever seen). Anyway, Runs Created is not discussed much because there are other, more sophisticated alternatives (wRC, wRC+, etc.). I still appreciate RC because of its simplicity, and it can still be used as an effective tool for measuring the efficiency of offensive production.

That is precisely what I set out to do.  The question I sought to answer with this study is, “which teams were the most efficient in scoring runs?”  A pretty basic question — which I decided to complicate.  Using team statistics from last year, I calculated the Runs Created for each team’s offense.  The largest separation between Runs Created and actual runs scored came from the San Diego Padres, who scored 686 times, despite “creating” only 621.38 runs.

While ranking 19th in total runs, the Padres were actually incredibly efficient. I discovered this while trying to develop a way to measure offensive efficiency. To do so, I created the Run Conversion Rate (RCR). While relatively rudimentary, this ratio between runs scored and Runs Created provides, in my mind, a good measurement of the efficiency of offenses.

Run Conversion Rate = Runs Scored / Runs Created
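A minimal Python sketch of the ratio follows. The article does not say which version of Runs Created it used, so `runs_created_basic` assumes Bill James’s original basic formula, (H + BB) × TB / (AB + BB), and is fed hypothetical inputs; the RCR inputs come from the 2016 table in this article.

```python
def runs_created_basic(h: int, bb: int, tb: int, ab: int) -> float:
    """Basic Runs Created: (H + BB) * TB / (AB + BB).

    ASSUMPTION: the article does not name its RC version; this is the
    original, basic form.
    """
    return (h + bb) * tb / (ab + bb)


def run_conversion_rate(runs_scored: float, runs_created: float) -> float:
    """Run Conversion Rate = Runs Scored / Runs Created."""
    return runs_scored / runs_created


# HYPOTHETICAL team line, just to exercise the formula: (1500 + 500) * 2400 / 6000
print(runs_created_basic(h=1500, bb=500, tb=2400, ab=5500))

# (team, Runs Created, Runs Scored) from the 2016 table
for team, rc, rs in [("Red Sox", 905.26, 878), ("Padres", 621.38, 686), ("Angels", 700.20, 717)]:
    print(f"{team}: RCR = {run_conversion_rate(rs, rc):.3f}")
```

An RCR above 1.000 means a team scored more runs than its underlying production “created”; below 1.000 means it left runs on the table.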

The purpose of this, again, is to gauge the overall efficiency of offenses. All I really did was give a fancy name to the margin of error of Runs Created. However, what I sought to do was use this statistic in a different way: to examine which teams made the most of what they produced (efficiency), and which did not. Think of this article as a new way of looking at an old statistic, not me trying to “discover” a new stat. Below is a table, sorted by runs scored (i.e., from most productive offenses to least productive).

2016 Run Conversion Rates
TEAM Runs Created Runs Scored Run Conversion Rate
Red Sox 905.26 878 0.970
Rockies 856.84 845 0.986
Cubs 790.93 808 1.022
Cardinals 784.92 779 0.992
Indians 770.06 777 1.009
Mariners 769.39 768 0.998
Rangers 755.83 765 1.012
Nationals 752.18 763 1.014
Blue Jays 759.72 759 0.999
D-Backs 775.15 752 0.970
Tigers 791.98 750 0.947
Orioles 768.79 744 0.968
Pirates 724.74 729 1.006
Dodgers 709.32 725 1.022
Astros 727.58 724 0.995
Angels 700.20 717 1.024
Giants 725.10 715 0.986
Twins 742.03 690 0.930
Padres 621.38 686 1.104
White Sox 713.38 686 0.962
Reds 699.02 678 0.970
Royals 685.69 675 0.984
Rays 701.08 672 0.959
Brewers 694.02 671 0.967
Mets 707.39 671 0.949
Marlins 695.80 655 0.941
Athletics 655.47 653 0.996
Braves 671.35 649 0.967
Yankees 690.17 647 0.937
Phillies 617.22 610 0.988

After looking at the table, I noted a few observations: teams ranked top 10 in both scoring and RCR last year were, for the most part, the best teams in the league; the two highest-scoring teams did not score as many runs as they could have; and some teams maxed out their production, even if that production was not especially high.

First, let’s look at the teams who ranked top 10 in scoring and top 10 in RCR in 2016: the World Champion Chicago Cubs, the American League Champion Cleveland Indians, the Seattle Mariners (second in AL West), the Texas Rangers (AL West Champs), the Washington Nationals (NL East Champs), and the Toronto Blue Jays (AL Wild Card). All these teams were both productive and efficient, and both qualities are key indicators of good ball clubs. They balanced the two and, outside of the Mariners, played postseason baseball.

While the last paragraph was basically a no-brainer, this is where the study got interesting. The Boston Red Sox scored 878 runs last year, short of their roughly 905 “created” runs. According to their RCR, they were only 97% efficient. So, what does this mean? The Red Sox, while more productive than anyone else, did not hit their ceiling. They came close (RCR of 0.970), but still ranked only in the middle third of offensive efficiency. What if the post-Ortiz Red Sox put up around the same numbers they did last year, but became more efficient in doing so? In my opinion, the AL East should be scared. Other teams in the top 10 scoring, middle 10 RCR category are the Colorado Rockies, St. Louis Cardinals, and Arizona Diamondbacks. The Rockies certainly received a boost in production from playing 81 games in Coors Field. The Cardinals and Diamondbacks, like the Red Sox, scored often, but not as often as they could have. So maybe their problem is not a low ceiling, but rather trouble rising far enough above their floor.

Our third group of relatively important teams in this study are those that ranked in the middle 10 in scoring and top 10 in RCR: the Pittsburgh Pirates, Los Angeles Dodgers, Los Angeles Angels, and San Diego Padres. Essentially, these offenses were middle of the road in terms of productivity, but scored as many runs as possible given their level of production. The Angels ranked in the bottom 10 in Runs Created in 2016, but were second in RCR, scoring 2.4% more runs than they “created.” The only team ahead of them was the lowly San Diego Padres, who scored 10.4% more runs than they created. The Dodgers, who won 91 games in a comparatively weak NL West division, were middle-of-the-road in offensive production and came in third in RCR. These teams were ruthlessly efficient, milking the most out of what their offense provided.
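The three groupings can be reproduced from the 2016 table with a short script. This sketch assumes “top 10,” “middle 10,” and “bottom 10” mean ranks 1-10, 11-20, and 21-30, and it ranks RCR on unrounded values. The ties in runs scored (686 and 671) fall inside a single tier, so they do not change any grouping.

```python
# (team, Runs Created, Runs Scored) from the 2016 table
TEAMS = [
    ("Red Sox", 905.26, 878), ("Rockies", 856.84, 845), ("Cubs", 790.93, 808),
    ("Cardinals", 784.92, 779), ("Indians", 770.06, 777), ("Mariners", 769.39, 768),
    ("Rangers", 755.83, 765), ("Nationals", 752.18, 763), ("Blue Jays", 759.72, 759),
    ("D-Backs", 775.15, 752), ("Tigers", 791.98, 750), ("Orioles", 768.79, 744),
    ("Pirates", 724.74, 729), ("Dodgers", 709.32, 725), ("Astros", 727.58, 724),
    ("Angels", 700.20, 717), ("Giants", 725.10, 715), ("Twins", 742.03, 690),
    ("Padres", 621.38, 686), ("White Sox", 713.38, 686), ("Reds", 699.02, 678),
    ("Royals", 685.69, 675), ("Rays", 701.08, 672), ("Brewers", 694.02, 671),
    ("Mets", 707.39, 671), ("Marlins", 695.80, 655), ("Athletics", 655.47, 653),
    ("Braves", 671.35, 649), ("Yankees", 690.17, 647), ("Phillies", 617.22, 610),
]


def tier(rank: int) -> str:
    """Ranks 1-10 are 'top', 11-20 'middle', 21-30 'bottom'."""
    if rank <= 10:
        return "top"
    if rank <= 20:
        return "middle"
    return "bottom"


def rank_tiers(teams, key):
    """Map each team name to its tier when sorted by `key`, highest first."""
    ordered = sorted(teams, key=key, reverse=True)
    return {name: tier(rank) for rank, (name, _, _) in enumerate(ordered, start=1)}


scoring = rank_tiers(TEAMS, key=lambda t: t[2])            # by runs scored
efficiency = rank_tiers(TEAMS, key=lambda t: t[2] / t[1])  # by RCR


def group(scoring_tier: str, rcr_tier: str) -> list:
    """Teams that fall in the given scoring tier and RCR tier."""
    return sorted(n for n in scoring if scoring[n] == scoring_tier and efficiency[n] == rcr_tier)


print("Top scoring, top RCR:", group("top", "top"))
print("Top scoring, middle RCR:", group("top", "middle"))
print("Middle scoring, top RCR:", group("middle", "top"))
```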

I do not know what qualities are common in high-RCR teams. Maybe a high average with runners in scoring position, a low number of runners left on base, or maybe just plain luck. That could be the topic of an entirely different study.

To sum things up, a high RCR was a common denominator in the teams who saw great success in 2016, and I would like to think it is useful in measuring the efficiency of teams’ offenses.  It will be exciting to see who will rise in 2017 as the most potent offense.  For me, it will be just as exciting to see who is the most efficient.


FanGraphs and were instrumental in the production of this article.  Theodore Hooper is an undergraduate student at the University of Tennessee in Knoxville.  He can be found on LinkedIn at or on Twitter at @_superhooper_

When Do Pitchers Try Harder?

Pitch counts have become an integral part of the game of baseball, so much so that it’s impossible to find a TV telecast that doesn’t display the pitch count alongside the score and the inning. Yet pitch count remains perhaps the most annoyingly simple and arbitrary metric used to craft crucial in-game strategy. 99 mph fastball down the middle: +1 pitch. 76 mph curveball in the dirt: +1 pitch. Intentional ball: +1 pitch. Dirty ball tossed to the umpire: +0 pitches. Pitchout: +1 pitch. Warmup pitches: +0 pitches. My goal here is not to fix this problem, just to explore some interesting data that I believe should eventually be used to bring pitch count into the modern era.

Right now, I’m just going to look at four-seam fastballs and how hard they’re thrown. All data comes from the 2016 regular season (thank you, Baseball Savant). The question I set out to answer is simple: When a pitcher needs to make a pitch, does he try harder? Common sense says yes, of course. Relievers throw harder than starters in general because they don’t have to worry about throwing more quality pitches in later innings. But the data shows that pitchers change their effort levels within innings as well, especially with two strikes and/or runners in scoring position. Eventually, we should be able to use this knowledge to craft a better pitch count that takes this extra effort into account.
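One way to frame the question in code: compare each four-seam fastball with its own pitcher’s average velocity, so hard-throwing relievers do not dominate the comparison, then average those differences by count and base state. This is a hypothetical sketch, not the author’s method. The field names (`pitcher`, `pitch_type`, `release_speed`, `strikes`, `on_2b`, `on_3b`) are assumptions loosely modeled on a Statcast-style export, and the sample pitches are made up.

```python
from collections import defaultdict
from statistics import mean


def situation(pitch: dict) -> str:
    """Label a pitch by count and base state (RISP = runner on second or third)."""
    two_strikes = pitch["strikes"] == 2
    risp = pitch["on_2b"] is not None or pitch["on_3b"] is not None
    if two_strikes and risp:
        return "two strikes + RISP"
    if two_strikes:
        return "two strikes"
    if risp:
        return "RISP"
    return "neither"


def fastball_velocity_deltas(pitches: list) -> dict:
    """Average four-seam velocity above/below each pitcher's own mean, by situation."""
    fourseams = [p for p in pitches if p["pitch_type"] == "FF"]

    # Per-pitcher baseline, so each pitcher is compared only with himself
    speeds = defaultdict(list)
    for p in fourseams:
        speeds[p["pitcher"]].append(p["release_speed"])
    baseline = {name: mean(v) for name, v in speeds.items()}

    deltas = defaultdict(list)
    for p in fourseams:
        deltas[situation(p)].append(p["release_speed"] - baseline[p["pitcher"]])
    return {label: mean(v) for label, v in deltas.items()}


# HYPOTHETICAL pitches; on_2b/on_3b hold a runner id or None
sample_pitches = [
    {"pitcher": "A", "pitch_type": "FF", "release_speed": 94.0, "strikes": 0, "on_2b": None, "on_3b": None},
    {"pitcher": "A", "pitch_type": "FF", "release_speed": 96.0, "strikes": 2, "on_2b": None, "on_3b": None},
    {"pitcher": "A", "pitch_type": "SL", "release_speed": 85.0, "strikes": 2, "on_2b": None, "on_3b": None},
    {"pitcher": "B", "pitch_type": "FF", "release_speed": 90.0, "strikes": 1, "on_2b": 101, "on_3b": None},
    {"pitcher": "B", "pitch_type": "FF", "release_speed": 92.0, "strikes": 2, "on_2b": 101, "on_3b": None},
]

print(fastball_velocity_deltas(sample_pitches))
```

Measuring against each pitcher’s own baseline is the design choice that separates “trying harder in this situation” from “this pitcher always throws hard.”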

Zack Greinke and the Future of Pitching Contracts

Spending money is an interesting avenue for building a pitching staff. Many of the deals are conventional: a superstar pitcher around 30 years old gets a contract in the neighborhood of at least 7/$175M. What is unconventional is the contract Zack Greinke signed with Arizona: 6/$206M. We have seen pitching contracts at or exceeding $175M several times in recent years, and they have all been at least seven years in length. Never before in Major League Baseball history has a contract paid so much money in so little time. In fact, Greinke’s $34.5M take-home in 2016 was the highest single-season pay in Major League Baseball history. Now, with stricter luxury taxes in place, the higher average annual value (AAV) is certainly a unique burden on Arizona, but what about the burden of the seventh, eighth, or even ninth year of a deal for every other team? Arizona’s braintrust decided that, rather than having Greinke hamstring their payroll for seven or eight years, he will do so for only six, albeit at a slightly higher rate. I think they are onto something.

Here’s a look at every major pitching contract of at least five years signed from the 2000 through 2011 seasons. Compare the value produced in the first four years of each deal to the value of the whole contract, and look at the following years as well.

Pitching Contracts and Subsequent Performance in $, 2000-2011 Seasons

Player Contract Value in Yrs 1-4 Value in Yr 5 Value in Yr 6 Value in Yr 7 Value in Yr 8 Value in Yr 9
Mike Hampton 8/121M 28M 4.6M 2.3M 5.3M .7M
Mike Mussina 6/87M 84M 12.2M 24.9M
Roy Oswalt 5/73M 99.5M 20.5M
Daisuke Matsuzaka 6/52M 46.2M .6M -2.2M
Chris Carpenter 5/63.5M 58M 36.5M
Barry Zito 7/126M 41.1M -4.1M 5.8M -3.4M
Carlos Zambrano 5/91.5M 57.6M 3.1M
Johan Santana 6/137.5M 74.4M 10.7M INJ
A.J. Burnett 5/82.5M 54.4M 31M
C.C. Sabathia 9/211M 147.4M 18.9M .9M 9.5M 21.2M N/A**
John Lackey 5/82.5M 44.7M 17.9M
Cliff Lee 5/120M 138.8M INJ
Jered Weaver 5/85M 53.1M -1.2M
C.J. Wilson 5/77.5M 54.4M INJ
John Danks 5/65M 20M -.8M
Gio Gonzalez 5/42M 109.8M 22.9M
Yu Darvish 6/60M 91.1M 21.6M N/A**

*all contract data via Baseball Reference, all valuations via FanGraphs by conversion of fWAR × $/fWAR

** these seasons will be played out in 2017

Of course, some contracts in here went south from the start: Mike Hampton, Barry Zito, and John Danks are the culprits. You probably notice that in most cases, years one through four go completely according to plan! Some of the exceptions, Johan Santana and John Lackey, are due to injury. But even other injury victims, such as Yu Darvish and Chris Carpenter, were so valuable in two or three years that they held up their end of the bargain.

However, the second thing you’ll notice is how quickly values go down on this list after year four. Of the 17 samples we have here, there are only seven success stories in year five (Oswalt, Carpenter, Burnett, Sabathia, Lackey, Gonzalez, and Darvish). Two of those cases are unique, as A.J. Burnett experienced a career revitalization in Pittsburgh under Ray Searage, and Darvish was a young international free agent. Overall, the success rate isn’t encouraging. The real black marks are the years following that. We have 11 samples on hand, and aside from modest renaissances from Mike Mussina and C.C. Sabathia, you get some really ugly numbers.

With this chart in context, we have to wonder why any pitcher is even offered a deal longer than four years. It is just not worth carrying so much dead payroll for one to five years. In fact, given how successful the first four years are, their value already comes pretty close to that of the original contract anyway. Did the Phillies or Cliff Lee ever consider a four-year contract in that same $120M range? Probably not, but Lee would have taken it, and the Phillies would have been better off. Surely C.C. Sabathia never received a 4/$120M offer from the Yankees, but it would have let him hit the market again to potentially cash in one more time, and New York would still have recouped nearly 75% of the value it ultimately got from him in nine years.
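The Sabathia figure is the share of his total produced value that came in years one through four. A small sketch using his row from the 2000-2011 table (year nine is left out because it has not been played):

```python
def early_value_share(values: list) -> float:
    """Share of produced value that came in years 1-4.

    values[0] is the combined value of years 1-4 ($M); later entries are single
    seasons. Negative seasons shrink the denominator, so the share can exceed 1.
    """
    return values[0] / sum(values)


# C.C. Sabathia: years 1-4, then years 5-8 ($M, from the 2000-2011 table)
sabathia = [147.4, 18.9, 0.9, 9.5, 21.2]
print(f"Sabathia, years 1-4 share: {early_value_share(sabathia):.1%}")
```

Because negative later seasons (such as Barry Zito’s) reduce the total, the ratio is most meaningful when the later years are mostly positive.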

Thinking in present times, here’s a chart in a similar vein, examining pitching contracts of more than four years signed in the 2012 season alone. Remember, these players have pitched the first four years of these contracts…

Pitching Contracts and Subsequent Performance in $, 2012-Present Seasons

Player Contract Value in Yrs 1-4 Value in Yr 5 Value in Yr 6 Value in Yr 7 Value in Yr 8
Matt Cain 5/112.5M 7M N/A N/A
Cole Hamels 6/144M 122.7M N/A
Hyun-Jin Ryu 6/36M 55.5M N/A N/A
Anibal Sanchez 5/80M 82.7M N/A
Matt Harrison 5/55M -1.1M INJ
Felix Hernandez 7/175M 120M N/A N/A N/A
Adam Wainwright 5/97.5M 116.7M N/A
Justin Verlander 5/140M 122.5M N/A N/A N/A N/A

*all contract data via Baseball Reference, all valuations via FanGraphs by conversion of fWAR × $/fWAR

…and we see more of the same. Year five for these eight pitchers is 2017, and how many are a good bet to produce? Verlander, Hamels, most likely Hernandez, and…possibly Wainwright? Matt Harrison’s career is already over due to injury. Lengthy DL stints have ruined Ryu and Cain. Wainwright and Hernandez have also dealt with injury woes. Anibal Sanchez hasn’t been an effective pitcher since 2014. And yet, years one through four look beautiful for everybody but our two outliers.  

The same unorthodox contracts could apply to these guys. Anibal Sanchez is on the 2017 payroll for $16.8M, but what if Detroit had signed him for 4/$80M? Equal value would have been produced, and he wouldn’t be an albatross in 2017. 4/$100M for Adam Wainwright? That is similar to our previous Cliff Lee scenario. If Seattle had offered King Felix 4/$120M, he would have taken it in the hopes of cashing in one more time, and the Mariners would have received good value, similar to the Yankees and Sabathia.

Let’s condense all of the data from both charts and see what averages we get.

Sample Size Average Length Average Value Average Value (Yrs 1-4) Average Annual Value (AAV) AAV (4/$73.14M)
25 5.68 Yrs $96.68M $73.14M $17.02M $18.29M
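These averages can be recomputed from the 25 rows of the two tables. A sketch in Python; note that AAV here is the ratio of the averages (average value divided by average length), which is what reproduces the $17.02M figure, rather than an average of each contract’s own AAV.

```python
# (years, total $M, value produced in years 1-4 $M) for the 25 contracts in both tables
CONTRACTS = [
    (8, 121.0, 28.0),    # Mike Hampton
    (6, 87.0, 84.0),     # Mike Mussina
    (5, 73.0, 99.5),     # Roy Oswalt
    (6, 52.0, 46.2),     # Daisuke Matsuzaka
    (5, 63.5, 58.0),     # Chris Carpenter
    (7, 126.0, 41.1),    # Barry Zito
    (5, 91.5, 57.6),     # Carlos Zambrano
    (6, 137.5, 74.4),    # Johan Santana
    (5, 82.5, 54.4),     # A.J. Burnett
    (9, 211.0, 147.4),   # C.C. Sabathia
    (5, 82.5, 44.7),     # John Lackey
    (5, 120.0, 138.8),   # Cliff Lee
    (5, 85.0, 53.1),     # Jered Weaver
    (5, 77.5, 54.4),     # C.J. Wilson
    (5, 65.0, 20.0),     # John Danks
    (5, 42.0, 109.8),    # Gio Gonzalez
    (6, 60.0, 91.1),     # Yu Darvish
    (5, 112.5, 7.0),     # Matt Cain
    (6, 144.0, 122.7),   # Cole Hamels
    (6, 36.0, 55.5),     # Hyun-Jin Ryu
    (5, 80.0, 82.7),     # Anibal Sanchez
    (5, 55.0, -1.1),     # Matt Harrison
    (7, 175.0, 120.0),   # Felix Hernandez
    (5, 97.5, 116.7),    # Adam Wainwright
    (5, 140.0, 122.5),   # Justin Verlander
]

n = len(CONTRACTS)
avg_years = sum(c[0] for c in CONTRACTS) / n
avg_total = sum(c[1] for c in CONTRACTS) / n
avg_early = sum(c[2] for c in CONTRACTS) / n
aav_full = avg_total / avg_years  # AAV of the average contract
aav_four = avg_early / 4          # AAV if years 1-4 value were paid over four years

print(f"n={n}, length={avg_years:.2f}, value=${avg_total:.2f}M, yrs 1-4=${avg_early:.2f}M")
print(f"AAV ${aav_full:.2f}M vs four-year AAV ${aav_four:.3f}M")
```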

Examining the averages, what if the average pitching contract shifted from nearly 6/$96.68M to 4/$73.14M? The players would lose $23.54M on average over the length of the contract, but gain close to two free-agency years. Presumably, two free-agent years would be worth more than that, making for a worthy trade-off. As for the teams, they would pay $1.27M more per year in AAV, but eliminate close to two years of dead payroll (for those of you calculating at home, that’s [17.02 × 1.68] - [1.27 × 4], for an average gain of $23.51M). That is a worthy trade-off for them as well. In other words, teams save millions, and players can make more millions.
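The trade-off arithmetic, written out step by step with the rounded figures from the averages table ($M and years):

```python
# Rounded figures from the averages table
AVG_YEARS, AVG_TOTAL, AVG_EARLY = 5.68, 96.68, 73.14
AAV_FULL, AAV_FOUR = 17.02, 18.29

player_loss = AVG_TOTAL - AVG_EARLY  # money players give up per contract
dropped_years = AVG_YEARS - 4        # years of the long deal that disappear
aav_increase = AAV_FOUR - AAV_FULL   # extra AAV the team pays on a four-year deal
# Team saves the full-length AAV over the dropped years, minus the raise over four years
team_gain = AAV_FULL * dropped_years - aav_increase * 4

print(f"Players give up ${player_loss:.2f}M and gain {dropped_years:.2f} free-agent years")
print(f"Teams pay ${aav_increase:.2f}M more per year and net ${team_gain:.2f}M")
```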

These condensed contracts have virtually no true precedent, but the 6/$206M deal that Greinke signed is closer to them than to the current industry standard. Of course, the pitching deals signed around the same time as Greinke’s (Max Scherzer, Jordan Zimmermann, David Price, Jeff Samardzija, Johnny Cueto, Mike Leake, Wei-Yin Chen, Ian Kennedy, and Stephen Strasburg) are completely in keeping with this century’s tradition, which makes this one contract all the more potentially revolutionary among its contemporaries.

If you are thinking of a player and team who may follow in these footsteps, I would imagine the perfect test case to be Matt Harvey. The Mets pitcher proved that he is an All-Star-level hurler in his 2015 comeback season from Tommy John surgery, but was hampered again in 2016 and diagnosed with thoracic outlet syndrome. All of this speculation is for naught if Harvey’s career fizzles out or he has to be relegated to the bullpen, but let’s say the next two years are a comeback for him in the mold of 2015. After 2018, he would hit free agency going into his age-30 season. He would theoretically be in line for a five- to seven-year deal, but I don’t think someone with “Tommy John surgery” and “thoracic outlet syndrome” on his resume is a wise investment for that long. What if, instead of something in the 6/$150M range, it’s a deal for 3/$100M? 4/$130M? If his production matches that type of contract, he would still hit free agency at age 33 or 34 and be in demand; it’s quid pro quo.

Only time will tell if front offices of the future will adopt this strategy, and the harsher luxury-tax penalties surely dampen the idea. However, a team with cash to spend is always a team in need of pitching; perhaps we will see their contracts truly begin to condense.

Hardball Retrospective – What Might Have Been – The “Original” 2001 Rangers

In “Hardball Retrospective: Evaluating Scouting and Development Outcomes for the Modern-Era Franchises”, I placed every ballplayer in the modern era (from 1901-present) on their original team. I calculated revised standings for every season based entirely on the performance of each team’s “original” players. I discuss every team’s “original” players and seasons at length along with organizational performance with respect to the Amateur Draft (or First-Year Player Draft), amateur free agent signings and other methods of player acquisition.  Season standings, WAR and Win Shares totals for the “original” teams are compared against the “actual” team results to assess each franchise’s scouting, development and general management skills.

Expanding on my research for the book, the following series of articles will reveal the teams with the biggest single-season difference in the WAR and Win Shares for the “Original” vs. “Actual” rosters for every Major League organization. “Hardball Retrospective” is available in digital format on Amazon, Barnes and Noble, GooglePlay, iTunes and KoboBooks. The paperback edition is available on Amazon, Barnes and Noble and CreateSpace. Supplemental Statistics, Charts and Graphs along with a discussion forum are offered at

Don Daglow (Intellivision World Series Major League Baseball, Earl Weaver Baseball, Tony LaRussa Baseball) contributed the foreword for Hardball Retrospective. The foreword and preview of my book are accessible here.


OWAR – Wins Above Replacement for players on “original” teams

OWS – Win Shares for players on “original” teams

OPW% – Pythagorean Won-Loss record for the “original” teams

AWAR – Wins Above Replacement for players on “actual” teams

AWS – Win Shares for players on “actual” teams

APW% – Pythagorean Won-Loss record for the “actual” teams
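For readers unfamiliar with the OPW% and APW% columns, here is a sketch of the Pythagorean calculation and of turning a winning percentage into a 162-game record. The exponent of 2 is Bill James’s original form and is an assumption here; the book may use a different exponent.

```python
def pythagorean_pct(runs_scored: float, runs_allowed: float, exponent: float = 2.0) -> float:
    """Pythagorean winning percentage: RS^x / (RS^x + RA^x).

    ASSUMPTION: exponent 2 (the original form); other exponents are also common.
    """
    rs, ra = runs_scored ** exponent, runs_allowed ** exponent
    return rs / (rs + ra)


def record_from_pct(pct: float, games: int = 162) -> tuple:
    """Convert a winning percentage into a rounded (wins, losses) record."""
    wins = round(pct * games)
    return wins, games - wins


print(record_from_pct(0.513))        # "Original" 2001 Rangers OPW%
print(record_from_pct(0.451))        # "Actual" 2001 Rangers APW%
print(pythagorean_pct(600, 400))     # hypothetical run totals
```

For example, `record_from_pct(0.513)` returns `(83, 79)`, matching the record listed for the “Original” 2001 Rangers.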


The 2001 Texas Rangers 

OWAR: 48.4     OWS: 278     OPW%: .513     (83-79)

AWAR: 34.2      AWS: 219     APW%: .451     (73-89)

WARdiff: 14.2                        WSdiff: 59  

The “Original” 2001 Rangers placed third in the American League West behind Seattle and Oakland. Sammy “Say It Ain’t” Sosa (.328/64/160) established personal bests in batting average, runs scored (146), RBI and bases on balls (116) while placing runner-up in the MVP balloting. Rich Aurilia (.324/37/97) contributed career-highs in nearly every batting classification including 114 tallies and 206 safeties. Juan “Igor” Gonzalez (.325/35/140) achieved his third All-Star invite and finished fifth in the American League MVP race. Ivan “Pudge” Rodriguez (.308/25/65) merited his tenth straight Gold Glove Award. Jose Hernandez swatted 26 two-baggers and 25 big-flies. The “Actuals” lineup featured Alex Rodriguez (.318/52/135), who paced the circuit in four-baggers and runs scored (133). Rafael Palmeiro (.273/47/123) surpassed the century mark in walks and matched his single-season HR high. Frank Catalanotto batted at a .330 clip and ripped 31 two-base hits.

Ivan Rodriguez rated thirteenth among backstops according to “The New Bill James Historical Baseball Abstract” top 100 player rankings. “Original” Rangers registered in the “NBJHBA” top 100 ratings include Sammy Sosa (45th-RF), Juan Gonzalez (52nd-RF) and Ruben Sierra (70th-RF). Moreover, Alex Rodriguez (17th-SS), Rafael Palmeiro (19th-1B), Ken Caminiti (25th-3B) and Andres Galarraga (42nd-1B) achieved the distinction among members of the “Actuals” roster.

  Original 2001 Rangers                              Actual 2001 Rangers

Rusty Greer LF -0.04 5.64 Frank Catalanotto LF 2.19 16.86
Mark Little CF 0.39 2.69 Gabe Kapler CF 0.85 12.52
Sammy Sosa RF 9.56 43.85 Ricky Ledee RF -0.48 2.21
Juan Gonzalez DH/RF 4.21 23.5 Ruben Sierra DH 0.82 9.21
Carlos Pena 1B 0.21 2.01 Rafael Palmeiro 1B 3.62 24.62
Benji Gil 2B/SS 0.99 6.69 Randy Velarde 2B 1.57 8.75
Rich Aurilia SS 5.46 32.44 Alex Rodriguez SS 8.2 34.67
Mike Lamb 3B -0.03 6.37 Mike Lamb 3B -0.03 6.37
Ivan Rodriguez C 3.92 19.8 Ivan Rodriguez C 3.92 19.8
Rey Sanchez SS 2.37 13.45 Michael Young 2B 0.09 6.32
Jose Hernandez SS 2.7 12.63 Rusty Greer LF -0.04 5.64
Ruben Sierra DH 0.82 9.21 Bill Haselman C 0.09 3.71
Chad Kreuter C 1.28 9.03 Ken Caminiti 3B -0.07 3.66
Dean Palmer DH 0.14 4.53 Andres Galarraga DH -0.71 3.22
Bill Haselman C 0.09 3.71 Chad Curtis CF 0.14 2.04
Jeff Frye 2B -0.45 3.61 Carlos Pena 1B 0.21 2.01
Fernando Tatis 3B -0.26 2.25 Doug Mirabelli C -0.09 1.31
Ruben Mateo RF -0.61 1.31 Ruben Mateo RF -0.61 1.31
Andy Barkett LF 0.11 1.26 Scott Sheldon 3B -0.56 0.89
Kevin L. Brown C 0.11 1.09 Bo Porter LF -0.18 0.77
Craig Monroe RF 0.03 0.61 Craig Monroe RF 0.03 0.61
Warren Morris 2B -0.43 0.53 Mike Hubbard C 0.06 0.41
Cliff Brumbaugh RF -0.39 0.24 Marcus Jensen C -0.27 0.29
Scott Podsednik LF -0.06 0.04 Chris Magruder LF -0.32 0.12
Kelly Dransfeldt SS -0.03 0.04 Kelly Dransfeldt SS -0.03 0.04
Cliff Brumbaugh RF -0.16 0.02

Kevin J. Brown (10-4, 2.65) fashioned a 1.141 WHIP in an abbreviated season (19 starts). Robb Nen (3.01, 45 SV) struck out 93 batters in 77.2 innings and topped the circuit in saves. Jeff Zimmerman (2.40, 28 SV) was nearly unhittable out of the bullpen, producing a 0.897 WHIP.

  Original 2001 Rangers                            Actual 2001 Rangers 

Kevin J. Brown SP 2.66 10.63 Doug Davis SP 2.6 9.25
Doug Davis SP 2.6 9.25 Rick Helling SP 1.67 8.01
Jim Brower SP 1.47 8.13 Darren Oliver SP -0.06 3.78
Rick Helling SP 1.67 8.01 Kenny Rogers SP -0.37 1.96
Ryan Dempster SP 0.53 7.65 Aaron Myette SP -0.79 0.19
Robb Nen RP 1.3 13.82 Jeff Zimmerman RP 3.13 13.09
Jeff Zimmerman RP 3.13 13.09 Mike Venafro RP 0.24 4.77
Danny Patterson RP 1.29 6.66 Pat Mahomes RP -0.13 3.56
Scott Stewart RP 0.73 5.53 Juan Moreno RP 0.44 3
Mike Venafro RP 0.24 4.77 Chris Michalak RP 0.49 1.82
Darren Oliver SP -0.06 3.78 Danny Kolb RP 0.16 0.85
Bobby Witt SP 0.5 2.53 Jeff Brantley RP 0.09 0.68
Kenny Rogers SP -0.37 1.96 J. D. Smart RP -0.15 0.26
Scott Eyre RP 0.34 1.82 Mark Petkovsek RP -1.42 0.22
Brian Bohanon SP 0.11 1.78 Francisco Cordero RP 0.06 0.1
Joey Eischen RP 0 1.27 Rob Bell SP -1.14 0.08
Danny Kolb RP 0.16 0.85 R. A. Dickey RP -0.17 0.01
Luis Pineda RP -0.01 0.65 Kevin Foster RP -0.32 0.01
Mark Petkovsek RP -1.42 0.22 Joaquin Benoit SP -0.2 0
Billy Taylor RP 0.01 0.1 Tim Crabtree RP -0.39 0
R. A. Dickey RP -0.17 0.01 Justin Duchscherer SP -0.8 0
Joaquin Benoit SP -0.2 0 Ryan Glynn SP -0.51 0
Ryan Glynn SP -0.51 0 Jonathan Johnson RP -0.44 0
Jonathan Johnson RP -0.44 0 Mike Judd SP -0.33 0
Brandon Knight RP -0.54 0 Brandon Villafuerte RP -0.51 0
Matt Whiteside RP -0.61 0

 Notable Transactions

Sammy Sosa 

July 29, 1989: Traded by the Texas Rangers with Wilson Alvarez and Scott Fletcher to the Chicago White Sox for Harold Baines and Fred Manrique.

March 30, 1992: Traded by the Chicago White Sox with Ken Patterson to the Chicago Cubs for George Bell.

Rich Aurilia

December 22, 1994: Traded by the Texas Rangers with Desi Wilson to the San Francisco Giants for John Burkett.

Juan Gonzalez

November 2, 1999: Traded by the Texas Rangers with Danny Patterson and Gregg Zaun to the Detroit Tigers for Alan Webb (minors), Frank Catalanotto, Francisco Cordero, Bill Haselman, Gabe Kapler and Justin Thompson.

November 1, 2000: Granted Free Agency.

January 9, 2001: Signed as a Free Agent with the Cleveland Indians. 

Robb Nen

July 17, 1993: Traded by the Texas Rangers with Kurt Miller to the Florida Marlins for Cris Carpenter.

November 18, 1997: Traded by the Florida Marlins to the San Francisco Giants for Mick Pageler (minors), Mike Villano (minors) and Joe Fontenot.

Rey Sanchez 

January 3, 1990: Traded by the Texas Rangers to the Chicago Cubs for Bryan House (minors).

August 16, 1997: Traded by the Chicago Cubs to the New York Yankees for Frisco Parotte (minors).

November 3, 1997: Granted Free Agency.

January 22, 1998: Signed as a Free Agent with the San Francisco Giants.

November 5, 1998: Granted Free Agency.

December 11, 1998: Signed as a Free Agent with the Kansas City Royals.

October 29, 1999: Granted Free Agency.

December 7, 1999: Signed as a Free Agent with the Kansas City Royals. 

Jose Hernandez 

April 3, 1992: Selected off waivers by the Cleveland Indians from the Texas Rangers.

June 1, 1993: Traded by the Cleveland Indians to the Chicago Cubs for Heathcliff Slocumb.

July 31, 1999: Traded by the Chicago Cubs with Terry Mulholland to the Atlanta Braves for a player to be named later, Micah Bowie and Ruben Quevedo. The Atlanta Braves sent Joey Nation (August 24, 1999) to the Chicago Cubs to complete the trade.

November 5, 1999: Granted Free Agency.

December 16, 1999: Signed as a Free Agent with the Milwaukee Brewers.

Kevin J. Brown 

October 15, 1994: Granted Free Agency.

April 9, 1995: Signed as a Free Agent with the Baltimore Orioles.

November 3, 1995: Granted Free Agency.

December 22, 1995: Signed as a Free Agent with the Florida Marlins.

December 15, 1997: Traded by the Florida Marlins to the San Diego Padres for Steve Hoff (minors), Derrek Lee and Rafael Medina.

October 26, 1998: Granted Free Agency.

December 12, 1998: Signed as a Free Agent with the Los Angeles Dodgers.

Honorable Mention

The 2007 Texas Rangers 

OWAR: 36.9     OWS: 249     OPW%: .496     (80-82)

AWAR: 27.8      AWS: 225     APW%: .463     (75-87)

WARdiff: 9.1                        WSdiff: 24  

Texas finished a distant sixteen games behind Seattle in ’07. Carlos Pena (.282/46/121) registered 99 tallies and achieved personal-bests in virtually every offensive category. Mark Teixeira tagged 30 long balls, drove in 105 baserunners and contributed a .306 BA. Ian Kinsler swiped 23 bases in 25 attempts, scored 96 runs and clubbed 20 dingers during his sophomore season. Travis “Pronk” Hafner blasted 24 dingers and eclipsed the century mark in RBI for the fourth consecutive campaign. Ivan Rodriguez drilled 31 two-base hits while third-sacker Edwin Encarnacion delivered a .289 BA with 16 jacks. Aaron Harang (16-6, 3.73) posted a career-best 1.144 WHIP and placed fourth in the Cy Young balloting. Joaquin Benoit whiffed 87 batsmen over 82 innings while furnishing a 2.85 ERA along with a WHIP of 1.171.

On Deck

What Might Have Been – The “Original” 2003 Indians

References and Resources

Baseball America – Executive Database


James, Bill. The New Bill James Historical Baseball Abstract. New York, NY: The Free Press, 2001. Print.

James, Bill, with Jim Henzler. Win Shares. Morton Grove, Ill.: STATS, 2002. Print.

Retrosheet – Transactions Database 

The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at “”.

Seamheads – Baseball Gauge

Sean Lahman Baseball Archive

The 2017 Atlanta Braves: A .500 Team?

The 2016 Atlanta Braves were built to suck. After all, starting a season 0-9 basically kills any hope left in the fan base, and gets them prepared for the contagious losing. For the few fans who paid to go see their beloved Braves play in the now-retired Turner Field, losing 93 games is heartbreaking. A large volume of articles exists detailing the extent to which the Atlanta Braves, under both John Hart and John Coppolella, are remodeling their organization. This article serves the purpose of examining one thing:

2016 Atlanta Braves

Half Record Runs Scored Runs Against Run Differential
First Half 31-58 307 414 -107
Second Half 37-35 342 265


That’s right! The 2016 second-half Atlanta Braves won more games than they lost!  If you did not already know this, you either (a) are not a Braves fan, or (b) could not care less.  However, this could have some real value behind it.  While the Braves managed to be outscored by 20 runs in the second half, they still managed to win two more games than they lost.  They scored 35 more runs in 17 fewer games.  Their runs per game increased from 3.45 to 4.75; sustained over the entire 2016 season, a 4.75 rate would have placed them between the Mariners and Cardinals.  The most important takeaway is how much better the second-half Braves were at preventing runs: 149 fewer runs allowed than in the first half.  Shaving off that many runs in only 17 fewer games is huge.
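The per-game figures in the paragraph above come from simple division: runs scored over games played, with games taken from each half's win-loss record. A minimal Python sketch of that arithmetic, using only the run and record numbers from the table; the helper name `runs_per_game` is our own, not the author's:

```python
def runs_per_game(runs: int, wins: int, losses: int) -> float:
    """Average runs per game over a stretch with the given win-loss record."""
    games = wins + losses
    return runs / games


# First half: 31-58 with 307 runs scored; second half: 37-35 with 342 runs scored.
first_half = runs_per_game(307, 31, 58)   # 307 / 89 games
second_half = runs_per_game(342, 37, 35)  # 342 / 72 games

print(f"First half:  {first_half:.2f} R/G")
print(f"Second half: {second_half:.2f} R/G")
print(f"Fewer games in second half: {(31 + 58) - (37 + 35)}")
```

The same function works for runs allowed; just pass the "Runs Against" column instead.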

But let’s not get ahead of ourselves.  A winning record is unsustainable at a deficit of 20 runs in 72 games.  But I am not asking whether the 2017 Atlanta Braves can win even 82 games.  Can they win 81?  Could the great finish down the stretch of the 2016 season carry over into 2017?  While going .500 is technically meaningless because a .500 team will not make the playoffs, not losing more than they win in the new SunTrust Park will energize the organization and the fan base, and prepare the team for future success.

When the 2016-2017 offseason kicked off, the Braves signed two popular starting pitchers, and acquired one via trade, to eat innings so their crop of young pitching could ripen on the farm.

Braves 2017 Offseason Acquisitions (2016 Statistics)

Pitcher W-L ERA FIP BB/9 K/9 WHIP
Bartolo Colon 15-8 3.43 3.99 1.50 6.01 1.21
R.A. Dickey 10-15 4.46 5.03 3.34 6.68 1.37
Jaime Garcia 10-13 4.67 4.49 2.99 7.86 1.37


Two of the three had subpar years in 2016.  The other one became an internet sensation for his antics in the batter’s box and even hit a homer against the San Diego Padres.  But let’s assess what each pitcher brings to Atlanta’s rotation.

Bartolo Colon ages like a fine wine.  His ERA was better last year than any Atlanta starter except Julio Teheran.  While win-loss record is not a reliable measure of pitching performance, it is worth noting he won more games last year than any Atlanta starter.  He was better pretty much across the board than anyone not named Julio Teheran.  But can he keep this level of production up?  I would like to think so.  His two-seam velocity has stayed relatively consistent over the past three years.  All the Braves should ask Colon to do is turn in around 20 quality starts (he turned in 19 last year).  Consistency was a hallmark of his time with the Mets, and should continue in Atlanta for at least the 2017 season.

The other old guy the Atlanta Braves picked up this offseason happens to be a knuckleballer: R.A. Dickey, the 2012 National League Cy Young Award winner.  While Dickey will more than likely not be in the running for any hardware as he nears his 43rd birthday, he can still meet the immediate needs of his new team.  From 2011 to 2015, Dickey never threw fewer than 208.2 innings in a season, and he peaked in his legendary 2012 with 233.2 innings pitched.  This is what the Braves need: a mountain of good, quality innings from Dickey.  If he can get over 200 innings again and remain viable at the big-league level, then it is mission accomplished.

The third major addition to the Atlanta rotation is southpaw pitcher Jaime Garcia.  On December 1 of last year, the St. Louis Cardinals accepted minor-league infielder Luke Dykstra, right-handed pitcher John Gant, and righty Chris Ellis for Garcia’s services.  First, let’s look at the positives of this — Garcia is a definite mid-rotation talent, who posted a 3.73 ERA in 31.1 IP and a 3.18 ERA in 28.1 IP in April and May of last year, respectively.  He gives Atlanta a lefty in a rotation filled with righties.  The downside?  His low ERAs early in the season turned into a 5.40 ERA in June and a 5.60 ERA in the second half of the season.  So much for success in the second half driving this article, right?  Let’s remain optimistic.  After all, that is the whole purpose of this.  Garcia’s HR/FB rate was up from 7.1% in 2015 to a ghastly 20.2% in 2016.  He got consumed by the league-wide power surge.  I do not think such a high rate is sustainable or will happen again.

Let’s make a prediction.  Bartolo Colon makes us all fall back in love with “The Great Bart-Bino” and turns in around 16-20 quality starts for the upstart Braves.  Dickey, the workhorse of the staff, follows suit and dizzies batters with his knuckler for over 200 innings.  Garcia returns to early-2016 form and posts something in the ballpark of 1.5 WAR.  Of course, the likelihood of all three scenarios playing out is small, but what I am trying to get across is that it is possible.

Now, time to switch gears. The Braves lineup has changed its look dramatically since this time last year, sticking with a solid mixture of recognizable names and some guy named Dansby Swanson.  Here is a look at their projected Opening Day lineup:

2017 Atlanta Braves

Name Bats 2016 WAR Projected WAR
Ender Inciarte L 3.8
Dansby Swanson R 0.9
Freddie Freeman L 6.5
Matt Kemp R 0.0
Nick Markakis L 1.7
Brandon Phillips R 0.8
Adonis Garcia R 0.2
Tyler Flowers R 0.3

The projected WAR was retrieved from ZiPS projections.

Look at the first half of their lineup.  To me, those three guys, Inciarte, Swanson, and Freeman, look like the core of a team poised to wreak havoc on the NL East before the end of this decade.  It is hard to project exactly what we are going to get out of Dansby Swanson, but most Braves fans and analysts expect him to take the reins as the face of the franchise.

Starting in the leadoff spot is Ender Inciarte, who was brought over as icing on the cake in the Shelby Miller trade that landed Swanson and pitching prospect Aaron Blair.  In his first year in Atlanta, Inciarte posted a .732 OPS and won a Gold Glove for his outstanding play in center field.  I really could not think of a better leadoff guy for the Braves.  He is signed through 2021 at a team-friendly cost of $30.5 million, with a $9-million team option in 2022.  In each of his seasons in the bigs, Inciarte has played in at least 118 games and posted a WAR above 3.7 (including 5.3 in 2015), and he shows no sign of slowing down with his prime years still ahead of him.  What if he crosses the 3.0 WAR plateau for the fourth time in four seasons, and maybe even adds another Gold Glove?  That is all his organization needs out of him.

Inciarte is a vital part of the Braves defense, which, according to 2017 PECOTA projections, leads the NL East in Fielding Runs Above Average (they are projected to attain an average figure of 3.6, while the other four teams are either at 0.0 or negative).  Baseball Prospectus explains FRAA as an “individual defensive metric created using play-by-play data with adjustments made based on plays made, the expected numbers of plays per position, the handedness of the batter, the park, and base-out states.”  In short, the higher the number, the better the fielder, and vice versa.  The higher the team average, the better the team is overall in the field.  In his Gold Glove campaign, Inciarte registered a FRAA of 23.0, according to BP.  The graduation of Dansby Swanson and the addition of web-gem-prone second baseman Brandon Phillips should strengthen the middle of the diamond.  Just how good is this team going to be at preventing runs?  Many projection systems think they will be around the top of their division, and many fans are excited to see the double-play tandem of Swanson and Phillips at work.

Freddie Freeman is the undisputed anchor of the lineup, and has finally seen the Braves ADD instead of SUBTRACT from the lineup around him.  The addition of Matt Kemp has helped tremendously.  With a recognizable slugger swinging behind Freeman, managers and pitchers had to pitch to him in the latter months of the year. With Kemp slotted behind him, Freeman hit to the tune of a .340/.456/.665 slash with 16 home runs and 18 doubles.  Kemp also matched the theme of this article with a strong second half — hitting .280/.336/.519 with 12 bombs in 241 plate appearances as a Brave.  The duo should have Braves fans excited for a full season of similar production from Freeman if Kemp is behind him.  Kemp, on the other hand, has a lower bar to pass, and could re-tool his value as an offensive player in his first full year off the West Coast.

So why is it unreasonable for the 2017 Atlanta Braves to win 81 games?  I do not think it is that far-fetched.  This article has not mentioned their incredibly deep farm system, which includes guys such as Ozzie Albies, Sean Newcomb, and Lucas Sims, but instead focuses on the immediate roster — a roster which has the potential to do unexpected things in 2017.  The dominoes would have to fall in all the right places, but this is baseball.  Anything is possible.

Theodore Hooper’s Official 2017 Atlanta Braves Prediction: 81-81


The statistics used in this study were found on, and, and the rosters on were a great help in referencing players and transactions. 

Let’s Build Our Own Catch Probability Metric

By now you’ve seen the Statcast Catch Probabilities. They’re great! Or, at the very least, they’re a shiny new toy to play with until the regular season rolls around. But, as you may have noticed, there are a few frustrating details about it — namely, the actual math behind the statistic is completely opaque, and the details about when an individual catch happened are hard to find. So let’s fix those two problems! We’ll create a catch probability metric that anyone can compute in Excel, using data that anyone can download easily.

You may have noticed a problem with this plan, though — the data that is used for the official Statcast catch probability isn’t easily accessible. We’ll have to make do with what we can get from the Statcast search at Baseball Savant. Specifically, instead of using hang time and distance traveled, we’ll use exit velocity and launch angle. Note that this completely disregards defensive positioning and it even disregards the horizontal angle off the bat*! It’s going to make for a less perfect metric, of course, but (spoiler alert) it will turn out okay.

*This really makes more sense if you think about it in terms of probability of the hitter making an out. The old saying goes “hit ’em where they ain’t” but in recent years we’ve come to understand that it’s really “hit it hard and in the air.”

I’m not going to go into the details of how I computed this metric; it’s standard machine learning stuff. If you want to follow along with the computation, I’ve put my code up on GitHub. Instead of going through all that here, I’ll just jump to the finish line: the formula for catch probability ends up being

1/(1+exp(-(-10.152 + 0.057 * hit_speed + 0.218 * hit_angle)))
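That formula is a standard logistic function applied to a linear combination of exit velocity and launch angle. A minimal Python sketch that evaluates it, with the coefficients copied from the formula above; the function name is our own, and the 90 mph / 30-degree example is purely illustrative:

```python
import math


def catch_probability(hit_speed: float, hit_angle: float) -> float:
    """Logistic catch probability from exit velocity (mph) and launch angle (degrees).

    Coefficients are taken directly from the formula in the text.
    """
    z = -10.152 + 0.057 * hit_speed + 0.218 * hit_angle  # linear predictor
    return 1.0 / (1.0 + math.exp(-z))                    # squash into (0, 1)


# Illustrative input: a 90 mph ball hit at a 30-degree launch angle.
p = catch_probability(90.0, 30.0)
print(f"{p:.3f}")
```

In a spreadsheet, the equivalent cell formula would be something like `=1/(1+EXP(-(-10.152+0.057*A2+0.218*B2)))`, assuming (hypothetically) that exit velocity sits in column A and launch angle in column B; adjust the cell references to wherever those columns land in your download.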

Now you might be worried that such a simple formula, excluding tons of information, might be totally worthless. I was worried about that too! But applying it to a held-out test set showed it to be surprisingly accurate:

Catch Probability Assessment
Statistic Value
Accuracy 0.8385
Precision 0.8338
Recall 0.8671
F1 0.8501

(If you’ve never seen these numbers before: closer to 1 is better. Trust me, it’s pretty good.)
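For readers who want to see where those four numbers come from, here is a short Python sketch of the standard definitions, computed from confusion-matrix counts. Treating "ball was caught" as the positive class is our assumption about how the labels were set up. As a sanity check, F1 is the harmonic mean of precision and recall, so it can be rebuilt from the table's own precision and recall:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Binary-classification metrics from confusion-matrix counts.

    Positive class = "ball was caught" (an assumption about the labeling).
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)  # share of all predictions that were right
    precision = tp / (tp + fp)  # of balls predicted caught, share actually caught
    recall = tp / (tp + fn)     # of balls actually caught, share predicted caught
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
    }


# Consistency check against the table: F1 rebuilt from reported precision and recall.
print(f"{f1_score(0.8338, 0.8671):.4f}")  # prints 0.8501
```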

Well, that’s all well and good, but how can you get this for yourself and play around with it? Start by downloading the data you’re interested in from Baseball Savant. For instance, you can get all the data from, say, May 1 of last year by going here. Download the CSV with the link at the bottom and then you can simply add the above formula in a new column in Excel. If you need a concrete example of how this looks in Google Sheets, I’ve put one here.

Okay, now you’ve got this, but what are you going to do with it? One possibility is to use this to try to figure out which plays the official metric estimated as being difficult. For instance, let’s say you’ve noticed that Miguel Sano made two highlight-quality plays but you don’t know Mike Petriello well enough to ask him which ones those are. Just compute your own probabilities and you’re off! As expected, though, the numbers differ. Our numbers do have Sano making two plays in the 0-25% range, but they’re not the same ones that Statcast flagged (sorry about the quality of the GIFs).

Catch #1: estimated catch probability 18.3%

Catch #2: estimated catch probability 21.3%

The Twins announcers praised his first step in the former video, while in the second they talked about how the ball “hung up” for Sano to be able to catch it. Not spectacular plays by any means, but neither were the other two, of course.

Finally, because I’m sure you’re curious, here’s the top catch of 2016 according to this metric (estimated catch probability: 8.6%).

Of course it’s a Kevin Kiermaier catch. Hey, at least we know we’re doing something right.

Desert Optimism

I recently had the opportunity to tour Chase Field, home of the Arizona Diamondbacks.  While there, I saw a lot of banners for Zack Greinke.  After all, he is the face of the franchise (if you’re not considering Paul Goldschmidt).  After signing a six-year/$206.5-million contract before the 2016 season, Greinke changed the focus and the philosophy of the Diamondbacks.  Suddenly, they were contenders.

After signing Greinke, the D-Backs traded for Shelby Miller, who was coming off what many considered one of the best years in baseball.  However, his price was laughable.  It cost Arizona top prospect Dansby Swanson, who has emerged as a candidate for a franchise player in Atlanta.  They also coughed up Ender Inciarte, a very capable center fielder who posted a .732 OPS and won a Gold Glove in 2016.  But wait, there’s more! The Braves also received pitching prospect Aaron Blair.

The purpose of this study is not to criticize former General Manager Dave Stewart’s transactions.  After all, he truly believed, after signing ace Zack Greinke, the Diamondbacks were in a position to win, and rightly so.  Stewart felt, as did many people inside the Arizona organization, their core was established.  Below is their 2016 lineup, listing the player who played the most at each position:

POSITION Name 2016 WAR Total
C Welington Castillo 2.4
1B Paul Goldschmidt 4.8
2B Jean Segura 5.7
3B Jake Lamb 2.6
SS Nick Ahmed 0.2
LF Brandon Drury 0.0
CF Michael Bourn 0.3
RF Yasmany Tomas -0.4
Total 15.6


AJ Pollock, who was coming off an All-Star season in which he produced 7.4 WAR and posted an .865 OPS, played in 12 games.  Inciarte was traded to the Braves after providing 5.3 WAR playing right field in 2015.  David Peralta, who started in left field in 2015, played in 48 games last year.  Nick Ahmed also had an injury-plagued season following a strong 2015 in which he put up 2.5 WAR in his first full year in the majors.

The injuries to Pollock, Peralta, and Ahmed were unfortunate.  The Diamondbacks got near or around replacement-level production from their positions in 2016.  In a hypothetical situation, let’s say the three guys stay healthy and, after subtracting their replacements’ production, raise the Diamondbacks’ total runs scored from 752 to 790.  After some number crunching, the Diamondbacks’ Pythagorean expectation comes out to around 71 wins.  Give or take a few, a healthy trio of Pollock, Peralta, and Ahmed would have increased Arizona’s expected win total by two to five games.
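The "number crunching" here is Bill James’s Pythagorean expectation: expected winning percentage is RS^x / (RS^x + RA^x). A minimal Python sketch, assuming the classic exponent of 2 (an assumption on our part, though it reproduces the roughly 71 wins quoted) and the 890 runs Arizona allowed in 2016:

```python
def pythagorean_wins(runs_scored: float, runs_allowed: float,
                     games: int = 162, exponent: float = 2.0) -> float:
    """Pythagorean expected wins over a season of `games` games."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return games * rs / (rs + ra)


actual = pythagorean_wins(752, 890)        # 2016 as played
hypothetical = pythagorean_wins(790, 890)  # healthy Pollock/Peralta/Ahmed scenario

print(f"As played:    {actual:.1f} expected wins")
print(f"Hypothetical: {hypothetical:.1f} expected wins")
```

Some analysts prefer an exponent near 1.83, which nudges the estimates up by about a win in this range; the exponent is exposed as a parameter for that reason.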

But let’s be optimistic: the hypothetical healthy trio helps Arizona to an expected 74-88 record, far better than their 69-93 actual record.  That would have moved them up in the standings from fourth in the NL West to…drum roll please…fourth in the NL West.  The problem Arizona experienced in 2016 was run prevention, not run support.  As a matter of fact, total runs scored increased to 752 from 720 in 2015, when they went 79-83.  However, the real increase was in runs allowed: up to 890 (!!!) in 2016, as opposed to 713 in 2015.

So why does a pitching staff that added Zack Greinke, a bona fide ace and top-tier talent, and Shelby Miller, who would fit well in the center of any rotation, give up such a whopping number of runs?  Catching.  Below are the framing runs each pitcher’s primary catcher saved or cost in 2015:

Pitcher Team Catcher Framing Runs Rank
Zack Greinke LAD Yasmani Grandal +23.3 1st
Shelby Miller ATL AJ Pierzynski -8.7 103rd


As you can see, any pitcher would love to pitch to Yasmani Grandal.  In 2015, he ranked as the best in framing runs.  Essentially, what the statistic does is quantify the catcher’s ability to get strikes called, which is incredibly valuable to a staff.  Positive is good and negative is bad.  While there is not as direct a correlation between Shelby Miller’s success and AJ Pierzynski’s lack of pitch-framing ability, it is apparent there is a direct link between Greinke’s 2015 performance and Yasmani Grandal.

In 2016, Greinke and Miller both joined a staff caught by Welington Castillo.  The best way to describe Welington is he’s an offense-first, defense-second catcher.  The theme of this study is to advocate for the use of defense-first, offense-second catchers.  Look at this chart of past World Series champion catchers:

Year Team Name Framing Runs Rank
2012 SFG Buster Posey +20.0 4th
2013 BOS Jarrod Saltalamacchia -4.6 93rd
2014 SFG Buster Posey +21.5 2nd
2015 KCR Salvador Perez -7.5 99th
2016 CHC Miguel Montero +14.6 4th

After looking at that chart, there are a few observations to make.  First, three of the five World Series champions from 2012 to 2016 had top-four catchers in terms of pitch framing and pitch presentation.  Second, Jarrod Saltalamacchia was replaced by AJ Pierzynski, who was replaced by Blake Swihart, who is now competing with Sandy Leon and Christian Vazquez, both of whom are defense-first catchers lauded for their ability to frame pitches.  Third, Salvador Perez is the heart and soul of the Kansas City Royals, and I guarantee Dayton Moore could not care less about his pitch-framing abilities.

Essentially, what you should take away from this is that teams that win have skilled catchers.  Luckily for the Giants, Buster Posey can also hit the baseball.  To bring this full circle back to the Diamondbacks: Welington Castillo is the wrong type of catcher.  He does not frame like Posey or Montero, and the bat is nothing too special.

But alas! Castillo is no longer part of the Arizona organization! This offseason, freshly appointed general manager Mike Hazen has added four new catchers to the picture: Chris Iannetta, Jeff Mathis, Hank Conger, and Josh Thole.  Let’s look at their pitch-framing stats from last year:

Name Team Framing Chances Framing Runs Rank
Chris Iannetta SEA 5,495 -13.8 102nd
Jeff Mathis MIA 2,248 +7.2 15th
Hank Conger TBR 2,366 +3.6 25th
Josh Thole TOR 2,410 +4.6 21st


As you can see, the Diamondbacks have added a starting catcher who is not very good at framing pitches and three back-ups who do or might fit the desirable profile of this study.  Chris Iannetta signed a $1.5 million, one-year deal; Mathis signed a $4 million, one-year deal; the other two are minor-league contracts. Hazen, who came over to the Diamondbacks from the Boston Red Sox (who are leaning towards more defense-first options at catcher), made some efforts to boost his catching corps’ defensive ability, but was it enough?

In a perfect world, I think a guy like Jason Castro fits the bill perfectly in Arizona.  While the financial situation in Arizona may have made the price for Castro too high, he fits the type of catcher this study calls for, and the type of catcher Zack Greinke and Shelby Miller deserve.  He tallied +16.3 framing runs in 6,623 chances in 2016, good for third in MLB behind Buster Posey and Yasmani Grandal.  He signed for $24.5 million over three years with the Minnesota Twins, and will surely help their young staff develop.

Let’s not dwell on the hypotheticals.  The Diamondbacks have five and a half million dollars invested in two guys: Chris Iannetta and Jeff Mathis.  While Iannetta had an abysmal year in 2016 in terms of framing runs, his track record is mixed.  In 2013, for example, he recorded a framing-runs figure of -16.6, which is comparable to his 2016 number.  In 2015, however, he recorded a figure of +13.1, good for fifth in all of baseball.  What caused such a dramatic, roller-coaster shift?  I do not know; that question could be the subject of an entirely different study.

Should Iannetta get most of the starts, I would say Mike Hazen would not care if he hits below the Mendoza line, as long as his defensive statistics match his 2015 numbers.  Should he not get most of the starts at catcher, they will more than likely go to veteran backstop Jeff Mathis.  Mathis, who is lauded for his skills behind the plate, is essentially a cheap Jason Castro.  If you divide Mathis’s +7.2 framing runs by his 2,248 chances last year and multiply that per-chance rate by Castro’s 6,623 chances, you get roughly +21.2 framing runs.  That would have ranked him third behind Grandal and Posey.  Of course, this method is unreliable, because every additional chance is another opportunity for his framing runs to drop as well as increase.  With that being said, the efficiency of Mathis behind the plate makes giving him a chance to handle the Diamondbacks’ staff worthwhile.
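The Mathis-to-Castro comparison is a simple pro-rating: framing runs per chance, scaled up to a larger number of chances. A minimal Python sketch of that calculation, using the figures from the 2016 table; it assumes, as the paragraph itself cautions, that Mathis’s per-chance rate would hold constant over a full workload:

```python
def scale_framing_runs(runs: float, chances: int, target_chances: int) -> float:
    """Pro-rate framing runs to a different number of chances.

    Assumes a constant per-chance rate, which real catchers will not sustain exactly.
    """
    per_chance = runs / chances
    return per_chance * target_chances


# Mathis: +7.2 runs in 2,248 chances; Castro: 6,623 chances.
mathis_as_castro = scale_framing_runs(7.2, 2248, 6623)
print(f"{mathis_as_castro:+.1f} framing runs")
```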

The addition of Taijuan Walker, who was the return on shipping Jean Segura to Seattle, is a healthy investment in the pitching staff.  With him slotting in along with Zack Greinke, Shelby Miller, Robbie Ray, and Patrick Corbin, the Diamondbacks have the makeup of a sleeper-type rotation — one that could surprise a lot of people in 2017.  If the front office has embraced the importance of defense at the catcher position like their offseason moves suggest, their staff could cut down on runs allowed dramatically, putting their lineup in position to do some damage in the NL West this year.

One team that should be noted in this study is the Houston Astros.  Whether Jeff Luhnow’s front office emphasized framing runs and having defensively elite catchers or not, two of the catchers mentioned in this study were teammates in Houston: Jason Castro and Hank Conger.  Castro and Conger were the only two backstops on the 2015 Houston Astros, the year Dallas Keuchel won the American League Cy Young award.  This serves the purpose of further validating the benefits a defense-first catcher can have on a pitching staff.

In conclusion, baseball is trending toward sacrificing offense for defense at a premium position.  One club that can change the face of their organization by embracing the principles outlined in this study is the Arizona Diamondbacks.  While the Diamondbacks may face public scrutiny long after Shelby Miller and Zack Greinke are gone, fans should be optimistic about 2017.  An elite defensive catcher can make a world of difference in the performance of a pitching staff.


The statistics used in this study were found on, the historical rosters and statistics were found on and, and was a great help in referencing players and transactions.

When Do Managers Use the Hook?

For the uninitiated, this piece heavily relies on my previous work around refining the inning/score matrix to quantify bullpen usage, and more recently, using RE24 to adjust the score differential for the base/out state in cases where the pitcher is not entering into a “clean” inning.

In that most recent piece, I concluded by alluding to a sort of “leaderboard” for base/out state adjustments. One hypothesis that you might have – certainly, one that this author had – was that we might see elite non-closers at the top of the list, implying that those pitchers are being brought in with runners on base more often than usual. Although closers are generally among the most highly-regarded relief pitchers in the game, the managerial status-quo has been to use closers almost exclusively in the “clean inning” state entering the 9th. Thus, while closers might not lead in terms of score adjustments due to inherited runners, an elite setup man certainly might.

Without further ado, here’s what that leaderboard looked like in 2016.

Largest Average Negative Score Adjustments
Player Team # Apps Mean Adj. Score Mean Adj. Inn Score Diff Inn Diff
Colton Murray PHI 24 -2.30 6.90 -0.22 0.15
Chaz Roe ATL 21 -0.73 7.57 -0.21 0.11
Gavin Floyd TOR 28 0.54 8.04 -0.21 0.11
Dean Kiekhefer STL 26 -1.78 7.59 -0.21 0.13
Alex Wilson DET 62 0.18 6.97 -0.19 0.13
Carl Edwards CHC 36 1.31 7.84 -0.19 0.15
James Hoyt HOU 22 -1.77 7.26 -0.18 0.26
Jordan Lyles COL 35 0.68 7.34 -0.18 0.09
Tommy Layne NYY 29 0.83 7.49 -0.17 0.25
Matt Bowman STL 59 1.08 7.28 -0.17 0.06

So… this isn’t exactly what I thought I’d find. There aren’t any closers in this group, but there really aren’t many top-flight middle relievers, either. If anything, this group came in when the team was tied or trailing more often than not. What’s going on here?

What we can’t discern from this table is whether mid-inning appearances tend to be high-leverage affairs. There are most certainly cases where long men are used in the middle of the 4th inning to relieve an ineffective starter. That situation isn’t interesting in a vacuum, but it may be interesting to know what portion of those mid-inning appearances are of this low-leverage variety, and which are of the high-leverage variety.

One way that we can answer this question is to stratify qualifying relief pitchers by their average inning when entering the game. To accomplish this, let’s define a “closer” as a pitcher with an average inning of 8.5 or higher, and a “middle reliever” as a pitcher with an average inning between 7 and 8.5. Then we can look at the percentage of appearances for each group which were not “clean” innings.
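A minimal Python sketch of that grouping, under assumptions of our own: each appearance is a hypothetical `(pitcher, entry_inning, outs, runners_on)` tuple, a "clean" inning means entering with zero outs and the bases empty, and the 7-to-8.5 band is treated as half-open. The article’s qualification threshold for relievers isn’t stated, so no minimum-appearance filter is applied here:

```python
from collections import defaultdict
from statistics import mean


def role(avg_entry_inning: float) -> str:
    """Classify a reliever by average entry inning, per the definitions above."""
    if avg_entry_inning >= 8.5:
        return "closer"
    if avg_entry_inning >= 7.0:
        return "middle reliever"
    return "other"


def non_clean_share(appearances):
    """appearances: iterable of (pitcher, entry_inning, outs, runners_on) tuples.

    Returns {role: share of that role's appearances that did not start a clean inning}.
    """
    by_pitcher = defaultdict(list)
    for pitcher, inning, outs, runners_on in appearances:
        by_pitcher[pitcher].append((inning, outs, runners_on))

    totals = defaultdict(int)
    not_clean = defaultdict(int)
    for apps in by_pitcher.values():
        r = role(mean(inning for inning, _, _ in apps))  # role from average entry inning
        for _, outs, runners_on in apps:
            totals[r] += 1
            if outs > 0 or runners_on > 0:  # anything but 0 outs, bases empty
                not_clean[r] += 1
    return {r: not_clean[r] / totals[r] for r in totals}
```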

(Graph: regular-season relief appearances by entry state, closers vs. middle relievers)

As you might expect (even if you vehemently disagree with the practice), closers very rarely enter the game mid-inning. Between 85% and 90% of their appearances come in clean innings. Middle relievers, on the other hand, come into the game at the start of an inning closer to 60-65% of the time. That number has been on the rise recently, which seems a bit odd, or at least at odds with what we’ve seen in the postseason recently (more on that in a bit).

Some small percentage of the time – the area between the lines of the same color – pitching changes are made with 1 or 2 outs in the inning but with no one on base. This is probably not optimal: The pitcher coming into that situation has an easier-than-average job, as they’re essentially getting a shortened inning to work through. If a guy like Dellin Betances can face 300 batters in a season, why waste 20 of them on situations that are easier than average?
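The three entry states at play here (clean inning, "1+ out, 0 on", and men on base) can be made concrete with a short Python sketch. The `(outs, runners_on)` input format is a hypothetical stand-in for whatever play-by-play fields you have; this is an illustration, not the author’s code:

```python
from collections import Counter


def entry_state(outs: int, runners_on: int) -> str:
    """Bucket a relief appearance by the base/out state the reliever inherits."""
    if runners_on > 0:
        return "men on base"
    if outs > 0:
        return "1+ out, 0 on"  # the easier-than-average, shortened inning
    return "clean inning"


def entry_state_shares(appearances):
    """appearances: iterable of (outs, runners_on) pairs; returns the share per bucket."""
    counts = Counter(entry_state(outs, runners) for outs, runners in appearances)
    total = sum(counts.values())
    return {state: n / total for state, n in counts.items()}
```

In these terms, the mid-inning share is "1+ out, 0 on" plus "men on base", and the gap between same-colored lines in the graph corresponds to the "1+ out, 0 on" bucket alone.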

The orange lines represent a subset of the overall middle relief group where the team in question is either tied or leading by no more than three runs, in either the 7th or 8th inning. These are situations of high importance and leverage. An effective manager might be employing mid-inning pitching changes more often in these situations in order to limit damage and preserve leads.

Yet, this subset isn’t very different from the overall middle relief group. Whatever difference existed in 2012 and 2013 has been eroded in the last few years, as part of a general trend: Mid-inning appearances in the regular season are becoming less common.

As a final step, let’s contrast this picture of usage with an analogous graph on postseason appearances. We’ll maintain the same definitions of “closer” and “middle reliever” for consistency.

(Graph: postseason relief appearances by entry state, closers vs. middle relievers)

Chaos! This graph looks more disorganized than the regular-season version, but then again, the postseason is more chaotic in general. We’re dealing with smaller samples and we can’t put too much faith into these trends. That said, two things stand out when comparing postseason usage to regular-season usage:

  • Closers are no longer treated as a special species. Even through 2014, closers were entering postseason games in clean innings about 80% of the time. In the postseason! When the managers are paying attention! When there are high-leverage situations at every turn! But in the past two seasons, closers have been used increasingly with runners on base – in fact, even more so than middle relievers have in close/lead situations during that time. Again, small samples, but this screams efficiency. If your closer is your most effective weapon, you should be using him with runners on base and a late lead, instead of your second-most effective weapon.
  • Middle relievers have been used more often in “matchup” situations. 2014 and 2016 stand out in this regard, and it probably has something to do with guys named Bochy and Maddon representing large shares of the sample in those years. Recall that the gap between the dotted and solid lines of the same color represents the frequency of “1+ out, 0 on” appearances. Those gaps are huge in 2014 and 2016! While mid-inning appearances among all classes of pitchers were highest in 2016, that’s not the case at all for “men on base” appearances, which were more or less in line with historical norms. This represents an increase in match-up-based thinking, not leverage-based thinking.

These graphs look different, and they probably always will. Teams have relatively fewer resource constraints in the bullpen come October. They have more days off between games, and fewer games to budget resources for in the future.

That said, there’s been no carryover at all into the regular season from the wild, and relatively new, bullpen management seen in the postseasons of 2015 and 2016. Constraints will limit the extent to which managers can call upon their best arms with runners on base late in games, but it would be hard to imagine that a status quo which holds the closer for the 9th inning almost 90% of the time can’t be improved upon in some way. Teams have spent more on bullpens, but they haven’t figured out how to use them any more efficiently in the regular season, and the differences we’ve witnessed in the postseason show that they’re only getting it about half right, even when it matters most.

Basic Machine Learning With R (Part 3)

Previous parts in this series: Part 1 | Part 2

If you’ve read the first two parts of this series, you already know how to do some pretty cool machine-learning stuff, but there’s still a lot to learn. Today, we will be updating this nearly seven-year-old chart featured on Tom Tango’s website. We haven’t done anything with Statcast data yet, so that will be cool. More importantly, though, this will present us with a good opportunity to work with an imperfect data set. My motto is “machine learning is easy — getting the data is hard,” and this exercise will prove it. As always, the code presented here is on my GitHub.

The goal today is to take exit velocity and launch angle, and then predict the batted-ball type from those two features. Hopefully by now you can recognize that this is a classification problem. The question becomes, where do we get the data we need to solve it? Let’s head over to the invaluable Statcast search at Baseball Savant to take care of this. We want to restrict ourselves to just balls in play, and to simplify things, let’s just take 2016 data. You can download the data from Baseball Savant in CSV format, but if you ask it for too much data, it won’t let you. I recommend taking the data a month at a time, like in this example page. You’ll want to scroll down and click the little icon in the top right of the results to download your CSV.


Go ahead and do that for every month of the 2016 season and put all the resulting CSVs in the same folder (I called mine statcast_data). Once that’s done, we can begin processing it.

Let’s load the data into R using a trick I found online (Google is your friend when it comes to learning a new programming language — or even using one you’re already pretty good at!).

filenames <- list.files(path = "statcast_data", pattern = "\\.csv$", full.names = TRUE)
data_raw <- do.call("rbind", lapply(filenames, read.csv, header = TRUE))
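The trick is the do.call("rbind", lapply(...)) pattern: lapply() reads every file into its own data frame, and do.call() hands that whole list to rbind() as separate arguments. Here is a self-contained sketch of the same pattern, using two tiny made-up CSV files in a temporary folder instead of real Statcast exports.

```r
# Two tiny made-up CSVs in a temporary folder stand in for the monthly exports
demo_dir <- file.path(tempdir(), "statcast_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(hit_speed = c(101.2, 88.4), hit_angle = c(25.1, -3.0),
                     events = c("Home Run", "Groundout")),
          file.path(demo_dir, "2016_04.csv"), row.names = FALSE)
write.csv(data.frame(hit_speed = 95.0, hit_angle = 12.5, events = "Single"),
          file.path(demo_dir, "2016_05.csv"), row.names = FALSE)

demo_files <- list.files(path = demo_dir, pattern = "\\.csv$", full.names = TRUE)
# lapply() reads each file into a data frame and returns them as a list;
# do.call() then calls rbind(df1, df2, ...) with that list as its arguments.
# rbind() on data frames requires every file to have the same column names.
demo_raw <- do.call("rbind", lapply(demo_files, read.csv, header = TRUE))
print(demo_raw)
```

The result is a single three-row data frame, which is exactly what happens with the real monthly files, just with far more rows.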

The columns we want here are “hit_speed”, “hit_angle”, and “events”, so let’s create a new data frame with only those columns and take a look at it.

data <- data_raw[,c("hit_speed","hit_angle","events")]
str(data)

'data.frame':	127325 obs. of  3 variables:
 $ hit_speed: Factor w/ 883 levels "100.0","100.1",..: 787 11 643 ...
 $ hit_angle: Factor w/ 12868 levels "-0.01               ",..: 7766 1975 5158  ...
 $ events   : Factor w/ 25 levels "Batter Interference",..: 17 8 11 ...

Well, it had to happen eventually. See how all of these columns are listed as “Factor” even though some of them are clearly numeric? Let’s convert those columns to numeric values.

data$hit_speed <- as.numeric(as.character(data$hit_speed))
data$hit_angle <- as.numeric(as.character(data$hit_angle))
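Why the detour through as.character()? Calling as.numeric() directly on a factor returns the factor's internal level codes, not the numbers you see printed. A toy factor (made-up values, plus a hypothetical non-numeric "null" entry) shows the difference:

```r
# Toy factor standing in for a numeric column that read.csv stored as a factor.
# The "null" entry is a hypothetical non-numeric value, included to show what
# happens to anything that can't be parsed as a number.
f <- factor(c("100.0", "95.5", "100.0", "null"))
levels(f)                    # "100.0" "95.5" "null" (levels are sorted as text)

# Wrong: as.numeric() on a factor returns the internal level codes
codes <- as.numeric(f)       # 1 2 1 3

# Right: convert to character first, then parse the text as numbers.
# Unparseable strings become NA (R warns about this, so it's silenced here).
values <- suppressWarnings(as.numeric(as.character(f)))  # 100 95.5 100 NA
```

Anything that fails to parse ends up as NA, which is one source of the missing values handled next.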

There is also some missing data in this data set. There are several ways to deal with such issues, but we’re simply going to remove any rows with missing data.

data <- na.omit(data)
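Because na.omit() drops rows silently, it can be worth checking how much data you're throwing away before you do it. A minimal sketch on a made-up data frame, using complete.cases() to count the affected rows:

```r
# Toy data frame with a couple of missing values (made-up numbers)
toy <- data.frame(hit_speed = c(101.2, NA, 88.4, 95.0),
                  hit_angle = c(25.1, 14.0, NA, 12.5))
# complete.cases() is TRUE for rows with no NA in any column
keep <- complete.cases(toy)      # TRUE FALSE FALSE TRUE
cat("Dropping", sum(!keep), "of", nrow(toy), "rows\n")
clean <- na.omit(toy)            # keeps the same rows as toy[keep, ]
rownames(clean)                  # "1" "4": original row names are kept
```

Note that na.omit() keeps the original row names, so the surviving rows are no longer numbered 1, 2, 3, ..., which is one reason to reset them later on.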

Let’s next take a look at the data in the “events” column, to see what we’re dealing with there.

unique(data$events)

 [1] Field Error         Flyout              Single             
 [4] Pop Out             Groundout           Double Play        
 [7] Lineout             Home Run            Double             
[10] Forceout            Grounded Into DP    Sac Fly            
[13] Triple              Fielders Choice Out Fielders Choice    
[16] Bunt Groundout      Sac Bunt            Sac Fly DP         
[19] Triple Play         Fan interference    Bunt Pop Out       
[22] Batter Interference
25 Levels: Batter Interference Bunt Groundout ... Sacrifice Bunt DP

The original classification from Tango’s site had only five levels (POP, GB, FLY, LD, HR), but we’ve got over 20. We’ll have to (a) convert the events we can classify into the levels we’re after and (b) keep only the rows that end up with one of those five levels. Thanks to another tip I got from Googling, we can do it like this:

library(plyr)  # for revalue()
data$events <- revalue(data$events, c("Pop Out"="Pop",
      "Bunt Pop Out"="Pop","Flyout"="Fly","Sac Fly"="Fly",
      "Bunt Groundout"="GB","Groundout"="GB","Grounded Into DP"="GB",
      "Lineout"="Liner","Home Run"="HR"))
# Take another look to be sure
unique(data$events)
# Keep only the five batted-ball types we want to classify
data <- data[data$events %in% c("Pop","Fly","GB","Liner","HR"),]
# The data looks good except there are too many levels.  Let's re-factor
data$events <- factor(data$events)
# Re-index to be sure
rownames(data) <- NULL
# Make 100% sure!
unique(data$events)
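plyr is retired (it only receives the changes needed to stay on CRAN), so if you'd rather not depend on it, a named lookup vector in base R can do the recoding and the filtering in one pass. This is an alternative technique, not what the code above uses; here is a minimal sketch on a made-up vector of event labels.

```r
# Toy vector of Statcast-style event labels (made up for illustration)
events <- factor(c("Flyout", "Single", "Groundout", "Bunt Pop Out",
                   "Home Run", "Lineout", "Sac Fly", "Field Error"))

# Named lookup vector: names are the original labels, values are the new
# classes. Same mapping as the revalue() call.
lookup <- c("Pop Out" = "Pop", "Bunt Pop Out" = "Pop",
            "Flyout" = "Fly", "Sac Fly" = "Fly",
            "Bunt Groundout" = "GB", "Groundout" = "GB", "Grounded Into DP" = "GB",
            "Lineout" = "Liner", "Home Run" = "HR")

# Indexing by name returns NA for any label not in the lookup (e.g. "Single")
recoded <- unname(lookup[as.character(events)])
keep <- !is.na(recoded)
# factor() with explicit levels fixes both the set and the order of classes
recoded <- factor(recoded[keep], levels = c("GB", "Pop", "Fly", "HR", "Liner"))
table(recoded)
```

On the real data frame you would compute `recoded` and `keep` from `data$events`, then subset with `data <- data[keep, ]` and assign `data$events <- recoded`.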

Oof! See how much work that was? We’re several dozen lines of code into this problem and we haven’t even started the machine learning yet! But that’s fine; the machine learning itself is the easy part. Let’s do that now.

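library(caret)  # createDataPartition(), trainControl(), train(), confusionMatrix()
set.seed(2016)  # arbitrary seed so the split and the CV folds are reproducible
# createDataPartition() samples within each event class, so the 70/30 split
# keeps the class proportions roughly the same in training and testing
inTrain <- createDataPartition(data$events,p=0.7,list=FALSE)
training <- data[inTrain,]
testing <- data[-inTrain,]

method <- 'rf' # sure, random forest again, why not (needs the randomForest package)
# train the model with 5-fold cross-validation, repeated 5 times
# (this can take a long time on a data set of this size)
ctrl <- trainControl(method = 'repeatedcv', number = 5, repeats = 5)
modelFit <- train(events ~ ., method=method, data=training, trControl=ctrl)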

# Run the model on the test set
predicted <- predict(modelFit,newdata=testing)
# Check out the confusion matrix
confusionMatrix(predicted, testing$events)


Prediction   GB  Pop  Fly   HR Liner
     GB    9059    5    4    1   244
     Pop      3 1156  123    0    20
     Fly      6  152 5166  367   457
     HR       0    0  360 1182    85
     Liner  230   13  449   77  2299
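caret's confusionMatrix() also prints an "Overall Statistics" section (accuracy, kappa) and per-class statistics, which aren't reproduced here, but the headline numbers are easy to recompute from the matrix itself. A sketch, using the counts copied from the output above:

```r
# Confusion matrix copied from the output above:
# rows = predicted class, columns = actual (reference) class
classes <- c("GB", "Pop", "Fly", "HR", "Liner")
cm <- matrix(c(9059,    5,    4,    1,  244,
                  3, 1156,  123,    0,   20,
                  6,  152, 5166,  367,  457,
                  0,    0,  360, 1182,   85,
                230,   13,  449,   77, 2299),
             nrow = 5, byrow = TRUE,
             dimnames = list(Prediction = classes, Reference = classes))

# Overall accuracy: correct predictions (the diagonal) over all test cases
accuracy <- sum(diag(cm)) / sum(cm)
# Per-class recall (caret reports it as sensitivity): diagonal over column totals
recall <- diag(cm) / colSums(cm)
round(accuracy, 3)
round(recall, 3)
```

Working this through gives an overall accuracy of about 0.879 (18,862 of 21,458 test cases). Recall is highest for ground balls (about 0.97) and lowest for home runs and liners (about 0.73 and 0.74), which matches the Fly/HR and Fly/Liner mix-ups visible in the matrix.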

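We did it! And the confusion matrix looks pretty good: rows are the model’s predictions and columns are the actual (reference) classes, so correct classifications sit on the diagonal, and most of the misses are between neighboring types, such as fly balls vs. home runs and fly balls vs. liners. All that’s left is to visualize the model’s decision regions, and we can make a very pretty plot of them with the amazing Plotly package for R: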

library(plotly)  # plot_ly(); plotly also re-exports the %>% pipe

# Exit velocities from 40 to 120
x <- seq(40,120,by=1)
# Hit angles from 10 to 50
y <- seq(10,50,by=1)
# Make a data frame of every (x, y) combination
plotDF <- data.frame(expand.grid(x,y))
# Add the correct column names (they must match the training features)
colnames(plotDF) <- c('hit_speed','hit_angle')
# Add the classification
plotPredictions <- predict(modelFit,newdata=plotDF)
plotDF$pred <- plotPredictions

p <- plot_ly(data=plotDF, x=~hit_speed, y = ~hit_angle, color=~pred, type="scatter", mode="markers") %>%
    layout(title = "Exit Velocity + Launch Angle = WIN")
# Print the plot object to display it
p

Awesome! It’s a *little* noisy, but overall not too bad. And it does kinda look like the original, which is reassuring.

That’s it! That’s all I have to say about machine learning. At this point, Google is your friend if you want to learn more. There are also some great classes online you can try, if you’re especially motivated. Enjoy, and I look forward to seeing what you can do with this!