How Many Elite Players Were Good Prospects?

I really enjoyed Jeff Sullivan’s piece on the prospect pedigree of good players, and it was interesting to see how many solid players never cracked the Baseball America 100 in any year. This is an extension of that article, and not a particularly original one. In fact, I think it’s about the most obvious next step: how many great players were prospects?

It was interesting to see that someone can have a decent season as a totally unheralded player, but there are a lot of players who have a 3-win season and promptly fade into ignominy. Players at that threshold in 2010 included Cliff Pennington and Dallas Braden, and in 2011, Emilio Bonifacio and Alexi Ogando. Cherry-picked names, to be sure, but it’s easy to imagine they (and players like them) are the source of that ~33% of un-ranked good players, and the real elite players are usually identified as at least good. That doesn’t mean it’s true, though, so I tested it.

I pulled the top 10 pitchers and the top 10 position players by WAR for each year from 2010 through 2014. If there was a tie for 10th, I included both players, so the sample ended up at 101 players. Then, for each player, I found their highest ranking on the BA lists. The same caveats as in Jeff’s article apply here, but again, BA is the industry standard, and their lists go back long enough to make them very useful. Following: a giant table, with every qualifying player-year, their WAR in that year (and how that ranked among all players), and their highest prospect ranking and the year of that ranking.
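For anyone who wants to build a similar sample, here is a rough sketch of how the selection could be scripted with pandas; the file and column names are placeholders rather than my actual workflow.

```python
import pandas as pd

# Hypothetical inputs: one row per player-season with WAR and a pitcher flag,
# plus a lookup of each player's best-ever Baseball America top-100 rank
# (missing if he was never ranked). File and column names are placeholders.
war = pd.read_csv("player_war_2010_2014.csv")   # Name, Season, IsPitcher, WAR
ba = pd.read_csv("ba_best_ranks.csv")           # Name, HighestBARank, RankYear

top10 = (
    war.sort_values("WAR", ascending=False)
       .groupby(["Season", "IsPitcher"], group_keys=False)
       .apply(lambda g: g[g["WAR"] >= g["WAR"].iloc[9]])  # keep ties for 10th
)

sample = top10.merge(ba, on="Name", how="left")
sample["HighestBARank"] = sample["HighestBARank"].fillna("Unranked")
print(len(sample))  # the sample used here came out to 101 player-seasons
```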

Name | Season | Team | WAR | WAR Rank | Highest Prospect Rank | Prospect Rank Year
Mike Trout 2013 Angels 10.5 1 2 2011
Mike Trout 2012 Angels 10.3 1 2 2011
Jacoby Ellsbury 2011 Red Sox 9.4 1 13 2008
Josh Hamilton 2010 Rangers 8.4 1 1 2001
Roy Halladay 2011 Phillies 8.4 1 12 1999
Mike Trout 2014 Angels 8.0 1 2 2011
Clayton Kershaw 2014 Dodgers 7.6 1 7 2008
Clayton Kershaw 2013 Dodgers 7.0 1 7 2008
Cliff Lee 2010 - – - 6.9 1 30 2003
Justin Verlander 2012 Tigers 6.7 1 8 2006
Andrew McCutchen 2013 Pirates 8.4 2 13 2007
Matt Kemp 2011 Dodgers 8.3 2 96 2006
Carl Crawford 2010 Rays 7.7 2 59 2002
Buster Posey 2012 Giants 7.7 2 7 2010
Corey Kluber 2014 Indians 7.2 2 Unranked Unranked
Clayton Kershaw 2011 Dodgers 7.1 2 7 2008
Andrew McCutchen 2014 Pirates 6.8 2 13 2007
Adam Wainwright 2013 Cardinals 6.6 2 18 2003
Roy Halladay 2010 Phillies 6.2 2 12 1999
Felix Hernandez 2012 Mariners 6.2 2 2 2005
Jose Bautista 2011 Blue Jays 8.1 3 Unranked Unranked
Robinson Cano 2012 Yankees 7.6 3 Unranked Unranked
Josh Donaldson 2013 Athletics 7.6 3 Unranked Unranked
Evan Longoria 2010 Rays 7.5 3 2 2008
Cliff Lee 2011 Phillies 6.8 3 30 2003
Alex Gordon 2014 Royals 6.6 3 2 2007
Matt Harvey 2013 Mets 6.5 3 54 2012
Justin Verlander 2010 Tigers 6.2 3 8 2006
Felix Hernandez 2014 Mariners 6.1 3 2 2005
Clayton Kershaw 2012 Dodgers 5.7 3 7 2008
Dustin Pedroia 2011 Red Sox 7.8 4 77 2006
Carlos Gomez 2013 Brewers 7.5 4 52 2008
Chase Headley 2012 Padres 7.5 4 32 2008
Joey Votto 2010 Reds 7.0 4 43 2007
Anthony Rendon 2014 Nationals 6.5 4 19 2012
CC Sabathia 2011 Yankees 6.4 4 Unranked Unranked
Jered Weaver 2010 Angels 6.1 4 57 2006
David Price 2014 - – - 6.1 4 2 2009
Max Scherzer 2013 Tigers 6.0 4 66 2008
David Price 2012 Rays 5.1 4 2 2009
David Wright 2012 Mets 7.4 5 21 2004
Miguel Cabrera 2013 Tigers 7.4 5 12 2003
Ian Kinsler 2011 Rangers 7.2 5 98 2005
Albert Pujols 2010 Cardinals 6.8 5 42 2001
Josh Donaldson 2014 Athletics 6.5 5 Unranked Unranked
Dan Haren 2011 Angels 6.4 5 Unranked Unranked
Felix Hernandez 2010 Mariners 6.0 5 2 2005
Anibal Sanchez 2013 Tigers 5.9 5 40 2006
Phil Hughes 2014 Twins 5.7 5 4 2007
Cliff Lee 2012 Phillies 5.1 5 30 2003
Ryan Braun 2012 Brewers 7.3 6 26 2007
Chris Davis 2013 Orioles 7.1 6 65 2008
Ryan Braun 2011 Brewers 7.1 6 26 2007
Ryan Zimmerman 2010 Nationals 6.6 6 15 2006
Michael Brantley 2014 Indians 6.3 6 Unranked Unranked
Justin Verlander 2011 Tigers 6.3 6 8 2006
Ubaldo Jimenez 2010 Rockies 5.9 6 82 2005
Felix Hernandez 2013 Mariners 5.7 6 2 2005
Jon Lester 2014 - – - 5.6 6 22 2006
Gio Gonzalez 2012 Nationals 5 6 26 2008
Matt Carpenter 2013 Cardinals 6.9 7 Unranked Unranked
Curtis Granderson 2011 Yankees 6.8 7 57 2005
Andrew McCutchen 2012 Pirates 6.8 7 13 2007
Jose Bautista 2010 Blue Jays 6.4 7 Unranked Unranked
Giancarlo Stanton 2014 Marlins 6.2 7 3 2010
Jered Weaver 2011 Angels 5.9 7 57 2006
Josh Johnson 2010 Marlins 5.8 7 80 2006
Cliff Lee 2013 Phillies 5.5 7 30 2003
Jordan Zimmermann 2014 Nationals 5.3 7 41 2009
Zack Greinke 2012 - – - 5.0 7 14 2004
Evan Longoria 2013 Rays 6.7 8 2 2008
Alex Gordon 2011 Royals 6.6 8 2 2007
Adrian Beltre 2012 Rangers 6.5 8 3 1998
Adrian Beltre 2010 Red Sox 6.4 8 3 1998
Jose Bautista 2014 Blue Jays 6.2 8 Unranked Unranked
Francisco Liriano 2010 Twins 5.7 8 6 2006
Doug Fister 2011 - – - 5.2 8 Unranked Unranked
Chris Sale 2014 White Sox 5.1 8 20 2011
R.A. Dickey 2012 Mets 4.9 8 Unranked Unranked
Mat Latos 2013 Reds 4.8 8 Unranked Unranked
Miguel Cabrera 2011 Tigers 6.5 9 12 2003
Jason Heyward 2012 Braves 6.5 9 1 2010
Robinson Cano 2010 Yankees 6.3 9 Unranked Unranked
Paul Goldschmidt 2013 Diamondbacks 6.3 9 Unranked Unranked
Jonathan Lucroy 2014 Brewers 6.2 9 Unranked Unranked
Adam Wainwright 2010 Cardinals 5.6 9 18 2003
Jake Arrieta 2014 Cubs 5.1 9 67 2009
Matt Cain 2011 Giants 5 9 10 2006
Justin Verlander 2013 Tigers 4.8 9 8 2006
Johnny Cueto 2012 Reds 4.7 9 34 2008
Joey Votto 2011 Reds 6.4 10 43 2007
Miguel Cabrera 2012 Tigers 6.4 10 12 2003
Andres Torres 2010 Giants 6.3 10 Unranked Unranked
Manny Machado 2013 Orioles 6.2 10 11 2012
Carlos Gomez 2014 Brewers 5.7 T-10 52 2008
Adrian Beltre 2014 Rangers 5.7 T-10 3 1998
CC Sabathia 2010 Yankees 5.1 10 Unranked Unranked
Max Scherzer 2014 Tigers 5.1 10 66 2008
Matt Garza 2011 Cubs 5.0 10 21 2007
CC Sabathia 2012 Yankees 4.7 10 Unranked Unranked
Chris Sale 2013 White Sox 4.7 10 20 2011

That is a big, ugly table, so here are some summary facts. Of this 101-player sample, 20 were never ranked by Baseball America, so top players do appear to be more likely to have been a ranked prospect (80%) than merely good players (66%, per Jeff’s article). None of the unranked players were ever the best position player or pitcher in 2010-2014; among players who finished first, the lowest prospect ranking belonged to Cliff Lee, who topped out at 30th in 2003. The unranked players tended to be concentrated toward the bottom of the WAR leaderboards; 75% of them ranked 5th through 10th in their season. I expected more of the players in 8th through 10th in a given season to be beneficiaries of a fluke year, but there are far fewer of those than I expected. The unranked players with the least impressive careers outside their top seasons are probably Andres Torres and R.A. Dickey, but the other unranked players are pretty uniformly great. Maybe not top-10-WAR-every-year great, but still, great.

What about pitchers versus position players? If the top 10 by WAR of one group was more likely to include unranked players than the other, that would suggest that group was more difficult to scout and accurately predict. But while the split between pitchers and hitters among the unranked players is not totally even, 12 to 8, it’s well within what I would expect from random variation. Maybe a bigger sample could pull something meaningful out, but I’m not comfortable concluding there’s a difference based on this alone.

The following chart digs more into the individual ranks in each season. The x-axis is the WAR rank, and the bar height is the percentage of players at that point that were in the BA top 100. The line running across the chart is the average BA ranking of the players that were ranked.

chart 1

What this shows is a pretty steady decline in the percentage of players ranked in the BA Top 100 as you move down the WAR leaderboard, and an essentially random average ranking among those who were ranked. This fits with my perception of prospect rankings: being good enough to be ranked is pretty important, but the exact position on the list is not very predictive. As Jeff showed, it’s very tough to be good without being ranked, but this suggests a prospect ranked as if he’ll be merely good can still turn in a great season.
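For reference, the two series in the chart boil down to a simple aggregation; this sketch assumes the `sample` frame from the earlier snippet plus hypothetical `WARRank` and numeric `BARank` columns (NaN for unranked players).

```python
# Bar heights: share of players at each WAR rank who appeared in a BA top 100.
# Line: average BA rank among only the ranked players at that WAR rank.
by_rank = sample.groupby("WARRank").agg(
    pct_ranked=("BARank", lambda s: 100 * s.notna().mean()),
    avg_ba_rank=("BARank", "mean"),  # mean skips NaN, i.e. ranked players only
)
print(by_rank)
```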

What about consistent greatness? The list above doesn’t really capture the best players of the last five years so much as the best player-seasons. Can someone be truly excellent over a sustained period without having been ranked? For this, rather than looking at individual seasons, I grabbed the top 25 hitters and the top 25 pitchers by total WAR from 2010 through 2014. I thought about using several five-year periods, but I didn’t want to double-count someone like Miguel Cabrera, who would show up for both 2010-14 and 2009-13. Below is a slightly less giant table than the first, containing similar information: each player’s WAR from 2010-2014, his highest BA ranking (if any), and the year that ranking came in.

Name | Team | WAR | Highest Prospect Rank | Year
Clayton Kershaw Dodgers 32.2 7 2008
Miguel Cabrera Tigers 31.4 12 2003
Andrew McCutchen Pirates 30.9 13 2007
Robinson Cano - – - 29.9 Unranked Unranked
Mike Trout Angels 29.5 2 2011
Adrian Beltre - – - 29.1 3 1998
Felix Hernandez Mariners 28.9 2 2005
Jose Bautista Blue Jays 27.8 Unranked Unranked
Justin Verlander Tigers 26.7 8 2006
Ben Zobrist Rays 26.7 Unranked Unranked
Cliff Lee - – - 26.2 30 2003
Joey Votto Reds 26.2 43 2007
Evan Longoria Rays 26.1 2 2008
Dustin Pedroia Red Sox 24.9 77 2006
David Price - – - 24.5 2 2009
Buster Posey Giants 23.8 7 2010
Matt Holliday Cardinals 22.8 Unranked Unranked
Troy Tulowitzki Rockies 22.7 15 2007
Chase Headley - – - 22 32 2008
Cole Hamels Phillies 21.9 17 2004
Alex Gordon Royals 21.7 2 2007
Jason Heyward Braves 21.7 1 2010
Ian Kinsler - – - 21.3 98 2005
Zack Greinke - – - 21.2 14 2004
Adam Wainwright Cardinals 21.2 18 2003
Max Scherzer Tigers 21.1 66 2008
Giancarlo Stanton Marlins 21 3 2010
Yadier Molina Cardinals 21 Unranked Unranked
Chase Utley Phillies 21 81 2003
Adrian Gonzalez - – - 20.6 31 2003
Ryan Braun Brewers 20.5 26 2007
David Wright Mets 20.5 21 2004
Jacoby Ellsbury - – - 20 13 2008
Josh Hamilton - – - 19.8 1 2001
Anibal Sanchez - – - 19.7 40 2006
Jered Weaver Angels 19.7 57 2006
Jon Lester - – - 19.2 22 2006
CC Sabathia Yankees 18.8 Unranked Unranked
James Shields - – - 18.3 Unranked Unranked
Hiroki Kuroda - – - 17.8 Unranked Unranked
Madison Bumgarner Giants 17.8 9 2009
Gio Gonzalez - – - 17.7 26 2008
Mat Latos - – - 17.6 Unranked Unranked
Doug Fister - – - 16.9 Unranked Unranked
Roy Halladay Phillies 16.5 12 1999
Chris Sale White Sox 16.1 20 2011
C.J. Wilson - – - 15.6 Unranked Unranked
Dan Haren - – - 15.5 Unranked Unranked
Jordan Zimmermann Nationals 15.5 41 2009
Johnny Cueto Reds 15.5 34 2008

Of these 50 players, 12 were unranked, a similar share to the single-season leaders (24% for the five-year sample vs. 20% for the single-season sample). Of the 12 unranked players, 7 came between 38th and 50th on the leaderboard, but 3 came in the top 10 (Robinson Cano, Jose Bautista, and Ben Zobrist). At first glance, there was no meaningful split in the unranked players between pitchers and hitters (7 vs. 5), but interestingly, all 7 of the unranked pitchers were in the bottom half of the pitcher leaderboard. All of the top 12 pitchers of the last five years were ranked, with Max Scherzer (#66 on BA’s 2008 list) the lowest, so perhaps it’s less likely for a pitcher to become truly elite out of nowhere than for a hitter. Again, with this small a sample, I’m not comfortable concluding anything, but it’s certainly interesting.

This is kind of an anticlimactic article, because none of my expectations were turned upside down. A great player was likely to have been ranked at some point, more likely than a merely good player, but there are still some who come out of nowhere. Of those ranked, the actual rank seems to matter less than the fact that they cracked the top 100. None of that is very surprising, but hopefully it’s still interesting to see it all laid out.


Six Feet Under: Evaluating Short Pitchers

It’s September 10th, 1999, and the small flame-throwing right-hander from the Dominican Republic has just struck out Scott Brosius and Darryl Strawberry. He’s about to get Chuck Knoblauch swinging (and missing) on a 1-2 count for his 17th strikeout of the night to finish the game. He does, and the fans at the old Yankee Stadium go nuts, for they’ve just seen Pedro Martinez’ finest start in the greatest pitching season of all time. The final score is 3-1, with the only Yankee run, and hit, coming on a Chili Davis home run. Pedro is 5’11’’ and 170 lb, one of the smallest pitchers in baseball. While most players tower over him off the mound, Pedro writes a different story when he’s pitching. The Yankee hitters fail to notice his height when he kicks his leg up, drops it down, and serves a 95-mph fastball from a three-quarters delivery at their eyes.

The average male height in the U.S. is 5’10’’. You’d never know this from watching a baseball game, where the average height is about 6’2’’, with pitchers just a little taller at about 6’3’’. We all remember the success Randy Johnson had at 6’10’’, and his height was always considered an advantage. When we watched Pedro Martinez, however, commentators and baseball men viewed him as an exception to some obscure and unwritten rule: that shorter athletes can’t become successful pitchers.

Six feet, like 30 home runs or a .300 batting average, has become a number associated with a distinct meaning. If you hit 30 home runs, you’re a power hitter. Hit 29 homers, and you have some pop. If you hit .300, you’re a great hitter. Hit .299, and you just missed hitting .300. Similarly, if you’re six feet, you can pitch. If not, you’re short, but at least you might get an interesting nickname like Tim Lincecum’s (5’11”) “The Freak.”

Most Major League pitchers fall between 6’1’’ and 6’4’’. We can look at the height distribution for pitching seasons of the last 5 years and see that it’s approximately normal:

By this approximation, the chance of randomly selecting a pitcher of the last 5 years who is shorter than 5’11’’ is about 5%.
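That figure is just a normal-tail calculation. Here is a quick sketch; the mean comes from the ~6’3’’ average above, while the 2.4-inch spread is an assumed value, since the actual standard deviation isn’t quoted here.

```python
from scipy.stats import norm

# Back-of-the-envelope version of the ~5% figure, treating pitcher height as
# normal with mean 75 inches (6'3''). The 2.4-inch spread is an assumption.
p_short = norm.cdf(71, loc=75, scale=2.4)   # P(height < 5'11'')
print(round(p_short, 3))                    # ~0.05 under these assumptions
```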

Are short pitchers really destined to fail? We’ve all been told that it’s better to be taller if you pitch. But is this true? Let’s consider short pitchers to be 5’11’’ or under and examine their effectiveness and distribution in comparison to taller pitchers, who we’ll consider to be 6 feet or taller.

The top ten best pitching seasons for shorter pitchers of the last 5 years are:

We notice that Tim Lincecum appears on this list twice and Johnny Cueto appears on it three times. All of these pitchers are 5’11’’ with the exception of Kris Medlen, who is 5’10’’. So, we see that successful pitching seasons by short pitchers don’t come completely out of the blue. Short pitchers can be successful and can dominate batters, most of whom are much taller, as Cueto did last year and in 2012.

In fact, short pitchers aren’t all that rare to come by, although they’re considerably rarer than taller pitchers. In the last 5 years, there have been 23 instances of short starting pitchers throwing at least 150 innings. In comparison, there have been 402 instances of this type for taller starting pitchers.

Shorter pitchers are generally relegated to the bullpen; there have been 95 instances in the last 5 years of full-time short relief pitchers and 968 instances of full-time taller relief pitchers.

We can see the average WAR breakdowns for all of these pools of players in the following table, along with P-Values for a two-sided t-test comparing the short relievers against the tall relievers and the short starters against the tall starters:

What the 0.0005 is telling us, here, is that if there were no true difference between the groups, we would observe a gap this large by chance alone with probability 0.0005. Thus, there actually is a significant difference between the mean WAR for short relievers and the mean WAR for tall relievers (and it favors the short relievers). On the other hand, the difference between the starters is not significant. Either way, we have no evidence to suggest that shorter pitchers are any less effective than taller pitchers.
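For anyone who wants to run the same kind of test, here is a minimal sketch of a two-sided Welch t-test in Python; the WAR samples are randomly generated placeholders, not the actual reliever pools.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Placeholder WAR samples standing in for the 95 short-reliever and
# 968 tall-reliever seasons; the means and spreads here are made up.
short_relievers = rng.normal(0.55, 0.70, size=95)
tall_relievers = rng.normal(0.30, 0.70, size=968)

# Welch's two-sided t-test (no equal-variance assumption)
t_stat, p_value = stats.ttest_ind(short_relievers, tall_relievers, equal_var=False)
print(t_stat, p_value)
```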

Are shorter pitchers undervalued in the baseball market? If so, to what extent? We can approach this by examining a pitcher’s WAR relative to his free-agent salary, comparing the height groups within relievers and within starters (since relievers are generally valued differently than starters).

However, we find that in the last five years, there are only 4 instances of a starter 5’11’’ or shorter pitching for a team that acquired him via free agency; and all of them are Bartolo Colon seasons from 2011-2014.

Fortunately, there are more such instances among relievers, so that is what we’ll examine. Here is the distribution of WAR against relievers’ free-agent salaries:

We see that short and tall relievers are mostly clustered between -1 and 1 WAR and between $1 million and $5 million. However, there are several taller relievers past the $7.5 million mark with unremarkable WARs, which we don’t see for shorter relievers. From this, we would suspect that taller relievers are being overvalued while shorter relievers are being undervalued.

This is, in fact, the case: short relief pitchers have produced 2.33 WAR for every $10 million they earned in free agency, while taller relievers have produced 1.36 WAR for every $10 million. Comparing these values with a one-sided t-test yields a p-value of 0.0018, meaning we would see a gap this large by chance only 0.18% of the time, again a significant result. In other words, relievers under six feet have been about 1.7 times as valuable per dollar as their taller counterparts.
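The value-per-dollar comparison works the same way, just on WAR-per-salary rates instead of raw WAR; here is a rough sketch with made-up reliever data standing in for the real free-agent pools.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Made-up (WAR, free-agent salary) samples; the real pools are not reproduced here.
short_war, short_sal = rng.normal(0.7, 0.6, 40), rng.uniform(1e6, 5e6, 40)
tall_war, tall_sal = rng.normal(0.6, 0.6, 300), rng.uniform(1e6, 9e6, 300)

# WAR produced per $10 million of salary, for each group as a whole
print(short_war.sum() / (short_sal.sum() / 1e7))
print(tall_war.sum() / (tall_sal.sum() / 1e7))

# One-sided Welch test on the per-season WAR-per-$10M rates
short_rate = short_war / (short_sal / 1e7)
tall_rate = tall_war / (tall_sal / 1e7)
print(stats.ttest_ind(short_rate, tall_rate, equal_var=False, alternative="greater"))
```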

Is there something inherently different about shorter pitchers that makes them less capable of pitching successfully in the big leagues? The evidence says no. In fact, it might be more worthwhile for General Managers to draft pitchers under 6 feet tall and reap the rewards.

Just because an athlete doesn’t tower over his opponents off the mound doesn’t mean he can’t bring 55,000 dumbfounded Yankee fans to their feet on an unassuming September evening.


Do Pitchers Keep Defenses on Their Toes?

As a Blue Jays fan, I’ve enjoyed the opportunity to watch Mark Buehrle pitch the last two years. Getting to see a player with below-average stuff (and that’s probably generous) retire major-league batters regularly is a real treat. On top of that, Mark Buehrle is one of the fastest-paced pitchers in all of baseball. He had the quickest pace (time between pitches) in all of baseball in 2014. He was second in 2013 to teammate R.A. Dickey. He was first again in 2012. Games with Mark Buehrle on the mound move quickly. Often you will hear comments that this has the effect of keeping fielders “on their toes”.

Here’s his manager John Gibbons after a start last September – “He’s a teammate’s dream because he keeps his defence on their toes by working fast.” And here is a quote from Jose Bautista after a start last June – “He’s pitching great, throwing strikes, keeping people off balance and allowing us a chance to play defence behind him. It’s no surprise that every time he pitches there are plenty of good defensive plays made. He keeps everybody engaged in the game because he works quick.”

What Gibbons, Joey Bats, and many others are saying is that, due to the quicker pace of play, fielders are more ready to react to balls hit toward them. The implication is that Mark Buehrle, and other similarly fast-paced pitchers, receive better-than-expected defense, especially on the infield. I’ve often wondered whether this belief has any merit, so I decided to look into it myself.

I took a look at the rate at which groundballs hit off of Buehrle have been turned into outs throughout his career and compared his numbers to those of his teammates (Note that I would have liked to include only other starting pitchers from among Buehrle’s teammates but was unable to do so. I wouldn’t expect it to make much of a difference though). These numbers come from baseball-reference.com.

Here is what that data looks like:

Mark Buehrle's Groundout Rates

We can see that Buehrle’s ability to “keep infielders on their toes” does not translate to more outs on groundballs relative to his teammates in every year. In fact, in only seven of his 14 full seasons has Mark Buehrle’s rate of groundballs converted into outs exceeded that of his teammates. If being a fast-paced pitcher improved the defense behind you, we’d expect to see Mark Buehrle consistently outperform his teammates. Only once in the past five years has that been the case.

That one time in the past five years though, 2012, is interesting. In 2012, Buehrle’s one year with the Miami Marlins, only 41 of the 259 groundballs hit off of Buehrle went for hits. This 82% out rate was well above that of the rest of the team, which stood at 72%. Perhaps we could conclude that the Miami infielders were particularly impacted by Buehrle’s fast pace. This is likely not the case though as the primary shortstop of that team, Jose Reyes, was also the primary shortstop behind Buehrle in 2013 & 2014 with the Blue Jays. In those two years, Buehrle actually had worse infield defense behind him than his teammates (and, sadly, the two worst rates of groundball to out conversion in his career), so it’s likely that Buehrle’s success in 2012 was more due to luck.

This analysis doesn’t consider the average velocity or trajectory of the groundballs hit off of Buehrle compared to his fellow pitchers, but it seems clear that any defensive advantage Buehrle gains from pitching quickly is minimal at best. Over the course of Buehrle’s 14 complete seasons, groundballs hit off of him have been converted into outs 75.3% of the time, while groundballs hit off of his teammates have been turned into outs 74.0% of the time. That difference works out to between 5 and 6 extra outs a year. It isn’t a huge advantage, but 5 or 6 extra outs a season, plus regular two-and-a-half-hour games, is better than a kick in the teeth.
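As a quick sanity check on that 5-or-6-outs figure, the arithmetic looks like this; the ~420 groundballs per season is my round-number assumption, not a figure pulled from Baseball-Reference.

```python
# Quick check on the "5 or 6 extra outs" figure, assuming roughly 420
# groundballs allowed per season (an assumed round number).
buehrle_rate, teammate_rate = 0.753, 0.740
gb_per_season = 420
print(round((buehrle_rate - teammate_rate) * gb_per_season, 1))  # ~5.5 extra outs
```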

Next, I wanted to see if the ability to “keep fielders on their toes” was seen in pitchers other than Mark Buehrle. I looked at the ten fastest-paced starting pitchers from 2014 (min 100 IP) to see if there was a noticeable increase in groundballs converted into outs when compared to their teammates. I also did the same for the ten slowest-paced starting pitchers. In the fast-paced group are Buehrle, Dickey, Doug Fister, Wade Miley, Jon Niese, Andrew Cashner, Michael Wacha, TJ House, Dan Haren, & Chris Young. In the slow-paced group are plodders Jorge de la Rosa, Yusmeiro Petit, Clay Buchholz, Tyler Skaggs, Edinson Volquez, Chris Archer, Hiroki Kuroda, Masahiro Tanaka, Yu Darvish, & Edwin Jackson (One pitcher had to be excluded from each group as they were traded midseason and therefore exact split data for their teammates was unavailable. These pitchers were David Price from the slow-paced group and Vidal Nuno from the fast-paced group).

The results are below:

Pitcher Groundout rates

Rather than seeing the fast-paced pitchers receiving better groundball defense than their slow-paced peers, we actually see the reverse. Groundballs off the bats of slow-paced pitchers were converted to outs more often than those off of fast-paced pitchers. Once again, this analysis doesn’t consider batted-ball velocity or trajectory, but it seems clear that the supposed benefit of a faster pace doesn’t show up in infield defense. And although the data table above showed that slow-paced pitchers benefited from stronger infield defense, it seems unlikely that this is caused by the slow pace of the pitchers. Rather this is almost certainly statistical noise.

With pace of play concerns becoming more prevalent in baseball these days, there may be some pressure on pitchers to take less time between pitches. If pitchers do make such changes, they shouldn’t expect to receive any stronger defense behind them, even if some may suggest as much. So the next time a broadcaster or anyone applauds a guy for “keeping the defense on its toes” with his fast pace, you can remain skeptical that such a benefit exists. After all, these are major-league ballplayers, many of whom are being paid millions of dollars. I’m sure they can pay attention for an extra ten seconds.


Should Players Try to Bunt for a Hit More?

This post will look at bunting for a hit, try to identify whether it is a skill that can efficiently and effectively increase offensive production, and answer the general question: should players bunt more?

Is Bunting for a Hit a Skill?

Before we answer the ultimate question of whether or not players should bunt more, we need to first identify whether or not bunting for a hit is a skill to begin with.

This is where data becomes an issue, but we should be able to make do.

Before 2002 there are no records of bunt hits on FanGraphs, so I looked at all qualified hitter seasons from 2002 to 2014 in which a player bunted three or more times. Since most players go a whole season without a bunt, three or more bunts in a season puts a player above the 50th percentile for bunt attempts.

From there I looked at the year-to-year correlation of a player’s bunt hit percentage—bunt hits divided by bunts (i.e. a player’s batting average on bunts)—for the entire population. Mind you, we only have a record of the number of times a player bunted, not the number of times he attempted to bunt for a hit. In reality, a player’s bunt hit percentage would be higher if we were able to tease out the sacrifice bunts from his total bunts. Even so, the data show a .33 year-to-year correlation in bunt hit percentage for our population of hitters.
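Here is a sketch of how that year-to-year correlation could be computed; the file and column names are placeholders for the FanGraphs export.

```python
import pandas as pd

# Hypothetical file: one row per qualified player-season with three-plus bunts
# (Name, Season, Bunts, BuntHits). Column names are placeholders.
df = pd.read_csv("bunt_seasons_2002_2014.csv")
df["BH_pct"] = df["BuntHits"] / df["Bunts"]

# Pair each qualifying season with the same player's next qualifying season
df = df.sort_values(["Name", "Season"])
df["BH_pct_next"] = df.groupby("Name")["BH_pct"].shift(-1)
pairs = df.dropna(subset=["BH_pct_next"])

print(pairs["BH_pct"].corr(pairs["BH_pct_next"]))  # the figure quoted here is ~.33
```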

Takeaway: bunting for a hit is a skill.

Should Players Bunt More?

Now that we’ve answered the question of whether or not bunting for a hit is a skill, we can circle back to our original question of whether or not players should bunt more.

Because we want to have a large enough sample of attempted bunts for bunt hit percentage (BH%) to stabilize, we will look at all qualified hitter totals (i.e. multiple season totals), not individual seasons, from 2002 to 2012.

To answer our question we need to compare the expected value a player gains in an at bat where he doesn’t attempt to bunt—a regular at bat—with the expected value he gains in at bats where he attempts to bunt for a hit—a bunt hit attempt.

To come up with the expected value of a regular at bat, we look at the linear weight value added per plate appearance over a player’s at bats from 2002 to 2012 (or his entire career, if it falls within that period). We then multiply that linear weight value per plate appearance by the probability that he achieves one of those positive outcomes.

Here’s the formula for Expected Value of a regular at bat (xRA):

  • =((((1B-Bunt Hits)*0.474)+(2B*0.764)+(3B*1.063)+(HR*1.409)+(HBP*0.385)+(IBB*0.102)+(UBB*0.33))/(PA-Bunt Attempts))*((1B-Bunt Hits)+2B+3B+HR+BB+HBP)/(PA-Bunt Attempts)

This formula looks much more complicated than it actually is, but you’ll be able to click into the cells in the live excel document below and visually see how the values are computed. All of the decimals that are part of the formula are linear weight values, which you can find here.

We need to go through the same process to figure out what the expected value added is for a player on a bunt hit attempt—the average value added with a bunt times the probability of a successful bunt hit (BH%).

I was unable to find a linear weight value for a bunt hit, but we have a sufficient substitute. A bunt hit essentially adds the same value as a single with no runners on base—about .266 runs. Like a single with nobody on, a bunt hit offers no opportunity for a runner to score or to advance more than one base. Short of digging through box score data to find the average number of runners that scored after a successful bunt hit, which will need to be done for a more conclusive answer, we will use the average linear weight value of a single with no runners on across the three out states. (That is, I averaged the linear weight value of a single with no runners on and no outs, with one out, and with two outs. This isn’t exact, because a different number of bases-empty singles occurred in each out state, but it should be close.)

This is the formula for expected value gained on a bunt attempt (xBA):

  • Bunt Hit Average (bunt hits/total bunts)*.266 (our estimated linear weight value for a bunt hit)

Now that we’re able to come up with the expected value added for a player in a regular at bat (xRA) and the expected value added for a player on a bunt hit attempt (xBA), we can subtract the two values from each other—xRA minus xBA—to see which players have lost the most value per plate appearance by not bunting.
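Here is a bare-bones Python version of the two formulas above, with an entirely hypothetical stat line to show the calculation; it is a sketch of the method, not the live spreadsheet.

```python
# A minimal version of the xRA / xBA formulas above; the linear weights are the
# ones used in the formulas, and the stat inputs are raw season (or career) counts.
def x_regular_ab(singles, doubles, triples, hr, hbp, ibb, ubb,
                 pa, bunts, bunt_hits):
    """Expected value added per regular at bat (xRA)."""
    pa_no_bunts = pa - bunts
    weighted = ((singles - bunt_hits) * 0.474 + doubles * 0.764
                + triples * 1.063 + hr * 1.409 + hbp * 0.385
                + ibb * 0.102 + ubb * 0.33)
    positive = (singles - bunt_hits) + doubles + triples + hr + ibb + ubb + hbp
    return (weighted / pa_no_bunts) * (positive / pa_no_bunts)


def x_bunt_attempt(bunt_hits, bunts, bunt_hit_value=0.266):
    """Expected value added per bunt attempt (xBA): BH% times the bunt-hit weight."""
    return (bunt_hits / bunts) * bunt_hit_value


# Entirely hypothetical stat line, just to show the calculation
xra = x_regular_ab(singles=90, doubles=25, triples=2, hr=20, hbp=5,
                   ibb=8, ubb=50, pa=600, bunts=14, bunt_hits=11)
xba = x_bunt_attempt(bunt_hits=11, bunts=14)
print(xra, xba, xra - xba)  # Net Value: negative means bunting more adds value
```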

This chart shows the players, with a minimum of ten bunt attempts, that have lost the most value by not bunting (i.e. the players with the biggest difference between their expected value gained from a regular at bat and from a bunt hit attempt):

Click Here to See Chart with Results

Bunts: Bunt attempts

Bunt Hits: Hits on bunts

RA%: Chance that a positive offensive event occurs, outside of bunt hit

BA%: Chance that a player gets a hit on a bunt

xRA: Expected value added from a regular at bat

xBA: Expected value added from a bunt attempt

Net Value: xRA minus xBA

Implications

This research doesn’t mean to suggest that all players who have a higher expected value added on a bunt attempt than in a regular at bat should bunt every time. Carlos Santana gets a hit in 78% of the at bats where he bunts, but he has only attempted 14 bunts in his career, so we don’t have a large enough sample to know what his true average on bunt attempts would be; the same goes for most if not all of the players on this list. There is most likely an inverse relationship between BA% and bunt attempts (i.e. the more you try to bunt for a hit, the less likely you are to get a hit, as the infield plays farther in on the grass).

This research means to suggest that players have not reached the equilibrium for bunt attempts (i.e. they haven’t maximized their value). Players should increase the percentage of the time they bunt until their xRA and xBA are the same; at this point their value will be maximized. The more a player with a negative net value tries to bunt for a hit, the more expected value he will add. This will happen until his expected value added from a bunt falls beneath what he is able to achieve through a regular at bat; this happens when the defense starts to defend him more optimally, they align for the bunt hit, and his BH% falls. Once this occurs he will force the defense to play more honestly—the infielders will have to play farther in on the grass—and increase his expected value added in a regular at bat as more balls get past the infield from shallow play.

What’s interesting is that there are two different types of players on this list. The first is the type you would traditionally think of as a player who would try to bunt for a hit: the speedster with very little power. The second is the player who, as a result of the recent, extensive use of defensive shifts, has a high BA%—batting average on bunt hits—because the defense is not in a position to cover a bunt efficiently: Carlos Santana, Carlos Pena, Colby Rasmus, etc.

The question of why players don’t try to beat the shift with bunts down the third-base line has grown louder, but there still hasn’t been a good answer as to why it isn’t done more; the evidence seems to suggest that it is valuable and should be. I’m not able to confirm that the 11 hits Carlos Santana had on bunts came when the defense was in a shift, but I think it would be somewhat unreasonable to believe that he beat out a throw to first on a bunt hit attempt against a traditional alignment more than a few times.

Carlos Santana Spray Chart

Carlos Santana’s spray chart, taken from Bill Petti’s Spray Chart Tool

The image above is a spray chart of Carlos Santana’s ground-ball distribution as a left-handed hitter; the white dots are hits and the red dots are outs. It suggests that it is advantageous for teams to shift against Santana when he bats left-handed. I would argue that, given his success bunting for a hit (his high BH%), he should do it more often: the bunts themselves will generate value, and the added attempts will force the defense to soften its shift, which in turn increases the value he generates in regular at bats. Once he reaches the equilibrium, though, any further changes may ultimately be a zero-sum game.

There are no silver bullets for getting more runners on base, but there will always be more efficient, undervalued ways to pursue that goal. This research strongly suggests that bunting for a hit is underutilized, and once more work is done to tease out sac bunts from a player’s bunt hit attempts and calculate an accurate BH%, along with the generation of proper linear weight values for a bunt hit, we will have a more definitive answer for what a bunt hit is worth.

Devon Jordan is obsessed with statistical analysis, non-fiction literature, and electronic music. If you enjoyed reading about pitcher value in Fantasy Baseball, follow him on Twitter @devonjjordan.


Testing the Eye Test: Part 1

As long as I can remember, I’ve been a fan of good defense. Growing up my favorite player was Andy Van Slyke, and as a Braves fan I’ve had the privilege of rooting for defensive wizards such as Greg Maddux, Andruw Jones, and now Andrelton Simmons. Advanced defensive statistics are one of the things that drew me into sabermetrics and I spend entirely too much time obsessing over pitch framing.

Foremost among the new wave of statistics is UZR, Ultimate Zone Rating, which is the metric used to calculate the defensive portion of fWAR. In addition, FanGraphs also carries DRS and FSR, the Fans Scouting Report. While UZR is my preferred metric, I’ve always been intrigued by FSR. After all, I pride myself on my knowledge of the defensive ability of players on my favorite team, and it makes sense to me that there is a wide population that has a pretty good idea of the quality of Chris Johnson’s defense (namely, that it sucks but improved a lot in 2014).

I decided to take a look at the correlation between a player’s FSR and the components of his UZR (ARM, DPR, RngR, and ErrR, as well as total UZR). For this exercise, I pulled the defensive stats of every player who qualified (minimum of 900 innings) at a position from 2009-2014 (FSR data is only available for those six seasons on FanGraphs). I then disregarded catchers, as UZR does not cover the position. Likewise, pitchers are left out because they are not covered by UZR or FSR. That left me with 761 player-seasons across the other seven positions. Here are the correlations between FSR and UZR and its components for those seven positions:

Position | #   | ARM   | DPR   | RngR  | ErrR  | UZR
1B       | 118 | N/A   | 0.213 | 0.285 | 0.320 | 0.396
2B       | 117 | N/A   | 0.159 | 0.470 | 0.547 | 0.637
3B       | 107 | N/A   | 0.154 | 0.632 | 0.261 | 0.673
SS       | 130 | N/A   | 0.363 | 0.428 | 0.344 | 0.592
LF       | 71  | 0.510 | N/A   | 0.526 | 0.186 | 0.664
CF       | 115 | 0.237 | N/A   | 0.493 | 0.071 | 0.548
RF       | 103 | 0.214 | N/A   | 0.541 | 0.067 | 0.613

There’s a lot to look at there, but first let me draw your attention to one fact: at every position, total UZR has a higher correlation with FSR than any one of its components does. That’s a big plus for FSR, as it shows the fans don’t get so caught up in one area of a position that they ignore how it fits into the whole. It also runs counter to my expectations, as I expected the fans to strongly favor players who avoid making errors (as the Gold Glove voters seem to). Instead, the component with the strongest average correlation is range, with ARM (which is only calculated for outfielders) a distant second. Errors only beat out double-play runs, which is an indication of how far informed fans have moved away from using errors as the primary way to evaluate defense. Indeed, errors had the strongest correlation of any component at only two positions, 1B and 2B, and an extremely weak correlation with FSR in the outfield, with CF and RF featuring almost no relationship at all.

I was also struck by how strong the correlation between FSR and UZR was at every position. With the exception of 1B, every position’s correlation between the two metrics was above .5, with four of the seven positions above .6. The correlation between FSR and UZR was strongest at 3B, with LF a close runner up. 3B also features the strongest correlation between FSR and a component of UZR – in this case, RngR – and the smallest gap between UZR and one of its components. This finding surprised me, as I typically picture range as a CF tracking down a fly ball hit far over his head. Indeed, the average correlation between RngR and FSR is higher in the OF (0.520) than in the IF (0.454) despite the strength of the correlation at 3B.
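For what it’s worth, the correlation table above reduces to a short pandas calculation; the file and column names here are hypothetical stand-ins for the FanGraphs export.

```python
import pandas as pd

# One row per qualifying player-season, with a position column and the stat
# names used in the text. File and column names are placeholders.
df = pd.read_csv("qualified_defense_2009_2014.csv")
components = ["ARM", "DPR", "RngR", "ErrR", "UZR"]

# Correlation of each component with FSR, computed within each position.
# Components that don't apply to a position (e.g. ARM in the infield) come out NaN.
table = df.groupby("Pos").apply(lambda g: g[components].corrwith(g["FSR"]).round(3))
print(table)
```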

I was also surprised to see the strongest correlation between ARM and FSR in LF, not RF, which is typically known as the haven for strong arms. I have two theories to explain this incongruity: the first is that this is simply a small-sample quirk. The other is that selection bias in RF creates a situation where the gap between the strongest and weakest arms is too small to make a significant difference in the data. Indeed, the range between the highest ARM in RF (Jeff Francoeur’s 9.7 in 2010) and the lowest (Curtis Granderson’s -7.4 in 2014) was approximately 3 runs smaller than the difference in LF between Yoenis Cespedes’ 2014 (12.4) and Ryan Braun’s 2010 (-7.9).

Overall, this shows the strength of FSR. While it’s certainly not the same as UZR, the correlations are strongest between total UZR and FSR, and the components with the strongest correlations appear to generally be appropriate for the position. In Part 2, I will examine which components are over- or under-emphasized by FSR.


Hardball Retrospective – The “Original” 1992 Milwaukee Brewers

In “Hardball Retrospective: Evaluating Scouting and Development Outcomes for the Modern-Era Franchises”, I placed every ballplayer in the modern era (from 1901-present) on their original team. Accordingly, Ken Griffey, Jr. is listed on the Mariners roster for the duration of his career while the Marlins claim Miguel Cabrera and the Nationals declare Vladimir Guerrero. I calculated revised standings for every season based entirely on the performance of each team’s “original” players. I discuss every team’s “original” players and seasons at length along with organizational performance with respect to the Amateur Draft (or First-Year Player Draft), amateur free agent signings and other methods of player acquisition.  Season standings, WAR and Win Shares totals for the “original” teams are compared against the “actual” team results to assess each franchise’s scouting, development and general management skills.

Expanding on my research for the book, the following series of articles will reveal the finest single-season rosters for every Major League organization based on overall rankings in OWAR and OWS along with the general managers and scouting directors that constructed the teams. “Hardball Retrospective” is available in digital format on Amazon, Barnes and Noble, GooglePlay, iTunes and KoboBooks. Additional information and a discussion forum are available at TuataraSoftware.com.

Terminology 

OWAR – Wins Above Replacement for players on “original” teams

OWS – Win Shares for players on “original” teams

OPW% – Pythagorean Won-Loss record for the “original” teams

Assessment

The 1992 Milwaukee Brewers    OWAR: 48.2     OWS: 290     OPW%: .587

GM Harry Dalton acquired 85% (29 of 34) of the ballplayers on the 1992 Brewers roster. All of the team members were selected during the Amateur Draft with the exception of Frank DiPino and Dave Nilsson (signed as amateur free agents). Based on the revised standings the “Original” 1992 Brewers finished eight games ahead of the Yankees and secured the American League pennant.

Gary Sheffield (.330/33/100) paced the Brew Crew with 32 Win Shares, collected the batting crown and placed third in the MVP race. Paul “The Ignitor” Molitor nabbed 31 bags, drilled 36 doubles and delivered a .320 BA. Fleet-footed shortstop Pat Listach earned Rookie of the Year honors, swiping 54 bases and scoring 93 runs while batting .290 from the leadoff spot. Center fielder Robin Yount slashed 40 two-base hits in his penultimate campaign. Darryl Hamilton contributed a personal-best 41 stolen bases and posted a .298 BA.

Yount placed fourth behind Honus Wagner, Arky Vaughan and Cal Ripken Jr. in “The New Bill James Historical Baseball Abstract” for the best shortstop of All-Time. Molitor (3B – 8th), Greg Vaughn (LF – 68th) and B.J. Surhoff (LF – 97th) finished in the top 100 at their respective positions. 

LINEUP POS WAR WS
Pat Listach SS 4.67 22.88
Darryl Hamilton RF 3.55 18.71
Paul Molitor DH 4.87 28.44
Gary Sheffield 3B 5.92 32.28
Robin Yount CF 2.29 19.45
Greg Vaughn LF 1.7 14.43
B. J. Surhoff C 1.58 13.54
John Jaha 1B 0.31 2.63
Jim Gantner 2B -0.24 4.96
BENCH POS WAR WS
Mike Felder CF 0.93 10.14
Dion James RF 0.45 4.29
Glenn Braggs LF 0.31 6.9
Dave Nilsson C 0.27 5.19
Kevin Bass LF 0.26 10.84
Dale Sveum SS 0.08 3.61
Bill Spiers SS 0.05 0.58
Ernie Riles SS 0.03 1.34
Russ McGinnis C -0.11 0.71
Bert Heffernan C -0.15 0.06
Randy Ready DH -0.21 2.92
Tim McIntosh C -0.66 0.62

Bill Wegman compiled a 1.169 WHIP while supporting a workload of 261.2 innings. Jaime Navarro topped the pitching staff with 17 victories and an ERA of 3.33. Chris Bosio (16-6, 3.62) fashioned a 1.154 WHIP. Rookie right-hander Cal Eldred notched an 11-2 record with a 1.79 ERA and a 0.987 WHIP subsequent to a promotion from the Minor Leagues in mid-July.

Doug Jones (11-8, 1.85) rebounded from an off-year in ’91, posting 36 saves and leading the AL with 70 games finished in 80 relief appearances. Jeff Parrett (9-1, 3.02) and Dan Plesac (5-4, 3.68) held opponents at bay.

ROTATION POS WAR WS
Bill Wegman SP 3.73 15.72
Jaime Navarro SP 3.47 15.6
Chris Bosio SP 2.41 13.26
Cal Eldred SP 3.76 11.58
Mike Birkbeck SP -0.3 0
BULLPEN POS WAR WS
Doug Jones RP 2.6 17.59
Dan Plesac RP 0.92 5.9
Jeff Parrett RP 0.91 8.43
Frank DiPino RP 0.31 1.29
Brian Drahman RP 0 0.58
Doug Henry RP -0.71 5.72
Tim Crews RP -1.11 0.05
Chuck Crim RP -1.52 2.41

 The “Original” 1992 Milwaukee Brewers roster

NAME POS WAR WS General Manager Scouting Director
Gary Sheffield 3B 5.92 32.28 Harry Dalton Dan Duquette
Paul Molitor DH 4.87 28.44 Jim Baumer Dee Fondy / Al Widmar
Pat Listach SS 4.67 22.88 Harry Dalton Dick Foster
Cal Eldred SP 3.76 11.58 Harry Dalton Dick Foster
Bill Wegman SP 3.73 15.72 Harry Dalton Ray Poitevint
Darryl Hamilton RF 3.55 18.71 Harry Dalton Dan Duquette
Jaime Navarro SP 3.47 15.6 Harry Dalton Dan Duquette
Doug Jones RP 2.6 17.59 Harry Dalton Ray Poitevint
Chris Bosio SP 2.41 13.26 Harry Dalton Ray Poitevint
Robin Yount CF 2.29 19.45 Jim Wilson Jim Baumer
Greg Vaughn LF 1.7 14.43 Harry Dalton Dan Duquette
B. J. Surhoff C 1.58 13.54 Harry Dalton Ray Poitevint
Mike Felder CF 0.93 10.14 Harry Dalton Ray Poitevint
Dan Plesac RP 0.92 5.9 Harry Dalton Ray Poitevint
Jeff Parrett RP 0.91 8.43 Harry Dalton Ray Poitevint
Dion James RF 0.45 4.29 Harry Dalton Ray Poitevint
Frank DiPino RP 0.31 1.29 Jim Baumer Dee Fondy / Al Widmar
Glenn Braggs LF 0.31 6.9 Harry Dalton Ray Poitevint
John Jaha 1B 0.31 2.63 Harry Dalton Ray Poitevint
Dave Nilsson C 0.27 5.19 Harry Dalton Dan Duquette
Kevin Bass LF 0.26 10.84 Jim Baumer Dee Fondy / Al Widmar
Dale Sveum SS 0.08 3.61 Harry Dalton Ray Poitevint
Bill Spiers SS 0.05 0.58 Harry Dalton Dan Duquette
Ernie Riles SS 0.03 1.34 Harry Dalton Ray Poitevint
Brian Drahman RP 0 0.58 Harry Dalton Dan Duquette
Russ McGinnis C -0.11 0.71 Harry Dalton Ray Poitevint
Bert Heffernan C -0.15 0.06 Harry Dalton Dick Foster
Randy Ready DH -0.21 2.92 Harry Dalton Ray Poitevint
Jim Gantner 2B -0.24 4.96 Jim Wilson Jim Baumer
Mike Birkbeck SP -0.3 0 Harry Dalton Ray Poitevint
Tim McIntosh C -0.66 0.62 Harry Dalton Dan Duquette
Doug Henry RP -0.71 5.72 Harry Dalton Ray Poitevint
Tim Crews RP -1.11 0.05 Harry Dalton Ray Poitevint
Chuck Crim RP -1.52 2.41 Harry Dalton Ray Poitevint

Honorable Mention

The “Original” 1987 Brewers        OWAR: 46.1     OWS: 258     OPW%: .555

Milwaukee rallied to a 90-72 record and finished seven games ahead of Detroit to achieve its first pennant. Paul Molitor (.353/16/75) sparked the Brewers’ offense with a League-leading 41 doubles and 114 runs scored. He pilfered 45 stolen bases and placed fifth in the A.L. MVP balloting. Teddy Higuera whiffed 240 batsmen and registered an 18-10 mark in the course of a four-year run in which he averaged 17 wins, a 3.25 ERA and 192 strikeouts per season. Robin Yount (.312/21/103) tallied 99 runs and 198 base knocks.

On Deck

The “Original” 1999 Rangers

References and Resources 

Baseball America – Executive Database

Baseball-Reference

James, Bill. The New Bill James Historical Baseball Abstract. New York, NY.: The Free Press, 2001. Print.

James, Bill, with Jim Henzler. Win Shares. Morton Grove, Ill.: STATS, 2002. Print.

Retrosheet – Transactions Database

Seamheads – Baseball Gauge

Sean Lahman Baseball Archive


Can Past Calendar Year Stats be Trusted?

To me, the first few weeks of baseball each year are small sample size season. It seems that every article is either a) drawing wildly irresponsible conclusions based on a few dozen plate appearances or innings (either with or without the routine “This is a small sample, but…” disclaimer), or b) showing why those claims are wildly irresponsible and not very useful. This is how we get articles comparing Charlie Blackmon and Mike Trout. It gets a little repetitive, but writing this in March, when the closest thing to real baseball I can experience is play-by-play tweeting of a spring training game, it honestly sounds lovely.

Fairly often in those early articles, I see analyses that use past calendar year stats, which incorporate the first x games of the current season and the last 162-x games of the previous season. The idea is to rely on more than a few games of evidence, but still incorporate hot first months in some way. I’m always conflicted about how much trust to put in those stats and the resulting conclusions.

On the one hand, they have a reasonable sample size, and aren’t drawing any crazy conclusions off a few good games. Including a large portion of the prior season limits the effect a first month can have on the results, which is probably a good thing. On the other hand, it seems like a lot of changes could be made in the offseason, and those changes could have major effects on a player’s performance basically immediately. If that were the case, stat lines that treated game 1 of 2014 as following game 162 of year 2013 in the same way game 162 of 2013 followed game 161 of 2013 would not be presenting an accurate picture of skill.

Consider the case of Brandon McCarthy, who made a lot of changes to his offseason training regimen between the 2013 and 2014 seasons (detailed in this Eno Sarris article). He went on to record his healthiest season to date in 2014, hitting 200 innings exactly with the second-best WAR (3.0) and best xFIP (2.87) of his career. Combining his results from September/October 2013 (42.0 IP, 7.6% K-BB%, 3.74 xFIP) and March/April 2014 (37.1 IP, 15.2% K-BB%, 2.89 xFIP) would not give an accurate sense of McCarthy going into 2014. But is he the exception, or the rule?

To test this, I looked at the correlations between players’ stats in the first and second halves of 2014, and compared that to the correlation between their stats in the second half of 2013 and the first half of 2014. I expect the six-month discontinuity in the second case to make the correlations weaker, but by how much? If it’s a lot, that’s a sign that analysis relying on stats from the last calendar year probably shouldn’t be trusted; if it’s not, then incorporating the last few months of the previous season to boost sample size is more likely to be a good idea. I also looked at the correlations between stats in 2013 and 2014, to provide a sort of baseline for how predictable each statistic is from season-to-season.

I tried to choose stats that reflect primarily the skill of each player, but that they can control to some extent. Hopefully these are stats that won’t change due to a player switching teams, but might if he changes his approach. I settled on BB%, K%, ISO, and BsR for batters, and BB%, K%, GB%, and HR% for pitchers. Those look reasonable to me, but I’d welcome any suggestions.
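Here is a sketch of how the half-to-half comparison could be computed for the four hitter stats above; the file names are placeholders, and each file is assumed to hold one row per player who cleared the playing-time minimum described below.

```python
import pandas as pd

# Hypothetical half-season exports, one row per qualifying player
h2_2013 = pd.read_csv("hitters_2013_2h.csv", index_col="Name")
h1_2014 = pd.read_csv("hitters_2014_1h.csv", index_col="Name")
h2_2014 = pd.read_csv("hitters_2014_2h.csv", index_col="Name")

def r_squared(a, b, stat):
    """Linear R^2 of a stat across the players present in both samples."""
    merged = a[[stat]].join(b[[stat]], how="inner", lsuffix="_a", rsuffix="_b")
    return merged.corr().iloc[0, 1] ** 2

for stat in ["BB%", "K%", "ISO", "BsR"]:
    print(stat,
          round(r_squared(h2_2013, h1_2014, stat), 3),   # cross-year: 2H '13 vs 1H '14
          round(r_squared(h1_2014, h2_2014, stat), 3))   # in-year: 1H '14 vs 2H '14
```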

I set a minimum of 400 PAs or 160 IP for the full-year samples, and 200 PAs or 80 IP for the half-year samples, and looked at all the players that showed up in both of the time frames being compared. I’m going to look at position players first, then starters. In the following table, the value in each cell is the linear R2 of the stats in the two time periods, except in the last row, which shows the number of players in the sample. I bolded the stronger of the two half vs. half correlations.

Stat | 2nd Half ’13 v. 1st Half ’14 | 1st Half ’14 v. 2nd Half ’14 | Full ’13 v. Full ’14
BB% | .552 | .481 | .608
K% | .672 | .661 | .771
ISO | .572 | .519 | .654
BsR | .565 | .849 | .605
n | 140 | 138 | 142

So these are some seriously unintuitive results, to the point that I went back and triple-checked the data, but it’s accurate. BB%, K%, and ISO all tracked better from player to player from the second half of 2013 to the first half of 2014 than they did from the first half of 2014 to the second half of 2014. Of the four selected stats, only BsR had a stronger correlation inside 2014, but it was odd in its own way, as it was also the only stat for which the full year correlation wasn’t the strongest.

What could explain this? First, it’s possible that this is just randomness, and if we looked at this over a larger sample, the in-year correlations would tend to be stronger. But even if that’s the case, the fact that randomness can make the cross-year correlations stronger (as opposed to just making the lead of the in-year correlations larger) suggests that the difference between the two is relatively small. One possible explanation is survivor bias – perhaps players that get a lot worse between the first and second halves are still likely to see playing time until the end of the season, while players who get substantially worse between seasons might be benched in the first month or two and not get to the 200 PA/80 IP minimum. There’s no doubt that there is survivor bias in this sample, but I’m not convinced by that explanation. Settling on randomness always feels half-hearted, but I really have no idea what else it could be. If anyone has any thoughts, post them in the comments!

The table for the pitchers is set up in the same way.

Stat | 2nd Half ’13 v. 1st Half ’14 | 1st Half ’14 v. 2nd Half ’14 | Full ’13 v. Full ’14
BB% | .533 | .663 | .738
K% | .489 | .844 | .723
GB% | .742 | .799 | .779
HR% | .243 | .213 | .357
n | 38 | 45 | 47

This looks a lot more like I expected. Three of the four stats are more strongly correlated in season than between seasons, and the exception (HR%) also has the smallest gap between the two correlations, making me inclined to chalk that up to random variation. Interestingly, the gap between the season-to-season correlations and the half-to-half correlations is relatively small (again with the exception of HR%), which fits with my perception of BB%, K%, and GB% as stats that stabilize relatively quickly.

It also doesn’t surprise me that pitchers are less predictable than hitters from the second half of one season to the first half of the other, relative to their in-season predictability. Intuitively, pitchers seem to have a lot more control over their approach, and a much greater ability to shift significantly in the offseason by adding a new pitch, changing a grip, or just getting healthy for the first time in a while. Hitters, on the other hand, seem like they have less ability to change their approach drastically. Even when they can make a change, it’s not necessarily the sort of thing that has to happen in the offseason; if a hitter wants to be more aggressive, he can just decide to be more aggressive, whereas a pitcher looking to throw more strikes is probably going to have to work at that. If true, hitter changes would happen throughout the season and offseason, while pitcher changes would be clustered in the offseason. These correlations don’t provide nearly enough evidence to conclude that’s true, but they do fit these perceptions, which is encouraging.

Overall, this suggests that while going back to last season to get a year’s worth of PAs for a hitter might be a good way to beef up your sample size, it’s probably not as good an idea for a pitcher, and it’s also less necessary. After the first few starts, most starters have thrown enough innings that the interesting metrics – BB%, K%, Zone%, etc. – are more signal than noise, and not a lot is added by going back to the previous season. This analysis also suggests that adding old stats may even reduce accuracy, by ignoring the potentially significant changes pitchers make in the offseason. So the next time you read about a starter’s performance in his last 30 starts, stretching back to May 2014, beware! Or at least be skeptical.


The Fans Projections: Pitchers

Previously, I looked at the difference between the Fans projections and the Depth Charts projections for hitters. Now let’s look at the pitchers.

As with the hitters, the Fans projections are much more optimistic than the Depth Charts. Using the raw projections, the following table shows how much more optimistic the Fans are for pitchers. One note: because of the big difference in playing time and role for Tanner Roark, I eliminated him from consideration. The Depth Charts have Roark primarily pitching in relief (61 G, 6 GS, 95 IP), while the Fans have him as a starter (34 G, 29 GS, 184 IP).

For the average starting pitcher, the Fans are projecting eight more innings pitched, a much better ERA and WHIP, slightly more strikeouts, and 0.5 more WAR. Relievers have closer projections from the Fans and the Depth Charts, with the Fans projecting just 0.1 more WAR for the average reliever. For the entire set of pitchers, the Fans are projecting a 3.47 ERA and 1.21 WHIP, while the Depth Charts check in at 3.71 and 1.25.

The overall totals are useful to get a big-picture view, but the distribution of WAR can also be interesting. The graph below shows the difference between the Fans projected WAR and the Depth Charts projected WAR for starting pitchers in increments of WAR from -0.8 to 1.9. The players on the left are projected by the Fans for less WAR than the Depth Charts are projecting. The players on the right are projected for more WAR by the Fans. The black line is at a difference of 0.0 WAR.

The Fans projected more WAR for 80% of the hitters (previous article). It’s even more extreme for the pitchers: 83% of the pitchers are projected by the Fans for more WAR than the Depth Charts are projecting. The pie chart below shows this breakdown.
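The breakdown behind that chart is a quick calculation once the two projection sets are merged; the file and column names below are hypothetical.

```python
import pandas as pd

# Merged projection frame with hypothetical columns "Fans_WAR" and "DC_WAR",
# one row per starting pitcher.
proj = pd.read_csv("sp_projections_2015.csv")
diff = proj["Fans_WAR"] - proj["DC_WAR"]

print(diff.mean())        # average WAR gap (reported above as ~0.5 per starter)
print((diff > 0).mean())  # share of pitchers the Fans like better (~83%)
```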

Again, that’s the big picture for starting pitchers. The individual pitchers at the extremes might be interesting to look at, so we’ll start with the nine pitchers with the biggest NEGATIVE difference between their Fans projection and their Depth Charts projection.

There are some good pitchers on this list. Given that the Fans project 83% of pitchers to have a higher WAR than the Depth Charts are projecting, it’s surprising to see Max Scherzer, Jose Fernandez, Hisashi Iwakuma, and Francisco Liriano on this list of starting pitchers the Fans like the least. In most cases, the Fans are projecting a better ERA than FIP and, because FanGraphs WAR is FIP-based, this explains some of the difference in WAR. Also, the Fans are projecting significantly fewer innings than the Depth Charts for some of these pitchers, which reduces their WAR.

Notes:

  • The Fans like Masahiro Tanaka to perform well when he’s on the mound (3.14 FIP versus 3.37 FIP projected by the Depth Charts) but project him for 46 fewer innings than the Depth Charts are projecting, which accounts for his lower projected WAR.
  • Max Scherzer is projected by the Fans for five more innings than the Depth Charts, but with a higher FIP, at 2.96 to 2.78, and a higher BABIP that results in a higher WHIP (1.12 to 1.07). He’s still the eighth-best starting pitcher per the Fans, while the Depth Charts have him at #5 among starters in WAR.
  • Francisco Liriano is similar to Scherzer in that the Fans agree with the Depth Charts on innings pitched but project Liriano for a higher FIP (3.54 to 3.29).
  • Jose Fernandez and CC Sabathia are projected for fewer innings by the Fans than by the Depth Charts. In the case of Sabathia, the deficit is 53 innings. As for production, the Fans and Depth Charts projections are quite similar for CC: 4.18 ERA, 3.93 FIP, 1.26 WHIP for the Fans; 4.12 ERA, 3.96 FIP, 1.26 WHIP for the Depth Charts.
  • After dropping his BB/9 to a microscopic 0.7 in 2014, the Fans and the Depth Charts both see regression in this area for Phil Hughes. The Fans expect Hughes’ BB/9 to more than double, from 0.7 to 1.6, while the Depth Charts expect Hughes to hold on to some of those gains, projecting a 1.2 BB/9. Hughes is projected for a very similar number of innings, but his higher FIP projection by the Fans results in a -0.6 WAR difference between the two sources.
  • For John Lackey, the main difference in his Fans projection and his Depth Chart projection is a higher BB/9 (2.3 to 2.0). His career mark is 2.6, but he’s been better than that in each of the last two years (1.9 BB/9 in 2013, 2.1 BB/9 in 2014).
  • Finally, the Fans are most pessimistic about James Shields, despite his move to Petco Park, also known as Pitchers Paradise (by me; I just made that name up, you’re welcome). The Depth Charts project Shields for a higher strikeout rate (8.1 K/9 to 7.6 K/9) and for fewer fly balls leaving the yard (0.7 HR/9 to 0.9 HR/9). The result is a difference in FIP in favor of the Depth Charts (3.23 to 3.47) and a 0.8 difference in WAR. Among starting pitchers, Shields is ranked 16th in WAR by the Depth Charts but 56th by the Fans.

 

On the other end of the spectrum, there are 21 starting pitchers for whom the Fans project a WAR that is at least 1.0 greater than the Depth Charts are projecting, with Jesse Hahn leading the way with a difference of 1.9 WAR. Here are the top 10 starting pitchers based on the greatest difference in their Fans projection and their Depth Charts projection.

All ten of these pitchers are projected for more innings and a better FIP by the Fans than by the Depth Charts. Many of them are young or have only a year or two of major league experience.

Notes:

  • The leader in the clubhouse is Jesse Hahn, projected for 1.9 more WAR by the Fans than by the Depth Charts. Their innings projections are close, a difference of just 10, but the Fans expect Hahn to post a 3.27 FIP compared to the 4.25 mark projected by the Depth Charts. The Fans project a better strikeout rate, better walk rate, and lower home run rate. In 163 1/3 minor league innings, Jesse Hahn struck out 8.8 batters per nine innings. His K/9 was 8.6 in 73 1/3 major league innings last year. The Fans are projecting more of the same, with a K/9 of 8.5. The Depth Charts see major regression, pegging Hahn for a 6.7 K/9.
  • Cliff Lee is projected for 180 innings by the Fans and 106 by the Depth Charts. It looks like there’s a good chance he won’t achieve either mark in 2015.
  • Drew Smyly is projected for significantly more innings by the Fans, along with a better FIP (3.41 to 3.63).
  • The Mookie Betts of starting pitchers? That would be Carlos Carrasco. Carrasco’s projection for 4.2 WAR by the Fans ranks him 9th among all SPs, while the Depth Charts have him 28th. The Fans project Carrasco for a much higher K/9 (9.3 to 8.5) and a lower walk rate (2.2 BB/9 versus 2.6 BB/9), along with 29 more innings pitched.
  • If you compare the strikeout and walk rate projections by the Fans for Nathan Eovaldi to his career strikeout and walk rates, it’s easy to see that they are quite optimistic for Eovaldi in 2015. The Fans project Eovaldi for a 7.0 K/9 versus a 6.3 career K/9 and a 2.3 BB/9 versus a career 2.9 mark. That gives him a forecasted 3.58 FIP. The Depth Charts have him with a 4.25 FIP and less than half as much WAR.
  • Jake Odorizzi had a strong 2014 season and the Fans are optimistic that he can do it again in 2015.
  • Jacob deGrom had a big jump in his strikeout rate after moving up to the big leagues last year. In 323 1/3 minor league innings, deGrom has a career 7.4 K/9, and his best single-season mark in the minors was 7.8 K/9 in 2012. Then he came up to the major leagues last year and struck out 9.2 batters per nine in 140 1/3 innings. The Fans are projecting deGrom for a K/9 of 8.8, while the Depth Charts have him down at 8.2. The difference in FIP is 3.03 for the Fans and 3.40 for the Depth Charts, which produces an overall WAR difference of 1.3. deGrom is ranked 58th among starting pitchers in WAR based on his Depth Charts projection but 32nd based on the Fans projection.
  • Both the Fans and the Depth Charts like Jordan Zimmermann quite a bit. The Depth Charts projection has Zimmermann with 3.4 WAR, which ranks him 12th among starting pitchers. The Fans’ projected WAR of 4.7 moves Zimmermann up to 6th.

We’ll wrap this up with a look at the individual relief pitchers at the extremes. First are the relief pitchers who are projected for much less WAR by the Fans than by the Depth Charts.

Surprisingly, this list includes the three relievers who might be the top three closers in baseball: Aroldis Chapman, Craig Kimbrel, and Greg Holland.

  • Aroldis Chapman is projected by the Fans for a 2.15 FIP, but even that can’t compare to the 1.84 FIP projected by the Depth Charts. The Fans also project Chapman for a 13.4 K/9 versus the 15.9 K/9 projected by the Depth Charts. The end result is a difference of 1.1 WAR, with the Depth Charts placing Chapman 1st in WAR among relief pitchers and the Fans placing him 9th.
  • When it comes to Greg Holland, Craig Kimbrel, and Koji Uehara, I’m not sure what is going on with their WAR projections from the Fans. All three are projected by the Fans for better FIPs and similar innings pitched, yet less WAR, than by the Depth Charts.
  • Jake McGee’s lower WAR projection is in part due to eight fewer innings being projected by the Fans.

Finally, we’ll look at the relievers with the most favorable difference in WAR based on the Fans projections versus the Depth Charts projections.

  • Aaron Sanchez is very popular this spring. He was terrific in 33 innings out of the bullpen last year (1.09 ERA, 2.80 FIP). The Fans project him for a 3.36 FIP in 129 innings (43 games, 16 starts), while the Depth Charts are not so optimistic, with a 4.53 projected FIP in 111 innings (56 games, 11 starts). This combination of better pitching in more innings results in a difference of 1.9 WAR, tops among all relief pitchers.
  • Yusmeiro Petit is similar to Sanchez, projected to have a better FIP (3.07 to 3.30) and more innings pitched (131 to 92) by the Fans.
  • A couple of Seattle Mariners pitchers, Dominic Leone and Danny Farquhar, make this list based on their projection for many more innings by the Fans versus the Depth Charts. This is also true for Chase Whitley, Tony Watson, Jake Diekman, and Justin Wilson.
  • The Fans projection for Jeurys Familia is closer to the Depth Charts projection for innings pitched (61 to 55), but the Fans project Familia to have a 2.97 FIP versus a 3.50 FIP projected by the Depth Charts.

 


Robinson Cano’s Replacement-Level Floor

Robinson Cano’s power vanished in 2014 without a clear explanation.  Most believe that he will be valuable even if the power does not return.  I think Cano’s risk going forward is greater than meets the eye.

After sporting an ISO of at least .199 every year from 2009 to 2013, Cano posted a mark of .139 in 2014.  There is reason to believe that this power outage is permanent.  Robinson Cano was a different kind of hitter in 2014.  His ground ball percentage was 53% (up from 44% in 2013), and his average HR/FB distance plummeted from 292 feet to 278.  Cano was mostly incapable of hitting fly balls to his pull side, which is where his home-run power used to be, despite swinging at more pitches middle-in.  Cano’s aging bat may be unable to turn on major-league pitching the way it used to.  As noted elsewhere, Cano’s 2014 power numbers had little to do with the move from Yankee Stadium to Safeco.  His problem was that he hit the ball in the air less frequently, with less authority, and to the wrong side of the ballpark.

Aging may have played a role, but it is unusual for an elite slugger’s power to disappear at age 31 without something else going on.  Perhaps Cano was dealing with an injury.  Perhaps his amazing run from 2009-2013 was fueled by PEDs.  We don’t know.  But consider the similarities between Cano’s pre-elite 2008 line and his line from last year:

Year BB% K% ISO BABIP WAR
2008 3.6% 10.3% .139 .283 0.1
2014 6.1% 10.2% .139 .335 5.2

It’s easy to forget that Cano was a replacement level second baseman in 2008.  BABIP (along with the changing run environment) is mostly what separates his 2008 replacement level performance from the five-win version of Cano we saw in 2014.  The stability of last year’s BABIP may be the key to Cano’s value going forward—a terrifying thought for the Mariners, who presumably did not intend to invest $240 million in the vagaries of BABIP.
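For reference, the two rate stats doing most of the work in that table come straight from the standard counting stats. Here is a minimal sketch using the usual formulas and a made-up stat line, not Cano’s actual numbers.

```python
def iso(doubles: int, triples: int, hr: int, ab: int) -> float:
    """Isolated power: extra bases per at-bat (equivalent to SLG minus AVG)."""
    return (doubles + 2 * triples + 3 * hr) / ab

def babip(h: int, hr: int, ab: int, k: int, sf: int) -> float:
    """Batting average on balls in play: non-HR hits over balls put in play."""
    return (h - hr) / (ab - k - hr + sf)

# Made-up line: 600 AB, 170 H, 35 2B, 1 3B, 14 HR, 68 K, 5 SF.
print(round(iso(35, 1, 14, 600), 3))         # ~.132
print(round(babip(170, 14, 600, 68, 5), 3))  # ~.298
```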

There is conflicting data on what to expect from Cano’s balls in play in 2015.  For example, ZiPS predicts .323—not so bad.  Jeff Zimmerman’s xBABIP formula predicts .299—much closer to the 2008 disaster scenario.  Neither of these predictions fully accounts for shifts, and Cano’s performance against them in 2014 is concerning.  His BABIP was .388 against the shift and .303 without it.  This is disconcerting because Cano displayed no such shift-beating prowess before last year, and his 2014 spray chart suggests no change in his approach that would justify any BABIP spike.  To the contrary, last year Cano hit an alarming number of grounders to the right side of the infield, which should have favored the shifted infield defenses.  It appears that Cano got lucky—perhaps very lucky—with his 2014 balls in play.  My money is on something closer to the xBABIP prediction for 2015.

Cano went from an elite slugger to a BABIP-fueled slap hitter in a short period of time.  His 2014 output was akin to an early-career Ichiro, except unlike Ichiro, we lack assurances that Cano will maintain the high BABIP.  If the power is truly gone and the BABIP craters, he’s toast—or at least something closer to league average.  The risk of collapse is higher than most want to believe, if for no other reason than this same risk was once realized by the same player.


Making the Case Against Baseball in Montreal

Through a lot of backroom deals and schemes, which are beautifully illustrated in Jonah Keri’s Up, Up, and Away, Mayor Jean Drapeau was finally able to get Montreal, and Canada, a professional baseball team. The Expos were the first major league franchise located outside of the US. They were part of Major League Baseball from 1969 to 2004; after the 2004 season, they relocated to Washington and became the Washington Nationals.

Throughout most of its history, baseball in Montreal was a struggle, not just on the field but also off it. In fact, just getting a suitable stadium for the team was a headache. The Expos had to play their first seven years in a Triple-A ballpark called Jarry Park, which could only seat 28,500 people. The stadium was less than ideal: it wasn’t a dome, and due to Montreal’s cold weather, many games in April and September had to be played on the road.

In 1977, however, the Expos finally got a new stadium: Olympic Stadium. The unfortunate part for the Expos was that it was designed primarily for the Olympics, not for baseball. In fact the stadium, while a dome, was a disaster, not just as a facility but in its location: it was completely out of the way, far from downtown. Charles Bronfman, owner and majority shareholder, often tried to get a new stadium built in downtown Montreal, but was never successful. This was probably one of the most significant impediments to the Expos’ success as a franchise.

The Expos were often poor on the field, but more importantly, they were poor as a business, generating very little revenue compared to other major league franchises. They also always seemed to be rebuilding, never able to sign valuable free agents and never carrying a high payroll. Their attendance wasn’t exactly great either.

What now follows is an evaluation of the Expos’ historical value as a franchise. The problems? There are several. The first, and perhaps the most important to remember, is that teams are privately owned and therefore not obliged to disclose any of their financial information. This makes evaluating a team’s overall value very difficult, but not impossible.

Most of you are probably familiar with Forbes. The problem, however, is that I was only able to find Forbes data from 1990 to 2014. I was also only able to find payroll data from 1985 onward, leaving me essentially only attendance to look at from 1969 to 2004. Attendance, let me make this clear, is not the best way of measuring a franchise’s value, but since it’s the only data source I could find before 1985, I thought I’d use it. So, below is a chart comparing the Expos’ attendance history to the league average.

[Chart: Expos attendance vs. league average, 1969-2004]

For most of its history, the Expos’ attendance was below average. A couple of other elements are worth noting. The 1981 season was shortened by a labor stoppage, which is why you see the league-wide drop in attendance that year. And in 1998, while league attendance was starting to rise, the Expos’ attendance dropped dramatically. Perhaps this had something to do with the trade of Pedro Martinez to the Red Sox in the 1997 offseason, perhaps with the franchise rebuilding yet again, or perhaps with lingering frustration from the 1994 season. None of this is certain; what is certain is that after 1996, Expos fans stopped showing up.

The goal, though, is not to gain a sense of attendance, but rather to get a sense of the franchise’s value, and for that purpose attendance has a number of shortcomings. It doesn’t tell us anything about overall expenses, revenues, ticket sales, TV deals, income, etc. Rather, what it gives us is a sense of the fans’ interest in the team (though not entirely, as it doesn’t consider TV ratings). While there seems to have been significant interest in the team in the early-to-mid ’80s, the overall interest in the team tends to have been very minimal.

As I’ve mentioned, because teams are privately owned enterprises, I had to rely on Forbes’ valuations, which are only available from 1990 onward. This will skew the data. For example, 1979 to 1990 was the Expos’ most successful era: during that stretch they had only two losing seasons, and it included their first and only playoff berth, in 1981.

That being said, a team’s success on the field does not always translate to value. We should therefore not assume that, because the Expos had good teams from 1979 to 1990, the franchise’s value rose significantly, if at all. Just take a look at the Rays and the A’s: both teams have won a lot of games over the last few years, and yet Forbes still ranks them among the lowest teams in value.

Many of you might also be wondering what goes into Forbes’ valuation process and how accurate it is. These are valuable questions and concerns. While there isn’t a ton of information out there on these issues, John Beamer did write an article for The Hardball Times in 2007 that looks at how accurate Forbes’ valuations are and what goes into them. If you’re too lazy to read it, then just understand this: “The variance between the purchase price and the Forbes’ valuation averaged 20%…” and “The primary axis of valuation is team revenue, which includes things such as ticket sales, TV money, sponsorship, revenue sharing, concessions, parking and a myriad of other schemes that franchises use to wheedle money from their fans”.

In determining value, Beamer looked at “recent deals” ranging from 1992 to 2006, only two of which (the Brewers and the Nationals) came after 2004. Considering that most of the data we will be looking at is from 1990 to 2004, Beamer’s valuations should not be considered outdated.

Since revenue is the primary input to Forbes’ valuations, that’s where we’ll go next. Below is a chart comparing Montreal’s revenue from 1990 to 2004 to the league average. One element to note: the 2002 revenue data was not available, which is why you will notice a break in the graph.

[Chart: Expos revenue vs. league average, 1990-2004]

As you can probably tell, Montreal was always below average when it came to revenue, and the gap seemed to grow wider and wider as the years went on. It is also very disappointing that the 2002 data point was not available; there seems to have been some kind of break or shift that year, which would have been interesting to look at.

Even though revenue is the major contributor to value, Beamer’s article also states that “Major League Baseball franchises are typically valued at somewhere between 2-3x revenues”. To see the evidence for this, again, read John Beamer’s article.
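To make that rule of thumb concrete, here’s a tiny worked example with a made-up revenue figure (not the Expos’ actual revenue):

```python
# Hypothetical club revenue, in millions of dollars.
revenue = 70

# Beamer's rule of thumb: franchises are typically valued at 2-3x revenues.
low, high = 2 * revenue, 3 * revenue
print(f"Implied franchise value: ${low}M to ${high}M")  # $140M to $210M
```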

So now let’s get to the moment you’ve all been waiting for: the Expos’ franchise value compared to the league average. I also included the median in the chart below. Why? In order to guard against teams that skew the data too heavily one way or another, such as the Yankees, the median seemed like a useful tool to add, although as you will be able to tell, there wasn’t a significant difference between the median and the average.

[Chart: Expos franchise value vs. league average and median, 1990-2004]

A lot of you might notice the sudden increase in the Expos’ value in 2004. Well, the Forbes valuations for 2004 came after the 2004 season, by which point the franchise was officially set to become the Washington Nationals, which immediately increased the team’s overall value.

Some of you at this point might be wondering how value can increase so significantly. To understand what this means, I recommend reading John Beamer’s The Value of Ball Clubs (Part 1) and going to the Valuation 101 section. If you don’t want to do that, then I’ll just summarize the concept. Basically, what one is trying to do in valuing any type of business is work out what it is worth today based on the amount of cash flow the business, or team, will provide its owners in the future.
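In other words, it’s a discounted-cash-flow idea: a dollar of future cash flow is worth less than a dollar today, so you discount each year’s expected cash flow back to the present and add them up. The sketch below uses made-up numbers and a made-up discount rate; nothing here comes from Beamer’s actual model.

```python
def present_value(cash_flows, discount_rate):
    """Sum of future cash flows, each discounted back to today."""
    return sum(cf / (1 + discount_rate) ** t
               for t, cf in enumerate(cash_flows, start=1))

# Made-up example: $10M of owner cash flow per year for 20 years at an 8% discount rate.
flows = [10.0] * 20
print(round(present_value(flows, 0.08), 1))  # ~98.2 ($ millions)
```

The intuition for the 2004 jump, then, is that the expected future cash flows of a Washington-based franchise presumably looked much larger than those of a Montreal-based one.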

OK, now that you’ve got that, let’s look at one final chart, I promise! Here we’ll look at the Expos’ overall franchise value beginning in 1990, but we’ll also include the Nationals’ value through 2011, in order to see how the move to Washington has paid off.

[Chart: Expos/Nationals franchise value, 1990-2011]

Now look at that huge increase in team value. Basically, what Major League Baseball did was turn one of its least profitable teams into an above-average one. In fact, from 2003 to 2004 the team’s value changed by 114%, by far the biggest one-year change in value for any franchise. The next-highest one-year percentage change in 2004 was the Phillies at 39%. In fact, in all the years since Forbes has made its data available, I have never found a one-year percentage change in value as high as this one.
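For clarity, that 114% figure is just the standard percentage-change calculation. The values below are purely illustrative placeholders chosen to reproduce a jump of about 114%, not numbers taken from the Forbes tables.

```python
def pct_change(old: float, new: float) -> float:
    """One-year percentage change in franchise value."""
    return (new - old) / old * 100

# Illustrative values (in $ millions) chosen to reproduce a ~114% jump.
print(round(pct_change(145, 310)))  # 114
```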

This looks like pretty damning evidence against the Expos franchise, and it is. Montreal’s first crack at a major league franchise was not a successful one. That does not mean it wasn’t important, however: Montreal was the first Canadian city to get a major league baseball team, and it opened the door for a team to come to Toronto.

That being said, the prospects of Montreal getting a new team look bleak, even after Rob Manfred’s comments: “Montreal’s a great city. I think with the right set of circumstances and the right facility, it’s possible.” Manfred’s comments about Montreal were positive, but they were relatively vague; the notion of “the right set of circumstances,” for example, could mean anything. Also, for Montreal to get a team, another team needs to relocate, and the team most often mentioned in relocation talk has been the Tampa Bay Rays.

The problem is that the Rays aren’t moving anytime soon, as Eric Macramalla points out in his article, Dream Killer: Sorry Expos Fans, The Tampa Bay Rays Aren’t Moving To Montreal. Basically, the Rays aren’t going anywhere because they signed a Use Agreement, which “prevents the team from moving out of Tropicana Field and calls for potentially catastrophic monetary damages should the Rays abandon the stadium before its deal is up in 2027”. As for expansion, I haven’t heard or read that baseball expects to expand anytime soon, so it doesn’t look like that is going to happen either.

Then there’s the right facility. Just about every owner of the Expos tried, unsuccessfully, to get a new stadium built downtown. At this point (and this is my opinion and should be taken that way), Montreal would need to construct a downtown stadium in order to receive a team, which, given the city’s history of incompetence in that matter, seems unlikely.

Finally, could Montreal someday get a baseball team? Yes, but when that will be, I don’t know; probably not anytime soon. Expos fans should not hold their breath. At this point, there really is no evidence that Montreal can sustain a successful Major League Baseball franchise. That being said, if I were Major League Baseball, I’d start by installing a minor league team and seeing how it goes. If it’s successful and fans are showing up, then perhaps reconsider.

References:

 

  1. John Beamer’s articles for The Hardball Times, Part 1: http://www.hardballtimes.com/measuring-managing-the-value-of-ballclubs-part-1/
  2. John Beamer’s articles for The Hardball Times, Part 2: http://www.hardballtimes.com/measuring-managing-the-value-of-ballclubs-part-2/
  3. SABR Business of Baseball Committee, which provided most of the Forbes data and is also a great source of economic data for baseball research.
  4. Eric Macramalla’s article “Dream Killer: Sorry Expos Fans, The Tampa Bay Rays Aren’t Moving To Montreal”.
  5. The Biz of Baseball, which provided additional Forbes data.
  6. Ben Nicholson-Smith’s article Manfred: Return to Montreal ‘Possible’ for MLB, for the Manfred quote.
  7. Jonah Keri’s Up, Up, and Away.
  8. Attendance data from Baseball Reference.