Archive for March, 2014

Baseball’s Biggest Market Inefficiency

A couple of years ago, the Oakland Athletics extended the contract of general manager Billy Beane for an additional 5 years, securing him through the 2019 season. They did not release the specific details of the contract, but we can guess that it’s comparable to the $3-4m that Brian Cashman and Theo Epstein make per year. The A’s are paying Alberto Callaspo $4.1m this year. Neither Beane’s nor Callaspo’s salary is particularly surprising, since both roughly reflect the current market for a top-tier GM and a 30-year-old infielder with a career .273/.335/.381 slash line. But is this reasonable? Should the owner of a baseball team be more willing to pay a mediocre infielder than an elite general manager? If the following data on front office success is any indication, absolutely not.

Using payroll data that goes back to 1998, I wanted to compare how well teams achieved success relative to the budgets they were given by their owner. In order to do so, I ran a regression model for every season to determine what sort of an effect payroll had on wins. For each season, the league average payroll is normalized to “1” to allow every season to be in this chart. As the graph shows, teams with more money tend to win more. If you’re adventurous and feel like interacting with the data you see below, click here and mouse over everything until you pass out.

Payroll Regressions
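
The per-season regression described above can be sketched roughly as follows. This is a minimal illustration that assumes a simple straight-line fit of wins on relative payroll; the article does not say which model was actually used, and the four-team season below is made up.

```python
def expected_wins(payrolls, wins):
    """Fit wins = a + b * (payroll / league-average payroll) for one season
    by ordinary least squares and return each team's expected wins."""
    n = len(payrolls)
    avg_payroll = sum(payrolls) / n
    rel = [p / avg_payroll for p in payrolls]   # league average becomes 1.0
    mean_rel = sum(rel) / n
    mean_wins = sum(wins) / n
    sxx = sum((r - mean_rel) ** 2 for r in rel)
    sxy = sum((r - mean_rel) * (w - mean_wins) for r, w in zip(rel, wins))
    slope = sxy / sxx
    intercept = mean_wins - slope * mean_rel
    return [intercept + slope * r for r in rel]


# Hypothetical season: payroll in $m and actual wins (not real data).
payrolls = [60, 90, 120, 210]
wins = [80, 78, 84, 90]

expected = expected_wins(payrolls, wins)
# Wins above (positive) or below (negative) what the budget predicts.
surplus = [w - e for w, e in zip(wins, expected)]
```

Summing a GM's `surplus` over every season he ran a team would give the kind of "wins more than expected" totals quoted in this post.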

It’s no surprise that money leads to success, and the fact that certain GMs tend to outperform their budget shouldn’t stun you either. But the extent to which some general managers are better than others is enormous. There are plenty of GMs who did exactly what you’d expect given their budgets. In 9 seasons with the Expos and Mets, Omar Minaya was given the funds to win 742 games. He won 739. Mark Shapiro was supposed to win 703. He won 704.

Some general managers have better reputations, though. Theo Epstein, former Red Sox GM and current Cubs President of Baseball Operations, was given the budgets that would have resulted in 795 wins from an average GM, but he turned that into 839 wins in Boston. Legendary Braves GM John Schuerholz led a Braves squad that won 78 more games than expected since 1998, when the data begins. The 2nd best on the list, ex-Cardinals and current Reds GM, Walt Jocketty, has been worth an astounding 106 wins.

Then there’s Billy Beane: the Billy Beane whom the A’s are paying slightly less than Alberto Callaspo. Under his direction, the Oakland Athletics have won 171 more games than expected. Babe Ruth had a career WAR of 168. Ruth’s best season was worth an absurd 15.0 wins in terms of WAR. Billy Beane has had 6 seasons during which the A’s won 15+ more games than they should have. They’ve never had the money to win 50% of their games, but half their seasons have ended with 20 more wins than losses.

I could go on. But first, let’s look at some visuals.

Best General Managers

Here are the 16 general managers whose teams have exceeded their financial expectations by over 25 wins since 1998. If you want to look at every general manager’s wins and expected wins, explore this. One name I have not mentioned so far that has had an impressive stint as head of the Rays is Andrew Friedman. Since he’s been the GM of the Rays, they have been tied with St. Louis as the most cost effective winners in baseball. It’s even more impressive when you consider the dumpster fire he inherited.

Andrew Friedman

After ignoring the potential for mediocrity during his first two seasons and building for the future, Friedman’s Rays took off. In the past 6 years, the Rays have won 87 more games than they should have. While it’s not as good as Beane’s best 6-year stretch of 117, it has coincided with a relatively weak stretch for the A’s where they have only exceeded their budget-wins by 30 games.

Unfortunately, not every team can have a Billy Beane or an Andrew Friedman. Some teams, like my Royals, have had more struggles. This chart, as much as anything else, shows the overwhelming need for effective front office management.

Team Front Offices

The A’s have been 300 wins more effective than the Orioles the last 16 years. If a win is worth $5m as recent free agency has suggested, then I don’t even want to type how much Billy Beane is worth. If you’re like me, it’s worth your time to explore this chart which shows yearly expectations and results for each team. A couple teams to look for, in addition to the ones I’ve talked about: the Cardinals and Braves have been unsurprisingly excellent, and the Cubs, Royals, and Orioles are worth looking at for less exciting reasons.

So how much should teams be paying their GMs? At this point, that’s an easy answer: $1 more than everyone else will pay, because they are tremendously undervalued right now. After that, the answer isn’t as clear, but I don’t see why they wouldn’t be paid similarly to players. One easy counterargument would be that if you pay the GM too much, he can’t do his job as well because he would have less money with which to pay players. But I think there is sufficient evidence that, for example, the Blue Jays would win more games with the team that Andrew Friedman could assemble with a $97m payroll than with their current $117m team. The fact that the Rays will pay $57m this year for their squad should support that claim more than enough.

The free agent market would say that an elite GM could be worth $50-$100m a year. While that might strike people as unreasonable, it’s probably closer to their real value than the replacement level infielder-pay they are receiving now.


A Happy, Sad, Wonderful, Terrible April

If you’re anything like most fantasy players, you may find yourself investing in similar players across multiple leagues. If you’re anything like me, those players seem to get injured more than others. If you are me, this year you invested in Mat Latos and Doug Fister everywhere you could… and are furious.

But if you need a placeholder for April while your starters heal, full-season projections might not be as relevant to your replacement decisions. While it’s always smart to go with skill as your primary determination, often the free agent pitching pool is fraught with pitchers that are more similar. In such instances, the pitcher’s April schedule could be of use. If you need a pitcher for one month and one month only, his May – September prospects are of little concern.

Either because I’m a simple man, or because I’m receiving $0 in compensation for this short piece, I decided a fair estimator would be to simply use the FanGraphs 2014 Projected Rankings and input each opponent’s Runs Scored per Game (RS/G) for each team on a schedule grid for the month of April. I then averaged out the projected RS/G of all opponents for each game in April. This is what I found.

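| Team | Division | Games | Opponent Avg RS/G |
|------|----------|-------|-------------------|
| Atl | NLE | 27 | 3.979 |
| Cin | NLC | 28 | 3.999 |
| Was | NLE | 28 | 4.000 |
| Col | NLW | 29 | 4.004 |
| Mil | NLC | 28 | 4.058 |
| Ari | NLW | 29 | 4.063 |
| StL | NLC | 29 | 4.070 |
| NYM | NLE | 27 | 4.087 |
| ChC | NLC | 27 | 4.093 |
| LAD | NLW | 26 | 4.095 |
| Pit | NLC | 28 | 4.110 |
| Mia | NLE | 27 | 4.127 |
| Phi | NLE | 28 | 4.153 |
| SD | NLW | 29 | 4.174 |
| LAA | ALW | 27 | 4.190 |
| Tex | ALW | 28 | 4.194 |
| Det | ALC | 26 | 4.195 |
| KC | ALC | 27 | 4.196 |
| SF | NLW | 28 | 4.203 |
| Cle | ALC | 29 | 4.212 |
| Oak | ALW | 29 | 4.244 |
| Tor | ALE | 27 | 4.254 |
| Min | ALC | 26 | 4.267 |
| Sea | ALW | 27 | 4.284 |
| TB | ALE | 29 | 4.301 |
| ChW | ALC | 29 | 4.318 |
| NYY | ALE | 27 | 4.319 |
| Hou | ALW | 28 | 4.345 |
| Bos | ALE | 28 | 4.370 |
| Bal | ALE | 27 | 4.383 |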
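
For anyone who wants to rebuild a table like this, the averaging step might look something like the sketch below. The function name, the three-letter team codes, and the RS/G numbers are all placeholders for illustration; the real inputs would be the April schedule and the FanGraphs projected RS/G for each club.

```python
from collections import defaultdict


def april_opponent_rsg(schedule, projected_rsg):
    """schedule: list of (team, opponent) pairs, one per game played.
    projected_rsg: dict mapping each team to its projected runs scored per game.
    Returns {team: (games, average projected RS/G of its opponents)}."""
    totals = defaultdict(float)
    games = defaultdict(int)
    for team, opponent in schedule:
        totals[team] += projected_rsg[opponent]
        games[team] += 1
    return {team: (games[team], totals[team] / games[team]) for team in games}


# Made-up teams and projections, purely for illustration.
projected_rsg = {"AAA": 4.0, "BBB": 4.2, "CCC": 3.8}
schedule = [("AAA", "BBB"), ("AAA", "CCC"), ("BBB", "AAA")]

result = april_opponent_rsg(schedule, projected_rsg)
```

Because every game counts once, a team that sees the same tough opponent in two different series has that opponent weighted accordingly.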

What do we see here? First, as expected, on average the AL teams face more projected runs. You’re welcome for that valuable information. One interesting note, though, is that the San Francisco Giants will face an even tougher aggregate offense than four AL teams. What do we take from this? Maybe if you’re thinking about Tim Hudson vs. Marco Estrada in a shallow league for a rental, you take Hudson. In a shallower league in which this is a real decision, however, you can probably stream matchups with a high efficacy throughout the month. But as a FanGraphs reader (ego-stroke), there’s a fairly high probability that your most difficult decisions come in deeper leagues. So we shall redirect our attention to pitchers farther down the ranks.

“But DomRep,” you might smirk, “aren’t AL/NL differences factored into preseason rankings to a large degree?” Yes, observant reader, they are. This is why this table is much more useful when comparing pitchers in the same league. The NL is below:

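NL

| Rank | Team | Division | Games | Opponent RS/G |
|------|------|----------|-------|---------------|
| 1 | Atl | NLE | 27 | 3.979 |
| 2 | Cin | NLC | 28 | 3.999 |
| 3 | Was | NLE | 28 | 4.000 |
| 4 | Col | NLW | 29 | 4.004 |
| 5 | Mil | NLC | 28 | 4.058 |
| 6 | Ari | NLW | 29 | 4.063 |
| 7 | StL | NLC | 29 | 4.070 |
| 8 | NYM | NLE | 27 | 4.087 |
| 9 | ChC | NLC | 27 | 4.093 |
| 10 | LAD | NLW | 26 | 4.095 |
| 11 | Pit | NLC | 28 | 4.110 |
| 12 | Mia | NLE | 27 | 4.127 |
| 13 | Phi | NLE | 28 | 4.153 |
| 14 | SD | NLW | 29 | 4.174 |
| 15 | SF | NLW | 28 | 4.203 |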

In the NL, there may be a built-in feeling that, when two pitchers are similar, you’re probably better off just taking the guy from San Diego. Poppycock! San Diego will face the Dodgers, Brewers, and two AL teams this month (Tigers and Indians). Exclamation point! It should be noted that San Diego likely has a less pitcher-friendly park factor than they used to, but even still, a quick glance at the table above should help you decide to maybe choose Jhoulys Chacin, Taylor Jordan, or Tanner Roark over Eric Stults if you think they’re similar pitchers.

Here’s the AL:

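AL

| Rank | Team | Division | Games | Opponent RS/G |
|------|------|----------|-------|---------------|
| 1 | LAA | ALW | 27 | 4.190 |
| 2 | Tex | ALW | 28 | 4.194 |
| 3 | Det | ALC | 26 | 4.195 |
| 4 | KC | ALC | 27 | 4.196 |
| 5 | Cle | ALC | 29 | 4.212 |
| 6 | Oak | ALW | 29 | 4.244 |
| 7 | Tor | ALE | 27 | 4.254 |
| 8 | Min | ALC | 26 | 4.267 |
| 9 | Sea | ALW | 27 | 4.284 |
| 10 | TB | ALE | 29 | 4.301 |
| 11 | ChW | ALC | 29 | 4.318 |
| 12 | NYY | ALE | 27 | 4.319 |
| 13 | Hou | ALW | 28 | 4.345 |
| 14 | Bos | ALE | 28 | 4.370 |
| 15 | Bal | ALE | 27 | 4.383 |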

In the A.L., one might take a quick gander and be encouraged to use Garrett Richards over Bud Norris because they face the easiest and toughest April pitching schedules, respectively. Pseudo-sleeper Tyler Skaggs might also be expected to start out well.

As we mentioned before, preseason rankings and projections take league into consideration. So when considering two pitchers in different leagues, it might even help to take a quick peek at their respective schedule rankings within their own league. For instance, while San Diego (#14 NL schedule) can be expected to face less run-scoring potential this month on average than Anaheim (#1 AL schedule), this will be the case the whole season and is, therefore, factored in when rankings show Tyson Ross and Tyler Skaggs in similar places. But the rankings eke out that Ross’s month should be harder than his average month while Skaggs’s month should be easier.

If you’re in a position to stream relatively strong pitchers throughout April, this is probably useless to you. The sample size of a month’s worth of starts can also blow all of this up. It’s common practice to look at September strength of schedule for pitchers, but everyone tends to ignore April because their eyes are focused on the whole season. But if you’re anything like me, and Latos/Fister are giving you fits, hopefully you’ll keep strength of schedule in mind.


Fantasy Comparables: Ceilings, Floors, and Most Likely Situations

I’m entering my fourth season of fantasy baseball this year and in my quest for my first championship I stepped up my preseason work to include making my own projections for players and creating my own dollar value system for my league’s custom scoring (6×6, standard with OPS and K/9 added). When making projections for players this year, I looked at their last three seasons in the Majors and used their Steamer and ZiPS projections to make sure I was in the same universe or had solid reasons for my different projection. I made projections for about 300 hitters and 200 pitchers, which I feel are grounded in reality and will give me an edge in my fantasy endeavors this year.

However, while I’m pleased with my projections, and they’re definitely better than when I first started playing and knew only the Yankees and other AL East players, my projections are still very limiting. One of the main problems is that I’m producing a single stat line for each player. It’s based on what they’ve done previously, how they’re trending, and how I and other systems think they’re most likely to produce in 2014, but it’s still just a single projection. More advanced projection systems, like PECOTA, compare a given player to thousands of other Major Leaguers to find comparable careers and produce various projections along with each projection’s probability of occurring.

Projection systems like this recognize the inherent uncertainty of projecting future baseball performance and, instead of giving one stat line, give us a range of outcomes with their likelihoods and produce more accurate results. Now, I am just dipping my toe in the water of finding comparable players and making projections based on them, but I wanted to see how this type of system would change my valuation of two outfielders who will turn 27 this season, Justin Upton and Jay Bruce. Bruce will turn 27 in April and Upton turns 27 in August. They’ve both been big fantasy contributors in the past; Bruce is more consistent in his production, while Upton has been streakier, with hot and cool months and peaks and valleys in his home run and stolen base totals. I’ve put my projections for them below with a dollar value based on a 12 team league with 22 roster spots and a 70-30 hitters-pitchers split.

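| Player | AB | BB | Hits | 2B | 3B | Runs | HR | RBI | SB | AVG | OBP | SLG | OPS | Dollar Value |
|--------|----|----|------|----|----|------|----|-----|----|-----|-----|-----|-----|--------------|
| Jay Bruce | 590 | 62 | 154 | 38 | 1 | 88 | 33 | 100 | 7 | .261 | .331 | .497 | .828 | $29.39 |
| Justin Upton | 550 | 68 | 150 | 28 | 2 | 95 | 25 | 78 | 13 | .273 | .353 | .467 | .820 | $26.96 |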

I’m projecting them to produce similar value, but Bruce definitely has an edge. To find comparable players to Bruce and Upton, I looked at all MLB seasons from 1961 through 2013 (61 being an arbitrary start date based on how much data my laptop could sort through and organize without John Henrying its CPU). I narrowed down to players with similar home run and stolen base totals in their age 23 to 26 seasons, along with average, OPS, strikeout and walk percentages, and playing time in an attempt to find a list of similar hitters.
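
To give a sense of what that narrowing down can look like in practice, here is a rough sketch. It is not the exact process used for this article (there was no set formula); the tolerance values, the stat keys, and the sample players are all invented for illustration.

```python
def find_comps(target, candidates, tolerances):
    """Return the names of candidates whose stats all fall within the
    given tolerance of the target player's stats."""
    comps = []
    for player in candidates:
        if all(abs(player[stat] - target[stat]) <= tol
               for stat, tol in tolerances.items()):
            comps.append(player["name"])
    return comps


# Hypothetical per-season averages over ages 23 to 26 (not real players).
target = {"hr": 30, "sb": 8, "avg": 0.257}
candidates = [
    {"name": "Player A", "hr": 33, "sb": 10, "avg": 0.265},
    {"name": "Player B", "hr": 12, "sb": 40, "avg": 0.300},
    {"name": "Player C", "hr": 27, "sb": 5, "avg": 0.248},
]
# How close a candidate must be in each stat to count as comparable (assumed).
tolerances = {"hr": 6, "sb": 6, "avg": 0.020}

comps = find_comps(target, candidates, tolerances)
```

Loosening or tightening `tolerances` trades the size of the comp list against how closely each comp resembles the target.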

For Jay Bruce I found 19 comps, and for Upton I found 26. There’s a link to the Google Doc with the full list below, which I recommend checking out; it’s not included here to save some space. Now that I have the comparable players, I want to see how they performed in their age 27 seasons to give me a range of outcomes for both Bruce and Upton. I’ve included some bullet points here, again with the full spreadsheet linked at the end.

Mean and Median Value of Comparable Players’ Age 27 Season

  • The average dollar value of Upton comparables was $27.17 and the median value was $31.49.
  • The average of Bruce comparables was $21.39 and the median value was $19.51.

Best Case Scenario

  • The best case scenario for Upton would be to follow Bobby Bonds’ age 27 season, where he put together his power and speed (39 HRs and 43 SBs) and bumped his average up to .283 from .260 in the previous year. I don’t think the HR total is out of the question, definitely hard and more than I’m predicting, and I think the average is within reach, but Bonds was regularly stealing 40 bases a year at this point which Upton is clearly not.
  • The best case scenario for Bruce would be to follow Dale Murphy’s age 27 season. Murphy hit .302 that year, with 36 HRs, scoring 130 times and driving in 121 RBIs. While a .300 average may seem unfathomable for Bruce, Murphy hit .281 the year before and .247 the year before that. What makes this situation most unlikely, is that Murphy had a little more speed than Bruce (most seasons stealing bases totaling in the high single digits or low double digits) but he swiped 30 when he was 27, probably out of Bruce’s reach.

More Realistic Good Scenarios

  • While I don’t expect Upton to reach Bobby Bonds level, it’s not hard to imagine him producing a line similar to Reggie Jackson’s 1973 when Jackson was 27. From 1970 to 1972, Jackson’s home run highs and lows by season were 23 to 32, his stolen bases ranged from 26 to 9, and his average fluctuated from .237 to .277.  There’s the volatile situation that we’ve grown accustomed to seeing from Upton. In 1973, Jackson put it together and hit 32 dingers, stole 22 bags, and hit .290.  Upton has already produced remarkably similar lines (2011 – 31HR/21SB/.289 avg) and could put it together for 2014.
  • Jay Bruce isn’t going to steal 30 bases, but he could easily follow the age 27 season of a former Cincinnati Red, Adam Dunn. Dunn was reliably hitting 40 home runs a year at this point (seriously, four straight seasons with exactly 40) and while Bruce has yet to reach the 40 mark, it’s not outside the realm of possibility. The big difference between Dunn’s age 27 season and his other years is that he got his average up to .260 (bookended with .230 seasons), stole 9 bases, and had over 100 runs and RBIs. With Bruce entering his power prime, I think 40 homers is definitely possible, if still unlikely, and hitting .260 is definitely in his wheelhouse.

Outside of Injury, Worst Case Scenario

  • For Upton, if he stays healthy the worst case scenario is following former Phillies 2B Juan Samuel. Samuel hit between .264 and .272 in the four previous seasons, with home run totals as high as 28 but reliably in the high teens, and had stolen at least 30 bases each year. At age 27, though, his average fell to .240, he only hit 12 home runs (and never exceeded 13 again), and while he could rely on his speed and stole 30 bases, he failed to produce 70 runs or RBIs. Not the most likely situation for Upton, but I could envision it with fewer stolen bases.
  • For Bruce, the floor doesn’t get that low. If he reaches 500 ABs, the worst comparable is Torii Hunter’s age 27 season, where he only hit .250 and stole 6 bags, but still hit 26 homers and drove in 100 RBIs. Given Bruce’s consistency and the consistency of his comparables, I’d expect a high floor.

The Merciful Conclusion

I know this took up a lot of room and we’re all happy this is almost over, but what does this mean? First, this is pretty rudimentary, with no set formula for finding comparable players. I did my best, but they’re definitely not one-to-one matches and should be taken with a grain of salt. However, I think this helps articulate a fundamental difference between Jay Bruce and Justin Upton. Bruce is a high floor, more limited ceiling guy, and I’ve got more confidence that his 2014 will fall close to my projections. I know I’m buying about a .260 average, with a couple of stolen bases, mid 30s home runs with a little wiggle room, in a good lineup.

Justin Upton is a lotto ticket guy. I’m sticking with my projection for his season which falls between the extremes, but if he repeats his 2011 or puts together his tools that he has demonstrated at different points of his career, he could finish right behind Mike Trout among fantasy outfielders. At the same time, I could see him producing a line like his big brother BJ did last year, okay maybe not that bad, but definitely not worth his draft price. Who you take depends on what path you want to believe and who you already have on your team, but I think laying out these options and using player comparables definitely adds to fantasy projections and will be a staple I’ll use next year.

 

As promised, here’s the link to the full list of comparable players used for this article: https://docs.google.com/spreadsheet/ccc?key=0AmP-CH5MqzENdFZSZ0xhQVZiYWxNSVQxYzBsOFh3YkE#gid=0


Why I Don’t Use FIP

Over the last decade, Fielding Independent Pitching (FIP) has become one of the main tools to evaluate pitchers. The theory behind FIP and similar Defensive Independent Pitching metrics is that ERA is subject to luck and fielder performance on balls in play and is therefore a poor tool to evaluate pitching performance. Since pitchers have little to no control over where batted balls are hit, we should instead look only to the batting outcomes that a pitcher can directly control and which no other fielder affects. In the case of FIP, those outcomes are home runs, strikeouts, walks, and hit batters.

However there are many serious issues with FIP that collectively make me question its usage and value. These issues include the theory behind the need for such a statistic, the actual parameters of the formula’s construction, and the mathematical derivation of the coefficients. Let’s address these issues individually.

Control over Balls in Play

A common statement when discussing FIP or BABIP is that pitchers have little to no control over the result of a ball once it is hit into play. A pitcher’s main skill is found in directly controllable outcomes where no fielder can affect the play, such as home runs, strikeouts, and walks (and HBP). In trying to estimate a pitcher’s baseline ERA, which is the objective of FIP, the approximately 70% of balls that are put into play can be ignored and we can focus only on the previously mentioned outcomes where no fielder touches the ball.

The concept of control is a little fuzzy though and something I believe has been misappropriated. It is definitely true that the pitcher does not have 100% absolute control over where a batted ball is hit. There is no pitch that anyone can throw that can guarantee a ball is hit exactly to a particular spot. However in the same vein, the batter doesn’t have 100% absolute control either. If you were to place a dot somewhere on the field, no batter is good enough to hit that spot every time, even if hitting off a tee.

However this lack of complete control should not in any way imply that the batter or pitcher doesn’t have any control at all over where the ball is hit. Batters hit the ball to places on the field with a certain probability distribution depending on what they are aiming for. Better batters have a tighter distribution with a more narrow range of possibilities and can more accurately hit their target. For example consider a right-handed batter attempting to hit a line drive into left field on an 80 mph fastball down the heart of the plate. A good hitter might hit that line drive hard enough for a double 30% of the time, for a single 30% of the time, directly at the left fielder 10% of the time, and accidentally hit a ground ball 20% of the time. Conversely, a worse batter who has less control over his swing may hit a double 10% of the time, a single 10% of the time, directly at the left fielder 15% of the time, an accidental ground ball 25% of the time, and in this case not even get his swing around the ball fast enough and instead hits the ball weakly towards the second baseman 40% of the time.

Where the pitcher fits into the entire scheme is in his ability to command the ball to specific locations, with appropriate velocity and spin, as to try to sway the batter’s hit distribution to outcomes where an out is most likely. Consider the good hitter previously mentioned. He accomplished his goal fairly successfully on the meatball-type pitch. What if the same good batter was still trying to hit that line drive to left field, but the pitch instead was a 90 mph slider on the lower outside corner? On such a pitch the good batter’s hit distribution may start to resemble the bad hitter’s hit distribution more closely. This is a slightly contrived and extreme example, but it also encompasses the entire theory of pitching. Pitchers are not trying to just strike out every batter, but instead pitch into situations and to locations where the most likely outcome for a batter is an out.

By this reasoning the pitcher has a lot of control over where and how a batted ball is hit. This does not mean that even on the tougher pitch that the batter can’t still pull a hard double, or even that the weak ground ball to the second baseman won’t find a hole into right field, these are all still possibilities. However by throwing good pitches the pitcher is able to control a shift in the batter’s hit probability distribution. Similarly, better batters are able to make adjustments so that their objective changes according to the pitch. On the slider, the batter may adjust to try to go opposite field. However a good pitch would still make the opposite field attempt difficult.

This is all to say that better pitchers have more control over how balls are hit into play. They are able to command more pitches to locations where the batter is more likely to hit into outs than if the pitch was thrown to a different location. Worse pitchers don’t have such command or control to hit those locations, and balls put into play are decided more by the whims of the batter. FIP takes this control argument too far to the extreme. There is a spectrum of possibilities between absolute control over where a ball is hit and no control over where a ball is hit that involves inducing changes in the probability distribution of where a ball is hit, which is how the game of baseball is actually played. As a simple example, we see that some pitchers are consistently able to induce ground balls more frequently than others. Since about 70% of all plate appearances result in balls being put into play, it is important to actually consider this spectrum of control instead of just assuming that the game is played only at one extreme.

Formula Construction

Let’s pause though and ignore my previous argument that a pitcher can control how balls are hit and we’ll instead assume that all the fielding independence theories are true and we can predict a pitcher’s performance using only the statistics in the FIP formula. This introduces an immediate contradiction since none of the statistics used in the FIP formula (except HBP, which has the smallest contribution and is a prime example of lack of control) are in fact fielder independent. The FIP formula is not actually accounting for its intended purpose.

The issue of innings pitched in the denominator has been addressed before. Fielders are responsible for collecting outs on balls in play which therefore determines how many innings a pitcher has pitched. However all three of the statistics in the numerator are also affected by the fielding abilities of position players, especially in relation to ballpark dimensions. Catchers’ pitch framing abilities have been shown recently to heavily affect strike and ball calls and could be worth multiple wins per season. Albeit rare events, better outfielders are able to scale the outfield fences and turn potential home runs into highlight reel catches.

More commonly though, better catchers and corner infielders and outfielders can turn potential foul balls into outs. When foul balls are turned into caught pop-ups or fly balls, the at bat ends, thus ending any opportunity for a walk or a strikeout which may have been available to a pitcher with worse fielders behind him. This is particularly harmful to a pitcher’s strikeout total. While a ball landing foul merely gives the batter another opportunity to draw a walk, it also moves the batter one strike closer (when there are fewer than two strikes) to striking out.

Similarly, instead of analyzing the effects of the fielders, we can look at the size of foul territory. Larger foul territory gives more chances for fielders to make an out since the ball remains over the field of play longer instead of going into the stands. Statistics like xFIP normalize for the size of the park by regressing the number of fly balls given up to the league average HR/FB rate; however, there is no park factor normalization for the strikeout and walk components of FIP.

We can see the impact immediately by examining the Athletics and Padres, two teams whose home parks have an extremely large foul territory. By considering only the home statistics for pitchers who threw over 50 IP in each of the last five seasons, the Athletics pitchers collectively had a 3.25 ERA, 3.74 FIP, and 4.05 xFIP, while the Padres pitchers collectively had a 3.38 ERA, 3.84 FIP, and 3.86 xFIP. In both cases FIP and xFIP both drastically exceeded ERA. Also, of the 46 pitchers who met these conditions, only 9 pitchers had an ERA greater than their FIP and only 7 had an ERA greater than their xFIP, with 6 of those pitchers overlapping. This isn’t a coincidence. Although caught foul balls steal opportunities away from every type of batting outcome, it is more heavily biased to strikeouts since foul balls increase the strike count.

Mathematics

The mathematics of the FIP formula may be my biggest problem with FIP, mostly because it’s the easiest to fix and hasn’t been. I’ve seen various reasons for using the (13, 3, -2) coefficients in derivations of the FIP formula. Ratios of linear weights, baserun values, or linear regression coefficients are the most common explanations. However none of these address why the final coefficient values are integers, or why they should remain constant from year to year.
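
For reference, the standard form of the formula with those (13, 3, -2) coefficients can be written out as a short sketch. The function and argument names here are my own; the additive constant is the usual league-wide adjustment that sets league FIP equal to league ERA, and `ip` is assumed to be true decimal innings (200 1/3 innings is 200.333, not 200.1).

```python
def fip(hr, bb, hbp, k, ip, constant):
    """Standard FIP with fixed integer coefficients (13, 3, -2)."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


def league_constant(lg_era, lg_hr, lg_bb, lg_hbp, lg_k, lg_ip):
    """Additive constant chosen so that league FIP equals league ERA."""
    return lg_era - (13 * lg_hr + 3 * (lg_bb + lg_hbp) - 2 * lg_k) / lg_ip


# Made-up league totals and pitcher line, for illustration only.
c = league_constant(4.0, 100, 300, 30, 700, 1000.0)
pitcher_fip = fip(hr=20, bb=50, hbp=5, k=200, ip=200.0, constant=c)
```

Note that only the constant moves from year to year in this version; the three ratios between HR, BB+HBP, and K are frozen.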

There is absolutely no reason why the coefficients should be integers. Simplicity is a convenient excuse, but it’s highly unnecessary. No one is sitting around calculating FIP values by hand; it’s all done by computers, which don’t require such simplicity. By changing the coefficients from their actual values to these integers, error and bias are unnecessarily introduced into the final results. Adjusting the additive coefficient to make league ERA equal league FIP does not solve this problem.

The baseball climate also changes yearly. New parks are built and the talent pool changes. This changes the value of baseball outcomes with respect to one another. It’s why wOBA coefficients are recalculated annually. However for some reason FIP coefficients remain constant. The additive constant helps in equating the means of ERA and FIP but there is still error since the ratios of HR, BB, and K should also change each year (or at least over multi-year periods).

I’ve calculated a similar version of FIP, denoted wFIP, for the 2003-2013 seasons using weighted regression on HR, (HBP+BB), K, all divided by IP as they relate to ERA. If we treat each inning pitched as an additional sample, then the variance of the FIP calculation for a pitcher is proportional to the reciprocal of the amount of innings pitched. Weighted regression typically uses the reciprocal of the variance as weights. Therefore in determining FIP coefficients we can use each pitcher’s IP as his respective weight in the regression analysis. The coefficients for the weighted regression compared to their FIP counterparts are shown in the following graph.
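
A bare-bones version of this IP-weighted regression could look like the sketch below. It assumes NumPy is available and uses synthetic pitchers generated from made-up coefficients (14.0, 3.3, -2.1, and a constant of 3.0), so it only demonstrates the mechanics, not the actual wFIP values.

```python
import numpy as np


def weighted_fip_coefficients(hr, bb_hbp, k, ip, era):
    """Weighted least squares of ERA on HR/IP, (BB+HBP)/IP and K/IP,
    using each pitcher's innings pitched as his weight.
    Returns (HR coef, BB+HBP coef, K coef, additive constant)."""
    hr, bb_hbp, k, ip, era = (np.asarray(v, dtype=float)
                              for v in (hr, bb_hbp, k, ip, era))
    X = np.column_stack([hr / ip, bb_hbp / ip, k / ip, np.ones_like(ip)])
    # Scaling each row by sqrt(weight) turns weighted least squares
    # into an ordinary least squares problem.
    root_w = np.sqrt(ip)
    coef, *_ = np.linalg.lstsq(X * root_w[:, None], era * root_w, rcond=None)
    return coef


# Synthetic, noise-free pitchers built from invented coefficients.
rng = np.random.default_rng(0)
ip = rng.uniform(50, 220, size=40)
hr = rng.uniform(0.05, 0.15, size=40) * ip
bb_hbp = rng.uniform(0.25, 0.45, size=40) * ip
k = rng.uniform(0.6, 1.2, size=40) * ip
era = (14.0 * hr + 3.3 * bb_hbp - 2.1 * k) / ip + 3.0

coef = weighted_fip_coefficients(hr, bb_hbp, k, ip, era)
```

With no noise in the synthetic ERA values, the fit recovers the invented coefficients up to floating-point error; with real pitcher data the weights keep low-inning pitchers from dominating the estimate.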

Ignoring the additive constant, since 2003 each of the three stat coefficients has varied by at least 22% from its FIP coefficient value, and all are biased above the FIP integer value almost every year. In 2013 this leads to a weighted absolute average difference of 0.09 per pitcher between the wFIP and FIP values, which is about a 2.3% difference on average. However there are more extreme cases.

Consider Aroldis Chapman, who had a 2.54 ERA and 2.47 FIP in 2013. On first glance this seems to indicate a pitcher whose ERA was in line with his peripheral statistics and if anything was very slightly unlucky. However his wFIP came to 2.96. If we saw this as his FIP value we might be more inclined to believe that he was lucky and his ERA is bound to increase. This difference in opinion would come purely from use of a better regression model, without at all changing the theory behind its formulation. That is a poor reason to swing the future outlook on a player.

However even with current FIP values, no one would draw the conclusions I did in the previous paragraph that quickly. Upon seeing the difference in FIP (or wFIP) and ERA values, one would look to additional stats such as BABIP, HR/FB rate, or strand rate to determine the cause of the difference and what may transpire in the future. This in fact may be the ultimate problem with FIP. On its own it doesn’t give us any information. Even with the most extreme differentials we always have to look to other statistics to draw any conclusions. So why don’t we make things easier and just look at those other statistics to begin with instead of trying to draw conclusions from a flawed stat with incorrect parameters?


Expected RBI Totals: The Top 267 xRBI Totals for 2013

While there is almost zero skill when it comes to the amount of RBI a player produces, through the creation of an expected RBI metric I have found a way to look at whether or not a player has gotten lucky or unfortunate when it comes to their actual RBI total.

I hope I don’t need to do this for most of our readers, because it’s 2014 and you’re reading about baseball on a far-off corner of the Internet, so you obviously are more informed than the average fan who consumes ESPN as their main source of baseball information, but let’s talk about RBI as a stat and why it is not valuable when you look at a player’s talent. The number of RBI a player produces is almost (we’ll get into the almost a little later) entirely dependent on the lineup a player plays in. If a player doesn’t have teammates that can get on base in front of them in the lineup, there aren’t very many opportunities for RBIs; that’s the long and short. Really, RBI tell more about the lineup a player plays in than the player himself.

Intuitively, this makes sense.  The more runners there are on base, the more chances the batter will have for RBI, and the more RBI the batter will accumulate. When I said that a player’s RBI total is almost entirely dependent on the lineup he plays in, let’s be a little more precise. My research took the last three years of data (2010 to 2013) and looked at all players that had 180 runners on base (ROB) during their at bats over the course of a season. Over the three seasons, which should be enough data (it was a pain in the ass to obtain the data that I did find), ROB correlated with RBI with a correlation coefficient of .794 (r2 of roughly .63), which is a very strong positive relationship.

But hey, that doesn’t mean that you can be a lousy hitter and get a lot of RBI. That would be like if you threw a hobo in the Playboy Mansion and expected him to get a lot of tail; all the opportunity in the world can’t mask the smell of Pall Malls, grain alcohol and a lifetime of deflected introspection; trust me, I worked at a liquor store for three years in college, and I know.  In the same sample of players from 2010 to 2013 as used above, the correlation between wOBA (what we’ll use here to define a player’s ability at the plate) and RBI is .6555. So there is a relationship between a player’s ability and their RBI total, but it is nowhere near as strong as the relationship between their RBI total and their opportunity (ROB).

However, when we combine a player’s opportunity (ROB) with their talent (wOBA), we should get a good idea of what to expect for a hitter’s RBI total. Here is the formula for expected RBI, fit by regressing RBI on wOBA and ROB: xRBI = -85.0997 + 262.7424 * wOBA + 0.1918 * ROB.

When you combine wOBA and ROB into this formula you end up with a correlation coefficient of .878 and an r2 of .771. Wooooo (Ric Flair voice)!!!!!  With the addition of wOBA to ROB we increase our r2, from .63 with just ROB, by fourteen percentage points.
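If you want to plug your own numbers into the formula, a minimal sketch looks like this. The coefficients are the ones given above; the function name `expected_rbi` and the .400 wOBA in the example are placeholders, not any real player’s line.

```python
def expected_rbi(woba, rob):
    """xRBI from the formula above: opportunity (ROB) plus talent (wOBA)."""
    return -85.0997 + 262.7424 * woba + 0.1918 * rob

# Hypothetical hitter with a .400 wOBA, seen at two levels of opportunity.
low_opportunity = expected_rbi(0.400, 396)
high_opportunity = expected_rbi(0.400, 536)

# Same bat, 140 more runners on base: the gap is purely the ROB term.
print(round(high_opportunity - low_opportunity, 1))
```

Under this formula, every additional 100 runners on base is worth about 19 expected RBI regardless of how good the hitter is.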

2013 Expected RBI Leaders

Click Here to See xRBI Leaderboard

Let’s think about why Chris Davis’ xRBI is so much lower than his 2013 actual RBI total.

Davis had 396 runners on base while he batted in 2013, which is 140 ROB fewer than Prince Fielder, who led the league with 536 ROB; Davis’ opportunity was limited.

Davis’ RBI total was considerably higher than what his opportunity would suggest his RBI total should be, and one of the reasons that he outperformed his xRBI total by so much was because of the amount of home runs he hit. Davis, or any batter, doesn’t need a runner on base to get an RBI when he hits a home run. But beyond home runs there is another reason why Davis and other batters outperform their xRBI totals: luck.

Hitting with runners on base is not a skill. A batter has the same probability of a hit regardless of the base/out state. Let’s forget pitcher handedness and Davis’ platoon splits for the moment. With a runner on second base and two outs, Chris Davis will get a hit .272 (27%) of the time (I averaged his Steamer and Oliver projections for 2014 together). Davis, and Alfonso Soriano for that matter, who was the only player to outperform his xRBI by more than Davis in 2013, was lucky and happened to have runners on base for the majority of the 28.6% of the time (his 2013 batting average) that he got a hit in 2013.

To put Davis’ 2013 136 RBI season into perspective, in the last five seasons there have been eight players to record 130 or more RBI in a season. Of those eight players, only two—Ryan Howard (2008-9) and Miguel Cabrera (2012-13)—were able to duplicate the performance the following year.

While the combination of ROB and wOBA has allowed us to come up with a reliable xRBI, the next step, to increase the reliability of xRBI and account for players who produce a large amount of their RBI from home runs (i.e. Davis), is to include a power component in xRBI: HR/FB ratio.

Devin Jordan is obsessed with statistical analysis, non-fiction literature, and electronic music. If you enjoyed reading him, follow him on Twitter @devinjjordan.


The Royals: The AL’s Weirdest Hitters

The MLB season is quickly approaching, and I am running out of ways to entertain myself until real baseball starts again.  One way that I attempted to do so today was to prepare a guide about strengths and weaknesses of offenses by team.  I just worked with the AL because I didn’t feel like adjusting the data for DH and non-DH teams to be in the same pool.  Using FanGraphs’ infallible Depth Charts feature, I gathered every American League team’s projected totals for AVG, OBP, SLG, and FLD, in order to see some basic tendencies for each team coming into the 2014 season.  I plugged some numbers into 4 variables which I thought would give a better-than-nothing estimate of how a team’s offensive roster was set up. Here are the stats I used to define each attribute:

Contact: AVG

Discipline: OBP – AVG

Power: SLG – AVG

Fielding: FLD

These variables are about as perfect as they are creative (which is to say, not very).  However, this was intended to be a fairly simple exercise.  For each variable, I ranked all the teams and assigned a value between -7 and 7.  The best team in the AL received a 7, second best a 6, and so on.  A score of 0 is average and -7 is the worst.  Here are the results:

Dashboard 1

As an inexperienced embedding artist, I feel obligated to include this link, which should work if the above chart is not working in this window.
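For anyone who wants to reproduce the scoring, here is a minimal sketch of the four attributes and the rank-to-score step described above. The function names and the team labels in the example are made up, and ties are simply broken by sort order, which the write-up above does not address.

```python
def attributes(avg, obp, slg, fld):
    """The four team attributes as defined above."""
    return {
        "Contact": avg,
        "Discipline": obp - avg,
        "Power": slg - avg,
        "Fielding": fld,
    }

def rank_scores(values):
    """Turn {team: stat} into {team: score}, best team highest.

    With 15 teams the scores run from 7 (best) through 0 (average) to -7 (worst).
    """
    ordered = sorted(values, key=values.get, reverse=True)
    top = (len(ordered) - 1) // 2
    return {team: top - position for position, team in enumerate(ordered)}

# Fifteen hypothetical teams with made-up projected batting averages.
projected_avg = {f"Team {i}": 0.250 + 0.001 * i for i in range(15)}
contact_scores = rank_scores(projected_avg)
print(contact_scores["Team 14"], contact_scores["Team 7"], contact_scores["Team 0"])
```

Running `rank_scores` once per attribute gives the four numbers shown for each team in the chart.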

Immediately, one thing popped out at me. The Royals are 1st in Contact. They also are 1st in Fielding. This is good, since they project to be dead last in Discipline and Power. That combination is really odd. For the most part, teams fit into more general molds. The White Sox and Twins are below average in everything. The Yankees, Red Sox, and Rangers are below average at nothing. The Rays and A’s are, to no one’s surprise, copying each other with good Discipline and Fielding.

In fact, outside of the Royals, there isn’t another team who is 1st or 15th in any 2 categories, and Kansas City did it in all 4. To figure out how they got here, let’s look at some of the ways they stick out from the rest of the league.

In 2013, the American League had a 19.8% strikeout rate. Of all the Royals’ projected starters in 2014, Lorenzo Cain had the highest 2013 K% at 20.4%. Alex Gordon sat at 20.1%, and you won’t find anyone else above 16.1%. Not satisfied with an overall team strikeout rate about 3 points lower than the league average in 2013, the Royals went out and acquired Omar Infante and Nori Aoki this offseason, whose respective rates of 9.1% and 5.9% ranked 8th and 1st among all hitters with 400+ PA last year. It’s obvious why the Royals’ batting average is supposed to be 8 points higher than the 3rd best in the league. They put the ball in play.

Unfortunately for them, putting it in play is about as much as they can do. They’re the least likely team in the AL to be clogging up bases with walks, and they’re the least capable team to drive in runs with power.

In 2013, the American League had an average Isolated Power of .149. Alex Gordon led the Royals with his .156 mark. And that was it for the above average power hitters. Even Designated Hitter Billy Butler couldn’t muster up anything better than a .124. The team’s ISO was .119, which won’t be affected dramatically by the arrival of Aoki and Infante, whose ISO’s averaged out to .108, but who replace weak-hitting positions for the Royals.

Oh, and for discipline: they don’t walk. They don’t like it. GM Dayton Moore got in trouble for saying something dumb about it, and the data suggest Manager Ned Yost may not have been aware walks existed when he played. To the Royals’ credit, they did acquire Aoki, whose 8.2% rate last year was ever so slightly higher than the AL average of 8.1%. Omar Infante’s rate was just above 4%, though, and their 6.9% team rate probably won’t be much better this year.

Lastly, fielding. Kansas City could flat out field, winning 3 Gold Gloves, and saving a mind-blowing 80 runs according to UZR. That number, more than double (!!!) anyone else in the AL in 2013, was the 2nd highest UZR ever in the AL, trailing only the 2009 Mariners. Those 80 runs are almost sure to decrease in 2014, but there’s little reason to argue that any other team in the AL will be expected to save more runs with the glove this year.

Overall, the Royals offense could be nuts in 2014. They won’t strike out, and will put the ball in play. There won’t be many other ways they get on, and they won’t be hitting the ball out of the park much. If last year is any indication, they should save some runs for their pitchers when they’re out in the field. No matter how they turn out this year, there’s one thing to remember. If you’re watching a team effort from Kansas City, there’s a decent chance that no one in the rest of the league is doing it better. There’s also just as good a possibility that everyone is.


What’s the Value of a Home Run These Days?

Let’s face it, people love the home run. It’s why players like Mark Reynolds can find jobs. These days, we aren’t surprised when we see a couple of home runs in one game. It wasn’t always like this, however. Home runs used to be a rarity among baseball events. In the early 20th century, it wasn’t uncommon for a player to lead his league by hitting 10-15 home runs. This brings me to the question: how has the home run actually changed? Not in terms of its frequency, but in terms of its value. More specifically, its value in runs. To approach a solution to this question without arduously parsing through hundreds of event files, we must find a way to mathematically frame the game of baseball in a way that encourages simplicity but doesn’t lose the most familiar parts of the game.

Markov Chains

The first batter of the game steps to the plate and sees no runners on base with none out. He pops up. The second batter steps to the plate and sees the immediate result of the last at bat: an out. The second batter walks. The third batter then sees the immediate result of the second batter’s at bat: a runner on first base. The stream of batters stepping to the plate and being placed into a state resulting from the previous batter’s at bat exemplifies the nature of a Markov Chain. When a batter steps into the batter’s box, his current state (whether it be an out situation, a base situation, or a base-out situation) is only dependent on the previous batter’s state. This is known as the Markov Property. Using this structure, we can simulate any baseball game we’d like. However, to keep our calculations simple, we should introduce some new rules.

The Rules of the Game

  1. A batter can only attain a BB, HBP, 1B, 2B, 3B, or HR.
  2. Outs only occur via a batter getting himself out.
  3. Anything other than the events from 1) is assumed to be an out.
  4. When a batter gets a hit, the runners on base advance by as many bases as the batter attains (e.g. a double with a runner on second will score the runner on second).

These are the rules of the game. There are no stolen bases, no scoring from second on a single, and no double plays. We have stripped the game down to only its essentials, while implementing certain changes for our own convenience. For our purposes, we don’t care about Mike Trout’s 33 stolen bases, only the fact that he mainly attains his bases through the events we allow.

The Out Chain

We assume that the probability of a batter getting a single at any point during a season is the number of singles he gets for the season divided by his plate appearances. We do this for the probabilities of all our desired events. By doing this, we can construct a simple Markov Chain where players step to the plate and find themselves batting with 0, 1, or 2 outs. We find that this chain is irreducible, meaning that each state (0, 1, or 2 outs) eventually leads to every other state. This, and the fact that we are dealing with a finite number of states, leads us to the existence of a stationary probability distribution on our state space of outs. It so happens that when a batter starts his at bat, he does so with an equal probability of seeing 0, 1, or 2 outs, i.e. the probabilities of a batter seeing 0, 1, or 2 outs when he comes to the plate are all 1/3. The knowledge that outs are uniformly distributed over our game allows us to construct probabilities for a more complicated chain that should shed light on our original question.
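The uniform result is easy to check numerically. Below is a minimal sketch of the Out Chain under the rules above: with probability `p_out` the batter makes an out and the count moves from 0 to 1, 1 to 2, or 2 back to 0 (new inning), otherwise the out count stays put. The value 0.68 for `p_out` and the helper names are illustrative assumptions.

```python
import numpy as np

def out_chain(p_out):
    """3x3 transition matrix over the number of outs the next batter sees."""
    P = np.zeros((3, 3))
    for outs in range(3):
        P[outs, outs] += 1.0 - p_out          # batter reaches base: outs unchanged
        P[outs, (outs + 1) % 3] += p_out      # batter is out: 2 outs wraps to 0
    return P

def stationary(P):
    """Solve pi @ P = pi with sum(pi) = 1 as a least-squares system."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.append(np.zeros(n), 1.0)
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

print(stationary(out_chain(0.68)))
```

Every column of this matrix sums to 1 just like every row, which is why the stationary distribution comes out uniform no matter what out probability you pick (as long as it is strictly between 0 and 1).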

The Base Chain

We now place our focus on the stream of batters who see a certain base situation when they step to the plate. The transitions of base situations are dependent on the out situation, as can be seen when a batter bats with 1 out versus 2 outs. Batting with 1 out, if the player makes another out, then the base situation stays the same for the subsequent batter. If he does this with 2 outs, however, then the inning is over and the base situation reverts to the state where no one is on base in the next inning. Fortunately, we know that the probability of a batter seeing any number of outs when he steps to the plate is 1/3. In a similar manner to the Out Chain, we find that every state in the Base Chain leads to every other state. The “runners on the corners” state eventually leads to the “bases loaded” state, which eventually leads to the “bases empty” state, and so on. Since there are finitely many base situations, we are led to a stationary probability distribution on the state space of base situations. That is to say, there is a probability associated with a batter stepping to the plate and seeing the bases empty, and another for seeing a runner on first, etc.
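Here is a minimal sketch of how the Base Chain could be built and solved. The event rates are made-up round numbers rather than any season’s actual totals, the base state is stored as a 3-bit number (bit 0 = first, bit 1 = second, bit 2 = third), and a walk or HBP is assumed to advance forced runners only, which the rules above do not spell out.

```python
import numpy as np

# Hypothetical per-plate-appearance rates; "BB" here stands for BB + HBP.
RATES = {"BB": 0.090, "1B": 0.155, "2B": 0.045, "3B": 0.005, "HR": 0.025}
BASES_GAINED = {"1B": 1, "2B": 2, "3B": 3, "HR": 4}

def after_hit(state, n):
    """Batter and every runner move n bases; anyone pushed past third scores."""
    return ((state << n) | (1 << (n - 1))) & 0b111

def after_walk(state):
    """Assumed rule: only forced runners move up on a walk or HBP."""
    if not state & 0b001:
        return state | 0b001
    if not state & 0b010:
        return state | 0b010
    return 0b111

def base_chain(rates):
    """8x8 transition matrix over base situations seen by consecutive batters."""
    p_out = 1.0 - sum(rates.values())
    P = np.zeros((8, 8))
    for s in range(8):
        P[s, s] += p_out * 2 / 3   # out made with 0 or 1 out: bases unchanged
        P[s, 0] += p_out * 1 / 3   # third out: next batter sees the bases empty
        P[s, after_walk(s)] += rates["BB"]
        for event, n in BASES_GAINED.items():
            P[s, after_hit(s, n)] += rates[event]
    return P

def stationary(P):
    """Solve pi @ P = pi with sum(pi) = 1 as a least-squares system."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.append(np.zeros(n), 1.0)
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

P = base_chain(RATES)
pi = stationary(P)
expected_runners = sum(pi[s] * bin(s).count("1") for s in range(8))
hr_value = 1.0 + expected_runners   # the batter plus everyone already on base
print(pi.round(3), round(hr_value, 3))
```

Two things fall out of `pi` directly: the expected number of runners on base (and so the expected value of a home run), and the typical wait before a base situation recurs, which for a chain like this is one over that situation’s stationary probability.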

Results

Using this method, a player in our universe who stepped to the plate in 2013 saw the bases empty with an approximate probability of .467. That same batter saw the bases full with a probability of .103 and one runner on first with probability .210. If a team managed to load the bases, they’d find that they generally had to wait about 10 more plate appearances before they next loaded the bases. If they put runners on the corners, they generally had to wait 42 more plate appearances before they did so again. All of this leads us to some of our final conclusions. In the context of our rules, the expected number of runners on base in 2013 was .908, meaning that the expected value of a home run was 1.908 runs. This method generates home run values that are always between 1.8 and 2.2 runs. The following is a table of all of the expected home run values this method generates from the seasons of the last 25 years:

In the last 25 years, we predict that a home run had the greatest value in 1999, at 1.972 runs. This is a reflection of the heavily offensive environment of the season, when big bats such as Sammy Sosa, Mark McGwire, and Barry Bonds were getting on base at staggering rates. The following is a graph of all of the home run values the system predicts from 1884 through 2013:

We see that this system predicts home runs to have been of more value from around 1889 to 1902, when the home run hovers at around 2.00-2.15 runs. While most players of this generation weren’t hitting home runs, they were certainly getting on base often. In 1894, 38 players had on base percentages greater than or equal to .400, compared to 7 players in 2013. When on base percentages are higher, more people are on base, and this increases the expected value of the home run. Under our restrictions, however, the home run hasn’t been worth 2.00 runs since 1950 and these days it fluctuates between 1.90 and 1.93 runs. While these estimates are all under the umbrella of rules and assumptions, this framework allows us to more easily generalize the game of baseball while preserving its most important aspects. It’s this framework that gives us the power of estimating that, while Chris Davis’ 53 home runs were probably worth 101 runs in 2013, they may have been worth 114 in 1894.


Building a “Smart” Team from Scratch – What Would You Do?

If you had a team that was in complete or semi- “rebuilding” mode, and you wanted to start quite nearly from scratch, and implement some of the smartest analytical techniques into your team philosophy, what might you do?  In the rest of this article, I detail some examples of what said hypothetical team might want to do.  I assume that the team has a middle-of-the-road farm system and an average operating budget, and that they want to accrue wins as efficiently and cost-effectively as possible.  I also assume that they have installed all of the state of the art ball and player tracking systems in their major and minor league ball parks that they possibly can.

What’s first?  Well, the ballpark.  Build the field to have a lot of foul territory–mimic the current Oakland A’s stadium.  Even though park factors seemingly have no effect on wins, I think mimicking the A’s would be a good choice for cost efficiency.  This move would allow you to stockpile high FB% pitchers who are going cheap nowadays.  It would enable you to take cheap, mediocre pitchers–the price for pitching is getting out of control nowadays–and give them a chance to put up great numbers.

Next, infield shifting–do it more.  No one shifted more than the Orioles last year, and studies have shown, along with even player anecdotes, that there should be even more shifting done than the O’s did.  Use opposing batter spray charts to determine where and when to shift, and do it as much as possible.  You might even look to hire more multi-position eligible players as they might find it easier to handle shifting abilities.  Ben Zobrist might be the most important player for the Tampa Bay Rays, defensively.

Next, develop and train hitters who can pull the ball with power.  It would be nice if your team was full of guys with all-fields power, but they are more rare, and thus more expensive.  Start teaching them to bunt well from the minors in order to be able to beat the eventual shifts they will see in the majors.  Hire the foremost bunting coach in the world for your staff.

Pitch framing–teach it from the minors and don’t let players like Jose Molina get signed by the Rays so cheaply.  If possible, make clones from Molina DNA.

Keep your best relievers in the 7th, 8th, or high leverage situations only.  Sign a cheap closer each year from the scrap heap and watch him go to another team the next year as a free agent!  Game the system to keep your best young relievers stuck at a low price.  Their low save totals will help keep their arbitration numbers down.

Try to sign your best young players to long term deals.  The more Dustin Pedroias you can accrue the more payroll flexibility and WAR you will have at your disposal.  This one is easier said than done.  But if you can pull it off, you will make your team more attractive for incoming free agents.  And don’t be afraid to commit long-term to speedy players, as the data seems to say they age well.  The more tools a player has, obviously, the less risk his contract is if one of the tools breaks down.

Speaking of signing free agents, try to stay flexible in your 5th SP or 4th OF spot.  It seems like there are always guys left over at the end of the FA signing season who are forced to sign bargain contracts–Ervin Santana and Nelson Cruz, for example, from this year.  Try to find cheap platoon solutions when you have a player who struggles against a certain type of pitcher.

At the end of the day, this article is just a collection of some of the ideas that a mediocre team could implement to try to win now and for the near future.  Many teams are already implementing some of these ideas.  If you have any further “smart” hacks that you think should be the gold standard for teams looking to improve in a cost-efficient manner, I’d love to hear them in the comments section.


Assessing George Springer’s Contract Situation

As many should know by now, George Springer is a highly touted and talented player, a huge part of the Houston Astros’ future. He has been on the brink of 40 home runs and 40 steals in both 2012 and 2013, shown that he could maintain his offensive output through each level of the minor leagues, and plays a premium position (center field) at an above average level; if he were to fix his strikeout issues, Springer would be seen as a top-10 outfielder in the American League coming into 2014.

Given the rules surrounding service time in the majors and the rapidly ascending price tags for premier talent, the Astros would have been perfectly content with keeping Springer in Triple-A Oklahoma City to begin the season. Springer was with the major league team for Spring Training and in 31 at-bats was not terribly successful, with only a .161 BA but a much improved strikeout-to-walk ratio of 11 strikeouts to 8 walks; this should have firmly signaled a demotion to Oklahoma City. The Astros, though, tried to change it up a bit and offered Springer the richest contract for a player with his experience, a 7 year/$23 million contract.

Springer’s representatives countered that the 3 years of arbitration that the Astros were buying out were worth more than the $7.6 million per year that were essentially to be bought out, rebutting that he would be worth closer to $10 million in arbitration. Springer’s declining of the Astros offer sent him back to minor league camp and he will now come to Houston sometime this summer when he is able to move his arbitration clock forward. There are a few questions that arise from this valuation of Springer and also Springer’s decision to not accept the offer.

The most rudimentary, yet essential, aspect to look at in regards to Springer is what the Astros valued George Springer at for the last 3 years of his contract. For better or worse, the contract that the Astros offered was geared towards buying out his arbitration years, so it is not fair to value this contract at $3.3 million a year because it is a ridiculous premise that the Astros assumed he was worth that money now. In fact, arguably the best player in baseball, Mike Trout, was only valued at $1 million a year pre-arbitration and it is a difficult argument to make that Springer is worth more than 3 times as much as Trout.

That being as it is, this contract should fairly be valued at $7.6 million for the 28-30 year old seasons for Springer. To assess what Springer’s price tag would buy the Astros on the open market, a thorough analysis of trends of free agent outfield salaries from 2006-2013 needed to be conducted. This analysis looked at all outfielders that were signed for $6-$10 million per season. A quick analysis of the data shows that a 34 year old outfielder with a 2.5 WAR would get roughly the same amount of money on the open market as Springer would have received in the proposed extension. Furthermore, out of the 24 player sample, only Cody Ross and Coco Crisp have been better since they signed for a similar amount as what was offered to Springer and each was over the past two seasons so there is very little of an inflation factor.

There are a lot of outside factors and reasons why these players received the amount of money that they did, and there is also the fact that $23 million would represent the most money given to a player with minimal experience (Evan Longoria received $17.5 million in 2008). A lot of the liability of this deal was in the hands of the Astros, as Springer is more of an unknown than a proven commodity. The Astros are at a position with their franchise where they would take this liability; Houston is one of the strongest markets in the US, there is a new ownership group in place that has shown a willingness to spend, and by 2017 the team expects to contend. To take it a step further, the team is willing to take a financial hit of nearly $10 million, assuming that the team is not successful on the field for the 2014-2016 seasons, just to be able to save that money for 2017 and 2018.

The final point may be where Springer’s agents had flawed logic; they are looking at the best wishes of their client as well they should, but in fact this may be a good deal for Springer. The Astros have shown that they are building for the future and are not going to spend money to be decent — there are many that see the Astros as tanking but really they are looking at their present day weaknesses and making them future strengths — so the team spending $10 million on a prospect while the team is still developing should be taken as a huge victory. Springer’s agents are right in assuming that he is going to lose money in arbitration and, if he did sign that contract, he would have been a free agent at 30 years old which is outside of his athletic prime of 27 years old.

This may be outside of his athletic prime, but definitely not outside of his financial prime. Since 2000, fourteen outfielders have earned in excess of $17 million over at least five years of a contract.  Assuming that Springer produced about 4.5 WAR per year from 2014-2018, a common projection for Springer, he could earn $21 million per season in the open market according to the averages set by those players’ contracts.  There are tremendous issues in that valuation of Springer, namely a shift in the market and a regression in Springer’s talents, but judging by recent trends, Springer would still be very well on his way to being one of the wealthiest outfielders in baseball history.

This contract is an outlier and was a fantastic idea for the Astros in attempting to fund their future and assure that Springer was a well paid player for his production. The Astros knew that Springer would outplay that contract and may very well have had provisions in the contract for enhanced performance, but at the onset, all of the liability was on Houston. There is very little reason why Springer needed to be on the 2014 Astros and the team’s financial shrewdness and outstanding player development are main reasons why pundits predict a bright future for the team; Springer was offered this contract as a statement towards the future and it may be short-sighted for Springer to have declined this fair offer. As seen with the case of Longoria, players that outgrow their contracts are usually paid handsomely by their parent teams. By signing this deal, Springer would have opened the pipelines for better negotiations between himself and the team. Declining the deal may be good for Springer in the short term but may be a major blow for negotiating a massive extension for the future.


Talkin’ About Playoffs

While watching the playoffs last October, I realized that I had never seen rookies play such a prominent role in the postseason before.  Pitchers like Michael Wacha, Gerrit Cole, Hyun-Jin Ryu, and Sonny Gray propelled their teams into contention during the regular season, and took the hill in multiple elimination games.  The inimitable Yasiel Puig had a similar impact on the Dodgers’ fortunes in 2013.

This observation led me to investigate rookie performance during the 2013 regular season.  Were rookies contributing to the success of their teams more so than in the past?  Were rookie pitchers outperforming rookie hitters?  How about rookies on playoff teams versus non-playoff teams?

Using WAR data from Baseball Reference (sorry, guys) I measured rookies’ contribution to overall team success in 2000-2013, defined as rookie WAR divided by their team’s WAR.  A few definitions before jumping in to the findings:

  • Rookies are players who have accumulated less than 130 AB (or 50 IP) and less than 45 days on an active roster prior to their rookie season
  • For consistency across time, teams that won the second wild-card slot in 2012 and 2013 are not considered playoff teams (u mad, Reds and Indians fans?)
  • Rookie pitcher WAR = amount of WAR created by a team’s rookie pitchers
  • Rookie pitcher share of WAR = % of a team’s WAR created by rookie pitchers
  • Rookie batter WAR = amount of WAR created by a team’s rookie batters
  • Rookie batter share of WAR = % of a team’s WAR created by rookie batters
  • Rookie total WAR = Rookie batter WAR + Rookie pitcher WAR
  • Rookie share of total WAR = Rookie pitcher share of WAR + Rookie batter share of WAR
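To make the definitions above concrete, here is a minimal sketch of the share calculation for a single team. The function name and the WAR figures in the example are invented for illustration and are not taken from the Baseball Reference data.

```python
def rookie_shares(team_war, rookie_pitcher_war, rookie_batter_war):
    """Rookie WAR totals and shares for one team, following the definitions above."""
    pitcher_share = rookie_pitcher_war / team_war
    batter_share = rookie_batter_war / team_war
    return {
        "rookie_total_war": rookie_pitcher_war + rookie_batter_war,
        "rookie_pitcher_share": pitcher_share,
        "rookie_batter_share": batter_share,
        "rookie_share_of_total_war": pitcher_share + batter_share,
    }

# Hypothetical team: 40 WAR overall, 4 from rookie pitchers, 2 from rookie batters.
example = rookie_shares(40.0, 4.0, 2.0)
print(example)
```

Averaging `rookie_share_of_total_war` over all 30 teams in a season gives the kind of league-wide figure plotted in the charts that follow.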

In chart 1, rookie share of total WAR for the average team in 2013 (11.3%) is above the long-run average of 8%, and was only exceeded in 2006 (12.7%).  But there was no discernible difference in rookie share of total WAR between the average playoff team (10.9%) and non-playoff team (11.4%) last season.  So far, it would appear as though I need to adjust my TV.

The data becomes more interesting when the average team’s rookie share of total WAR is decomposed into pitchers’ and batters’ contributions (chart 2).  There is a rapid rise in rookie pitcher share of WAR between 2010 and 2013, peaking last season at 6.7% of the average team’s WAR.  This increase was so strong, it more than made up for a decrease in rookie batter share of WAR during the same timeframe, from 6.5% in 2010 to 4.6% last season.

These trends become starker when the analysis is limited to playoff teams (chart 3).  On the average playoff team in 2013, rookies provided 10.9% of WAR, a step down from the high reached in 2012.  But there is still a huge rise in rookie pitcher share of WAR between 2010 and 2013, to 8.7% last season, and a concurrent decrease in rookie batter share of WAR, to 2.2%.  In other words, 80% of the average 2013 playoff team’s rookie total WAR was generated by pitchers.  If not for a certain Cuban-American hero with a penchant for bat-flipping, that share would have been even higher.

But some evidence, as well as anecdotal observation, suggests that pitchers in general have become more dominant over the past few seasons.  Is this trend, observed so far among rookies, true of all pitchers?  Over the past fourteen seasons, the average team has generated between 36-44% of WAR from pitchers (chart 4).  This share has been consistent over time, and has edged up only slightly during the past few seasons.  This suggests that rookie pitchers, especially those on playoff teams, really did excel in 2013.

Now, let’s look at just how good the rookie pitchers on playoff teams were last season (chart 5).  Together, the 54 rookie pitchers on 2013 playoff teams generated 29.6 WAR, which is slightly higher than last year’s total (29.1 WAR) and much higher than the long-run average (16.0 WAR).  What’s even more impressive is that last season, 57% of all 30 teams’ rookie pitcher WAR was generated by the rookie pitchers on playoff teams, a higher share than in any other season since 2000.  Cumulatively, 54 rookie pitchers on 8 teams outperformed 151 rookies on 22 teams.  Not bad.

But wait…there’s more.  By focusing on the best rookies on playoff teams (arbitrarily defined here as those who generated 1+ WAR), we see that there were 20 such players last season (chart 6).  Of that number, 16 were pitchers, like Shelby Miller, Hyun-Jin Ryu, and Julio Teheran.  Five of those pitchers were on the Cardinals (Miller, Siegrist, Wacha, Rosenthal, and Maness).  The concentration of top rookie pitchers on playoff teams last year is the highest in at least fourteen seasons.

My initial observation, “Wow, there are lots of rookie pitchers killing it in the 2013 playoffs!” looks to be borne out in the data.  This raises two other interesting questions:

1.  For any of last year’s playoff teams, did rookie pitchers provide enough value to get their team into the playoffs?

2.  Is the rookie pitcher observation a one-time anomaly, or indicative of a larger trend?

The first question is relatively easy to answer.  We can compare each playoff team’s rookie pitcher WAR (essentially, how many more games the team won because of rookie pitchers) to the number of additional games each playoff team could have lost and still made the playoffs without tying a second-place team (let’s call this the buffer). 

For four out of eight playoff teams (again, I exclude the second wild-cards), rookie pitcher WAR is higher than the buffer (chart 7).  But since Detroit and Tampa made the playoffs by one game, and since Pittsburgh’s rookie pitcher WAR is less than one game higher than the buffer, it’s hard to argue that rookie pitchers definitively moved the needle for them. Andy Dirks or Yunel Escobar could have just as easily gotten their teams over the hump, since they also created more than 1 WAR.

The Cardinals are the one team whose rookie pitchers probably got them into the playoffs.  They got 9.7 extra wins from their rookie pitchers (almost 23% of the entire team’s WAR), and made the playoffs by 6 games.
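The buffer comparison can be sketched in a few lines of Python.  This is only an illustration: the class and method names are hypothetical, and it assumes the Cardinals’ 6-game margin can be used directly as their buffer.  Only the Cardinals’ figures from the text are plugged in, since the other teams’ exact numbers live in chart 7.

```python
from dataclasses import dataclass


@dataclass
class PlayoffTeam:
    """Hypothetical container for the two numbers being compared."""
    name: str
    rookie_pitcher_war: float  # extra wins credited to rookie pitchers
    buffer: float              # extra losses the team could absorb and still qualify

    def surplus(self) -> float:
        """Rookie pitcher WAR beyond what the team needed to qualify."""
        return self.rookie_pitcher_war - self.buffer

    def rookies_exceeded_buffer(self) -> bool:
        """True when rookie pitcher WAR is larger than the buffer."""
        return self.surplus() > 0


# Cardinals figures from the text: 9.7 rookie pitcher WAR, 6-game margin
# (treating that margin as the buffer is an assumption of this sketch).
cardinals = PlayoffTeam("Cardinals", rookie_pitcher_war=9.7, buffer=6)
print(cardinals.rookies_exceeded_buffer(), round(cardinals.surplus(), 1))  # True 3.7
```

A positive surplus alone isn’t conclusive.  As with Detroit, Tampa, and Pittsburgh, a surplus of less than a game or so could just as easily be credited to any other 1+ WAR player on the roster.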

The second question is harder to answer, since the 2014 season hasn’t started yet.  There’s no clear reason why rookie pitchers on playoff teams would suddenly start playing extremely well, especially since it doesn’t look like they’re causing their teams to make the playoffs.  The likeliest explanation is that the top teams in the league happened to have outstanding rookie pitchers last year.  Sometimes, “stuff” happens.

But if you want to prove me wrong, and show that last year’s playoff teams have developed great farm systems capable of producing more top rookie pitchers, pay close attention to what Jameson Taillon (Pirates), Carlos Martinez (Cardinals), Jake Odorizzi (Rays), and Allen Webster (Red Sox) bring to the table in 2014.  All four pitchers are on Baseball America’s list of top 100 prospects, are on last year’s playoff teams, and are projected to crack the majors this season.  If they get off to a hot start, and if they help their teams return to the playoffs, I might have to revisit my conclusion next winter.