Archive for March, 2013

Measuring a pitcher’s ability, performance, and contribution

I’d like to share some of my thoughts and research on how we evaluate Major League Baseball pitchers. I think for the most part when we use statistics to discuss a pitcher, we are really looking at the pitcher from one or more of the following three perspectives: 1) ability, 2) performance, and 3) contribution. Before I get into my research, I will take a moment to describe what I mean by each of the three terms.


When I use the word ‘ability’, I am describing the physical and mental skills the pitcher has at his disposal. Some examples of ability are: how hard he can throw, what kind of movement he has on his pitches, how well he can locate, how well he mixes his pitches, etc. With the introduction of PITCHf/x, we are now capable of measuring ability better than ever before. That being said, it is still difficult to accurately and meaningfully quantify many aspects of ability. Since a pitcher’s performance is based at least in part upon his ability, performance statistics can sometimes be used as a substitute for direct ability measures.


Performance literally describes how well a pitcher performed. In other words, it refers to the outcome or outcomes resulting from that pitcher throwing pitches. Nearly all baseball statistics describe performance. Some statistics measure a pitcher’s individual performance fairly well, whereas others combine the pitcher’s performance with the performance of his team and other factors. For example, ERA is generally not considered a great measure of a pitcher’s individual performance, whereas FIP is considered a better one.


I have not found much reference to the word ‘contribution’ in the baseball literature, but I do think it is an important concept to consider. By contribution, I mean how much a pitcher helps his team win baseball games. By this general definition, I suppose ERA (and other performance measures) could also be considered a contribution measure in some respects, since wins are related to runs allowed. For that reason, I propose that ability, performance, and contribution are not separated by solid lines but instead form a spectrum, where each statistic can be considered partly a member of each category. However, in an attempt to clear up this somewhat murky discussion, I will offer stats such as W-L, WAR and WPA as the most obvious contribution stats*.

*Note: Contribution stats can be measured directly (e.g., W-L) or derived from performance stats (e.g., FanGraphs WAR is derived from FIP).


Now on to my research… The hypothesis that drove this work was: pitcher ability measures are more consistent between seasons than performance or contribution. This hypothesis is based on my belief that unlike performance and contribution, which are affected by countless outside factors, a pitcher’s ability is within himself and therefore less likely to dramatically change between seasons.

To test this, I took every pitcher who pitched a minimum of 120 innings in each season from 2008-2011. This gave me a pool of 63 pitchers.

For my ability measure, I used whiff/swing. I like this measure because, to me, it is the simplest measure of an isolated part of a pitcher’s ability. Since the batter has already decided to swing, we are only looking at the pitcher’s ability to throw a ball that will evade the hitter’s bat. I know that whether the batter makes contact also depends heavily on the hitter’s ability, but I think that using pitchers who pitched 120 innings in each season, and therefore faced a large and varied set of hitters, lets me take the individual batter out of the equation and use this as a measure of pitcher ability.

For my performance measures I used ERA and FIP from FanGraphs. I agree ERA is not the best performance measure, and it may be considered more of a contribution measure; however, I have included it nonetheless. Finally, for my contribution measure I used FanGraphs WAR.

I calculated the average whiff/swing, ERA, FIP, and WAR for each pitcher over the four-year period. I also calculated each pitcher’s standard deviation for each stat and the within-pitcher coefficient of variation (stdev/avg). The coefficient of variation is a good way to compare the variability of these statistics over the four seasons because it divides each stat’s spread by its own mean, removing the units and making stats on very different scales (whiff/swing around 0.2, ERA around 4) comparable.
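The per-pitcher calculation described above can be sketched in Python. This is a minimal illustration, not the author’s actual spreadsheet: the function names and sample numbers are hypothetical, and it assumes the sample standard deviation (n - 1 denominator), which the text does not specify.

```python
from statistics import mean, stdev


def qualifies(ip_by_season, seasons, min_ip=120):
    """True if the pitcher threw at least min_ip innings in every season."""
    return all(ip_by_season.get(season, 0) >= min_ip for season in seasons)


def within_pitcher_cv(values):
    """Coefficient of variation (stdev / mean) of one pitcher's seasonal values.

    Assumes the sample standard deviation and a positive mean.
    """
    return stdev(values) / mean(values)


def average_cv(stat_by_pitcher):
    """Average the within-pitcher CVs across every pitcher in the pool."""
    return mean(within_pitcher_cv(values) for values in stat_by_pitcher.values())


# Hypothetical whiff/swing values for two pitchers, 2008-2011 (not real data).
whiff_per_swing = {
    "Pitcher A": [0.20, 0.22, 0.18, 0.20],
    "Pitcher B": [0.25, 0.25, 0.25, 0.25],
}
print(round(average_cv(whiff_per_swing), 4))
```

Because the CV divides by the mean, it behaves badly for a stat like WAR whose four-year average can sit near zero (or below it) for some pitchers, which is worth keeping in mind when reading the WAR figure.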

Globally, over the four-season period the 63 pitchers in my group had an average:
whiff/swing = 0.205
ERA = 4.03
FIP = 3.97
WAR = 3.08.

The average within pitcher coefficient of variation was:
9.6% for whiff/swing
18.5% for ERA
12.0% for FIP
and 47.7% for WAR.


So what does this mean? I know this is just a start, but I believe these results support my hypothesis: a pitcher’s ability is much more consistent between seasons than his performance or contribution. Furthermore, performance is more consistent than contribution. It appears that the further you get from pure ability measures, the more difficult it becomes to accurately and reliably predict a pitcher’s future performance and contribution. I’d like to do further research on performance prediction to confirm this, but my guess is that trying to predict future WAR from past WAR will be extremely difficult. Predicting future WAR from past ability measures may prove more effective.

Bill “Moneyball” Veeck

I was sitting on a park bench reading Veeck as in Wreck, the memoir of legendary ballclub owner Bill Veeck, when I came across this passage:

Ken Keltner, our third baseman and one-time power hitter, had a miserable season in 1946. There seemed little doubt that he was on the downgrade. Still, when I signed him for the next year, I gave him the same amount of money and told him that if he had what I considered a good year I’d give him a bonus of $5,000.

The next year, Kenny hit the ball better than anybody on our club, with less luck than anybody in the league. If you walked into the park late and saw somebody making a sensational leaping, diving backhanded catch, you could bet that Keltner had hit the ball.

On the last day of the season, he was hitting under .260 and had driven in around 75 runs. I called down to the locker room, got him on the phone, and said, “Hey, where have you been? Weren’t you supposed to come up and see me at the end of the season?”

“I didn’t win anything,” he said. “I’m having a lousy season.”

I suggested that he wander up anyway. As he came through the door I said, “I’ve got $5,000 for you.”

And he said, “I didn’t earn it, Bill.” And he started to weep.

“You hit the ball better than anybody else on this club,” I told him. “It wasn’t your fault they kept catching it.”

As a loyal FanGraphs reader, I immediately thought: BABIP! For those who need a quick reminder, batting average on balls in play (BABIP) measures just that: batting average on balls hit somewhere the defense can get to them. It’s expected that BABIP will generally hover around .300, modified by such factors as the enemy defense (this averages out over a season), whether the balls you hit go over outfield fences, and, most of all, luck.
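The standard formula, BABIP = (H - HR) / (AB - K - HR + SF), fits in a one-line Python function. The season line below is hypothetical, not Keltner’s (or anyone’s) actual totals; for seasons in which sacrifice flies were not recorded, SF can simply be left at 0.

```python
def babip(h, hr, ab, k, sf=0):
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF).

    Home runs and strikeouts are removed because the defense never gets a
    chance to field them. Leave sf at 0 if sacrifice flies were not tracked.
    """
    return (h - hr) / (ab - k - hr + sf)


# Hypothetical season line (not real data).
print(f"{babip(h=150, hr=20, ab=550, k=80, sf=5):.3f}")
```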

Now, Veeck’s comment that Keltner “hit the ball better than anybody else” was probably a kindness rather than a hypothesis. But his observation that “they kept catching it” checks out. I looked at the leaderboard for the BABIPs of every qualifying hitter in 1947. Sure enough, Ken Keltner’s down near the bottom, ranking 68th of 86 with a BABIP of .264. The median that year was almost thirty points higher: .292.

Ken Keltner had lousy luck, but was still an average hitter (102 wRC+). And the next year was the best of his career (7.9 WAR), so it looks like Bill Veeck saw the Keltner case exactly right. Only there’s a twist. One of Veeck’s 1947 Indians had it even worse. Down there at 74th is the .256 BABIP of Joe Gordon. Joe Gordon slugged 27 doubles, 6 triples, and 29 home runs, so things turned out well for him, but if Veeck’s latecomer had bet that “a sensational leaping, diving backhanded catch” was on a ball hit by Ken Keltner, you’d want to bet against him. Joe Gordon’s luck was worse; he compensated by putting more balls in the outfield bleachers.

There’s weirder to come. Dead last, 86th of 86, is Roy Cullenbine, Tigers first baseman, who paired a grotesque .206 BABIP and .224 average (83rd of 86) with the second-highest walk rate in baseball. His 22.6% walk rate was topped only by Triple Crown winner Ted Williams. (By the way, in the previous year, Williams had been introduced to the defensive shift, as pioneered by, yes, Bill Veeck’s Indians.)

No player in 2012 came close to matching Cullenbine’s bizarre season. The lowest BABIP of any qualifying hitter in 2012 was .242 (Justin Smoak); of all hitters with BABIPs below .256 (fifty points higher than Roy Cullenbine’s), none came within fifty points of Cullenbine’s .401 OBP. The best analogy is this: Cullenbine hit for average like Dan Uggla, had Justin Smoak’s luck, and still drew walks at the rate of Barry Bonds.

Roy Cullenbine was only 33 in 1947, and in past years his offensive numbers were impressive. Had he been on Bill Veeck’s Indians instead of playing for the Tigers, his unlucky 1947 might have ended as Ken Keltner’s did, with a $5,000 bonus. The Tigers, not valuing Cullenbine’s patience, released him, and he never played a major-league game again.

There’s another interesting name among the ten unluckiest batters of 1947. Coming in at sixth-worst, with a BABIP of .247, is a patient slugger who got on base even more than Cullenbine did, with four more walks than he had hits. He too retired after the season. His name was Hank Greenberg, and that winter he accepted a job in a major-league front office, where he was groomed to be the team’s next general manager. The team was the Cleveland Indians. His new boss was Bill Veeck.

Can the WBC be fixed?

While this year’s World Baseball Classic has certainly been a success, it does not have the juggernaut status that soccer’s World Cup or the Olympics currently hold. The Classic will probably never approach the success of those two international tournaments, but it does have the potential to spread interest in baseball and expand the game around the world, particularly in places like Europe or China. In order for baseball to grow, it has to reach new fan bases outside of the United States, where its market appears to be near its maximum. The WBC is a nice boost to baseball’s international growth, but it needs a few modifications to truly reach its potential.

The problem with the current round-robin format is low attendance and interest for games between two lesser-known countries. In Pool A, three of the six games drew fewer than 5,000 fans, while the other three drew more than 10,000 fans each, and two of those drew more than 25,000. The attendance figures in Pool B were even more extreme: three of the games drew fewer than 2,000 fans, while the other three drew more than 20,000 each. To combat this problem, there have been numerous suggestions to turn the tournament into a single-elimination format, as Dave Cameron suggested in his post “Fixing the WBC”. This format is definitely the best option for the tournament, as the win-or-go-home atmosphere would increase interest and attendance in each game. Since every game would pit a high-seeded team against a low-seeded team, the low-interest games of fewer than 5,000 fans would hopefully be eliminated.

The other advantage to the single-elimination tournament is the elimination of the silly WBC rules and tiebreaking procedures. Run differential would no longer be the difference between advancing out of a pool and going home. The pointless games to determine seeding at the end of the second round would also be eliminated. Perhaps the pitch limits would go away as well because teams would play fewer games. The tournament would no doubt gain some relevancy if the silly rules and restrictions were eliminated.

Most of the potential changes to the WBC involve shortening it to a week or so. While most would agree that the current format is too long, MLB might not bite on a change that shortens the tournament to a mere week. The solution: why not expand the number of teams to 32? The current 16 teams would stay, and all the teams that participated in the qualifier would be added as well. That adds up to 28 teams. I wasn’t really sure what the four other teams could be, so I came up with Pakistan, Russia, Belgium, and Austria. I’m sure there might be better teams out there, but let’s proceed with these four teams to make it easy. To determine the format, I divided the tournament into four conferences: Northwest, Euro, East, and South:

East:

1. Japan
2. South Korea
3. Taiwan
4. China
5. Israel
6. Czech Republic
7. Pakistan
8. Russia

South:

1. Venezuela
2. Australia
3. Brazil
4. Colombia
5. South Africa
6. New Zealand
7. Philippines
8. Thailand

Euro:

1. Netherlands
2. Italy
3. Spain
4. Germany
5. United Kingdom
6. France
7. Belgium
8. Austria

Northwest:

1. Dominican Republic
2. United States
3. Puerto Rico
4. Cuba
5. Canada
6. Mexico
7. Panama
8. Nicaragua

The current March timing for the WBC works OK, but it’s not perfect. The All-Star break doesn’t work either, because MLB would never agree to nix the “beloved” event. That leaves the winter. I’m not sure the middle of the winter makes sense, because the offseason is in full swing and free agents wouldn’t want to participate for fear of getting injured. That leaves November and February. Both of these times make sense to me, but I think the players would be less than thrilled to participate right after the postseason. That leaves February. The absence of football is a plus, and players wouldn’t have the excuse of spring training to avoid participation. Assuming that spring training starts March 1, here are some potential dates:

February 14: 4 East First Round Games

February 15: 4 South First Round Games

February 16: 4 Euro First Round Games

February 17: 4 Northwest First Round Games

February 19: 2 East Semifinal Games, 2 South Semifinal Games

February 20: 2 Euro Semifinal Games, 2 Northwest Semifinal Games

February 22: East Final Game, South Final Game

February 23: Euro Final Game, Northwest Final Game

February 25: East Winner vs. South Winner

February 26: Euro Winner vs. Northwest Winner

February 28: Final Game
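The proposed schedule’s game count can be checked with a small Python helper (the name is just for illustration): a single-elimination bracket needs one game per eliminated team, so a 32-team field plays 31 games over five rounds, and the eventual champion plays five of them.

```python
def games_per_round(num_teams):
    """Games in each round of a single-elimination bracket.

    num_teams must be a power of two so that every round pairs off evenly.
    """
    if num_teams < 2 or num_teams & (num_teams - 1):
        raise ValueError("num_teams must be a power of two >= 2")
    rounds = []
    while num_teams > 1:
        rounds.append(num_teams // 2)  # half the teams are eliminated each round
        num_teams //= 2
    return rounds


rounds = games_per_round(32)
print(rounds)       # [16, 8, 4, 2, 1]
print(sum(rounds))  # 31
print(len(rounds))  # 5
```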

The close proximity of these games might require them to be played in a single country, as opposed to the multi-country format used now. I’m not sure how many countries besides Japan and the United States could host a two-week tournament. Perhaps Japan and the US could alternate until other countries become viable hosts. Alternatively, the regional games could be held in their own regions, with the winners meeting for the semifinals somewhere else, like the current format. It would be great if European countries or other big countries like India could host the WBC, but currently it doesn’t seem likely.

Overall, this format offers some significant advantages over the current one. This version of the Classic would have 31 games, only eight fewer than the current format, which would appeal to MLB because the new version could generate a comparable amount of revenue. At the same time, individual teams would play fewer games, potentially attracting the big stars currently holding out. Already, we have seen players like Chase Headley, Jurickson Profar, Gio Gonzalez, and Kenley Jansen join the Classic in the later rounds, when there are fewer games to play. Additionally, players competing for a job in spring training would be more enticed to join the Classic because it provides another opportunity to showcase their talent to teams. The injury risk would be lower because (1) there are fewer games to play and (2) players would have more time to recover from an injury before the season. Yes, baseball would start earlier, but hopefully this format would attract players the same way the World Cup does for soccer. With increased player participation, more exciting games, more teams involved, and a time frame that doesn’t compete with baseball’s own spring training, these changes make sense for MLB, the players, and most importantly, the fans.

The True Dickey Effect

Most people who try to analyze the Dickey effect (the idea that hitters fare worse against the pitchers who follow knuckleballer R.A. Dickey) lump all the pitchers who follow him into a single group with one ERA and compare it to the overall ERA of the bullpen or rotation. This is a simplistic approach that ignores how often each of those pitchers pitches when not following Dickey, so differences in the mix of pitchers can masquerade as an effect.

I set out to determine whether there truly is an effect on the statistics (ERA, WHIP, K%, BB%) of pitchers who follow Dickey in relief, and of starters who pitch the next game against the same team. I went through every game Dickey has pitched and recorded the stats (IP, TBF, H, ER, BB, K) of each reliever individually, plus the stats of the next starting pitcher if the next game was against the same team. I did this for each season. I then subtracted each pitcher’s after-Dickey stats from his full-season stats to get his stats when not following Dickey. I summed the after-Dickey stats, calculated the rate stats from the totals, and weighted each pitcher by his share of the total batters faced after Dickey. This weight was then applied to the not-after-Dickey stats: for example, if Francisco faced 19.11% of the batters after Dickey, his not-after-Dickey stats were adjusted so that he also faced 19.11% of the batters not after Dickey. This puts both samples on the same pitcher mix, so the comparison reflects the situation rather than who happened to pitch. The not-after-Dickey stats were then summed and their rate stats calculated. Finally, I compared the two sets of rate stats using (afterDickeySTAT - notafterDickeySTAT)/notafterDickeySTAT, which expresses how much better or worse relievers or starters did when following Dickey as a percentage.

I then combined the after-Dickey stats for starters and relievers across all three years, along with the corresponding not-after-Dickey stats, and applied the same weighting: if Niese ’12 faced 10.9% of all batters faced by starters following a Dickey start against the same team, his stats were adjusted so that he faced 10.9% of the batters faced by starters not after Dickey (counting only the starters who pitched after Dickey that season). The percentage change for each stat was then calculated the same way as in the year-by-year comparison.
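The weighting scheme described above can be sketched in Python. This is a hypothetical illustration rather than the author’s actual workbook: the `Line` record, the function names, and the numbers are invented, and only ERA, WHIP, K% and BB% are shown (HR%, BABIP and FIP would follow the same pattern). Scaling each pitcher’s not-after-Dickey line to the total not-after batters faced is an assumption; any common scale factor cancels out of the rate stats.

```python
from dataclasses import dataclass

# Counting stats tracked for each pitcher in each split.
STATS = ("tbf", "ip", "er", "h", "bb", "k")


@dataclass
class Line:
    """One pitcher's counting stats in one split (after / not after Dickey)."""
    tbf: float  # total batters faced
    ip: float   # innings pitched
    er: float   # earned runs
    h: float    # hits allowed
    bb: float   # walks
    k: float    # strikeouts

    def scaled(self, factor):
        """Return a copy with every counting stat multiplied by factor."""
        return Line(*(getattr(self, s) * factor for s in STATS))


def total(lines):
    """Sum counting stats across pitchers."""
    return Line(*(sum(getattr(line, s) for line in lines) for s in STATS))


def rates(line):
    """Rate stats computed from (summed) counting stats."""
    return {
        "ERA": 9 * line.er / line.ip,
        "WHIP": (line.h + line.bb) / line.ip,
        "K%": line.k / line.tbf,
        "BB%": line.bb / line.tbf,
    }


def reweight(after, not_after):
    """Scale each pitcher's not-after line so his share of batters faced
    equals his share of batters faced after Dickey."""
    after_tbf = sum(line.tbf for line in after)
    not_after_tbf = sum(line.tbf for line in not_after)
    weighted = []
    for a, n in zip(after, not_after):
        target_tbf = a.tbf / after_tbf * not_after_tbf
        weighted.append(n.scaled(target_tbf / n.tbf))
    return weighted


def dickey_effect(after, not_after):
    """(afterDickeySTAT - notafterDickeySTAT) / notafterDickeySTAT per rate stat."""
    after_rates = rates(total(after))
    base_rates = rates(total(reweight(after, not_after)))
    return {s: (after_rates[s] - base_rates[s]) / base_rates[s] for s in after_rates}


# Hypothetical data: each pitcher has identical rates in both splits,
# but pitcher A dominates the not-after-Dickey sample.
after = [Line(40, 10, 4, 8, 3, 10), Line(60, 15, 9, 14, 5, 12)]
not_after = [Line(400, 100, 40, 80, 30, 100), Line(120, 30, 18, 28, 10, 24)]

print(dickey_effect(after, not_after))  # every value effectively 0

# Pooling the raw not-after totals instead shows a spurious ERA "effect".
naive = rates(total(after))["ERA"] / rates(total(not_after))["ERA"] - 1
print(naive)  # positive, caused only by the pitcher mix
```

The example shows why the reweighting matters: when every pitcher performs identically in both splits, the weighted comparison finds no effect, while naively pooling the raw not-after-Dickey totals reports a worse after-Dickey ERA that comes entirely from which pitchers happened to pitch more.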

Here is the weighted year-by-year breakdown of the starters’ statistics following Dickey, followed by the total (a negative value indicates a decrease, which is desirable for every stat except K%):

ERA: -46.94%  with 5/5 starters seeing a decrease
WHIP: -16.16% with 4/5 seeing a decrease
K%: 47.04% with 4/5 seeing an increase
BB%: 6.50% with 3/5 seeing a decrease
HR%: -50.53% with 5/5 seeing a decrease
BABIP: -14.08% with 4/5 seeing a decrease
FIP: -25.17% with 5/5 seeing a decrease

ERA: 17.92%  with 0/3 seeing a decrease
WHIP: -9.63% with 2/3 seeing a decrease
K%: -2.64% with 2/3 seeing an increase
BB%: -15.94% with 2/3 seeing a decrease
HR%: -9.21% with 2/3 seeing a decrease
BABIP: -15.14% with 2/3 seeing a decrease
FIP: -5.58% with 2/3 seeing a decrease

ERA: -23.82%  with 5/7 seeing a decrease
WHIP: 1.68% with 5/7 seeing a decrease
K%: -22.91% with 1/7 seeing an increase
BB%: -2.34% with 5/7 seeing a decrease
HR%: -43.61% with 5/7 seeing a decrease
BABIP: -3.61% with 4/7 seeing a decrease
FIP: -10.61% with 5/7 seeing a decrease

Total (all three seasons):
ERA: -17.21%  with 10/15 seeing a decrease
WHIP: -8.10% with 11/15 seeing a decrease
K%: -3.38% with 7/15 seeing an increase
BB%: -5.17% with 10/15 seeing a decrease
HR%: -32.96% with 12/15 seeing a decrease
BABIP: -11.04% with 10/15 seeing a decrease
FIP: -13.34% with 12/15 seeing a decrease

So for starters who pitch the game after Dickey against the same team, the data suggest an effect on ERA, WHIP, BABIP, and FIP, and a slight effect on BB% and K%. There is also a large effect on HR rate, which likely accounts for much of the ERA effect. This also suggests that batters are making worse contact the day after facing Dickey.

So a starter (like Morrow) who follows Dickey against the same team can expect around a 17.2% reduction in his ERA in that game, compared to games in which he is not following Dickey against the same opponent. For example, if Morrow had a 3.00 ERA in games not after Dickey, he could expect a 2.48 ERA in games after Dickey.

So in a full season in which Morrow follows Dickey against the same team two-thirds of the time (about 66%; games 2 and 3 of a series) and would normally have a 3.00 ERA without Dickey ahead of him, he could expect a 2.66 ERA for the season. This seems to be a significant improvement and would equate to a 7.6-run difference (or 0.8 WAR) over 200 innings.
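The starter arithmetic above can be reproduced with a short Python sketch. Assumptions: the share of innings pitched right after Dickey is taken as exactly 2/3, which reproduces the 2.66 season ERA and 7.6 runs in the text; every start is assumed to cover the same number of innings; and WAR uses the common rule of thumb of 10 runs per win. The function names are hypothetical.

```python
RUNS_PER_WIN = 10  # rule of thumb: about 10 runs per win


def after_dickey_era(base_era, pct_change):
    """ERA in games after Dickey, given the weighted percent change (e.g. -0.1721)."""
    return base_era * (1 + pct_change)


def season_era(base_era, pct_change, share_after):
    """Blend after-Dickey and normal ERA by the share of innings following Dickey."""
    return share_after * after_dickey_era(base_era, pct_change) + (1 - share_after) * base_era


def runs_saved(base_era, new_era, innings):
    """Earned runs saved over the given number of innings."""
    return (base_era - new_era) * innings / 9


base = 3.00
era = season_era(base, -0.1721, share_after=2 / 3)
saved = runs_saved(base, era, innings=200)
print(f"after Dickey {after_dickey_era(base, -0.1721):.2f}, season {era:.2f}, "
      f"runs saved {saved:.1f}, WAR {saved / RUNS_PER_WIN:.1f}")
```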

Here is a year by year breakdown of relievers after Dickey (these are smaller sample sizes so I will not include how many relievers saw an increase or decrease):

ERA: -25.51%
WHIP: -1.57%
K%: 27.04%
BB%: -49.25%
HR%: -34.66%
BABIP: 30.23%
FIP: -38.34%

ERA: -17.43%
WHIP: 8.45%
K%: 6.74%
BB%: -5.14%
HR%: 7.34%
BABIP: 9.75%
FIP: -2.05%

ERA: -2.55%
WHIP: 7.69%
K%: -9.28%
BB%: 10.84%
HR%: 2.11%
BABIP: 4.23%
FIP: 9.43%

Total (all three seasons):
ERA: -16.61%
WHIP: 5.38%
K%: 7.50%
BB%: -12.65%
HR%: -8.53%
BABIP: 13.38%
FIP: -10.40%

As expected, relievers following Dickey saw improvements in ERA, FIP, K%, and BB%, but their WHIP and BABIP got worse. This suggests that batters were swinging more freely right after facing Dickey (more hits, fewer walks, more strikeouts).

So in a season with 55 IP of relief after Dickey (as in 2012), runs allowed in those 55 innings would drop by about 16.6%. If the bullpen’s ERA is 4.20 without Dickey, it can be expected to be 3.50 after Dickey. Over 55 IP, this difference would save 4.3 runs (or 0.4 WAR).

Combine this with the starter runs saved and you get 11.9 runs saved (1.2 WAR). This is the underlying value Dickey creates for his team by baffling hitters. That 1.2 WAR assumes Morrow normally has a 3.00 ERA and the bullpen normally has a 4.20 ERA. If Morrow normally had a 4.00 ERA, his season ERA would drop to 3.54, saving 10.2 runs over 200 innings (1.0 WAR); if the bullpen normally had a 4.00 ERA as well, 4.1 runs would be saved there, for a total of 14.3 runs saved, or 1.4 WAR over a season.
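Both scenarios can be checked with one Python sketch. It uses the weighted total ERA changes (-17.21% for starters, -16.61% for relievers), 200 starter innings with two-thirds of them after Dickey, 55 relief innings after Dickey, and 10 runs per win; these are the text’s assumptions, and the helper names are hypothetical. Blending the two ERAs by share simplifies to base_era * (1 + share * pct_change).

```python
STARTER_PCT = -0.1721  # weighted ERA change, starters, all three seasons
BULLPEN_PCT = -0.1661  # weighted ERA change, relievers, all three seasons
RUNS_PER_WIN = 10      # rule of thumb


def runs_saved_after_dickey(base_era, pct_change, innings, share_after=1.0):
    """Runs saved when share_after of the innings come right after Dickey."""
    new_era = base_era * (1 + share_after * pct_change)
    return (base_era - new_era) * innings / 9


def total_runs_saved(starter_era, bullpen_era):
    """Starter, bullpen and combined runs saved for one season."""
    starter = runs_saved_after_dickey(starter_era, STARTER_PCT, 200, share_after=2 / 3)
    bullpen = runs_saved_after_dickey(bullpen_era, BULLPEN_PCT, 55)
    return starter, bullpen, starter + bullpen


for starter_era, bullpen_era in [(3.00, 4.20), (4.00, 4.00)]:
    s, b, t = total_runs_saved(starter_era, bullpen_era)
    print(f"{starter_era:.2f}/{bullpen_era:.2f}: starter {s:.1f}, bullpen {b:.1f}, "
          f"total {t:.1f} runs ({t / RUNS_PER_WIN:.1f} WAR)")
```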

Johnny B. Goode

Controlling the run game, pitcher fielding and ERA

Run & Glove

Johnny Cueto has been mocking his peripherals ever since his big league debut.  For the most part FIP serves as a terrific gauge for pitcher performance, but in 2011 Cueto made FIP look like a heart monitor trying to explain the weather.  On what most consider a separate note, base runners have a healthy and robust fear of Cueto’s pickoff move, which is one of the best in the show.

FIP measures outcomes a pitcher can control (home runs, walks and strikeouts) and chalks the rest up to random variation.  Studies have shown that stolen bases contribute relatively little to run creation and perhaps on that basis the ability to control the run game has generally been ignored or deemed overrated.

It is difficult, however, to ignore the six runs Cueto saved the Reds via his contributions to controlling the run game in 2012.  By contrast, A.J. Burnett’s inability to control runners cost the Pirates four runs. The typical scale is that 10 runs amount to one team win – and teams will pay about $5 million per win.
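The run-to-dollar conversion is simple arithmetic; this sketch applies the rules of thumb above (10 runs per win, about $5 million per win) to Cueto’s six runs saved and Burnett’s four runs cost. Both constants are approximations, not fixed values, and the function name is hypothetical.

```python
RUNS_PER_WIN = 10            # typical scale: 10 runs is about one win
DOLLARS_PER_WIN = 5_000_000  # approximate market price of one win


def run_value_in_dollars(runs):
    """Convert runs saved (positive) or cost (negative) to wins and dollars."""
    wins = runs / RUNS_PER_WIN
    return wins, wins * DOLLARS_PER_WIN


for pitcher, runs in [("Cueto", 6), ("Burnett", -4)]:
    wins, dollars = run_value_in_dollars(runs)
    print(f"{pitcher}: {wins:+.1f} wins, {dollars / 1e6:+.1f} million dollars")
```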

Acknowledging run game control cannot fully explain how Cueto has routinely outperformed his peripherals, just as it cannot wholly explain Pittsburgh’s inability to keep pace with Cincinnati in the NL Central last season.  It does, however, get us closer.

Incorporating a pitcher’s fielding ability proved comparably important in explaining and predicting performance. Here we’ll turn to Mark Buehrle, whose glove has saved an average of four runs per year since 2004; among fellow hurlers, the fast-working lefty has been one of the decade’s most steadily superb fielders. FIP underrated Buehrle in eight of the past nine seasons, coming in an average of .30 above his actual ERA over that span.


The numbers indicate that a pitcher’s defense and ability to control the run game should both be considered in assessing and forecasting the pitcher’s value.

Focusing on seasons in which pitchers hurled 100 same-league (AL or NL) innings from 2003-2012 (n=1400), I ran a multiple linear regression to create a formula (“MBRA”) incorporating run control (rSB) and pitcher fielding (rPM) on top of line drive and infield fly ball percentages (credit to BABIP guru Steve Staude) and a regressed take on FIP.

MBRA = (55.25*HR + 14.05*BB – 8.57*K)/TBF – .041*rPM – .056*rSB + (5.71*LD – 8.27*IFFB)/(LD+GB+FB) + 2.34 


[Table: Mean Absolute Error]

MBRA is engineered to properly credit pitchers who can field and control the run game. When I subtracted MBRA from FIP to find the pitcher-seasons that benefited most from my formula, I was encouraged to see Buehrle show up twice in the top ten and five times in the top 100 (again, out of 1400).

Next, I looked at seasons in which pitchers threw 100 same-league innings in consecutive seasons from 2003 to 2012 (n=791).  This time I ran a regression to create a model suited to predict a pitcher’s ERA based on his previous year’s statistics.

MBRAT = (20.12*HR + 7.13*BB – 6.7*K)/TBF -.025*rPM -.034*rSB + 2.37*ZC% + 2.22



[Table: Mean Absolute Error]

MBRAT stands tall on the lofty pinnacle of public forward-looking ERA estimators, and if you factor in the percentage of pitches thrown on the edges of the plate (EDGE%), its correlation jumps even higher (.4621). Unfortunately, I only have EDGE% data from 2008 to 2012 (n=362) and cannot yet justify its inclusion.
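MBRAT can be sketched the same way, together with the mean absolute error used to compare estimators. Assumptions: ZC% is entered as a fraction (0.88 rather than 88), the inputs are the previous season’s stats, and the prior-season lines and next-season ERAs in the example are hypothetical.

```python
def mbrat(hr, bb, k, tbf, rpm, rsb, zc):
    """MBRAT: next-season ERA estimate from prior-season stats.

    zc is ZC% expressed as a fraction (an assumption about its units).
    """
    return ((20.12 * hr + 7.13 * bb - 6.7 * k) / tbf
            - 0.025 * rpm
            - 0.034 * rsb
            + 2.37 * zc
            + 2.22)


def mean_absolute_error(predicted, actual):
    """Average absolute gap between predicted and actual ERA."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Hypothetical prior-season lines and next-season ERAs (not real data).
prior = [
    dict(hr=20, bb=50, k=150, tbf=800, rpm=4, rsb=2, zc=0.88),
    dict(hr=25, bb=60, k=120, tbf=820, rpm=-1, rsb=0, zc=0.90),
]
next_era = [3.60, 4.50]
predictions = [mbrat(**row) for row in prior]
print(f"MAE: {mean_absolute_error(predictions, next_era):.3f}")
```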

On Deck

I will create expectations for pitchers with fewer innings pitched and convert my findings to a WAR measure that may serve as a middle-ground between fWAR and rWAR.  I also stumbled on a potentially significant relationship between pick-off attempts and strand rates that may work its way into future formulas.