Adjusting defense efficiency by the quality of pitching

Fausto Carmona throws a hard sinker on the outside corner, but Ichiro Suzuki turns it into a well-struck ground ball by going the other way, splitting the defenders on the left side of the diamond. We know who should get credit for the single on the Mariners’ side of the box score—there was only one guy with a bat. But who on the Indians will take the blame for the single? Is it Carmona who made the pitch, or the defenders who could not get to the ball fast enough?

Bill James invented Defensive Efficiency, measuring the percentage of balls in play that a defense turns into outs. It became apparent just how useful this would be for evaluation of team defense when Voros McCracken famously concluded that, “There is little if any difference among major-league pitchers in their ability to prevent hits on balls hit in the field of play.” A natural corollary to this thesis says that to measure team defense, one should use Defensive Efficiency rate.
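
As a rough illustration of the definition above, Defensive Efficiency can be sketched as one minus the share of balls in play on which the batter reaches. The function name, argument names, and the team totals below are assumptions made for this example, and counting reached-on-error as a non-out is only one possible convention.

```python
def defensive_efficiency(balls_in_play: int, hits_in_play: int,
                         reached_on_error: int = 0) -> float:
    """Share of balls in play that the defense turns into outs."""
    if balls_in_play <= 0:
        raise ValueError("balls_in_play must be positive")
    # Batters who reached safely on a ball in play (assumed to include errors).
    times_reached = hits_in_play + reached_on_error
    return 1.0 - times_reached / balls_in_play


# Hypothetical team season: 4,350 balls in play, 1,290 hits, 60 reached on error.
der = defensive_efficiency(4350, 1290, 60)
print(f"Defensive Efficiency: {der:.3f}")
```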

However, since McCracken’s original thesis, the community has determined with certainty that while there is little difference between pitchers, there definitely are some major “little” differences. Following on work by J.C. Bradbury and others, I have shown that a pitcher’s ability to control the number of hits he surrenders on balls in play is well correlated with strikeout rate, walk rate and ground ball rate, the so-called “DIPS” (Defense Independent Pitching Statistics) that are not determined by the defense behind a pitcher. In fact, a pitcher’s BABIP in a given season correlates more with his DIPS the previous season than with his BABIP the previous season. In other words, DIPS predicts BABIP better than BABIP itself does.

As I close in on how to measure a pitcher’s ability to control BABIP without actually using what happened on balls in play, I have realized that I can actually see how much of team defensive efficiency is the fault of hurlers. It turns out that a large portion of defensive efficiency is pitching after all. I have shown the following to be true:

A) Pitchers who strike more hitters out give up fewer hits on balls in play.
B) Pitchers who induce fewer ground balls give up fewer hits on balls in play.
C) Pitchers who walk fewer hitters give up fewer hits on balls in play.

Using this information, I have found that the variance in BABIP among starting pitchers who pitch over 150 innings can be attributed approximately as follows:

A) 12 percent pitching skill
B) 13 percent team defense skill
C) 75 percent luck

Of the 12 percent that pitchers do control, you can predict about 10.4 percentage points using DIPS. Yes, pitchers do exhibit some control over their BABIP, but in an entirely estimable way. I think this passes the smell test, too, because if I try to imagine a pitcher whom you would expect to limit hits on balls in play, I picture one who fools hitters into whiffing a lot too, or perhaps one who pops a lot of hitters up.

One of the most underrated aspects of SIERA is that it implicitly computes an “Expected BABIP,” by using regression techniques. Since it looks directly at expected ERA, conditional on strikeout rate, ground ball rate, and walk rate, it does not directly compute the effect of a strikeout on ERA; instead, it computes what pitchers’ ERAs will look like given their strikeout rate (and holding everything else constant). Thus, SIERA expects high-strikeout pitchers to have low BABIPs, and makes similar adjustments for ground ball rate and walk rate as well.

As I considered how individual pitchers’ DIPS correlate with expected BABIP recently, I realized that there are considerable differences among whole teams in their strikeout and ground ball rates. The 2010 Giants struck out 21.6 percent of hitters faced; the 2006 Royals struck out only 14.1 percent and unsurprisingly had a team BABIP that was 24 points higher than the 2010 Giants.

Putting this all together, I found that the variance in team defensive efficiency can be attributed roughly as follows:

A) 48 percent team defensive skill
B) 40 percent luck
C) 12 percent pitching skill

With about 4,350 balls in play per team per year, you get rid of most of the luck, so the luck share shrinks from 75 percent for individual starters to just 40 percent for teams, and of course, team defense still explains BABIP better than anything else does. However, a very large part (12 percent) of keeping a batted ball from resulting in a hit is pitching. (Put in a mathematically equivalent but different way, there is a .37 correlation between a team’s Expected BABIP based on its pitching peripherals and its actual BABIP.)
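
One way to see why luck matters less at the team level is to treat each ball in play as an independent coin flip with a fixed chance of becoming a hit. That is a simplifying assumption for illustration, not the model used in this study. Under it, the random noise in observed BABIP falls with the square root of the number of balls in play. The .300 hit probability and the 600 balls in play for a single starter are illustrative values, not figures from the analysis.

```python
import math


def babip_noise_sd(hit_prob: float, balls_in_play: int) -> float:
    """Standard deviation of observed BABIP from binomial luck alone."""
    return math.sqrt(hit_prob * (1.0 - hit_prob) / balls_in_play)


pitcher_sd = babip_noise_sd(0.300, 600)   # assumed workload for one starter
team_sd = babip_noise_sd(0.300, 4350)     # team-season total cited above

print(f"pitcher-level noise (SD): {pitcher_sd:.4f}")
print(f"team-level noise (SD):    {team_sd:.4f}")
# The ratio of luck variances is just the ratio of sample sizes, 4350 / 600.
print(f"variance ratio: {(pitcher_sd / team_sd) ** 2:.2f}")
```

Because the luck variance scales as one over the number of balls in play, a team sample several times larger than a starter's leaves proportionally less room for chance, which is consistent with the smaller luck share reported for teams.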

To study this more objectively, I redefined “BABIP” to include errors, and ran a regression on all individual pitchers in the majors in 2002-2011 with 80 balls in play or more, weighted by balls in play, and using net ground ball rate ((GB-FB)/PA), strikeout rate, walk rate, all of their squares and interactions, dummy variables for season, and pitcher starter/relief role.
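
A regression of this general shape could be set up roughly as follows. This is only a sketch under stated assumptions: the pitcher-seasons are randomly generated rather than real, the coefficients used to generate them are made up, and every variable name is illustrative. It fits weighted least squares with NumPy by scaling each row by the square root of its balls in play, which is one standard approach; the actual analysis may have used different software or a different specification of the interaction terms.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Synthetic pitcher-seasons (made-up data, for illustration only).
k_rate = rng.normal(0.18, 0.04, n)        # strikeouts per PA
bb_rate = rng.normal(0.08, 0.02, n)       # walks per PA
net_gb = rng.normal(0.05, 0.08, n)        # (GB - FB) / PA
is_starter = rng.integers(0, 2, n)        # 1 = starter, 0 = reliever
season = rng.integers(2002, 2012, n)      # 2002 through 2011
bip = rng.integers(80, 700, n)            # balls in play, minimum of 80

# Made-up relationship plus binomial noise, only so there is something to fit.
true_rate = 0.31 - 0.15 * (k_rate - 0.18) + 0.05 * net_gb + 0.05 * (bb_rate - 0.08)
reached = rng.binomial(bip, np.clip(true_rate, 0.05, 0.95))
babip = reached / bip                     # "BABIP" with errors counted as hits


def design_matrix(k, bb, gb, starter, season):
    """Intercept, main effects, squares, pairwise interactions, role and season dummies."""
    later_seasons = np.arange(2003, 2012)             # 2002 is the baseline
    season_dummies = (season[:, None] == later_seasons[None, :]).astype(float)
    cols = [np.ones_like(k), k, bb, gb,
            k**2, bb**2, gb**2,
            k * bb, k * gb, bb * gb,
            starter.astype(float)]
    return np.column_stack(cols + [season_dummies])


X = design_matrix(k_rate, bb_rate, net_gb, is_starter, season)

# Weighted least squares: scale rows by sqrt(weight), then solve ordinary least squares.
w = np.sqrt(bip)
coef, *_ = np.linalg.lstsq(X * w[:, None], babip * w, rcond=None)

# Fitted values play the role of each pitcher's expected rate of reaching per ball in play.
expected_babip = X @ coef
print(expected_babip[:5].round(3))
```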

Then I simply applied this to each individual’s pitching statistics, and came up with an expected number of batters reached per ball in play with neutral defense and luck. Then I used that to develop an expected “BABIP” (with errors) for each team.
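
Rolling individual expectations up to the team level can be sketched as a weighted average, with each pitcher weighted by his balls in play. The team labels and numbers below are hypothetical, and weighting by balls in play is an assumption about how the aggregation was done.

```python
from collections import defaultdict

# Hypothetical rows: (team, balls in play, expected rate of reaching per ball in play).
pitchers = [
    ("AAA", 600, 0.300),
    ("AAA", 400, 0.310),
    ("BBB", 500, 0.320),
    ("BBB", 500, 0.305),
]


def team_expected_babip(rows):
    """Ball-in-play-weighted average of pitchers' expected rates, by team."""
    expected_reached = defaultdict(float)
    total_bip = defaultdict(int)
    for team, bip, expected_rate in rows:
        expected_reached[team] += bip * expected_rate
        total_bip[team] += bip
    return {team: expected_reached[team] / total_bip[team] for team in total_bip}


team_expected = team_expected_babip(pitchers)
for team, rate in sorted(team_expected.items()):
    print(f"{team}: {rate:.4f}")
```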

The lowest expected team BABIP (relative to the rest of their league) belonged to the 2002 Twins, with just a .299 expected rate of reaching on balls in play, below the league average of .307. The actual Twins allowed a .297 BABIP, which means that they were good defensively and also good at pitching, resulting in particularly few hits.

The highest expected team BABIP (relative to the rest of the league) belonged to the 2007 Blue Jays, who had a .321 expected BABIP, as compared with a .316 league average that year. The actual 2007 Jays’ BABIP was a very low .297. Their defense was actually fantastic, but their pitching made its job harder and cost them the league-best BABIP. Relative to their expected BABIP, their 19-point lower actual BABIP was the best in the league, but in raw BABIP they finished millimeters behind the Red Sox. However, the Red Sox had pitchers with more strikeouts and lower ground ball rates, so their defense had a much easier battle to make outs.

Overall, there is pretty high year-to-year correlation in a team’s expected BABIP, .47, which is not so shocking since teams generally do not turn over most of their pitching staff in an offseason. This highlights the fact that one cannot look at aggregate numbers over a longer period of time to determine how teams play defense, hoping other factors will wash out; a defense can look bad for several years, when the pitchers should actually shoulder the blame.

Below I list teams by their 2011 ranking in “adjusted BABIP.” This is done by taking their actual BABIP (again, including errors as hits), and adjusting it for their expected BABIP based on their pitchers relative to the league BABIP. I also include the team’s ranking by actual BABIP surrendered, for comparison.
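
The adjustment can be sketched as subtracting from a team's actual BABIP the gap between its expected BABIP and the league BABIP. The exact formula is an assumption here, since the text describes the adjustment only in words, and the function name is illustrative. The inputs are the 2007 Blue Jays and 2002 Twins figures quoted earlier.

```python
def adjusted_babip(actual: float, expected: float, league: float) -> float:
    """Actual BABIP with the pitching staff's expected edge or handicap removed.

    Assumed form: actual - (expected - league). Lower is better for the defense.
    """
    return actual - (expected - league)


# Figures quoted in the article (BABIP here counts reached-on-error as a hit).
jays_2007 = adjusted_babip(actual=0.297, expected=0.321, league=0.316)
twins_2002 = adjusted_babip(actual=0.297, expected=0.299, league=0.307)

print(f"2007 Blue Jays adjusted BABIP: {jays_2007:.3f}")
print(f"2002 Twins adjusted BABIP:     {twins_2002:.3f}")
```

Under this form, a staff expected to allow more hits than the league average lowers its defense's adjusted figure, and a staff expected to allow fewer raises it.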

Of particular note is the Giants, who would have been 10th overall in BABIP, thanks to a somewhat wild pitching staff that was rather groundball prone, but still managed to make a lot of outs. Relative to the high BABIP that would have been expected given their pitching staff, the Giants actually appeared to have the fifth best defense at recording outs per ball in play.

Hurlers like Tim Lincecum, Matt Cain and Jonathan Sanchez simply do not allow hitters to get good wood on the ball, and as a result, the defenders behind them look strong when batters do make contact. On the other hand, the Diamondbacks were ranked above the Giants, at seventh, using BABIP alone, but their high-flyball stuff actually requires an adjustment to bump them down to 10th. (Again, recall that BABIP here includes ROE as hits.)

[Table: 2011 teams ranked by adjusted BABIP, with each team’s rank by actual BABIP allowed]

For all of the rankings for 2002 through 2011, see this Google Doc.

There are a number of interesting examples of teams whose defensive efficiency can be reinterpreted based on their pitching stats. The following table gives my favorite examples of teams re-interpreted using this method, some of which I describe below.

[Table: selected team-seasons from 2002 through 2011 reinterpreted using their pitching statistics]

The 2010 Giants were actually at the opposite end of the spectrum from the 2011 Giants. They had a similarly high strikeout rate and walk rate, but their groundball rate was much lower, making their expected number of outs much higher, since fly balls are easier to catch.

The rise in the Giants’ groundball rate from 2010 to 2011 was partly due to Matt Cain’s groundball rate going up from 36.2 to 41.7 percent. It was also due to replacing Barry Zito’s 33 starts at a 36.1 percent groundball rate in 2010 with just nine Zito starts at a 39.8 percent groundball rate in 2011, plus Ryan Vogelsong’s 28 starts at 45.6 percent ground balls. The Giants also got 15 more starts out of Madison Bumgarner, whose groundball rate was 45.1 percent in 2010 and 46.0 percent in 2011, in place of the 11 starts they received in 2010 from Todd Wellemeyer, who had a 33.5 percent groundball rate.

In both seasons, the Giants had fantastic strikeout rates that we know correlate with less hittable pitches, and more catchable balls in play, but the groundball rate was very different in 2010 and 2011.

The 2003 Mariners were an interesting story of run prevention. A large part of their league-leading defensive efficiency was fantastic defense. They had an outfield of Ichiro Suzuki in right (21.1 UZR), Mike Cameron in center (19.6 UZR) and Randy Winn in left (4.3 UZR), combined with an infield that featured John Olerud at first base (11.0 UZR) and Bret Boone at second (10.4 UZR).

But they also had an excellent flyball staff that kept the ball catchable in the first place. Jamie Moyer had 215 innings pitched with only a 38.3 percent groundball rate, Freddy Garcia had a 41 percent groundball rate in 201.1 innings, Gil Meche had a 36.8 percent groundball rate in 186.1 innings, and Ryan Franklin had a 34.3 percent groundball rate in 212 innings. The only starter who was not particularly flyball prone was Joel Pineiro, who had a 45.4 percent groundball rate.

None of these starters were particularly good at missing bats, but their extreme flyball tendencies made up the difference. When combined with their fantastic defense, the 2003 Mariners were fantastic at making outs.

The 2007 Rangers relied on their 46.5 percent groundball rate to keep opponents from scoring, which has the side effect of permitting a lot of singles. On the down side, they struck out only 15.3 percent of hitters faced. As a result, they were 22nd in the league in preventing hits on balls in play.

However, they would have been 17th if they had an average staff in terms of BABIP skill. Pitchers like Kameron Loe, Kevin Millwood and Vicente Padilla contributed to the high groundball numbers without striking enough hitters out to shorten swings and reduce BABIP.

The Nationals trailed the league at striking hitters out in 2009, whiffing only 14.3 percent of hitters. Unsurprisingly, the Nationals were 24th in defensive efficiency in 2009, but they would have been right near the middle at 19th if you adjust for their staff. John Lannan, Craig Stammen and Shairon Martis are hittable in all the ways you would expect—they do not strike hitters out and hitters make better contact with the ball as well.

The Indians took the dubious crown for worst strikeout staff in the league away from the Nationals in 2010, and they allowed a lot of hits too. Their BABIP allowed was .316, definitely worse than average, but their pitching numbers suggest that it should have been .310 anyway, reapportioning most of the blame from the defense to the pitchers.

Disentangling credit between pitching and defense appeared to take a great step forward with McCracken’s discovery about pitcher BABIP control (or lack thereof), and this is assuredly one of the most important findings of sabermetrics. However, as analysts collectively step back from the extreme position that a pitcher should never be blamed or credited for his BABIP, we should reinterpret team defensive rankings as well. A full 12 percent of variance in team defensive efficiency is directly attributable to pitching. As we always knew, there are many factors in play once the ball hits the bat.


Matt writes for FanGraphs and The Hardball Times, and models arbitration salaries for MLB Trade Rumors. Follow him on Twitter @Matt_Swa.
Ryan

I loved the article. Great stuff.

Also, would park factors come into play heavily when making these conclusions?

Nyet Jones
Great article, Matt; it’s fun to see the pendulum swing back a bit from the casual “No pitcher control over BABIP” references that get made here and there. I had a couple of questions: 1. “it does not directly compute the effect of a strikeout on ERA; instead, it computes what pitchers’ ERAs will look like given their strikeout rate (and holding everything else constant).” I had to reread that a couple of times to make sure what it was saying, but I think I’ve got it – basically, the coefficient attached to strikeout rate is the effect on ERA…
Matt Swartz
@Ryan: Yes, park effects do play a role here. James Click did some great work in creating PADE (Park-Adjusted Defensive Efficiency). That re-ordered teams much like the pitcher-adjusted defensive efficiency that I did here. Ideally, you could do both.

@Nyet: Yes, that’s what I meant. Sorry. It picks up direct + indirect effects. FIP actually measures the direct effect of a strikeout per IP, so you can basically figure out the indirect effect with a little calculus. If you take the derivative of a strikeout with respect to SIERA, and plug in league average values for strikeout, walk, and groundball…
Mike Fast

The community has shown with certainty that there is little difference between pitchers?  I would say that my study of HITf/x data indicated exactly the opposite.

And similarly for team defensive efficiency, a large portion of it is due to how hard the team’s pitchers allow the ball to be hit.

Single-year BABIP is a crude measure of pitcher skill, and it’s leading you to conclusions about the game of baseball that are very wrong.

Bojan Koprivica

Very interesting read, thanks

Anonymoose

You wrote:

“Pitchers who induce fewer ground balls give up fewer hits on balls in play.”

I can’t access the BP article you linked to, but this sounds like a surprisingly counter-intuitive conclusion.  Did you mean to say that pitchers who induce MORE ground balls give up fewer hits on balls in play?  Or is there some correlation between DIPS and weak flyball contact like IFFB?

Bojan Koprivica

“Pitchers who induce fewer ground balls give up fewer hits on balls in play.”

BABIP is higher on ground balls than on fly balls, so intuitively it makes sense.

Matt Swartz
Mike: I’m not coming to any wrong conclusions. I don’t know what you think I’m doing with single season BABIP, but it’s not leading me to wrong conclusions. There IS little difference relative to the difference between pitchers in strikeout rate, which is why it takes more than a season to stabilize. What your study showed was that how hard balls are hit is persistent, and that it is correlated with BABIP. It didn’t widen the spread of pitcher BABIP skill levels in the MLB, which is and always has been minimal compared to the spread in strikeout rates. I…
Matt Swartz

Bojan:
You’re correct, it was a typo. Sorry. GB% is positively correlated with BABIP, but it does go down for very high GB%. The highest BABIPs would be around 50% GB rate, all else equal.

Mike Fast
I’m not disputing your statistics. I’m disputing your conclusions about the game of baseball.

“What your study showed was that how hard balls are hit is persistent, and that it is correlated with BABIP. It didn’t widen the spread of pitcher BABIP skill levels in the MLB, which is and always has been minimal compared to the spread in strikeout rates.”

Right. But I did show that BABIP is a poor way to measure pitcher skill. We sorta knew that already, but some people had taken the BABIP findings to mean that pitcher skill was also minimal. I established that…
Nate

Thanks, Matt. Not sure I get how FIP measures the impact of a strikeout directly; how is that model capable of teasing out the different effects? It seems like with less info (only k, bb/hbp, and hr per inning), you’d be relying even more on the peripheral effects K has. Or is that why FIP is not as good of a predictor as SIERA?

thanks again!

Matt Swartz
Which of my conclusions about the game of baseball do you dispute? You found that how hard a ball is hit is highly correlated. This is a self-contained statistic that is only useful inasmuch as it can teach you about singles, doubles, triples, home runs, outs, and errors. It doesn’t do me any good to know the statistic otherwise, except for how it relates to outcomes that affect games. So BABIP is a logical skill to try to infer from how hard a ball is hit, and your numbers do a nice job of hitting on that. I think when…
Matt Swartz

Nate:
The way that the individual effect of a strikeout can be determined is by looking at win expectancy matrices. At insidethebook.com, they have all that worked out. It’s Markov matrices and stuff, kind of complicated, but rigorous.

The reason that FIP is worse at predicting ERA than SIERA is largely the K% correlation, yes. Add in the fact that SIERA regresses HR/FB, and that explains some of the benefit of SIERA in RMSE but not in correlation. That SIERA predicts low BABIPs for very-high-GB% pitchers also helps a LOT too. The K/BABIP correlation is a big part of it, though.

Nyet Jones

So FIP was built off the Markov matrices (giving out a sort of average K-event value) whereas SIERA is a regression of K/PA’s relationship with ERA. Cool, got it.

Jack Thomas

TB’s low 2011 BABIP is strongly impacted by Hellickson’s ridiculously low .224 BABIP. He accounted for 13% of TB’s total IP.

Dave Studeman

Matt, are you sure about the K value in FIP? I thought Tango had stated that the K value in FIP is not just based on markov chains, but perhaps I’m misunderstanding what you or he have said.

Matt Swartz

Dave, I might be mistaken. I had remembered it as Markov matrices, but there might be something else there. In general, I think it’s a way of getting the direct effect of a K, but I forget exactly what the procedure entailed.
