- FanGraphs Baseball - http://www.fangraphs.com/blogs -

A Discussion About Evaluating Pitchers

Eric Seidman and I had a conversation about pitchers, pitching metrics, and the end of season awards last night. The fruits of that conversation are below.

Dave: So, I took a sneak peek at the FanGraphs author awards ballot, and you’re kind of a traitor. You can make a strong case for Roy Halladay or Cliff Lee, but instead, you pick Clayton Kershaw, even though he has a WAR of 6.8 compared to Halladay’s 8.0. You’re from Philadelphia, you write for FanGraphs, and you pick the pitcher with a lower WAR who doesn’t play for the Phillies? Don’t you know that you’re supposed to be a slave to the stats, and our most recognizable stat says Halladay has been better? You’ve got some explaining to do.

Eric: I’m a loner, Dottie, a rebel. At least when it comes to the Cy Young Award it seems. But I don’t think it’s crazy to support Kershaw for the NL’s best pitcher of the year award even though Halladay has a 1 WAR advantage. Bear in mind that by voting for Kershaw I’m not dissing Doc or his tremendous work this season. It wasn’t like Kershaw clearly stood out above the rest. I wrestled with the decision but ultimately decided that it would be nice to see him take home some hardware. The difference between him and Halladay is actually an interesting proxy for discussing why WAR isn’t the be-all, end-all when it comes to evaluating pitchers. First, some numbers:

Kershaw: 218 2/3 IP, 9.7 K/9, 2.1 BB/9, 43% GBs, .272 BABIP, 2.30 ERA, 2.37 FIP, 2.63 SIERA
Halladay: 227 2/3 IP, 8.6 K/9, 1.3 BB/9, 51% GBs, .300 BABIP, 2.41 ERA, 2.18 FIP, 2.61 SIERA

Each pitcher has excellent peripherals, and Halladay doesn’t have a dominant innings advantage like he did last season. Their ERAs and adjusted estimators (xFIP and SIERA) are very similar as well. The major difference for me, and why I don’t truly believe Halladay holds as big a lead in WAR as it seems, is Kershaw’s batting average on balls in play. While .272 seems absurdly low, his mark was .275 last season and .269 in 2009. Dodger Stadium likely plays a role in that, but FIP doesn’t factor his BABIP prevention into the equation. Given the lackluster infield defense and the revolving door of middle infielders he’s pitched in front of, it seems safe to assume that most of the prevention comes from Kershaw’s skill-set, repertoire and sequencing. Were we to retroactively credit his WAR given that it seems he is going to be one of those consistent BABIP-preventers, suddenly that 6.8 WAR might be 7.5.

At that point, we’re really splitting hairs. There isn’t a statistically significant difference between 7.5 and 8 WAR. Plus, even though I have Halladay as my computer and phone background, and root for him as a Phillies fan, I would just like to see Kershaw get some more national recognition.

Dave: By choosing a FIP-based pitcher WAR, I agree that the measure leaves out some things that pitchers do deserve credit for, but are you really ready to give a pitcher 100% credit for his BABIP? Yes, Kershaw has shown some ability to post lower than average marks before, but almost all of that is due to his performance at home, where his career BABIP is just .271 (and .248 this year). On the road, he’s been basically league average at preventing hits on balls in play. I’m not saying we should limit ourselves to split season data and draw conclusions only from smaller samples, but I’d be more convinced that the variable affecting Kershaw’s BABIP was actually something he’s doing if he was able to do it anywhere besides Los Angeles.

Certainly, when doing retrospective value analysis, I believe that pitchers should get some credit (or blame) for their BABIP. FIP gives them no credit, ERA gives them total credit, but the truth is somewhere in the middle. We just don’t know where. So, should we split the difference and hope we’re close? I don’t know that there’s a right answer here. Separating what is pitching and what is outside factors is just hard.

Eric: Right, I wouldn’t want to give Kershaw complete credit for the BABIP-prevention but I actually think this year could be the outlier on the road. In 2010 his BABIP split was virtually identical at .271 home/.278 away. The year before, .275 home/.261 away. Now Dodger Stadium surely factors in, but his 2009-10 road BABIP gives me some reason to believe he has the prevention skills. Again, he shouldn’t be credited for the complete difference between his and the league’s BABIP, but I don’t think ignoring it entirely does him justice. The inverse is also true, especially for a Javier Vazquez-type, who has shown over 10+ years that his estimators will always best his actual run-prevention. Fortunately, these guys are few and far between, so an FIP-based metric does the job 90 percent of the time.

It’s very difficult to separate pitching from the other external factors, but perhaps that’s the next great wave of analysis. At the very least, it would be interesting to have a readily available calculator where a user could determine a pitcher’s WAR based on his feelings about BABIP. That would provide a range of possibilities. For someone like Kershaw it might suggest he’s worth between 6.5 and 8 wins… for Halladay it might be a smaller range, like 8 to 8.5, but we could foster the conversation about what the BABIP-prevention (or the lack thereof) can do to a pitcher’s value.
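
A minimal sketch of what such a calculator could look like, in Python. It simply slides between a FIP-based WAR (no BABIP credit) and a WAR that gives full BABIP credit. The function name, the linear blend, and the use of 7.5 as Kershaw's full-credit figure are all assumptions made for illustration; only the 6.8 comes from the numbers above, and this is not how FanGraphs computes WAR.

```python
def war_with_babip_credit(fip_war, full_credit_war, credit):
    """Blend two WAR figures according to how much BABIP credit we allow.

    credit = 0.0 returns the FIP-based WAR (no BABIP credit),
    credit = 1.0 returns the hypothetical full-credit WAR,
    and anything in between is a straight-line interpolation.
    """
    if not 0.0 <= credit <= 1.0:
        raise ValueError("credit must be between 0 and 1")
    return fip_war + credit * (full_credit_war - fip_war)


# 6.8 is Kershaw's FIP-based WAR from the discussion; treating 7.5 as
# the full-credit endpoint is an assumption for this example only.
for credit in (0.0, 0.25, 0.5, 0.75, 1.0):
    war = war_with_babip_credit(6.8, 7.5, credit)
    print(f"BABIP credit {credit:.0%}: {war:.2f} WAR")
```

Running it for a few credit levels gives the kind of range described above rather than a single number, which is really the point of the exercise.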

Dave: If this year is the outlier for Kershaw, what was 2008, when his road BABIP was .341? I don’t think we can just ignore large home/road BABIP splits in two of the four years he’s been in the big leagues. I’m not saying it’s all Dodger Stadium, but a career road BABIP of .290 makes it tougher to argue that Kershaw is one of the exceptions.

The tough thing about estimating BABIP’s impact on ERA is that it can be skewed situationally. Last year, for instance, Cliff Lee’s BABIP was .256 with the bases empty and .344 with men on base, so he posted a ridiculously low LOB%, which drove his ERA up significantly. His overall BABIP didn’t look out of whack, but the distribution of when those hits came had a remarkable impact on the number of runs he allowed. We’ve seen a bit of the opposite this year with Kershaw (though not to the same degree), as his BABIP with men on base is just .254, and only .260 with men in scoring position, so he’s got one of the highest strand rates in baseball.

This is part of why I’m in disagreement with the “ERA measures what happened” crowd. For all we know, Dee Gordon, Rafael Furcal, and Jamey Carroll each made an outstanding play with runners at second and third this year, saving Kershaw six runs in the process. A good or bad defensive play can have a substantial impact on runs allowed, and I don’t think we can just assume that those plays are distributed normally throughout all situations.

Should Kershaw get some credit for outperforming what we’d expect based on his walk rate, strikeout rate, and home run rate? Absolutely. Should he get enough to make up for the fact that Roy Halladay has just out-pitched him against better competition and in a better hitter’s park? It seems like you would have to give him almost total credit for his hit prevention – and the timing of that hit prevention – in order to swing Kershaw’s way.

Eric: Right — I think many have a skewed idea of what “happened” and don’t truly realize how much 4-6 earned runs mean over the course of the small sample that is 200 innings. For instance, tack on another five earned runs and, despite throwing 218 2/3 innings, Kershaw’s ERA jumps to 2.51. It doesn’t take much for that number to be manipulated. Whether it’s defensive players making a tremendous stop, or a two-out error followed by five runs that don’t count against the pitcher, ERA clearly doesn’t tell us everything we want to know. Not only with respect to run prevention, but also in terms of what actually took place on the field.
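
The arithmetic behind that ERA jump is easy to check. Here is a short Python sketch; the earned-run total of 56 is not quoted anywhere above, it is backed out from the 2.30 ERA and 218 2/3 innings, so treat it as an inferred figure.

```python
def era(earned_runs, innings):
    """Earned run average: earned runs allowed per nine innings."""
    return 9.0 * earned_runs / innings


innings = 218 + 2 / 3  # 218 2/3 IP from the stat line above

# Assumption: earned runs inferred from the listed 2.30 ERA, rounded
# to the nearest whole run (this works out to 56).
earned_runs = round(2.30 * innings / 9)

print(f"{era(earned_runs, innings):.2f}")      # 2.30
print(f"{era(earned_runs + 5, innings):.2f}")  # 2.51
```

Five runs in either direction, which is one bad inning or a couple of great defensive plays, moves the ERA by about two tenths of a run.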

Dave: Over in the AL, Justin Verlander is almost certainly going to win the Cy Young, but he might just win the MVP award too. Not to turn this into another article on what the word valuable means, but how would you judge Verlander against premium position players like Jose Bautista, Jacoby Ellsbury, and Curtis Granderson this season? WAR has them in the same general range, and Verlander is probably under-credited for his season by a system based on FIP, so giving him partial credit for his low BABIP closes the gap even further. You didn’t vote for Verlander in the staff MVP awards, but given your arguments about Kershaw, it’s hard to believe that you don’t see him as being among the most valuable players in the AL this year.

Eric: As for Verlander, it seems like virtually none of the balls batters put into play off of him fall in for hits. While I wouldn’t expect him to consistently hold batters to a .235 average on balls in play, he has been absolutely tremendous this season, and awards are based on what happened, not what might happen in the future. Verlander has clearly been less hittable than, well, anyone, and he is a major reason the Tigers won their division. You’re absolutely right that he’s one of the most valuable players in the league this season, and I’d honestly vote for him #2 on my ballot behind Bautista. If Verlander is credited for his BABIP prevention, as I believe he should be, the gap between him and Joey Bats does close, but I don’t think it would close enough for me to give him the award.

Bautista, in a down offensive season, is hitting like batters did in the mid-90s and is putting the finishing touches on one of the best offensive seasons we’ve seen. Just like Pujols missed out on MVPs because Bonds was otherworldly, I can’t justify a Verlander-for-MVP campaign when Bautista is tearing the league up.

That being said, I do believe pitchers should be considered for the MVP. As we discussed in my recent article about Verlander’s MVP credentials, the extreme impact he has in a smaller concentration of games can be argued to have been more valuable to the Tigers than marginal improvements in their odds of winning from everyday players in the games he didn’t start. Some view the 32-35 starts as a detriment to a pitcher’s campaign while I think of it in the opposite manner. If the Tigers’ odds of winning are 65 percent when he pitches and 51 percent when he doesn’t (made up to illustrate the point), then he can certainly have the same level, if not a greater level, of impact as a position player.
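
To put a rough number on that, here is a back-of-the-envelope Python sketch using the same made-up 65 and 51 percent figures. The function name is hypothetical, and the calculation is deliberately naive: it compares the team's odds with him on the mound to its odds in all other games, not to what a replacement-level starter would have done in those same starts.

```python
def extra_wins_from_starts(win_pct_with, win_pct_without, starts):
    """Expected extra wins over a set of starts, given the team's
    win probability with and without the pitcher on the mound."""
    return (win_pct_with - win_pct_without) * starts


# 0.65 and 0.51 are the illustrative (made-up) odds from the text;
# 32 and 35 bracket a typical full-season workload.
for starts in (32, 35):
    wins = extra_wins_from_starts(0.65, 0.51, starts)
    print(f"{starts} starts: about {wins:.1f} extra wins")
```

Under those invented odds, the gap is worth somewhere between four and five wins packed into 32 to 35 games.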

Dave: Not to carry the Kershaw/Halladay discussion over to this argument too much, but I do wonder if there’s some confirmation bias going on here. Justin Verlander is excellent, and throws really freaking hard, so when he posts a low BABIP, it’s not that tough to draw the conclusion that he’s just throwing pitches that are tough to hit. Except, Justin Verlander has been really good and thrown really hard for years, and his BABIP last year was .286, and the year before that it was .319. In fact, most studies that have looked for pitchers who can consistently beat the league average in BABIP tend to show that it’s soft-tossing lefties – the Barry Zitos of the world, not the Justin Verlanders.

Has Verlander actually been less hittable than anyone else in the majors, or do we just talk ourselves into that conclusion when a great pitcher also gets a lot of balls hit right at his defenders? I’ll repeat the point I made about Kershaw – I’m comfortable giving him some credit for his BABIP, but all of it? I don’t think so. We’ve come too far in understanding the impact of non-pitcher variables on the outcomes of balls in play to then ignore that progress when we start looking backwards instead of forwards.

Eric: Getting a lot of balls hit right at defenders isn’t necessarily something Verlander or any other pitcher can control, but it’s entirely possible that, this season more than any other, batters are making weaker contact, which makes the jobs of his fielders much easier. If HITf/x were available (and had been around for 5+ years so we had some context to incorporate), we might see that the speed off the bat when he pitches is much lower than that of other hurlers. Right now we only have his balls in play distribution and his BABIP, so it’s impossible to make that determination.

But I do think that getting balls hit at fielders isn’t necessarily something that should take away from the pitcher. There’s a big difference between lined one-hoppers right at a bailing first baseman and weak grounders hit at a spot where even an average shortstop could range. I’m not saying with 100 percent confidence that’s the case, and I will certainly grant that confirmation bias could be at work, but until we have more detailed data, the natural reaction is to revert to what we know with the most certainty. Right now that would suggest that Verlander has some prevention skill, but that this year is mostly a case of a tremendous pitcher having a flukily lucky season. I don’t know if I’m okay accepting that without at least testing what his WAR would be if we considered his BABIP true talent level to be X, Y, or Z.

Dave: Yeah, I’ll retract the “balls hit at defenders” comment, since we’re trying to isolate a pitcher’s performance but not strip out all aspects of luck. I’d agree that if we had HITf/x, we’d probably find out that Verlander’s been getting more weak contact than usual, but we don’t have that, and we’re the kind of people that like evidence, and I don’t know that we have much to support the idea that Verlander has actually induced weak contact. It seems like the kind of thing we’d expect to find if we could prove it, but then again, we probably would have expected to find that all good pitchers produced weak contact before Voros McCracken came along.

Like Kershaw with the Cy Young, Verlander’s case for MVP rests mostly on a full acceptance that his low BABIP is almost entirely something he caused. He’s had a good enough year that he’s certainly a worthy candidate, and it won’t be any kind of poor choice if he ends up winning the award, but I’d agree that I just don’t quite see enough pure dominance to make up for the fact that there are some position players having truly spectacular years as well – and their greatness comes with fewer caveats.

Eric: The idea of weak contact does seem like more of a backtracking statement and a possibility drawn up after seeing results, and not the process, but I used that more to illustrate that there is an awful lot of unknown out there. Way too much unknown for people to express opinions with any meaningful level of certainty. But I do think Verlander’s candidacy extends beyond just the BABIP prevention and whether it’s his or not. Given the vague definition of the award and the nebulous term ‘value’, and how aspects ancillary to the individual, like teamwide success or the number of other good players on the team, are often factored in, I’d wager that Verlander would be a candidate even if he had, say, a .260-.265 BABIP. Bautista plays for a non-playoff team, and voters are hard-pressed to select a single player from the Yanks or Red Sox when those teams have numerous very valuable players. This is somewhat silly, but it’s how many people think when they’re given a ton of leeway in interpreting the intentions of an award.