This post builds on the one I wrote a few hours ago, so I’d encourage you to read that if you haven’t yet. If you really don’t want to follow the link, this is the paragraph where we’re picking up from:
In the end, we had to choose between two different methods – assuming that the pitcher had no responsibility for the outcome of a ball in play, or attempting to approximate the amount of time that the result was due to the pitcher or the fielder. Ideally, we’d be able to do the latter – which is how Sean approaches it – but I just don’t think we currently have the tools available to make an accurate enough judgment on how to apportion that responsibility.
Clearly, some hits on balls in play are the “fault” of the pitcher. He throws a fastball down the middle in a 3-1 count and the hitter whacks it for a double in the gap – that’s on him, certainly. However, most hits are not of that variety. Instead, they’re ground balls in between two defenders or fly balls that fall near a chasing outfielder before he can get to them. In those instances, we don’t really know how much responsibility for the hit should go to the pitcher or the fielder. Would Elvis Andrus have gotten to that grounder up the middle that Yuniesky Betancourt didn’t get close to? Maybe, maybe not. Did Carl Crawford run down a shallow popup that Juan Rivera would have had to pick up on the third bounce? Perhaps. We don’t have the luxury of having a control group for each ball in play. All we know is whether the guy who happened to be the defender on duty at the time was able to make the play or not.
So, what do we do with a specific pitcher’s results on balls in play? This was the thing that I wrestled with the most while we were designing WAR for pitchers a few years ago. I can see an argument for doing it in one of two ways, though I think both have problems.
1. FIP-based WAR, which is what we ended up using, essentially admits that we don’t have enough information about dividing responsibility for the results of balls in play, and so it ignores them.
2. RA-based WAR, which is what Sean ended up using, attempts to adjust for defensive contribution by taking a team’s overall Total Zone rating and assigning an expected defensive debit or credit to each pitcher based on how the team performed on the season as a whole.
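To make the two starting points concrete, here is a minimal Python sketch. The FIP formula is the standard public one, but the constant (3.10 here) is a placeholder, since the real value changes each season. The defense adjustment is a simplified assumption for illustration only: it prorates a team's defensive runs saved by the pitcher's share of the team's balls in play, which is not necessarily how Sean's implementation works. The function and parameter names are hypothetical.

```python
def fip(hr, bb, hbp, k, ip, constant=3.10):
    """Fielding Independent Pitching.

    Uses only home runs, walks, hit batters, and strikeouts, so every
    other ball in play is ignored. `constant` is a placeholder; the real
    value is set each season so that league FIP matches league ERA.
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


def defense_adjusted_ra9(runs, ip, pitcher_bip, team_bip, team_def_runs):
    """Runs allowed per nine innings after an assumed defensive adjustment.

    Assumption (not the published method): the team's defensive runs saved
    are spread across pitchers in proportion to balls in play. A positive
    `team_def_runs` means a good defense, so the pitcher is charged extra
    runs; a negative value gives him runs back.
    """
    share = pitcher_bip / team_bip
    credited_to_defense = team_def_runs * share
    return 9 * (runs + credited_to_defense) / ip


# Made-up stat line, purely for illustration.
print(round(fip(hr=10, bb=50, hbp=5, k=200, ip=200.0), 3))
print(round(defense_adjusted_ra9(80, 200.0, 600, 4200, 21), 3))
```

Note that the second function never sees what the defense did behind this particular pitcher. It only sees the team total, which is exactly the limitation discussed here.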
I get why Sean did it the way he did it, and I understand why there are people who prefer that path. It appeals to our inherent sense of runs allowed being a record of what actually happened, and presents the possibility of achieving the ultimate goal – a pitcher’s total contribution to run prevention with the effects of his teammates factored out. The problem, though, is a pretty big one, and the one that caused me to lean away from RA-based WAR for our purposes here. It assumes that the distribution of defensive performance was even for each pitcher on every team, which is quite obviously not going to be true. Simply put, it is not a record of what actually happened – it is an assumption of what might have happened if all defenders on a team were evenly skilled and were perfectly consistent on a day-to-day basis.
We can simply look at the distribution of run support for a pitcher on any given team to see that the assumption of even performance is not going to be true. If we use the Yankees rotation as an example, we see that the Yankees averaged 5.30 runs per game this year. Their distribution by starting pitcher is below:
No pitcher is actually within half a run of the team average. Burnett and Vazquez are over a run per game lower than the overall total, while Pettitte is nearly three quarters of a run per game higher and Hughes is a run and a half per game higher. If you built a metric that worked off the assumption that the Yankees offense scored the same number of runs per game when Vazquez was on the mound as when Hughes was on the mound, you’d probably draw some pretty inaccurate conclusions. There is no reason to think that defensive performance is any more consistent on a day-to-day basis. If anything, there are reasons to believe that it would vary even more than offense.
In general, a team will run out a similar line-up on a day-to-day basis, and each guy will get about the same number of plate appearances per day, as required by having batters take turns in order. That boundary does not hold with defenders, however. There is no rule that says each player on the field gets an equal number of opportunities each day. In fact, given that pitchers have different tendencies in terms of groundball and flyball rates, it’s nearly guaranteed that the defensive opportunities will not be equal between pitchers.
Using aggregate stats from a team’s entire season simply won’t give you the kind of detail needed to accurately determine the quality of defense that was played behind a given pitcher in a season. Doing pitcher WAR that way produces an end result that is not an accounting of what actually happened.
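A toy example shows how large the gap can get. Every number below is made up, and the pitcher names are placeholders. The idea is to compare the defensive runs a team-level proration would assume for each pitcher against what (hypothetically) happened behind him.

```python
# Hypothetical staff: name -> (balls in play, defensive runs actually saved behind him).
# These figures are invented for illustration; we do not have this data for real pitchers.
staff = {
    "Pitcher A": (600, 12.0),
    "Pitcher B": (500, -4.0),
    "Pitcher C": (400, 2.0),
}


def allocation_errors(staff):
    """Return assumed-minus-actual defensive runs for each pitcher.

    The "assumed" figure spreads the team total evenly per ball in play,
    which is the even-distribution assumption under discussion.
    """
    total_bip = sum(bip for bip, _ in staff.values())
    total_runs = sum(runs for _, runs in staff.values())
    errors = {}
    for name, (bip, actual) in staff.items():
        assumed = total_runs * bip / total_bip
        errors[name] = assumed - actual
    return errors


for name, err in allocation_errors(staff).items():
    print(f"{name}: {err:+.1f} runs")
```

The errors always sum to zero across the staff, so the team total looks right even when individual pitchers are off by several runs in either direction.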
Since neither method gets us to that goal of accurate accounting, we’re left with a choice of two paths, both with structural problems that can’t be avoided based on the data we currently have access to. Personally, I prefer FIP-based WAR because it is easier to adjust for what we know is not included – BABIP and sequencing, essentially – than it is to take a defense-adjusted RA-based WAR and make adjustments for places where the assumption of defensive distribution equality does not hold.
Let’s use Francisco Liriano and Cliff Lee as examples. Liriano’s RA results don’t match his FIP in large part because he has a .340 batting average on balls in play. Since our version of WAR doesn’t hold that against him, he comes out looking really good. An RA-based version not only holds his actual BABIP against him (by starting with runs allowed), but it then penalizes him further because the Twins have an above-average defense, and the assumption is that he got proportionate help from the guys behind him.
What is more likely – that Liriano gave up contacted balls that should have resulted in a .350 to .360 BABIP, and the good-glove Twins helped bring that down to .340, or that the guys behind him didn’t make as many plays for him as they did when Carl Pavano or Brian Duensing was pitching? Considering that he posted a basically league average 19.1% line drive rate, I’m more inclined to believe that the latter is closer to the truth. We don’t know exactly what kind of defensive support Liriano got this year, but based on what we know about a pitcher’s control over BABIP, I think we’re better off assuming that there were some issues behind him that hurt him than we are assuming that the Twins defense supported him as well as they supported his fellow pitchers.
Lee’s case shows the other side of the coin that FIP ignores – when those hits occur. While he has a normal .302 batting average on balls in play, it is not evenly distributed within the base-out states. His BABIP is just .257 with the bases empty, but jumps to .350 with men on base, and is .333 with runners in scoring position. Because of that split in when his balls are being turned into outs, he has a LOB% of just 67.9%, well below average and far below what pitchers of his quality have posted this year.
For Lee, it hasn’t been a problem of too many balls finding holes, but simply of those balls finding holes at the wrong times. It’s possible that those hits were a result of poor location, but given that he ran a 10.00 K/BB ratio with men in scoring position, it doesn’t seem like Lee suddenly lost his command when men got on base this year. Maybe he did – I don’t know. But should we assume that a pitcher’s BABIP with RISP is under his control? Since that’s really the driving force of Lee’s ERA this year, we have to acknowledge that it is certainly possible that his defenders let him down in those critical situations.
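For readers who want to check splits like these themselves, here is a small Python sketch of the two stats involved, using the commonly published formulas for BABIP and LOB%. The function names are hypothetical, and the inputs in the example are made up rather than taken from Lee's actual stat line.

```python
def babip(h, hr, ab, k, sf):
    """Batting average on balls in play: non-homer hits over balls put in play."""
    return (h - hr) / (ab - k - hr + sf)


def lob_pct(h, bb, hbp, r, hr):
    """Left-on-base percentage (strand rate), using the common 1.4 * HR adjustment."""
    return (h + bb + hbp - r) / (h + bb + hbp - 1.4 * hr)


# Made-up splits, only to show how the same pitcher can look different by situation.
empty = babip(h=60, hr=6, ab=300, k=80, sf=0)
men_on = babip(h=45, hr=3, ab=170, k=40, sf=3)
print(f"bases empty: {empty:.3f}, men on: {men_on:.3f}")
print(f"LOB%: {lob_pct(h=180, bb=20, hbp=0, r=70, hr=15):.1%}")
```

Running the BABIP function separately on bases-empty and men-on totals is all a base-out split is. The overall number can look ordinary while the two halves diverge, which is what drags the strand rate down.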
It is also possible that he let himself down. We just don’t really know who is at fault – pitcher or defenders. FIP blames BABIP entirely on defense. That’s definitely wrong. Defense-adjusted RA assumes that each pitcher got the same support from their teammates. That is also definitely wrong.
So, we’re left with two imperfect options. Which should you prefer? I can’t answer that for you. They both have strengths and weaknesses, and both are valid attempts to answer the question that we’re really trying to get at. I prefer the FIP-based implementation because it’s easier to make mental adjustments from that number, knowing that BABIP and sequencing are not included, than it is to try and back out of a metric that is already attempting to account for defensive support and find out where it might have missed the mark, but that’s a personal preference more than it’s a right or wrong thing.
WAR is not perfect, and it’s less perfect for pitchers than it is for hitters. Separating out defense from pitching is hard, and we don’t have it all figured out yet. We don’t encourage you to use any version of WAR as the be-all, end-all of analysis. We think it’s a pretty nifty tool, especially if you understand its limitations, and it does a good job in most instances. However, it’s not perfect. Our version isn’t perfect, and Sean’s version isn’t perfect. We’re both trying, and we’re trying from different angles. Rather than focusing on why the differences make both “wrong,” maybe we should admit that it’s nice to have both perspectives?