- FanGraphs Baseball - http://www.fangraphs.com/blogs -

On Context, or Evaluating Hitters and Pitchers Differently

Here at FanGraphs, our pitching WAR is built around Fielding Independent Pitching, which focuses solely on a pitcher’s walks, strikeouts, and home runs allowed. Because it ignores the results of balls in play and the order in which results occur, there are occasionally big differences between a pitcher’s FIP and his ERA. This divide often causes some consternation when a pitcher with a high ERA posts a decent WAR or, in reverse, when our WAR doesn’t rate a pitcher with a very low ERA all that highly.

A significant number of people, including a good chunk of our own readers and noted sabermetric evangelists like Brian Kenny, prefer to evaluate pitchers by runs allowed because, as I’ve heard repeatedly over the last few years, that measures “what actually happened”. That’s one of the reasons we have RA9-WAR here on the site, as we know that a sizable number of people prefer to evaluate pitchers that way.

I believe there are valid points on both sides, and I see the argument for using both a FIP-based WAR and an RA9-based WAR when evaluating a pitcher’s past performance. However, I find it interesting that this debate has not carried over to position players, where there seems to be broad consensus* that context-neutral is the way to go.

*Allowing for the fact that there was definitely some positive response to my article on Context Batting Runs last week, my feeling is that there’s still not much of a push towards this kind of evaluation for hitters.

It’s not just those of us who subscribe to a linear weights based WAR, like the one we use here on FanGraphs. Whether you look at a player’s standard batting line, use BA/OBP/SLG with some adjustments for playing time, or use OPS+, these are all offensive evaluations that consider only the number of events that happened, not the situations in which they occurred. And there is broad agreement that these are the best types of measures to use when evaluating how well a hitter contributed to his team’s performance.

The only context-specific statistic that has any real traction is RBIs, and the sabermetric community — myself included, so I’m not pointing fingers here — has spent years explaining why using RBIs to evaluate a player’s contributions is a misuse of statistics. RBIs are something of a pariah in the analytic community, shunned because they are a team statistic masquerading as an individual measure. For valid reasons, they’ve been marginalized by sites like this one, almost completely removed from the discussion of player value among the “new school” crowd.

These decisions on how to value player performance are incongruous. If you use something like wOBA to evaluate a hitter, you are making a conscious decision to ignore the order of events that “actually happened”. However, when you evaluate a pitcher by runs allowed, you’re making an equally conscious decision to include sequencing and hold the pitcher entirely responsible for the order in which events occurred.

For instance, here are two hypothetical innings, with only the order of events changed.

Scenario A: Single, single, homer, fly out, fly out, fly out.

Scenario B: Homer, single, single, fly out, fly out, fly out.

By runs allowed, the pitcher in Scenario A is charged with three runs, while in Scenario B he’s charged with either one or two, depending on whether the guy who hit the first single was fast enough to tag up and reach third on the first fly out, and then whether the second fly out was hit deep enough to score him. His FIP for the inning would be 16.04 in either case, but his ERA could be 9.00, 18.00, or 27.00, depending on the order of events and the speed of the inning’s second hitter.
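The arithmetic behind those numbers can be sketched in a few lines of Python. This is a minimal sketch, assuming the standard FIP formula of (13*HR + 3*(BB+HBP) - 2*K) / IP plus a league constant. The constant of 3.04 is an assumption chosen to reproduce the 16.04 figure above (the real constant is recalculated each season), the event codes are hypothetical shorthand, and every run is treated as earned.

```python
# Assumed FIP constant; the real one is reset each season so that league FIP
# matches league ERA. 3.04 reproduces the 16.04 figure for these innings.
FIP_CONSTANT = 3.04

# Hypothetical event codes: 1B = single, HR = home run, FO = fly out.
SCENARIO_A = ["1B", "1B", "HR", "FO", "FO", "FO"]
SCENARIO_B = ["HR", "1B", "1B", "FO", "FO", "FO"]


def fip(events, innings, constant=FIP_CONSTANT):
    """FIP only counts HR, BB, HBP, and K, so event order cannot matter."""
    hr = events.count("HR")
    bb_hbp = events.count("BB") + events.count("HBP")
    k = events.count("K")
    return (13 * hr + 3 * bb_hbp - 2 * k) / innings + constant


def era(runs, innings):
    """Runs allowed per nine innings (every run treated as earned here)."""
    return 9 * runs / innings


for name, events in [("A", SCENARIO_A), ("B", SCENARIO_B)]:
    print(f"Scenario {name} FIP: {fip(events, 1):.2f}")

# The runs charged depend on sequencing (and, in Scenario B, on baserunning).
for runs in (3, 2, 1):
    print(f"{runs} run(s) in one inning -> ERA {era(runs, 1):.2f}")
```

Because `fip()` only counts events, any ordering of the same six plate appearances returns the same value, while the ERA depends entirely on how many runs the sequence happened to produce.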

By wOBA, these two innings are exactly the same, as it simply sees six individual events and gives out credit for each one without regard for what came before or after. The third batter in Scenario A will get the same credit as the first batter in Scenario B, because they both hit home runs, and because the measure is context-neutral (by design), it will simply give them the average credit for a home run based on the expected distribution of when home runs occur. wOBA, like FIP, sees these two innings as equal, from the perspective of both the hitter and the pitcher.
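To make the context-neutral idea concrete, here is a minimal Python sketch of a wOBA-style calculation. The weights are illustrative assumptions, roughly in the range FanGraphs has published in recent seasons but not the values for any particular year. Dividing by the number of events is a simplification that holds for these six plate appearances (a sacrifice fly still counts in the wOBA denominator). The final check confirms that every possible ordering of Scenario A’s events produces the same wOBA.

```python
from itertools import permutations

# Illustrative wOBA weights (assumed values, not any specific season's).
# Outs ("FO" for fly out) receive no credit.
WEIGHTS = {"BB": 0.69, "HBP": 0.72, "1B": 0.89, "2B": 1.27,
           "3B": 1.62, "HR": 2.10, "FO": 0.0}

SCENARIO_A = ["1B", "1B", "HR", "FO", "FO", "FO"]
SCENARIO_B = ["HR", "1B", "1B", "FO", "FO", "FO"]


def woba(events):
    """Weighted events per plate appearance; position in the inning is never used."""
    return sum(WEIGHTS[e] for e in events) / len(events)


# Credit for the home run is identical whether it came third (A) or first (B).
print(WEIGHTS[SCENARIO_A[2]] == WEIGHTS[SCENARIO_B[0]])

# Every ordering of the six events yields the same wOBA (rounded to absorb
# floating-point noise from summing in different orders).
distinct = {round(woba(order), 10) for order in permutations(SCENARIO_A)}
print(len(distinct), round(woba(SCENARIO_A), 3))
```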

ERA and RA9 would see these innings quite differently, assigning three runs to the pitcher in Scenario A — and either one or two in Scenario B — because that is how many runs were scored while he was pitching. The sequence of events is a significant factor in how the pitcher is valued.

This isn’t to say that one or the other is definitively right or wrong. There is some evidence that pitcher sequencing is at least partially a skill. Guys like Jim Palmer and Tom Glavine accumulated +13 wins from sequencing during their careers, while Nolan Ryan was an amazing 34 wins in the negative based on the order of events. If a pitcher struggles to pitch out of the stretch (or, conversely, pitches in such a way that he strands more runners than you might expect from his overall numbers), then there is an argument that the pitcher ought to be credited or blamed for those results.

Of course, the answers aren’t always that clear cut, especially when we’re not looking at a player’s entire career. In the two scenarios above, we really don’t yet know how much credit or blame the pitcher should receive for the two singles that occurred. They could have been scorching line drives that no fielder had a chance to make a play on, or they could have been routine ground balls that rolled into the outfield because the defenders behind the pitcher have the range of a potted plant.

Both RA9 and FIP respond to this uncertainty by going to opposite extremes, with FIP giving the pitcher no responsibility for the hit and RA9 giving him all of it. Both are clearly wrong. There are ways to attempt to adjust for defensive performance, as Baseball-Reference does with its version of pitcher WAR, but they require some huge assumptions about the consistency of team defensive performance that are also clearly wrong, and they get away from that “what actually happened” point of origin.

But I’m getting a little off course here. The goal of this article isn’t to argue that a FIP-based WAR or an RA9-based WAR is superior. I simply think it’s worth pointing out that using a linear weights based WAR for position players (which pretty much every popular WAR implementation uses, including ours) is inconsistent with using an ERA/RA9-based WAR for pitchers. If you use that combination of metrics, you are giving hitters no credit or blame for the contexts in which their performances occurred, but you are giving the pitcher full credit or blame for those same situations.

And, based on what we know about how to distribute credit for how balls in play become hits, this is probably backwards. We are pretty sure that, when a hitter gets a hit, he is the only offensive player who deserves credit for that outcome. However, when a pitcher gives up a hit, we often do not know whether it was his fault or a failure of his defense. And yet, if our scenarios included a bases-clearing double instead of a home run, we would assign the pitcher (through ERA/RA9) the full blame for the two runs that scored on that double, while giving the hitter credit only for the average run value of all doubles, ignoring the fact that he drove in two runs in the process.

Again, I see the argument for using both context-neutral and context-dependent statistics to evaluate player performance, especially when we are looking backwards and asking questions of past value. There is a difference between trying to isolate skills and trying to measure the value of events that have already occurred. I just think that maybe we, as a community, should consider evaluating position players and pitchers the same way.

In this way, wOBA and FIP are similar, which is one of the reasons why we use FIP as our basis for pitching WAR. With a linear weights model for both hitters and pitchers, we are attempting to evaluate pitchers based on the number of events we can credit or blame them for, and not measuring the sequence in which those events occurred. If one’s preference is to use RA9-WAR, then I’d suggest that perhaps it would be more fair to also evaluate hitters based on situational performance, which would lead to relying on something like RE24 for offensive performance.
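The difference between the two approaches shows up clearly with a bases-clearing double in Scenario A. RE24 credits a play with the change in run expectancy between the starting and ending base/out states, plus any runs that scored on it. In the minimal Python sketch below, the run-expectancy values and the context-neutral value of a double are hypothetical, illustrative numbers; real tables are built from play-by-play data for a given run environment.

```python
# Hypothetical run-expectancy table: runs expected from a base/out state to the
# end of the inning. Keys are (occupied bases, outs). Values are illustrative.
RUN_EXPECTANCY = {
    ("1st_2nd", 0): 1.44,  # runners on first and second, nobody out
    ("2nd", 0): 1.10,      # runner on second, nobody out
}

# Hypothetical context-neutral value of an average double, in runs above average.
AVERAGE_DOUBLE_RUNS = 0.77


def re24(start_state, end_state, runs_scored, table=RUN_EXPECTANCY):
    """Run-expectancy change on the play plus the runs that scored on it."""
    return table[end_state] - table[start_state] + runs_scored


# Scenario A with a double in place of the homer: two singles put runners on
# first and second, then a double clears the bases and the batter stands on second.
double_re24 = re24(("1st_2nd", 0), ("2nd", 0), runs_scored=2)

print(f"RE24 credit for this double:     {double_re24:.2f}")
print(f"Linear-weights credit (average): {AVERAGE_DOUBLE_RUNS:.2f}")
```

Under these assumed values, a situational measure like RE24 rewards the hitter for the two runs he drove in, mirroring how RA9 charges the pitcher for them, while the linear weights value is the same for every double.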

It is worth noting that RE24 here on FanGraphs isn’t a perfect replacement for Batting Runs in the WAR calculation. Because RE24 also includes SB/CS, it effectively replaces both Batting Runs and the wSB part of our Baserunning measure. However, depending on future interest in this kind of calculation, it is possible to build a version of RE24 that doesn’t include any baserunning, which could simply be subbed in for context-neutral batting runs if there were a desire to build a version of WAR for position players that modeled the way RA9 treats situational events for pitchers.
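A version of RE24 without baserunning could be as simple as summing RE24 over a player’s plate appearances while skipping stolen-base and caught-stealing plays. The Python sketch below assumes a hypothetical play log in which each play is already tagged with its type and its RE24 value; the `Play` class, the play-type labels, and the numbers are illustrative, not a FanGraphs data format.

```python
from dataclasses import dataclass

# Play types treated as baserunning rather than hitting (hypothetical labels).
BASERUNNING_PLAYS = {"SB", "CS"}


@dataclass
class Play:
    kind: str    # "PA" for a plate appearance, "SB"/"CS" for steal attempts
    re24: float  # run-expectancy change credited to the player on this play


def total_re24(plays):
    """RE24 including hitting and steal attempts together."""
    return sum(p.re24 for p in plays)


def batting_only_re24(plays):
    """RE24 restricted to plate appearances, dropping SB/CS plays."""
    return sum(p.re24 for p in plays if p.kind not in BASERUNNING_PLAYS)


# Hypothetical log: two plate appearances, one steal, one caught stealing.
log = [Play("PA", 1.66), Play("PA", -0.25), Play("SB", 0.20), Play("CS", -0.45)]
print(round(total_re24(log), 2), round(batting_only_re24(log), 2))
```

The batting-only total is the piece that could, in principle, stand in for context-neutral batting runs, with baserunning still handled separately.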

But then again, there’s also a school of thought that there are already too many versions of WAR going around as it is. Most people I talk to want fewer WARs, not more. The problem is that we’re not always asking the same question, and at the end of the day, answering questions is the entire reason we have analytical data to begin with. While I’m not necessarily advocating for one position or the other, I do think it’s worth pointing out that the currently popular versions of WAR for hitters do not answer the same question that a runs allowed based WAR for pitchers seeks to answer.

If you prefer RA9-WAR for pitchers, you’re essentially asking a different question than you are when you use WAR for position players. It’s worth considering whether that’s a problem we’re okay with, or whether it is an argument for either using a FIP-based WAR for pitchers (the conclusion we came to when we built WAR here on FanGraphs) or creating a new version of WAR for position players that gives them credit or blame for their situational hitting.

Otherwise, combining a linear weights based position player WAR with a runs allowed based pitcher WAR creates a bit of a paradox. Maybe we’re okay with that, but we should probably at least be aware that it’s what many people who use that combination of WARs are doing.