Substituting DRS for UZR in WAR

When we calculate WAR at FanGraphs, we use a player’s UZR as his defensive input. This holds true for every position except catcher, which defers to Defensive Runs Saved (DRS), since UZR does not measure catcher defense. That led me to wonder what would happen if we used DRS across the board. How big a difference might we see in the WAR values of the league’s best players?

For the players at the very top of the WAR leaderboard, there wasn’t a huge difference. Josh Hamilton, for instance, loses just 0.9 RAR when we substitute DRS for UZR. That moves him very slightly, from 8 to 7.9 WAR, which would still lead the league. Joey Votto takes a similarly small hit, perhaps not even enough to move his WAR by 0.1. Even Albert Pujols doesn’t take much of a hit. It’s after those three, though, that things start to get interesting.

The first one to get a big bump is Ryan Zimmerman. UZR credited him with 13.9 runs above average, which ranked third among MLB shortstops. DRS, on the other hand, credits him with 20 runs above average, which increases his RAR to 75.2. That would put him second in the league in WAR, and the ranking would hold up because Hamilton, Votto, and Pujols each take a small hit from the UZR-to-DRS switch. Even so, DRS doesn’t rate Zimmerman as the best third baseman in the league. That honor belongs to Chase Headley, who also leads in UZR.

Robinson Cano, nominal MVP candidate, would also benefit if we substituted DRS for UZR in WAR. While he ranked below average with a -0.6 UZR, he rated considerably higher by DRS, at seven runs above average. That would bump his WAR to around 7.2, which would move him all the way to fifth in the league. He would stay ahead of Adrian Beltre, since Beltre takes a slight hit with DRS. He would not, of course, pass Zimmerman, since the alternate WAR has Zimmerman around 7.6.

What of Jose Bautista? By outslugging the league by a significant margin, he produced 55.9 park-adjusted batting runs above average, but defense took him down a peg. Between third base and the outfield, he checked in at 7 runs below average per UZR. Yet DRS saw something different. If we change WAR to reflect Bautista’s 6 DRS, we’d have him at somewhere around 8.1 or 8.2 WAR. That would trump Hamilton, since as we saw above he takes a slight hit with DRS.

Evan Longoria presents an interesting case. He led the league in bWAR, but finished just sixth in our WAR. Changing UZR to DRS would give him 1.9 additional runs above average, bumping him to around 7.1 WAR. That would put him right around Beltre’s level, if not a bit higher.

If we’re looking for a player who scored lower on DRS than UZR, we can look to Longoria’s teammate Carl Crawford. By UZR he ranked as the league’s second-best left fielder at 18.5 runs above average. Change that to DRS, though, and he has only 14 runs above average. That downgrades him from 6.9 WAR to around 6.5 WAR. We can see the same in Andres Torres. His 6 WAR is based heavily on a 21.2 UZR. DRS sees things differently, crediting Torres with just 12 runs above average. The change amounts to a huge difference in WAR, knocking him all the way to 5.4.

Instead of ending on the negative, let’s close this exercise with someone who gets a decent boost from DRS. That would be Troy Tulowitzki. While UZR credits him with 7.1 runs above average, an impressive figure considering he missed a decent chunk of time, DRS bumps that to 16 above average. That gives him a bump of nearly a full win, from 6.4 to around 7.4. That would place him ahead of Albert Pujols.

What we can do with these changes I’m not exactly sure. Both UZR and DRS use the same data from Baseball Info Solutions, but they interpret it differently. I’m sure we could conduct a similar experiment using Total Zone, or even Colin Wyers’s nFRAA, neither of which uses the BIS data. It’s just something to consider when using any version of WAR to rank players.
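
The substitution used throughout this post is simple arithmetic: take the difference between a player’s DRS and his UZR, convert those runs to wins, and add the result to his WAR. Here is a minimal sketch in Python, assuming a round 10 runs per win. The real runs-to-wins conversion varies a bit by season, and the WAR figures quoted above are rounded, so results can be off by a tenth or so. The function name and the constant are illustrative only, not anything taken from the FanGraphs implementation.

```python
# Assumption: roughly 10 runs equal one win. This is a rule of thumb,
# not the exact conversion used for any particular season.
RUNS_PER_WIN = 10.0


def war_with_drs(war, uzr, drs, runs_per_win=RUNS_PER_WIN):
    """Return WAR after swapping the UZR fielding input for DRS.

    war: the player's WAR as computed with UZR
    uzr: fielding runs above average per UZR
    drs: fielding runs above average per DRS
    """
    run_difference = drs - uzr
    return war + run_difference / runs_per_win


# Carl Crawford, using the figures quoted in the post:
# 6.9 WAR, 18.5 UZR, 14 DRS.
crawford = war_with_drs(6.9, 18.5, 14.0)
print(f"Crawford with DRS: {crawford:.2f} WAR")
```

With these inputs Crawford lands at about 6.45, in line with the “around 6.5” above.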






Joe also writes about the Yankees at River Ave. Blues.


Lucas Apostoleris
Guest
5 years 6 months ago

Good stuff, Joe.

Piccamo
Guest
5 years 6 months ago

“The first one to get a big bump is Ryan Zimmerman. UZR credited him with 13.9 runs above average, which ranked third among MLB shortstops.”

Is Zimmerman so good at third base that we classify him as a shortstop now? :p

Anon
Guest
Anon
5 years 6 months ago

This is how errors should be critiqued. Just poke a little fun at the author, nothing mean-spirited. Props Piccamo.

Michael
Guest
5 years 6 months ago

Very cool. Would be fantastic to have this data available on all players…

Austin
Guest
Austin
5 years 6 months ago

It would, indeed. One of the biggest gaps that I’ve noticed is with Austin Jackson (yes, it’s coincidence that we share a first name), whom DRS credits with 21 runs as opposed to UZR’s 5.4. A reevaluated WAR of around 5.3, BABIP-inflated or not, would look awfully impressive for a rookie.

Locke
Guest
Locke
5 years 6 months ago

The title of this article got my hopes up, yet looking back at it.. you didn’t lie! You could, however, have replaced literally this entire piece with one single table showing the top 20 WAR calculated with UZR then in the next column, that player’s WAR using DRS. Zero words needed.

What would’ve been interesting would be… you know… WHY the numbers are actually different, which metric is better, why we use one not the other, why don’t we use a combination of both……or any of the many questions that come to mind when reading this mess of words surrounding 10 numbers.

Any actual analysis at all would’ve been awesome.

John
Guest
John
5 years 6 months ago

This wasn’t an article filled with opinions. This entire article is used to show the differences, not why. Why could be any number of reasons, which would be a longer article. Maybe you should write, then we can all come here and explain why it was terrible.

Locke
Guest
Locke
5 years 6 months ago

One table replaces this entire article. 4 Columns.

Locke
Guest
Locke
5 years 6 months ago

Not a single word added to the data.

198d
Guest
198d
5 years 6 months ago

I couldn’t help but think this upon reading as well. It’s just a table in “sentence” form, and IMHO would be easier to digest presented as such.

gradygradychase
Member
gradygradychase
5 years 6 months ago

Ryan Zimmerman is a shortstop?
That’s fine!!

Lee Panas
Guest
5 years 6 months ago

I like the idea of averaging multiple metrics together. It could be a straight average or a weighted average. I’m not sure it would improve our accuracy, but it would be a conservative estimate. It would have the effect of regressing to the mean. I think if we are going to plug a fielding number into WAR, I would rather risk erring near the mean than erring at the extremes.

The Nicker
Guest
The Nicker
5 years 6 months ago

100% with Lee on this one. Why don’t we use some average (weighted or otherwise) for defensive players?

I would feel better about the numbers if we were using four systems (UZR, DRS, TZL, FANS) instead of just one, when trying to summarize a player’s season in something like WAR.

Anon21
Guest
Anon21
5 years 6 months ago

Perhaps Joe could answer a question I’ve had for a while. Does fWAR use the current season’s UZR data to calculate current season WAR, or does it use some kind of rolling average for the past few seasons? If fWAR does use current season UZR, and only current season UZR, what of the frequent injunctions we hear around here *against* relying on a single season (or even worse, a partial season) of UZR data for anything meaningful?

J.P.
Guest
J.P.
5 years 6 months ago

It’s not necessarily meaningful when trying to determine a player’s true talent level on defense. It is, however, a record of what he was actually worth that year on defense. fWAR doesn’t account for BABIP luck, either. It just records what the value of the hits and outs were.

phoenix
Guest
phoenix
5 years 6 months ago

this is especially important with guys who play all over the field. they play far fewer innings in one position than a normal starter, so their UZR would be based on a smaller sample than the already often mistaken one season’s worth of data.

xeifrank
Member
5 years 6 months ago

Does 2010 batting average use hits and at bats from 2009?

Sure, if you want to calculate a more accurate “true talent” level you use multiple years, but if you are telling the story of the “2010 season”, you use 2010 data.

Anon21
Guest
Anon21
5 years 6 months ago

I think that’s different. We have excellent data for the batted ball outcomes which are the guts of offensive statistics; we have comparatively shoddy data about the batted ball locations which are supposed to be the guts of defensive statistics. Thus, I had always understood the objection to using a single season of UZR to be that the data that goes into it is not pristine or comprehensive enough to be meaningful in sample sizes of a single season or less; that is, we really aren’t sure that it’s recording what “did happen,” because of the possibility of serious measurement errors. If that’s so, the issue with folding single-season UZR samples into fWAR has little to do with determining true talent level, and a great deal to do with a concern that the defensive data is basically garbage below a certain threshold of observed plays.

Jason
Guest
Jason
5 years 6 months ago

Gotta agree with Anon21 here. UZR/DRS for 2010 doesn’t really tell the story of “what happened” in 2010 the same way that the offensive stats do. By combining single season offensive stats with single season defensive stats you’re essentially combining “what happened” on offense with a defensive stat that has nowhere near the same level of certainty in terms of telling you “what happened”.

This isn’t just a question of “true talent” versus “what happened”. Even those who came up with the advanced defensive metrics don’t argue that those stats captured “what happened” in such a small sample. Because of the uncertain nature of defensive stats (compared to offensive stats where you know what the actual result of the play was), there is a lot more noise and imperfections in small samples of the data.

Hank
Guest
Hank
5 years 6 months ago

This is a poor analogy/metaphor.

The only subjectivity in batted ball outcomes is the official scorer on a marginal hit/error call and an umpire’s call (both of which are very infrequent).

UZR is a model…it is not a direct measurement as people keep stating it is. The zones that are used for classification are relatively large (thus getting to a given ball in a zone is not always the same assumed one size fits all zone probability), not to mention the parsing of how hard the ball is hit and runner speed is very crude. There are issues even with the size of zones as I don’t think it even corrects for park dimensions (thus not all outfield zones are even the same size).

For example, while it is nice to say that every ball in a specific OF zone is an out 40% of the time for a moderately hit flyball, what happens when it is a fliner (still judged to be moderately hit), hit on the edge of the zone away from the fielder who is positioned even further away by the bench coach (UZR doesn’t care where the defensive player starts unless it’s a major infield shift)? What happens on the flyball that is hit 100 feet higher to the same spot? On an infield grounder in the hole… how do you distinguish a guy who is jammed and has a tough time getting out of the box vs one that got a running start on a ball hit in the same position?

Variation can also occur with how the zone, batted ball speed, runner speed get classified by the person doing the classification. I think there are some checks on this but there is subjectivity in how virtually every batted ball is classified with the UZR system.

In short (or long) comparing variation in BABIP to variation in UZR is a poor comparison at best. I realize the defensive skill or true talent vs outcome argument but UZR is still modeling the outcome, not measuring it. If folks are arguing it should be outcome based and luck is just part of a 1 year WAR, why not just look at a rate version of putouts and errors?

masterkembo
Member
masterkembo
5 years 6 months ago

“That moves him very slightly, from 8 to 7.9 WAR, which would still lead the league.”

If Hamilton falls to 7.9 WAR and Bautista moves up to the 8.1/8.2 range, how again is Hamilton still leading the league?

Jim Lahey
Guest
Jim Lahey
5 years 6 months ago

Any chance that the WAR calculations could be weighted for both DRS and UZR for the defensive portion to smooth out these differences between the ratings systems?

Someanalyst
Guest
Someanalyst
5 years 6 months ago

Sounds to me like an excellent idea. Some brave soul might even play with the weighting to see which composite value is most predictive of future performance.

the fume
Guest
the fume
5 years 6 months ago

I always try to average the two stats, personally.

Glancing over the stats, the biggest winner looks like BABIP-king Austin Jackson, who is 15 runs or so better with DRS than UZR. That bump would put him around 5.2 WAR.

George
Guest
George
5 years 6 months ago

Bautista was the best player of the 2010 season.

siggian
Guest
siggian
5 years 6 months ago

Well, he’s certainly the best at hitting taters this year and probably the best at growing a beard, having done it a few times during the season.

Locke
Guest
Locke
5 years 6 months ago

What else is there besides taters and beards?

Rally
Guest
Rally
5 years 6 months ago

He’s got nothing on Brian Wilson’s beard though.

grady
Guest
grady
5 years 6 months ago

I always thought Bautista was getting jipped on his defense to be honest. It was always really fun watching people try to go first-to-third on a sharply hit liner/grounder to right.

hgjghgjjh
Guest
hgjghgjjh
5 years 6 months ago

No doubt Jose has one of the best arms in the OF, but he wasn’t great by any means at reading fly balls and taking good routes. Not awful, but not great.

I’d rather have him in the OF than at 3B to utilize his arm.

Baron Samedi
Member
Baron Samedi
5 years 6 months ago

Jose Bautista is the Greatest Blue jay Of All Time.

fredsbank
Guest
fredsbank
5 years 6 months ago

interesting capitalization choices here, champ

Baron Samedi
Member
Baron Samedi
5 years 6 months ago
Darren
Guest
Darren
5 years 6 months ago

The only problem with just replacing UZR with DRS is that unlike UZR, DRS does not sum to 0 for each position. In all cases it is much higher than zero. I think this is because UZR is adjusted to average 0 each year, while DRS is not. So since the batting runs in WAR here at FanGraphs sum to zero, should you not make an adjustment to DRS first? I think your findings will not be as dramatic.
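
One way to picture that adjustment, as a rough sketch only: compute the DRS per inning for everyone at a position, then subtract each player’s share of it so the position sums to zero. The code below assumes the premise above (that raw DRS totals at a position are above zero), uses made-up players and numbers, and weights by innings, which is my own choice rather than anything either metric documents. The function and field names are hypothetical.

```python
from collections import defaultdict


def recenter_by_position(rows):
    """Shift DRS so each position's total is zero, weighting by innings.

    rows: list of dicts with keys 'player', 'pos', 'innings', 'drs'.
    Returns new dicts with an added 'drs_adj' key.
    """
    total_runs = defaultdict(float)
    total_innings = defaultdict(float)
    for row in rows:
        total_runs[row["pos"]] += row["drs"]
        total_innings[row["pos"]] += row["innings"]

    adjusted = []
    for row in rows:
        innings_at_pos = total_innings[row["pos"]]
        # Average DRS per inning at this position (0 if nobody played there).
        rate = total_runs[row["pos"]] / innings_at_pos if innings_at_pos else 0.0
        adjusted.append({**row, "drs_adj": row["drs"] - rate * row["innings"]})
    return adjusted


# Hypothetical third basemen, invented for illustration only.
sample = [
    {"player": "Player A", "pos": "3B", "innings": 1200.0, "drs": 20.0},
    {"player": "Player B", "pos": "3B", "innings": 800.0, "drs": 5.0},
    {"player": "Player C", "pos": "3B", "innings": 1000.0, "drs": -1.0},
]
for row in recenter_by_position(sample):
    print(row["player"], round(row["drs_adj"], 1))
```

The adjusted numbers could then be plugged into WAR in place of raw DRS, which would show how much of the gap is just a difference in baseline.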

J.T. Jordan
Guest
J.T. Jordan
5 years 6 months ago

This.

Someanalyst
Guest
Someanalyst
5 years 6 months ago

Very good point. It would be nice to see what that adjustment did to the numbers.

waynetolleson
Guest
waynetolleson
5 years 6 months ago

This is all interesting stuff. However, the calculation of exactly how many runs a player saves or costs his team is an inexact science. You never know if an error, misplay, or lack of range resulting in a hit is going to prove harmless, if it will result in small damage – i.e. a run or two – or if it will have catastrophic consequences, e.g. a five-run inning.

This is what makes the quantification of defense so difficult. Take, for example, the age-old argument about Derek Jeter. Derek Jeter led all MLB shortstops with a .989 fielding percentage. He was FOURTH WORST among MLB shortstops in DRS, and he was sixth-to-last among MLB shortstops in UZR/150.

I was listening to an interview with former pitcher David Cone. He stated that as a former pitcher, he loved statistics, because he wanted every last piece of information to help him make the smartest decision possible. Cone stated, however, that he wasn’t entirely convinced of the accuracy of defensive metrics. He stated that it’s hard to judge whether the play should have been made by a given fielder on a given ground ball or pop fly, and to judge whether or not a different fielder might have made the play on the same ball.

He also stated that he wasn’t sure if things like sure-handedness were given their proper weight. Cone said that when he was on the mound, he had a strategy of where he wanted the batter to hit the ball. He said that as a pitcher, you feel better having a guy in back of you where you know if you execute your pitch and the ball is hit to a certain fielder, that fielder is going to field the ball and knows what to do with it.

To return to my original point, a central problem with realistically calculating Runs Cost and Runs Saved is that an error, or failure to make a play, could prove harmless; or an error could lead to five runs.

Also, an error or misplay might not have an effect in the inning it occurs, but could impact the game later. For example, during Cliff Lee’s World Series starts, there were a couple innings where, had there been some better luck and better defensive play, Lee might have been through two innings on his customary 23 pitches. The Giants didn’t score in these particular innings, but the extra base-runners meant that rather than sailing along through two innings, Cliff Lee had to face three extra batters, and had to throw 38, rather than 23, pitches.

(These are rough estimates of the pitch count used to illustrate a point. They’re not the exact pitch counts of the game.)

Defense definitely counts, and it certainly pays to evaluate which fielders are helping their teams, and which players are costing their teams, in the field. However, it’s difficult to place an exact number on this.

Someanalyst
Guest
Someanalyst
5 years 6 months ago

That is also true of offensive stats like wOBA, and we rightly love that one. Basically, you’re just saying that defensive metrics are context neutral – which is what most people would want, frankly.

However, with offensive metrics, we have WPA to provide contextual insight. Perhaps a similar metric could be tabulated for defense but the problem of where the defense starts and where the pitcher ends is ever-present.

waynetolleson
Guest
waynetolleson
5 years 6 months ago

“Basically, you’re just saying that defensive metrics are context neutral – which is what most people would want, frankly.”

I’m not sure if that’s exactly what I’m saying. I do see how people would desire Context Neutrality. However, I feel that context is a big part of judging a player’s true defensive capabilities. There are so many factors that can affect how many chances a fielder gets, or whether or not he reaches a ground ball.

A certain fielder might reach a given ball, while another fielder fails to field a ball hit in that exact same location. There are all sorts of game situations that could affect whether or not that ball gets fielded. It’s not just the fielder versus a contextually neutral baseball.

I do believe in defensive metrics in the long-term. For example, if a player is posting between -20 and -30 UZR’s every single season, chances are we’re not talking about a very good defensive ballplayer.

However, if within a single season, one player posts a positive DRS and another player posts a negative DRS at that same position, I couldn’t automatically say that the fielder with the better DRS is, in fact, the better fielder. I’d need to investigate further.

Bascinator
Member
Bascinator
5 years 6 months ago

I never understood why Fangraphs (and several other sites) used UZR as their primary measuring stick for defense and not DRS or a weighted average of the two. I have always believed a combination of these stats is the best way to go, and I would even consider including Total Zone and Revised Zone Rating/Plays out of Zone to get a better overall defensive number. Can anyone explain why UZR is generally viewed as “better” than DRS?

Also, UZR and DRS are about as accurate as an ERA for a pitcher, as explained in the post above, that an error/failed play could be harmless or very harmful.

CFIC
Guest
5 years 6 months ago

how does this affect Brendan Ryan’s WAR? also, many thanks for this. kudos.

CFIC
Guest
5 years 6 months ago

how about averaging UZR and DRS, or somehow weighting them? seems like it would be more accurate that way, since UZR is prone to distortions

pft
Guest
pft
5 years 6 months ago

@CFIC. I agree with you (got booted off SOSH for making stupid comments like that. LOL)

When you have 2 competing models that give close results in most cases but give different results from time to time, with one model appearing to be right sometimes, and the other model looking right sometimes, and at other times, neither seems right (both high, both low, or one high and one low) the standard practice is to average them for all cases to reduce the error in the outliers.

Unless you have clear and compelling evidence that one model is significantly more accurate than another, you should never choose one model over the other.

I would average DRS and UZR.

Just put me in the stupid camp with CFIC.

Lee Panas
Guest
5 years 6 months ago

I don’t think averaging the results of defensive metrics will give us a more accurate result. Averaging just gives us a more conservative estimate of something which is still pretty uncertain. It dampens the effect of an extreme value for UZR or DRS. I think this is a good idea.

Suppose, a player has a +20 on UZR and a +4 on DRS. What do we do about such uncertainty? We could ignore the fielding metrics and just calculate WAR without them (only giving a player credit for his position). However, it looks like the guy is above average so we want to give him some credit. We just don’t know how far above average.

Automatically using UZR would add 20 runs to his WAR which is a lot. Taking the average of the two (12) gives him credit for his apparently above average defense but we aren’t taking the big leap of saying he adds 20 runs.
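
For anyone who wants to play with the weighting, here is a minimal sketch of the blend. The function name and the default 50/50 weight are just assumptions for illustration; nothing here says which weight is right.

```python
def blended_fielding_runs(uzr, drs, uzr_weight=0.5):
    """Weighted average of UZR and DRS fielding runs.

    uzr_weight is the share given to UZR; DRS gets the remainder.
    The 0.5 default is a straight average.
    """
    if not 0.0 <= uzr_weight <= 1.0:
        raise ValueError("uzr_weight must be between 0 and 1")
    return uzr_weight * uzr + (1.0 - uzr_weight) * drs


# The +20 UZR / +4 DRS player from the example above.
print(blended_fielding_runs(20.0, 4.0))        # straight average
print(blended_fielding_runs(20.0, 4.0, 0.75))  # leaning toward UZR
```

The straight average reproduces the 12 runs in the example, and moving the weight shows how much the answer depends on which system you trust more.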

pft
Guest
pft
5 years 6 months ago

Yes, that’s the point of averaging them. In your example, either DRS or UZR is wildly inaccurate, and maybe both are off the mark. An average reduces the magnitude of such extreme errors. Of course, if your chosen metric is UZR and DRS was wildly wrong, averaging increases the error, but the magnitude of the error is tolerable. Unfortunately, we have no way of saying with any certainty which one is correct unless you watched that player in a significant number of games.

Jason Bay in 2009 was a good example. UZR had him at -11 and after revising it he was +2. Using the average approach he would still have been -6 (DRS had him at -2) but that’s closer to the fact he was a league average LF’er. Having watched most of the Red Sox games in 2009, I feel confident this was the case.

gradygradychase
Member
gradygradychase
5 years 6 months ago

Substitute DRS for UZR in always-underrated Chase Utley’s 2008 season, and he becomes a +9.5 WAR player. Ridiculous!!

Ivdown
Guest
Ivdown
5 years 6 months ago

Any idea what that would do for Matt Kemp? I have a hard time believing his defense was that terrible. Yes I watched him play badly in the outfield, but not Adam Dunn bad, which is around what UZR is saying.
