How reliable is UZR?
Ultimate Zone Rating is now everyone’s favorite defensive stat on the Internet. But how reliable is it?
Let’s define reliable as year-to-year persistence. It’s not the only definition, but it’ll do. We’ll measure it using a weighted correlation coefficient. And for good measure, we’ll take a look at how to use that correlation to regress to the mean. And for free, we’ll throw in a look at wOBA, a form of linear weights per plate appearance, as a point of reference.
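For anyone who wants to play along at home, here is a minimal sketch of a weighted correlation between one season and the next. It assumes each pair of seasons is weighted by the harmonic mean of the chances in the two years, which is the weighting used for the results in this post. The function names and the four player-seasons are made up for illustration; this is not the actual code or data behind the numbers here.

```python
import math

def harmonic_mean(a, b):
    """Harmonic mean of two positive opportunity counts."""
    return 2.0 * a * b / (a + b)

def weighted_correlation(x, y, w):
    """Weighted Pearson correlation of x and y using weights w."""
    total = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    var_x = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    var_y = sum(wi * (yi - mean_y) ** 2 for wi, yi in zip(w, y))
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: (rate in year 1, chances in year 1, rate in year 2, chances in year 2)
pairs = [
    (0.850, 120, 0.830, 100),
    (0.800, 90, 0.820, 110),
    (0.870, 60, 0.840, 130),
    (0.790, 140, 0.800, 95),
]

rates_y1 = [p[0] for p in pairs]
rates_y2 = [p[2] for p in pairs]
weights = [harmonic_mean(p[1], p[3]) for p in pairs]

r = weighted_correlation(rates_y1, rates_y2, weights)
print(f"weighted year-to-year r = {r:.2f}")
```

The harmonic mean leans toward the smaller of the two seasons, so a pair where one year is a tiny sample counts for less than the raw total of chances would suggest.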
In order to get this to work out correctly, we need to convert UZR into a rate stat. (Okay, maybe we don’t, but it’s the only way I’ve found so far.) What I did was take Chris Dial’s methodology for converting STATS Zone Rating into a UZR-like plus/minus rating and run it in reverse, back-calculating a number that looks like ZR from a player’s UZR and his expected outs.
And now, the results of that trial:
Metric    Obs    R       Const    Regress
IF UZR    107    0.36    190      64%
OF UZR     80    0.26    228      74%
wOBA      320    0.53    284      47%
I split UZR up by infield and outfield; it could be productive to further break down by position. I looked at stats from 2002-2008, yearly totals only. (I did the same with wOBA to provide an apples-to-apples comparison.)
The first column is the average number of observations – either estimated chances (expected outs divided by average zone rating) for UZR or plate appearances for wOBA. Next is the correlation between one year and the next. (It’s weighted by the harmonic mean of the chances in the two seasons.) Following is the constant number of observations needed to regress to the mean, using this basic formula:
(Player’s Rate * Observations + League Average Rate * Constant) / (Observations + Constant)
And the last column is the constant expressed as a percent of the total weight, Constant / (Observations + Constant). If that figure were 50%, then for a player with the number of observations in column one, you would put equal weight on his performance and on the league mean to estimate his true talent level.
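Here is a rough sketch of that arithmetic in Python. The `regress_to_mean` function is the formula above, written out directly. The `regression_constant` function is my assumption about how the constant relates to the correlation: the post doesn't spell it out, but Obs * (1 - R) / R matches the figures in the table. The function names, the .380 hitter and the .330 league average are all invented for the example.

```python
def regression_constant(obs, r):
    """Assumed relationship: observations to add at league average, given r at `obs`."""
    return obs * (1.0 - r) / r

def regress_to_mean(player_rate, observations, league_rate, constant):
    """The basic formula: a weighted blend of the player's rate and the league rate."""
    return (player_rate * observations + league_rate * constant) / (observations + constant)

def regression_share(observations, constant):
    """Fraction of the estimate's weight that goes to the league mean."""
    return constant / (observations + constant)

# (average observations, year-to-year correlation) taken from the table
table = {
    "IF UZR": (107, 0.36),
    "OF UZR": (80, 0.26),
    "wOBA": (320, 0.53),
}

for name, (obs, r) in table.items():
    const = regression_constant(obs, r)
    share = regression_share(obs, const)
    print(f"{name}: constant ~ {const:.0f}, regress ~ {share:.0%}")

# Hypothetical hitter: .380 wOBA in 300 PA, league average .330 (made-up numbers)
estimate = regress_to_mean(0.380, 300, 0.330, 284)
print(f"estimated true-talent wOBA: {estimate:.3f}")
```

Note that the constant stays fixed while the percentage changes with playing time: the more observations a player has, the less weight the league mean gets.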
The takeaway:
 Everything regresses to the mean. A hitter in 300 PAs should be regressed roughly 50% to the mean. (Assuming all you have is those 300 PAs, of course.)
 Defensive metrics are less reliable than offensive metrics. (Which – see above – are not as reliable as they are sometimes treated, when it comes to determining a player’s inherent level of ability.)
 An infielder’s UZR is more reliable than an outfielder’s UZR. This is partly because an outfielder sees fewer chances than an infielder, and partly because outfield defense is more difficult to measure than infield defense.
I like what Adam says. With pitching stats it’s best to use a mix of several measures to get a big picture from several angles. Even more so with fielding. Use UZR, +/-, and scouting/eyeball to try to get an overall picture of the player’s defensive skills.
I thought UZR was like ERA: it’s a measure of what did actually happen – not what you should reasonably expect to happen. But, until we develop component fielding statistics, it’s all we have.
Adam, UZR is a good guess of what actually happened, not exactly what happened, unlike ERA.
There are limitations of what we can measure right now, so two situations which UZR thinks are the same can vary by a small but significant amount. The biggest issue is that we have pretty good location data, but little knowledge about how long the ball actually took to get where it went.
Year-to-year correlations will show the reliability to be lower than it actually is because of aging. Players on either end of the aging curve are expected to have a change in UZR. This isn’t a full study and we both realize that, but if you want to take it further…
http://www.hardballtimes.com/main/article/fieldingagingcurves/
So, Colin, would the shortcut interpretation be that given a half year’s worth of data, you’d regress it by the numbers in the last column, about 45% for hitting, 75% for OF defense, and 65% for infield defense?
I don’t know how UZR is calculated, exactly, but wouldn’t defensive positioning factor into UZR as well? That’s on the coaches. (Consider the Ortiz shift.)
Fielders – especially outfielders – can’t control where the ball is hit, so luck is also a factor to some degree.