I’m surprised this hasn’t been run before. What about running it with a minimum of innings played at the position? This might bring in players who had fewer chances but suffered from poor range scores during certain years.

Comment by Troy Patterson — October 3, 2009 @ 7:25 pm

How many innings does a player need under his belt for UZR to be considered a decent tool to evaluate defensive ability?

Comment by Slick1 — October 3, 2009 @ 7:33 pm

It actually has, Troy. I did something similar for THT a while back:

http://www.hardballtimes.com/main/blog_article/how-reliable-is-uzr/

I split it up by infielder and outfielder, and took a bit of a different approach (I converted UZR to a pseudobinomial first). But the results are pretty consistent with each other. (I got a higher correlation than Dave did, I think, because I put UZR on a similar scale to wOBA – I think if Dave took wRAA instead of wOBA UZR would probably look even better in comparison.)

Comment by Colin Wyers — October 3, 2009 @ 7:33 pm

Those R-squared values are very, very low. And, R-squared tends to increase with more data anyway. Can you try a different metric to see if there is more correlation?

Comment by David C. — October 3, 2009 @ 7:43 pm

I think a Stats book might be helpful here, too bad I sold mine back to the book store for $7. I don’t think R^2 helps to explain why there may or may not be wild fluctuations from year to year.

And like David C. pointed out, an R^2 that is low is more important to me than the fact that it is increasing. In fact, in typical statistical studies I’ve seen in non-baseball fields, any r^2 less than .90 implies that the actual data doesn’t fit with the expected results based on the study parameters…

Comment by bobo — October 3, 2009 @ 7:54 pm

To those noting the low r-squared values, I think the point of the post was not necessarily to show that the correlation is fantastic, but to show that UZR correlates year-to-year at least as well as wOBA and perhaps other metrics that are considered “reliable”. At least for the last two years.

Comment by tyrone — October 3, 2009 @ 8:41 pm

He’s not using r-squared to show they stay the same year to year, as a .90 would. Just that the year-to-year variation is about the same as wOBA’s.

Comment by Troy Patterson — October 3, 2009 @ 8:41 pm

Good thing I’m taking Stats right now (despite it being a watered-down wussy version for MBAs).

I’d suggest expanding this to be a multivariable regression, perhaps by adding age at the very least. That way, the effect of age will be separated out. Then you can look at how much the data fluctuates just from UZR, as opposed to UZR and players getting older.

Comment by Tim_the_Beaver — October 3, 2009 @ 9:06 pm

There’s lots of things you could do to make this analysis a lot more sophisticated, but I think the gist of it would still be the same. We’re looking at players who have only aged 1 year here anyway, and while there may very well be a “fielding shelf” at a certain age, most of the players aren’t going to be affected drastically by it in this quick study.

Comment by David Appelman — October 3, 2009 @ 9:23 pm

Colin has way more statistical chops than I do, so in terms of accuracy I would defer to his study, but it’s good to know we’re both on the same page.

Comment by David Appelman — October 3, 2009 @ 10:16 pm

THT uses the “OOZ” stat (out of zone).

Is this the same “zone” you guys are using for UZR, or do the methodologies differ?

Comment by CH — October 4, 2009 @ 1:24 am

This may mean UZR is as reliable at measuring true talent as wOBA, but it’s still not as good at measuring performance. It may still be the case – it probably is still the case – that fluctuation in batting stats from year to year is due more to random fluctuation in performance, reflected in the metrics, and the fluctuation in defensive stats is due more to the inaccuracy of the metric itself. That is, batting lines just tend to vary more, even considered in context neutral comparisons – more singles, doubles, hrs, one year than the next – than defensive performance, which one suspects is roughly equivalent year to year; and so the noise in the one has a different source from the noise in the other. Even should one doubt the consistency of defensive performance and claim that it is just as unpredictable as offensive, some argument has to be made to that end, to the end of showing that these comparable r^2s suggest that UZR is just as reliable at measuring performance as wOBA.

Of course, none of this applies to true talent. True talent should be constant from year to year (more or less), and whether the source of variation is chance reflected in one’s metric or the inaccuracy of the metric itself doesn’t particularly matter: both screw up one’s ability to detect true talent. Still, this is an important and here unremarked distinction, as outlier UZRs are sometimes used to indicate an extraordinary season (a la outlier wOBAs) and not just its unreliability as a metric (a la outlier RBI totals) – e.g., http://ussmariner.com/2009/09/10/gutierrezs-defense/

Comment by Hejuk — October 4, 2009 @ 3:48 am

No. OOZ (and RZR, as well as ZR) have only one zone for each position, and each batted ball is either in that zone or out of it. UZR divides the whole field into a sort of grid with several smaller zones, and the zones don’t depend on the position. OOZ/RZR considers all balls in the zone the same value and all balls out of the zone the same value. UZR assigns a unique value to each zone for each position based on a number of parameters and how frequently a batted ball in that zone is converted to an out. Technically, there is no such thing as out of zone in UZR since the whole field is divided into zones independent of position, unlike OOZ/RZR zones.

Say, for example, that a medium hit fly ball in a particular zone between center and right is caught 50% of the time by the center fielder and 40% of the time by the right fielder. Each fielder gets .1 plays for making the play (the probability of the play not being made). The center fielder loses .5 plays and the right fielder loses .4 plays if no one makes the play (the probability that their position makes the play on average). If the batted ball is a hard fly ball or a soft fly ball or any type of line drive, the values will be different. The same goes if the fly ball lands in a different zone, or if the batter is left- or right-handed. There are also adjustments made to the value for the park and pitcher characteristics.

OOZ/RZR, on the other hand, will take the same fly ball and simply ask, was it in a fielder’s one zone or not? OOZ and RZR are purely binomial where the answer is either yes or no, and that is the only thing the value of the play depends on.
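For anyone who wants the bookkeeping in that fly-ball example spelled out, here is a minimal Python sketch. It is not the actual UZR code: the function name `zone_play_credit` and the dictionary of catch rates are hypothetical, and it leaves out the park, handedness, and batted-ball-type adjustments as well as any conversion from plays to runs.

```python
def zone_play_credit(catch_rates, made_by=None):
    """Plays credited (+) or debited (-) to each position for one batted ball.

    catch_rates maps a position to the share of similar balls (same zone,
    same batted-ball type) that the position turns into outs on average.
    made_by is the position that made the play, or None if the ball fell in.
    """
    if made_by is not None:
        # Credit equals the chance that nobody makes the play.
        return {made_by: 1.0 - sum(catch_rates.values())}
    # On a hit, each position is debited its own average catch rate.
    return {pos: -rate for pos, rate in catch_rates.items()}


rates = {"CF": 0.5, "RF": 0.4}  # the 50% / 40% example from the comment
caught_by_cf = zone_play_credit(rates, made_by="CF")
dropped = zone_play_credit(rates)
print(caught_by_cf)
print(dropped)
```

Under those assumptions the center fielder gets roughly +.1 plays for the catch, and a ball that drops costs the center fielder .5 plays and the right fielder .4 plays, which matches the numbers in the example.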

Comment by Kincaid — October 4, 2009 @ 12:29 pm

Come on guys, r^2 is not correlation, just r is. And in fact, there could actually be a NEGATIVE correlation (r) in play, but it gets lost in the process of squaring it. I.e., an r of 0.8 and an r of -0.8 both have the same r^2.
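A quick way to see the sign getting lost, as a sketch with made-up numbers. The `pearson_r` helper is written out by hand for illustration and is not taken from any library, and the data are invented so that r comes out near 0.8.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5]
up = [2, 1, 4, 3, 5]          # tends to rise with xs
down = [-y for y in up]       # same pattern, flipped

r_up = pearson_r(xs, up)
r_down = pearson_r(xs, down)
print(round(r_up, 2), round(r_down, 2))            # 0.8 -0.8
print(round(r_up ** 2, 2), round(r_down ** 2, 2))  # 0.64 0.64
```

The two series point in opposite directions, but once you square r there is no way to tell them apart.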

This is Stat 101, people.

Comment by Matthew Harms — October 4, 2009 @ 2:41 pm

R^2 doesn’t increase with more data, but rather with more variables.
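To illustrate the "more variables" half of that, here is a rough sketch on simulated data (nothing here is real UZR or wOBA data; `x1`, `junk`, and `y` are invented). It fits y on one predictor, then adds a pure-noise predictor. The second fit is done by residualizing, which gives the same residuals as the two-variable least-squares fit, so only simple regressions are needed.

```python
import random


def residuals(ys, xs):
    """Residuals from a least-squares fit of ys on xs with an intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return [(y - mean_y) - slope * (x - mean_x) for x, y in zip(xs, ys)]


def r_squared(ys, resid):
    """In-sample R^2: 1 minus residual sum of squares over total sum of squares."""
    mean_y = sum(ys) / len(ys)
    total = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - sum(e * e for e in resid) / total


rng = random.Random(0)                               # fixed seed, simulated data
x1 = [rng.gauss(0, 1) for _ in range(50)]
junk = [rng.gauss(0, 1) for _ in range(50)]          # unrelated to y by construction
y = [0.5 * a + rng.gauss(0, 1) for a in x1]

resid_one = residuals(y, x1)                         # y on x1 only
r2_one = r_squared(y, resid_one)

# Add the junk variable: regress what is left of y on what is left of junk
# after x1 has been taken out of both.
resid_two = residuals(resid_one, residuals(junk, x1))
r2_two = r_squared(y, resid_two)

print(r2_one, r2_two)
```

The second R^2 can only match or exceed the first, even though the extra variable is noise, which is why in-sample R^2 by itself is a poor way to compare models with different numbers of predictors.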

Comment by Matthew Harms — October 4, 2009 @ 2:43 pm

The problem you’re going to run into there is collinearity. Age and UZR in year one are going to be significantly correlated.

Comment by Colin Wyers — October 4, 2009 @ 5:17 pm

Why are you reporting the R^2 anyway, rather than the autocorrelation coefficient in the regression UZR_t = constant + rho*UZR_(t-1) + white noise? You can strip the UZRs of the age effect and other covariates by running a first period regression, and work with the residuals.

Comment by Sam — October 4, 2009 @ 9:22 pm

Should read “You can strip the UZRs of the age effect and other covariates by running a first STAGE regression, and work with the residuals.”
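A rough sketch of what that two-step procedure could look like, on simulated players rather than real UZR data. Everything here is an assumption for illustration: the aging slope, the talent and noise spreads, and the choice of age as the only covariate. Each season's UZR is first regressed on age, and then the year-two residuals are regressed on the year-one residuals to estimate rho.

```python
import random


def ols(xs, ys):
    """Intercept and slope from a simple least-squares fit of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - slope * mean_x, slope


def residuals(xs, ys):
    """What is left of ys after removing the fitted linear effect of xs."""
    intercept, slope = ols(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


rng = random.Random(42)                          # fixed seed, simulated players
ages = [rng.randint(23, 36) for _ in range(200)]
talent = [rng.gauss(0, 5) for _ in ages]         # hypothetical persistent skill
uzr_prev = [t - 0.5 * (a - 28) + rng.gauss(0, 5) for t, a in zip(talent, ages)]
uzr_curr = [t - 0.5 * (a + 1 - 28) + rng.gauss(0, 5) for t, a in zip(talent, ages)]

# First stage: strip the age effect from each season.
prev_resid = residuals(ages, uzr_prev)
curr_resid = residuals([a + 1 for a in ages], uzr_curr)

# Second stage: UZR_t = constant + rho * UZR_(t-1) + noise, on the residuals.
constant, rho = ols(prev_resid, curr_resid)
print(constant, rho)
```

Because both sets of residuals average zero, the fitted constant should come out essentially zero, and rho is the year-to-year persistence left over once age has been taken out.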

Comment by Sam — October 4, 2009 @ 9:53 pm

Does UZR know how to adjust for starting position?

Comment by Kevin S. — October 4, 2009 @ 10:40 pm

As in fielder positioning? Yes and no. In the sense of having actual coordinates or anything like that (i.e. Fielder F/X), no. There is no data on fielder positioning to input, so that is not viable at this point even if you want it. Whether such specifics on positioning are necessary or even better than what UZR currently does is somewhat debatable (depending on how much you want to hold fielders accountable for being positioned well or poorly). Certain adjustments in UZR do account for positioning being different in different situations, though.

For one, it makes adjustments for the base/out state. For example, a double play situation will call for a different handling of a ball in play for the middle infielders than a non-DP situation. In a situation where the first baseman usually holds the runner on, that is accounted for. Positioning is generally different for left- and right-handed hitters, so that adjustment also addresses fielder positioning. If pitcher GB/FB tendencies generally affect positioning, that would be addressed by that adjustment in UZR as well. For players who commonly face obscure shifts (like David Ortiz), the data is just discarded since there is not enough data to establish reliable probabilities for converting outs in that defensive configuration.

Comment by Kincaid — October 4, 2009 @ 11:16 pm

Putting time series class to use.

Comment by John C — October 4, 2009 @ 11:26 pm

David doesn’t have to do much more. He’s already shown the r-squared (I prefer seeing it as r) for UZR and for wOBA.

He should actually do it for UZR/150, since that’s a rate stat like wOBA. No biggie.

Anyway, they both show an r-squared of .24 (roughly r=.50) when UZR chances is a minimum of 150 (I’d like to know the mean) and when PA for wOBA is a minimum of 300 (again, I’d like to know the mean). Say the mean is 250 UZR chances and 450 PA. THAT IS THE EQUIVALENCY.

That is, whatever you think of someone with 450 PA, that’s how you think of someone with 250 UZR chances. They are equally reliable.

I also can’t believe the 450 PA (or whatever it would be). Historically r=.50 when PA is roughly 200 to 300. Perhaps it’s just this year (2008 to 2009) that shows the inconsistency. Obviously, we really need to do this for many years.

When I’ve done the correlation, I get r=.50 for UZR at a mean of around 100 games. (And, as I said about 50 games for wOBA.)

That’s the equivalency that I’ve always been using.

Comment by tangotiger — October 5, 2009 @ 12:39 am

First, as was mentioned, you have to do this with a rate stat like UZR/150 or it is entirely meaningless. The increasing correlation here is likely the direct result of increasing correlation in the number of chances.

Second, even after that correction, changing the sample of players fundamentally changes the analysis. An increasing correlation could be the result of increasing sample size, or it could be the group of players that play frequently are simply more consistent than those that play infrequently.

Comment by Ken — October 5, 2009 @ 1:16 am

Wow, this article looks familiar. Oh, I know why. I posted a similar analysis the same day…in fact that “wild fluctuations” note came from me and I posted a retraction with the analysis.

“DC Stack says:

October 3, 2009 at 5:44 pm

Wow! Major “my bad” on my part. This is what happens when you get too cocky about your own abilities to discern fact from fiction. Right after I read your reply I realized I was doing what so many amateurs do. I was making a statement of fact based on a few hand-selected players. I knew at that point I needed to come back and say I was wrong. But before doing that I wanted to see if I was wrong in my methods but right in my conclusions. Well, I was wrong in both.

I decided to do a simple test about the serial reliability of UZR. I used UZR/150 for all eligible players from 2008 and 2009. There were 83 players eligible from both years. I ran a simple correlation between their 2008 and 2009 numbers. It came back with a fat Pearson’s r of .729. I could already feel the egg on my face. I then wanted to see if this is bigger/smaller/same as other stats that are far less controversial. I did the same procedure for OPS (unadjusted). The correlation came back in the low .5s. Not only is UZR consistent from year to year, it is more consistent than OPS – at least in the two years I looked at.

HUGE CAVEAT: This is a very small sample size. To get a better feel for the reliability of this metric this really should be done across multiple years. My analysis took me about 15 minutes and that was all I was willing to dedicate to it. If someone wants to take an hour or so to do the full proper tests I say go for it.”

Comment by DC Stack — October 5, 2009 @ 5:03 pm

Reliability is only one measure of a statistic. All good statistics need to be two things: reliable and unbiased. My analysis and David’s analysis only prove that UZR is reliable. It says nothing about whether the statistic is unbiased. As a statistician I can work with an unreliable but unbiased metric. Noise is just noise that can be overcome with sample size. However, a metric that is reliable but biased is very dangerous when the size of the bias is unknown. It gives the impression of being good when in fact it may be awful.

We have no way of knowing how biased UZR is at the moment. It may have very little bias in it, which would mean we have a reliable and unbiased measure – the gold standard in statistics. Or it could be reliable and very biased, which would simply be fool’s gold.

Comment by DC Stack — October 5, 2009 @ 5:28 pm

There’s no reason to suspect bias, when MGL (intelligently) controls for as many parameters as he does. Read his two-part UZR articles from several years back.

Comment by tangotiger — October 5, 2009 @ 6:19 pm

But there’s also no reason NOT to suspect bias…

Comment by Rodney King — October 5, 2009 @ 7:01 pm

You are being argumentative. I already said that once you control (intelligently) for a series of parameters, parameters that do show bias, what you are left with is an unbiased estimator.

So, there is a reason NOT to suspect bias: he handles the parameters in an intelligent fashion.

And even if, for some reason you want to pull out of thin air, there is bias, how much of an effect can there be once you’ve handled all the big ones? One run?

Comment by tangotiger — October 5, 2009 @ 4:57 pm

I think part of the problem is that people “perceive” the fluctuations in UZR to be larger than those in, say, batting RAR because the scale is so tightly-packed.

For instance, it’s not uncommon for a fielder to post a +5 UZR one year and a -5 UZR the next. It’s also not uncommon for a hitter to give you 45 batting RAR one year and 55 the next. Because of the way the two stats are scaled, the difference in UZR jumps out at you way more than the difference in batting RAR does.

Quick question: has anyone looked into UZR and park factors? For instance, if you put Adam Dunn in Fenway’s left field, would his UZR still be historically bad?

Comment by Adam W — October 8, 2009 @ 1:57 pm

In the past, I have looked at the correlations for pitching stats. I got r = .33 for ERA (R^2 = .11) and r = .53 for FIP (R^2 = .28). So, the reliability of UZR is similar to FIP and significantly better than ERA. I think if there is concern that UZR is not reliable, then there should be similar or greater concern about common pitching stats.

Lee

Comment by Lee Panas — October 17, 2009 @ 7:32 pm