Aggregate Defense Evaluations
There’s no denying defensive metrics are controversial. Whether they clash with what you’ve seen with your own eyes, or you just don’t believe them, it seems like everyone has some sort of opinion to offer on their validity.
On FanGraphs, we carry no fewer than four different defensive metrics:
UZR – Mitchel Lichtman’s Ultimate Zone Rating
DRS – John Dewan’s Defensive Runs Saved
TZL – Rally’s Total Zone (location based version)
TZ – Rally’s Total Zone (standard version)
There’s no denying that we use some more frequently than others (cough, UZR), but we carry all four because it’s great to see what different data sets and different models spit out. In addition to the four, there’s a fifth, completely unrelated metric: the Fans’ Scouting Report, which Tangotiger runs every year on insidethebook.com.
It’s important to note that these defensive metrics are not all on the same scale, so it’s difficult to glance at all four (five if you use the Fans’ Scouting Report) and get a good sense of whether they agree. Which brings me to a preliminary look at the Aggregate Defensive Evaluations, where each metric is put on the same scale for each position, the metrics are averaged, and a standard deviation is computed for each player. Here are the 2009 shortstops (min 82 games played):

As you can see, Paul Janish and Brendan Ryan are the clear leaders atop the list, and the metrics are for the most part in agreement about them. Even a swing of 5 runs in either direction would still leave them elite defenders.
Then there are players like Yunel Escobar, whom Total Zone and DRS consider very good but UZR and the Fans rate as more or less average. On an aggregate level he still ends up as very good, though there is a fair amount of disagreement as to just how good he is, even if no system thinks he’s below average.
All in all, it should be easy to go up and down the list and see which players there’s a high level of confidence about defensively, and which there is not.
From a mere computational standpoint, is this the best way to go about combining defensive metrics? I’m really not sure and it’s certainly worth looking into further. There are a lot of options in weighting the metrics differently and how to scale them, but overall I feel this is at least a decent start and something I hope to delve into a bit more.
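One way to picture the procedure described above (same scale per position, then a per-player average and standard deviation) is the rough Python sketch below. The post doesn't spell out how the rescaling is done, so this assumes each metric is centered and stretched to a common spread across the players at one position. That choice, the function name aggregate_defense, and the input layout are all assumptions, and equal weighting is simply the most basic option.

```python
from statistics import mean, pstdev


def aggregate_defense(ratings, target_sd=None):
    """Combine several defensive metrics for one position.

    ratings: {metric_name: {player: runs}} (hypothetical layout).
    Returns {player: (average, standard_deviation)} on a common scale.
    """
    metrics = list(ratings)
    # Only keep players rated by every metric.
    players = sorted(set.intersection(*(set(ratings[m]) for m in metrics)))

    # Spread of each metric across the players at this position.
    spreads = {m: pstdev([ratings[m][p] for p in players]) for m in metrics}
    if target_sd is None:
        # Assumption: rescale every metric to the average of the spreads.
        target_sd = mean(spreads.values())

    scaled = {}
    for m in metrics:
        center = mean([ratings[m][p] for p in players])
        factor = target_sd / spreads[m] if spreads[m] else 0.0
        scaled[m] = {p: (ratings[m][p] - center) * factor for p in players}

    result = {}
    for p in players:
        values = [scaled[m][p] for m in metrics]
        # Equal weights; the standard deviation measures disagreement.
        result[p] = (mean(values), pstdev(values))
    return result
```

Passing in one dictionary per metric for a single position returns an (average, standard deviation) pair for each player. A small standard deviation means the systems agree about that player; a large one means they don't.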
The point here is that there’s a lot of information in these metrics. With so many models out there, it’s becoming increasingly important to identify what we’re fairly confident about and what we’re not, instead of making the mistake of throwing them all away.












Cliff Pennington?
82 game cutoff.
Yuniesky Betancourt?
Hmm… seems like some players who switched teams got left off, will fix in a moment.
Oh God! It’s too horrible to look at! Please take Betancourt off again!
Oh, just relax, avert your eyes, and Trust the Process.
Great stuff, David.
Alex Gonzalez (TOR/ATL) is another who’s been left off.
As a Jays fan, it’s great to see that Yunel Escobar has a good glove. Love that kid
I love it. Keep those stats comin’.
Any metric that doesn’t have Escobar as an above average SS this year needs to be tossed. He’s been the best defensive shortstop in both leagues and I don’t see an argument otherwise. Dewan’s +/- has him at +32
A shame the Braves traded him because their “win now” plan was to get worse offensively and defensively.
It’s a list of 2009 stats.
Dewan’s +/- is the rDRS listed above. Typically, Dewan’s/Fielding Bible/ +/- is listed in terms of PLAYS made, not RUNS. This is what David was referencing above and why this exercise is so interesting.
I was going to say the same about Alexei Ramirez. He’s been an outstanding shortstop this year and even had a post dedicated to his great defense here at FanGraphs. Then I noticed it’s ’09 stats. It would be interesting to see the ’10 numbers.
How can I familiarize myself with these metrics? Any good articles or books?
Here are a few places to look. You’ll notice that most of the systems are trying to do mostly the same thing, using different data sources.
UZR: http://www.fangraphs.com/blogs/index.php/the-fangraphs-uzr-primer/
+/- DRS: http://www.billjamesonline.net/fieldingbible/Fielding-Bible-FAQ.asp
-John Dewan has also published The Fielding Bible (I’m not sure how many volumes there are now), where he introduces and describes plus/minus.
TZ: http://www.baseball-reference.com/about/total_zone.shtml
-Total Zone was primarily developed for creating defensive estimates from limited data, historically through Retrosheet and also with less detailed minor league data.
TZL: http://www.baseballprojection.com/articles/tz_hitlocation.htm
-Recent revisions to TZ, for seasons where hit location data is available (and reliable).
Fans’ Scouting Report: http://www.tangotiger.net/scout/
-An attempt to use the wisdom of crowds to rate fielders.
Thanks for your help guys
Ray, I have a couple of chapters in my book “Beyond Batting Average” covering these and other defensive metrics in detail. You can order it at Amazon.com. Or you can get a cheaper pdf at Lulu.com.
I just finished an article “Measuring Defense: Entering the Zones of Fielding Statistics” for SABR’s Baseball Research Journal that includes a comprehensive overview of every statistic related to defense.
To order a copy of the journal, go to sabr.org.
There are 2 versions of The Fielding Bible out now by John Dewan/Baseball Info Solutions.
Fantastic. Dave, has any thought been given to plugging this into the WAR calculus rather than UZR?
1) What would a correlation matrix look like?
2) Any qualms about using equal weighting for both versions of Total Zone as well as the others, which only have one version listed? Doing this might skew the analysis towards the TZ numbers (I’m assuming those two are more highly correlated than the others, but if not then this isn’t as much of a concern).
As the creator of Total Zone, yes indeed.
If you have TZL, it tells you everything that old TZ could tell you about the players, plus more. I would use only TZL for years where it is available.
I think adding Chris Dial’s zone rating based stuff would be a good addition as well, if possible.
Well, it confirms the suspicion of some Reds fans that playing Janish over Cabrera may be one of the single biggest upgrades at the team’s disposal. I’m assuming that Cabrera is getting dinged most for his range in these measurements, as in his defense he does seem to field what he gets to and his arm is solid average. Cabrera isn’t exactly inspiring with the bat this year either, so any hope that he has an edge over Janish there is probably a pipe dream as well.
The first thing that stands out by running down the STDEV column is Jack Wilson. DRS has him as #1 while FSR has him as #24, with a discrepancy of 39(!) runs. To a lesser extent the same can be said for Everth Cabrera, who is ranked tied for #7 by FSR and no higher than #24 by anyone else. Both cases are extremely alarming, so I looked at the numbers to see which rating scheme was the farthest from the average for each player.
Using the absolute deviation from the average and assigning 1 for each case (0.5/0.5 for ties), I get the following number of players on which each metric was farthest:
FSR: 12
UZR: 7
TZ, TZL, DRS: 3 each
The sum of the absolute deviations:
FSR: 160.6
UZR: 118.4
TZ: 84.6
DRS: 80.8
TZL: 80
The standard deviation of the raw deviations:
FSR: 8.09
UZR: 5.37
DRS: 4.09
TZL: 3.90
TZ: 3.62
In sum, FSR is clearly the most independent of the five metrics. This does not mean that it is the most inaccurate, but it may be best to remove it from further aggregate studies until the source and quality of this independence are discovered.
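For anyone wanting to repeat this kind of check, here is a rough Python sketch of the three tallies. It assumes the ratings have already been put on a common scale, and it uses the population standard deviation since the comment doesn't say which version was used. The function name deviation_summary and the input layout are made up for illustration.

```python
import math
from statistics import mean, pstdev


def deviation_summary(table):
    """table: {player: {metric: runs}}, same metrics for every player.

    Returns, per metric: how often it was farthest from the player's
    average (ties split evenly), the sum of absolute deviations, and
    the standard deviation of the raw deviations.
    """
    metrics = sorted(next(iter(table.values())))
    farthest = dict.fromkeys(metrics, 0.0)
    raw = {m: [] for m in metrics}

    for row in table.values():
        avg = mean([row[m] for m in metrics])
        devs = {m: row[m] - avg for m in metrics}
        worst = max(abs(d) for d in devs.values())
        # Metrics tied for the largest absolute deviation share the point.
        tied = [m for m in metrics if math.isclose(abs(devs[m]), worst)]
        for m in tied:
            farthest[m] += 1 / len(tied)
        for m in metrics:
            raw[m].append(devs[m])

    return {
        m: {
            "farthest": farthest[m],
            "abs_sum": sum(abs(d) for d in raw[m]),
            "sd": pstdev(raw[m]),  # assumption: population SD
        }
        for m in metrics
    }
```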
excellent post.
I would heartily endorse substituting some kind of aggregate measure for UZR in the WAR calculus. Considering the way people slice and dice a 3.7 WAR player versus a 3.4 WAR guy he was traded for, it would be nice to have more confidence in that component of the equation.
That made me chuckle. Well played, MauerEric.
Following up on Eric’s comment, you need to do something to de-weight the outliers, on the assumption that if one metric is completely out of line with the other 4, then it’s messed up. Something like bisquare weights, since you have too few data points to do anything else. Either that or throw out FSR, which is a pretty different class of metric and perhaps shouldn’t even be included.
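As a rough illustration of the bisquare idea, the Python sketch below down-weights ratings that sit far from the median of a player's ratings and then takes a weighted average. The tuning constant 4.685 and the MAD / 0.6745 scale are common defaults for Tukey's bisquare rather than anything proposed in this thread, and the function name bisquare_mean is hypothetical.

```python
from statistics import median


def bisquare_mean(values, c=4.685, iterations=20):
    """Robust average of one player's ratings using Tukey bisquare weights.

    Ratings more than c * scale away from the current center get weight 0.
    """
    values = list(values)
    center = median(values)
    # Robust scale from the median absolute deviation (assumed choice).
    mad = median([abs(v - center) for v in values])
    scale = mad / 0.6745
    if scale == 0:
        # At least half the ratings are identical; nothing to reweight.
        return center

    for _ in range(iterations):
        weights = []
        for v in values:
            u = (v - center) / (c * scale)
            weights.append((1 - u * u) ** 2 if abs(u) < 1 else 0.0)
        total = sum(weights)
        if total == 0:
            return center
        new_center = sum(w * v for w, v in zip(weights, values)) / total
        if abs(new_center - center) < 1e-9:
            return new_center
        center = new_center
    return center
```

With only five ratings per player the scale estimate is itself shaky, so this is best read as a way to see which players the weighting would change, not as a finished estimator.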
That’s only if you have reason to think that singular outliers are less, rather than more, accurate than the consensus. Of course the entire premise here is the idea that consensus among defensive metrics means consensus around the “correct” measurement.
Outliers are outliers – if one of the #s is >3 SD away from the others you can justifiably do something with it as is done across all sciences. If you think one outlier metric is somehow more correct than the other 4 consensus measures that are grouped together, you have bigger problems.
Well, let’s be honest. All of the other metrics have shared biases. UZR and +/- share the same data source. Both variants of Total Zone obviously have commonalities. And all four of them seem to be subject to a “range bias” in the evaluation of fielding.
In that context, I think the outlier nature of the Scouting Report is interesting, not evidence that it should be thrown out.
Do you see any reasons not to give preference to the aggregate over using separate systems? If so, do you think a weighted aggregate would be a viable solution? I often reference UZR and occasionally look at other systems for reference, but I don’t have great confidence in any one system being “run-level” accurate. Therefore the UZR component of WAR is questionable to me. I would think using an aggregate would be an appropriate regression tool and would increase the confidence we have that the stats are accurate to real production. I would love to hear more on this.
I’ve always favored an index approach when it comes to fielding metrics. I’m glad to see it could potentially be implemented and that someone agrees with me.
Has anybody ever come up with a defensive statistic that compares the location where the ball is fielded with the time it takes to field it?
It seems that the “range” of a player should simply be a distance formula from starting position to where the ball was fielded. But this “range” can be greatly affected by how hard the ball is hit. A ball smoked up the middle is almost guaranteed to be a base hit, while a slow chopper could be fielded, and if it’s a slow runner, there is potential for an out.
I guess I am trying to find out if anyone has ever tried comparing the distance a player can successfully field a ball and throw out a runner to the amount of time that it takes to make the play?
I feel that by using time as a metric it will isolate a lot of the skill of defense to the actual defender instead of original positioning of the defender (i.e. calculating for the Big Papi shift), or how fast the batter-runner is.
This kind of data might not be out there, but I would love a heads up if anyone has heard of this kind of time based analysis.
Jamie,
Tangotiger has been complaining about the lack of timing data for a long time. No one’s recording it. The stringers do try to record how hard the ball was hit, but that’s quite inferior.
Janish has three nipples.