Aggregate Defense Evaluations

There’s no denying defensive metrics are controversial. Whether they clash with what you’ve seen with your own eyes, or you just don’t believe them, it seems like everyone has some sort of opinion to offer on their validity.

On FanGraphs, we carry no fewer than four different defensive metrics:

UZR – Mitchel Lichtman’s Ultimate Zone Rating
DRS – John Dewan’s Defensive Runs Saved
TZL – Rally’s Total Zone (location based version)
TZ – Rally’s Total Zone (standard version)

There’s no denying that we use some more frequently than others (cough, UZR), but the reason we have all four is that it’s great to see what different data sets and different models spit out. In addition to the four, there’s also a fifth, completely unrelated metric in the Fans Scouting Report, which Tangotiger runs every year on insidethebook.com.

It’s important to note that these defensive metrics are not all on the same scale, so it’s difficult to glance at all four (five if you use the Fans Scouting Report) and get a good sense of whether they agree. Which brings me to a preliminary look at the Aggregate Defensive Evaluations, where each metric is put on the same scale for each position and averaged, and then a standard deviation across the metrics is computed for each player. Here are the 2009 shortstops (min. 82 games played):

As you can see, Paul Janish and Brendan Ryan are the clear leaders atop the list, and the metrics are for the most part in agreement. Even a swing of 5 runs in either direction would still leave them elite defenders.

Then there are players like Yunel Escobar, who is considered by Total Zone and DRS to be very good, but by UZR and the Fans to be more or less average. On an aggregate level he still ends up as very good, though there is a good amount of disagreement as to just how good he is, even if no system thinks he’s below average.

All in all, it should be easy to go up and down the list and see which players there’s a high level of confidence about defensively, and which there is not.

From a mere computational standpoint, is this the best way to go about combining defensive metrics? I’m really not sure and it’s certainly worth looking into further. There are a lot of options in weighting the metrics differently and how to scale them, but overall I feel this is at least a decent start and something I hope to delve into a bit more.
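
For anyone who wants to experiment with the weighting and scaling, here is a minimal Python sketch of the aggregation described earlier. The scaling rule is an assumption made for illustration, not necessarily what was used for the table: each metric is centered on its position average and stretched to the average spread of all the metrics. The function name `aggregate_defense` and the sample numbers are hypothetical.

```python
from statistics import mean, pstdev


def aggregate_defense(ratings):
    """Combine several defensive metrics for one position.

    ratings: {metric: {player: runs}}, every metric covering the same players.
    Returns {player: (aggregate_runs, spread_across_metrics)}.
    """
    metrics = list(ratings)
    players = list(ratings[metrics[0]])

    # Assumption: the common scale is the average spread of the metrics.
    spreads = {m: pstdev(list(ratings[m].values())) for m in metrics}
    target_sd = mean(list(spreads.values()))

    scaled = {}
    for m in metrics:
        center = mean(list(ratings[m].values()))
        sd = spreads[m]
        # A metric with no spread carries no information; map it to zero.
        scaled[m] = {
            p: (ratings[m][p] - center) / sd * target_sd if sd else 0.0
            for p in players
        }

    result = {}
    for p in players:
        row = [scaled[m][p] for m in metrics]
        # Equal weights for every metric; pstdev measures their disagreement.
        result[p] = (mean(row), pstdev(row))
    return result


# Made-up numbers for three hypothetical players, not real ratings.
example = {
    "UZR": {"A": 10, "B": 0, "C": -10},
    "FSR": {"A": 20, "B": 0, "C": -20},
}
summary = aggregate_defense(example)
```

In this toy case the two metrics rank the players identically and differ only in scale, so after rescaling they agree exactly and every player's spread is zero. Swapping the equal weights for something else only means changing the `mean(row)` line.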

The point here is that there’s a lot of information in these metrics. With so many models out there, it’s becoming increasingly important to identify what we’re fairly confident about and what we’re not, instead of making the mistake of throwing them all away.




David Appelman is the creator of FanGraphs.

35 Responses to “Aggregate Defense Evaluations”

  1. Billy Bob says:

    Cliff Pennington?

  2. robneyer says:

    Yuniesky Betancourt?

  3. Erik says:

    Great stuff, David.

  4. OT says:

    Alex Gonzalez (TOR/ATL) is another who’s been left off.

    As a Jays fan, it’s great to see that Yunel Escobar has a good glove. Love that kid

  5. Lucas says:

    I love it. Keep those stats comin’.

  6. Russell says:

    Any metric that doesn’t have Escobar as an above average SS this year needs to be tossed. He’s been the best defensive shortstop in both leagues and I don’t see an argument otherwise. Dewan’s +/- has him at +32

    • Bronnt says:

      A shame the Braves traded him because their “win now” plan was to get worse offensively and defensively.

    • Temo says:

      It’s a list of 2009 stats.

    • alskor says:

      Dewan’s +/- is the rDRS listed above. Typically, Dewan’s/Fielding Bible/ +/- is listed in terms of PLAYS made, not RUNS. This is what David was referencing above and why this exercise is so interesting.

    • MikeS says:

      I was going to say the same about Alexei Ramirez. He’s been an outstanding shortstop this year and even had a post dedicated to his great defense here at FanGraphs. Then I noticed it’s ’09 stats. It would be interesting to see the ’10 numbers.

  7. ray guy says:

    How can I familiarize myself with these metrics? Any good articles/books?

  8. The Hit Dog says:

    Fantastic. Dave, has any thought been given to plugging this into the WAR calculus rather than UZR?

  9. glassSheets says:

    1) What would a correlation matrix look like?
    2) Any qualms about using equal weighting for both versions of Total Zone as well as the others, which only have one version listed? Doing this might skew the analysis towards the TZ numbers (I’m assuming those two are more highly correlated with each other than with the others, but if not, then this isn’t as much of a concern).

    • Rally says:

      As the creator of Total Zone, yes indeed.

      If you have TZL, it tells you everything that old TZ could tell you about the players, plus more. I would use only TZL for years where it is available.

      I think adding Chris Dial’s zone rating based stuff would be a good addition as well, if possible.

  10. E Dub says:

    Well, it confirms the suspicion of some Reds fans that playing Janish over Cabrera may be one of the single biggest upgrades at the team’s disposal. I’m assuming that Cabrera is getting dinged most for his range in these measurements, as in his defense he does seem to field what he gets to and his arm is solid average. Cabrera isn’t exactly inspiring with the bat this year either, so any hope that he has an edge over Janish there is probably a pipe dream as well.

  11. Eric says:

    The first thing that stands out by running down the STDEV column is Jack Wilson. DRS has him as #1 while FSR has him as #24, with a discrepancy of 39(!) runs. To a lesser extent the same can be said for Everth Cabrera, who is ranked tied for #7 by FSR and no higher than #24 by anyone else. Both cases are extremely alarming, so I looked at the numbers to see which rating scheme was the farthest from the average for each player.

    Using the absolute deviation from the average and assigning 1 for each case (0.5/0.5 for ties), I get the following number of players on which each metric was farthest:
    FSR: 12
    UZR: 7
    TZ, TZL, DRS: 3

    The sum of the absolute deviations:
    FSR: 160.6
    UZR: 118.4
    TZ: 84.6
    DRS: 80.8
    TZL: 80

    The standard deviation of the raw deviations:
    FSR: 8.09
    UZR: 5.37
    DRS: 4.09
    TZL: 3.90
    TZ: 3.62

    .

    In sum, FSR is clearly the most independent of the five metrics. This does not mean that it is the most inaccurate, but it may be best to remove it from further aggregate studies until the source and quality of this independence are discovered.
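
Eric's tally above can be reproduced along these lines. This is a minimal Python sketch, assuming the ratings are already on a common scale. The function name `deviation_summary` and the sample numbers are hypothetical, ties are split equally among all tied metrics (a generalization of the 0.5/0.5 rule), and the population standard deviation is used because the comment does not say which form was applied.

```python
from statistics import mean, pstdev


def deviation_summary(scaled):
    """scaled: {player: {metric: runs}}, already on a common scale.

    Returns three dicts keyed by metric:
      farthest - number of players on whom the metric sat farthest from the average
      abs_sum  - sum of absolute deviations from each player's average
      spread   - standard deviation of the raw (signed) deviations
    """
    metrics = list(next(iter(scaled.values())))
    farthest = {m: 0.0 for m in metrics}
    abs_sum = {m: 0.0 for m in metrics}
    raw = {m: [] for m in metrics}

    for row in scaled.values():
        avg = mean([row[m] for m in metrics])
        devs = {m: row[m] - avg for m in metrics}
        worst = max(abs(d) for d in devs.values())
        # Exact equality is used to detect ties; rounded inputs are assumed.
        tied = [m for m in metrics if abs(devs[m]) == worst]
        for m in tied:
            farthest[m] += 1 / len(tied)
        for m in metrics:
            abs_sum[m] += abs(devs[m])
            raw[m].append(devs[m])

    # Assumption: population SD; use statistics.stdev for the sample form.
    spread = {m: pstdev(raw[m]) for m in metrics}
    return farthest, abs_sum, spread


# Made-up numbers for two hypothetical players, not the ratings from the post.
example = {
    "A": {"UZR": 0, "DRS": 0, "FSR": 9},
    "B": {"UZR": 4, "DRS": -4, "FSR": 0},
}
farthest, abs_sum, spread = deviation_summary(example)
```

For player A the average is 3, so FSR is the clear outlier; for player B, UZR and DRS are equally far from the average and each gets half a point.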

    • batpig says:

      excellent post.

      I would heartily endorse substituting some kind of aggregate measure for UZR in the WAR calculus. Considering the way people slice and dice a 3.7 WAR player versus a 3.4 WAR guy he was traded for, it would be nice to have more confidence in that component of the equation.

    • joser says:

      until the source and quality of this independence are discovered.

      That made me chuckle. Well played, Mauer Eric.

  12. mettle says:

    Following up on Eric’s comment, you need to do something to de-weight the outliers on the assumption that if one metric is completely out of line with the other 4, then it’s messed up. Something like bisquare weights, since you have too few datapoints to do anything else. Either that or throw out FSR, which is a pretty different class of metric and perhaps shouldn’t even be included.

    • Colin Wyers says:

      That’s only if you have reason to think that singular outliers are less, rather than more, accurate than the consensus. Of course the entire premise here is the idea that consensus among defensive metrics means consensus around the “correct” measurement.

      • mettle says:

        Outliers are outliers – if one of the #s is >3 SD away from the others you can justifiably do something with it as is done across all sciences. If you think one outlier metric is somehow more correct than the other 4 consensus measures that are grouped together, you have bigger problems.

      • Colin Wyers says:

        Well, let’s be honest. All of the other metrics have shared biases. UZR and +/- share the same data source. Both variants of TotalZone obviously have commonalities. And all four of them seem to be subject to a “range bias” in the evaluation of fielding.

        In that context, I think the outlier nature of the Scouting Report is interesting, not evidence that it should be thrown out.

  13. Matt S says:

    Do you see any reasons not to give preference to the aggregate over using separate systems? If so, do you think a weighted aggregate would be a viable solution? I often reference UZR and occasionally look at other systems for reference, but I don’t have great confidence in any one system being “run-level” accurate. Therefore the UZR component of WAR is questionable to me. I would think using an aggregate would be an appropriate regression tool and would increase the confidence we have in the stats as accurate to real production. I would love to hear more on this.

  14. E-6 says:

    I’ve always favored an index approach when it comes to fielding metrics. I’m glad to see it could potentially be implemented and that someone agrees with me.

  15. Jamie says:

    Has anybody ever come up with a defensive statistic that compares location of fielding the ball and time it takes to field that ball?

    It seems that the “range” of a player should simply be a distance formula of starting position to where the ball was fielded. But this “range” can be greatly affected by how hard the ball is hit. A ball smoked up the middle is almost guaranteed to be a base hit, while a slow chopper could be fielded, and if it’s a slow runner, there is potential for an out.

    I guess I am trying to find out if anyone has ever tried comparing the distance a player can successfully field a ball and throw out a runner to the amount of time that it takes to make the play?

    I feel that by using time as a metric it will isolate a lot of the skill of defense to the actual defender instead of original positioning of the defender (i.e. calculating for the Big Papi shift), or how fast the batter-runner is.

    This kind of data might not be out there, but I would love a heads up if anyone has heard of this kind of time based analysis.

    • Patrick says:

      Jamie,

      Tangotiger has been complaining about the lack of timing data for a long time. No one’s recording it. The stringers do try to record how hard the ball was hit, but that’s quite inferior.

  16. LOL says:

    Janish has three nipples.
