The FanGraphs UZR Primer

Index

1. Introduction
2. Fielder Credits and Debits
3. Batted Ball Types
4. GDP and Arm
5. Base Runner and Outs Adjustments
6. Other Infield Positioning
7. Outfield Positioning and “Wall Balls”
8. Infield Line Drives and Pop-ups
9. Park Adjustments
10. The Baselines
11. Recent Revisions
12. Sample Size and Reliability
13. UZR and Aging
14. Consistency from Year to Year
15. Does UZR tell us what actually happened on the field?
16. Conclusions

Introduction

As many of you already know, UZR is an advanced defensive metric that uses play-by-play data recorded by Baseball Info Solutions (BIS) to estimate each fielder’s defensive contribution in theoretical runs above or below an average fielder at his position in that player’s league and year. Thus, a SS with a UZR of zero is exactly average as compared to a SS in the same year and in the same league. If his UZR is plus, he is above average, and if it is minus, he is below average.

It is similar to offensive linear weights, where each event is assigned a number of runs, or fraction of a run, which is equal to the average value of that event as compared to a generic PA, generally for that year and for that league. With UZR and offensive linear weights a player gets credit for the theoretical value of an event (for UZR, those events are turning a batted ball into an out, allowing a batted ball to drop for a hit, making an error – or a fielder’s choice – that allows the batter to reach base, or making an error that allows a base runner to advance one or more bases) rather than what actually transpired during or subsequent to that event, in terms of any scoring on that play, base runner advances, etc., and regardless of the score or inning of the game.

One of the differences between UZR and linear weights is that with UZR the amount of credit that the fielder receives on each play, positive (if he makes an out) or negative (if he allows a hit or an ROE), depends on how often that particular kind of batted ball, in terms of its location, speed and several other factors, is fielded by an average fielder at the same position, measured over a time span of several years, in addition to whether the batted ball was a hit, out, or error (or FC). With offensive linear weights, if a batted ball is a hit or an out, the credit that the batter receives is not dependent on where or how hard the ball was hit, or any other parameters.

With UZR, if a fielder makes an out, and the UZR engine estimates that it was a difficult ball to field (and turn into an out) by an average fielder at that position, then the fielder will get more credit than if the UZR engine determined that it was an easy ball to field. Likewise, if a batted ball drops for a hit, a fielder will get more negative credit if UZR determined that it was an easy ball to field (for that fielding position) and less negative credit if it was a difficult ball to field. If a fielder makes an error, UZR automatically assumes that it was a relatively easy ball to field, since that is presumably the definition of an error in the first place, so there is no need to incorporate the speed and location of the batted ball and other factors that can influence how difficult a batted ball is to field. In other words, in UZR, errors are treated as balls that are normally fielded by that fielder and that fielder only (the one who made the error), 95% of the time, or whatever the average error rate is for that position and that type of ball.

Fielder Credits and Debits

How does UZR determine how much credit, positive or negative, to award a fielder on each batted ball? First it goes through 6 years of batted ball data and determines how often each type and location of batted ball is fielded by each defensive position, making adjustments for the speed of the ball, and the handedness, speed, and power of the batter. Later on, further adjustments are made, such as for the outs and base runners, and various park adjustments, like the size and configuration of the OF, the speed of the infield, and the speed of batted balls in general, as influenced by temperature, altitude, and the ground ball percentage of the pitcher (e.g. ground ball pitchers allow easier-to-field ground balls and harder-to-field air balls). For example, UZR might find that from 2004-2009, of all hard-hit line drives hit by a LH batter with above-average power to a certain location in an average OF, 15% are fielded by the CF’er, 10% by the LF’er, and 75% fall for a hit. Remember, those would be average numbers across all MLB parks.
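As a rough illustration of what "determining how often each type of batted ball is fielded by each position" amounts to, the sketch below tallies outcomes for a single bucket. The function name, the outcome labels, and the 100-ball sample are hypothetical; the real engine also applies the adjustments described above.

```python
from collections import Counter

def baseline_rates(outcomes):
    """Share of balls in one bucket fielded by each position or falling for a hit.

    `outcomes` holds one label per baseline batted ball in the bucket: a position
    code such as "CF" or "LF" when that fielder made the out, or "HIT" otherwise.
    """
    counts = Counter(outcomes)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

# Hypothetical bucket matching the percentages in the example above.
sample = ["CF"] * 15 + ["LF"] * 10 + ["HIT"] * 75
rates = baseline_rates(sample)
print(rates)
```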

Now, let’s say that we want to compute a UZR for every player in 2009. For every batted ball, either it is caught by a fielder and turned into an out (either the batter or a base runner is out, or both of course), it is scored as a hit, or the batter reaches on an error or a fielder’s choice (and no out is made). One or more fielders will receive positive or negative credit depending on the outcome of the play and depending on how often that same batted ball in that same situation (outs, base runners, attributes of the batter, etc.) was successfully fielded by each fielder from 2004-2009.

Let’s say that that same batted ball in the example above was caught by the CF’er on the first play of a game. Since typically someone will catch that same ball only 25% of the time (see above), this particular CF’er will get credit for an extra .75 plays – 100% minus 25%. We then convert .75 plays into runs by multiplying .75 by the difference between the average value of a hit in that location and the average value of an air ball out. A typical outfield hit is worth around .56 runs and any batted ball out is worth around -.27 runs, so the difference between a hit and an out is worth around .83 runs. (We don’t vary the value of the hit or out based on the outs or base runners because we want “game situation-neutral” defensive evaluations.) Since our fielder gets credit for .75 extra plays, we give him credit for .75 times .83 runs, or +.6225 runs for that play.

The LF’er, even though he typically catches that same ball 10% of the time, gets no demerits for not making the play. In UZR, when a ball is caught and turned into an out by one fielder, no other fielder gets docked any runs. This helps to minimize the effects of “ball-hogging.” If we didn’t do this, for example, on teams where the CF’er liked to take charge of just about every lazy fly ball hit into the gaps, the LF’er and RF’er would end up being penalized for balls that they could have easily fielded. Of course, with this method, a ball hogger will get slightly more credit than he deserves, but as long as his ball-hogging is done on easy fly balls, he isn’t going to get much credit anyway. For example, if a certain type of ball and location is caught 90% of the time, whoever catches it is only going to receive .1 (1 minus .9) times .83, or .083 runs in credit.
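The credit for a ball that is turned into an out can be sketched in a few lines. This is a minimal illustration using the approximate run values quoted in the text; the function and constant names are made up for the example and are not part of the UZR engine.

```python
HIT_RUNS = 0.56   # approximate value of a typical outfield hit (from the text)
OUT_RUNS = -0.27  # approximate value of a batted ball out (from the text)

def out_credit(catch_rates, run_value=HIT_RUNS - OUT_RUNS):
    """Runs credited to the one fielder who turns the ball into an out.

    `catch_rates` maps each position to its baseline catch rate for the bucket.
    The credit is based on how often ANYONE catches the ball, not just the
    fielder who made the play, and no other fielder is docked.
    """
    extra_plays = 1.0 - sum(catch_rates.values())
    return extra_plays * run_value

hard_liner = out_credit({"CF": 0.15, "LF": 0.10})  # .75 plays times .83 runs
lazy_fly = out_credit({"CF": 0.90})                # .1 plays times .83 runs
print(round(hard_liner, 4), round(lazy_fly, 4))
```

Because the credit shrinks as the overall catch rate rises, a fielder who takes over easy fly balls gains very little, which is the point of the ball-hogging discussion.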

Notice that our CF’er did not get credit for 100% minus 15%, where the 15% is the percentage of time a typical CF’er catches the ball. Why is this? Because the total credit for catching that ball has to be 100% minus how often it is typically caught by anyone (25% in this case), and since no one else but the CF’er will (or should) receive credit, he has to get all of the total credit on that play, which is .75 extra plays. You may also notice that there is a slight flaw in that methodology. Let’s say that the LF’er normally catches 80% of those balls and the CF’er 10%. If the CF’er does catch that ball, he is only going to get credit for .1 plays even though it appears that it was a difficult catch for him (since he only catches similar balls in similar situations 10% of the time). But, if we give him more credit than .1 extra plays, we would have to take something away from the LF’er who normally fields that ball 80% of the time. We could do something like that, but we choose not to. As I mentioned before, if we dock one or more fielders when another fielder makes a play, we run into ball-hogging problems. All of these problems arise, of course, because we don’t know precisely where a ball is hit, we don’t know exactly how long the ball was airborne or on the ground before it lands, is touched, or passes a fielder, and we don’t know exactly where the fielders were positioned when the ball was hit. Many decisions we have to make regarding the UZR methodology involve a trade-off. Given the limitations of the data, while the outcome is quite reliable, especially with large amounts of data (say, several years for a player), it is not perfect.

If that same ball were to drop for a hit, now our CF’er and our LF’er would both be penalized. First we figure the total cost of the hit. Since this particular batted ball is typically fielded by someone 25% of the time, the cost of the hit is .25 plays. Now, how do we apportion that .25? Since the LF’er catches 10% of these balls and the CF’er 15%, the LF’er is responsible for 40% (10/25) of the hits and the CF’er, 60% (15/25). So, the LF’er gets docked .4 (40%) times .25, or .1 plays, and the CF’er, .6 times .25, or .15, a total of .25 plays. Again, the run value of a play is the difference between an out and a typical hit in that area of the field, or around .83 runs. So when that ball drops for a hit, the LF’er gets docked .83 times .1, or .083 runs, and the CF’er gets nicked by .83 times .15, or about .125 runs. Makes sense, right? If the CF’er catches a certain ball more often than another fielder or fielders, and the ball falls for a hit, he should bear the majority or plurality of the “blame.”
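The apportioning of blame when a ball drops for a hit can be sketched as follows. This is an illustration of the arithmetic in the paragraph above, with a hypothetical function name and the text's approximate .83 run value as a default.

```python
def hit_debits(catch_rates, run_value=0.83):
    """Runs docked from each position when a ball in this bucket falls for a hit.

    `catch_rates` maps each position to its baseline catch rate for the bucket.
    The total cost is the overall catch rate (in plays); each position's share
    of that cost is proportional to how often it normally makes the play.
    """
    total = sum(catch_rates.values())
    if total == 0:
        # Nobody ever catches this ball, so nobody is blamed.
        return {pos: 0.0 for pos in catch_rates}
    debits = {}
    for pos, rate in catch_rates.items():
        share = rate / total                  # e.g. 10/25 = 40% for the LF'er
        debits[pos] = -share * total * run_value
    return debits

debits = hit_debits({"LF": 0.10, "CF": 0.15})
print({pos: round(runs, 4) for pos, runs in debits.items()})
```

Note that share times total simply equals the position's own catch rate, so each fielder is docked his baseline catch rate times the run value of the play.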

On any given batted ball, it is possible for 2, 3 or even 4 fielders to receive negative credit when a ball drops for a hit. As long as a position even occasionally catches a ball in that “bucket” (a “bucket” is a certain type of ball hit to a certain location at a certain speed with a certain kind of batter at the plate, etc.), a player at that position will get docked some fraction of a run when that ball falls for a hit. By the way, when figuring the value of a play, or the difference between a hit and an out for a certain type of batted ball in a certain location, we always use the average hit value for that kind of ball over our 6-year time span and across all parks. We don’t use the actual hit value on that play (e.g. a double) if the batted ball lands for a hit, and we don’t use the average hit value for that park. Again, we are trying to evaluate defense in as much of a context-neutral environment as possible.

Anyway, UZR goes through each batted ball for every game and does the same calculations as above, awarding one or more fielders plus or minus credit depending on what type of ball was hit, its location, and the estimated position of the fielders, as determined by things like the handedness, speed, and power of the batters, and the outs and base runners. In addition, adjustments are made for the characteristics and configuration of the ballpark, and for the G/F tendencies of the pitcher.

Batted Ball Types

A bunt ground ball is treated as a separate kind of batted ball from a non-bunt ground ball, but only for the first, second, and third basemen. In fact, bunt ground balls are ignored when figuring the UZR of the SS (I realize that once in a blue moon the SS fields a bunt). The types of batted balls that UZR processes are ground balls, bunt ground balls, outfield line drives, and outfield fly balls (including so-called pop flies). All batted balls are put in one of those categories. No other batted ball type distinctions are used, such as “fliners,” which are used in Dewan’s plus/minus system. The speed of each batted ball is also considered and is indicated in the data as “slow/soft, medium, or fast/hard” (3 categories).

GDP and Arm

Double plays are treated the same as a single out. There is a separate calculation for GDP’s above or below average, based simply on the number of DP’s turned per DP opportunity, given the speed and location of the ground ball. First baseman scoops are not included in any UZR numbers. My research suggests that a good first baseman can save maybe 2-3 runs a year in “scoops” – probably less than what you might think. Outfield arm run values are also computed separately from “regular” UZR. They are based on the speed and location of batted balls to the outfield and how often base runners advance extra bases (advances), don’t advance the extra base (holds), or get thrown out trying to advance (kills). Park factors are used in arm ratings. For example, because the left fielder plays so shallow in Fenway and balls tend to quickly ricochet off the Green Monster, it is difficult to advance an extra base on a hit to LF in Boston. In Colorado, because the OF is so expansive, base runners advance more easily than in an average park. The UZR “arm engine” adjusts for those things.
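The "DP’s turned per DP opportunity" idea can be sketched as actual double plays minus the number an average fielder would be expected to turn on the same ground balls. The expected rates below are invented for illustration, and the conversion from double plays to runs is left out because the text does not give it.

```python
def dp_above_average(opportunities):
    """Double plays turned minus the number expected from an average fielder.

    `opportunities` is a list of (expected_dp_rate, turned) pairs, where
    expected_dp_rate is the league-average DP rate for a ground ball of that
    speed and location (hypothetical numbers here) and turned is True/False.
    """
    actual = sum(1 for _, turned in opportunities if turned)
    expected = sum(rate for rate, _ in opportunities)
    return actual - expected

season = [(0.6, True), (0.4, False), (0.5, True)]
print(dp_above_average(season))
```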

Base Runner and Outs Adjustments

The base runner and outs adjustments are a proxy for infield defensive alignment. With a runner on first and fewer than 2 outs, UZR assumes that the SS and 2B are playing in “double play position,” which is typically closer to second base and a little shallower. With a runner on first and no one on second, it is assumed that the first baseman is holding the runner. With a runner on first or second and no outs (or 1 out and a pitcher at bat), the third baseman often has to play up in anticipation of a bunt.
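Those three rules can be written down as a small lookup. This is only a sketch of the proxy described above: the class and field names are hypothetical, and the third-base rule is treated as always true even though the text says "often."

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Alignment:
    middle_infield_dp_depth: bool  # SS and 2B in double play position
    first_baseman_holding: bool    # 1B holding the runner on
    third_baseman_in: bool         # 3B playing up for a possible bunt

def assumed_alignment(on_first, on_second, outs, pitcher_batting=False):
    """Infer the assumed infield alignment from the base/out situation."""
    bunt_situation = (on_first or on_second) and (
        outs == 0 or (outs == 1 and pitcher_batting)
    )
    return Alignment(
        middle_infield_dp_depth=on_first and outs < 2,
        first_baseman_holding=on_first and not on_second,
        third_baseman_in=bunt_situation,
    )

print(assumed_alignment(on_first=True, on_second=False, outs=0))
```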

Other Infield Positioning

Left-handed and right-handed batters are treated separately since infielders and outfielders are positioned differently for each. Infield ground balls are also handled separately for two categories of batters: above-average speed and below-average speed. All batters are put into one or the other category, using a Bill James type of speed score. It is assumed that infielders must play a little shallower and are more hurried in general with a faster runner at the plate. Also, the data includes whether a shift (a generic one, in the opinion of the “stringer” – the person recording the data) was on, and whether the shift likely affected the play at all. If it did – again, according to the “stringer” – then the play is ignored.

Outfield Positioning and “Wall Balls”

For outfield air balls, two separate categories of batters are used as a proxy for outfielder depth: Batters with less than average power and batters with greater than average power. As with batter speed, all batters are assigned to one group or the other, based on the average distance of their air balls. The data also tells us if a ball hits off an outfield wall (on the fly) and whether, in the judgment of the “stringer,” it was catchable or not. If not, then the play is ignored. If a “wall ball” was catchable, it is treated like any other batted ball at that distance, with no regard for whether it hit the wall or not.

Infield Line Drives and Pop-ups

Finally, line drives and pop flies that are less than 180 feet from home plate are ignored. With pop flies at that distance, they are either rarely caught by an outfielder or there is too much potential discretion between outfielders and infielders. Infield line drives involve more luck than skill. With infield pop ups, most are caught and when one is dropped it is usually a fluke or a mix-up. And we would certainly have ball hogging problems with infield pop flies, so they are ignored.
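The plays that UZR skips, as described in the last few sections, can be collected into one filter. The sketch below is an assumption-laden summary: the `Play` fields and labels are hypothetical stand-ins for whatever the BIS data actually records.

```python
from dataclasses import dataclass

@dataclass
class Play:
    ball_type: str                # "ground", "bunt", "line_drive", or "fly" (includes pop flies)
    distance_ft: float            # distance from home plate
    shift_affected: bool = False  # stringer judged that a shift affected the play
    hit_wall: bool = False        # ball hit an outfield wall on the fly
    catchable: bool = True        # stringer's judgment for wall balls

def is_ignored(play, position):
    """Return True if this play is left out of `position`'s UZR."""
    if play.shift_affected:
        return True
    if play.hit_wall and not play.catchable:
        return True
    if play.ball_type in ("line_drive", "fly") and play.distance_ft < 180:
        return True
    if play.ball_type == "bunt" and position == "SS":
        return True
    return False

print(is_ignored(Play("fly", 150.0), "2B"))
```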

Park Adjustments

Infield (ground ball) park adjustments are handled by assigning a “factor” to all ground balls in each park, depending upon the speed of the infield, which includes not only the height and texture of the IF grass (or indoor playing surface), but the altitude (and average temperature of the park to some extent). For example, in Colorado, the IF is fast (and the OF is difficult) because of the altitude, and in San Diego, the IF is slow, at least partially because the park is at sea level. Park factors are updated every time a material change occurs to a park or a team moves into a new park.

In the OF, each section, LF, CF, and RF, is divided into two zones, shallow and deep, for park adjustment purposes. Each of those 6 zones per park has its own adjustment factor. For example, the deep zone in LF at Fenway has an adjustment factor of .5, meaning that of all balls hit past a certain distance in LF at Fenway, the overall “catch rate” is only half that of the average major league park. Similarly, in Houston’s LF “short porch,” it is .86. In Seattle, fly balls in all sections of all fields are easier to catch than at an average MLB park, presumably because of the low altitude, the cold weather, and the large but not too-large outfield dimensions, and thus have a park factor above 1.0.
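One simple way to picture a zone factor is as a multiplier on the league-wide catch rate for balls hit to that zone. The sketch below assumes that multiplicative form and assumes the Houston figure refers to the deep LF zone; neither is confirmed by the text, and the table and function names are hypothetical.

```python
# (park, field, depth) -> adjustment factor; values quoted in the text.
OF_PARK_FACTORS = {
    ("Fenway", "LF", "deep"): 0.5,
    ("Houston", "LF", "deep"): 0.86,  # assumed to describe the deep zone
}

def park_adjusted_catch_rate(baseline_rate, park, field, depth):
    """Scale a league-average catch rate by the park's zone factor.

    Zones with no entry are treated as neutral (factor 1.0). The result is
    capped at 1.0 so factors above 1.0 cannot produce an impossible rate.
    """
    factor = OF_PARK_FACTORS.get((park, field, depth), 1.0)
    return min(1.0, baseline_rate * factor)

print(park_adjusted_catch_rate(0.40, "Fenway", "LF", "deep"))
```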

The Baselines

Again, the baseline “catch rates” for all of the various “buckets” (batted ball types, speed, locations, etc.) are based on 6 years of data. That is an arbitrary number. It could be 3 years and it could be 10. I chose 6 years in order to accumulate fairly large samples of data in each bucket. The UZR numbers (each player’s runs saved above or below average) are initially presented as “as compared to the average player at all MLB parks over the 6-year baseline.” The numbers you see on FanGraphs, however, are scaled to an average player at each position for that league and year. So if you add up everyone’s UZR (the seasonal to-date or end-of-year totals – not the “per 150” rate) in any given year and league, it will sum to zero (or close to it because of rounding errors). Because of that, there is no guarantee that an average player in any one league or year is equal to an average player in any other league or year, even if it is the same player. For example, let’s say that a certain player was zero in 2009 and 2008, thus you consider him to be an average defender for those two years, and you assume that his defensive ability or performance did not change from one year to the next. However, if in one of those years, the overall quality of defense in that player’s league was better or worse than the other year, the player may have actually gotten better or worse himself even though his UZR is zero in both years. That is a minor point, but it is something to keep in mind.
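The text does not say exactly how the numbers are rescaled, but one plausible way to make a position's totals sum to zero within a league and year is to subtract the league's average runs per chance, weighted by each player's chances. The sketch below shows that approach; the function name, player labels, and numbers are all hypothetical.

```python
def center_on_league(raw_uzr, chances):
    """Rescale raw runs so one position's league-year totals sum to zero.

    `raw_uzr` maps player -> runs versus the multi-year baseline.
    `chances` maps player -> number of fielding opportunities.
    """
    per_chance = sum(raw_uzr.values()) / sum(chances.values())
    return {p: raw_uzr[p] - per_chance * chances[p] for p in raw_uzr}

raw = {"A": 6.0, "B": 1.0, "C": -1.0}
opps = {"A": 400, "B": 400, "C": 200}
scaled = center_on_league(raw, opps)
print({p: round(v, 2) for p, v in scaled.items()})
```

In this made-up league the position as a whole was 6 runs better than the baseline, so every player's number moves down in proportion to his playing time.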

Recent Revisions

As many of you know, there was a recent UZR revision which has changed a few players’ numbers, especially outfielders who play their home games in quirky parks, like Bay and Ellsbury at Fenway. One of the changes involved an improvement in the way the park factors are computed and incorporated into the UZR engine. That’s why you might see some different numbers for players in these parks, primarily in the OF. Park factors are very difficult to do when constructing defensive metrics. They are much more difficult than offensive park factors, and those are no walk in the park either. The saving grace is that most parks do not have major defensive park factors. In any case, I think that the current version of UZR handles those factors pretty well, even in parks like Fenway and Coors. FanGraphs also now has home/road UZR splits, so if you still don’t trust a certain player’s UZR because of the park factors issue, you can check out his road numbers. Keep in mind that you will see lots of random differences between some players’ home and road numbers which have nothing to do with park effects – they are simply an artifact of small sample sizes. Remember that even large sample sizes can have large random fluctuations as well.

Speaking of Jason Bay, one interesting thing about the “controversy” surrounding him and the UZR revisions is that he had terrible UZR’s in Pittsburgh in 2007 and 2008, prior to his trade to Boston, so it should not have been surprising that he would have bad numbers in Boston as well, even without any park adjustment problems. Now, whether he is indeed a terrible, bad, slightly below average, or average (or even above average) fielder, is another story. Just because UZR, or any other defensive metric “says” that someone is X, even if that X is based on many years of data, does not make it so. When you are dealing with sample data, as we almost always are with every metric in baseball that we encounter, there is a certain chance that the metric is going to be “wrong.” Sometimes, you can use other information (such as scouting and observation, or physical attributes like size and speed) in order to adjust your “conclusions” and decrease your chance of being “wrong” and sometimes you can’t (because the requisite information is not available).

There were some other minor improvements and changes in the new version of UZR (such as the “shift” and “wall” data) which shouldn’t affect the numbers all that much. If they do, the new numbers are bigger and better!

Sample Size and Reliability

One thing to keep in mind is that as with all metrics based on sample data where you are trying to estimate a true mean or value, the more data you have, generally the more reliable your estimate. In other words, the more opportunities that UZR is based on, the more reliable the number, everything else being equal. On defense, 2B, SS, and CF have almost twice the number of opportunities per game as do the other positions on the field, but that does not necessarily mean that a UZR based on 100 games at SS is as reliable as 200 games at 3B. There are other factors that affect the reliability of a sample number.

How many UZR opportunities do you need for UZR to be reliable? There isn’t any magic number. If I asked you how many AB you need before a player’s BA becomes reliable, you would likely answer, “I don’t know. The more the merrier I guess.” That is true with UZR and with all metrics. Of course, for some metrics, you need more or less data than for other metrics for an equivalent reliability. It depends on the sampling error and the spread in underlying talent, and other things that are inherent in that metric. Most of you are familiar with OPS, on base percentage plus slugging average. That is a very reliable metric even after one season of performance, or around 600 PA. In fact, the year-to-year correlation of OPS for full-time players, somewhat of a proxy for reliability, is almost .7. UZR, in contrast, depending on the position, has a year-to-year correlation of around .5. So a year of OPS data is roughly equivalent to a year and a half to two years of UZR.

Another way to look at it is after one year, a player’s true talent UZR or what you might expect from him in the future is as close to that one-year number as it is to zero (technically, the average of a similar type player, which might not be zero). The best estimate is somewhere in between – in fact more or less the mid-point. Given that, I don’t think it is fair to say that one year of UZR data is “unreliable.” Of course, the words “reliable” or “unreliable” have no quantitative meaning. You can make of them whatever you want. Personally, no matter what size sample of data I look at, I always do a mental regression. For a one-year UZR, I mentally regress UZR halfway toward the mean, which means basically to “cut it in half” since the mean is defined more or less as zero. If you want to refine that “rule of thumb” a little, you can regress a player’s UZR (per 150 games) toward +2 for a fast player, -2 for a slow player, and zero for anyone in between. That is more true in the OF than in the IF, and more true at SS and 2B than at 3B or 1B, as you might expect. In addition, when I say “fast” or “slow,” I mean relative to the average player at that position. So, for example, if a player is fast, but only as fast as the average CF’er, and he is a CF’er, then you still want to regress his UZR to zero.
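The "mental regression" rule of thumb can be sketched as below. The +2 / 0 / -2 means and the halfway regression come from the paragraph above; the function and dictionary names are made up, and the speed class is meant relative to the average player at the position.

```python
# Per-150-game means to regress toward, by speed relative to the position.
SPEED_PRIOR = {"fast": 2.0, "average": 0.0, "slow": -2.0}

def regress(observed, fraction, mean=0.0):
    """Move `observed` the given fraction of the way toward `mean`."""
    return mean + (1.0 - fraction) * (observed - mean)

# One year of UZR, regressed halfway toward zero.
plain = regress(10.0, 0.5)
# Same player, but fast for his position, so regress toward +2 instead.
fast = regress(10.0, 0.5, mean=SPEED_PRIOR["fast"])
print(plain, fast)
```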

One problem that comes up with any metric when you combine years in order to increase sample size and thus reliability, is that a player’s true talent may change from one year to the next, such that you are in some sense adding apples to oranges. We generally handle that by giving more weight to recent years and less weight to more distant years. So keep that in mind when you are looking at multi-year UZR’s.

UZR and Aging

Also keep in mind that for most defensive positions, players decline in talent starting in their early to mid-20’s. About the only exception to that is first base, which seems to be more of a skill and learning position, while the other ones are mostly about speed and agility. So almost any player who has been a combined average defender over the last 3 years is likely a below-average defender due to aging. The notion that a player learns how to play defense in the major leagues is largely nonsense. He might learn and improve upon some specific defensive skills, but his overall defensive value is likely already declining by the time he gets to the majors. If you find that hard to believe, look at the aging curve for a player’s triples rate. It too declines from an early age. That is because speed, agility, and the willingness to put one’s body in harm’s way tend to be fleeting skills for most professional athletes. And all three of those skills are what correlate best with defensive ability (again, with the exception of 1B and to some extent 3B).

Consistency from Year to Year

Another issue that often comes up is the significance of consistent or inconsistent year-to-year UZR’s. It is human nature to look at Player A, who has a UZR of -11, +14, and -3 over the last 3 years, and be confused and unsure of that player’s true defensive skills. At the same time, we see Player B, who is +1, 0, and -1, and we feel confident that he is an average defender. Don’t be fooled by these illusions. There is virtually no difference between the two players’ stats. The fact of the matter is that both players have a 3-year UZR average (albeit non-weighted) of zero, and therefore both players are likely around average defenders. It makes very little difference what those yearly samples look like. They are merely 3 arbitrary sub-sets of a 3-year sample of data. Keep in mind I didn’t say that it makes no difference – only that it makes very little difference. If you are uncomfortable treating both of those players the same, then you can weight those 3 years (which is correct to do anyway) using a weighting scheme such as 3/4/5, and then you can ignore those year-to-year “inconsistencies.”
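A 3/4/5 weighting of the two players above can be computed as follows. The sketch assumes the three seasons are listed oldest to newest, so that the most recent year gets the largest weight, and it ignores differences in playing time between seasons; the function name is hypothetical.

```python
def weighted_uzr(yearly, weights=(3, 4, 5)):
    """Weighted average of yearly UZR values, listed oldest to newest."""
    if len(yearly) != len(weights):
        raise ValueError("need one weight per season")
    return sum(w * u for w, u in zip(weights, yearly)) / sum(weights)

player_a = weighted_uzr([-11, 14, -3])
player_b = weighted_uzr([1, 0, -1])
print(round(player_a, 2), round(player_b, 2))
```

Both weighted averages land within a run of zero, which is the point: the wild-looking seasons of Player A and the steady ones of Player B lead to nearly the same estimate.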

Does UZR tell us what actually happened on the field?

People often say something like, “Well, he had a +10 UZR last year, which means that he actually played well, even though he might be an average or even below average defender.” For example, Jeter had a very nice UZR in 2009, a decent one in 2008, and some terrible ones for many years prior to that. So, he is a perfect example of a below-average defender who played excellent defense last year and pretty good defense the year before, right? Well, maybe and maybe not. A player’s UZR does not necessarily tell you how he actually played just as it does not necessarily tell you what his true talent is. That is a very important point. It is not like we pulled a coin from our pocket and flipped it 100 times and came up with 60 heads (which is entirely possible, even though we presumably have a fair coin). In that case, we can safely say that, yes, we did in fact get 60 heads (Jeter did in fact play well last year), even though we know that the true heads percentage of our coin is around 50% (Jeter’s true talent at SS is very likely below-average). UZR does not work that way. Why is that?

That is because it is not measuring something that is categorized, like a coin flip which either comes up heads or tails, or BA, whereby a player either gets a hit or he doesn’t, or even simple Zone Rating, where a fielder either fields a ball within his zone or he doesn’t. Now, to some extent we are measuring something which is categorized, even though I just said that we aren’t. It is just that it is not particularly evident. For example, if we report that Jeter’s BA was .334 last year, we can look at his last 3 years’ BA or his career BA and declare that he is not likely a true .334 hitter, but there is no doubt about the fact that he hit .334 last year. We can even go to the video of all his games, and say, “Yup, he definitely hit .334 last year.”

Now, even though, as I said, to some extent with UZR we are measuring whether a fielder caught a certain “type” (speed, location, etc.) of ball or not, and that measurement is unambiguous, just because a player has a plus UZR does not mean that he necessarily played good defense – the same for a negative UZR. The analogy with BA is, just because a player had a .334 BA does not mean that he hit the ball well. It is entirely possible the only reason that he hit .334 was because he got a lot of bloops and bleeders and most of his hard hit balls dropped for a hit. But, because we can verify that a player did indeed hit .334, we say that a player’s BA is a good record of what actually happened. In fact, we would be better off if we didn’t record his batting performance by using his BA. We would be better off if we made adjustments to that BA based on how often his softly hit balls happened to fall for a hit and how often his hard hit balls were caught, as compared to the averages for those kinds of balls. If we did that, we would be better able to predict that player’s future BA, and we would have a better handle on his true batting talent, wouldn’t we? So we might actually say that so-and-so had a “virtual BA” of .285, even though he had an actual BA of .334, if lots of those .334 hits were lucky ones. And that .285 would likely be closer to the player’s true talent BA and he would be more likely to hit .285 next year than .334, since we don’t expect his good fortune to continue, if indeed it was good fortune and not some skill that our player had.

That is exactly what we are doing with UZR! UZR tries to record a player’s likely true talent and estimate his future performance based on the nuances of the batted ball and the player’s response to those nuances. It is not trying to capture exactly what happens on the field according to some arbitrary categories, like most of the offensive metrics (which make no distinction between a lucky ground ball bleeder through the “5-hole” or a clean, line drive base hit to the outfield), even the advanced ones like wOBA or linear weights.

Now, that being said, there is still a potentially large gap between what you might see on the field if you were to watch every play of every game and what UZR “says” happened on the field. And that is one of several reasons why one year or even 10 years of UZR (or any other sample metric) does not give us a perfect estimate of a player’s true talent or even an accurate picture of what actually happened on the field. The reason for that is that the data is imperfect. For example, UZR might put a certain batted ball in a certain bucket and determine that that batted ball was extremely difficult for the CF’er to catch, based on the recorded (by the BIS “stringers”) qualities of the ball and other data. We don’t, of course, know for sure whether it was indeed a difficult to field batted ball. We don’t know exactly where each fielder was stationed, we certainly don’t know the exact location of the batted ball to the nearest square inch on the field, and we definitely don’t know how long the ball was in the air or on the ground. In reality, it might have been an easy ball to catch or it might have been a difficult one to catch, or somewhere in between. We can only hope that in the long run, those balls were indeed hard to catch, on the average, for each individual player. We certainly know that those balls were hard to catch, on the average, for the league as a whole over a 6-year period.

Now, will those things even out over the course of a season or even three seasons for every single player? Of course not! The more the data, the more likely they are to even out, but there is never a guarantee. And that is why a player can have a plus UZR in any one year (or even 2 or 3) and perhaps play badly – or a minus UZR and play well. That is why we try and get as much data as possible before we declare a player to be a good or a poor defender and that is why we regress all sample data some amount toward a reasonable mean (usually zero for UZR) before we even open our mouths.

For example, if you see a player with a one-year UZR of +10, think +5! He might not have actually played well at all, or he might have played off the charts, but our best guess as to how he played is +5. If you see a +10 after 1 month and you have no other data or information on a player, think of that as a +1.5 (regress it 85% toward the mean). Literally forget the +10. It means nothing. It does NOT mean that he played like a +10 fielder and we are just regressing it because we are not sure that his true talent is +10. We are regressing it (heavily) not only because we do not think that he has +10 talent, but because we don’t think that he played like a +10 fielder either.
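The arithmetic in that example can be written out as a minimal sketch. The function name and signature are hypothetical, not part of any UZR code. The 85% figure for one month comes from the text above, and the 50% figure for one year is only what “+10, think +5” implies.

```python
def regress_toward_mean(observed, regression_fraction, mean=0.0):
    """Shrink an observed sample value toward the mean.

    regression_fraction is the share of the gap between the observed
    value and the mean that is removed: 0.0 keeps the sample as is,
    1.0 replaces it with the mean.
    """
    if not 0.0 <= regression_fraction <= 1.0:
        raise ValueError("regression_fraction must be between 0 and 1")
    return mean + (1.0 - regression_fraction) * (observed - mean)


# One-year +10, regressed 50% (implied by "think +5").
one_year = regress_toward_mean(10.0, 0.50)
# One-month +10, regressed 85% (stated in the text).
one_month = regress_toward_mean(10.0, 0.85)

print(round(one_year, 2), round(one_month, 2))
```

The mean defaults to zero because UZR is scaled so that an average fielder at the position is zero.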

Now, while +1.5 is our best guess as to that fielder’s actual play, we are much less certain of that +1.5 than we would be if a player were +1.5 after 5 years. Both players would have the same mean estimate of their true talent on defense, but our certainty surrounding that number would be vastly different between the two players.

Conclusions

So, what are the lessons here? One, use as much data as possible before drawing any conclusions about a player’s likely defensive ability, talent or value. But, because true talent can change from year to year, try to weight recent data more heavily than past data. Two, consistency from year to year means almost nothing. Ignore it, combine the data (hopefully with some weighting), and go on your merry way. Three, a player’s UZR, be it one year, one month or 5 years, is not necessarily what happened on the field and is not necessarily that player’s true talent level over that period of time either. That is why we regress, regress, and regress! A player can have a plus UZR and have played terrible defense, because the data we are using is far from perfect. It is exactly the same with offense and pitching. Do not for a second think that that is a unique problem with defensive metrics. It is not! The more data we have, however, the smaller the likely gap between UZR and what actually happened, and the smaller the gap between UZR and that player’s true defensive talent. And once we regress the sample numbers appropriately, we essentially shrink those gaps to zero on average, although there is still uncertainty with regard to the regressed number itself. So, even after regression, there is no guarantee that our UZR number reflects what the player actually did or his true defensive talent over that time period. But, it is the best we can do (not knowing anything else about that player)!
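As a rough illustration of the first two lessons, here is a minimal sketch of combining several seasons with more weight on recent ones. The 5/4/3 weights and the season values are made up for illustration; the primer does not specify a weighting scheme. The sketch also assumes the seasonal figures are on a comparable playing-time scale.

```python
def weighted_uzr(seasons, weights):
    """Weighted average of per-season UZR values, most recent season first."""
    if not seasons or len(seasons) != len(weights):
        raise ValueError("seasons and weights must be non-empty and equal length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(u * w for u, w in zip(seasons, weights)) / total_weight


# Hypothetical player: most recent season first.
recent_first = [14.0, -3.0, 0.0]
# Hypothetical weights: the most recent year counts the most.
weights = [5, 4, 3]

combined = weighted_uzr(recent_first, weights)
print(round(combined, 2))
```

The combined figure is still a sample number, so it would then be regressed toward the mean before being read as an estimate of talent.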




38 Responses to “The FanGraphs UZR Primer”

  1. Chris says:

    As soon as I saw the article title, I knew I wasn’t gonna get any work done for at least 15 mins.

  2. BJsWorld says:

    Wow.

    Just a quick note: in the aging section there is a typo, I believe. Players’ performance generally decreases in their early to mid 30s. The article says 20s. Either that or I am totally confused about when defensive regression hits.

    • Newcomer says:

      I believe that is not a typo. Overall player performance doesn’t usually decline in the mid 20s, but that is because offensive performance is usually improving while defensive performance erodes. Note his following remarks:

      “his overall defensive value is likely already declining by the time he gets to the majors. If you find that hard to believe, look at the aging curve for a player’s triples rate. It too declines from an early age.”

  3. Lee Panas says:

    Very good. I’ve been wanting to see the complete details of UZR all in one place for a long time. People are using UZR all over the place now and many don’t really know what it is. This article will be linked frequently I’m sure.

    Lee

  4. Nik Oza says:

    And one way to make it more accurate in a pure fielding sense is to average out the buckets with the batted ball percentages of the AL or NL, therefore creating a Pitching and Hitting Independent UZR, the subject of my community blog post currently under review.

  5. misc says:

    How is UZR/150 calculated? Ostensibly, the 150 refers to 150 games, but do you use actual games played, or do you use innings as a proxy? I’ve tried a couple of times for a couple of minutes to calculate UZR/150 from UZR and games or innings played at a position, and it never matches.

    • Nik Oza says:

      It’s calculated using innings and averaged to 150 games. This makes it unreliable because not every fielder gets the same frequency of batted balls per inning, unless the inning batted balls are averaged to the percentages of the AL or NL.

      • No it uses Defensive Games for both Range+Errors, ARM, and Double Plays. Innings are not used at all.

        Defensive Games – The number of outs made by an average fielder at his position given the exact distribution of balls in play for that player divided by the number of outs an average player at that position makes per game.

      • misc says:

        And that’s why my numbers never matched. But that makes sense. Thanks.

  6. MGL says:

    Nik, not sure what you mean. You can either explain here or I’ll wait for your community blog post.

    BJ, no typo. As far as we know, fielding starts to decline in a player’s early to mid 20s, depending a little on position. That is because, as I said, fielding depends mostly on speed, agility, and recklessness. Those skills peak early, as you can partially see from the aging curves for triples rates and stolen base attempts. Fielding probably doesn’t decline as much as the decline in those skills would suggest by itself, because there is probably some learning curve associated with fielding, but not enough to prevent an overall decline at an early age. Obviously your results for individual players may vary, and a good gauge for how much you think a player’s defense might be eroding as he ages, especially in the OF, is a player’s speed and overall physical shape. A player like Ichiro, who has appeared to retain his health, speed and agility (and his weight), is likely going to decline a lot less and later (or not at all) than a player like A. Jones (before he lost all the weight this year) or Griffey.

  7. MGL says:

    Yup, FG used to have defensive games listed with the UZR numbers. Maybe David will put them back. Basically, if you see that a player played in 50 defensive games, that means that an average player (for that position and that year and that league) would have had to play exactly 50 games in order to get the number of fielding chances that this fielder got. So, this fielder may have only played in 40 games, but got lots of chances for whatever reason, or he may have played in 60 games and got a fewer than average number of chances. Again, we do that because we always want to try to reflect a fielder’s performance and/or talent in a completely context-neutral environment.

    • Eric M. Van says:

      I hugely miss the DG being listed. It’s like not having BFP for pitchers or PA for hitters; it absolutely needs to be squeezed back in.

      I mean, the current FanGraphs design doesn’t fill my laptop browser with the bookmark pane open. Wider columns, more data!

  8. GiantPain says:

    Question:

    If UZR attempts to compare players to league average, isn’t it possible that data could be skewed by the quality of other players in the league that season? And could this not lead to some questionable comps?

    For example, can you really compare players UZR from different time periods?

    If player X had a +10 UZR from 2004-2006 and player y had a +11 UZR from 2007-2009, couldn’t that just as easily be a product of the play of OTHER players around the league rather than a statement of player Y’s superiority?

  9. MGL says:

    GiantPain. Yes! I explained exactly that in the Primer. Did you read it? I could have “baselined” everyone to the entire 6 years of data I use for each year of UZR, but I didn’t. Most metrics are scaled to a particular year and often to a particular year and league. Like UZR, they don’t have to be. If you truly want to compare players across years and leagues, you have to do some other things than just compare their UZR’s. Anyway, using multi-years as the baseline creates other problems, so it is 6 of one and half a dozen of another. I probably would have preferred to baseline everyone to the entire 6 years of baseline data (both leagues combined), but people really do not like metrics that do not sum to zero for a particular year, and I can’t really blame them. If I saw that the entire AL, for example, had an average UZR per player per 150 games of +3, I would be skeptical that defense was that good in that year and league. The data itself could cause something like that rather than the entire league being above average in defense for that year and league. There is no good reason for us to assume that the data we use for UZR is extremely consistent from year to year. In fact, we know it is not. So I am very wary of not zeroing out UZR for that year. Of course I could have zeroed it out for the AL and NL combined and not for each league individually. The way I do it now, as you correctly point out, we don’t know if one league is better than the other in defense (just like we don’t know which league is better in pitching or offense unless we do some more research in IL games and players who switch leagues), and we don’t know if overall defense happens to be better in one year compared to another. It is a dangerous world!

    • GiantPain says:

      Now I feel like a dummy. Thanks for the response though. I will admit to having skimmed certain sections, heh. I guess UZR isn’t great, but it’s the best we’ve got.

    • Bronnt says:

      So you’re saying it’s possible that Andruw Jones completely skewed the metric for centerfielders playing during the 1990s? Is that possible?

      • Bronnt says:

        I should say, the earlier aughts. I wondered about that once, when noticing how he was in the 20s in UZR every year but there were very few CFs in the teens. I’m probably just imagining things, though.

  10. pft says:

    “So, even after regression, there is no guarantee that our UZR number reflects what the player actually did or his true defensive talent over that time period. But, it is the best we can do (not knowing anything else about that player)!”

    I definitely agree about the true talent part. But your metric does reflect what a player does over a short period, since it is measuring actual events, albeit imperfectly and with uncertainty.

    Now say you observe Player A over a 30 game period. Adrian Beltre for example. If you read SOSH, you may have seen this discussion (if you can call it a discussion). UZR had him at 0. Doesn’t say anything really without observational data. If it was +10, it might suggest he was excellent if you did not watch the games. But I did, and he was not great, and I would throw that number out and not use it. If it was -10, it might suggest he was terrible, and if your eyes agree, you would be on pretty good ground to say that he was terrible. But it wasn’t -10.

    Based on my observations, I thought UZR had Beltre pretty much right, since among the errors and some mental mistakes, he made some nice plays. For Beltre and per expectations, he played terribly in these 30 games, yet for a league average 3B’man, not terribly.

    Now if I look at the 2010 UZR leaders and trailers (+5 or better, -5 or worse), I see names that I would expect to be there based on what is known about their true ability, and few that do not. So there is correlation even in a SSS. Of course, if you want to look at one player at +2 and another player at -2, and make any comparisons without having seen any of the games, that’s stupid.

    “Now, to some extent we are measuring something which is categorized, even though I just said that we aren’t. It is just that it is not particularly evident”.

    I agree with that, it is categorized, and it’s not evident because unlike offensive stats, we can not see it in the box score or play by play. Thats fine though. And I agree with your latter point that offensive stats over short periods do not always reflect how well a player did, just as UZR does not always do so. However, sometimes they do, by random chance or whatever.

    An 800 OPS hitter over 3 years may have an OPS of 800 today after 150 PA. A fielder may have a UZR/150 which matches his true ability based on a 3 year average. The difference between offensive stats and defensive stats is that offensive stats are based on counts, which have essentially no uncertainty (except umpires or official scorers decisions), while defensive metrics have some subjectivity, perhaps observational errors, and are based on probabilities and assigning runs for a player based on league averages (the latter point being the same for linear wts).

    Larger samples help to smooth things out, but there is still some uncertainty. I believe Tango estimated it at 5 runs over 2 years or something like that. Of course, any attempt to estimate uncertainty is uncertain.

    People do not like the word uncertainty. They prefer things to be black and white. They are not. So just because a number has uncertainty, does not mean it has no value. But using such numbers with certainty without taking into account the uncertainty, especially if you have not watched the player much, may lead to wrong conclusions.

    As for park factors, I saw Jason Bay last year and knew he was not as bad as UZR had him to be. Maybe that’s because he played at Fenway where being a great defensive LF’er is not as important, but Jason Bay did play in Fenway. In 2007 Bay was troubled with knee problems which may be why he had such a poor UZR. In 2008, he played 1/2 of his games in Fenway and he did not play it well since it takes a bit to know how to play there.

  11. John says:

    Double wow. Even as a software developer/DBA I find it hard to imagine the size and structure of the databases you have access to.

  12. Mike Green says:

    Excellent, MGL.

    Personally, I am persuaded that Jeter’s improved UZR numbers in 2009 resulted from a focused effort to improve his mobility to the right during the off-season.

    Incidentally, one thing that could be published that might help make the link between subjective observations and statistical measures of defense would be simply the importance of each position of balls in each location. So for centerfielders, if you divided up the zones, you could get numbers which on average reflected the importance of balls in the alleys, balls in front of the centerfielder and balls over his head. Similarly, with shortstops, balls in the hole vs. balls up the middle would be helpful (I believe that it’s 60-65% up the middle).

  13. neuter_your_dogma says:

    Great read! I was wondering if there is any consideration given to the effect, if any, a great fielder has on other player’s UZR. For example, a great center fielder who can cover acres of extra ground could impact the data of a below average left fielder by taking away out chances and limiting the area the LF has to cover.

  14. Nutlaw says:

    This is a very interesting read, though whether or not you would call the variation in year to year data “reliable” or not, given that you don’t rely on it to display a player’s true fielding ability, what can you rely on it to tell you? How do you feel comfortable using this data as generated?

    If a player with yearly ratings of -11, 14, and -3 should be considered the same as a player with yearly ratings of 1, 0, and -1, what happens if you measure the three most recent years of the first player one year later and he measures 14, -3, and 0? Then you’d think that he was rated 3.75 instead of zero.

    The data may not be so unreliable as to be useless, but I sure as heck can’t figure out what I’d want to use it for if I were worried about any sort of accuracy.

  15. Nik says:

    so Appleman, does UZR 150 use an averaged batted ball chart and filter it through the fielder’s performance or use the fielder’s performance and filter it through the batted ball chart? I believe the first one would be more accurate for judging a fielder because it is taking the league average ratio of batted balls instead of the actual ratio that the fielder fielded. Or am I just not getting the Defensive Games concept?

  16. MGL says:

    Nik, we want to know how a fielder would have done if we received the league average distribution of balls in 150 league average games, so that is the way the defensive games are calculated. Here is an example which should make it clear:

    A fielder makes 300 outs. It does not matter what the distribution of balls looks like that he fields or does not field, but he makes 300 outs, period. An average fielder with those exact same balls would have made 310 outs. Obviously we have a below average fielder to the tune of 10 outs or around 8 runs (-8 UZR).

    Now, here is how we assign defensive games to this fielder: The average fielder in his league and at his position gets 3 outs per game. 310/3 is 103.3. That is his number of defensive games. Literally had he played for a random team in that same year and league and gotten the same number of chances to field a ball and make an out, he would have played 103.3 games.
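The arithmetic in this example can be sketched as follows. The function and variable names are hypothetical. The 0.8 runs per out is only what “10 outs or around 8 runs” implies. Scaling to 150 defensive games is an assumption about how a per-150 rate would be formed, not a formula confirmed in this thread.

```python
def defensive_games(expected_outs, avg_outs_per_game):
    """Games an average fielder would need to see this many out chances."""
    if avg_outs_per_game <= 0:
        raise ValueError("avg_outs_per_game must be positive")
    return expected_outs / avg_outs_per_game


actual_outs = 300       # outs the fielder actually made
expected_outs = 310     # outs an average fielder makes on the same balls
avg_outs_per_game = 3   # league-average outs per game at the position

outs_vs_average = actual_outs - expected_outs  # -10 outs
runs_per_out = 0.8      # implied by "10 outs or around 8 runs"
uzr = outs_vs_average * runs_per_out           # about -8 runs

dg = defensive_games(expected_outs, avg_outs_per_game)

# Assumption: the rate stat is UZR scaled to 150 defensive games.
uzr_per_150 = uzr / dg * 150

print(f"DG = {dg:.1f}, UZR = {uzr:.1f}, UZR/150 = {uzr_per_150:.1f}")
```

Note that defensive games are built from the expected outs (310), not the outs the fielder actually made (300), which is why a rate computed from innings or games played does not match.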

  17. Nik says:

    Ok, thanks, I get the defensive games concept now.

    The aspect I think would be better (or at least supplementary) for UZR is as well as applying to the fielder the average number of outs for that position, apply the ratio of distribution of batted balls. Also, instead of taking the fielder’s effect from the batted balls he fielded and multiply it by the outs (to get the defensive games) multiply the fielder’s effect to the average ratio of batted ball distribution.

    I hope the blog post gets approved. There’s much more detail in there.

  18. Acton says:

    Great work. Is this proprietary? Why not open it up completely so that others can replicate it.

    Also, can anyone quantify how much change there has been in UZR over the years? E.g., how much difference is there from, say, what it indicated in 2003 about that year’s performance, compared to what the model today indicates about 2003 performance. Because, as a decision-maker, I would want to know how stable the product’s indications have been in the past.

    TIA.

  19. Dwight S. says:

    I have a couple questions about UZR that I’ve been wondering about. Hopefully these weren’t already answered and I just missed it. Sorry if I did.

    1. Do you get credited for getting to a ball even if the batter isn’t out? Take, for instance, you’re a SS and you show a bunch of range to even get to the ball but you can’t throw the runner out because he is too fast. Does that hurt you since you didn’t get the runner out?

    2. For outfielders, does it take into account you getting to a ball that isn’t in the air? Like say there is a line drive that looks like it’s heading to the wall for extra bases but you show enough range to cut it off and hold them to just a single. Do you get any credit for that, or is it just ignored because you didn’t make an out?

  20. Mark says:

    Does UZR take into consideration double plays turned, or only double plays started?

  21. sdfadfasd says:

    consideration double plays turned, or only double plays

  22. will says:

    still get confused by this. also, can’t it be a bit misleading?

  23. Bonzi says:

    How are there UZR stats for players that played before this was tracked? Doesn’t it involve watching each game?

  24. Jack says:

    Is there a website that lists UZR runs awarded and taken away for individual PA and games?

  25. kominki says:

    I want to show some appreciation to this writer for bailing me out of such a trouble. As a result of researching through the the net and coming across opinions which are not powerful, I was thinking my life was done. Living without the presence of answers to the issues you’ve fixed by means of your good posting is a critical case, and those that might have adversely affected my entire career if I had not discovered your web blog. Your talents and kindness in controlling all the details was vital. I don’t know what I would have done if I hadn’t come across such a subject like this. I can also at this moment relish my future. Thank you very much for your reliable and results-oriented help. I will not hesitate to propose the website to any person who will need guide on this issue.

  26. obviously like your web-site however you have to test the spelling on several of your posts. Several of them are rife with spelling issues and I to find it very troublesome to inform the reality then again I’ll definitely come again again.

  27. I do accept as true with all of the ideas you’ve introduced in your post. They are really convincing and can definitely work. Nonetheless, the posts are too quick for beginners. May just you please extend them a little from next time? Thanks for the post.

  28. Robert says:

    This is a great article and it really helped me understand the metric a lot better. It really goes in depth in analyzing the game beyond what we see on TV. But I just have one question. When you talk about regressing towards the mean, you mentioned that you like to “cut the number in half.” Let’s just say that a player has a UZR/150 of +10.0 through 2 months. Would you assume that his true talent UZR would be +5.0, because 10/2=5? It’s probably a stupid question, I got confused in that part, but nonetheless great article.

  29. Payton says:

    i really need help trying to figure out this problem… its really hard … the problem is 2+2

  30. Payton says:

    lata blablabla said this I do accept as true with all of the ideas you’ve introduced in your post. They are really convincing and can definitely work. Nonetheless, the posts are too quick for beginners. May just you please extend them a little from next time? Thanks for the post
