- FanGraphs Baseball - http://www.fangraphs.com/blogs -

# The FanGraphs UZR Primer

Introduction

As many of you already know, UZR is an advanced defensive metric that uses play-by-play data recorded by Baseball Info Solutions (BIS) to estimate each fielder’s defensive contribution in theoretical runs above or below an average fielder at his position in that player’s league and year. Thus, a SS with a UZR of zero is exactly average as compared to a SS in the same year and in the same league. If his UZR is plus, he is above average, and if it is minus, he is below average.

It is similar to offensive linear weights, where each event is assigned a number of runs, or fraction of a run, which is equal to the average value of that event as compared to a generic PA, generally for that year and for that league. With UZR and offensive linear weights a player gets credit for the theoretical value of an event (for UZR, those events are turning a batted ball into an out, allowing a batted ball to drop for a hit, making an error – or a fielder’s choice – that allows the batter to reach base, or making an error that allows a base runner to advance one or more bases) rather than what actually transpired during or subsequent to that event, in terms of any scoring on that play, base runner advances, etc., and regardless of the score or inning of the game.

One of the differences between UZR and linear weights is that with UZR the amount of credit that the fielder receives on each play, positive (if he makes an out) or negative (if he allows a hit or an ROE), depends on how often that particular kind of batted ball, in terms of its location, speed and several other factors, is fielded by an average fielder at the same position, measured over a time span of several years, in addition to whether the batted ball was a hit, out, or error (or FC). With offensive linear weights, if a batted ball is a hit or an out, the credit that the batter receives is not dependent on where or how hard the ball was hit, or any other parameters.

With UZR, if a fielder makes an out, and the UZR engine estimates that it was a difficult ball to field (and turn into an out) by an average fielder at that position, then the fielder will get more credit than if the UZR engine determined that it was an easy ball to field. Likewise, if a batted ball drops for a hit, a fielder will get more negative credit if UZR determined that it was an easy ball to field (for that fielding position) and less negative credit if it was a difficult ball to field. If a fielder makes an error, UZR automatically assumes that it was a relatively easy ball to field, since that is presumably the definition of an error in the first place, so there is no need to incorporate the speed and location of the batted ball and other factors that can influence how difficult a batted ball is to field. In other words, in UZR, errors are treated as balls that are normally fielded by that fielder and that fielder only (the one who made the error), 95% of the time, or whatever the average error rate is for that position and that type of ball.

Fielder Credits and Debits

How does UZR determine how much credit, positive or negative, to award a fielder on each batted ball? First it goes through 6 years of batted ball data and determines how often each type and location of batted ball is fielded by each defensive position, making adjustments for the speed of the ball, and the handedness, speed, and power of the batter. Later on, further adjustments are made, such as the outs and base runners, and various park adjustments, like the size and configuration of the OF, the speed of the infield, and the speed of batted balls in general, as influenced by temperature, altitude, and the ground ball percentage of the pitcher (e.g. ground ball pitchers allow easier to field ground balls and harder to field air balls). For example, UZR might find that from 2004-2009, of all hard-hit line drives hit by a LH batter with above-average power to a certain location in an average OF, 15% are fielded by the CF’er, 10% by the LF, and 75% fall for a hit. Remember, those would be average numbers across all MLB parks.
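The baseline step described above can be pictured as a simple tally per "bucket." The following is only a minimal sketch, not the actual UZR engine: the function name, the tuple used as a bucket key, and the toy data are all assumptions made for illustration, and the later adjustments (parks, base runners, pitcher tendencies) are left out.

```python
from collections import Counter, defaultdict

def bucket_catch_rates(plays):
    """Return {bucket: {position: share of balls turned into outs}}.

    plays: iterable of (bucket, fielder) pairs, where fielder is the
    position that made the out, or None if the ball fell for a hit.
    """
    totals = Counter()            # all batted balls seen per bucket
    outs = defaultdict(Counter)   # outs per bucket, split by position
    for bucket, fielder in plays:
        totals[bucket] += 1
        if fielder is not None:
            outs[bucket][fielder] += 1
    return {
        bucket: {pos: n / totals[bucket] for pos, n in outs[bucket].items()}
        for bucket in totals
    }

# Hypothetical bucket key and toy data matching the example in the text:
# 15% caught by the CF, 10% by the LF, 75% fall for a hit.
bucket = ("line drive", "hard", "LHB", "above-average power", "zone X")
plays = [(bucket, "CF")] * 15 + [(bucket, "LF")] * 10 + [(bucket, None)] * 75
rates = bucket_catch_rates(plays)
print(rates[bucket])
```

In this toy data the CF share comes out to 15% and the LF share to 10%, so the ball is caught by someone 25% of the time.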

Now, let’s say that we want to compute a UZR for every player in 2009. For every batted ball, either it is caught by a fielder and turned into an out (either the batter or a base runner is out, or both of course), it is scored as a hit, or the batter reaches on an error or a fielder’s choice (and no out is made). One or more fielders will receive positive or negative credit depending on the outcome of the play and depending on how often that same batted ball in that same situation (outs, base runners, attributes of the batter, etc.) was successfully fielded by each fielder from 2004-2009.

Let’s say that that same batted ball in the example above was caught by the CF’er on the first play of a game. Since typically someone will catch that same ball only 25% of the time (see above), this particular CF’er will get credit for an extra .75 plays – 100% minus 25%. We then convert .75 plays into runs by multiplying .75 by the difference between an average hit in that location and the average value of an air ball out. A typical outfield hit is worth around .56 runs and any batted ball out is worth around -.27 runs, so the difference between a hit and an out is worth around .83 runs. (We don’t vary the value of the hit or out based on the outs or base runners because we want “game situation-neutral” defensive evaluations.) Since our fielder gets credit for .75 extra plays, we give him credit for .75 times .83 runs, or +.6225 runs for that play.

The LF’er, even though he typically catches that same ball 10% of the time, gets no demerits for not making the play. In UZR, when a ball is caught and turned into an out by one fielder, no other fielder gets docked any runs. This helps to minimize the effects of “ball-hogging.” If we didn’t do this, for example, on teams where the CF’er liked to take charge of just about every lazy fly ball hit into the gaps, the LF’er and RF’er would end up being penalized for balls that they could have easily fielded. Of course, with this method, a ball hogger will get slightly more credit than he deserves, but as long as his ball-hogging is done on easy fly balls, he isn’t going to get much credit anyway. For example, if a certain type of ball and location is caught 90% of the time, whoever catches it is only going to receive .1 (1 minus .9) times .83, or .083 runs in credit.
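The credit for a caught ball in the two examples above can be written as a short calculation. This is a minimal sketch using the run values quoted in the text (.56 for a typical outfield hit, -.27 for an out); the function and constant names are made up for illustration.

```python
RUN_VALUE_HIT = 0.56    # typical outfield hit, as quoted in the text
RUN_VALUE_OUT = -0.27   # any batted-ball out, as quoted in the text

def out_credit_runs(catch_rates, hit_value=RUN_VALUE_HIT, out_value=RUN_VALUE_OUT):
    """Runs credited to the one fielder who makes the out.

    catch_rates: {position: baseline share of these balls that the
    position turns into outs}. The credit depends on how often ANYONE
    catches the ball, not just the fielder who caught it.
    """
    extra_plays = 1.0 - sum(catch_rates.values())
    return extra_plays * (hit_value - out_value)

hard_liner = out_credit_runs({"CF": 0.15, "LF": 0.10})  # caught 25% of the time
lazy_fly = out_credit_runs({"CF": 0.90})                # caught 90% of the time
print(round(hard_liner, 4), round(lazy_fly, 4))
```

The difficult line drive is worth about .62 runs to whoever catches it, while the routine fly ball is worth only about .08, which is why ball-hogging on easy chances adds little.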

Notice that our CF’er did not get credit for 100% minus 15%, where the 15% is the percentage of time a typical CF’er catches the ball. Why is this? Because the total credit for catching that ball has to be 100% minus how often it is typically caught by anyone (25% in this case), and since no one else but the CF’er will (or should) receive credit, he has to get all of the total credit on that play, which is .75 extra balls. You may also notice that there is a slight flaw in that methodology. Let’s say that the LF’er normally catches 80% of those balls and the CF’er 10%. If the CF’er does catch that ball, he is only going to get credit for .1 plays even though it appears that it was a difficult catch for him (since he only catches similar balls in similar situations 10% of the time). But, if we give him more credit than .1 extra balls, we would have to take something away from the LF’er who normally fields that ball 80% of the time. We could do something like that, but we choose not to. As I mentioned before, if we dock one or more fielders when another fielder makes a play, we run into ball hogging problems. All of these problems arise, of course, because we don’t know precisely where a ball is hit, we don’t know exactly how long the ball was airborne or on the ground before it lands, is touched, or passes a fielder, and we don’t know exactly where the fielders were positioned when the ball was hit. Many decisions we have to make regarding the UZR methodology involve a trade-off. Given the limitations of the data, while the outcome is quite reliable, especially with large amounts of data (say, several years for a player), it is not perfect.

If that same ball were to drop for a hit, now our CF’er and our LF’er would both be penalized. First we figure the total cost of the hit. Since this particular batted ball is typically fielded by someone 25% of the time, the cost of the hit is .25 plays. Now, how do we apportion that .25? Since the LF’er catches 10% of these balls and the CF’er 15%, the LF’er is responsible for 40% (10/25) of the hits and the CF’er, 60% (15/25). So, the LF’er gets docked .4 (40%) times .25, or .1 plays, and the CF’er, .6 times .25, or .15, a total of .25 plays. Again, the run value of a play is the difference between an out and a typical hit in that area of the field, or around .83 runs. So when that ball drops for a hit, the LF’er gets docked .83 times .1, or .083 runs, and the CF’er gets nicked by .83 times .15, or about .125 runs. Makes sense, right? If the CF’er catches a certain ball more often than another fielder or fielders, and the ball falls for a hit, he should bear the majority or plurality of the “blame.”
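The apportionment of blame on a hit can be sketched the same way. The function name is hypothetical, and the .83 run value is the rounded figure from the text.

```python
def hit_debits_runs(catch_rates, run_value=0.83):
    """Split the cost of a hit among the positions that sometimes catch it.

    catch_rates: {position: baseline share of these balls turned into outs}.
    Returns {position: negative run value charged to that position}.
    """
    total_cost_plays = sum(catch_rates.values())   # e.g. .25 plays
    if total_cost_plays == 0:
        return {}                                  # nobody ever catches it
    debits = {}
    for pos, rate in catch_rates.items():
        share = rate / total_cost_plays            # e.g. 10/25 = 40%
        debits[pos] = -share * total_cost_plays * run_value
    return debits

debits = hit_debits_runs({"CF": 0.15, "LF": 0.10})
print({pos: round(runs, 4) for pos, runs in debits.items()})
```

With these inputs the LF is charged about .083 runs and the CF about .1245 runs, for a combined .2075 runs (.25 plays times .83).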

On any given batted ball, it is possible for 2, 3 or even 4 fielders to receive negative credit when a ball drops for a hit. As long as a position even occasionally catches a ball in that “bucket” (a “bucket” is a certain type of ball hit to a certain location at a certain speed with a certain kind of batter at the plate, etc.), a player at that position will get docked some fraction of a run when that ball falls for a hit. By the way, when figuring the value of a play, or the difference between a hit and an out for a certain type of batted ball in a certain location, we always use the average hit value for that kind of ball over our 6-year time span and across all parks. We don’t use the actual hit value on that play (e.g. a double) if the batted ball lands for a hit, and we don’t use the average hit value for that park. Again, we are trying to evaluate defense in as much of a context-neutral environment as possible.

Anyway, UZR goes through each batted ball for every game and does the same calculations as above, awarding one or more fielders plus or minus credit depending on what type of ball was hit, its location, and the estimated position of the fielders, as determined by things like the handedness, speed, and power of the batters, and the outs and base runners. In addition, adjustments are made for the characteristics and configuration of the ballpark, and for the G/F tendencies of the pitcher.

Batted Ball Types

A bunt ground ball is treated as a different kind of batted ball from a non-bunt ground ball, but only for the first, second, and third basemen. In fact, bunt ground balls are ignored when figuring the UZR of the SS (I realize that once in a blue moon the SS fields a bunt). The types of batted balls that UZR processes are ground balls, bunt ground balls, outfield line drives, and outfield fly balls (including so-called pop flies). All batted balls are put in one of those categories. No other batted ball type distinctions are used, such as “fliners,” which are used in Dewan’s plus/minus system. The speed of each batted ball is also considered and is indicated in the data as “slow/soft, medium, or fast/hard” (3 categories).

GDP and Arm

Double plays are treated the same as a single out. There is a separate calculation for GDP’s above or below average, based simply on the number of DP’s turned per DP opportunity, given the speed and location of the ground ball. First baseman scoops are not included in any UZR numbers. My research suggests that a good first baseman can save maybe 2-3 runs a year in “scoops” – probably less than what you might think. As well, outfield arm run values are also computed separately from “regular” UZR. They are based on the speed and location of batted balls to the outfield and how often base runners advance extra bases (advances), don’t advance the extra base (holds), or get thrown out trying to advance (kills). Park factors are used in arm ratings. For example, because the left fielder plays so shallow in Fenway and balls tend to quickly ricochet off the Green Monster, it is difficult to advance an extra base on a hit to LF in Boston. In Colorado, because the OF is so expansive, base runners advance more easily than in an average park. The UZR “arm engine” adjusts for those things.

The base runner and outs adjustments are a proxy for infield defensive alignment. With a runner on first and less than 2 outs, UZR assumes that the SS and 2B are playing in “double play position” which is typically closer to second base and a little shallower. With a runner on first and no one on second, it is assumed that the first baseman is holding the runner. With a runner on first or second and no outs (or 1 out and a pitcher at bat), the third baseman often has to play up in anticipation of a bunt.

Other Infield Positioning

Left-handed and right-handed batters are treated separately since infielders and outfielders are positioned differently for each. Infield ground balls are also handled separately for two categories of batters: Above-average speed and below-average speed. All batters are put into one or the other category, using a Bill James type of speed score. It is assumed that infielders must play a little shallower and are more hurried in general with a faster runner at the plate. Also, the data includes whether a shift (a generic one, in the opinion of the “stringer” – the person recording the data) was on, and whether the shift likely affected the play at all. If it did – again, according to the “stringer”- then the play is ignored.

Outfield Positioning and “Wall Balls”

For outfield air balls, two separate categories of batters are used as a proxy for outfielder depth: Batters with less than average power and batters with greater than average power. As with batter speed, all batters are assigned to one group or the other, based on the average distance of their air balls. The data also tells us if a ball hits off an outfield wall (on the fly) and whether, in the judgment of the “stringer,” it was catchable or not. If not, then the play is ignored. If a “wall ball” was catchable, it is treated like any other batted ball at that distance, with no regard for whether it hit the wall or not.

Infield Line Drives and Pop-ups

Finally, line drives and pop flies that are less than 180 feet from home plate are ignored. With pop flies at that distance, they are either rarely caught by an outfielder or there is too much potential discretion between outfielders and infielders. Infield line drives involve more luck than skill. With infield pop ups, most are caught and when one is dropped it is usually a fluke or a mix-up. And we would certainly have ball hogging problems with infield pop flies, so they are ignored.

Infield (ground ball) park adjustments are handled by assigning a “factor” to all ground balls in each park, depending upon the speed of the infield, which includes not only the height and texture of the IF grass (or indoor playing surface), but the altitude (and average temperature of the park to some extent). For example, in Colorado, the IF is fast (and the OF is difficult) because of the altitude, and in San Diego, the IF is slow, at least partially because the park is at sea level. Park factors are updated every time a material change occurs to a park or a team moves into a new park.

In the OF, each section, LF, CF, and RF, is divided into two zones, shallow and deep, for park adjustment purposes. Each of those 6 zones per park has its own adjustment factor. For example, the deep zone in LF at Fenway has an adjustment factor of .5, meaning that of all balls hit past a certain distance in LF at Fenway, the overall “catch rate” is only half that of the average major league park. Similarly, in Houston’s LF “short porch,” it is .86. In Seattle, fly balls in all sections of all fields are easier to catch than at an average MLB park, presumably because of the altitude, the cold weather, and the large but not too-large outfield dimensions, and thus have a park factor above 1.0.
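One way to picture how a zone factor could feed into the credit calculation is sketched below. The text does not spell out how the engine applies these factors, so this is an assumption: the baseline catch rate is simply multiplied by the zone's factor and capped at 100%. The .40 baseline is a made-up number; only the .5 Fenway factor and the .83 run value come from the text.

```python
def park_adjusted_catch_rate(baseline_rate, park_factor):
    """Expected catch rate in a given park zone (assumed: baseline x factor)."""
    return min(1.0, baseline_rate * park_factor)

def credit_for_catch(expected_rate, run_value=0.83):
    """Runs credited for an out, given how often the ball is normally caught."""
    return (1.0 - expected_rate) * run_value

baseline = 0.40  # hypothetical league-wide catch rate for some deep LF ball
fenway_rate = park_adjusted_catch_rate(baseline, 0.5)   # deep LF at Fenway
print(round(credit_for_catch(baseline), 3),     # credit in a neutral park
      round(credit_for_catch(fenway_rate), 3))  # credit at Fenway
```

Under this assumption a catch of that ball is worth more at Fenway (about .66 runs) than in a neutral park (about .50 runs), and letting it drop costs the fielder less, which is the direction the text describes.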

The Baselines

Again, the baseline “catch rates” for all of the various “buckets” (batted ball types, speed, locations, etc.) are based on 6 years of data. That is an arbitrary number. It could be 3 years and it could be 10. I chose 6 years in order to accumulate fairly large samples of data in each bucket. The UZR numbers (each player’s runs saved above or below average) are initially presented as “as compared to the average player at all MLB parks over the 6-year baseline.” The numbers you see on FanGraphs, however, are scaled to an average player at each position for that league and year. So if you add up everyone’s UZR (the seasonal to-date or end-of-year totals – not the “per 150” rate) in any given year and league, it will sum to zero (or close to it because of rounding errors). Because of that, there is no guarantee that an average player in any one league or year is equal to an average player in any other league or year, even the same player. For example, let’s say that a certain player was zero in 2009 and 2008, thus you consider him to be an average defender for those two years, and you assume that his defensive ability or performance did not change from one year to the next. However, if in one of those years, the overall quality of defense in that player’s league was better or worse than the other year, the player may have actually gotten better or worse himself even though his UZR is zero in both years. That is a minor point, but it is something to keep in mind.
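The rescaling to a league-and-year average can be illustrated with a small sketch. The exact procedure is not given in the text; the version below assumes the league's average runs per opportunity at a position is subtracted from each player in proportion to his opportunities, which is one simple way to make the totals sum to zero. The player labels and numbers are invented.

```python
def rescale_to_league_average(raw):
    """Re-center raw runs so one position/league/year sums to zero.

    raw: {player: (raw_runs_vs_6yr_baseline, opportunities)}.
    Assumption: each player is charged the league-average runs per
    opportunity times his own number of opportunities.
    """
    total_runs = sum(runs for runs, _ in raw.values())
    total_opps = sum(opps for _, opps in raw.values())
    per_opp = total_runs / total_opps
    return {p: runs - per_opp * opps for p, (runs, opps) in raw.items()}

# Hypothetical shortstops in one league and year.
raw = {"A": (6.0, 400), "B": (-1.0, 350), "C": (1.0, 250)}
scaled = rescale_to_league_average(raw)
print({p: round(v, 2) for p, v in scaled.items()})
```

In this toy league the raw totals add up to +6 runs against the 6-year baseline, so every player is shifted down; player C goes from +1 to slightly below average even though nothing about his own play changed.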

Recent Revisions

As many of you know, there was a recent UZR revision which has changed a few players’ numbers, especially outfielders who play their home games in quirky parks, like Bay and Ellsbury at Fenway. One of the changes involved an improvement in the way the park factors are computed and incorporated into the UZR engine. That’s why you might see some different numbers for players in these parks, primarily in the OF. Park factors are very difficult to do when constructing defensive metrics. They are much more difficult than offensive park factors, and those are no walk in the park either. The saving grace is that most parks do not have major defensive park factors. In any case, I think that the current version of UZR handles those factors pretty well, even in parks like Fenway and Coors. As well, FanGraphs now has home/road UZR splits so if you still don’t trust a certain player’s UZR because of the park factors issue, you can check out his road numbers. Keep in mind that you will see lots of random differences between some players’ home and road numbers which have nothing to do with park effects – they are simply an artifact of small sample sizes. Remember, too, that even large samples can show large random fluctuations.

Speaking of Jason Bay, one interesting thing about the “controversy” surrounding him and the UZR revisions is that he had terrible UZR’s in Pittsburgh in 2007 and 2008, prior to his trade to Boston, so it should not have been surprising that he would have bad numbers in Boston as well, even without any park adjustment problems. Now, whether he is indeed a terrible, bad, slightly below average, or average (or even above average) fielder, is another story. Just because UZR, or any other defensive metric “says” that someone is X, even if that X is based on many years of data, does not make it so. When you are dealing with sample data, as we almost always are with every metric in baseball that we encounter, there is a certain chance that the metric is going to be “wrong.” Sometimes, you can use other information (such as scouting and observation, or physical attributes like size and speed) in order to adjust your “conclusions” and decrease your chance of being “wrong” and sometimes you can’t (because the requisite information is not available).

There were some other minor improvements and changes in the new version of UZR (such as the “shift” and “wall” data) which shouldn’t affect the numbers all that much. If they do, the new numbers are bigger and better!

Sample Size and Reliability

One thing to keep in mind is that as with all metrics based on sample data where you are trying to estimate a true mean or value, the more data you have generally the more reliable your estimate. In other words, the more opportunities that UZR is based on, the more reliable the number, everything else being equal. On defense, 2B, SS, and CF have almost twice the number of opportunities per game as do the other positions on the field, but that does not necessarily mean that a UZR based on 100 games at SS is as reliable as 200 games at 3B. There are other factors that affect the reliability of a sample number.

How many UZR opportunities do you need for UZR to be reliable? There isn’t any magic number. If I asked you how many AB you need before a player’s BA becomes reliable, you would likely answer, “I don’t know. The more the merrier I guess.” That is true with UZR and with all metrics. Of course, for some metrics, you need more or less data than for other metrics for an equivalent reliability. It depends on the sampling error and the spread in underlying talent, and other things that are inherent in that metric. Most of you are familiar with OPS, on base percentage plus slugging average. That is a very reliable metric even after one season of performance, or around 600 PA. In fact, the year-to-year correlation of OPS for full-time players, somewhat of a proxy for reliability, is almost .7. UZR, in contrast, depending on the position, has a year-to-year correlation of around .5. So a year of OPS data is roughly equivalent to a year and a half to two years of UZR.

Another way to look at it is after one year, a player’s true talent UZR or what you might expect from him in the future is as close to that one-year number as it is to zero (technically, the average of a similar type player, which might not be zero). The best estimate is somewhere in between – in fact more or less the mid-point. Given that, I don’t think it is fair to say that one year of UZR data is “unreliable.” Of course, the words “reliable” or “unreliable” have no quantitative meaning. You can make of them whatever you want. Personally, no matter what size sample of data I look at, I always do a mental regression. For a one-year UZR, I mentally regress UZR halfway toward the mean, which means basically to “cut it in half” since the mean is defined more or less as zero. If you want to refine that “rule of thumb” a little, you can regress a player’s UZR (per 150 games) toward +2 for a fast player, -2 for a slow player, and zero for anyone in between. That is more true in the OF than in the IF, and more true at SS and 2B than at 3B or 1B, as you might expect. In addition, when I say “fast” or “slow,” I mean relative to the average player at that position. So, for example, if a player is fast, but only as fast as the average CF’er, and he is a CF’er, then you still want to regress his UZR to zero.
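The rule of thumb above amounts to moving a one-year UZR halfway toward a prior that depends on the player's speed. A minimal sketch, with a made-up function name and the +2 / 0 / -2 priors taken from the text:

```python
def regress_one_year(uzr_per_150, speed="average"):
    """Regress a one-year UZR/150 halfway toward a speed-based mean.

    speed is relative to the average player at the same position:
    "fast" regresses toward +2, "slow" toward -2, otherwise toward 0.
    """
    prior = {"fast": 2.0, "average": 0.0, "slow": -2.0}[speed]
    return prior + 0.5 * (uzr_per_150 - prior)

for speed in ("fast", "average", "slow"):
    print(speed, regress_one_year(10.0, speed))
```

A +10 season therefore becomes +6 for a fast player, +5 for an average one, and +4 for a slow one.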

One problem that comes up with any metric when you combine years in order to increase sample size and thus reliability, is that a player’s true talent may change from one year to the next, such that you are in some sense adding apples to oranges. We generally handle that by giving more weight to recent years and less weight to more distant years. So keep that in mind when you are looking at multi-year UZR’s.

UZR and Aging

Also keep in mind that for most defensive positions, players decline in talent starting in their early to mid-20’s. About the only exception to that is first base, which seems to be more of a skill and learning position, while the other ones are mostly about speed and agility. So almost any player who has been a combined average defender over the last 3 years is likely a below-average defender due to aging. The notion that a player learns how to play defense in the major leagues is largely nonsense. He might learn and improve upon some specific defensive skills, but his overall defensive value is likely already declining by the time he gets to the majors. If you find that hard to believe, look at the aging curve for a player’s triples rate. It too declines from an early age. That is because speed, agility, and the willingness to put one’s body in harm’s way tend to be fleeting skills for most professional athletes. And all three of those skills are what correlate best with defensive ability (again, with the exception of 1B and to some extent 3B).

Consistency from Year to Year

Another issue that often comes up is the significance of consistent or inconsistent year-to-year UZR’s. It is human nature to look at Player A, who has a UZR of -11, +14, and -3 over the last 3 years, and be confused and unsure of that player’s true defensive skills. At the same time, we see Player B, who is +1, 0, and -1, and we feel confident that he is an average defender. Don’t be fooled by these illusions. There is virtually no difference between the two players’ stats. The fact of the matter is that both players have a 3-year UZR average (albeit non-weighted) of zero, and therefore both players are likely around average defenders. It makes very little difference what those yearly samples look like. They are merely 3 arbitrary sub-sets of a 3-year sample of data. Keep in mind I didn’t say that it makes no difference – only that it makes very little difference. If you are uncomfortable treating both of those players the same, then you can weight those 3 years (which is correct to do anyway) using a weighting scheme such as 3/4/5, and then you can ignore those year-to-year “inconsistencies.”
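The 3/4/5 weighting mentioned above can be worked through for the two hypothetical players. This sketch assumes the yearly figures are listed oldest first, so the most recent year gets the weight of 5; the text does not say which order the numbers are in.

```python
def weighted_uzr(yearly, weights=(3, 4, 5)):
    """Weighted average of yearly UZRs, oldest year first (assumed order)."""
    if len(yearly) != len(weights):
        raise ValueError("need one weight per year")
    return sum(w * u for w, u in zip(weights, yearly)) / sum(weights)

player_a = weighted_uzr([-11, 14, -3])   # the "inconsistent" player
player_b = weighted_uzr([1, 0, -1])      # the "consistent" player
print(round(player_a, 2), round(player_b, 2))
```

Both players land within a run of zero (roughly +0.7 and -0.2), so the weighted view still treats them as about average defenders.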

Does UZR tell us what actually happened on the field?

People often say something like, “Well, he had a +10 UZR last year, which means that he actually played well, even though he might be an average or even below average defender.” For example, Jeter had a very nice UZR in 2009, a decent one in 2008, and some terrible ones for many years prior to that. So, he is a perfect example of a below-average defender who played excellent defense last year and pretty good defense the year before, right? Well, maybe and maybe not. A player’s UZR does not necessarily tell you how he actually played just as it does not necessarily tell you what his true talent is. That is a very important point. It is not like we pulled a coin from our pocket and flipped it 100 times and came up with 60 heads (which is entirely possible, even though we presumably have a fair coin). In that case, we can safely say that, yes, we did in fact get 60 heads (Jeter did in fact play well last year), even though we know that the true heads percentage of our coin is around 50% (Jeter’s true talent at SS is very likely below-average). UZR does not work that way. Why is that?

That is because it is not measuring something that is categorized, like a coin flip which either comes up heads or tails, or BA, whereby a player either gets a hit or he doesn’t, or even simple Zone Rating, where a fielder either fields a ball within his zone or he doesn’t. Now, to some extent we are measuring something which is categorized, even though I just said that we aren’t. It is just that it is not particularly evident. For example, if we report that Jeter’s BA was .334 last year, we can look at his last 3 years’ BA or his career BA and declare that he is not likely a true .334 hitter, but there is no doubt about the fact that he hit .334 last year. We can even go to the video of all his games, and say, “Yup, he definitely hit .334 last year.”

Now, even though, as I said, to some extent with UZR we are measuring whether a fielder caught a certain “type” (speed, location, etc.) of ball or not, and that measurement is unambiguous, just because a player has a plus UZR does not mean that he necessarily played good defense – the same for a negative UZR. The analogy with BA is, just because a player had a .334 BA does not mean that he hit the ball well. It is entirely possible the only reason that he hit .334 was because he got a lot of bloops and bleeders and most of his hard hit balls dropped for a hit. But, because we can verify that a player did indeed hit .334, we say that a player’s BA is a good record of what actually happened. In fact, we would be better off if we didn’t record his batting performance by using his BA. We would be better off if we made adjustments to that BA based on how often his softly hit balls happened to fall for a hit and how often his hard hit balls were caught, as compared to the averages for those kinds of balls. If we did that, we would be better able to predict that player’s future BA, and we would have a better handle on his true batting talent, wouldn’t we? So we might actually say that so-and-so had a “virtual BA” of .285, even though he had an actual BA of .334, if lots of those .334 hits were lucky ones. And that .285 would likely be closer to the player’s true talent BA and he would be more likely to hit .285 next year than .334, since we don’t expect his good fortune to continue, if indeed it was good fortune and not some skill that our player had.

That is exactly what we are doing with UZR! UZR tries to record a player’s likely true talent and estimate his future performance based on the nuances of the batted ball and the player’s response to those nuances. It is not trying to capture exactly what happens on the field according to some arbitrary categories, like most of the offensive metrics (which make no distinction between a lucky ground ball bleeder through the “5-hole” or a clean, line drive base hit to the outfield), even the advanced ones like wOBA or linear weights.

Now, that being said, there is still a potentially large gap between what you might see on the field if you were to watch every play of every game and what UZR “says” happened on the field. And that is one of several reasons why one year or even 10 years of UZR (or any other sample metric) does not give us a perfect estimate of a player’s true talent or even an accurate picture of what actually happened on the field. The reason for that is that the data is imperfect. For example, UZR might put a certain batted ball in a certain bucket and determine that that batted ball was extremely difficult for the CF’er to catch, based on the recorded (by the BIS “stringers”) qualities of the ball and other data. We don’t, of course, know for sure whether it was indeed a difficult to field batted ball. We don’t know exactly where each fielder was stationed, we certainly don’t know the exact location of the batted ball to the nearest square inch on the field, and we definitely don’t know how long the ball was in the air or on the ground. In reality, it might have been an easy ball to catch or it might have been a difficult one to catch, or somewhere in between. We can only hope that in the long run, those balls were indeed hard to catch, on the average, for each individual player. We certainly know that those balls were hard to catch, on the average, for the league as a whole over a 6-year period.

Now, will those things even out over the course of a season or even three seasons for every single player? Of course not! The more the data, the more likely they are to even out, but there is never a guarantee. And that is why a player can have a plus UZR in any one year (or even 2 or 3) and perhaps play badly – or a minus UZR and play well. That is why we try and get as much data as possible before we declare a player to be a good or a poor defender and that is why we regress all sample data some amount toward a reasonable mean (usually zero for UZR) before we even open our mouths.

For example, if you see a player with a one year UZR of +10, think +5! He might not have actually played well at all, or he might have played off the charts, but our best guess as to how he played was +5. If you see a +10 after 1 month and you have no other data or information on a player, think of that as a +1.5 (regress it 85% toward the mean). Literally forget the +10. It means nothing. It does NOT mean that he played like a +10 fielder and we are just regressing it because we are not sure that his true talent is +10. We are regressing it (heavily) because not only do we not think that he has +10 talent, but because we don’t think that he played like a +10 fielder either.

Now, while +1.5 is our best guess as to that fielder’s actual play, we are much less certain of that +1.5 than we would be if a player were +1.5 after 5 years. Both players would have the same mean estimate of their true talent on defense, but our certainty surrounding that number would be vastly different between the two players.
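The two regression amounts quoted above (50% after one year, about 85% after one month) are consistent with a common shrinkage rule in which the weight on the observed number is n / (n + k). The sketch below assumes that form with k equal to one season of data and a six-month season; neither the formula nor those constants are stated in the text, so treat it as an illustration only.

```python
def regressed_uzr(observed, seasons, ballast_seasons=1.0, mean=0.0):
    """Shrink an observed UZR rate toward the mean based on sample size.

    seasons: amount of data behind the observed number, in seasons.
    ballast_seasons: assumed amount of data at which we regress 50%.
    """
    weight = seasons / (seasons + ballast_seasons)
    return mean + weight * (observed - mean)

one_year = regressed_uzr(10.0, seasons=1.0)
one_month = regressed_uzr(10.0, seasons=1.0 / 6.0)  # assumes a 6-month season
print(round(one_year, 2), round(one_month, 2))
```

Under these assumptions a +10 after a full year regresses to +5, and a +10 after one month regresses to about +1.4, close to the +1.5 rule of thumb.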

Conclusions

So, what are the lessons here? One, use as much data as possible before drawing any conclusions about a player’s likely defensive ability, talent or value. But, because true talent can change from year to year, try and weight recent data more heavily than past data. Two, consistency from year to year means almost nothing. Ignore it, combine the data (hopefully with some weighting), and go on your merry way. Three, a player’s UZR, be it one year, one month or 5 years, is not necessarily what happened on the field and is not necessarily that player’s true talent level over that period of time either. That is why we regress, regress, and regress! A player can have a plus UZR and have played terrible defense, because the data we are using is far from perfect. It is exactly the same with offense and pitching. Do not for a second think that that is a unique problem with defensive metrics. It is not! The more data we have, however, the less likely the gap between UZR and what actually happened, and the smaller the gap between UZR and that player’s true defensive talent. And once we regress the sample numbers appropriately, we essentially shrink those gaps to zero, although there is still uncertainty with regard to the regressed number itself. So, even after regression, there is no guarantee that our UZR number reflects what the player actually did or his true defensive talent over that time period. But, it is the best we can do (not knowing anything else about that player)!