# Range and Errors

As many of you know, this offseason proved monumental for the site as we added a wide array of evaluative metrics, becoming one of the primary sources for player valuations. One of these additions, UZR, the fielding metric designed by Mitchel Lichtman, enabled analysts and readers alike to incorporate the fielding aspect of baseball into discussions. Several aspects of fielding combine to provide the final UZR figure, and two, range runs and error runs, are of particular interest given their reputations in the world of conventional wisdom.

The conventional wisdom goes that the better a player’s range, the more likely it is that he will commit errors. The underlying reasoning is that the player will be able to get his glove on more balls, thereby not only giving himself a chance to make more plays, but also the chance to mess up on more plays. I like to refer to this as ‘The Abreu Complex’ as Bobby Abreu used to be considered a solid fielder by many fans because he rarely made errors. The issue of course is that his limited range prevented him from covering more ground: he didn’t bobble many balls but he couldn’t get to balls that others would catch and that he might then bobble.

With the different components of UZR freely available on the site, I decided to see if the conventional wisdom held true – does more range really translate to increased errors? I pooled every player with at least 100 innings at a position over the last three years, removed catchers, and wound up with 722 player position seasons. Correlations between range runs and error runs were then run separately for infielders and outfielders. A correlation coefficient measures how strongly two variables move together in a linear fashion; in this case, do range and errors relate strongly to one another in the sense that as one goes so too does the other?

As a rule of thumb, two variables need a correlation coefficient of at least 0.40 to be considered to have at least a moderately strong relationship. Among infielders, range runs and error runs produced a 0.10 correlation, while outfielders featured only a slightly stronger relationship at 0.15. Neither group of fielders exhibited anything close to a moderately strong relationship between range and errors, which undercuts the conventional wisdom: more range does not necessarily result in more errors, no matter how much sense the statement might make from an intuitive standpoint.
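For readers who want to try this at home, here is a minimal sketch of how a correlation coefficient like the ones above could be computed. It assumes a plain Pearson correlation, which the post does not spell out, and the `player_seasons` numbers are made up for illustration rather than taken from actual UZR data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares around the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

# Hypothetical (range runs, error runs) pairs, NOT real UZR figures.
player_seasons = [
    (8.2, 1.1), (-3.5, 0.4), (5.0, -2.0),
    (-6.1, -1.3), (1.7, 0.9), (-0.8, 2.2),
]
range_runs = [rng for rng, _ in player_seasons]
error_runs = [err for _, err in player_seasons]

r = pearson_r(range_runs, error_runs)
print(f"correlation between range runs and error runs: {r:.2f}")
```

In practice the pairs would be split into infielders and outfielders first and the function run once per group.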

Even when I restricted the data to at least 800 innings at a position, the correlations remained virtually the same: 0.16 for outfielders and 0.11 for infielders. Based on this data it seems that there are certainly cases where range and errors relate to one another, but it is in no way a foregone conclusion that more range results in more errors.


Eric is an accountant and statistical analyst from Philadelphia. He also covers the Phillies at Phillies Nation and can be found here on Twitter.

Guest
7 years 7 months ago

MGL studied the issue a little while ago, and found that more range actually equaled fewer errors.

http://www.insidethebook.com/ee/index.php/site/article/do_fielders_with_good_range_commit_more_errors/

“Bottom line is that a player’s fielding percentage, especially if it is poor, actually tells us a lot about his range, and it is NOT in the direction that a lot of people were thinking…”

Guest
7 years 7 months ago

Well, errors are also subjectively given by scorers.

Guest
ed
7 years 7 months ago

It makes sense to me that there would be only a limited correlation between these two numbers because a fielder does not have the same propensity to commit an error on every ball that comes to him. (Actually, I haven’t done any research on that point.)

A player might be more likely to commit errors the farther he has to move relative to his ability to move and the faster the play has to be made. Therefore, a player with more range might be less likely to commit errors when he has to move only a moderate distance. At the same distance, a player with reduced range would struggle to get there and be more likely to make an error. So a player with more range balances a better ability to field balls in a small radius with a greater number of fielded balls (chances for errors) and a slightly increased area of field (the edge of a player’s range) where the player has a greater chance to commit errors. All told, range seems unlikely to contribute much to errors.

Guest
7 years 7 months ago

That logic works in terms of total errors but not on a percentage basis, as ed describes. Because, even as errors increase, the number of balls you get to and catch increases, almost proportionally.

There should be some increase because obviously, a ball hit directly to a fielder (relatively) should be easier to field. Going beyond the normal range puts another degree of difficulty on fielding and thus should lead to more errors on a percentage basis.

More range leading to less errors on a percentage basis also makes more sense because he can get to balls faster and position himself to safely make the catch, whereas the poor range fielder will find himself out of position more often.

Of course, the above is based on the great info in this post and not doable until the data leads the way.

Guest
B
7 years 7 months ago

Do you have a scatterplot of the data?

Guest
LarryinLA
7 years 7 months ago

This is called fanGRAPHS, after all. (I kid).

Guest
Peter Jensen
7 years 7 months ago

“That logic works in terms of total errors but not on a percentage basis, as ed describes. Because, even as errors increase, the number of balls you get to and catch increases, almost proportionally.”

The logic is bad in both cases. A scorer is rarely going to give an error on a missed play if most fielders would never have gotten to the ball in the first place. The only case that I can think of is if an infielder hurries his throw and throws wildly and a runner or the batter is able to advance to another base.

Guest
MGL
7 years 7 months ago

Eric, if you found a positive correlation (.10 or .15, or whatever) and you kept the appropriate signs, then that would indicate that players who had MORE range (a plus range UZR) had FEWER errors (also a plus errors UZR). So assuming that you kept the signs (plus or minus) intact when you ran the regressions, then even though you found a weak correlation, you did in fact find that, to some small degree, more range equals fewer errors, exactly the opposite of conventional wisdom. Which is exactly what I found.

As I said in my Book post, I suspect two things are driving the correlation (albeit a weak one): One, players with better range are probably better athletes, defensive-wise, so they may tend to make fewer errors. Two, a player who is a better athlete with better range may influence the scorers (in their favor) just a tad. IOW, who are you more likely to give an error to on a tough play – Pat Burrell or Carl Crawford?

The idea that since a good range fielder fields 3-5% more balls than an average fielder, he makes more errors is not such a tenable argument. In fact, you could make an argument that the balls that the good fielder is fielding are balls that the bad fielder is making errors on.

Guest
Bryan
7 years 7 months ago

Players with a high number of errors will also probably have low range because they are horrible defenders. This is a hard thing to test by correlation I think because you will probably have two groups of players that have bad range and either good or bad error totals which would destroy your ability to see any effects.

Guest
MGL
7 years 6 months ago

If you don’t do a correlation but look at groups of players (what I call a “poor man’s correlation”), you find that the players with the highest range UZR’s have the highest error UZR’s (basically the best error rates), and vice versa.

For things like this I don’t like to do correlations. Because of the small sample sizes, the correlations will be low even if there is a near-perfect relationship.

For example, let’s say that we do it by groups and we find that in one year, all of the highest range UZR players combined – say, an average UZR of +10 per 150 – have an error UZR of +3 (which is high), and all of the low range UZR players, say, -10 on the average, have an error UZR of -3, I think we would say that that is evidence of a strong relationship. In other words, if a player has lots of range, he is likely to have very good hands as well, and vice versa.
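The grouping approach described above can be sketched in a few lines. This is only an illustration under stated assumptions: the `group_means` helper, its `low_cut` and `high_cut` thresholds, and the sample numbers are all hypothetical, chosen to mirror the +10/+3 and -10/-3 example rather than drawn from real UZR data.

```python
def group_means(player_seasons, low_cut, high_cut):
    """Average error runs for the low-range and high-range groups.

    player_seasons: iterable of (range_runs, error_runs) pairs.
    low_cut, high_cut: hypothetical thresholds; players at or below low_cut
    form the low-range group, players at or above high_cut the high-range one.
    """
    low = [err for rng, err in player_seasons if rng <= low_cut]
    high = [err for rng, err in player_seasons if rng >= high_cut]
    if not low or not high:
        raise ValueError("both groups need at least one player")
    return sum(low) / len(low), sum(high) / len(high)

# Made-up (range runs, error runs) pairs per 150 games, for illustration only.
sample = [
    (12.0, 4.0), (9.0, 2.0), (11.0, 3.0),       # high-range players
    (-10.0, -2.0), (-12.0, -4.0), (-8.0, -3.0), # low-range players
    (1.0, 0.5),                                  # middle of the pack, ignored
]

low_avg, high_avg = group_means(sample, low_cut=-8.0, high_cut=8.0)
print(f"low-range group error runs:  {low_avg:+.1f}")
print(f"high-range group error runs: {high_avg:+.1f}")
```

Because each group average pools many players, the noise in any single player's sample largely washes out, which is the point of comparing groups instead of individual data points.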

However, if the samples for each player are small enough, the correlation coefficient could be very small – even close to zero for small enough samples.

So we really have to be very careful with computing correlations on data points when each data point is a sample of performance. If those samples are small enough, the correlations can be near zero even with a perfect relationship. What we are really interested in when we do these correlations to determine “relationships”, such as with this issue, is “true talent” versus “true talent.” The only way to do that with any precision at all is to use data points that are comprised of large samples of performance – like 5 years of range UZR versus 5 years of error UZR, or something like that. That is often not too practical of course.

So, as Tango often does, I caution everyone about putting too much stock into correlations between sample data points when those samples are small or even medium size, and I caution researchers about doing correlations when trying to determine relationships and/or drawing too many conclusions from the resultant “r’s”. It is often better to do the poor man’s correlation which is basically aggregating data to form large samples and then doing a one data point to another data point comparison. In fact, I almost always do that first and then I might do a correlation just to see how consistent the relationship is among all the players (or whatever it is underlying each data point), again, with the caveat being you are never going to get a large correlation no matter how strong the relationship when the data points are comprised of small samples of performance.

Another nice trick is to compute the maximum correlation coefficient for whatever your sample sizes are for comparison purposes, assuming a perfect relationship. Tango does this when evaluating projection systems. For example, let’s say that you want to evaluate a BA projection system or you want to analyze the relationship between one set of BA and another set. And let’s say that the independent variable was BA in 300 AB or the two sets of BA (the x and y variables) were comprised of around 300 AB. What is a “good” correlation? We don’t know unless we at least know what the “perfect” correlation would be. IOW, if every player had the same true BA in each data pair, but each variable was only 300 AB, what would the correlation coefficient be? I don’t know off the top of my head, but it would probably not be greater than .5 or .6 I wouldn’t think. IOW, if batter A was a true .270 hitter in data point x and data point y, and batter B was a true .230 hitter in data point x and data point y, etc., if we sampled all the batters for 300 AB, we would have sample batting averages of, like, .263 for batter A in data point x and .281 in data point y (random binomially distributed BA around .270), and for batter B, it might be .240 and .210. Or whatever. Now, we know that the relationships between the x and y variables are perfect and that the correlation is 1 given an infinite number of AB per data point, but in only 300 AB per data point, we are likely to get a correlation of .5 or .6 (or whatever it is). This is critical information to know if we are comparing sets of BA to determine a relationship. If our sample sizes are 300 AB and we get a correlation of .4 or .5 we can say that that is very strong. If our sample sizes were 2000 AB, then a correlation of .4 or .5 is not particularly strong.
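One way to estimate that ceiling is a quick simulation. The sketch below is not Tango's or MGL's actual method. It assumes true batting averages are normally distributed around .270 with a spread of .025 (both figures are assumptions, as are the function and parameter names), gives each hitter two independent binomial samples from the same true BA, and correlates the two sets of sample averages.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

def simulate_max_r(at_bats, n_players=1000, mean_ba=0.270, sd_ba=0.025, seed=0):
    """Correlation between two BA samples when true talent is identical in both.

    mean_ba and sd_ba describe an ASSUMED spread of true talent; change them
    and the ceiling changes too.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n_players):
        # Same true BA behind both data points: a "perfect" relationship.
        true_ba = min(max(rng.gauss(mean_ba, sd_ba), 0.0), 1.0)
        # Each at-bat is a hit with probability true_ba (binomial sampling).
        hits_x = sum(rng.random() < true_ba for _ in range(at_bats))
        hits_y = sum(rng.random() < true_ba for _ in range(at_bats))
        xs.append(hits_x / at_bats)
        ys.append(hits_y / at_bats)
    return pearson_r(xs, ys)

r_300 = simulate_max_r(at_bats=300)
r_2000 = simulate_max_r(at_bats=2000)
print(f"max correlation at 300 AB:  {r_300:.2f}")
print(f"max correlation at 2000 AB: {r_2000:.2f}")
```

Under this setup, with equal and independent noise on both sides, the expected ceiling works out to the variance of true talent divided by the sum of the true-talent variance and the sampling variance, so it climbs toward 1 as the number of AB per data point grows.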

Guest
B
7 years 6 months ago

Which is why I was hoping he’d post a scatterplot so we can see if any trends stand out…

Guest
Brian
7 years 6 months ago

Very interesting. Thanks!