<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd"
xmlns:rawvoice="http://www.rawvoice.com/rawvoiceRssModule/"
	>
<channel>
	<title>Comments on: Range and Errors</title>
	<atom:link href="http://www.fangraphs.com/blogs/index.php/range-and-errors/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.fangraphs.com/blogs/index.php/range-and-errors/</link>
	<description>Daily baseball statistical analysis and commentary</description>
	<lastBuildDate>Sun, 12 Feb 2012 13:06:51 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
	<item>
		<title>By: Eric Seidman</title>
		<link>http://www.fangraphs.com/blogs/index.php/range-and-errors/#comment-64506</link>
		<dc:creator>Eric Seidman</dc:creator>
		<pubDate>Mon, 02 Mar 2009 21:05:20 +0000</pubDate>
		<guid isPermaLink="false">http://www.fangraphs.com/blogs/?p=3287#comment-64506</guid>
		<description>Will be posting a scatterplot this week.  Apologies for the delay.</description>
		<content:encoded><![CDATA[<p>Will be posting a scatterplot this week.  Apologies for the delay.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: B</title>
		<link>http://www.fangraphs.com/blogs/index.php/range-and-errors/#comment-64474</link>
		<dc:creator>B</dc:creator>
		<pubDate>Mon, 02 Mar 2009 17:45:24 +0000</pubDate>
		<guid isPermaLink="false">http://www.fangraphs.com/blogs/?p=3287#comment-64474</guid>
		<description>Which is why I was hoping he&#039;d post a scatterplot so we can see if any trends stand out...</description>
		<content:encoded><![CDATA[<p>Which is why I was hoping he&#8217;d post a scatterplot so we can see if any trends stand out&#8230;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Brian</title>
		<link>http://www.fangraphs.com/blogs/index.php/range-and-errors/#comment-64338</link>
		<dc:creator>Brian</dc:creator>
		<pubDate>Sun, 01 Mar 2009 07:07:02 +0000</pubDate>
		<guid isPermaLink="false">http://www.fangraphs.com/blogs/?p=3287#comment-64338</guid>
		<description>Very interesting.  Thanks!</description>
		<content:encoded><![CDATA[<p>Very interesting.  Thanks!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: MGL</title>
		<link>http://www.fangraphs.com/blogs/index.php/range-and-errors/#comment-64240</link>
		<dc:creator>MGL</dc:creator>
		<pubDate>Sat, 28 Feb 2009 02:59:42 +0000</pubDate>
		<guid isPermaLink="false">http://www.fangraphs.com/blogs/?p=3287#comment-64240</guid>
		<description>If you don&#039;t do a correlation but look at groups of players (what I call a &quot;poor man&#039;s correlation&quot;), you find that the players with the highest range UZR&#039;s have the highest error UZR&#039;s (basically the best error rates), and vice versa.

For things like this I don&#039;t like to do correlations.  Because of the small sample sizes, the correlations will be low even if there is a near-perfect relationship.

For example, let&#039;s say that we do it by groups and we find that in one year, all of the highest range UZR players combined - say, an average UZR of +10 per 150 - have an error UZR of +3 (which is high), and all of the low range UZR players, say, -10 on the average, have an error UZR of -3, I think we would say that that is evidence of a strong relationship.  In other words, if a player has lots of range, he is likely to have very good hands as well, and vice versa.

However, if the samples for each player are small enough, the correlation coefficient could be very small - even close to zero for small enough samples.

So we really have to be very careful with computing correlations on data points when each data point is a sample of performance.  If those samples are small enough, the correlations can be near zero even with a perfect relationship.  What we are really interested in when we do these correlations to determine &quot;relationships&quot;, such as with this issue, is &quot;true talent&quot; versus &quot;true talent.&quot;  The only way to do that with any precision at all is to use data points that are comprised of large samples of performance - like 5 years of range UZR versus 5 years of error UZR, or something like that.  That is often not too practical of course.

So, as Tango often does, I caution everyone about putting too much stock into correlations between sample data points when those samples are small or even medium size, and I caution researchers about doing correlations when trying to determine relationships and/or drawing too many conclusions from the resultant &quot;r&#039;s&quot;.  It is often better to do the poor man&#039;s correlation which is basically aggregating data to form large samples and then doing a one data point to another data point comparison.  In fact, I almost always do that first and then I might do a correlation just to see how consistent the relationship is among all the players (or whatever it is underlying each data point), again, with the caveat being you are never going to get a large correlation no matter how strong the relationship when the data points are comprised of small samples of performance.

Another nice trick is to compute the maximum correlation coefficient for whatever your sample sizes are for comparison purposes, assuming a perfect relationship.  Tango does this when evaluating projection systems.  For example, let&#039;s say that you want to evaluate a BA projection system or you want to analyze the relationship between one set of BA and another set.  And let&#039;s say that the independent variable was BA in 300 AB or the two sets of BA (the x and y variables) were comprised of around 300 AB.  What is a &quot;good&quot; correlation?  We don&#039;t know unless we at least know what the &quot;perfect&quot; correlation would be. IOW, if every player had the same true BA in each data pair, but each variable was only 300 AB, what would the correlation coefficient be?  I don&#039;t know off the top of my head, but it would probably not be greater than .5 or .6, I would think.  IOW, if batter A was a true .270 hitter in data point x and data point y, and batter B was a true .230 hitter in data point x and data point y, etc., and we sampled all the batters for 300 AB, we would have sample batting averages of, like, .263 for batter A in data point x and .281 in data point y (random binomially distributed BA around .270), and for batter B, it might be .240 and .210.  Or whatever.  Now, we know that the relationships between the x and y variables are perfect and that the correlation is 1 given an infinite number of AB per data point, but in only 300 AB per data point, we are likely to get a correlation of .5 or .6 (or whatever it is).  This is critical information to know if we are comparing sets of BA to determine a relationship.  If our sample sizes are 300 AB and we get a correlation of .4 or .5 we can say that that is very strong.  If our sample sizes were 2000 AB, then a correlation of .4 or .5 is not particularly strong.</description>
		<content:encoded><![CDATA[<p>If you don&#8217;t do a correlation but look at groups of players (what I call a &#8220;poor man&#8217;s correlation&#8221;), you find that the players with the highest range UZR&#8217;s have the highest error UZR&#8217;s (basically the best error rates), and vice versa.</p>
<p>For things like this I don&#8217;t like to do correlations.  Because of the small sample sizes, the correlations will be low even if there is a near-perfect relationship.</p>
<p>For example, let&#8217;s say that we do it by groups and we find that in one year, all of the highest range UZR players combined &#8211; say, an average UZR of +10 per 150 &#8211; have an error UZR of +3 (which is high), and all of the low range UZR players, say, -10 on the average, have an error UZR of -3, I think we would say that that is evidence of a strong relationship.  In other words, if a player has lots of range, he is likely to have very good hands as well, and vice versa.</p>
<p>However, if the samples for each player are small enough, the correlation coefficient could be very small &#8211; even close to zero for small enough samples.</p>
<p>So we really have to be very careful with computing correlations on data points when each data point is a sample of performance.  If those samples are small enough, the correlations can be near zero even with a perfect relationship.  What we are really interested in when we do these correlations to determine &#8220;relationships&#8221;, such as with this issue, is &#8220;true talent&#8221; versus &#8220;true talent.&#8221;  The only way to do that with any precision at all is to use data points that are comprised of large samples of performance &#8211; like 5 years of range UZR versus 5 years of error UZR, or something like that.  That is often not too practical of course.</p>
<p>So, as Tango often does, I caution everyone about putting too much stock into correlations between sample data points when those samples are small or even medium size, and I caution researchers about doing correlations when trying to determine relationships and/or drawing too many conclusions from the resultant &#8220;r&#8217;s&#8221;.  It is often better to do the poor man&#8217;s correlation which is basically aggregating data to form large samples and then doing a one data point to another data point comparison.  In fact, I almost always do that first and then I might do a correlation just to see how consistent the relationship is among all the players (or whatever it is underlying each data point), again, with the caveat being you are never going to get a large correlation no matter how strong the relationship when the data points are comprised of small samples of performance.</p>
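<p>A minimal sketch of the &#8220;poor man&#8217;s correlation&#8221; described above, for anyone who wants to try it. The function name, the two-group split, and the sample numbers are all hypothetical and only illustrate the grouping step; they are not real UZR data.</p>

```python
def poor_mans_correlation(pairs, n_groups=2):
    """Sort (x, y) pairs by x, split into groups, return (mean_x, mean_y) per group."""
    ordered = sorted(pairs, key=lambda pair: pair[0])
    size = len(ordered) // n_groups
    if size == 0:
        raise ValueError("need at least one player per group")
    groups = []
    for g in range(n_groups):
        start = g * size
        # The last group absorbs any leftover players.
        stop = len(ordered) if g == n_groups - 1 else start + size
        chunk = ordered[start:stop]
        mean_x = sum(x for x, _ in chunk) / len(chunk)
        mean_y = sum(y for _, y in chunk) / len(chunk)
        groups.append((mean_x, mean_y))
    return groups


if __name__ == "__main__":
    # Made-up (range UZR, error UZR) pairs, purely for illustration.
    players = [(-12, -4), (-8, -2), (-3, 1), (2, -1), (9, 2), (11, 4)]
    for mean_range, mean_error in poor_mans_correlation(players):
        print(round(mean_range, 1), round(mean_error, 1))
```

<p>Comparing the group averages, rather than the individual noisy pairs, is the whole point: each group is a much larger sample than any single player.</p>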
<p>Another nice trick is to compute the maximum correlation coefficient for whatever your sample sizes are for comparison purposes, assuming a perfect relationship.  Tango does this when evaluating projection systems.  For example, let&#8217;s say that you want to evaluate a BA projection system or you want to analyze the relationship between one set of BA and another set.  And let&#8217;s say that the independent variable was BA in 300 AB or the two sets of BA (the x and y variables) were comprised of around 300 AB.  What is a &#8220;good&#8221; correlation?  We don&#8217;t know unless we at least know what the &#8220;perfect&#8221; correlation would be. IOW, if every player had the same true BA in each data pair, but each variable was only 300 AB, what would the correlation coefficient be?  I don&#8217;t know off the top of my head, but it would probably not be greater than .5 or .6, I would think.  IOW, if batter A was a true .270 hitter in data point x and data point y, and batter B was a true .230 hitter in data point x and data point y, etc., and we sampled all the batters for 300 AB, we would have sample batting averages of, like, .263 for batter A in data point x and .281 in data point y (random binomially distributed BA around .270), and for batter B, it might be .240 and .210.  Or whatever.  Now, we know that the relationships between the x and y variables are perfect and that the correlation is 1 given an infinite number of AB per data point, but in only 300 AB per data point, we are likely to get a correlation of .5 or .6 (or whatever it is).  This is critical information to know if we are comparing sets of BA to determine a relationship.  If our sample sizes are 300 AB and we get a correlation of .4 or .5 we can say that that is very strong.  If our sample sizes were 2000 AB, then a correlation of .4 or .5 is not particularly strong.</p>
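<p>The &#8220;maximum correlation&#8221; idea can be checked with a quick simulation. This is only a rough sketch: the spread of true talent (mean .265, SD .015), the number of players, and all function names are assumptions made for illustration, not figures from the discussion above. Every simulated player has the same true BA in both samples, so any correlation short of 1 comes purely from binomial sampling noise. The closed-form comparison is the standard ratio of talent variance to talent variance plus noise variance.</p>

```python
import math
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def sample_ba(rng, true_ba, at_bats):
    """One observed BA: each AB is a hit with probability true_ba."""
    hits = sum(rng.choices((1, 0), weights=(true_ba, 1 - true_ba), k=at_bats))
    return hits / at_bats


def simulated_ceiling(at_bats, n_players=1000, talent_mean=0.265,
                      talent_sd=0.015, seed=1):
    """Correlation between two samples that share the same true BA."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n_players):
        # Assumed talent distribution; clamp to keep it a valid probability.
        true_ba = min(max(rng.gauss(talent_mean, talent_sd), 0.0), 1.0)
        xs.append(sample_ba(rng, true_ba, at_bats))
        ys.append(sample_ba(rng, true_ba, at_bats))
    return pearson_r(xs, ys)


def theoretical_ceiling(at_bats, talent_mean=0.265, talent_sd=0.015):
    """Talent variance over (talent variance + binomial noise variance)."""
    talent_var = talent_sd ** 2
    noise_var = talent_mean * (1 - talent_mean) / at_bats
    return talent_var / (talent_var + noise_var)


if __name__ == "__main__":
    for ab in (300, 2000):
        print(ab, round(simulated_ceiling(ab), 2), round(theoretical_ceiling(ab), 2))
```

<p>Under these made-up talent assumptions the ceiling rises sharply with the number of AB, and a wider assumed spread of true talent raises it at any sample size, so the specific numbers should not be read as estimates for real hitters.</p>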
]]></content:encoded>
	</item>
	<item>
		<title>By: Bryan</title>
		<link>http://www.fangraphs.com/blogs/index.php/range-and-errors/#comment-64215</link>
		<dc:creator>Bryan</dc:creator>
		<pubDate>Fri, 27 Feb 2009 23:07:38 +0000</pubDate>
		<guid isPermaLink="false">http://www.fangraphs.com/blogs/?p=3287#comment-64215</guid>
		<description>Players with a high number of errors will also probably have low range because they are horrible defenders.  This is a hard thing to test by correlation I think because you will probably have two groups of players that have bad range and either good or bad error totals which would destroy your ability to see any effects.</description>
		<content:encoded><![CDATA[<p>Players with a high number of errors will also probably have low range because they are horrible defenders.  This is a hard thing to test by correlation I think because you will probably have two groups of players that have bad range and either good or bad error totals which would destroy your ability to see any effects.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: MGL</title>
		<link>http://www.fangraphs.com/blogs/index.php/range-and-errors/#comment-64199</link>
		<dc:creator>MGL</dc:creator>
		<pubDate>Fri, 27 Feb 2009 21:17:43 +0000</pubDate>
		<guid isPermaLink="false">http://www.fangraphs.com/blogs/?p=3287#comment-64199</guid>
		<description>Eric if you found a positive correlation (.10 or .15, or whatever) and you kept the appropriate signs, then that would indicate that players who had MORE range (a plus range UZR) had FEWER errors (also a plus errors UZR).  So assuming that you kept the signs (plus or minus) intact when you ran the regressions, then even though you found a weak correlation, you did in fact find that, so some small degree, more range equals fewer errors, exactly the opposite of conventional wisdom.  Which is exactly what I found.

As I said in my Book post, I suspect two things are driving the correlation (albeit a weak one):  One, players with better range are probably better athletes, defensive-wise, so they may tend to make fewer errors.  Two, a player who is a better athlete with better range may influence the scorers (in their favor) just a tad.  IOW, who are you more likely to give an error to on a tough play - Pat Burrell or Carl Crawford?

The idea that since a good range fielder fields 3-5% more balls than an average fielder, he makes more errors is not such a tenable argument.  In fact, you could make an argument that the balls that the good fielder is fielding are balls that the bad fielder is making errors on.</description>
		<content:encoded><![CDATA[<p>Eric if you found a positive correlation (.10 or .15, or whatever) and you kept the appropriate signs, then that would indicate that players who had MORE range (a plus range UZR) had FEWER errors (also a plus errors UZR).  So assuming that you kept the signs (plus or minus) intact when you ran the regressions, then even though you found a weak correlation, you did in fact find that, so some small degree, more range equals fewer errors, exactly the opposite of conventional wisdom.  Which is exactly what I found.</p>
<p>As I said in my Book post, I suspect two things are driving the correlation (albeit a weak one):  One, players with better range are probably better athletes, defensive-wise, so they may tend to make fewer errors.  Two, a player who is a better athlete with better range may influence the scorers (in their favor) just a tad.  IOW, who are you more likely to give an error to on a tough play &#8211; Pat Burrell or Carl Crawford?</p>
<p>The idea that since a good range fielder fields 3-5% more balls than an average fielder, he makes more errors is not such a tenable argument.  In fact, you could make an argument that the balls that the good fielder is fielding are balls that the bad fielder is making errors on.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Peter Jensen</title>
		<link>http://www.fangraphs.com/blogs/index.php/range-and-errors/#comment-64195</link>
		<dc:creator>Peter Jensen</dc:creator>
		<pubDate>Fri, 27 Feb 2009 20:48:53 +0000</pubDate>
		<guid isPermaLink="false">http://www.fangraphs.com/blogs/?p=3287#comment-64195</guid>
		<description>&lt;i&gt;That logic works in terms of total errors but not on a percentage basis, as ed describes. Because, even as errors increase, the number of balls you get to and catch increases, almost proportionally.&lt;/i&gt;

The logic is bad in both cases.  A scorer is rarely going to give an error on a missed play if most fielders would never have gotten to the ball in the first place.  The only case that I can think of is if an infielder hurries his throw and throws wildly and a runner or the batter is able to advance to another base.</description>
		<content:encoded><![CDATA[<p><i>That logic works in terms of total errors but not on a percentage basis, as ed describes. Because, even as errors increase, the number of balls you get to and catch increases, almost proportionally.</i></p>
<p>The logic is bad in both cases.  A scorer is rarely going to give an error on a missed play if most fielders would never have gotten to the ball in the first place.  The only case that I can think of is if an infielder hurries his throw and throws wildly and a runner or the batter is able to advance to another base.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: LarryinLA</title>
		<link>http://www.fangraphs.com/blogs/index.php/range-and-errors/#comment-64175</link>
		<dc:creator>LarryinLA</dc:creator>
		<pubDate>Fri, 27 Feb 2009 18:30:02 +0000</pubDate>
		<guid isPermaLink="false">http://www.fangraphs.com/blogs/?p=3287#comment-64175</guid>
		<description>This is called fanGRAPHS, after all.  (I kid).</description>
		<content:encoded><![CDATA[<p>This is called fanGRAPHS, after all.  (I kid).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: B</title>
		<link>http://www.fangraphs.com/blogs/index.php/range-and-errors/#comment-64160</link>
		<dc:creator>B</dc:creator>
		<pubDate>Fri, 27 Feb 2009 16:17:44 +0000</pubDate>
		<guid isPermaLink="false">http://www.fangraphs.com/blogs/?p=3287#comment-64160</guid>
		<description>Do you have a scatterplot of the data?</description>
		<content:encoded><![CDATA[<p>Do you have a scatterplot of the data?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: obsessivegiantscompulsive</title>
		<link>http://www.fangraphs.com/blogs/index.php/range-and-errors/#comment-64151</link>
		<dc:creator>obsessivegiantscompulsive</dc:creator>
		<pubDate>Fri, 27 Feb 2009 15:23:12 +0000</pubDate>
		<guid isPermaLink="false">http://www.fangraphs.com/blogs/?p=3287#comment-64151</guid>
		<description>That logic works in terms of total errors but not on a percentage basis, as ed describes.  Because, even as errors increase, the number of balls you get to and catch increases, almost proportionally.

There should be some increase because obviously, a ball hit directly to a fielder (relatively) should be easier to field.  Going beyond the normal range puts another degree of difficulty on fielding and thus should lead to more errors on a percentage basis.  

More range leading to fewer errors on a percentage basis also makes more sense because he can get to balls faster and position himself to safely make the catch, whereas the poor range fielder will find himself out of position more often.

Of course, the above is based on the great info in this post and not doable until the data leads the way.</description>
		<content:encoded><![CDATA[<p>That logic works in terms of total errors but not on a percentage basis, as ed describes.  Because, even as errors increase, the number of balls you get to and catch increases, almost proportionally.</p>
<p>There should be some increase because obviously, a ball hit directly to a fielder (relatively) should be easier to field.  Going beyond the normal range puts another degree of difficulty on fielding and thus should lead to more errors on a percentage basis.  </p>
<p>More range leading to fewer errors on a percentage basis also makes more sense because he can get to balls faster and position himself to safely make the catch, whereas the poor range fielder will find himself out of position more often.</p>
<p>Of course, the above is based on the great info in this post and not doable until the data leads the way.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

