<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd"
xmlns:rawvoice="http://www.rawvoice.com/rawvoiceRssModule/"
	>
<channel>
	<title>Comments on: R-Squared Fun with BB% and SO%</title>
	<atom:link href="http://www.fangraphs.com/blogs/index.php/r-squared-fun-with-bb-and-so/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.fangraphs.com/blogs/index.php/r-squared-fun-with-bb-and-so/</link>
	<description>Daily baseball statistical analysis and commentary</description>
	<lastBuildDate>Sun, 12 Feb 2012 21:51:40 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
	<item>
		<title>By: Sam</title>
		<link>http://www.fangraphs.com/blogs/index.php/r-squared-fun-with-bb-and-so/#comment-108666</link>
		<dc:creator>Sam</dc:creator>
		<pubDate>Wed, 18 Nov 2009 15:38:58 +0000</pubDate>
		<guid isPermaLink="false">http://www.fangraphs.com/blogs/?p=11689#comment-108666</guid>
		<description>Also, here is how I would interpret these results: the average pitcher&#039;s correlation between last year&#039;s BB rate and this year&#039;s BB rate is 0.8013, without controlling for other confounding factors. And a one-year cross-section correlation doesn&#039;t tell you much: if you have a time-series of individual pitchers (at least two observations for each pitcher, over a few years), those should be used. And in fact, after a certain age, if you expect them (BB or K rate) to stabilize, that should be the sample to give you your best measure of the degree of persistence (or the skill inherent).</description>
		<content:encoded><![CDATA[<p>Also, here is how I would interpret these results: the average pitcher&#8217;s correlation between last year&#8217;s BB rate and this year&#8217;s BB rate is 0.8013, without controlling for other confounding factors. And a one-year cross-section correlation doesn&#8217;t tell you much: if you have a time-series of individual pitchers (at least two observations for each pitcher, over a few years), those should be used. And in fact, after a certain age, if you expect them (BB or K rate) to stabilize, that should be the sample to give you your best measure of the degree of persistence (or the skill inherent).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Sam</title>
		<link>http://www.fangraphs.com/blogs/index.php/r-squared-fun-with-bb-and-so/#comment-108663</link>
		<dc:creator>Sam</dc:creator>
		<pubDate>Wed, 18 Nov 2009 15:27:41 +0000</pubDate>
		<guid isPermaLink="false">http://www.fangraphs.com/blogs/?p=11689#comment-108663</guid>
		<description>R. J.,

Can you include the standard error of the coefficients in the regression when you report them (or p-values, depending on your taste)? These numbers have no meaning without the measure of uncertainty around them.

Also, some other controls should probably be included. The improvement in walk rate or K rate could also be a function of tenure in the league/age. Those factors need to be controlled for.

Finally, as MGL states at length, R^2 is about the last thing that I will trumpet to declare the usefulness of a regression result.</description>
		<content:encoded><![CDATA[<p>R. J.,</p>
<p>Can you include the standard error of the coefficients in the regression when you report them (or p-values, depending on your taste)? These numbers have no meaning without the measure of uncertainty around them.</p>
<p>Also, some other controls should probably be included. The improvement in walk rate or K rate could also be a function of tenure in the league/age. Those factors need to be controlled for.</p>
<p>Finally, as MGL states at length, R^2 is about the last thing that I will trumpet to declare the usefulness of a regression result.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: intricatenick</title>
		<link>http://www.fangraphs.com/blogs/index.php/r-squared-fun-with-bb-and-so/#comment-108662</link>
		<dc:creator>intricatenick</dc:creator>
		<pubDate>Wed, 18 Nov 2009 15:27:33 +0000</pubDate>
		<guid isPermaLink="false">http://www.fangraphs.com/blogs/?p=11689#comment-108662</guid>
		<description>Is the data normal? If you look at the X axis and the Y axis by themselves it looks like you have a little bit too much data on the ends and the tails are a little fatter than a strict normal distribution. The further you get away from that idealized picture the less the R^2 will tell you about what you are measuring since the tails will have more influence on the metric since it relies on the squared distance of the sample points from the generated regression line. R^2 is actually very rarely useful - but it is simple to explain and therefore gets used a lot.</description>
		<content:encoded><![CDATA[<p>Is the data normal? If you look at the X axis and the Y axis by themselves it looks like you have a little bit too much data on the ends and the tails are a little fatter than a strict normal distribution. The further you get away from that idealized picture the less the R^2 will tell you about what you are measuring since the tails will have more influence on the metric since it relies on the squared distance of the sample points from the generated regression line. R^2 is actually very rarely useful &#8211; but it is simple to explain and therefore gets used a lot.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Sandy</title>
		<link>http://www.fangraphs.com/blogs/index.php/r-squared-fun-with-bb-and-so/#comment-108635</link>
		<dc:creator>Sandy</dc:creator>
		<pubDate>Wed, 18 Nov 2009 12:54:40 +0000</pubDate>
		<guid isPermaLink="false">http://www.fangraphs.com/blogs/?p=11689#comment-108635</guid>
		<description>Question.  The chart says BB% and SO% ... but doesn&#039;t clearly identify the denominator.  Is it innings -- *OR* is it batters faced?  For pitchers with excellent (or awful) defense, this could make a difference.  I know that BB/9 and K/9 are the most commonly displayed percentages ... but most of the top notch metrics for pitchers remember to use batters faced instead to avoid defensive quality from skewing results in individual cases.</description>
		<content:encoded><![CDATA[<p>Question.  The chart says BB% and SO% &#8230; but doesn&#8217;t clearly identify the denominator.  Is it innings &#8212; *OR* is it batters faced?  For pitchers with excellent (or awful) defense, this could make a difference.  I know that BB/9 and K/9 are the most commonly displayed percentages &#8230; but most of the top notch metrics for pitchers remember to use batters faced instead to avoid defensive quality from skewing results in individual cases.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: MGL</title>
		<link>http://www.fangraphs.com/blogs/index.php/r-squared-fun-with-bb-and-so/#comment-108628</link>
		<dc:creator>MGL</dc:creator>
		<pubDate>Wed, 18 Nov 2009 09:02:59 +0000</pubDate>
		<guid isPermaLink="false">http://www.fangraphs.com/blogs/?p=11689#comment-108628</guid>
		<description>The magnitude of an r or r^2 does NOT tell us whether something &quot;is a skill&quot; or not or even &quot;how much of a skill&quot; it is.  In fact, the terms &quot;is a skill&quot; or &quot;how much of a skill&quot; are almost meaningless with respect to an r or r^2.

The magnitude of an r or r^2 in this kind of a regression is a function of two things and two things only.  One, the spread of &quot;skill&quot; (the true rate for each player) in the sample of players (actually the population from which the sample is drawn) and two, the sample size of each variable (dependent and independent).

In this case, he is regressing one year on another year and the sample size of each year is around 150 innings (&gt;100 IP).

To illustrate how the magnitude of an &quot;r&quot; has nothing to do with the &quot;level of skill&quot; that a metric reflects, consider that if he had done the same regression, but regressed 5-year samples on other 5-year samples for each player, the r would be much higher, in the .8 range or so.  Would we then say that a pitcher&#039;s K or BB rate is very much a skill now?  And what if he did the regression using months rather than seasons?  The &quot;r&quot; would be in the .2 or .3 range.  Did the skill associated with BB and K rates suddenly diminish?

You might say, &quot;Yeah, but with smaller samples, there is less skill and more luck,&quot; and you would be right. But that still is not necessarily reflected in the &quot;r&quot;.
Imagine that we took all pitchers with around the same BB or K rates, and put them in their own league.  BB and K rates would still very much be a skill, wouldn&#039;t they?  Do you know what the &quot;r&#039;s&quot; would be if we ran regressions using players from this league?  Zero!  Regardless of the sample size - year to year, month to month, 5 years to 5 years. All zero!  So now we have the same &quot;skill&quot; - BB and K rate - and large samples, but somehow the &quot;r&quot; is zero. Does that mean that BB and K rates have NO skill involved?  No.  As I said, an &quot;r&quot; in this kind of analysis is a reflection of the skill/noise ratio AND the spread of that skill among the population of players from which the sample is drawn.

So let me repeat.  All the &quot;r&quot; tells us is some combination of the sample size in each of the variables and the spread of true talent in the population with respect to that attribute being measured.  And without further testing, we have no idea which is stronger - the sample size effect or the spread of true talent.  Even with little spread in true talent, if we have large enough sample sizes, the &quot;r&quot; will approach 1.  And even with a large spread of true talent, if the sample sizes are small enough, the &quot;r&quot; will approach zero.  In addition, a small sample for one attribute may be a large sample for another.  That depends on the &quot;noise&quot; (sample and measurement error) inherent in measuring that attribute.

For example, in measuring a player&#039;s true speed, there is little inherent sample and measurement error and you can get a pretty good idea of a player&#039;s true speed with just a few measurements.  To measure a player&#039;s true batting skill by recording their hits and outs, you obviously need several thousand opportunities to get a similar reliability.  Does that mean that speed is &quot;more of a skill&quot; than hitting ability?

And to reiterate one of my earlier points, even though it is easy to reliably measure a player&#039;s true speed, if all players in a sample/population had the same speed, any regression from one time period to another would yield a zero &quot;r&quot;.  Would that mean that speed is not a skill? But if there were a spread of speed (some players are faster than others) in that sample/population, even a small one, a regression where the sample sizes were small would yield an &quot;r&quot; close to 1.</description>
		<content:encoded><![CDATA[<p>The magnitude of an r or r^2 does NOT tell us whether something &#8220;is a skill&#8221; or not or even &#8220;how much of a skill&#8221; it is.  In fact, the terms &#8220;is a skill&#8221; or &#8220;how much of a skill&#8221; are almost meaningless with respect to an r or r^2.</p>
<p>The magnitude of an r or r^2 in this kind of a regression is a function of two things and two things only.  One, the spread of &#8220;skill&#8221; (the true rate for each player) in the sample of players (actually the population from which the sample is drawn) and two, the sample size of each variable (dependent and independent).</p>
<p>In this case, he is regressing one year on another year and the sample size of each year is around 150 innings (&gt;100 IP).</p>
<p>To illustrate how the magnitude of an &#8220;r&#8221; has nothing to do with the &#8220;level of skill&#8221; that a metric reflects, consider that if he had done the same regression, but regressed 5-year samples on other 5-year samples for each player, the r would be much higher, in the .8 range or so.  Would we then say that a pitcher&#8217;s K or BB rate is very much a skill now?  And what if he did the regression using months rather than seasons?  The &#8220;r&#8221; would be in the .2 or .3 range.  Did the skill associated with BB and K rates suddenly diminish?</p>
<p>You might say, &#8220;Yeah, but with smaller samples, there is less skill and more luck,&#8221; and you would be right. But that still is not necessarily reflected in the &#8220;r&#8221;.<br />
Imagine that we took all pitchers with around the same BB or K rates, and put them in their own league.  BB and K rates would still very much be a skill, wouldn&#8217;t they?  Do you know what the &#8220;r&#8217;s&#8221; would be if we ran regressions using players from this league?  Zero!  Regardless of the sample size &#8211; year to year, month to month, 5 years to 5 years. All zero!  So now we have the same &#8220;skill&#8221; &#8211; BB and K rate &#8211; and large samples, but somehow the &#8220;r&#8221; is zero. Does that mean that BB and K rates have NO skill involved?  No.  As I said, an &#8220;r&#8221; in this kind of analysis is a reflection of the skill/noise ratio AND the spread of that skill among the population of players from which the sample is drawn.</p>
<p>So let me repeat.  All the &#8220;r&#8221; tells us is some combination of the sample size in each of the variables and the spread of true talent in the population with respect to that attribute being measured.  And without further testing, we have no idea which is stronger &#8211; the sample size effect or the spread of true talent.  Even with little spread in true talent, if we have large enough sample sizes, the &#8220;r&#8221; will approach 1.  And even with a large spread of true talent, if the sample sizes are small enough, the &#8220;r&#8221; will approach zero.  In addition, a small sample for one attribute may be a large sample for another.  That depends on the &#8220;noise&#8221; (sample and measurement error) inherent in measuring that attribute.</p>
<p>For example, in measuring a player&#8217;s true speed, there is little inherent sample and measurement error and you can get a pretty good idea of a player&#8217;s true speed with just a few measurements.  To measure a player&#8217;s true batting skill by recording their hits and outs, you obviously need several thousand opportunities to get a similar reliability.  Does that mean that speed is &#8220;more of a skill&#8221; than hitting ability?</p>
<p>And to reiterate one of my earlier points, even though it is easy to reliably measure a player&#8217;s true speed, if all players in a sample/population had the same speed, any regression from one time period to another would yield a zero &#8220;r&#8221;.  Would that mean that speed is not a skill? But if there were a spread of speed (some players are faster than others) in that sample/population, even a small one, a regression where the sample sizes were small would yield an &#8220;r&#8221; close to 1.</p>
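<p>As a rough illustration of the argument above (a minimal sketch, not the method behind the chart in the article), the following Python simulation gives each hypothetical pitcher a fixed true walk rate and draws two independent samples of batters faced. The league mean of 0.08, the talent spreads, the sample sizes, and the names <code>observed_rate</code> and <code>period_to_period_r</code> are all assumptions chosen for illustration.</p>

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def observed_rate(p, n, rng):
    """Observed walk rate over n batters faced, given a true rate p."""
    walks = sum(1 for _ in range(n) if p > rng.random())
    return walks / n


def period_to_period_r(talent_sd, n, pitchers=300, mean=0.08, seed=1):
    """Correlation between two independent n-batter samples per pitcher.

    talent_sd is the assumed spread of true rates in the population.
    """
    rng = random.Random(seed)
    first, second = [], []
    for _ in range(pitchers):
        # True rate is fixed per pitcher; clamp to a plausible range.
        p = min(max(rng.gauss(mean, talent_sd), 0.01), 0.30)
        first.append(observed_rate(p, n, rng))
        second.append(observed_rate(p, n, rng))
    return pearson(first, second)


if __name__ == "__main__":
    for talent_sd in (0.0, 0.02):
        for n in (100, 600, 2000):
            r = period_to_period_r(talent_sd, n)
            print(f"talent_sd={talent_sd:.2f}  n={n:4d}  r={r:.2f}")
```

<p>Under these assumptions every outcome is driven by the same kind of true rate in every run, yet the printed r should climb as the sample size grows and sit near zero when the talent spread is set to zero.</p>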
]]></content:encoded>
	</item>
	<item>
		<title>By: vivaelpujols</title>
		<link>http://www.fangraphs.com/blogs/index.php/r-squared-fun-with-bb-and-so/#comment-108602</link>
		<dc:creator>vivaelpujols</dc:creator>
		<pubDate>Wed, 18 Nov 2009 04:47:36 +0000</pubDate>
		<guid isPermaLink="false">http://www.fangraphs.com/blogs/?p=11689#comment-108602</guid>
		<description>Right.  Even if it was 100% skill, you would expect a lowish r^2 due to random variation (actually, that could be an interesting study.  You could use the binomial distribution to find the expected year to year correlation of a stat that&#039;s 100% skill... but I digress).  

So yeah, K and BB rates are going to be skill for the most part (however, I did find that Pineiro was probably getting a bit lucky with his walks this year), and when you see a massive drop or rise, it&#039;s most likely a combination of skill and luck, but much more skill than luck... if that makes sense.</description>
		<content:encoded><![CDATA[<p>Right.  Even if it was 100% skill, you would expect a lowish r^2 due to random variation (actually, that could be an interesting study.  You could use the binomial distribution to find the expected year to year correlation of a stat that&#8217;s 100% skill&#8230; but I digress).  </p>
<p>So yeah, K and BB rates are going to be skill for the most part (however, I did find that Pineiro was probably getting a bit lucky with his walks this year), and when you see a massive drop or rise, it&#8217;s most likely a combination of skill and luck, but much more skill than luck&#8230; if that makes sense.</p>
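<p>The binomial digression above can be made concrete with a short calculation. If each observed rate is a true rate plus independent binomial noise over n batters faced, the expected correlation between two equal-sized periods works out to Var(true) / (Var(true) + E[p(1-p)]/n). A minimal sketch follows; the function name <code>expected_r</code> and the example numbers (mean rate 0.08, talent spread 0.02) are hypothetical, not taken from the article.</p>

```python
def expected_r(mean_rate, talent_sd, n):
    """Expected period-to-period correlation under pure binomial noise.

    mean_rate: assumed population mean of the true rates
    talent_sd: assumed standard deviation of the true rates
    n:         batters faced in each of the two periods
    """
    talent_var = talent_sd ** 2
    # E[p(1-p)] = mean(1-mean) - Var(p), so this is the average
    # binomial sampling variance of an observed rate over n trials.
    noise_var = (mean_rate * (1 - mean_rate) - talent_var) / n
    return talent_var / (talent_var + noise_var)


if __name__ == "__main__":
    for n in (100, 600, 3000):
        print(n, round(expected_r(0.08, 0.02, n), 3))
```

<p>Even for a stat that is entirely skill in this sense, the expected r stays below 1 at any finite sample size, and it drops to zero if the talent spread is zero.</p>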
]]></content:encoded>
	</item>
	<item>
		<title>By: joker</title>
		<link>http://www.fangraphs.com/blogs/index.php/r-squared-fun-with-bb-and-so/#comment-108600</link>
		<dc:creator>joker</dc:creator>
		<pubDate>Wed, 18 Nov 2009 04:36:11 +0000</pubDate>
		<guid isPermaLink="false">http://www.fangraphs.com/blogs/?p=11689#comment-108600</guid>
		<description>They are significant, I&#039;ve seen the (your?) stuff on it, but dare I venture to say Verlander or Lester&#039;s skyrocketing rates have a lot more to do with their FB jumping by 2.0 and 1.5 MPH respectively---or in reverse, Kazmir losing velo/Joba going from relieving to starting.  That&#039;s variation in true skill.

I guess the point stands that &quot;it&#039;s not 100% skill&quot;, but that&#039;s kind of a pointless statement then as no statistic is measuring 100% skill.</description>
		<content:encoded><![CDATA[<p>They are significant, I&#8217;ve seen the (your?) stuff on it, but dare I venture to say Verlander or Lester&#8217;s skyrocketing rates have a lot more to do with their FB jumping by 2.0 and 1.5 MPH respectively&#8212;or in reverse, Kazmir losing velo/Joba going from relieving to starting.  That&#8217;s variation in true skill.</p>
<p>I guess the point stands that &#8220;it&#8217;s not 100% skill&#8221;, but that&#8217;s kind of a pointless statement then as no statistic is measuring 100% skill.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: vivaelpujols</title>
		<link>http://www.fangraphs.com/blogs/index.php/r-squared-fun-with-bb-and-so/#comment-108589</link>
		<dc:creator>vivaelpujols</dc:creator>
		<pubDate>Wed, 18 Nov 2009 03:58:58 +0000</pubDate>
		<guid isPermaLink="false">http://www.fangraphs.com/blogs/?p=11689#comment-108589</guid>
		<description>&quot;There isn’t a lot of luck involved in a BB or K outside of umpire calls and quality of competition.&quot;

The problem is that those two luck variables are a lot more relevant than you might think.</description>
		<content:encoded><![CDATA[<p>&#8220;There isn’t a lot of luck involved in a BB or K outside of umpire calls and quality of competition.&#8221;</p>
<p>The problem is that those two luck variables are a lot more relevant than you might think.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: joker</title>
		<link>http://www.fangraphs.com/blogs/index.php/r-squared-fun-with-bb-and-so/#comment-108585</link>
		<dc:creator>joker</dc:creator>
		<pubDate>Wed, 18 Nov 2009 03:44:13 +0000</pubDate>
		<guid isPermaLink="false">http://www.fangraphs.com/blogs/?p=11689#comment-108585</guid>
		<description>Whoa now: &quot;Minimal difference. Both are skills, albeit not 100% so.&quot;

Just because something isn&#039;t correlated r^2=1 doesn&#039;t mean it&#039;s not almost entirely a skill.  Randomness does not imply---for lack of a better term---luck.  There isn&#039;t a lot of luck involved in a BB or K outside of umpire calls and quality of competition.  The variation is almost entirely variation in pitchers&#039; actual skill levels year to year, e.g. new pitches, arm strength/health, mechanics changes, and pitching strategy changes.</description>
		<content:encoded><![CDATA[<p>Whoa now: &#8220;Minimal difference. Both are skills, albeit not 100% so.&#8221;</p>
<p>Just because something isn&#8217;t correlated r^2=1 doesn&#8217;t mean it&#8217;s not almost entirely a skill.  Randomness does not imply&#8212;for lack of a better term&#8212;luck.  There isn&#8217;t a lot of luck involved in a BB or K outside of umpire calls and quality of competition.  The variation is almost entirely variation in pitchers&#8217; actual skill levels year to year, e.g. new pitches, arm strength/health, mechanics changes, and pitching strategy changes.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: LeeTro</title>
		<link>http://www.fangraphs.com/blogs/index.php/r-squared-fun-with-bb-and-so/#comment-108581</link>
		<dc:creator>LeeTro</dc:creator>
		<pubDate>Wed, 18 Nov 2009 03:12:16 +0000</pubDate>
		<guid isPermaLink="false">http://www.fangraphs.com/blogs/?p=11689#comment-108581</guid>
		<description>I just figured out that the regression line actually shows regression towards the mean. (who woulda thunk?)  I forgot to factor in the intercept in my initial analysis.

Unrelated, I wonder if league changes really affected the results that much.  Other than Blanton, who already had a third of a season with PHI last year, there are no other league-changing pitchers up there, one way or another.</description>
		<content:encoded><![CDATA[<p>I just figured out that the regression line actually shows regression towards the mean. (who woulda thunk?)  I forgot to factor in the intercept in my initial analysis.</p>
<p>Unrelated, I wonder if league changes really affected the results that much.  Other than Blanton, who already had a third of a season with PHI last year, there are no other league-changing pitchers up there, one way or another.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

