With inferential statistics, you are trying to reach conclusions that extend beyond the immediate data alone.
This afternoon, I talked about why defensive statistics are not like offensive statistics, and closed with a statement about why I believe that defensive metrics should be viewed as inferential statistics, rather than the results of something that actually occurred. The definition above states it as well as anything I could write – what we want to do with metrics like the advanced defensive statistics we currently have is to make conclusions based on probability that go beyond the data that we have.
Let’s use a baseball example. The +/- system spit out a +47 rating for Chase Utley for 2008, calling him 47 plays better than an average defensive second baseman last year. It’s such an amazingly high number that, on its own, it’s basically unbelievable. Did Utley really display such amazing defense that he got to 47 more balls than an average fielder? And if so, how did such a remarkable performance go basically unnoticed by baseball observers?
Perhaps your initial reaction to such an unbelievable number would be to throw it out and discredit the system. After all, if I invented a metric that said that Chase Utley hit .434 last season, you’d just point to the facts and tell me I was wrong. But with defensive metrics, one of the basic tenets we have to accept is that we just can’t know for certain whether an average fielder would have actually fielded a particular ball, because this mythical average fielder didn’t have a chance to field that ball – only the fielder that we’re watching got a chance to field that ball. Whether anyone else could have fielded that ball has to be inferred, since it cannot be known.
This is the fundamental point to accepting defensive statistics – they know very little and infer an awful lot.
This doesn’t make them wrong or invalid. There are all kinds of statistics in life that are inferential and, when constructed correctly, give us meaningful information to make our lives better. Political polling data is one of the best examples, and the match between polling data and baseball statistics got quite a bit of play with Nate Silver’s rise to fame this summer. When the data is handled correctly, inferential statistics help us answer questions we can’t figure out through descriptive statistics, and right now, defensive value is one of those things that must be inferred.
So, how do we view these numbers differently than if they were descriptive in nature? The key is to see each one as a single data point within a larger sample and not take any one data point too seriously. +/- thinks Utley was +47 last year. Okay. That’s nice. We’ll toss it into the stew, along with as many other valid data points as we can gather, and determine how confident we can be within certain boundaries based on the sample that we have.
If you’ve taken a college course on statistics, you’ve probably learned about t tests and how to calculate necessary sample sizes based on given data. We won’t go through the math here, but research from guys like Chris Dial, TangoTiger, and MGL suggests that we need at least two years’ worth of data before we can start drawing reasonable conclusions from the defensive data we have now. Two years is a minimum. Three is a lot better, and gets us close to the point where we can be comfortable with the results.
With several years’ worth of data, we can be confident that the sample is large enough that the noise in the data can be reduced to the point where our inferences can be at least generally accurate. Viewed by itself, Utley’s +47 is highly questionable. When viewed in concert with his +20 ratings in both 2006 and 2007, we can infer that Utley is probably something like a +25 defender compared to an average second baseman.
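For anyone who wants to see the arithmetic, here is a rough sketch of one way to get from those three seasons to a number in the neighborhood of +25. It is not how +/- or UZR actually work, and it is not necessarily how I arrived at +25. It simply averages the three ratings and then pulls the result toward league average (zero) by mixing in a fraction of an imaginary average season. The function name and the `phantom_seasons` value of 0.5 are assumptions made up purely for illustration.

```python
def regressed_estimate(ratings, phantom_seasons=0.5, league_average=0.0):
    """Blend observed season ratings with league average.

    phantom_seasons is a hypothetical knob: the number of imaginary
    league-average seasons mixed in to pull the estimate toward average.
    The 0.5 default is an illustrative assumption, not a researched value.
    """
    total = sum(ratings) + phantom_seasons * league_average
    return total / (len(ratings) + phantom_seasons)


# Utley's +/- ratings as quoted in this post.
utley = {2006: 20, 2007: 20, 2008: 47}
ratings = list(utley.values())

raw_mean = sum(ratings) / len(ratings)
estimate = regressed_estimate(ratings)

print(f"raw three-year mean: {raw_mean:+.1f}")   # +29.0
print(f"regressed estimate:  {estimate:+.1f}")   # +24.9
```

The straight average is +29, and the regressed figure lands just under +25. Pick a different amount of regression and the answer moves, which is sort of the point: the estimate depends on assumptions, not just on what we observed.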
The human factor is still there, and we can’t pretend that a larger sample eliminates noise entirely, but we can begin to be confident that we can describe a player’s defensive value within a given range and be fairly accurate. Maybe we can’t prove that Utley’s a +25 fielder, but we could say that the probability of his real defensive value being between +20 and +30 is very high.
When someone tells you that defensive statistics simply aren’t as reliable as their offensive brethren, they’re right – there’s no doubt that the tools we have to measure offense are more precise than the ones to measure defense. But as statistics like UZR and +/- have come along, our ability to infer reasonably accurate conclusions about defensive value has grown immensely. They aren’t perfect, but when we view them as data points and analyze them as inferential statistics, we can gather all kinds of information that we’ve never had before. And that’s exciting.