Can we objectively evaluate advanced fielding data?

Colin has an article at Baseball Prospectus today that is required Sabermetrics 101 reading. It’s been linked already in the THTLinks Twitter feed this morning, but it deserves more prominence and more discussion. This is an article of fundamental importance for the baseball analysis community. Anyone who is evaluating fielding, which is almost everyone in these heady days of Wins Above Replacement (WAR) statistics, needs to read and understand what Colin is saying.

He lays out an approach that the analytical community should have taken before advanced defensive metrics started to gain such widespread acceptance. How did we ever come to accept such statistics without ever objectively testing them?

Now that they are being tested objectively, it should not surprise us that problems are being found. That does not invalidate the metrics. It is the path to knowledge. We are, after all, on a search for objective knowledge about baseball, are we not? Openly and objectively testing defensive metrics is not the quest of those who want to destroy baseball knowledge, as some will tell you when this topic is broached. It is a path well-worn by sabermetric pioneers, though “small is the gate and narrow the road that leads to life, and only a few find it.”

We want to know what we know and why we know it, when to trust it and when not to. We want to know what the sources and ranges of the errors are. This way lies improved fielding metrics and the ability to silence critics with facts that can be demonstrated in a way that is convincing, not one that demands blind faith that sabermetricians know what they are doing.

This also does not mean that we should stop using advanced fielding data today or that it has zero utility in objective sabermetrics. First and foremost, this is a clarion call to the community to turn its research efforts toward cracking this problem. Secondly, it’s a wake-up call to understand and quantify the uncertainty in our measurements related to fielding, and to the derivative statistics like WAR, not a call to abandon them altogether.

Scientific inquiry has always operated in an environment of measurements made with uncertainty. This has led scientists to devote great effort to estimating the bounds of that uncertainty in order to determine their confidence in their measurements, and thus their confidence in the conclusions based on those measurements. There is no need to abandon the “science” of fielding measurement. Far from it. There is a need for the application of the time-tested sabermetric approach.

Doubt is not something to be feared. When its source is based on facts, doubt is healthy. Colin’s doubt, which I share, is healthy. Let’s take this opportunity as an analytical community and turn doubt into growth.


TOLAXOR

“the subjectivist (i.e. Bayesian) states his judgements, whereas the objectivist sweeps them under the carpet by calling assumptions knowledge, and he basks in the glorious objectivity of science”

– I. J. Good

Mike Fast

One correction: My sentence, “How did we ever come to accept such statistics without ever objectively testing them?” would be more accurately stated as “How did we ever come to accept such statistics without thoroughly testing them?”

There has been some objective testing of fielding metrics before, and I don’t mean to diminish that work.  It’s just that I believe we need to do a much more thorough job of testing than we have done.

MikeS
Finally! Too often people take stats at face value and don’t test them. Too many times I’ve commented along the lines of “Bob is a bad fielder” and someone will say “But UZR says he’s good and besides, fielding has less effect on WAR than hitting.” All those things are true but they do not necessarily mean that A) Bob is a good fielder or B) fielding is less important than hitting. What those things really mean is A) our best fielding metric, which may not be any good, disagrees with you and B) our model of player value does…
Colin Wyers

Very nice post, Mike. And thanks for the kind words.

MikeS – Validating WAR (whichever version of it you want to name) takes more than just measuring agreement with team win totals; showing that WAR(P) agrees with team wins is a necessary but not sufficient condition. You also need to show that you’re correctly distributing those team wins to individual players.

It’s actually possible that once you get past a certain level of agreement at the team level for an “uberstat,” a metric that’s less predictive of team wins may be preferable, if it does a better job handling split credit.

MikeS
Thanks Colin. My stats education is about 20 years behind me so I may be wrong, but how’s this line of reasoning sound: If you add up all the individual WAR and get the team’s win total, you may have a stat that works well for evaluating individual ballplayers. If you get a wildly and inconsistently different number, your stat probably does not reflect the individual contributions very accurately. (Incidentally, this is how UZR and other fielding metrics look to me. A player will be great one year and bad the next. Much less consistent than offense or…
John Walsh

I’ve been thinking about these same issues over the last couple of years.  So much so that I posted a “mailbag question” on Tango’s blog asking about how we know we can trust UZR or any other defensive metric.

MGL responded with a suggestion on how to test a defensive metric, but he also suggested that UZR was simple enough that we can just “look at it” and know that it makes sense. I’m paraphrasing; you can see the details here:

http://www.tangotiger.net/wiki/index.php?title=Mailbags#Fielding_metric_-_reliability

MikeS
UZR does not “just make sense.” Anybody who says it does is trying to force it down your throat.

UZR for Paul Konerko since 2002: -3.1, 2.4, -6.5, 2.8, 0.0, 0.4, -0.9, 1.9, -4.0
Sometimes good, sometimes bad. For a first baseman. How does that make sense?

UZR for Alfonso Soriano since 2006 (all LF): 6.7, 31.6, 15.9, -3.1, 4.8
From a little above average to great to below average? Ask any Cub fan if Soriano was ever acceptable as a LF. His best position is the batter’s box.

UZR for Derrek Lee since 2002: -4.8, -1.3, 1.8, -1.4, 1.0, 0.6,…
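To put a rough number on the year-to-year swings described in this comment, here is a minimal sketch in Python. It uses only the Konerko and Soriano figures quoted above (the Lee list is cut off, so it is left out). The `summarize` helper and its choice of summary statistics are illustrative assumptions, not anything UZR itself defines, and a large spread by itself does not show whether the swings are noise or real changes in performance.

```python
from statistics import mean, stdev

# Season UZR values exactly as quoted in the comment above.
KONERKO_1B = [-3.1, 2.4, -6.5, 2.8, 0.0, 0.4, -0.9, 1.9, -4.0]
SORIANO_LF = [6.7, 31.6, 15.9, -3.1, 4.8]


def summarize(values):
    """Return mean, sample standard deviation, and max-min spread.

    Hypothetical helper for illustration only.
    """
    return {
        "mean": mean(values),
        "stdev": stdev(values),            # sample standard deviation (n - 1)
        "spread": max(values) - min(values),
    }


if __name__ == "__main__":
    for name, seasons in [("Konerko", KONERKO_1B), ("Soriano", SORIANO_LF)]:
        s = summarize(seasons)
        print(f"{name}: mean {s['mean']:.1f}, "
              f"stdev {s['stdev']:.1f}, spread {s['spread']:.1f}")
```

Comparing the standard deviation to the mean gives a quick sense of how much a single season's figure can differ from a player's multi-year average.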
Laurent Courtines
I am more of a lurker in these debates and have minimal math knowledge.  I care about baseball and am fascinated by the work done by Hardball Times,  Baseball ThinkFactory, Baseball Pro, Fangraphs and our populist mouthpiece Rob Neyer. I think the biggest step forward will come when the MLB installs the full field in play effects data.  It is my understanding that they are beginning to get these cameras up and will be able to collect data on the balls in play with ball flight paths, player positions and speeds of the ball in flight.  While the data may…
John Walsh

MikeS,

The variation in year-to-year UZR does not necessarily make it incorrect.  It’s possible (actually, I think it’s established) that it takes several years’ worth of data for UZR to stabilize.

I don’t think MGL is trying to shove anything down our throats. I believe he really finds UZR intuitive, and in some sense it is. The underlying method is straightforward. But even though it might be straightforward in concept, there are many details. And in any case, until you verify that something works against real data, you can’t really tell if it’s working or not.

Mark

For 2009, if you take actual wins and subtract hitting WAR and pitching WAR (Fangraphs totals), you get a number anywhere from 34.9 (TB) to 53.0 (Cin) for replacement wins. That’s an 18.1-win difference between the high and low.

Colin Wyers

Mark, there’s also a league difference issue to account for in Fangraphs WAR, as they don’t have pitcher hitting sum to zero for the league.

Mark

AL range: 34.9 to 50.8 (15.9 difference)
NL range: 37.8 to 53.0 (15.2 difference)

AL average 43.66
NL average 48.57

Cle: 29.8 WAR, 65 actual wins
Bal: 22.5 WAR, 64 actual wins

LAA led the AL in wins over WAR, at 50.8
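
The arithmetic behind these figures is simply actual wins minus total WAR, which leaves the wins implicitly credited to a replacement-level team. A minimal sketch in Python, using only the Cle and Bal numbers and the league endpoints quoted in this thread; the function and variable names are made up for illustration:

```python
# (team WAR, actual wins) as quoted above for 2009.
TEAMS = {
    "Cle": (29.8, 65),
    "Bal": (22.5, 64),
}

# Low and high implied replacement wins per league, as quoted above.
LEAGUE_ENDPOINTS = {
    "AL": (34.9, 50.8),
    "NL": (37.8, 53.0),
}


def implied_replacement_wins(war, actual_wins):
    """Wins left over once a team's total WAR is subtracted from its wins."""
    return round(actual_wins - war, 1)


def league_spread(low, high):
    """Gap between the highest and lowest implied replacement level."""
    return round(high - low, 1)


if __name__ == "__main__":
    for team, (war, wins) in TEAMS.items():
        print(team, implied_replacement_wins(war, wins))
    for league, (low, high) in LEAGUE_ENDPOINTS.items():
        print(league, league_spread(low, high))
```

If WAR distributed wins perfectly, every team would land on the same implied replacement level, so the spread is one crude measure of how far the team-level totals miss.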
