What Mean Do You Regress Defensive Metrics To?

Jerry Crasnick has an excellent article on defensive metrics as they relate to valuing free agents, especially diving into how they affect Matt Holliday and Jason Bay. It’s no secret that, as the hosts of UZR, we’re big proponents of its usefulness. However, I still agree with essentially everything in Crasnick’s article.

There are aspects of defense that zone-based metrics won’t capture. There are results from UZR that make you scratch your head and say “really?” There is value in having the experienced eyes of a scout watch a player and offer an opinion on the abilities that he saw. We agree with all of that.

The cases where the value of metrics like UZR is most contentious are those where the results diverge significantly from what the perceived scouting wisdom says about a player. Oftentimes, the reaction to counterintuitive data is to dismiss it entirely, offering up the example as evidence that the metric is flawed beyond use. Or, on the other side, to offer up the player’s numbers as proof that scouts just don’t get it, and that subjective opinions are worthless. Simply go back and re-read the threads about Mark Teixeira’s defense over the summer to see this effect in full force on both sides.

In reality though, both positions are wrong. Re-quoting the assistant GM from Crasnick’s piece:

“If there’s some kind of discrepancy, you need to use your best judgment,” the assistant says. “If a scout says, ‘This guy stinks,’ but the numbers say he’s excellent, the truth probably lies somewhere in between.”

This is essentially a paraphrase of the concept of regression to different means. If we have two players with identical UZRs, but scouts love one and abhor the other, our projection for their relative UZRs going forward should favor the one preferred by scouts. The fact that observational information is available gives us a useful data point to add to the calculation, pushing forward analysis that leads to “best judgment”.
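
As a rough illustration of what regressing to different means looks like, here is a minimal Python sketch. The function name, the prior means of +5 and -5 runs, and the weight given to the prior are all hypothetical values chosen for illustration; they do not come from UZR or from any actual projection system.

```python
def regress_to_mean(observed, n_obs, prior_mean, prior_weight):
    """Shrink an observed value toward a prior mean.

    n_obs and prior_weight are in the same units (here, seasons).
    The larger prior_weight is, the more the estimate leans on the prior.
    """
    return (observed * n_obs + prior_mean * prior_weight) / (n_obs + prior_weight)


# Two players with identical measured UZR (hypothetical: 0 runs per season
# over 3 seasons), but with opposite scouting reputations.
observed_uzr = 0.0
seasons = 3
prior_weight = 3  # hypothetical: the prior counts as much as 3 seasons of data

# Assumed prior means: scouts love one player (+5 runs) and abhor the other (-5).
loved = regress_to_mean(observed_uzr, seasons, prior_mean=5.0, prior_weight=prior_weight)
disliked = regress_to_mean(observed_uzr, seasons, prior_mean=-5.0, prior_weight=prior_weight)

print(f"Projection, scouts love him:  {loved:+.1f} runs")
print(f"Projection, scouts abhor him: {disliked:+.1f} runs")
```

Both projections land between the measured number and the scouting-based mean, which is the "somewhere in between" the assistant GM describes. How much weight the scouting prior deserves is the open question, and the sketch does not try to answer it.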

I said last week that I think Teixeira is probably a bit better defensively than his recent UZR scores have indicated, and the foundation of that belief lies in the value of scouting information. Teixeira is revered by almost every scout in the game as an exceptional defensive first baseman. That matters when we’re projecting future defensive performance. There is no reason to ignore those opinions simply because they don’t line up with what UZR has measured. We account for those opinions by regressing Teixeira’s UZR projections to a different mean than we would use for a player whom scouts are less enamored of.

UZR is a tool. Scouts are a tool. They can be used together to produce better information than either can on its own. It is not an either/or proposition. Use both.






Dave is the Managing Editor of FanGraphs.


joser
Guest
joser
6 years 6 months ago

But is there an “asterisk” column in the fielding section on Fangraphs so we know which UZR scores we can accept at face value and which ones we need to adjust? Or will those adjustments be taken into account in some future refinement of UZR?

Because right now if I’m trying to evaluate (say) one player vs another, I can compare their wOBA numbers and be fairly confident that has captured most of their offensive value and, therefore, I know which one is better at the plate. But — without doing further research — I don’t have as much confidence when comparing their UZRs, and since the value calculation here uses UZR as well, I can’t really trust that either. And yeah, we should all do more research and not do two-bit 30-second analysis, but that’s about all the time I have… and even if I have more time, I’m not even sure where I go to get the definitive “scout’s consensus” on any given player, let alone how to translate that into “regressing to a different mean.”

So help a brother out: what should we be doing?

Sam
Guest
Sam
6 years 6 months ago

I can compare their wOBA numbers and be fairly confident that has captured most of their offensive value and, therefore, I know which one is better at the plate.

How confident?

One thing I would like FanGraphs to add to its impressive list of statistics is error bars, or a 95 percent confidence interval. The only writer who consistently speaks of the statistical uncertainty surrounding metrics and incorporates it in his analysis is Dave Allen. I would like to see more authors follow suit.

Xeifrank
Guest
6 years 6 months ago

I think the error bars should only be given with projections. I don’t want to see error bars on accumulated stats. I just want to see what their production was. I believe Zips projections have the error bars if you go straight to the source. I don’t think it’s necessary for Fangraphs to carry them when the reader can find them himself. Fangraphs already does enough of the hard leg work.
vr, Xei

Sam
Guest
Sam
6 years 6 months ago

I just want to see what their production was.

Are these “productions” (like wOBA) measured with certainty, like runs scored? If not, then error bars are absolutely critical. And secondly, it is more important when it comes to defense.

These productions that you talk about, are measured based on the average run value of each event using linear weights. These average run values are random variables, and therefore the linear combinations of these random variables are themselves random variables. Therefore, there should be error bars attached to them.

WY
Guest
WY
6 years 6 months ago

I agree with Sam. I think we’d all just like to see what the player’s production was, but that doesn’t change the fact that many of these stats (such as runs saved on defense, to cite one example) are ultimately estimates. That doesn’t make them wrong, but it does mean that there is going to be some uncertainty surrounding them. That is unavoidable.

I agree with Sam that it would be interesting to get a sense of how large or small the error might be for some of these stats. I also think it would help rein in some of the people who get a little too casual in tossing around WAR values to the nearest decimal point as if those numbers were set in stone. These stats are useful, but they are not monolithic or perfect, nor should we expect them to be.

Sam
Guest
Sam
6 years 6 months ago

Linear weights are presented as a context neutral statistic. Therefore, there is no “estimate” – we literally are saying we don’t care how many runs the hit actually produced. It’s an average because we’re intentionally removing the context of the play.

Respectfully disagree. Removing context is not equivalent to removing statistical uncertainty, which may result from factors that are unmeasurable, or factors the statistician cannot control for. Because it is an average, an expected run value from a specific event if you will, it should have error bars or 95 (or 99) percent confidence interval surrounding it. It may turn out that the standard error will be extremely low due to huge sample sizes that are used to calculate the linear weights, but there still is uncertainty.
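
To make the point about error bars concrete, here is a minimal Python sketch of how uncertainty in the run values would carry through to a linear-weights total. Every number in it (the run values, their standard errors, the event counts) is hypothetical, and it assumes the run-value estimates are independent of one another, which real linear weights need not be. It also treats the player's event counts as fixed and ignores their sampling variation.

```python
import math


def linear_weights_interval(counts, weights, weight_se, z=1.96):
    """Return (total, standard_error, (low, high)) for a linear-weights sum.

    Under the independence assumption, the variance of sum(c_i * w_i)
    with fixed counts c_i is sum((c_i * se_i) ** 2).
    """
    total = sum(counts[e] * weights[e] for e in counts)
    variance = sum((counts[e] * weight_se[e]) ** 2 for e in counts)
    se = math.sqrt(variance)
    return total, se, (total - z * se, total + z * se)


# Hypothetical inputs, for illustration only.
counts = {"single": 100, "home_run": 30}
weights = {"single": 0.47, "home_run": 1.40}    # assumed average run values
weight_se = {"single": 0.01, "home_run": 0.02}  # assumed standard errors

total, se, (low, high) = linear_weights_interval(counts, weights, weight_se)
print(f"Total: {total:.1f} runs, 95% interval: {low:.1f} to {high:.1f}")
```

With large samples behind each run value the standard errors are small and the interval is narrow, but it is never zero width.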

vivaelpujols
Guest
6 years 6 months ago

Actually Dave, UZR is probably more accurate (in terms of identifying true talent level) than wOBA. wOBA splits everything up into a few buckets. If a guy hits a rope that’s caught at second base, wOBA gives him a 0 for that play. UZR has a lot more buckets than wOBA, so it will be a better indicator of a player’s true ability than wOBA. That’s why we see UZR have a year-to-year correlation similar to wOBA’s, despite a much smaller sample size for defense.

Logan
Guest
Logan
6 years 6 months ago

Good work Dave, as always.

The title though- you ended a sentence with a preposition. Bad!

Dingo
Guest
Dingo
6 years 6 months ago

This is exactly the kind of grammatical error up with which I shall not put!

don
Guest
don
6 years 6 months ago

Thanks, Winston.

Carson Cistulli
Editor
Member
6 years 6 months ago

You know, I was looking into this recently and found that almost the only reason for this rule (i.e. Thou shalt not end thy sentence with a preposition) existing in English is because it exists in Latin. Same thing for splitting an infinitive.

Thing is, it’s literally impossible to split an infinitive in Latin, because the infinitive form in Latin (amare – to love, ludere – to play) is only a single word. Likewise for prepositions at the end of the sentence — prepositions in Latin are followed immediately (with few exceptions) by either the ablative or accusative form of the noun in question. They don’t make sense otherwise.

Till recently, I always made a point of placing my prepositions before the corresponding nouns — so that I could broadcast my boarding school education, if for no other reason.

Now the only way I can broadcast said education is by providing long-winded and largely uninteresting observations on English usage in the comments sections of better baseballing websites.

Such is the struggle of life.

Xavier
Guest
Xavier
6 years 6 months ago

Yup.

I’d only add that sometimes — sometimes — not ending sentences with prepositions forces you to put the preposition next to the noun it modifies and can clear things up.

Isn’t “put up with” a set phrase, in any event, and not subject to that rule? (I get the reference).

WY
Guest
WY
6 years 6 months ago

I agree: the preposition rule is one that can be broken, especially since the alternative can sound so much more awkward and uptight. It’s not using “your” instead of “you’re” or “there” instead of “their” or even “it’s” instead of “its”; those mistakes drive me nuts.

neuter_your_dogma
Guest
neuter_your_dogma
6 years 6 months ago

Man you people scare me.

Newcomer
Guest
Newcomer
6 years 6 months ago

Also note that English is a Germanic language. Anyone familiar with German will see that they regularly end their grammatically-correct sentences with “separable prefixes,” which are functionally equivalent (and usually identical) to prepositions. Yet I still prefer to keep my prepositions in their Latin place.

joser
Guest
joser
6 years 6 months ago

Also note that English is a Germanic language

Yes, and it has big chunks of Romance languages (thanks to the Norman conquest) and Scandinavian (thanks to Knute and his bunch) and Greek and Latin (thanks to the Renaissance and the Enlightenment) and little blobs of vocabulary from all over everywhere else. It’s a sprawling mongrel of a language with exceptions to every rule and a glorious tolerance for breaking them, which is why it’s been so successful.

Man you people scare me.

There’s nothing of which to be scared.

Just mind your prepositions, and nobody gets hurt.

DL80
Guest
DL80
6 years 6 months ago

The Chicago Manual of Style notes that infinitives can be split if not doing so makes the sentence sound awkward. Same goes for prepositions at the end. Also, as noted below, any phrase that naturally ends in a preposition is required to remain that way. Thus, “put up with” can end a sentence.

ineedanap
Guest
ineedanap
6 years 6 months ago

Simple solution:

“What Mean Do You Regress Defensive Metrics To, @ssh0le”

Logan
Guest
Logan
6 years 6 months ago

I’ve heard that joke before.

brian recca
Guest
brian recca
6 years 6 months ago

I can live with the opinions of scouts. Let’s not get scouts confused with casual fans and other baseball people. I’m looking at you, Joe Morgan.

Rick
Guest
6 years 6 months ago

It would be interesting to do an analysis of the variability between UZR and scouting and try to see if there are systematic differences that can shed light on disagreement.

What elements of defense are scouts including that can’t be captured by UZR, such as backing up other players, relay throws, 1B scooping/stretching, etc.? What things is UZR picking up that scouts might be undervaluing, such as positioning or making out-of-zone plays that other players can’t even attempt?

Are scouts likely to regress their assessment too far back to a player’s previous level of performance? What’s the impact of sample size — scouts are assessing ability moreso than performance and these presumably converge given a big enough sample. So what sample size are scouts using?

I imagine that a good portion of the differences can be explained by a few consistent gaps in approach.

Patrick
Guest
Patrick
6 years 6 months ago

I like what Rick said there. I think it would be very interesting to see a detailed analysis of disagreements to try to find a commonality.

Bill
Guest
Bill
6 years 6 months ago

Or it could be that UZR is not an especially good measure of 1B defense, since much of the ability to catch errant throws from other fielders is lost in the data.

jlong
Guest
jlong
6 years 6 months ago

I agree all the way. UZR seems to work great for OF and the 3 infield positions. But the results seem off for 1B, and it makes sense because such a large part of 1B defense is picking bad throws.

I have no doubt that defensive metrics will continue to grow and eventually catch up to the offensive metrics, but they will not be able to until somebody finds a way to better quantify the defensive contribution of 1B, Catchers and Pitchers.

WY
Guest
WY
6 years 6 months ago

Mr. Cameron, you are on a roll lately. I agree that the Crasnick article was really good. Your analysis of the Belichick situation and the broader issues involved was also right on the money.

Jacob Jackson
Guest
Jacob Jackson
6 years 6 months ago

Dave will soon regress to the mean as well.

JB
Guest
JB
6 years 6 months ago

But WHICH mean?

Big Oil
Member
Big Oil
6 years 6 months ago

I, too, agree with the conclusion drawn here and analysis presented in favor of it.

From a limited perspective of developing FanGraphs to more comprehensively measure the talents of players, I thought it would be interesting for the site to contract with either a current or former scout (“Scout X” so as to preserve anonymity) regarding performances of players where there is this sort of Teixeira-like disagreement. One of the more statistically-inclined writers would or could collaborate, bringing into the discourse both views on what right now is a statistically-heavy analysis.

As many on here would agree, there is more to the SABR community that numbers without any form of observation, and continuing to recognize that ultimately puts one in a better position to make the best judgment possible, as was the point here. Well done.

Big Oil
Member
Big Oil
6 years 6 months ago

3rd paragraph: “that” to “than”.

Sky Kalkman
Member
6 years 6 months ago

Problem is, I’m not sure we want to listen to one, single scout. A consensus of a few is a lot better. And for many purposes, Tango’s Fan Scouting Report can fill in nicely in lieu of a professional.

Paul Thomas
Guest
Paul Thomas
6 years 6 months ago

I’m not really sure how true this is. I know fans are instructed not to think about defensive metrics and not to think about what other people have said, but… why would you expect that to be any more effective with a jury voting on baseball skill than with a jury voting on convictions?

There’s no way to get a pool of people who are untainted. Given that, I’d rather rely on experts (ideally several of them).

neuter_your_dogma
Guest
neuter_your_dogma
6 years 6 months ago

Whew . . . and to think all season I insisted that Raul Ibanez was a better fielder than Jayson Werth.

Paul
Guest
Paul
6 years 6 months ago

I like your position on this, Dave, but how does this affect WAR? We see on this site all the time claims like David DeJesus is more valuable than Jason Bay, etc. because of the huge defensive value difference based on UZR. Should this view therefore lead to a softening of WAR as the absolute that it is often portrayed as? To the DeJesus-Bay comp, the “book” on David agrees with UZR, but do scouts really believe that Bay is “dipped in a vat of rotting skunk urine” bad as UZR says? I think this is where the error bars concept is intriguing, in cases like Bay. In other words, it is theoretically possible for a fielder to cost his team 70 or 80 runs a year on defense, but is that extreme outlier really realistic?

Sky Kalkman
Member
6 years 6 months ago

On Bay, I’m not sure it’s as much the standard issue error bars as it is a potential for a systematic error with park issues.

Choo
Member
6 years 6 months ago

When there is a discrepancy among scouts and stats, I agree that the truth generally lies somewhere in between. However, scouts give bonus points for aesthetic attributes like soft hands, nifty footwork and other skills which are vital physical components to a calculable event (intercepting a ball at x location and creating an out) but are incalculable on their own (stdev for nifty feet = ?).

Take Adam Kennedy and Felipe Lopez for example. The two have been nearly identical in all components of UZR during the previous three seasons at 2B, yet Kennedy (fundamentally sound and dipped in molasses) is a cult hero among the scouting community while Lopez (fundamentally suspect and dipped in pico de gallo) is generally regarded as something far less. One guy makes scouts drool, the other makes them cringe, and the ultimate result (outs) is exactly the same.

Padman Jones
Guest
Padman Jones
6 years 6 months ago

I wonder how much of the anecdotal evidence is based upon what happened in years previous. I’d be willing to bet that a good deal of the people who talk about Teixeira being a superb defender are basing that on what they’ve heard as much as what they’ve seen. Plus, there’s a confirmation bias in play – if you or a scout go to a game already having it in your head that Teixeira (whom I choose because he’s one of the few guys who’s commonly perceived as a great defender) is great, and he makes one great play, you’ve got your confirmation that he’s great. And if he makes mistakes, people are more likely to attribute it to his having a bad day.

Yeah, defense is hard to measure, but absolute trust in either UZR or scouting isn’t the only problem facing its evaluation. I feel like we’ve also got to dispose of hearsay evidence that, especially with defense, seems to keep on being parroted for seasons on end.

scatterbrian
Guest
scatterbrian
6 years 6 months ago

Is Yoda writing the Fangraphs headlines now?

The Real Neal
Guest
6 years 6 months ago

I notice that there is nothing really about regressing the metrics to a mean in this article – it’s more about tempering the values of two systems that give disparate information.

Anyway, it’s never been proven that UZR is actually, you know, correct. It was built backwards, starting with ‘I want a stat that will translate defense into runs’, and doing that caused it a lot of problems. When it was first introduced, I pointed out, I think, 7 logical and statistical flaws in it, which MGL to my knowledge never refuted or corrected.

The 2004/2005 off-season gave a great way to test the value of it as a metric, because you had three shortstops swap teams: Eckstein, Renteria and Cabrera. As it turned out – by far the best predictor of how a team’s shortstop would do with UZR was not who the shortstop was, but who the team was. It’s sort of sad that no one has actually challenged it since. The general consensus seems to be ‘it’s got a lot of math in der, it must be correct.’

Michael
Guest
6 years 6 months ago

I don’t want to speak for MGL, but I’ve read through much of the introductory post comments of BBTF regarding UZR, and I saw MGL carefully go through many of the valid points addressed.

At the risk of opening a can of “stuff I don’t want out here,” what flaws did you see in the model?

The Real Neal
Guest
6 years 6 months ago

Well it was five years ago.

Off the top of my head – line drives, throwing, the way that hits through the hole get apportioned responsibility to the fielders, advanced scouting and defensive alignment, the ability of pitchers to pitch to a gameplan… and then there’s the standard misuse of statistics. You can’t say “my sample size is 500 plays” and then start applying things like park factors and pitchers factors to your sample of 500 – because what you in reality are doing is cutting up your samples so that you no longer have samples large enough to apply any confidence interval.

Even MGL guesstimated his SD to be 5 runs. The reason he can’t give an actual confidence interval is because the statistical analysis is so fudged.

walkoffblast
Guest
walkoffblast
6 years 6 months ago

The other important point is not to confuse what scouts believe with what a large mob of average joes believes. It always amused me to see people in disbelief of Jeter’s defensive issues ask scouts about it, and they would say something along the lines of “if I cannot tell he sucks going to his left, then I need to be fired.”

Jeff Lewandowski
Guest
Jeff Lewandowski
6 years 6 months ago

Ryan Braun is the one who really sticks out in the UZR rankings. A few folks I know who watch the game carefully feel that Braun has good range and tracks balls quite well. UZR has him ranked as one of the worst LF in baseball in 2009.

Michael
Guest
6 years 6 months ago

Braun also rates well with the fans. It’s one of those strange issues where it appears a player has all the tools to succeed, but maybe he’s just not using them all.

The Real Neal
Guest
6 years 6 months ago

Or it could be that Mike Cameron steps in front of Braun and Hart to catch a lot of balls that both of the outfielders could catch. Which would:

1. Explain the disparity in Braun’s score
2. Explain why Hart who used to play center is a ‘bad’ right fielder
3. Explain why Cameron who is clearly in the decline phase of his fielding career, like just about every other center fielder for the last 120 years, seems to be rejuvenated by joining the Brewers.
4. Illustrate the other flaw with UZR that I forgot above.

Joe R
Guest
Joe R
6 years 6 months ago

I still think Ellsbury is a much bigger point of contention than, say, Teixeira. Most of Ellsbury’s perceived value is his glove, but his UZR was terrible.

And then you have some people who think he’s the best defensive CF in Red Sox history already, and others who think he plays like a blind man.

I like Crasnick, though, his stuff is well worth reading.

walkoffblast
Guest
walkoffblast
6 years 6 months ago

I have yet to see a credible source suggest Ellsbury did not let an above-average number of balls fall in front of him for whatever reason this year.

CH
Guest
CH
6 years 6 months ago

I love any article that implores people to think for themselves and not treat ANY single piece of information as gospel, whether we’re talking about baseball or anything else. So, thanks.

Aqua Narc
Guest
Aqua Narc
6 years 6 months ago

5 bucks says Dayton Moore was the “AL GM” who said that Holliday sucks in the field.

Also, did Yoda write the title to this post?
