How Much Is Fielding Weighted in WAR?

Occasionally (okay, rather frequently), I’ll see people debate the relative accuracy of the WAR displayed on FanGraphs and Rally’s WAR on Baseball-Reference.

Joe Posnanski speculated on the differences in a recent article about Josh Hamilton’s MVP chances:

I could be reading this wrong, but Fangraphs seems to put more emphasis on defense. For instance, Carl Crawford’s WAR at Baseball Reference is 3.7 — his defense is worth eight runs above average. But Fangraphs credits him for 22 runs above average, which thrusts his WAR up to 5.6 and into the No. 4 spot in baseball.

I’ve seen similar sentiments echoed throughout the blogosphere and on Twitter.

In reality, on a per-player basis in 2009, UZR distributed 441 fewer absolute runs than TZ did, excluding pitchers and catchers. And there is no year for which UZR is available in which its total absolute value has been higher than TZ’s.

In 2009, the maximum spread of UZR was +31 to -37 and TZ showed a similar spread of +31 to -34. Here’s a graph of the full spread. The blue overlap shows the points at which TZ starts showing a greater spread.

This might not be a perfect comparison of how much defense actually contributes to WAR; a better one might be how much fielding contributes to total runs as a whole. In 2009, fielding made up about 14.2% of all positive and negative runs in FanGraphs WAR and about 15.5% of all positive and negative runs in Rally’s WAR.
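As a rough illustration of that kind of calculation, the sketch below sums the absolute value of every run component across players and reports fielding's share of the total. The component names and the sample numbers are invented for illustration; they are not the actual inputs to FanGraphs WAR or Rally’s WAR, and the exact set of components counted here is an assumption.

```python
def fielding_share(players):
    """Return fielding's share of all absolute (positive and negative) runs."""
    fielding_total = sum(abs(p["fielding"]) for p in players)
    overall_total = sum(abs(runs) for p in players for runs in p.values())
    return fielding_total / overall_total


# Hypothetical run components per player (not real 2009 data).
sample_players = [
    {"batting": 30.0, "fielding": 10.0, "positional": -5.0, "replacement": 20.0},
    {"batting": -12.0, "fielding": -8.0, "positional": 5.0, "replacement": 10.0},
]

share = fielding_share(sample_players)
print(f"Fielding share of absolute runs: {share:.1%}")
```

Taking absolute values means a -8 fielder counts as much toward the total as a +8 fielder, so the share measures how much fielding moves the numbers in either direction.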

All in all, they are similar in how fielding is weighted as a whole. The biggest difference between the two is how each individual player’s fielding is evaluated.






David Appelman is the creator of FanGraphs.


Andy S
Guest
5 years 9 months ago

Just a thought, have you guys ever thought of regressing the fielding numbers, based on their inherent volatility?

mb21
Guest
5 years 9 months ago

I like that idea. I also like the idea of expressing defense as a range (something like -7 to +7). I just think that, as popular as this site is, there needs to be something that shows that UZR or TZ is not as reliable in a small sample as offense is. I think many of the disagreements start over people saying so-and-so was worth 9 runs using UZR and -6 runs using TZ. People look at that and say they can’t both be right, so all defensive metrics should be ignored. They shouldn’t be ignored, but I think the unreliability of the metrics should be acknowledged in their measurements. I also think they have to be regressed. UZR/150 is the same as looking at a guy who had 2 home runs after the first game and saying he’s on pace to hit over 300 home runs. Get rid of UZR/150 and add a metric error column.
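The pace arithmetic behind that comparison can be sketched as follows. This assumes UZR/150 is a simple linear scaling of a fielding-runs total to 150 defensive games; the function names and the sample fielding numbers are hypothetical.

```python
def home_run_pace(home_runs, games_played, season_games=162):
    """Extrapolate a home run total linearly to a full season."""
    return home_runs / games_played * season_games


def per_150(fielding_runs, defensive_games):
    """Scale a fielding-runs total linearly to a 150-game rate (assumed form)."""
    return fielding_runs / defensive_games * 150


# Two home runs in the first game extrapolates to well over 300.
opening_day_pace = home_run_pace(2, 1)
print(opening_day_pace)

# Hypothetical fielder: +3 runs in 15 games becomes a large per-150 figure.
small_sample_rate = per_150(3.0, 15)
print(round(small_sample_rate, 1))
```

In both cases the extrapolation multiplies whatever noise is in the small sample by the same factor as the signal.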

Dudley
Guest
5 years 9 months ago

Love Andy’s idea–if it takes defensive metrics more observations to stabilize, why should we throw out observations from the prior year(s)? In fact, maybe all data should be considered more from a moving average perspective rather than hermetically sealing off each season?

CircleChange11
Guest
5 years 9 months ago

Pos actually brings up a good point. That’s a HUGE diff between Crawford’s WAR depending which website you’re looking at.

I brought up a similar point the other day with Wainwright leading in WAR at BR, yet 5th, by a long shot, here at FG.

The problems of such variance of the “same overall value stat” are obvious … and perhaps significant.

LeeTro
Member
5 years 9 months ago

While both sites use similar methods for position players, pitchers are evaluated completely differently. B-R also combines pitching and hitting for its overall leaderboard, which is why Wainwright ranks so high. Halladay is still a full win above Wainwright in just pitching value.

CircleChange11
Guest
5 years 9 months ago

It was explained to me that BR also uses something like “runs allowed” while FG just uses FIP.

Just using Walks, K’s, and HR for pitchers doesn’t seem to be very complete.

Wainwright’s ERA is consistently 0.5+ runs below his FIP. Using FIP will always result in AW50 having less WAR by that method. That’s why AW50 ranks higher at BR.

Being a “good” hitting pitcher, as well as a good fielding pitcher, should factor into your WAR.

I’m not arguing that AW50 is as good as RH34, but the margin is closer than what is represented at FG, IMHO.

LeeTro
Member
5 years 9 months ago

Rally uses total runs allowed for performance, then finds the number of runs that a replacement-level pitcher would allow, with ballpark, league, and defensive adjustments. It shows actual value, whereas FIP is more of an expected value.

FG also has pitchers’ hitting value, but it’s not shown in leaderboards. I like the high-positive positional adjustment used on B-R though. Pitchers’ replacement at-bats are taken by other pitchers, not position players, so a lower replacement level is needed.

CircleChange11
Guest
5 years 9 months ago

Is it just me or would “actual value” be more important than “expected value”?

In a similar discussion to the one that’s taking place at TT’s blog … I would favor “averaging the WAR values”. TT made the comment that it would be better to be “half right” than possibly “all wrong”. I agree with that. (It was in response to Josh Hamilton’s fielding metric variance.)

For example, there’s a HUGE difference between Tim Hudson’s rWAR (5.6) and fWAR (2.6). Which is the most accurate representation? Is his season performance CY worthy? Or just merely “a little above average”?

Note: The WAR’s have changed since I last looked them up, so now the order of the pitchers, using both methods is generally “the same”.

BR
—–
RHalladay 6.6
JJohnson 5.9
AWainwright 5.6
THudson 5.6
UJimenez 5.3
TLincecum 2.4

FG
—–
RHalladay 6.4
JJohnson 5.6
AWainwright 5.0
THudson 2.6
UJimenez 4.8
TLincecum 3.6

Average
——-
RHalladay 6.5
JJohnson 5.75
AWainwright 5.3
THudson 4.1
UJimenez 5.05
TLincecum 3.0
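The averages in the table above can be reproduced with a few lines. The figures are the ones listed in this comment; the variable names are only for illustration.

```python
# rWAR (Baseball-Reference) and fWAR (FanGraphs) as listed above.
rwar = {"RHalladay": 6.6, "JJohnson": 5.9, "AWainwright": 5.6,
        "THudson": 5.6, "UJimenez": 5.3, "TLincecum": 2.4}
fwar = {"RHalladay": 6.4, "JJohnson": 5.6, "AWainwright": 5.0,
        "THudson": 2.6, "UJimenez": 4.8, "TLincecum": 3.6}

# Simple unweighted mean of the two systems for each pitcher.
average_war = {name: (rwar[name] + fwar[name]) / 2 for name in rwar}

# Gap between the systems, to show where they disagree most.
gap = {name: abs(rwar[name] - fwar[name]) for name in rwar}

for name in sorted(average_war, key=average_war.get, reverse=True):
    print(f"{name:12s} avg {average_war[name]:.2f}  gap {gap[name]:.1f}")
```

Printing the gap next to the average keeps the disagreement visible; a 4.1 built from 5.6 and 2.6 is a much less certain figure than a 6.5 built from 6.6 and 6.4.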

CircleChange11
Guest
5 years 9 months ago

NL pitchers aren’t the best example right now, because Halladay is just so good (great pitcher, most innings). A couple of weeks ago AW50 was atop rWAR, while RH34 had a 1.5 WAR lead in fWAR. Now, RH34 leads both, and it’s a done discussion.

AL pitchers are much different. Namely, IMHO, FG and FIP over-value the walk (or specifically the lack of walks). With FIP, you’re better off giving up a double than a walk. That shows in the difference between Cliff Lee’s rWAR and fWAR.

Lee: fWAR — 6.0; rWAR — 3.8
Liriano: fWAR — 5.5; rWAR — 3.9

The most consistent in the two systems is King Felix …

Hernandez: fWAR — 5.2; rWAR — 4.5

When you average the two systems together …

CLee — 4.9
FHernandez — 4.85
FLiriano — 4.7
JWeaver — 4.25
John Danks — 4.2

I like the average much better, as while I feel RH and CL have been the best pitchers in each league, they are not a full win better than the 2nd best pitcher. Their fWAR is such because of walks alone.

LeeTro
Member
5 years 9 months ago

I agree that actual value is more important. What really causes separation in the 2 values is luck. When Hudson allows a BABIP of .241, he’s going to allow a lot fewer runs than what is expected in DIPS theory. In the near future, I’ll see if Rally WAR or FIP WAR better predicts the next year’s Rally WAR. That may be the only advantage of FIP WAR over the Rally system.

cpebbles
Guest
5 years 9 months ago

I could’ve sworn it was here that I read that tERA was quite a bit better as a predictive tool than FIP, and only slightly worse than xFIP.

CircleChange11
Guest
5 years 9 months ago

I actually think going the tERA route would be a pretty decent compromise since it effectively ignores fielding, but also accounts for batted ball tendencies.

Strongly agree.

When it comes to Wainwright and why his ERA is consistently lower than his FIP, how much of it is his groundball tendencies and how much of it is because of a good infield defense. I’m sure it’s a combination of the two.

I’m delighted (delighted I tell you) to hear someone say “a combination of both”, rather than just saying “His ERA is only good because of Brendan Ryan”. My only real complaint about FIP is that it does remove batted ball stuff, and I don’t view pitchers as being completely separate from the team. The pitcher and defense work together, very intimately.

How they pitch hitters, how they shade hitters, it’s very much like a “defensive scheme” in football. If the pitcher doesn’t “pitch to the defense” or “stick to the plan”, he puts his team in a bad spot.

One could also look at Wainwright’s situation, and state that because of Wainwright, Carpenter, and Garcia being so groundball heavy, that is the reason why StL sacrifices so much offense at the SS position … because overall, it’s good for the team. Likewise, Ryan is so much more valuable on a team like this than he would be on a “K or fly ball” staff. Very similar to “John Tudor and Ozzie Smith”. Perfect pitcher for that stadium and defense. It wasn’t just that “John Tudor was lucky to have that defense” as much as it was “John Tudor was a master LHP at getting RHBs to hit the ball on the ground to Ozzie.” Anyway …

If you take someone like Justin Masterson (60%+ GBs), and you put the world’s best defense behind him and every year he outperforms his FIP by 1 or even 1.5 runs, what would you say about him?

I would say you schemed your defense toward a strength. It would be like punishing a QB for having a good running back to balance out the O. Everything works together. But, I do agree, that if a pitcher is primarily dependent on good defense, that having a good defense would make the P look better than he likely is. But having predominantly good pitchers, and putting a poor defense behind them, would be terrible management by the organization. You’re taking a strength and turning it into a weakness, or at least average.

It’s been pointed out that Peyton Manning has the only O-Line that has been together for something like 8 years, but we don’t hold that against him even though we know he’d be “less great” if he had a porous o-line.

It does seem, at times, that smart team management and valuing players clash. It’s a tough deal to take all these stats and situations and factor them into one value stat. I think, for the most part, WAR works as intended.

CircleChange11
Guest
5 years 9 months ago

As part of a discussion at TT, Adam Wainwright’s WAR is not different with fWAR or rWAR, at least not to any significant degree.

The only way that it is drastically different is in an extreme situation, such as a really low BABIP, or a really low HR rate, or walk rate.

CircleChange11
Guest
5 years 9 months ago

Nevermind, fWAR is 5.0, rWAR is 5.6.

I’m losing it.

Dan
Guest
5 years 9 months ago

Well, typically my thinking has been that the ground-ball tendencies of pitchers are already factored into WAR with FIP because more ground balls = fewer HRs = lower FIP = higher WAR. LD%, on the other hand, is something I would be more interested in seeing incorporated into FIP, so I suppose this is a moot point and tERA would nonetheless be more effective.

pft
Guest
5 years 9 months ago

Here is the problem. You have 3 models that estimate a player’s defensive contribution. All 3 models have some strong points and some negatives, and are based on some credible analysis. The true value of a given player’s defensive contribution is unknown. For some players, the models give similar results. For others, they vary quite a bit.

In general, when you have several models, the proper thing to do is to average them. If you have more than 3 models, you might drop any outliers. This is done in meteorology in hurricane forecasting.

At the very least, WAR should be based on an average between DRS and UZR.
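A minimal sketch of that averaging idea is below. Dropping the single highest and lowest estimates when more than three models are available is just one simple reading of "drop any outliers", not an established rule; the function name and all sample values are hypothetical.

```python
from statistics import mean


def consensus_runs(estimates, drop_extremes=True):
    """Average fielding-run estimates from several models.

    With more than three models, optionally drop the highest and lowest
    estimate before averaging (an assumed, simplistic outlier rule).
    """
    values = sorted(estimates.values())
    if drop_extremes and len(values) > 3:
        values = values[1:-1]
    return mean(values)


# Hypothetical single-season estimates for one player.
three_models = {"UZR": 9.0, "DRS": 6.0, "TZ": -6.0}
print(consensus_runs(three_models))

# Hypothetical case with five models, two of them far from the rest.
five_models = {"A": 10.0, "B": 8.0, "C": 9.0, "D": -20.0, "E": 30.0}
print(consensus_runs(five_models))
```

With only two or three models every estimate is kept, so a DRS/UZR average is just the plain mean of the two.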

Also, people still fail to understand that single-year WAR is not meant to measure ability, but to determine what a player has done in a given year. If you are looking at ability, you need 3 years of fielding data. If you are looking at what a player did in 2010, you look only at the data from 2010. If UZR is higher or lower than what a player’s true ability is for a given year, it may be due to luck, the player playing better or worse than normal, or whatever. The same holds true for offensive stats. Players sometimes perform better or worse than what the multi-year data say is their true ability.

Angel Pagan's agent
Guest
5 years 9 months ago

“If UZR is higher or lower than what a player’s true ability is for a given year, it may be due to luck, the player playing better or worse than normal, or whatever.”

If by ‘whatever’ you mean ‘it may be that UZR doesn’t measure ANYthing’ then you’re correct.

Mike
Guest
5 years 9 months ago

We all know that the larger the sample size the better, so let’s look at career defensive #’s:

Carl Crawford (LF)
UZR 123
DRS 115
TZ 69

You think that’s bad, check this one out.

Ryan Zimmerman (3B)
UZR 55
DRS 79
TZ 7

UZR, DRS, and subjective opinions all agree that these two are at the elite level for their respective positions. TZ knocks Crawford down quite a bit and Zimmerman down to merely average.

Something is seriously amiss with TZ.

Dan
Guest
5 years 9 months ago

Seeing TZ rate someone like Zimmerman, an obvious defensive wizard by the eye test, as merely average really discredits it some… but, sadly, I’m sure UZR has similar cases.

Mike
Guest
5 years 9 months ago

I guess it’s possible that UZR rates a player’s defense that wildly differently from the other defensive measurements and the eye tests of just about everyone.

But if such a case exists, I sure haven’t seen it yet.

Rich
Guest
5 years 9 months ago

“We all know that the larger the sample size the better, ”

That’s not true at all.

A larger sample size is better when you’re measuring something where true talent is not changing. It’s possible that UZR fluctuates so much because players actually play better defense some years.

In that case, a larger sample size is just combining unlike data.

Mike
Guest
5 years 9 months ago

“A larger sample size is better when you’re measuring something where true talent is not changing”.

True talent or peak talent? Fielders can have off years, I suppose, just like hitters.

If a player alternates from 10 to 20 UZR every other year for 10 years, I have no problem saying that his career 150 UZR probably makes him a very good fielder.

Augustus
Guest
5 years 9 months ago

I’m sorry, but can’t a simple gander at both formulas explain the great divide?

And I think I’d prefer a system in which UZR wasn’t rated as highly. The great potential fluctuations in defense on a year-to-year basis can make WAR a tad jumpy, when we’re really trying to find out who exactly is going to be a solid 2.5 WAR player on a year to year basis.

It doesn’t matter so much for, I don’t know, Albert Pujols, since determining whether he’s a 6.5 or a 7.0 WAR player doesn’t really make that much of a difference to his bottom line, but it kind of matters when seeing if someone like Austin Kearns is a 1.0 or a 2.0 WAR player.

How about some kind of weighting of this season’s UZR vs. average total UZR/150 for the past three seasons?

I feel like that would help give a better read on what to expect from players. I’m just glancing through Miguel Cabrera’s page, and I think some kind of overall adjustment with defense’s calculation in WAR would give me a much better idea what type of player he is, value-wise.

Although I guess on a personal basis I really just look at hitting and see someone’s defense as a gold star or black mark that doesn’t really take that much away from my opinion of someone most of the time. But that’s not the point right now.

MikeD
Guest
5 years 9 months ago

Is it any wonder that many people, including specifically those who are inclined toward sabermetrics, zone out on these conversations because there is no consistent or accepted measurement? WAR, UZR, TZ, etc., can’t be taken seriously when there is such a large gap between each rating system. I appreciate the enthusiasm, but when someone on a bulletin board starts screaming that player A is better than player B because of WAR, then they had better be able to explain why they like that rating system better than another, and why I should accept that rating system over another. Right now there’s more nonsense in the stats community than there is in the regular baseball community. Let me know when you have it all figured out. (Note: I’ve been following advanced statistics since the late ’70s when I first started reading Bill James.)

JonnyBS
Guest
5 years 9 months ago

Wait for FieldFX.

hank
Guest
5 years 9 months ago

Great article…. given the variability of the defensive statistics (especially for one season or less), it seems the variability in WAR is overlooked as these #’s are used to parse trades, contracts, free agent pickups.

Some sort of regression is clearly needed in the defensive component of WAR. A complex system may look at chances per position and use the past “X” seasons to get a regressed defensive component of the WAR calculation (for example, a shortstop will not need the same length of time to regress as, say, a right fielder).

For players without enough service time, perhaps use a weighted regression between actual stats and a league-average player (similar to regressions done for platoon splits with limited plate appearances).
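One common way to express that kind of regression is to shrink the observed number toward the league average, with the weight depending on how many chances the fielder has had. The sketch below assumes that form; the function name, the `stabilizing_chances` constant, and the sample inputs are hypothetical and not published values for any position.

```python
def regress_fielding(observed_runs, chances, stabilizing_chances,
                     league_average=0.0):
    """Shrink observed fielding runs toward the league average.

    `stabilizing_chances` is a hypothetical constant: the number of chances
    at which observed and league-average performance get equal weight.
    """
    weight = chances / (chances + stabilizing_chances)
    return weight * observed_runs + (1 - weight) * league_average


# Same observed +10 runs, different numbers of chances (all values made up).
busy_fielder = regress_fielding(10.0, chances=600, stabilizing_chances=300)
quiet_fielder = regress_fielding(10.0, chances=150, stabilizing_chances=300)

print(round(busy_fielder, 2), round(quiet_fielder, 2))
```

A position that sees more chances per game accumulates weight faster, which is one way to capture the point that a shortstop would need less time to regress than a right fielder.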

Lee Panas
Guest
5 years 9 months ago

I try not to start with WAR when comparing players A and B. I think that’s not a good way to win a debate. I’d rather start with offensive run contribution because it’s fairly easy to compare the players on that basis. It’s a lot easier to come to an agreement on that. Let’s say that player A is 5 runs better than player B.

After that, I like to look at a few different defensive statistics, and if there is a lot of variation I’ll also look at past years. One thing I like to try to do is average stats the way David has done in a couple of recent posts. Based on that, I can make a judgement as to whether player B makes up the 5 runs with his defense. There is no definitive answer though. There is still room for debate, which is not a bad thing.

Don Headly
Guest
5 years 9 months ago

That’s not a bad way to look at the situation and would be a nice method for the time being. Having an offensive WAR that only compares offensive ability (a much more measurable stat at this point) and separating most analysis of a player into offensive and defensive parts makes the most sense IMO.

It is nice to have an all-encompassing stat; however, it’s obvious the variability of defensive metrics makes it impossible to measure the true value of a player to his team and the true talent of a player during a given season, which is what most of us are looking for during our analysis.

Saying Carl Crawford is a 5.6 WAR player is nice but it doesn’t mean much when there is so much variability in the number. With pitchers it’s even more difficult, as they are dependent on their defense, which affects the “value” of their pitching. Separating WAR and other metrics into offensive and defensive components should help with the valuation and create an atmosphere of discussion that doesn’t hinder itself along one or two all-encompassing statistics.

I for one feel this site is amazing and would love more articles that focused on a player’s offensive and defensive worth separately rather than “Player A is 5.6 WAR and Player B is 5.0 WAR, therefore Player A is better and the GM made the right move by signing him.” It’s just not that simple.

Marver
Guest
5 years 9 months ago

“When it comes to Wainwright and why his ERA is consistently lower than his FIP, how much of it is his groundball tendencies and how much of it is because of a good infield defense. I’m sure it’s a combination of the two.”

Placing a groundball pitcher in front of a better defense shouldn’t make the pitcher more valuable and it shouldn’t make the defenders more valuable. Ultimately, the credit for placing a good defense behind a groundball pitcher, or placing a groundball pitcher in front of a good defense, should fall to the general manager. The same should apply to relief pitchers; the proper leveraging of a relief pitcher should be a boon to the manager’s WAR, not the pitcher’s.

lifewontwait
Member
5 years 9 months ago

“The same should apply to relief pitchers; the proper leveraging of a relief pitcher should be a boon to the manager’s WAR, not the pitcher’s.”

I really like the idea of manager’s WAR

lifewontwait
Member
5 years 9 months ago

Given that “UZR tries to record a player’s likely true talent and estimate his future performance based on the nuances of the batted ball and the player’s response to those nuances. It is not trying to capture exactly what happens on the field according to some arbitrary categories” (quoted from http://www.fangraphs.com/blogs/index.php/the-fangraphs-uzr-primer/#15), why is UZR included in WAR at all?

I see why you would want to count the player’s defensive contributions, but UZR by definition doesn’t do that. How can we say that so and so has been worth 4.5 WAR, with 2.5 wins coming from offense and 2 wins coming from defense when “A player’s UZR does not necessarily tell you how he actually played”?

The Q
Guest
5 years 9 months ago

The answer is too much.

See a guy like Jeter, who gets too much credit for standing out at SS when he’s been brutal there. Those +18s should be triggered by at least being average.

He’s something like the 334th best hitter ever (per OPS+) and is an abominable defensive SS, yet he’s like 52nd in WAR. That just proves how flawed the extra bonus is. His 120 OPS+ isn’t exactly good and he’s a terrible defender…does that make him one heck of a baserunner?

Braves Paul
Guest
2 years 1 month ago

Andrelton Simmons’s rWAR per plate appearance ranks him 7th all-time (among Ruth, Bonds, Mays, Williams, etc.), but his fWAR per plate appearance ranks 74th all-time. Therefore, it seems defense is rated more highly by rWAR… Perhaps Crawford’s basestealing skills were weighted more highly by fWAR?
