Love Andy’s idea–if it takes defensive metrics more observations to stabilize, why should we throw out observations from the prior year(s)? In fact, maybe all data should be considered more from a moving average perspective rather than hermetically sealing off each season?
Pos actually brings up a good point. That’s a HUGE diff in Crawford’s WAR depending on which website you’re looking at.
I brought up a similar point the other day with Wainwright leading in WAR at BR, yet 5th, by a long shot, here at FG.
The problems of such variance of the “same overall value stat” are obvious … and perhaps significant.
Comment by CircleChange11 — August 22, 2010 @ 7:32 pm
Here is the problem. You have 3 models that estimate a player’s defensive contribution. All 3 models have some strong points and some negatives, and are based on some credible analysis. The true value of a given player’s defensive contribution is unknown. For some players, the models give similar results. For others, they vary quite a bit.
In general, when you have several models, the proper thing to do is to average them. If you have more than 3 models, you might drop any outliers. This is done in meteorology in hurricane forecasting.
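A minimal sketch of that averaging idea, in Python. The function name, the outlier rule (distance from the median), and the sample numbers are all illustrative assumptions, not anything either site actually computes.

```python
from statistics import mean, median

def consensus_runs(estimates, max_dev=None):
    """Average several run estimates for one player.

    If max_dev is given and there are more than 3 estimates, values farther
    than max_dev runs from the median are dropped first. That outlier rule
    is a hypothetical choice made for illustration only.
    """
    if max_dev is not None and len(estimates) > 3:
        mid = median(estimates)
        kept = [e for e in estimates if abs(e - mid) <= max_dev]
        if kept:  # never drop everything
            estimates = kept
    return mean(estimates)

# Made-up numbers: two systems that disagree badly on one fielder.
print(consensus_runs([9.0, -6.0]))                       # 1.5
# Four systems, one far from the rest: the 30.0 is dropped before averaging.
print(consensus_runs([4.0, 5.0, 6.0, 30.0], max_dev=5))  # 5.0
```

With only two or three systems there is nothing to drop, so this reduces to a plain average.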
At the very least, WAR should be based on an average between DRS and UZR.
Also, people still fail to understand that single-year WAR is not meant to measure ability, but to determine what a player has done in a given year. If you are looking at ability, you need 3 years of fielding data. If you are looking at what a player did in 2010, you look only at the data from 2010. If UZR is higher or lower than what a player’s true ability is for a given year, it may be due to luck, the player playing better or worse than normal, or whatever. The same holds true for offensive stats. Players sometimes perform better or worse than what the multi-year data say is their true ability.
While both sites use similar methods for position players, pitchers are evaluated completely differently. B-R also combines pitching and hitting for its overall leaderboard, which is why Wainwright ranks so high. Halladay is still a full win above Wainwright in just pitching value.
It was explained to me that BR also uses something like “runs allowed” while FG just uses FIP.
Just using Walks, K’s, and HR for pitchers doesn’t seem to be very complete.
Wainwright’s ERA is consistently 0.5+ runs below his FIP. Using FIP will always result in AW50 having less WAR by that method. That’s why AW50 ranks higher at BR.
Being a “good” hitting pitcher, as well as a good fielding pitcher, should factor into your WAR.
I’m not arguing that AW50 is as good as RH34, but the margin is closer than what is represented at FG, IMHO.
Comment by CircleChange11 — August 22, 2010 @ 8:57 pm
“If UZR is higher or lower than what a player’s true ability is for a given year, it may be due to luck, the player playing better or worse than normal, or whatever.”
If by ‘whatever’ you mean ‘it may be that UZR doesn’t measure ANYthing’ then you’re correct.
Comment by Angel Pagan's agent — August 22, 2010 @ 9:08 pm
Rally uses total runs allowed for performance, then finds the number of runs that a replacement-level pitcher would allow, with ballpark, league, and defensive adjustments. It shows actual value, where FIP is more of an expected value.
FG also has pitchers’ hitting value, but it’s not shown in leaderboards. I like the high-positive positional adjustment used on B-R though. Pitcher’s replacement at-bats are by other pitchers, not position players, so a lower replacement level is needed.
Is it just me or would “actual value” be more important than “expected value”?
In a similar discussion to the one that’s taking place at TT’s blog … I would favor “averaging the WAR values”. TT made the comment that it would be better to be “half right” than possibly “all wrong”. I agree with that. (It was in response to Josh Hamilton’s fielding metric variance.)
For example, there’s a HUGE difference between Tim Hudson’s rWAR (5.6) and fWAR (2.6). Which is the most accurate representation? Is his season performance CY worthy? Or just merely “a little above average”?
Note: The WARs have changed since I last looked them up, so now the order of the pitchers, using both methods, is generally “the same”.
Comment by CircleChange11 — August 22, 2010 @ 9:51 pm
I like that idea. I also like the idea of expressing defense in a range (something like -7 to +7). I just think, as popular as this site is, there needs to be something that shows that UZR or TZ is not as reliable in a small sample as offense is. I think many of the disagreements start over people saying so-and-so was worth 9 runs using UZR and -6 runs using TZ. People look at that and say they can’t both be right, so all defensive metrics should be ignored. They shouldn’t be ignored, but I think the unreliability of the metrics should be acknowledged in their measurements. I also think they have to be regressed. UZR/150 is the same as looking at a guy who had 2 home runs after the first game and saying he’s on pace to hit over 300 home runs. Get rid of UZR/150 and add a metric error column.
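The “on pace” arithmetic behind that home run comparison can be sketched in a few lines of Python. The `pace` helper is hypothetical, and it prorates by games played, which is a simplification of how a per-150 rate would actually be scaled.

```python
def pace(total, games_played, season_games):
    """Naive straight-line extrapolation of a counting stat to a full season."""
    return total / games_played * season_games

# 2 home runs after 1 game, prorated to a 162-game season.
print(pace(2, 1, 162))  # 324.0

# Made-up fielder: +3 runs after 24 games, prorated to 150 games the same way.
small_sample_rate = pace(3.0, 24, 150)
```

The extrapolation treats a tiny sample as if it were the player’s true rate, which is why a regressed number or an error range alongside it would be more honest.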
NL pitchers aren’t the best example, as of current, because Halladay is just so good (great pitcher, most innings). A couple of weeks ago AW50 was atop rWAR, while RH34 had a 1.5 WAR lead in fWAR. Now, RH34 leads both, and it’s a done discussion.
AL pitchers are much different. Namely, IMHO, FG and FIP over-value the walk (or specifically the lack of walks). With FIP, you’re better off giving up a double than a walk. That shows in the difference between Cliff Lee’s rWAR and fWAR.
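To make the walk-versus-double point concrete, here is a small Python sketch of the commonly published FIP formula. The constant of 3.10 is an assumed placeholder (the real value is set each season so league FIP matches league ERA), and the stat line is made up.

```python
def fip(hr, bb, hbp, k, ip, constant=3.10):
    """Fielding Independent Pitching, commonly published form:
    (13*HR + 3*(BB + HBP) - 2*K) / IP + constant.
    The default constant is an assumed placeholder, not an official value.
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant

# Made-up season line: 15 HR, 40 BB, 5 HBP, 180 K in 200 IP.
base = fip(hr=15, bb=40, hbp=5, k=180, ip=200.0)

# One extra walk raises FIP by 3/IP.
one_more_walk = fip(hr=15, bb=41, hbp=5, k=180, ip=200.0)
print(one_more_walk - base)

# One extra double changes nothing: doubles are not an input to the formula.
```

Singles, doubles, and triples never appear in the formula, so by construction a walk costs the pitcher something and a double costs him nothing.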
I like the average much better, as while I feel RH and CL have been the best pitchers in each league, they are not a full win better than the 2nd best pitcher. Their fWAR is such because of walks alone.
Comment by CircleChange11 — August 22, 2010 @ 10:09 pm
Right, there are some significant differences in pitching. For FanGraphs it’s FIP and for B-R, it’s ERA adjusted by team TZ. Rally could fill you in on the details. I’m not exactly sure how it’s done, but there’s a post somewhere on either baseballprojection.com, or his old blog.
The reasoning behind us using FIP is that it takes defense out of the equation entirely because we’re already accounting for defense in UZR and do not want to double count anything.
I actually think going the tERA route would be a pretty decent compromise since it effectively ignores fielding, but also accounts for batted ball tendencies. The version of tERA we have on the site is I believe less predictive than FIP, but it does match up with ERA better.
Defense is something which I don’t think is considered enough when it comes to pitching stats. When it comes to Wainwright and why his ERA is consistently lower than his FIP, how much of it is his groundball tendencies and how much of it is because of a good infield defense. I’m sure it’s a combination of the two.
I’m not sure looking at his ERA and comparing it to his FIP and saying it’s always lower necessarily says anything about him as a pitcher. If you take someone like Justin Masterson (60%+ GBs), and you put the world’s best defense behind him and every year he outperforms his FIP by 1 or even 1.5 runs, what would you say about him?
In terms of adding in pitcher’s hitting WAR into pitching WAR, I’m personally not a fan of that. Maybe I can add an option to combine them, but I imagine they’ll remain separate by default.
I agree that actual value is more important. What really causes separation in the 2 values is luck. When Hudson allows a BABIP of .241, he’s going to allow a lot fewer runs than what is expected in DIPS theory. In the near future, I’ll see if Rally WAR or FIP WAR better predicts the next year’s Rally WAR. That may be the only advantage of FIP WAR over the Rally system.
I’m sorry, but can’t a simple gander at both formulas explain the great divide?
And I think I’d prefer a system in which UZR wasn’t rated as highly. The great potential fluctuations in defense on a year-to-year basis can make WAR a tad jumpy, when we’re really trying to find out who exactly is going to be a solid 2.5 WAR player on a year to year basis.
It doesn’t matter so much for, I don’t know, Albert Pujols, since determining whether he’s a 6.5 or a 7.0 WAR player doesn’t really make that much of a difference to his bottom line, but it kind of matters when seeing if someone like Austin Kearns is a 1.0 or a 2.0 WAR player.
How about some kind of weighting of this season’s UZR vs. average total UZR/150 for the past three seasons?
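One way such a weighting could look, sketched in Python. The function name and the default weight of 0.4 are hypothetical, and the sketch assumes both inputs are already on the same per-150 scale.

```python
from statistics import mean

def blended_uzr150(current, past_three, weight_current=0.4):
    """Blend this season's UZR/150 with the mean of the past three seasons.

    weight_current is a hypothetical knob: 1.0 means "this season only",
    0.0 means "past three seasons only".
    """
    if not 0.0 <= weight_current <= 1.0:
        raise ValueError("weight_current must be between 0 and 1")
    return weight_current * current + (1 - weight_current) * mean(past_three)

# Made-up player: +10 this year, dead average the three years before.
print(blended_uzr150(10.0, [0.0, 0.0, 0.0], weight_current=0.5))  # 5.0
```

Picking the weight is the real question. It could reasonably grow as the current season’s sample gets larger.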
I feel like that would help give a better read on what to expect from players. I’m just glancing through Miguel Cabrera’s page, and I think some kind of overall adjustment with defense’s calculation in WAR would give me a much better idea what type of player he is, value-wise.
Although I guess on a personal basis I really just look at hitting and see someone’s defense as a gold star or black mark that doesn’t really take that much away from my opinion of someone most of the time. But that’s not the point right now.
“I actually think going the tERA route would be a pretty decent compromise since it effectively ignores fielding, but also accounts for batted ball tendencies.”
“When it comes to Wainwright and why his ERA is consistently lower than his FIP, how much of it is his groundball tendencies and how much of it is because of a good infield defense. I’m sure it’s a combination of the two.”
I’m delighted (delighted I tell you) to hear someone say “a combination of both”, rather than just saying “His ERA is only good because of Brendan Ryan”. My only real complaint about FIP is that it does remove batted ball stuff, and I don’t view pitchers as being completely separate from the team. The pitcher and defense work together, very intimately.
How they pitch hitters, how they shade hitters, it’s very much like a “defensive scheme” in football. If the pitcher doesn’t “pitch to the defense” or “stick to the plan”, he puts his team in a bad spot.
One could also look at Wainwright’s situation, and state that because of Wainwright, Carpenter, and Garcia being so groundball heavy, that is the reason why StL sacrifices so much offense at the SS position … because overall, it’s good for the team. Likewise, Ryan is so much more valuable on a team like this than he would be on a “K or fly ball” staff. Very similar to “John Tudor and Ozzie Smith”. Perfect pitcher for that stadium and defense. It wasn’t just that “John Tudor was lucky to have that defense” as much as it was “John Tudor was a master LHP at getting RHBs to hit the ball on the ground to Ozzie.” Anyway …
“If you take someone like Justin Masterson (60%+ GBs), and you put the world’s best defense behind him and every year he outperforms his FIP by 1 or even 1.5 runs, what would you say about him?”
I would say you schemed your defense toward a strength. It would be like punishing a QB for having a good running back to balance out the O. Everything works together. But, I do agree, that if a pitcher is primarily dependent on good defense, that having a good defense would make the P look better than he likely is. But having predominantly good pitchers, and putting a poor defense behind them, would be terrible management by the organization. You’re taking a strength and turning it into a weakness, or at least average.
It’s been pointed out that Peyton Manning has the only O-line that has been together for something like 8 years, but we don’t hold that against him even though we know he’d be “less great” if he had a porous O-line.
It does seem, at times, that smart team management and valuing players clash. It’s a tough deal to take all these stats and situations and factor them into one value stat. I think, for the most part, WAR works as intended.
Comment by CircleChange11 — August 22, 2010 @ 11:56 pm
As part of a discussion at TT, Adam Wainwright’s WAR is not different with fWAR or rWAR, at least not to any significant degree.
The only way that it is drastically different is in an extreme situation, such as a really low BABIP, or a really low HR rate, or walk rate.
Comment by CircleChange11 — August 23, 2010 @ 12:25 am
I could’ve sworn it was here that I read that tERA was quite a bit better as a predictive tool than FIP, and only slightly worse than xFIP.
Is it any wonder that many people, including specifically those who are inclined toward sabermetrics, zone out on these conversations because there is no consistent or accepted measurement? WAR, UZR, TZ, etc., etc. can’t be taken seriously when there is such a large gap between each rating system. I appreciate the enthusiasm, but when someone on a bulletin board starts screaming player A is better than player B because of WAR, then they better be able to explain why they like that rating system better than another, and why I should accept that rating system compared to another. Right now there’s more nonsense in the stats community than there is in the regular baseball community. Let me know when you have it all figured out. (Note: I’ve been following advanced statistics since the late ’70s when I first started reading Bill James.)
Comment by CircleChange11 — August 23, 2010 @ 1:40 am
I try not to start with WAR when comparing players A and B. I think that’s not a good way to win a debate. I’d rather start with offensive run contribution because it’s fairly easy to compare the players on that basis. It’s a lot easier to come to an agreement on that. Let’s say that player A is 5 runs better than player B.
After that, I like to look at a few different defensive statistics, and if there is a lot of variation I’ll also look at past years. One thing I like to try to do is average stats the way David has done in a couple of recent posts. Based on that, I can make a judgement as to whether player B makes up the 5 runs with his defense. There is no definitive answer though. There is still room for debate, which is not a bad thing.
Well, typically my thinking has been that the ground-ball tendencies of pitchers are already factored into WAR with FIP because more ground balls = fewer HRs = lower FIP = higher WAR. LD%, on the other hand, is something I would be more interested in seeing incorporated into FIP, so I suppose this is a moot point and tERA would nonetheless be more effective.
Great article…. given the variability of the defensive statistics (especially for one season or less), it seems the variability in WAR is overlooked as these #’s are used to parse trades, contracts, free agent pickups.
Some sort of regression is clearly needed in the defensive component of WAR. A complex system may look at chances per position and use the past “X” seasons to get a regressed defensive component of the WAR calculation (for example, a shortstop will not need the same length of time to regress as, say, a right fielder).
For players without enough service time, perhaps use a weighted regression between actual stats and a league-average player (similar to regressions done for platoon splits with limited plate appearances).
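A rough Python sketch of that kind of weighted regression toward a league-average player. The `stabilizer` value (the number of chances at which observed and league-average performance get equal weight) is a hypothetical placeholder, not a measured constant.

```python
def regress_to_mean(observed_rate, chances, stabilizer, league_rate=0.0):
    """Shrink an observed defensive rate toward the league rate.

    The weight on the player's own data is chances / (chances + stabilizer),
    so few chances means heavy regression and many chances means little.
    stabilizer is a hypothetical value chosen for illustration.
    """
    weight = chances / (chances + stabilizer)
    return weight * observed_rate + (1 - weight) * league_rate

# Made-up fielder rated +10 runs per 150 games, league average taken as 0.
print(regress_to_mean(10.0, chances=300, stabilizer=300))  # 5.0
print(regress_to_mean(10.0, chances=0, stabilizer=300))    # 0.0
```

Because the weight depends on chances rather than calendar time, a position that sees more chances per game would need fewer seasons to earn the same weight.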
That’s not a bad way to look at the situation and would be a nice method for the time being. Having an offensive WAR that only compares offensive ability (a much more measurable stat at this point) and separating most analysis of a player into offensive and defensive parts makes the most sense IMO.
It is nice to have an all-encompassing stat; however, it’s obvious the variability of defensive metrics makes it impossible to measure the true value of a player to his team and the true talent of a player during a given season, which is what most of us are looking for during our analysis.
Saying Carl Crawford is a 5.6 WAR player is nice, but it doesn’t mean much when there is so much variability in the number. With pitchers it’s even more difficult, as they are dependent on their defense, which affects the “value” of their pitching. Separating WAR and other metrics into offensive and defensive components should help with the valuation and create an atmosphere of discussion that doesn’t hinder itself along one or two all-encompassing statistics.
I for one feel this site is amazing and would love more articles that focused on a player’s offensive and defensive worth separately rather than “Player A is 5.6 WAR and Player B is 5.0 WAR, therefore Player A is better and the GM made the right move by signing him.” It’s just not that simple.
Comment by Don Headly — August 23, 2010 @ 10:34 am
“We all know that the larger the sample size the better, ”
That’s not true at all.
A larger sample size is better when you’re measuring something where true talent is not changing. It’s possible that UZR fluctuates so much because players actually play better defense some years.
In that case, a larger sample size is just combining unlike data.
“When it comes to Wainwright and why his ERA is consistently lower than his FIP, how much of it is his groundball tendencies and how much of it is because of a good infield defense. I’m sure it’s a combination of the two.”
Placing a GB% pitcher in front of a better defense shouldn’t make the pitcher more valuable and it shouldn’t make the defenders more valuable. Ultimately, the credit of placing a good defense behind a groundball pitcher or placing a groundball pitcher in front of a good defense should fall to a general manager. The same should apply to relief pitchers; the proper leveraging of a relief pitcher should be a boon to the manager’s WAR, not the pitcher’s.
Given that “UZR tries to record a player’s likely true talent and estimate his future performance based on the nuances of the batted ball and the player’s response to those nuances. It is not trying to capture exactly what happens on the field according to some arbitrary categories” (quoted from http://www.fangraphs.com/blogs/index.php/the-fangraphs-uzr-primer/#15), why is UZR included in WAR at all?
I see why you would want to count the player’s defensive contributions, but UZR by definition doesn’t do that. How can we say that so and so has been worth 4.5 WAR, with 2.5 wins coming from offense and 2 wins coming from defense when “A player’s UZR does not necessarily tell you how he actually played”?
Comment by lifewontwait — August 23, 2010 @ 4:41 pm
“The same should apply to relief pitchers; the proper leveraging of a relief pitcher should be a boon to the manager’s WAR, not the pitcher’s.”
I really like the idea of manager’s WAR
Comment by lifewontwait — August 23, 2010 @ 4:43 pm
The answer is too much.
See a guy like Jeter, who gets too much credit for standing out at SS when he’s been brutal there. Those +18s should be triggered by at least being average.
He’s something like the 334th best hitter ever (per OPS+) and is an abominable defensive SS, yet he’s like 52nd in WAR. That just proves how flawed the extra bonus is. His 120 OPS+ isn’t exactly good and he’s a terrible defender … does that make him one heck of a baserunner?
Andrelton Simmons’ rWAR per plate appearance ranks him 7th all-time (among Ruth, Bonds, Mays, Williams, etc.), but his fWAR per plate appearance ranks 74th all-time. Therefore, it seems defense is rated more highly by rWAR … Perhaps Crawford’s basestealing skills were weighted more highly by fWAR?