Descriptive and Predictive Metrics

This morning, Buster Olney took to Twitter to offer some more thoughts on WAR, which I think we can surmise is not his favorite statistic. At the risk of making a habit out of responding to Buster’s Twitter messages with FanGraphs posts, I did have some thoughts about a few of the things he mentioned, and those thoughts are longer than 140 characters, so I’m putting them here. The subject is worth discussing anyway, and hopefully we can articulate some points about WAR and various other metrics in a way that helps bridge gaps that may currently exist. At least, that’s my goal.

Let’s start off with Buster’s comments:


#1: Teams will invest in players based on OBP, ERA, defense-independent P numbers, home/road splits, left/right splits, etc. They don’t use WAR.

#2: GM said: “I don’t need stats that tell me what happened; I already know that. I need stats that tell me what’s going to happen.”

#3: You’re missing my point, too; if the smartest in the game don’t use WAR, don’t care about it, shouldn’t that tell us something?

Whether or not teams are using WAR in their evaluations is an argument with no resolution – Buster will cite the people he talks to who don’t use it, I’ll cite the people I talk to who do, and we’ll just end up in a stalemate. That point isn’t worth debating. The broader point Buster is making, however, is worth talking about. The assertion (as I read it, at least) is that WAR, a descriptive metric of past events, is not useful to teams as a predictive metric of future events, and therefore it should have limited value to the public as well. I think that criticism is a bit unfair, because WAR is being assailed for not doing something it was not designed to do.

The GM is completely correct in stating that, for his job and the decisions it requires, what matters are metrics that can predict what will happen and not just aggregate what did happen. When trying to decide whom to trade for or sign as a free agent, he shouldn’t just look at a list of prior-season WAR and get the guy at the top. WAR was not designed as a projection system; it is a retrospective look at what a player produced in a given timeframe.

Any team trying to make decisions about roster construction needs to factor in myriad variables about multiple players. Past performance certainly has to be a significant part of the calculation, but so do age, body type, skill set, health, how a player would fit in a particular ballpark… there are numerous things that don’t go into WAR that will significantly affect expectations of future performance. That’s why people like Dan Szymborski invented systems like ZiPS, so that we could take past performance, adjust for the other factors, and come up with a projected line that tells us more about what may happen than simply looking at past results.

Whether they’re using ZiPS specifically or (far more likely) their own in-house projection system built on proprietary data, I guarantee you that the smart GM Buster talked to is looking at that kind of information when deciding which players to acquire, because that’s the tool he needs to figure out how to build out his roster. Projection systems are designed to answer that specific kind of question. Prior-season WAR is not.

Projection systems can offer you future expected WAR, and that can be valuable to a GM. But there are times when we’re not asking what a player’s projected WAR is going to be, and in those cases past-season data is more applicable.

The postseason awards are the obvious area where you’d want the best descriptive metric possible, and you couldn’t care less about a projection of future performance. When determining who should win the MVP, it doesn’t matter whether Jacoby Ellsbury or Jose Bautista is more likely to produce similar results in 2012. A projection system would be the wrong tool for the question at hand.

Just like a GM shouldn’t focus on prior season WAR when making personnel decisions, baseball writers shouldn’t focus on projection systems when filling out their awards ballots. They are asking two very different types of questions, and there are different tools that help answer different types of questions. Some metrics are predictive, some are descriptive, but let’s not lampoon one for not being the other.




Dave is a co-founder of USSMariner.com and contributes to the Wall Street Journal.


82 Responses to “Descriptive and Predictive Metrics”

  1. Mike G. says:

    Excellent, well thought out, and reasoned piece Dave. Thanks for posting.

  2. suicide squeeze says:

    Responding to Buster’s tweets in 140 characters or less:

    1. What do you think makes up WAR?
    2. Most projection systems use past results extensively
    3. They don’t use WAR because they have even better metrics they didn’t come up with in their mother’s basement

  3. theonemephisto says:

    Do you know what a couple great predictive statistics are? ERA, Wins, Saves, RBIs, AVG.

  4. Hurtlocker says:

    No baseball statistics can predict what a player will do or accomplish in the future. Every player is an injury away from ending a career, no matter how promising. Every decision is based on the “hope”, for lack of a better word, that the player will continue to perform at a certain level. Sometimes you get lucky (Bautista) and sometimes you don’t (Zito and many, many others).

    • Telo says:

      Well… it depends on how you define “predict”. Can we know exactly, down to the at-bat, what a player will do in the future? Of course not. Not unless you have a map of every atom in the universe, and even then, there’s the human element. (This was an interesting discussion at Tango’s blog a month or two ago.)

      But if you use this definition of predict:
      “to foretell on the basis of observation, experience, or scientific reason”

      Then you certainly can. And many, many people do so very well. Every year there are improvements to these models. We are getting close to the ceiling of predictive power using the numbers we currently have access to, but technology is always progressing. Things like Hit and Field FX will revolutionize sabermetrics and the models we use to predict performance… assuming we have access to the data.

    • Blue says:

      If you mean predict in some absolute sense, you are correct. But projection systems can absolutely make probabilistic predictions and place error bands around them.

  5. Brendan says:

    Isn’t this the same Buster Olney who was crying about the fact that people were using FIP, which he himself dubbed a predictive stat, to discredit Trevor Cahill’s Cy Young campaign in 2010?

  6. Telo says:

    Good rebuttal. Truthfully, Buster was trolling, knowingly or not. To criticize something without completely understanding it is… well, pretty useless.

    Some people have this strange mental block where they are incapable of truly understanding the point of sabermetrics… Someone put it better than I can: the number one principle of sabermetrics is that every number and stat has a role, and what’s more important than the number is interpreting what that number actually means. It’s so easy to misuse or misunderstand what exactly a number is telling you, and Buster did exactly that.

    • Telo says:

      As poor as Seidman’s article was this morning, it’s closely related to that idea. Except that, beyond the context of a number among other similar numbers (Bonds’ slugging compared to Williams’ slugging), it’s about knowing and understanding what the number represents, and how it fits in the context of the game of baseball.

    • “Truthfully, Buster was trolling, knowingly or not. To criticize something without completely understanding it is… well, pretty useless.”

      The only way for this quote not to be hypocritical is if Telo is Buster Olney (or God, I guess).

    • Tim_the_Beaver says:

      perhaps you are remembering one that I appreciate:

      IF you intend to look at numbers, THEN this is one of the best ways you would look at them.
      totally up to you if you are going to use numbers.

      http://www.insidethebook.com/ee/index.php/site/comments/how_to_sell_a_stat/

  7. Eminor3rd says:

    Why does BABIP factor into WAR at all? Is BABIP useful for anything other than predictive statistics?

    • Telo says:

      WAR tells you what happened; it’s a backwards-looking stat. BABIP is what happened. So it makes sense not to ignore it… for hitters, at least.

      [Insert long discussion about FIP and fWAR for pitchers here]

      BABIP is extremely complex – the dark matter of sabermetrics, if you will. We have such a hard time separating the noise from the data. As our data and methods get better and better (HitFx ahem) we will begin to really understand players’ fluctuations in BABIP.

      • Eminor3rd says:

        But in terms of what actually happened, how does it affect on-field value? The guy got a hit, and it helped the team. Whether or not it was a lucky hit (IMO) shouldn’t change the fact that it was a hit. Player X provided Y value over Z time, no matter what his BABIP was. I don’t know why his BABIP changes our measure of the value he provided.

  8. Cloud Computer says:

    I like Buster and Fangraphs! Can’t we all live in peace?

    • Telo says:

      Buster is great for what he is. But just because he’s one of the most successful baseball beat writers ever doesn’t mean he thoroughly understands all facets of sabermetrics. We are all equal in the eyes of math and logic, and Buster has shown indications of not totally grasping some major pillars of the sabermetrics curriculum. I think he would probably admit as much.

    • theonemephisto says:

      Buster is a great reporter. He is not so good at analysis of value and understanding statistics.

    • Scott Clarkson says:

      I really wish Buster would just stick to scoops, quotes, and classic “beat writer” stuff.

      Whatever “analysis” he brings to the table is usually parroted opinions.

      He’s only showing how threatened he is by stats that he apparently just can’t comprehend by impugning them out of context the way he does.

      ESPN gave him so much false credibility by giving him Peter Gammons’ space on their page and Gammo he is most certainly not.

      • Telo says:

        “He’s only showing how threatened he is by stats that he apparently just can’t comprehend by impugning them out of context the way he does. ”

        Yep.

    • baty says:

      Buster has always been a great talking head. He’s well connected, has access to rumors and speculation, and knows how to break news before it becomes public. Any insight he offers beyond his inner-circle scoops should be taken very lightly.

      • baty says:

        The only point you can take from his comments is that he has information about conflicting organizational opinions on how to evaluate talent and production. He should stick to relaying what those methods are. His opinion is essentially meaningless.

    • Notrotographs says:

      I think Buster is still sore over his productive out percentage getting laughed off the face of the earth.

  9. Ryan says:

    Affect* ;)

  10. jim says:

    good post, dave, and a point that is often misunderstood about advanced stats

  11. Jason says:

    If WAR isn’t supposed to be a predictive stat and only pertains to what actually happened in the past, then why do you guys use FIP and such in your pitchers’ WAR? Or if you want to keep pitcher WAR fielding-independent and negate the luck factor, how come you don’t do the same for hitters by neutralizing their BABIPs to some extent? Seems a little strange to me.

    • Telo says:

      The very very short answer is… a pitcher has less control over his BABIP than a hitter does.

      Long answer:

      It’s helpful to think of the situations as separate, as BABIP means something different to pitchers and hitters.

      Things that affect BABIP:
      - how hard the ball is hit
      - where on the field the ball is hit
      - whether it’s hit on the ground, air, or LD
      - speed of batter
      - fielders
      - park

      Now, let’s look at each of these and see how much control the batter and pitcher have over each, and whether or not they should be penalized for the factor:

      BATTER
      - how hard the ball is hit – lots of control
      - where on the field the ball is hit – some control
      - whether it’s hit on the ground, air, or LD – a good amount of control
      - speed of batter – complete control
      - fielders – zero
      - park – zero

      PITCHER
      - how hard the ball is hit – little/some control
      - where on the field the ball is hit – little control
      - whether it’s hit on the ground, air, or LD – some control
      - speed of batter – zero
      - fielders – zero
      - park – zero

      BABIP is out of the pitcher’s control in many of these factors, and he really shouldn’t be penalized by a bad defense, bad park, above average speed batters (if for some reason he faced way more fast hitters over a whole season – not likely, ignore this). And even over the factors he does influence, he doesn’t have complete control.

      • RC says:

        “and he really shouldn’t be penalized by a bad defense, bad park, above average speed batters ”

        Right. The problem with FIP is that by ignoring these things, it absolutely does penalize the pitcher for being in these parks, etc.

        Park factors affect Ks, BBs, HRs, etc.

        Also, because FIP uses IP as a denominator, and 70% of IP is balls in play, it doesn’t ignore defense at all.

      • RC says:

        “if for some reason he faced way more fast hitters over a whole season – not likely, ignore this”

        Another thing. We make assumptions and statements like this all the time (i.e., “It’ll even out over the course of the season”), and yet, in the vast majority of cases, it doesn’t.

        I remember looking at this last year, and the OPS of the hitters that pitchers had faced varied by more than .100. That is, some pitchers’ average opponent was a .680 guy, while for others it was a .780 guy.

    • test says:

      I assumed this article would be defending the use of FIP in FanGraphs WAR rather than runs allowed. FIP is useful because it is a better predictor of future runs allowed than past runs allowed is. But FIP obviously does not do a better job of describing past runs allowed.

      In this sense, fWAR for pitchers is trying to be a predictive stat, along the lines of “if everyone pitched the same on the controllable skills and we ran the season 10,000 times, who gets the best results”, rather than “who got the best results, lucky or not”.

      • Telo says:

        That’s an interesting way to frame the question. The other way to look at it is this: we are taking 30% of what the pitcher did in the past and REALLY nailing it. He was X good. We can do that, or we can say, well, this is 100% of what the pitcher did, but this data is noisy as hell, close to useless. What would you rather base your WAR on?

        The fact that FIP happens to be a better predictor of future ERA/RA than ERA/RA itself is irrelevant, and really just shows you that ERA has serious deficiencies as a stat.

      • theonemephisto says:

        As Telo says, the problem with ERA or RA/9 is that there’s so much noise and so much outside of the pitchers control.

        FIP isn’t really predictive. It’s more of a “what should have happened given league-average defense”, while xFIP is the “what is going to happen” stat. ERA and RA/9 are the pure “what happened” stats, but they also include what the fielders did.

      • RC says:

        “FIP isn’t really predictive. It’s more of a “what should have happened given league-average defense”, ”

        No, FIP absolutely is not that. Because of the IP denominator, FIP goes past neutralizing BABIP, and actively punishes players who have skill at depressing BABIP.

        FIP is trying to determine skill level by using what they consider “under a pitcher’s control”.

      • Bronnt says:

        The FanGraphs authors themselves have acknowledged that FIP is problematic for use in WAR. You’re still basically ignoring 70% of what actually happened, part of which is still in the pitcher’s control. ERA is problematic, RA/9 is problematic, and FIP is problematic. So we really can’t trust WAR with any exactitude for pitchers.

        It’s for this reason that I dislike seeing Roy Halladay’s case for MVP being based on his 0.5 lead in WAR. If you take pitcher WAR with a grain of salt, and then provide a little margin for error in the UZR portion of position player WAR, then Roy Halladay isn’t any stronger a candidate than Matt Kemp, Joey Votto, or Justin Upton.

        It’s not that I’m categorically against pitchers winning the MVP, but I think they need to be sufficiently dominant that you can feel quite confident in their case. Vintage Pedro, maybe Clemens in ’97, Greg Maddux in ’95, and beyond that, essentially no one else.

      • Kevin S. says:

        RC, I think you’re overstating the impact of FIP using IP as a denominator. I don’t feel like re-doing the math, so I’m just going to C&P an example I used on HBT last week:

        Justin Verlander this year has 212 K, 48 UBB+HBP, and 17 HR in 803 TBF. Thanks to randomness (and in spite of the Tigers’ decidedly below-average defense), he has a .232 BABIP, meaning he’s gotten outs on 404 of the 526 balls in play he’s allowed (he’s also gotten 13 outs that weren’t on strikeouts or BIP; we’ll add those back in at the end). Let’s say he had a league-average BABIP, which this year would be .290. 29% of 526 BIP is 153 H, leaving 373 outs. Verlander now has (212 + 373 + 13)/3 = 199.1 IP (that is, 199 1/3 innings in box-score notation). By comparing Verlander’s component FIP to his posted FIP, I get the FIP constant to be 3.00 this year. His adjusted FIP is now ((13*17) + (3*48) - (2*212))/199.33 + 3.00 = 2.70, two hundredths of a point off his actual 2.72 FIP. Much of that insignificant difference can be attributed to randomness, not fielding…

        TBF is more technically sound (and it made me very happy when FG added K% and BB%), but in practical application there is very little gained from using (K + (1-lgBABIP)*(TBF-K-UBB-HBP-HR))/3 as the denominator instead of IP.

        Verlander’s numbers may be a start or two out of date, but the point still stands.
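The arithmetic in Kevin S.’s comment can be checked with a short Python sketch. It assumes the standard FIP weights (13 for HR, 3 for unintentional walks plus HBP, 2 for strikeouts), the 3.00 constant and .290 league BABIP he estimated, and the counting stats exactly as quoted; the helper name `fip` is purely illustrative, not an official FanGraphs function.

```python
# Sketch: does swapping in a league-average BABIP move Verlander's FIP much?
# Inputs are the figures quoted in the comment; the 3.00 constant and .290
# league BABIP are the commenter's estimates, not official values.

def fip(hr: int, bb_hbp: int, k: int, outs: int, constant: float = 3.00) -> float:
    """FIP = (13*HR + 3*(UBB+HBP) - 2*K) / IP + constant, where IP = outs / 3."""
    innings = outs / 3
    return (13 * hr + 3 * bb_hbp - 2 * k) / innings + constant


K, BB_HBP, HR = 212, 48, 17
BIP, BIP_OUTS, OTHER_OUTS = 526, 404, 13   # OTHER_OUTS: outs that were not K or BIP
LEAGUE_BABIP = 0.290

# Actual outs recorded: strikeouts + outs on balls in play + other outs.
actual_outs = K + BIP_OUTS + OTHER_OUTS                # 629 outs, 209 2/3 IP

# Hypothetical outs if his balls in play had fallen for hits at the league rate.
neutral_hits = round(LEAGUE_BABIP * BIP)               # 153 hits
neutral_outs = K + (BIP - neutral_hits) + OTHER_OUTS   # 598 outs, 199 1/3 IP

print(f"Actual FIP:        {fip(HR, BB_HBP, K, actual_outs):.2f}")
print(f"BABIP-neutral FIP: {fip(HR, BB_HBP, K, neutral_outs):.2f}")
```

With these inputs it prints 2.72 and 2.70. The gap stays tiny because the weighted numerator is small here (-59), so a change of roughly 10 innings in the denominator moves the result by only about two hundredths.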

    • sean says:

      Because FIP isn’t a predictive stat; it uses what has happened. xFIP is the predictive stat: it replaces the pitcher’s HR/FB rate with a standard number and says that if the pitcher continues to strike out/walk/get fly balls at the same rates as before, with neutral luck his ERA should be X.XX.

      • test says:

        This isn’t true in a meaningful sense. Runs allowed is the bottom-line result for pitchers – a skill like striking people out, not walking them, limiting home runs, consistently limiting BABIP (for the few exceptions), or getting GB instead of FB is useful only because having it turns out to lower the number of runs given up. FIP (or xFIP) looks at the components it does because they are the best way to predict future results (i.e., runs allowed) relatively simply. To do so they of course use things that have already happened, but that doesn’t mean they aren’t basically predictive stats.

        As stated in the articles Dave linked to, it’s a matter of personal preference. I prefer that lucky players get credit for the lucky results, even if it’s unlikely they will ever have the same results again.

        I don’t mind checking both sites, and I don’t think bbref does it perfectly either, but in this case I find their pitcher WAR more useful as a measure of who had the best results in a given year. If I wanted to pick a player for next year, I’d come back here. Six of one, half a dozen of the other…

      • RC says:

        FIP ignores 70% of what happened. It’s primarily used because it predicts ERA better than ERA does.

        It most certainly is predictive, and not descriptive.

      • theonemephisto says:

        The problem is that you’re confounding the pitcher’s contribution and the defense’s contribution. Those are impossible to separate out in ERA.

      • RC says:

        They’re not separated out by FIP either (as it uses IP).

        And while no, you can’t separate out defense just using ERA, it’s silly to completely ignore balls in play. We know that pitchers have some control over BABIP. Just not full control.

      • Garrett says:

        Hrm. I prob should read thread before ranting. RC wins.

        Good effort DC. You’re improving your knowledge of English. Perhaps Eno Sarris will take note.

      • A guy from PA says:

        Here’s a simple example, RC, that I’m sure you will understand. Say a pitcher has an 8 K/9, a 3 BB/9, and a 1 HR/9 rate. All of those numbers are descriptive. Now, if I combine them into one stat, those numbers are still descriptive. Does it describe everything? No, but it is still a descriptive stat. Just because it doesn’t describe everything and ignores certain things doesn’t make it not descriptive. Just as ERA ignoring things arbitrarily called errors doesn’t make ERA not descriptive, FIP ignoring the small BABIP control a pitcher has does not make it not descriptive.
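To make that example concrete, the three rates can be folded into FIP’s weighted sum. This Python sketch assumes the usual 13/3/2 weights, ignores HBP, and uses an assumed constant of 3.00 (the real constant is recalculated each season); `fip_from_per9` is a hypothetical helper name.

```python
# Sketch: combine descriptive per-9 rates into a single FIP-style number.
# Assumptions: standard 13/3/2 weights, HBP ignored, constant of 3.00.

def fip_from_per9(k9: float, bb9: float, hr9: float, constant: float = 3.00) -> float:
    # Each per-9 rate counts events per 9 innings, so dividing the weighted
    # sum by 9 converts it to a per-inning figure before adding the constant.
    return (13 * hr9 + 3 * bb9 - 2 * k9) / 9 + constant


print(f"{fip_from_per9(k9=8, bb9=3, hr9=1):.2f}")
```

For the 8 K/9, 3 BB/9, 1 HR/9 pitcher this prints 3.67. Every input is a record of what happened; the combination only reweights those records.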

  12. Kevin says:

    Doesn’t the fact that Fangraphs WAR for pitchers is largely FIP based mean that it is…at least sort of predictive as well?

  13. The WAR to end all WARS says:

    My thoughts on Buster aside, I think his questioning of WAR’s usefulness in evaluating a player is valid. He is saying you can’t just look at WAR; you have to go deeper. Victorino’s WAR is 6.1. If I look at that alone, I have no idea what that means for evaluation purposes. I need to break down the components of WAR and see what’s driving the 6.1 WAR.

    That brings up my point. If I have all the components of WAR in front of me, then why do I need a formula with subjective weights and unreliable statistics (cough, I am looking at you, UZR) to synthesize the data for me? We all bring our own subjectivity to the table, and so does WAR.

    I’d rather use my own preconceptions and assumptions because I know what they are. It’s simply harder to understand WARs

  14. drewcorb says:

    I know WAR is a very respected statistic. It is useful because it tells us things that we might not know intuitively from looking at unsophisticated statistics. I don’t know how its validity is verified, though. Can we total the WARs of each player on a particular team to calculate the expected win total of the team and compare that to the team’s actual win total? I recall Buster being skeptical that Ben Zobrist was ranked so high by WAR; part of the argument for WAR is that it told us of Zobrist’s value when we might not see it so obviously. But what test did WAR pass that led us to trust it? What standards were used in its development? So far, the rationale seems logical to me, yet hand-wavy. So I have a hard time trusting WAR charts where many players are within 1-1.5 WAR of each other in a given season, because I know of no standards that quantify the uncertainty of the statistic.

    Sorry for the long comment, but I wanted to make sure I articulated my concern well.

    • MC says:

      Well, each baseball player has 25 guys on it, so 25*(1) or 25*1.5 is quite a variance, wouldn’t you say?

      But of course you can calculate that – click on ‘Leaders’, then click on ‘Team’. Scroll down to your team and add. Rinse/Repeat for pitchers. Add…Here are the Blue Jays

      Bautista – 7.8
      Escobar – 4.0 (11.8 team total)
      Molina/Arencibia – 2.5 (14.3)
      EE – 1.3 (15.6)
      Lawrie – 1.2 already! (16.8)
      Thames – 1.0 (17.8)
      Lind – 0.7 (18.5)
      Snider, Nix, McCoy, Davis, Cooper, Hill – (-0.1) (18.4)
      Rivera/Rasmus : Call it even

      Pitchers

      Scrabble, Jo Jo, Dotel, Frasor – ~0.7 combined
      Morrow – 3.4
      Romero – 2.6
      Villanueva – 1.1
      Rest – 2.3

      Team total=18.4 + 10.1 = ~28.5 WAR

      I believe that a team of replacement-level players would win ~43 games per year, for a per-game win rate of .265 (I may have the replacement team’s wins wrong). The Blue Jays have played 130 games this year, so a replacement-level Blue Jays team would have 34.5 wins. Add the Blue Jays’ team WAR of 28.5, and you get 63 WAR-expected wins. Their actual total is 66, so it appears quite close here. I’ll leave the other 29 teams to you…
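MC’s back-of-the-envelope check can be written out as a small Python sketch. The per-player figures are exactly the ones listed in the comment (Rivera/Rasmus treated as a wash, the four relievers entered as one combined value), and the 43-win replacement baseline is MC’s own hedged assumption, not an established constant.

```python
# Sketch of the team-WAR sanity check: replacement-level wins + team WAR
# versus actual wins. Figures are those quoted in the comment above.

hitters = {
    "Bautista": 7.8, "Escobar": 4.0, "Molina/Arencibia": 2.5, "EE": 1.3,
    "Lawrie": 1.2, "Thames": 1.0, "Lind": 0.7,
    "Snider, Nix, McCoy, Davis, Cooper, Hill": -0.1,
}
pitchers = {
    "Scrabble, Jo Jo, Dotel, Frasor": 0.7, "Morrow": 3.4, "Romero": 2.6,
    "Villanueva": 1.1, "Rest": 2.3,
}

REPLACEMENT_WINS_PER_162 = 43   # assumed baseline (see comment)
games_played, actual_wins = 130, 66

team_war = sum(hitters.values()) + sum(pitchers.values())
replacement_wins = REPLACEMENT_WINS_PER_162 / 162 * games_played
expected_wins = replacement_wins + team_war

print(f"Team WAR: {team_war:.1f}")
print(f"Replacement-level wins over {games_played} games: {replacement_wins:.1f}")
print(f"WAR-expected wins: {expected_wins:.1f} (actual: {actual_wins})")
```

This reproduces the comment’s 28.5 team WAR, 34.5 replacement wins, and 63 expected wins. One team-season can’t validate the method on its own; a check like this only becomes meaningful across many team-seasons.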

      • MC says:

        Sorry..each baseball *team*

      • drewcorb says:

        Where did you get the ~43 wins or win rate of 0.265 for a team of replacement players? I’m assuming this has been done to verify the WAR formula, or some other verification method. I’m just wondering what was done. I doubt people just developed a formula that “made sense” and called it good. It must have been verified sometime before you just calculated the Blue Jays’ win total today. How was that verification done?

      • MC says:

        The number of replacement-level wins is honestly irrelevant. It’s the control group, what each player is compared to. It doesn’t matter if it’s 43 wins per 162 (which is where the .265 comes from: 43/162) or 22 wins or 18 wins; the baseline WAR player is still the same.

        However, all of their information about where these numbers come from is here on the website; you just have to look for it.

      • MC says:

        Here is a quick weblink/article quote

        Second what exactly is replacement level? Where did you come to that baseline?

        All it is is the contributions of players that are not part of the 25-man roster, basically. You can also look at it as the best (non-prospect) AAA players. There’s been many studies on this issue, and the consensus is very close to being 2 wins below average. MGL for example uses 18 runs below average per 150 games. Keith Woolner uses 20 runs per 162 games. I use 2.25 wins below average per 162 games.

        From this weblink here…
        http://www.insidethebook.com/ee/index.php/site/article/mike_silva_chronicles_part_2_war/

      • drewcorb says:

        I’ve read the reasoning and basis for WAR before. I hope I’m not coming off as though I don’t appreciate how well thought out and clever it is. However, I can’t see where it is verified against any sort of standard. I’m not sure how this was done, or can be done. If it has not been done, how can WAR be taken with anything but a grain of salt?

        Also, FanGraphs has 157 offensive players between 7.8 and -2.4 WAR. So an uncertainty of 1 WAR is ~10% of the total range, or roughly 15 players. So 1-1.5 WAR is quite a variance, wouldn’t you say?
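drewcorb’s arithmetic can be reproduced in a few lines of Python. The step from “share of the range” to “number of players” assumes the 157 players are spread evenly between the extremes, an assumption made here only for illustration; wherever players cluster, more of them would fall inside a 1-WAR band.

```python
# Rough check of the "1 WAR is ~10% of the range, or ~15 players" estimate.
# Assumption: players are spread uniformly across the observed WAR range.
n_players = 157
war_max, war_min = 7.8, -2.4
uncertainty = 1.0  # WAR

war_range = war_max - war_min              # 10.2 WAR
share_of_range = uncertainty / war_range   # about 0.098
players_per_band = n_players * share_of_range

print(f"Range: {war_range:.1f} WAR; 1 WAR = {share_of_range:.1%} of it; "
      f"~{players_per_band:.0f} players per 1-WAR band")
```

This prints a 10.2 WAR range, 9.8% of it per WAR, and about 15 players per band, matching the comment’s rough figures.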

      • jonts26 says:

        Here’s a quick study BtB did a little while back plotting team WAR totals against actual wins.

        http://www.beyondtheboxscore.com/2009/9/18/1035183/team-war-vs-actual-wins

      • Just looking at that data, I would be shocked if aggregate WAR is noticeably better than just looking at aggregates of OPS × playing time.

      • drewcorb says:

        So the slope of wins to date vs. team WAR is not very close to 1, and there is a fairly good correlation showing that WAR undervalues players. So does it undervalue all players equally, or is it biased against some sort of player? I think it must have some systematic flaw, since the slope does not appear to be within noise of 1. So this brings me back to my original question: how can we expect to get a real reflection of a player’s value from WAR?

  15. Garrett says:

    It should be noted that FIP does not describe run prevention. It describes predictive “tools” that impact run prevention. It also fails to account for potential pitcher talent beyond its rate stats (BABIP depression, SLG depression, etc.).

  16. joser says:

    I wonder, though, if GMs are using WAR when negotiating contracts — at least when it works in their favor. (I’m certain agents are using WAR in negotiations when it works in their clients’ favor). Even though players are given contracts in the anticipation of future performance, it seems like past and current performance (aka “track record” and “veteran-ness” etc) matter a great deal; certainly, here at fangraphs you see contracts analyzed in terms of (current and projected) WAR.

  17. Don Mynack says:

    WAR, what is it good for? Absolutely nothing – Buster Olney

  18. Most reporters ply their craft by creating a narrative and then making it the centerpiece of how they make sense of the facts as they see them. Olney may find WAR a challenge to that narrative and his associated interpretations, but I’m skeptical of his ‘they don’t use it, so it’s useless’ logic. As a Giants fan, I have a single example that convinced me of its usefulness. I looked at Tejada’s performance via WAR and concluded that his better days were well behind him and, more importantly, that 2011 wasn’t likely to represent a renaissance given the trends in his WAR. That certainly let me adopt a useful POV on what was to come, I’m sorry to say.

  19. pft says:

    I like WAR and other stats that simply tell what a player did, as these stats are helpful to MVP discussions (not necessarily best player).

    My gripes with WAR are:

    1. Using unregressed UZR. Single season UZR should be regressed 50% per MGL’s instruction.

    2. Park adjustments. Players do what they do at the park they play at, and the good hitters adjust to the park they play at. Why adjust for park, unless you want to predict what they would do at some other park. Also, unless they adjustments take into account how a park plays for RHB/LHB and RHP/LHP they are useless. I mean, does SAFECO really hurt Ichiro? Is CC really hurt by the Yankees short RF porch (given teams load up on RHB)

    3. My other grip is for pitchers. FIP does not tell what a pitcher did. This is a predictive stat, and one which filters out luck, kind of like BABIP which is not used for offensive WAR. To know how a pitcher actually did, lucky or not, you have to look at actual runs allowed, not theoretical runs allowed.

    So it seems to me that WAR has identity problems. It tries to measure what a player has done, but it also makes adjustments to go past what a player has done to look at context neutral skill, which is somewhat predictive.

    WAR is still useful, but I LOL at those who argue that someone is more valuable to his team because his WAR is 0.5 higher than another player’s. The best indicators of what a player has done are the individual counting stats (and rate stats derived from them) that go into WAR, as well as some others that are ignored, like RBI and runs scored (adjusted for opportunity) and context-dependent stats (performance late and close, RISP, etc.). These are of course subject to some interpretation, but that makes for good discussion, as opposed to the “science is settled” approach some take with advanced metrics.
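pft’s first gripe is that single-season UZR goes into WAR unregressed. A minimal sketch of what a 50% regression would look like, assuming (hypothetically) that an average fielder rates at 0.0 runs so the league mean is zero; the function name `regress_uzr` is illustrative and not part of any FanGraphs tool:

```python
def regress_uzr(uzr_runs: float, regression: float = 0.5, league_mean: float = 0.0) -> float:
    """Shrink a single-season UZR (in runs) toward the league mean.

    regression=0.5 mirrors the 50% figure pft attributes to MGL;
    league_mean=0.0 is an assumption that an average fielder rates at zero runs.
    """
    # Keep (1 - regression) of the distance between the observed value and the mean.
    return league_mean + (1.0 - regression) * (uzr_runs - league_mean)


# Hypothetical fielders: one at +12 runs, one at -8 runs in a single season.
for observed in (12.0, -8.0):
    print(f"observed {observed:+.1f} runs -> regressed {regress_uzr(observed):+.1f} runs")
```

If one also assumes roughly 10 runs per win, halving a +12 UZR to +6 would trim about 0.6 from that player’s WAR.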

    • suicide squeeze says:

      Re #3: FIP does tell what a pitcher did. It attempts to base a pitcher’s performance only on those things we can reasonably attribute to the pitcher. If you don’t agree with that, then rWAR would probably be more to your liking.
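For reference, the FIP being argued about here is built only from home runs, walks, hit batters, and strikeouts per inning, plus a league constant that puts FIP on the ERA scale. A minimal sketch, assuming the common 13/3/2 weights and a hypothetical constant of 3.10 (the real constant is recalculated each season):

```python
def fip(hr: int, bb: int, hbp: int, k: int, ip: float, constant: float = 3.10) -> float:
    """Fielding Independent Pitching from home runs, walks, hit batters, and strikeouts.

    constant=3.10 is a placeholder; the real constant is set each season so that
    league-average FIP equals league-average ERA.
    """
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# Hypothetical season line: balls in play never enter the formula at all.
print(f"{fip(hr=20, bb=50, hbp=5, k=200, ip=200.0):.2f}")
```

IP is treated as a true decimal here, so 200⅓ innings would be passed as 200.333, not the box-score notation 200.1.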

      • Romodonkulous says:

        Your words: “attempt…reasonably…attribute.”

        These are all words/terms/descriptors of something pertaining to predictive function.

        There is no way around it. FIP attempts to normalize fielding factors, which are inherently NOT normalized in the field of play. And how exactly does one normalize said factors?

        …by “attempting to reasonably attribute” x and y in proper variable context of a and b.

        No one is saying FIP is unreasonable…simply that at its most BASIC function, it is inescapably predictive in nature.

  20. Bill but not Ted says:

    WAR is currently the trendy stat that many like to take out of context. It’s not a perfect stat and needs time to mature and iron out its flaws. I see WAR at this point as BJ Upton: great potential, but he’s got some things to figure out. So using it for anything besides ultimately pointless debate seems foolish.

    Which, I am afraid, is Buster’s point.

  21. Evan says:

    I’m wondering why WAR doesn’t take WPA or WPA/LI into its equation. From what I understand, WAR is context-neutral as it is, which I can appreciate. But when assigning value to past performance, isn’t the context important? It would change the way some people look at an MVP race…
    I understand that luck plays into WPA. In WAR a home run is a home run, but WPA puts context to that and shows that not all home runs (or any other events) are created equal. WPA/LI really, in my mind, shows what a player means to a team. Bautista isn’t just the WAR leader; he owns the Win Probability boards.
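Evan’s contrast between WAR’s context-neutral treatment (a home run is a home run) and WPA can be made concrete. WPA sums the change in win probability on each play; WPA/LI divides each play’s change by that play’s leverage index before summing, which strips out most of the leverage weighting. A minimal sketch with two hypothetical home runs; the win probabilities and leverage values below are invented for illustration, not taken from any real game:

```python
from dataclasses import dataclass


@dataclass
class Play:
    wp_before: float  # batting team's win probability before the play (0 to 1)
    wp_after: float   # batting team's win probability after the play
    leverage: float   # leverage index of the situation (1.0 = average pressure)


def wpa(plays: list[Play]) -> float:
    """Win Probability Added: sum of per-play changes in win probability."""
    return sum(p.wp_after - p.wp_before for p in plays)


def wpa_per_li(plays: list[Play]) -> float:
    """WPA/LI: each play's win probability change divided by its leverage, summed."""
    return sum((p.wp_after - p.wp_before) / p.leverage for p in plays)


# Two hypothetical solo home runs: one in a tie game late, one in a blowout.
clutch_hr = Play(wp_before=0.50, wp_after=0.85, leverage=3.0)
blowout_hr = Play(wp_before=0.95, wp_after=0.97, leverage=0.2)

for label, play in (("clutch", clutch_hr), ("blowout", blowout_hr)):
    print(f"{label}: WPA {wpa([play]):+.3f}, WPA/LI {wpa_per_li([play]):+.3f}")
```

In this made-up pair, raw WPA credits the tied-game homer about seventeen times as much as the blowout homer, while WPA/LI puts the two much closer together.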

    Also, what about stolen bases and caught stealing being used in UBR and, subsequently, WAR? I could be mistaken, but I don’t think they are. A stolen base from first to second puts a runner in scoring position, which can be extremely important. Look at two WAR leaders, Pedroia and Granderson. Both have 24 stolen bases, but Granderson was caught 4 more times, creating four additional unnecessary outs (while still advancing into scoring position up to 24 times). Shouldn’t these things factor into WPA and WAR, and also into the MVP discussions?
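Evan’s Pedroia/Granderson comparison (same 24 steals, four more times caught) can be put in run terms with linear weights. A minimal sketch, assuming round run values of about +0.2 runs per stolen base and about -0.4 runs per caught stealing; those figures are illustrative assumptions, and the exact values any WAR implementation uses vary with the run environment:

```python
# Assumed approximate run values; real linear weights shift with the run environment.
SB_RUNS = 0.2   # runs gained by a stolen base (assumption)
CS_RUNS = -0.4  # runs lost by a caught stealing (assumption)


def steal_runs(sb: int, cs: int, sb_runs: float = SB_RUNS, cs_runs: float = CS_RUNS) -> float:
    """Net runs from base stealing under simple linear weights."""
    return sb * sb_runs + cs * cs_runs


def break_even_rate(sb_runs: float = SB_RUNS, cs_runs: float = CS_RUNS) -> float:
    """Success rate at which steal attempts add zero runs on average."""
    return -cs_runs / (sb_runs - cs_runs)


# Same 24 steals; the second (hypothetical) runner was caught four more times.
# The CS totals (2 vs. 6) are invented; only the difference of four comes from the comment.
gap = steal_runs(24, 6) - steal_runs(24, 2)
print(f"run gap from four extra caught stealings: {gap:+.1f}")
print(f"break-even success rate: {break_even_rate():.1%}")
```

Under these assumed weights, four extra caught stealings cost about 1.6 runs, and a runner needs to succeed roughly two times in three just to break even.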

    I’m a newbie to FanGraphs and have been doing my best to educate myself in the intricacies of this great site and everything it provides. The lack of SB/CS in UBR and of WPA/LI in WAR does bother me, though. Please correct me if I’m off base.

  22. WAR is a comprehensive statistic that includes many facets of a player’s performance. So saying teams will look at “OBP, ERA, defense-independent P numbers” and not WAR, when WAR includes base-running and fielding events, is like saying people don’t like to eat cake; they only eat things with flour in them.
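The commenter’s point is that WAR already folds in offense, baserunning, and fielding. A simplified sketch of how a position player’s WAR adds those pieces up, following the general shape FanGraphs describes (component runs plus positional, league, and replacement adjustments, divided by runs per win); the component values and the 10 runs-per-win default are hypothetical round numbers:

```python
from dataclasses import dataclass


@dataclass
class PositionPlayerRuns:
    batting: float       # runs above average at the plate
    baserunning: float   # runs above average on the bases
    fielding: float      # runs above average in the field (e.g. UZR)
    positional: float    # adjustment for the difficulty of the position
    league: float        # small adjustment so league totals balance
    replacement: float   # runs separating an average player from replacement level


def war(runs: PositionPlayerRuns, runs_per_win: float = 10.0) -> float:
    """Sum the run components and convert to wins (runs_per_win is an assumption)."""
    total = (runs.batting + runs.baserunning + runs.fielding
             + runs.positional + runs.league + runs.replacement)
    return total / runs_per_win


# Hypothetical player; none of these numbers describe a real season.
player = PositionPlayerRuns(batting=25.0, baserunning=2.0, fielding=5.0,
                            positional=-2.0, league=0.0, replacement=20.0)
print(f"WAR: {war(player):.1f}")
```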

  23. noseeum says:

    It’s useless to argue that FIP is predictive, not descriptive, around here, so don’t bother.

    But the absurdity of using FIP for WAR is right there on the page in Dave’s “Why our Pitcher WAR uses FIP” post:
    “In the end, we had to choose between two different methods – assuming that the pitcher had no responsibility for the outcome of a ball in play, or attempting to approximate the amount of time that the result was due to the pitcher or the fielder. Ideally, we’d be able to do the latter – which is how Sean approaches it – but I just don’t think we currently have the tools available to make an accurate enough judgment on how to apportion that responsibility.”

    And further in part 2:
    “1. FIP-based WAR, which is what we ended up using, essentially admits that we don’t have enough information about dividing responsibility for the results of balls in play, and so it ignores them.”

    The problem is that once you put FIP into WAR and start arguing that Francisco Liriano or Cliff Lee deserved the Cy Young last year or should even be in the conversation, you are going a step too far. You are no longer “just ignoring” the balls in play. You are emphatically stating that they are not the pitcher’s fault. You’ve jumped the shark. Those balls in play resulted in runs scored and lost games, making fWAR problematic in any fair Cy Young discussion.

    You could just as easily have gone the opposite route and used ERA in WAR, placing all of the blame for balls in play on the pitcher. To me, that’s a much more honest assessment of what actually happened.

    How can one possibly argue that only taking 30% of what happened is a fair description of what happened?
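noseeum’s “30%” presumably refers to how much of a pitcher’s workload FIP actually looks at. A minimal sketch of that calculation: the share of batters faced that ended in a strikeout, walk, hit batter, or home run, the only outcomes FIP counts. The season line is hypothetical, and the share for any real pitcher depends on his own numbers:

```python
def fip_outcome_share(bf: int, k: int, bb: int, hbp: int, hr: int) -> float:
    """Fraction of batters faced whose outcome enters FIP (K, BB, HBP, HR)."""
    if bf <= 0:
        raise ValueError("batters faced must be positive")
    counted = k + bb + hbp + hr
    if counted > bf:
        raise ValueError("more FIP outcomes than batters faced")
    return counted / bf


# Hypothetical season: 800 batters faced.
share = fip_outcome_share(bf=800, k=180, bb=50, hbp=5, hr=20)
print(f"FIP-counted outcomes: {share:.1%}; everything else (mostly balls in play): {1 - share:.1%}")
```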

  24. jts5 says:

    Sorry in advance for my ignorance, but does anyone know if the positional adjustments in WAR are calculated scientifically/mathematically or just subjectively?

  25. Ray says:

    This is probably a good time to point out that there is no such thing as a predictive statistic. xFIP, for example, is simply a transformation of another statistic (FIP) with potentially greater value for building predictive models. The notion of “predictive statistics,” from a frequentist perspective anyway, involves building probability models and measuring their performance against some criteria (remember p-values, confidence intervals, etc. from your intro stats courses?). One of the simpler examples might be a regression model that uses the past three years’ xFIP as predictors of a statistic in the fourth season (RA9, maybe?). You could then evaluate the utility of that model against other models using p-values (or a Bayesian maximization method if you’re really up to date on your skills).
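Ray’s example model (the past three seasons’ xFIP as predictors of a fourth-season statistic such as RA9) is ordinary linear regression. A minimal, dependency-free sketch, assuming made-up data generated from a known linear rule so the fit can be checked; real pitcher data would be noisy, and judging the model (p-values or a Bayesian method, as Ray suggests) is a separate step not shown here. The function name `fit_ols` is hypothetical:

```python
import random


def fit_ols(X: list[list[float]], y: list[float]) -> list[float]:
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    An intercept column is prepended, so the result is [intercept, b1, b2, ...].
    """
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(A[row][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for row in range(k):
            if row != col:
                factor = A[row][col] / A[col][col]
                A[row] = [a - factor * b for a, b in zip(A[row], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Synthetic pitchers: xFIP three, two, and one season back (hypothetical values),
# with fourth-season RA9 generated from a known rule, without noise, for checking.
random.seed(0)
X, y = [], []
for _ in range(50):
    xfip_3, xfip_2, xfip_1 = [random.uniform(3.0, 5.0) for _ in range(3)]
    X.append([xfip_3, xfip_2, xfip_1])
    y.append(0.4 + 0.15 * xfip_3 + 0.25 * xfip_2 + 0.5 * xfip_1)

coef = fit_ols(X, y)
print("intercept and xFIP weights:", [round(c, 3) for c in coef])
```

Because the synthetic RA9 has no noise, the fit should recover the generating weights; with real data the interesting question is how well such a model beats simpler alternatives.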

  26. CircleChange11 says:

    I strongly agree with …

    (1) MGL’s suggestion to regress single-season fielding runs (UZR).

    (2) Tango’s suggestion to average fWAR with brWAR for pitchers: not giving 100% or 0% credit for things like LOB% and BABIP, but giving some credit.
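Tango’s suggestion in (2) amounts to a weighted average of the FIP-based and runs-allowed-based figures. A minimal sketch, assuming an even 50/50 split by default; the weight is a parameter, and the WAR values in the example are hypothetical:

```python
def blended_pitcher_war(fwar: float, rwar: float, fip_weight: float = 0.5) -> float:
    """Weighted average of FIP-based WAR and runs-allowed-based WAR.

    fip_weight=1.0 reproduces the FIP-based figure; fip_weight=0.0 reproduces the
    runs-allowed figure; 0.5 splits the difference.
    """
    if not 0.0 <= fip_weight <= 1.0:
        raise ValueError("fip_weight must be between 0 and 1")
    return fip_weight * fwar + (1.0 - fip_weight) * rwar


# Hypothetical pitcher whose two WAR figures disagree.
print(blended_pitcher_war(fwar=6.0, rwar=4.0))
```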

    I’m not painting myself as anywhere near the baseball stat minds of these two guys. But given the situations of FIP and UZR, the recommendations from their creators/developers just make a lot of sense with regard to single-season WAR.

    We put a lot of stock into WAR, and even use it with a decimal point.

    We know that pitchers have less control over BABIP than batters do. But we also know that they see more balls in play (via more batters faced) than batters do. So a BABIP 10 points below league average equates to a lot fewer hits allowed, which is important.
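The arithmetic behind “10 points less than league average” is just balls in play times the BABIP gap. A minimal sketch; the ball-in-play counts for the pitcher and the batter are hypothetical round numbers chosen only to show why the larger sample matters, and `babip_hit_swing` is an illustrative name:

```python
def babip_hit_swing(balls_in_play: int, babip_gap_points: float) -> float:
    """Hits gained or avoided relative to league BABIP; one 'point' is .001 of BABIP."""
    return balls_in_play * babip_gap_points / 1000


# Hypothetical workloads: a full-season starter vs. a full-time hitter.
for role, bip in (("pitcher", 600), ("batter", 450)):
    print(f"{role}: {babip_hit_swing(bip, 10):.1f} hits from a 10-point BABIP gap")
```

Under these assumed workloads, the same 10-point gap swings 6 hits for the pitcher and 4.5 for the batter.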

    I also think that “luck” on BABIP can be something pitchers influence in a single season, though not necessarily over their careers or consistently. It is still difficult for me to believe that for an entire season a pitcher can just catch all the breaks with regard to balls being hit right at fielders, or the defense playing outstanding only when that guy pitches. I could be wrong on that, but experiencing that type of luck over 200 IP just seems so extraordinary that the most likely conclusion is that the pitcher must be doing something (location, changing speeds, etc.) that leads to weaker contact, but not necessarily more K’s or swings and misses.
