Why Our Pitcher WAR Uses FIP

Yesterday, respected scribe Joe Posnanski put out the following message on his Twitter account:

Fangraphs WAR has Cliff Lee at 6.6, best in AL. Baseball Ref WAR has Cliff Lee at 4.2, not even in Top 10. We need a summit.

Since Baseball Reference implemented Sean Smith’s version of Wins Above Replacement, people have been pointing out the differences between the two systems. While they use the same basic framework, they have different inputs which naturally lead to different results. For position players, the defensive system is the main driver of the discrepancies, as we use UZR while Sean uses Total Zone. For pitchers, however, it’s not just a different system, but a fundamental difference of what is being measured, and this is what drives the big gaps in WAR for pitchers like Lee.

Our version of pitcher WAR is essentially based on FIP, meaning that a pitcher is judged by his walk rate, strikeout rate, and home run rate (and, of course, the quantity of innings that he throws and the role in which he throws them). Sean takes a pitcher’s actual runs allowed, then makes adjustments to try to compensate for the defense behind him. The two systems might have the same goal, but they’re measuring two different things. Over the last few months, I’ve seen plenty of comments about how some people don’t like our FIP-based version of pitcher WAR for various reasons, so today I thought I’d explain the thought process of why we decided to build it like we did.
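For readers who want to see what "based on FIP" means in arithmetic terms, here is a minimal sketch of the commonly published FIP formula. The hit-by-pitch term is part of that published formula rather than something this post spells out, and the `constant` default of 3.10 is a placeholder assumption: the real value is set per league and season so that FIP lands on the ERA scale. The stat line at the bottom is made up.

```python
def fip(hr, bb, hbp, k, ip, constant=3.10):
    """Fielding Independent Pitching from the commonly published formula.

    ip is innings pitched as a true decimal (212 1/3 innings is 212.333,
    not the box-score notation 212.1).
    constant is an assumed placeholder that puts FIP on the ERA scale.
    """
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    # Only home runs, walks, hit batters and strikeouts enter the formula;
    # singles, doubles and triples on balls in play never appear.
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# Hypothetical season line, for illustration only.
example_fip = fip(hr=16, bb=18, hbp=1, k=185, ip=212.0)
print(round(example_fip, 2))
```

Turning this into WAR involves further steps (park adjustment, replacement level, innings and role) that are not sketched here.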

There is essentially one big problem that anyone evaluating pitchers has to deal with: how do you separate responsibility for hits allowed? There are usually four or five variables at work on any ball that is put in play – the quality of the pitch, the quality of the swing, the quality of the defensive play, the effects of the ballpark the game is being played in, and sometimes weather. All of these factors influence whether that ball ends up as a hit or an out, and, obviously, they don’t all have to do with the pitcher. So, in creating a statistic that attempts to isolate just the pitcher’s contribution, you have to figure out how you want to deal with the other factors.

Pretty much everyone just assumes that the quality of the opposing hitter essentially evens out over the course of a season, and it likely does most of the time, at least enough to where it doesn’t make a huge difference whether that is factored in or not. The two environmental factors, park and weather, are usually lumped together in one estimate of how a park plays, which is part of the adjustment in pitcher WAR for both our version and Sean Smith’s version. I have some issues with using a single park factor for every player, and I think before too long we’ll have a better way of adjusting for those things, but that’s another post entirely.

The last issue – quality of defensive contribution – is where we do something quite different. When we sat down and talked about how to handle this issue, I essentially realized that there is no right way to do this, based on the statistics we currently have available. There are compromises that have to be made, as we simply don’t have the tools available to really allow us to correctly assign responsibility between a pitcher, hitter, or defender on each hit or out in play. So, no matter what we did, there’d be a problem, and we’d just have to acknowledge that issue.

In the end, we had to choose between two different methods – assuming that the pitcher had no responsibility for the outcome of a ball in play, or attempting to approximate the amount of time that the result was due to the pitcher or the fielder. Ideally, we’d be able to do the latter – which is how Sean approaches it – but I just don’t think we currently have the tools available to make an accurate enough judgment on how to apportion that responsibility.

The why for that last paragraph deserves its own post, so I’ll have a follow-up here on the site in a few hours, where we’ll go through the problems with using both defense-adjusted RA and FIP for pitcher WAR, and show why I think that the FIP model is preferable for now.







Dave is a co-founder of USSMariner.com and contributes to the Wall Street Journal.


94 Responses to “Why Our Pitcher WAR Uses FIP”

  1. Donny says:

    Would it not make sense, then, to take the best approach at defense-adjusted RA and weight it 50/50 with the FIP model?


  2. Albert Lyu says:

    Looking forward to the afternoon article, Dave.


  3. Nat says:

    Have you given any thought to a pitcher WAR calculation that uses tERA?


    • Antonio Bananas says:

      is tERA the one that basically looks at the “non luck” (bullshit, Ks, BBs, and HRs are nearly as luck driven) balls in play too? Like I believe it looks at line drives, grounders, and flyballs, and assigns value to them? Also, WHY DON’T PEOPLE USE IT INSTEAD OF FIP? FIP bugs the shit out of me. As a former pitcher in high school, I can tell you that with a runner on first, you throw more sinking pitches like 2 seams or sinkers or whatever you have to induce a ground ball. You have control. If in high school, I could try (not usually all that successful because obviously, I’m not Greg Maddux), to achieve a batted ball result, why can’t Roy Halladay or Mark Buehrle?


  4. noseeum says:

    I understand the approach, but to me this just makes pitcher WAR rather useless when it comes to measuring Cy Young, etc. We know for a fact that a pitcher has some level of input into what happens on balls that are hit against him. If a guy gives up 5 doubles in a row with no outs, he’s having a bad day, but not by FIP standards. The vast majority of hits a pitcher gives up are singles, doubles and triples, not home runs.

    FIP, to me, is good for measuring skill and to use to gauge what to expect from a pitcher in the future, i.e. “this guy hardly walks anyone, he strikes out a lot of guys, and he gives up few home runs. I expect he will be awesome next year.” WAR, to me, is meant to assess who actually contributed the most to his team’s success in a year, i.e. “Hmm, this guy gives up few home runs, walks very few guys, and strikes out a lot, but geez, he gave up a lot of hits this year, resulting in a pretty high ERA, so he’s not as valuable as Mr. No Hit over there. I think he’ll be good next year. Whether he was unlucky or what, this just wasn’t his year.”

    If you take singles, doubles, and triples out when assessing a pitcher’s year, that’s just way too much. You’ve now moved away from reality IMO. Maybe I’m missing something, but that’s how I see it right now.


    • Jamie says:

      that’s how i see it. i always hear how fangraphs WAR is “descriptive” and not “predictive.” i always thought it was the other way around though, for just what you stated. i’ll look at fip/xfip in relation to era and see if a guy is going to come back down to earth (good or bad) and adjust in my head how good a guy is. to me that means that fangraphs is predictive. bbref war is more descriptive because he takes into account sequencing/hr’s. basically describing how he pitched this year.


      • Rich says:

        FIP is neither descriptive nor predictive, it just is.

        The idea that balls put in play are under no control of the pitcher is absurd.

        yes, there is a huge element of luck, but FIP goes way too far trying to eliminate it. tERA is much nicer, but still isn’t great.


    • GhettoBear04 says:

      Ok, so let’s say that a pitcher can control the type of hit they give up. Then why not use a tRA based system?


    • Kevin S. says:

      A double can be a no-doubter or a ball the defender should have gotten to, and in many cases the difference between the double and the out is a slight change in horizontal trajectory. Many would say it’s highly unlikely that a pitcher could give up five doubles in a row and have them all be balls down the line (i.e. near-groundouts) and balls that slow outfielders would have gotten if they weren’t so slow, but isn’t the idea of five consecutive doubles in the first place fairly unique? FIP doesn’t exist to evaluate extreme events in small samples, and to use those to show it’s broken is an egregious misuse of the stat.


    • Spoilt Victorian Child says:

      WAR isn’t meant to assess how a player contributed. It’s meant to assess how well he played. If it were the former it would just be WPA. Since it is the latter, it should not include luck. So it doesn’t (or rather, it tries not to, insofar as it is capable). I honestly don’t see the problem here.


  5. Mike Green says:

    Could you do both?

    For 1 year purposes, there is much to be said for “FIP- WAR”; over longer periods “Defense-Adjusted RA” becomes more important. BP had a version of both posted on its DT cards a few years ago, but I haven’t taken a look recently to see what they do.


  6. Will says:

    Ultimately, do we really need an all encompassing statistic? Is it enough to look at ERA for actual performance and then rely on FIP and park factors to help make assessments and predictions? Perhaps, by trying to cram everything into one box, we are getting an unusable end product?


    • Bill@TPA says:

      ERA doesn’t tell you “actual performance.” It tells you actual performance plus defense and luck. FIP is a lot closer to “actual performance” than ERA is.


      • Will says:

        I am sorry, but ERA does tell you about “actual” performance, as in what really happened. Just because luck might be involved doesn’t mean something didn’t really happen. Now, we might want to filter luck from a result, but that doesn’t change the reality.


      • Nadingo says:

        There’s a big difference between what happened and pitcher performance. The difference between an earned run and an unearned run should tell you all you need to know about why ERA is a poor measure of pitcher performance.


      • Sky says:

        ERA tells you what really happened (well, discounting the supposed effect of errors) for the whole defense. You don’t judge a hitter by how many runs his team scored in an inning. And while a pitcher usually is a larger contributor to his team’s prevention of runs in an inning than any one offensive player, he’s only part of the puzzle.

        Just because ERA is exactly representative of (earned) runs allowed doesn’t mean it’s better at judging pitcher performance or value than a metric that isn’t precise.

        Actually, that’s a good analogy. ERA is very very precise. But not so accurate (for pitchers). FIP (or whatever) is much less precise, but tends to be more accurate (depending on the question/context).


  7. Rich says:

    “Pretty much everyone just assumes that the quality of the opposing hitter essentially evens out over the course of a season, and it likely does most of the time”
    It really doesn’t at all.


    • Levi says:

      Uh, really?
      This is starting to sound a lot like the argument for why Price should get the Cy Young. Certain pitchers might throw a few more games against playoff teams than others, but over the course of 30+ starts, the difference is marginal, at best.


      • Rich says:

        No, it’s not.

        The spread of OPS faced by a pitcher over the course of the season is as high as .050 across the league.

        Essentially, the average batter Pitcher 1 faces may be a .750 OPS guy, while Pitcher 2 is facing an average .800 guy.

        That’s not marginal at all.


      • NEPP says:

        Well, by its very nature, an unbalanced schedule makes it so it doesn’t even out.


      • Wally says:

        Rich, what if you use a park adjusted stat? Many of those hitters are nested in home parks that can swing OPS ~.050.

        And of course the difference is marginal, Levi. I think you’re looking for something like insignificant, not marginal.


      • Antonio Bananas says:

        You can’t just say “Pitcher A pitched against an average OPS of .750 and Pitcher B pitched against an average OPS of .800” because, let’s say you pitched against Dan Uggla in May 2011, you likely got him out, if you pitched against him in August 2011, he probably smoked you. If you REALLY want to see how tough his competition was, I suggest looking at how well the hitters were hitting say 10 games before and 10 games after the start. Hitters have a lot of variation because of the streakiness of baseball. Of course, you should also adjust their 10 games prior and 10 games after the game for the quality of competition they were against. So if you were hitting against poor pitching, of course your OPS is going to go up, so you should adjust it.

        I guess in other words, if we’re going to nit pick, let’s not half ass it and dive all the way in.


  8. Will says:

    That’s a good point too. Are we sure that it evens out over the course of one season? Also, what about platoon advantages that lefties usually enjoy? I wonder if southpaws benefit inordinately from this effect?


  9. Xeifrank says:

    Seems like a pitcher with a really low HR/FB rate is going to get overrated (Lee 6.3%) and a pitcher with a really high HR/FB rate will be underrated. Perhaps move from FIP to xFIP with an adjustment for HR based on park?
    vr, Xei


    • Kevin S. says:

      I am glad you mentioned the park adjustment, because one of the things that really bothers me about xFIP is that we try to neutralize HR rates but then sort of half-ass it.


      • Mike K says:

        I think it is purposefully left unadjusted. Pretty sure that FIP and xFIP by definition are an estimate of what the pitcher would do in a completely neutral setting.


      • Kevin S. says:

        If they want to estimate what they’d do in a neutral setting, then they’d have to adjust for park factors. Pitchers don’t pitch in neutral settings, so FIP/xFIP don’t give you neutral-setting results.


    • Rich says:

      Which is silly, because HR/FB rate correlates heavily to the OPS of players faced.


    • Dave Cameron says:

      We’re trying to separate out defense and pitching, not luck and pitching. HR/FB seems to be mostly luck and park, so I don’t think we want to adjust for it beyond the park factor.


      • Xeifrank says:

        Wouldn’t xFIP separate out
        …defense, pitching and (some of the) luck
        and FIP,
        defense and pitching.

        FIP and xFIP are estimators, neither is perfect – it just seems like xFIP would be more accurate than FIP as a measure of a pitcher’s true talent level (larger the SS the better). If you don’t mind living with players with a lucky HR/FB having an inflated WAR then yeah who cares which one we use. But it seems like that is where some common ground between the two pitching WAR systems could be found. Just an idea, not trying to be argumentative. :)
        vr, Xei


      • Joe says:

        Trying is the key word here. Liriano has played for middle of the pack defenses during his career, yet his BABIP is extremely high and likely why his FIP and ERA differ so much. Does his defense just play poorly for him? How do you reconcile the difference here?

        Also, and this is minor but still important, do you adjust strikeout and walk rate for batters faced? Liriano gives up tons of hits, therefore he artificially faces more batters and therefore strikes out (and walks) more per 9 since the formula only uses K and BB per IP and not per PA. Shouldn’t that be addressed? It’s minor I’m sure, but the fact that you haven’t even thought of it is surprising.
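        To put rough numbers on the per-inning versus per-batter point, here is a small sketch with two made-up pitchers. Both strike out exactly 25% of the batters they face over 200 innings; the only assumed difference is that the second one allows more baserunners and so faces 4.5 batters per inning instead of 4.0.

```python
def k_per_9(strikeouts, innings):
    """Strikeouts per nine innings (an innings-based rate)."""
    return 9 * strikeouts / innings


def k_pct(strikeouts, batters_faced):
    """Strikeouts per batter faced (a plate-appearance-based rate)."""
    return strikeouts / batters_faced


# Hypothetical pitchers, invented for illustration only.
innings = 200
a_batters, a_strikeouts = 800, 200   # 4.0 batters faced per inning
b_batters, b_strikeouts = 900, 225   # 4.5 batters faced per inning

print(k_pct(a_strikeouts, a_batters), k_per_9(a_strikeouts, innings))
print(k_pct(b_strikeouts, b_batters), k_per_9(b_strikeouts, innings))
```

        The per-batter rate is identical for both, while the per-nine rate is higher for the pitcher who faces more batters. The same arithmetic raises his walks per nine, so the net effect on an innings-based stat depends on the mix.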


      • Born in DC says:

        So Babe Ruth got lucky 714 times?


      • Antonio Bananas says:

        Born in DC, Babe Ruth got lucky because he didn’t have to face black or latin pitchers. So yea, he sort of did get very lucky.


      • dominic0627 says:

        Dave, out of curiosity does pitcher WAR include the pitcher’s defense?


  10. Cuban Bee says:

    If only field/fx were implemented in every ballpark I think this whole issue could be resolved eventually. Dave pointed out in yesterday’s chat that it’s going to be too expensive to implement in every ballpark and to extrapolate the meaningful data out of it for it to be realistically usable by sites like fg and bbref, but if it were a possibility to get it funded by MLB and made public to places like this i really think it could provide amazing insight into what a player’s defensive worth really is and also how much each batted ball in play that lands for a hit can be attributed to the pitcher, the hitter, and/or the defender to whom the ball is hit.


  11. keithr says:

    interesting that there’s a discussion going on right now @ the book blog on how to better implement pitcher WAR here at fangraphs.


  12. WilsonC says:

    One of the issues I have with a FIP-based WAR is that it ignores the degree of control a pitcher has over the game’s context.

    As an example, for a hitter, if he walks in one PA and homers in his next PA, he has no control over whether or not there’s runners on base when he homers, or whether the hitters after him are able to drive him in when he walks.

    For a pitcher, a BB followed by a HR is certainly worse than a HR followed by a BB. Those are two defensively independent events, but the combined value of them is dependent on the order in which they occur. Regardless of whether that’s a repeatable skill or not, the pitcher is responsible for creating the context in which the HR was more damaging, so that needs to be reflected in his value. Even without getting into how you account for defense, a FIP-based system fails to adjust for the order in which events occur, which limits its use when looking at it as a value metric.


    • Spoilt Victorian Child says:

      I mean, that’s actually one of the advantages of FIP. We know that sequencing is mostly luck. Since the player isn’t inherently lucky (or unlucky), why do we want to include this in a measure of his value?


      • WY says:

        “I mean, that’s actually one of the advantages of FIP. We know that sequencing is mostly luck.”

        I think that is a bit of a copout. Certainly, the way that pitchers and teams approach hitters in an opposing team’s lineup isn’t based on this principle.


      • Will says:

        I don’t think it’s mostly luck. Context dictates pitch selection. While King Felix can get away with throwing one down the pipe after a walk, for example, someone like Javier Vazquez can not. That is not luck.


      • Donny says:

        “I think that is a bit of a copout. Certainly, the way that pitchers and teams approach hitters in an opposing team’s lineup isn’t based on this principle.”

        Definitely! An opposing pitcher does not pitch to Miguel Cabrera the same way he pitches to Brandon Inge


      • WilsonC says:

        If you mean value in terms of “how much would I offer a player of this guy’s true talent,” then I would agree that it makes sense to neutralize the sequencing, but if that’s your aim, why not go all the way and use xFIP to neutralize HR/FB luck as well?

        There’s value in neutralizing luck, but when we’re looking at past value, it makes no sense to say that a pitcher has control over BB, K, and HR, and then to absolve him of all responsibility in terms of the sequencing of those events. A pitcher could go through an inning with 3 BB, 3 K, and a HR – all events that take the defense out of the equation – and depending on the order, he could allow anywhere from 1 to 4 runs. In both cases the pitcher started with a blank slate, so does it really make sense to absolve him of responsibility for a mess of his own creation because sequencing’s not a particularly repeatable skill?
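        The 1-to-4 range can be checked by brute force. The sketch below enumerates every ordering of 3 BB, 3 K and 1 HR in which the third strikeout comes last (so that all seven events actually happen before the inning ends) and scores each one. It assumes no steals, wild pitches or other baserunning events, so the runner count is all that matters; the function name is just for illustration.

```python
from itertools import permutations


def runs_in_inning(events):
    """Score a sequence of 'BB', 'K' and 'HR' events, stopping at three outs."""
    runners = 0  # with only walks and homers, bases fill first-to-third in order
    outs = 0
    runs = 0
    for event in events:
        if outs == 3:
            break
        if event == "K":
            outs += 1
        elif event == "BB":
            if runners == 3:
                runs += 1      # bases-loaded walk forces a run home
            else:
                runners += 1
        elif event == "HR":
            runs += runners + 1  # everyone on base scores, plus the batter
            runners = 0
    return runs


events = ["BB"] * 3 + ["K"] * 3 + ["HR"]
# Distinct orderings where the inning-ending strikeout is the final event.
orderings = {p for p in permutations(events) if p[-1] == "K"}
totals = sorted({runs_in_inning(p) for p in orderings})
print(totals)  # [1, 2, 3, 4]
```

        Every one of those orderings contributes the same HR, BB, K and innings totals, so a FIP-style calculation treats them all identically.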


      • Spoilt Victorian Child says:

        It’s not a copout and it doesn’t really have anything to do with the way you pitch to Cabrera vs. the way you pitch to Inge. It has been studied, e.g., here: http://www.insidethebook.com/ee/index.php/site/comments/halladay_v_lee_does_sequencing_count/


      • Rich says:

        That article doesn’t prove what you’re stating at all.

        in fact, the author goes on in the comments to say that there actually is skill in these splits, and it’s not luck based.


    • Dave Cameron says:

      I agree, this is a flaw in FIP. But it’s a flaw that can be adjusted for by looking at split data and determining whether this actually happened. In Lee’s case, we can see that it didn’t – his lack of runner stranding was all BABIP related.

      This is the kind of flaw that can be accounted for fairly easily. To me, it’s not a deal breaker.


      • WilsonC says:

        Can it really be accounted for that easily, though?

        I agree that Lee’s strand rate is BABIP related, but the question is whether it’s more accurate to start with an assumption of a normal distribution of balls in play, or to start with a normal distribution of defensive performance. I think we’d agree that both assumptions are flawed, but is there real evidence that shows that BABIP fluctuation is more attributable to defensive performance than it is to a variance in how and where the ball is hit?

        Either way we’re making mental adjustments based on assumptions, which is fine. Looking at Lee again, if I start with FIP, I’d mentally adjust his WAR downward, based on the assumption that some of the BABIP split is probably a result of his own batted ball splits, and if I start with Sean’s number I’d adjust his WAR upward based on the assumption that some of the variance has to do with a difference in how the defense played. Without knowing with more confidence just how much of a role the defense played in a given pitcher’s BABIP, we can’t know which starting point is closer to a pitcher’s true value.

        The core point, which I think you addressed well in your two posts, is that regardless of which method you prefer as a starting point, it’s important to understand the method’s limitations and adjust.


  13. Shane says:

    Somewhat new to the saber stats, but as a Cardinals fan it’s confusing to see Carpenter and Wainwright have ERAs way lower than FIP for the last 3 years, while the Cardinals’ team UZR has been right around zero. Is it possible that they have an ability to pitch better than FIP suggests, or have they been lucky over the 1000 or so innings?


    • WY says:

      I think their relatively high ground ball rates might have something to do with this? They’ve both been in the Top 15 or 20 in GB% for each of the last two seasons.


    • suicide squeeze says:

      Could be the park, could be ground ball tendencies, could be luck, could be skill….and it’s probably a little bit of each.


    • kbertling353 says:

      It seems like pitchers who get a ton of groundballs outpitch their FIPs.


    • Dave Cameron says:

      It’s more likely that they’ve been lucky. Carpenter’s career FIP is 3.89 compared to a 3.79 ERA. It’s not like he’s been able to do this his entire career.


      • Dave Duncan says:

        It seems like the years he was on the Blue Jays his ERA was a lot higher than his FIP and his years in Lou have been the opposite. I don’t believe it was just the stats evening out. He did not have a 50% GB rate in Toronto. He changed. We know the pitching philosophy of Dave Duncan and ground balls.


      • WY says:

        “It’s more likely that they’ve been lucky. Carpenter’s career FIP is 3.89 compared to a 3.79 ERA. It’s not like he’s been able to do this his entire career.”

        Well, there’s no way to compare his GB rates with the Blue Jays to his GB rates with the Cardinals. But considering that the Cardinals are always near the top of the league in that stat, and also that a bunch of pitchers have seen their GB rates rise after joining the team (Joel Pineiro comes to mind), I would bet the Dave Duncan effect has something to do with these pitchers outperforming their FIPs.

        We know that Carpenter and Wainwright have excellent GB rates, and it at least makes sense that GB pitchers would have a tendency to outperform their FIPs. It seems like a bit of a copout to just assume the difference is just a matter of “luck.”


      • misc says:

        I wonder if artificial turf may also play a role in the Toronto / St Louis data, as it seems like a GB-heavy pitcher wouldn’t do as well on astroturf.


  14. Max says:

    The experiment failed, guys. It’s time we go back to ERA, wins, and saves to measure a pitcher’s true talent.


    • Jason B says:

      Wins? Saves? *Grimace*

      Yes, Felix at 13-12 is pretty marginal. Totally his fault that the M’s have a historically inept offense. He should know how to win better. His 174 ERA+ be damned.

      On the other hand…Mike Williams, 2003 NL All-Star, 28 saves, 6.14 ERA, 69 ERA+? Awesome. Just look at that gaudy saves total!!

      The “wins and saves” experiment has failed. Miserably.


    • Dave Duncan says:

      There was a lack of heart and scrappiness in this post.


    • CFIC says:

      ahahahahhahahah


  15. Guy says:

    “In the end, we had to choose between two different methods – assuming that the pitcher had no responsibility for the outcome of a ball in play, or attempting to approximate the amount of time that the result was due to the pitcher or the fielder….I just don’t think we currently have the tools available to make an accurate enough judgment on how to apportion that responsibility.”

    The tools may not be as accurate as we’d like. But we know that far more than 50% of the outcome can be explained by the trajectory/location/speed of the ball, factors that the fielders cannot possibly influence. And fielders obviously have no responsibility for the sequence in which pitchers give up their Ks, BBs, and HRs (and little over the timing of BIP hits). So we know that pitchers have more than 50%, and probably more like 80-90%, of the responsibility for the elements that FIP ignores (leaving the hitter out of this, of course). I don’t see how it can possibly make sense to round 80% down to 0%. Your logical choices are 100% or some estimate of the true responsibility.

    Ultimately, this is as much a philosophical question as a technical question. The question is, when a batter hits a ball that clearly can’t be fielded by any fielder (assuming normal positioning), which describes most hits, whose responsibility is that? I think that question was settled long ago: it’s the pitcher’s responsibility. If it weren’t, it would make no sense to talk about a pitcher throwing a “2-hitter.” And we would talk about 9 players sharing responsibility for a no-hitter. We don’t do those things. Basically, we assume it’s the pitcher’s job to prevent hits. If a fielder makes an especially bad play, we call it an error and it’s his responsibility. If he makes a great play, we call it a web gem and put it on TV, and mentally credit him with a hit prevented. But 90% of the time, we say it’s the pitcher’s success or failure.

    The same logic applies even more clearly to sequencing. Who are you going to hold responsible for the fact that one pitcher did this: BB-BB-HR-out-out-out, while another pitcher did this: HR-BB-BB-out-out-out? Pretending this didn’t happen just isn’t a viable option.

    A FIP-based WAR challenges fans’ basic assumption about pitchers’ responsibility for what happens to the pitches they throw. I won’t say this is “wrong,” but I will say there is zero chance of your persuading most fans to see things this way. And so if only for pragmatic reasons, you should make a change.


    • Mike K says:

      I don’t think that’s a good reason to make a change. If they’re right (and I’m not claiming they are), they should keep publishing the numbers and keep explaining why they are right. In fact, I hope they keep doing it because it forces us to keep examining this stuff.

      Now, if the Daves (Cameron and Appleman) feel they can do it better – tERA, SIERA, or something closer to what Rally does – great! We’ll get a better stat, and LOTS of information about why it is better. But changing just because people don’t understand?


    • suicide squeeze says:

      1. FIP doesn’t ignore batted balls, it just scales everything relative to them. The coefficients of K, BB and HR come from how “valuable” those outcomes are relative to a ball in play.

      2. I’m pretty sure we’ve determined with at least some certainty that pitcher skill is a relatively small part of babip variation, so no, the pitcher is not more than 50% responsible for what happens to balls in play.

      All that being said, I would like to see some sort of FIP where we adjust for the amount of babip fluctuation that we can reasonably assign to the pitcher. How on earth we would do that with the current data we have, I do not know. I know SIERA works on this some, but through different methods.


      • Rich says:

        “2. I’m pretty sure we’ve determined with at least some certainty that pitcher skill is a relatively small part of babip variation, so no, the pitcher is not more than 50% responsible for what happens to balls in play”

        We haven’t determined that at all.


      • suicide squeeze says:

        I think there’s an article on Baseball Prospectus about assigning BABIP variation, but I don’t have a subscription.


      • Wally says:

        Rich,

        Yes we have. BABIP moves maybe what, +/- .010 between pitchers with long careers, outside Mariano and knuckleballers?


      • delv says:

        Wally,

        You’re conflating BABIP and ‘OPS-IP’.


    • Dave Cameron says:

      If we knew which plays could certainly be fielded, then this would be a lot easier. We don’t – that’s the problem.

      FIP measures exactly what it says it measures, and is pretty simple to understand. RA based WAR measures something that didn’t happen and that even I can’t really explain in less than 1,000 words. I don’t see how that is preferable.


  16. Gio says:

    David Appelman should buy Baseball-Reference.


  17. AdamM says:

    While the explanation for the different WAR calculations is informative, this post does not address Posnanski’s point. If two competing systems can produce such widely differing numbers for a statistic meant to convey overall value, then a summit is needed so that consensus can be reached.

    What we need are suggestions for compromise, not arguments over why you think your calculation is better.

    • Dave Cameron says:

      I agree that it’s less than ideal to have two different stats calling themselves WAR and being calculated differently. I don’t know what the solution to that is, though. I don’t think it’s fair to either site for one to ask the other to change the name of their metric. If there’s a solution to this, I haven’t heard it yet.

    • batpig says:

      excellent post — this drives right at the heart of the issue.

      bottom line, FIP-based WAR has too many “weird outcomes” like Cliff Lee that simply don’t pass the smell test!

      • batpig says:

        I meant to say that the DIFFERENCE between the two WARs produces too many “weird outcomes”... there needs to be some reconciliation here if both are claiming to measure the same thing (total value for the season, stated as wins above replacement).

  18. The Nicker says:

    I think using FIP is fine, but it doesn’t really jibe with the fact that wOBA, which FG uses to calculate hitter WAR, makes no adjustment for luck.

    If you think using FIP is the best method, then shouldn’t we be adjusting BABIP to xBABIP and recalculating wOBA?

    • Dave Cameron says:

      We’re not trying to strip out luck, we’re trying to strip out defense. The quality of the defenders behind a pitcher has a much larger influence on the results of a pitcher than it does on the results of a hitter.

    • Hark says:

      This would make a ton of sense if wOBA were calculated using BABIP, but it’s not. Tango’s formulation for wOBA assigns a run value to each outcome of a given plate appearance (out, non-intentional walk, intentional walk, reaching on error, single, double, triple, home run) and then divides the player’s summed values by his plate appearances, to give you the average value of a given player’s plate appearance.
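
      As a rough illustration of that calculation, here is a minimal sketch. The weights are assumptions for illustration only (approximately those published in The Book; the real ones are re-estimated each season), and the `woba` helper and sample stat line are hypothetical. The point is simply that BABIP never appears as an input.

```python
# Illustrative linear weights only (assumed, roughly those from "The Book").
# Real wOBA weights are re-estimated for every season.
WEIGHTS = {
    "nibb": 0.72,    # non-intentional walk
    "hbp": 0.75,     # hit by pitch
    "single": 0.90,
    "roe": 0.92,     # reached on error
    "double": 1.24,
    "triple": 1.56,
    "hr": 1.95,
}


def woba(events, plate_appearances):
    """Average weighted value per plate appearance; every out counts as zero."""
    if plate_appearances <= 0:
        raise ValueError("plate_appearances must be positive")
    total = sum(WEIGHTS[name] * count for name, count in events.items())
    return total / plate_appearances


# Hypothetical stat line, not a real player.
season = {"nibb": 60, "single": 100, "double": 30, "triple": 3, "hr": 25}
print(round(woba(season, 650), 3))
```

      Because all outs share a weight of zero here, a strikeout and an infield fly are indistinguishable, which is the limitation raised in the next paragraph.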

      The problem in wOBA’s formulation, with all apologies to Tango, is that all outs are, as far as wOBA is concerned, created equal. A strikeout is treated as equally bad as an infield fly, which isn’t true, as runners can advance via stolen bases, passed balls, wild pitches, et cetera on a strikeout (rare, but it happens) and cannot on an infield fly. Likewise for fielder’s choice forces at first that advance a runner to second, which aren’t as bad as grounding into a double play.

      Over the course of a season, the luck involved on balls in play normalizes the non-out outcomes of a plate appearance, so some would-be doubles are outs and some would-be outs are doubles and we expect the difference to be negligible.

      As Dave says, however, we do not have the tools available to us to properly measure the value of individual balls in play. In the case of tERA, we weight outcomes according to both out and run values not by result but by batted ball type. A comparable system could conceivably be developed for hitters.

      However, this still doesn’t ignore luck as it doesn’t account for fielders’ abilities to reach said balls in play. tERA removes luck while accounting for performance for pitchers, but there’s still basic randomness at play. Imagine that for hitters: player A is the leadoff hitter for the White Sox. He’s fast and hits lots of ground balls. Due to the imbalanced schedule, a good deal of those ground balls are hit in the direction of one Yuniesky Betancourt. Good for him. Player B is the leadoff hitter for the Oakland Athletics. He is also fast and hits lots of ground balls. Playing in the AL West, a good deal of those ground balls go in the direction of either Elvis Andrus or Jack Wilson. God that’s lame.

      tERA says these groundballs are of equal value, but it doesn’t account for the fielders or parks involved. For hitters, xBABIP gives us the expected value of a ball put into play for the average of the entire league. But it doesn’t account for the fact that Ichiro is really fast or that flyball hitters in the NL East got to hit to both Jonny Gomes and Raul Ibanez in left field.

      Equally weighting FIP and tERA would be an inelegant solution, but it’s still not context-neutral. We can’t just take BABIP and call it a day, because that’s not how wOBA is calculated, and it still doesn’t normalize for variable luck.

      Field F/X and Hit F/X can go a long way to helping us, but the technologies aren’t widely available.

      • WY says:

        “But it doesn’t account for the fact that Ichiro is really fast or that flyball hitters in the NL East got to hit to both Jonny Gomes and Raul Ibanez in left field.”

        Gomes plays in the Central!

  19. Lee Panas says:

    I think it’s important to look at both RA and a FIP-type stat in evaluating a pitcher’s season. We know that a pitcher has less control over hits allowed and sequencing of events than he does over K/BB, but we don’t know how much less, especially for individual pitchers. I would suggest taking some kind of weighted average of rWAR (or pitching runs if you don’t trust the TZ adjustment) and fWAR.

    How much weight you put on each factor depends on the purpose of your analysis. If you are doing projection, it makes sense that FIP carry more weight. For something like the Cy Young award, I’d start off looking at a 50/50 split.
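
    A minimal sketch of that weighted average, using the two Cliff Lee figures quoted at the top of the post. The `blended_war` helper and its `fip_weight` parameter are hypothetical names introduced for illustration, not anything either site publishes.

```python
def blended_war(rwar, fwar, fip_weight=0.5):
    """Weighted average of runs-allowed-based WAR (rwar) and FIP-based WAR (fwar).

    fip_weight is the share given to the FIP-based number; 0.5 is the
    50/50 starting point suggested above for something like the Cy Young.
    """
    if not 0.0 <= fip_weight <= 1.0:
        raise ValueError("fip_weight must be between 0 and 1")
    return fip_weight * fwar + (1.0 - fip_weight) * rwar


# Cliff Lee as quoted in the post: 4.2 (Baseball-Reference), 6.6 (FanGraphs).
print(round(blended_war(4.2, 6.6), 2))                   # 50/50 split
print(round(blended_war(4.2, 6.6, fip_weight=0.75), 2))  # leaning on FIP, e.g. for projection
```

    How far to move `fip_weight` away from 0.5 for a given purpose is a judgment call; the comment above gives no specific number beyond the 50/50 starting point.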

  20. tommie says:

    I’ve always felt the umpire behind the plate is another variable that needs to be adjusted for when it comes to evaluating pitching performances.

  21. NotDave says:

    “We know that a pitcher has less control over hits allowed and sequencing of events than he does over K/BB”

    You KNOW that? Don’t think so. You assume that. Or think that. Or agree with some study that suggests that.

    But you absolutely do not KNOW that.

    • suicide squeeze says:

      Yes, we do know that. That’s the basic premise of DIPS.

      • Joe says:

        Less control maybe, but completely ignoring it only masks the problem. It’s not understood well enough, so the solution is to blind oneself and pretend it doesn’t exist? Until it’s fully understood, you can only put so much stock into evaluations like FIP-based WAR.

  22. Rick says:

    What I question is why pitchers with bad relief corps are not given more consideration. If a pitcher is pulled with two outs and a man on first and the next pitcher comes in and gives up a double/triple/homer, that affects the starter’s ERA when the man on base scores. A better reliever would not give up that run, and the starter’s ERA would be lower. If this happens frequently over a season, a pitcher’s overall numbers could be seriously skewed.

    I also think Wins/Losses in MLB are too arbitrary. A guy pitches eight shutout innings and leaves the game tied, and then the reliever on the mound gets the win when his team scores, for maybe one shutout inning, or even an inning in which he gives up fewer runs than the opposing pitcher. Weird.

    I also hate that On-Base Percentage doesn’t include reaching base on an error, or even on a strikeout. Faster guys are going to affect that. If you got on, you should get credit. I also think all defensive indifference stolen bases should be counted as steals and returned to the player’s stats. There are several notable comebacks from large deficits that staying out of the double play may have helped in sustaining. And if the defense is indifferent to them taking second, why don’t they let him take third too? They care, they just make a decision to focus elsewhere.

  23. Antonio Bananas says:

    Something else to consider. Should we adjust pitcher performance based on manager skill? Or bullpen ability? Don’t get all crazy, all I’m saying is that if you have a bad bullpen and/or a bad manager, a guy can be left in the game to give up 2 home runs in like the 7th or 8th inning when to the rest of the civilized world he should have been pulled after the previous inning. His stats look bad and it’s not really his fault, just bad luck. Likewise, some pitchers may have great bullpens/managers that make them seem better than they are. If you only throw 100 pitches through 6 innings and then you’re taken out because your pen is awesome and your manager knows it, you will give up fewer HR and your FIP looks better. Always thought the same thing with bullpen pitchers. Let’s say a hitter destroys righties who throw hard mostly up in the zone, then the manager puts in a righty who throws hard up in the zone, and the ball gets crushed. That’s luck-driven too, right? Or if you have a good manager and your only skill is an 86 MPH sinker and that’s the one pitch a hitter can’t hit, you have a good result, not because you’re really good, but because your manager is smart.

  24. Jorge Posada says:

    I disagree with using FIP on FanGraphs for WAR.

    I object to it for the same reason I would object if we started using BABIP and LD% to estimate how many HR an offensive player should have hit, and what his OPS should have been, for WAR.

    WAR should be a measurement based on actual achievements, not on imaginary numbers and what we believe the player’s performance should have been (removing luck, etc.).

    If offensive player X got extremely lucky with his BABIP, does that mean that those hits shouldn’t count towards his WAR calculation?

    I guess what I’m getting at is: is WAR a predictive stat or a descriptive one? And if FanGraphs is choosing the predictive model, why isn’t it doing so for hitters?

  25. kevin says:

    Wouldn’t this be a simple and fairly accurate way to take out defense for a pitcher’s WAR?

    What if you take the UZR of the pitcher’s defense? You would deduct runs if their cumulative UZR is positive and add runs/WAR for negative UZR. I think, assuming an average defense has a UZR of 0, this would take out most of the defensive ability aspect of balls in play while leaving the pitcher’s influence on balls in play.
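
    One way to read that proposal, as a rough sketch only: express the adjustment in runs allowed, so a positive team UZR (runs the defense saved) is added back to the pitcher’s runs allowed and a negative one is subtracted. Prorating team UZR by the pitcher’s share of the team’s balls in play is an assumption that the comment does not specify, and `defense_adjusted_runs` is a hypothetical helper.

```python
def defense_adjusted_runs(runs_allowed, team_uzr, pitcher_bip, team_bip):
    """Runs allowed with the team's defense neutralized to average (UZR = 0).

    Assumption: the defense's runs saved are spread evenly over every ball
    in play, so the pitcher's share is pitcher_bip / team_bip.
    """
    if team_bip <= 0 or not 0 <= pitcher_bip <= team_bip:
        raise ValueError("need 0 <= pitcher_bip <= team_bip and team_bip > 0")
    share = pitcher_bip / team_bip
    # Positive UZR means the fielders saved runs, so the pitcher's raw
    # runs allowed understate what an average defense would have allowed.
    return runs_allowed + team_uzr * share


# Hypothetical numbers: 80 runs allowed, a +30 UZR defense,
# and 600 of the team's 4,500 balls in play.
print(round(defense_adjusted_runs(80, 30, 600, 4500), 1))
```

    The even-spread assumption is the weak point, since it ignores which fielders the pitcher’s balls in play actually went to.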

    I believe taking balls in play (often over 60% of a pitcher’s total PA outcomes) out entirely goes too far. Some pitchers give up the occasional HR but can be very difficult to hit. E.g., a knuckleball that doesn’t knuckle, a curveball that doesn’t curve, etc. It seems counterintuitive to judge pitchers by only their extremes: HR and strikeouts. This is exactly the opposite of how golfers figure their handicap. They throw out their best and worst scores and use their “average” scores to predict their next round’s score.

    FIP can easily get skewed. For example, 9 IP, 2 ER, 5 hits, 2 HR, 0 BB, 13 K is a worse FIP than 9 IP, 4 ER, 10 hits, 0 HR, 4 BB, 7 K. The second pitcher might have had 2 balls off the wall (maybe even off an uncatchable 20+ foot wall or something) but it has no effect. Plus, FIP will always leave out the “meltdown” inning where the opponent gets 4+ hits. It happens all the time and is often caused by the pitcher failing to locate pitches and leaving balls over the middle. FIP quickly discards this because it is difficult to measure.
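
    The comparison of those two lines can be checked with a minimal sketch of the commonly cited FIP formula, (13×HR + 3×(BB+HBP) − 2×K) / IP plus a league constant. The constant of 3.10 is an assumed illustrative value (it is recalculated each season), and HBP is taken as zero because the lines above do not list it.

```python
def fip(hr, bb, k, ip, hbp=0, constant=3.10):
    """Fielding Independent Pitching from home runs, walks, HBP and strikeouts.

    constant is an assumed illustrative value; the real one is set each
    season so that league FIP matches league ERA.
    """
    if ip <= 0:
        raise ValueError("ip must be positive")
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# The two nine-inning lines from the comment above.
# Hits and earned runs are not inputs to FIP at all.
start_a = fip(hr=2, bb=0, k=13, ip=9)  # 2 ER, 5 hits
start_b = fip(hr=0, bb=4, k=7, ip=9)   # 4 ER, 10 hits
print(round(start_a, 2), round(start_b, 2))
print(start_a > start_b)  # the first start grades out worse
```

    The gap between the two starts is 2/9 of a run regardless of which constant is used, since the constant cancels out.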

    FIP is a very good predictive tool but I still believe WAR should show what a player actually produced over replacement level.
