Pitcher Win Values Explained: Part Two

As we announced yesterday, win values for pitchers are now available on the site. As before, we’re going to go through the process of explaining the calculations that lead to the values you see here on FanGraphs and lay the foundation for understanding what these win values represent.

To start with, let’s take a look at the main input that goes into the win value calculation – a pitcher’s FIP, or Fielding Independent Pitching, which calculates a pitcher’s responsibility for the runs he allows based on his walks, strikeouts, and home runs allowed. The FIP formula is (HR*13+(BB+HBP-IBB)*3-K*2)/IP, plus a league-specific factor that scales FIP to match league average ERA for a given season and league. For win value purposes, we modified the league-specific factor to scale FIP to RA instead of ERA.
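
As a rough illustration of the formula above, here is a minimal Python sketch. The function name `fip`, the `league_constant` parameter, and the sample stat line are all made up for illustration. The real league factor varies by season and league (and, for win values, is scaled to RA rather than ERA), so the 3.20 used here is only a placeholder.

```python
def fip(hr, bb, hbp, ibb, k, ip, league_constant):
    """Fielding Independent Pitching, following the formula in the text.

    league_constant is the league-specific factor that puts the result on
    the ERA (or RA) scale; it must be supplied by the caller.
    ip is innings pitched as a true decimal (200 1/3 innings -> 200.333).
    """
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    # Home runs and unintentional walks/HBP add runs; strikeouts remove them.
    core = (13 * hr + 3 * (bb + hbp - ibb) - 2 * k) / ip
    return core + league_constant


# Hypothetical stat line, not any real pitcher's season.
example_fip = fip(hr=20, bb=50, hbp=5, ibb=3, k=180, ip=200.0,
                  league_constant=3.20)
print(round(example_fip, 2))
```

Swapping in a different `league_constant` is all it takes to move the same core number from an ERA scale to an RA scale.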

Why did we use FIP? I know this is a popular question, and it’s something I wrestled with myself. However, what I couldn’t get away from is that we wanted the context sensitivity for the position player and pitcher win values to be as close as possible. wRAA, the offensive input into Win Values for position players, is context-neutral – a hitter does not get credit for his situational performance, such as hitting well with runners in scoring position. Since we aren’t giving hitters credit for situational performance, we can’t give it to pitchers either, in order to maintain the same situation-neutral scale.

This is going to lead to some questions – we’re aware of that. Claiming that Javier Vazquez was a +5.2 win pitcher in 2006, when traditional metrics will tell you that he went 11-12 with a 4.84 ERA, is going to be a tough sell. We know.

However, the tangled web of responsibility for run prevention is not accurately unraveled by simply giving pitchers credit and blame for all earned runs and fielders credit and blame for all unearned runs. As most of you know, there are so many extra variables that go into a pitcher’s ERA that the pitcher himself simply doesn’t have control over. We have to try to extract the pitcher’s responsibility from his team’s run prevention while he’s on the mound. Using ERA or RA simply adds too many non-pitcher factors into the equation to the point that we’re no longer just evaluating the pitcher.

FIP removes defense from the equation by only looking at three factors that a pitcher has demonstrable control over – walks, strikeouts, and home runs allowed. By using FIP, we’re isolating the pitcher’s core abilities and evaluating him based on those skills. Now, we’re not claiming that FIP captures everything a pitcher is responsible for. It is not the perfect context-neutral pitcher run modeler – we know that. But when confronted with a choice of including way too many non-pitcher inputs or leaving out a few minor actual pitcher inputs, the latter was the better choice. You will get more accurate win values for a pitcher using FIP than you will ERA or RA.

Getting back to Vazquez for a second – his 2006 FIP was a full run lower than his ERA. The driving forces behind his struggles were a .321 BABIP and a 65.8% LOB%. Most everyone would agree that we don’t want to penalize him for poor defense played behind him, but how do we untangle the responsibility for the lack of stranded runners? Vazquez was horrible with men on base in ’06, but most of that was BABIP related – a .343 BABIP with men on versus a .284 BABIP with the bases empty. If we’re going to say that he’s not responsible for his high batting average on balls in play, and the batting average on balls in play was responsible for the lack of runners stranded, then how do we remove the former but not the latter? This is what I mean by a tangled web of responsibility in terms of run prevention.
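
For readers who want to reproduce a split like the one above, here is a minimal sketch of the standard BABIP calculation, (H - HR) / (AB - K - HR + SF). The function name and the input totals are hypothetical; they are not Vazquez’s actual 2006 numbers, just values chosen to show a gap between the two situations.

```python
def babip(h, hr, ab, k, sf):
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    balls_in_play = ab - k - hr + sf
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (h - hr) / balls_in_play


# Hypothetical splits for illustration only.
with_men_on = babip(h=90, hr=10, ab=330, k=80, sf=4)
bases_empty = babip(h=100, hr=12, ab=450, k=110, sf=0)
print(f"men on: {with_men_on:.3f}, bases empty: {bases_empty:.3f}")
```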

If you wanted to make the argument that the context-sensitive stuff, such as how often a pitcher leaves runners on base, should be included, then you also need to be prepared to fight for WPA/LI as the offensive metric of choice for hitter win values. And honestly, I won’t put up much of an argument – there’s a case to be made for context-sensitive win values as a useful metric, and I’d imagine there will be a day that those are publicly available too. But, there’s a more compelling argument for context neutral win values, which is what we’ve decided to present here. What most of us are interested in knowing is how well a player performed in helping his team win, regardless of the performance of his teammates. To answer that, we have to strip out as much context as we can.

Think of FIP as the pitcher version of wRAA. wRAA doesn’t include non-SB/CS baserunning or situational hitting. FIP doesn’t include batted ball data or situational pitching. Neither is perfect, but both give us the vast majority of the context-neutral picture.

That doesn’t mean that we’re set in our ways and that these win values will never be improved upon. If and when a new metric like tRA is proven to be significantly more effective in valuing pitchers (and I’m hopeful that it will be, given more data exploration on the topic), we won’t be standing here as guardians of the infallibility of FIP. We want to get to the truth, and do so as quickly and as accurately as possible. I will encourage you (especially those of you in the “tRA is awesome/FIP sucks” camp), though, to not let minor differences cause you to miss the fact that FIP and tRA lead to very similar results.

This afternoon, we’ll talk about replacement level for pitchers, how it differs for each league and role, and how we tackled the issue.




Dave is a co-founder of USSMariner.com and contributes to the Wall Street Journal.


59 Responses to “Pitcher Win Values Explained: Part Two”

  1. snowshoe says:

    Pitcher win value is another nice addition to the site and you guys should be commended for your efforts. The rate of innovation here at Fangraphs is very impressive.

    That said, I do have concerns over the validity of the pitcher win value. FIP was a nice advancement at its time, but it is fundamentally flawed statistics that doesn’t represent the underlying construct it is intended to. It is just not conceptually sound and no statistic can make up for a lack of conceptual grounding. Again, not a knock on Tom Tango – it was a huge step forward. But it’s not a high quality statistic and in that regard it is not like wRAA at all.

    It simply leaves too many important variables unaccounted for.

    Right now tRA is simply a much better statistic. It’s much better conceptually rooted and allows for a more valid appraisal of a pitcher’s intrinsic skills. Don’t know if Fangraphs has had any conversations with StatCorner but replacing FIP with tRA here and incorporating it into pitcher win value would be an enormous step forward in the quantitative analysis of pitching.

    • Dave Cameron says:

      I have to be honest – comments like this bug the crap out of me.

      I like tRA. I like Graham. I like Matthew. We all get along really well. But those of you who have picked up the tRA is awesome/FIP is useless torch are doing them a major disservice, because you’re simply not reflecting the reality of the differences between the two.

      There’s no war between FIP and tRA. Stop trying to create one.

      • snowshoe says:

        I honestly have absolutely no stake in any “war” and am not trying to create one. I also have no past experience with any “war” and have no reference for it. I’m not a sabermetrician so perhaps this is the kind of insider issues that come up in any field. But I’m not aware of them.

        I’m simply stating my opinion. Not sure what the cause of the antagonism is.

        It’s my opinion that FIP is not conceptually well grounded as it’s missing important sources of variance. It’s not an adequate model of the underlying reality it is trying to represent. That’s a conceptual issue more than it is a statistical one. Based on my understanding of the game that’s my take on FIP as a model. I hardly think I’m the only one that feels it’s inadequate to try to describe something as complex as a pitcher’s intrinsic talent/performance through 5-6 variables, among which line drive rate, GB rate, etc. aren’t included.

        You may disagree but I feel that the distribution and kind of contact a pitcher allows is rather important in analyzing his effectiveness. That’s not captured by FIP so I find that to be a serious conceptual limitation to the statistic.

        Again you may disagree. But that has nothing to do with me trying to stoke some “war” I have no knowledge of. Rather, it’s just a point of concern.

        tRA is also limited in certain ways – but less so than FIP as its face validity is higher. That’s my opinion.

        If you want people to simply agree with you and say what you’re doing here is beyond reproach that’s fine.

        But most vigorous analytic communities aren’t like that. Ones that are stultified and narrowing as they become dogmatic often are.

      • snowshoe says:

        Also, I should add nowhere did I say FIP was “useless.” Those were your words. Not mine. I said the statistic is flawed and limited. Big difference.

      • Dave Cameron says:

        Also, I should add nowhere did I say FIP was “useless.”

        Quoting your first comment:

        FIP was a nice advancement at its time, but it is fundamentally flawed statistics that doesn’t represent the underlying construct it is intended to.

        You can spin it all you want, but something that “doesn’t represent the underlying construct it is intended to” is useless. Using different words is just semantics.

        I’m simply stating my opinion.

        An opinion is something like “I like chocolate over vanilla.” You’re asserting an argument that can be factually proven or disproved. That’s not an opinion – you’re making a claim to the validity of a product, yet you offered zero evidence to support your claim.

        You may disagree but I feel that the distribution and kind of contact a pitcher allows is rather important in analyzing his effectiveness.

        Prove it. You might be right, but there’s no reason we should take your word for it.

        But most vigorous analytic communities aren’t like that. Ones that are stultified and narrowing as they become dogmatic often are.

        If you want to contribute to a discussion and bring actual evidence to the table, awesome. If you want to throw stones, then I’m not interested. So far, you’ve only done the latter.

      • The way I see it, FIP gets you most of the way there, and is more or less linear weights on HR, BB, SO, and HBP. If you think these are the things a pitcher can control, great, if you think there are more, (like GB and FB, or whatever else), that’s fine too.

        It’s my understanding that tRA adds the GB/FB and LD (regressed) and does some other stuff, but it’s more or less linear weights too.

        This is why in the other thread I considered FIP a middle ground (though it might not be), because HRs are a pretty important part of a pitcher’s real world performance and FIP considers them, while tRA I don’t believe does.

      • Samg says:

        Why not use wOBA against? That way hitters and pitchers are exactly equal.

  2. Ichiro says:

    Dave,

    This is a good start. Three things to consider regarding pitchers having control over baserunners scoring while still in the game:
    1) Ability to pitch from a stretch vs a windup
    2) Ability to hold runners close to the base, preventing steals or big leads
    3) Pitchers inducing more ground balls getting more double plays

    The ground ball rate is known. Preventing steals is sort of known, though it also depends on the catcher so this is tough to isolate. Stretch vs windup is more or less known through ba/ob/slg with bases empty, runners on.

    I do not have a formula to use, though it seems like something could be constructed to account for these.

  3. mymrbig says:

    Dave, I would be interested to see you work the anti-Javier Vazquez (Chris Young) into some of your discussions. I know Young benefits from his home park, but at this point I think it is safe to say that he is a guy who can consistently out-produce his FIP while Vazquez is a guy that will consistently under-produce his FIP. Not sure if it is the right topic for fangraphs, but I would love to hear some smart person really delve into this type of pitcher and try to really figure out why something like FIP doesn’t work for them (or why they can consistently get above or below average BABIP, which is generally the underlying cause).

    Good addition, but it is weird to see Vazquez at +5.2 wins in 2006 when he was 11-12 with a 4.84 ERA while Young was worth +0.3 wins in 2006 when he was 11-5 with a 3.46 ERA. Two extreme examples, to be sure.

  4. TangoTiger says:

    What bothers me about snowshoe’s comment is not that he has an opinion, but rather that he presents the conclusion of his opinion with no evidence whatsoever. Look at the use of these adjectives:

    ===
    … it is fundamentally flawed statistics that doesn’t represent the underlying construct it is intended to.

    … It is just not conceptually sound and no statistic can make up for a lack of conceptual grounding.

    … it’s not a high quality statistic

    … It simply leaves too many important variables unaccounted for.

    … tRA is simply a much better statistic.

    … It’s much better conceptually rooted

    … allows for a more valid appraisal of a pitcher’s intrinsic skills.

    … replacing FIP with tRA here and incorporating it into pitcher win value would be an enormous step forward in the quantitative analysis of pitching.

    ===

    That was his first post. That is a lot of non-evidence-based opinion. It is purely a summary opinion, without evidence, and, that means it must be totally discarded. How else could I not? What if someone says the exact opposite of snowshoe, who also doesn’t provide any evidence? How is a third party supposed to decide between two non-evidenced contrary summary opinions?

    I already take issue with the very point brought up. Indeed, if snowshoe can first tell me what was my intent of creating FIP, I’d be glad to hear it. And then, I’d like to know how he thinks that my intent isn’t represented by its construction.

    ***

    tRA is a good metric. David Gassko did something similar at THT. MGL presented it in his DIPS Primer. I’ve created something similar as well. Studes does a fantastic presentation along these lines every year in the THT Annual.

    tRA does exactly what it was intended to do, and that is assign league average success rates to the distribution of allowed contacted balls, without looking at the actual success rates for each pitcher.

    FIP does exactly as it was intended, and that is look only at the subset of a pitcher’s performance that is not influenced by the skills of his fielders.

    Whether you use WPA, or WPA/LI (or some offshoot), or tRA, or RA, or FIP, or Studes’ batted ball index, or whathaveyou, there is simply no way that someone can say one is “much better” than any other, not unless they provide evidence to the contrary. As it stands, you look at the career register for all these stats, and you will find how much they all give you the same answer. That, by itself, tells you how much overlap you have.

    • snowshoe says:

      Tom,

      Your use of the word evidence here is puzzling. I’m making an argument about the face validity of FIP and in turn how it is being used in pitcher win value.

      This is an argument based on theory. It’s qualitative in nature and has nothing to do with analyzing career registries.

      So if you are familiar with a widely accepted, conceptually well grounded theory that addresses the intrinsic value of a pitcher I’d be very interested in seeing what that is as I’m not aware of any standard theoretical work in that regard. I would appreciate learning more about what I don’t know about the underlying conceptual theory and models of the game.

      But as far as I do know I’ve never seen any widely accepted conceptual theory or conceptual models with regards to a pitcher’s value.

      If there is and FIP is an operationalization of that then I take back my criticism.

      If there is no theory then the face validity of the metric’s use with respect to pitcher win value has to be largely based on opinion. It’s how one views the implicit model FIP represents that needs to be evaluated.

      FIP certainly may have had a certain intention when you created it. But that’s not the issue here. The issue isn’t the original intent – it’s the operationalization of that statistic to create another statistic – pitcher win value. And regardless of the original intent FIP is being used for another discrete purpose in pitcher win value. That’s what I was commenting on. If I wasn’t clear enough about it I apologize.

      FIP very well may have perfectly represented what you originally intended it to do. But it’s now being used for a secondary purpose – constructing win value. This happens with metrics all of the time. And whenever they are used to create new metrics the underlying constructs each represents needs to be taken into consideration.

      And for those purposes I don’t find that it represents the underlying reality of pitcher win value as well as tRA would.

      Again if there was widely accepted standard theory to base this evaluation on it would be that theory that would drive my understanding. This is what people do in sociology and psychology when metrics are constructed for instance. But I’m not aware of any kind of standard theory in this regards with respect to pitching.

      And that’s a general issue in sabermetrics. Many of the metrics developed are atheoretical. That’s not that big a deal – it’s simply a limitation. It’s part of what it means to be a developing field. The lack of theoretical models upon which metrics are in turn constructed is a difficult thing for any field to coalesce around and to agree to.

      Given that lack of theory yes it’s largely up to individuals to make assessments of issues like face validity. If you want to call it mere opinion I’d agree with you. It is only opinion. But that doesn’t mean it’s not “evidence.” It is qualitative, low grade evidence that largely derives from deductive first principles. But again, if there is no theory then that’s what you’re stuck with. That’s why I said it was my opinion. That’s a major limitation to my critique and that’s why I was upfront with it.

      You seem to be suggesting that the only allowable evidence is quantitative. That it has to come from the career register. In metric construction qualitative, theoretical evidence is the central foundation. If there is no accepted theory then it falls back to reasoned opinion.

      But this isn’t an area where quantitative evidence is applicable or available unless rigorous psychometric analysis has been done during metric construction. If it has then I apologize as then my critique misses the mark. But I don’t know of any psychometric work here.

      I’ve been involved with complex metric construction in other fields so believe me I know how difficult that can be. The theoretical and psychometric issues in particular. But that doesn’t change the fact that those are issues that are necessary with respect to constructing valid metrics.

      I use FIP all of the time and think it’s useful. All complex statistics have limitations and trade offs.

      At the same time in most fields there is always an attempt to improve. With respect to pitcher win value I feel that tRA is an improvement and a significant one.

      If there were theory available to back this up I certainly would incorporate it. I don’t think there is so you’re right I’m left with opinion. But this isn’t a quantitative question.

      Finally, I’m a bit surprised at the vehemence and hostility in both your comments and Dave’s. I’ve been involved with analytic pursuits for some time and in general critiques are usually welcome. No one’s work is methodologically perfect.

      The only way any field, especially a relatively new one, advances is for its work to be challenged. But instead of explaining to me why you disagreed or how I might be wrong you simply said, “so what this is your opinion.” That’s rather disappointing.

      I have no particular horse in this race. But I do have experience in metric construction, scale development and similar issues so I was making some observations. They weren’t intended as insults though I gather you took them as such. If I phrased my critique poorly I apologize.

      • Bearskin Rugburn says:

        Wow there turbo.

        Finally, I’m a bit surprised at the vehemence and hostility in both your comments and Dave’s

        Not to step on anybody’s toes here, but vehement is entirely inappropriate to describe those responses. Dave said comments like yours bug him, and Tom pointed out that you made a lot of claims without providing any proof. If I say you could be more succinct would you think I’m being vehement and hostile too?

      • mymrbig says:

        Longest comment ever…

      • TangoTiger says:

        “You seem to be suggesting that the only allowable evidence is quantitative. ”

        I made no such suggestion.

        ***

        “But it’s now being used for a secondary purpose – constructing win value. ”

        I did not read your original criticism as being based on its secondary use by Fangraphs. I will presume that you intended to, and that either you didn’t make it clear, or that I completely misread your post.

        ***

        “But that doesn’t mean it’s not “evidence.” ”

        Yes, it is not evidence, because there is no discussion point. Nothing about what you originally said can be rebutted, because you provided no evidence for your summary opinion.

        ***

        “Finally, I’m a bit surprised at the vehemence and hostility in both your comments and Dave’s.”

        Again, you presume a lot. With the written word, especially in a comment section, you should presume good faith responses. There is no negativity emanating from anything I’m writing. You may read it as such, but you should stop. Would it help if I used an exclamation point? Ok! That better? :) Just presume smiley faces after every third sentence.

        ***

        “But instead of explaining to me why you disagreed or how I might be wrong you simply said, “so what this is your opinion.” That’s rather disappointing.”

        That’s my point! You gave me zero points of reference to discuss anything. I’ve got some 1500 threads on my blog, and countless other replies all over the web world. I am not shy about engaging in a discussion. But, you gave me zero evidence, nothing at all for me to challenge you on.

        Present something, anything, that has any evidence whatsoever, and I’ll either agree with you, or refute it.

        ***

        “They weren’t intended as insults though I gather you took them as such. If I phrased my critique poorly I apologize.”

        I am not insulted. I’m just annoyed that the thousands of readers who have read this exchange have spent ten minutes learning absolutely nothing.

        (Insert smiley faces in indirect proportion to your mood.)

  5. Anonymous says:

    Let me ask another question… all over again. Why can’t we use wOBA against… somehow converted into runs saved/prevented/created versus a replacement player?

    Generally, we accept that wOBA is a good measure of hitting ability. If we could take some type of hybrid of FIP, wOBA against, and… something else, average the three, and try to create a run conversion for pitchers, I think we’d be far better off.

    I’m not against FIP — I like FIP. But, can anyone tell me why wOBA against (since you all already have the stat on the site in the first place) would be a bad idea?

    Batting average against is not bad… I currently use GPA against as a good indicator of pitching… why wouldn’t wOBA against work? Thanks.

    • Dave Cameron says:

      Because then you’re penalizing the pitcher for things that are out of his control. DIPS theory, basically.

      We’re already counting defense in the win value calculations of position players. wOBA against for pitchers would double count defense, since you wouldn’t be removing the defensive value from the run prevention performance.

      • Bearskin Rugburn says:

        I was going to ask a similar question but with an added comment. While using wOBA would indeed penalize the pitcher for his defenders’ faults, it would also not penalize the pitcher for making bad pitches. These two are hard to dissociate, which is why you have things like FIP and tRA in the first place.

        However, as you have access to pretty good defensive numbers based on play-by-play data, I wonder if there was any discussion of using wOBA against and then correcting for team defense. Team-wide defensive numbers are pretty good even over a single season, so this correction could be quite accurate.

        Again, I don’t want to claim this would have been better, I just wonder if it was brought up, and if so, why it was decided against.

  6. studes says:

    I also don’t get the fascination with tRA. Sabermetrics is trendy, probably given that there is always new blood coming on the scene and each “generation” wants to create their own thing. tRA, or metrics like it, have been around for ages — as Tango notes. But tRA is the flavor of the month, I guess. So be it.

    By the way, I’m a huge fan of Pitching Runs Created, which uses strikeout rate to approximate the “fielding-independent” impact of batted balls — a relationship that DIPS work like MGL’s, Voros’ and Tippett’s have found. I personally think that’s the most elegant solution to the “fielding independent” question. And it’s the reason that FIP works, too. In these equations, strikeouts are more than just outs — they’re estimates of what the fielding independent factor should be.

    But saying that tRA is “better” than ERA, FIP, DIPS, LIPS, PRC or many of the other versions that have been created — well, I haven’t seen it, and it really all depends on what you measure against. If your goal is to measure against “true” fielding independence, how are you going to find that, exactly?

    • mymrbig says:

      I think it is fair to say FIP, tRA, DIPS, LIPS, etc. are all better than ERA. The only thing ERA does better is tell you how many earned runs a pitcher gave up per nine innings pitched. All the others are better for predicting future performance, providing a defense-independent measurement, providing a luck-independent measurement, etc.

      • studes says:

        In my experience, all those other stats are only marginally better than ERA at predicting future ERA. To me, we’re talking about very small differences here.

  7. Tony says:

    I believe FIP is a very good statistic for looking at how a pitcher did minus his defense (obviously) and is better to look at when looking ahead than ERA, to try and predict how the pitcher will pitch in the coming years. However, I do not like using it when looking back. If there is a pitcher who is constantly underperforming according to FIP (Vazquez) and someone who is constantly overachieving according to FIP (Wang) wouldn’t you say that the former is doing something wrong and the latter is doing something right? This is why I prefer tRA when looking back (or anything similar to it). It gives weights to GBs and FBs, and I believe that a pitcher has control over whether he induces GB or FB, but does not have control whether or not the ball is turned into an out. I really don’t think it’s the “flavor of the month”, it just suits my interests more.

  8. TangoTiger says:

    In an all-or-nothing, FIP or tRA, FIP presumes that the pitcher is nothing special in any balls that are fielded by fielders. tRA presumes that the pitcher is nothing special in HR allowed.

    Is one better than the other? You can argue for one over the other. You can try to make it more complicated by using Studes’ batted ball info, and regressing some components more than others, this way you acknowledge SOME impact to the pitcher, but not totally.

    Whatever you end up with will not be satisfactory to everybody. Or even anybody for that matter. The easiest solution is to provide multiple “Win Values” and let the reader choose the one he prefers, be it ERA-based, WPA-based, FIP-based, or tRA-based. And that reader will decide to weight it as he sees fit.

    • Anonymous says:

      Or, you could weight it BCS-style. If you have 6 metrics… and you drop the highest and lowest scores, and average the remaining four, you’ll be left with a pretty good idea of the “true Win Values”.
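
One way to read that suggestion is as a trimmed mean. The sketch below assumes six win-value estimates for a single pitcher season; the function name and the numbers are hypothetical and only illustrate the arithmetic, not whether the blend is actually a good estimator.

```python
def trimmed_mean(values):
    """Drop the single highest and single lowest value, average the rest."""
    if len(values) < 3:
        raise ValueError("need at least three values to trim both ends")
    ordered = sorted(values)
    kept = ordered[1:-1]  # everything except the minimum and maximum
    return sum(kept) / len(kept)


# Six hypothetical win-value estimates from six different metrics.
estimates = [5.2, 2.1, 3.4, 3.9, 4.4, 3.0]
blended = trimmed_mean(estimates)
print(round(blended, 3))
```

Note that trimming only guards against one outlier at each end; if several metrics share the same bias, the average inherits it.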

    • Bryan says:

      I like choices. I agree if I can see all of them, I will be happier. Clutter is the only problem some people will have with this.

    • Sky says:

      I may be wrong, but I believe tRA holds pitchers accountable for HRs just as much as FIP does (park adjusted). It’s tRA* (and xFIP) that start doing some regression.

  9. Tony says:

    “Whatever you end up with will not be satisfactory to everybody. Or even anybody for that matter. The easiest solution is to provide multiple “Win Values” and let the reader choose the one he prefers, be it ERA-based, WPA-based, FIP-based, or tRA-based. And that reader will decide to weight it as he sees fit.”

    Agreed.

    • mymrbig says:

      Agreed. The fangraphs guys seem to be running out of things to do anyway. Let’s get the context-based win values on all the player pages.

      As long as we are having an evidence-free discussion, why do I like context-based stats quite a bit for RP, somewhat for SP, and not at all for hitters? I think the fact that a RP serves up a go-ahead HR in the 8th means something, but I don’t think we should distinguish between a batter hitting a HR in the 2nd inning versus the go-ahead shot in the 8th.

  10. studes says:

    I think an important point to remember is that we’re trying to estimate a pitcher’s contribution to his team’s wins. It’s a retrospective stat. We’re not trying to predict future performance. So whether a particular stat does a better job of predicting future performance is irrelevant here.

    Isn’t it?

  11. Bearskin Rugburn says:

    I understand the need for defensive neutrality when assigning value to pitchers because you don’t want to diminish their value on account of poor defenders. On the other hand, since hitters are evaluated based on what actually happened rather than what should have happened it seems unfair to write off all balls hit in play for the pitchers. Maybe it’s not the pitcher’s fault that a grounder went for a double, but then again maybe he hung a curve and it was absolutely his fault. And if a pitcher does this on a regular basis his value assessment ought to reflect it.

    Since this site publishes some very accurate (as far as these things go) defensive numbers I wonder if using wOBA against and adjusting for park and team defense was considered as an alternative method, and if it was why it was rejected.

    Sorry for the repeat post, but as the previous one was a response to Dave in particular I wanted to be able to hear what anyone at all has to say about it.

    • TangoTiger says:

      The difference between a hitter and pitcher is that, given a large enough sample, the quality of opposition pitchers and fielders for a hitter will be fairly random or even.

      For a pitcher, while he will face a random set of hitters, he is stuck with his fielders. So it becomes important to distinguish between a double allowed and a hit snared behind the pitcher.

      However, given that you do have UZR, then you can, for example, figure out the team UZR on GB, FB, and LD, and then remove that impact from a pitcher’s distribution of GB, FB, LD. So, if a team has lots of great IF and bad OF, and if they are +50 runs on 2000 GB, then we remove .025 runs per GB allowed by a pitcher.

      That’s a lot of work though, compared to the alternatives.
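
The adjustment described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the pitcher with 300 ground balls is made up. Only the team figure (+50 runs on 2000 GB) comes from the comment.

```python
def runs_per_ball(team_runs_saved, team_balls_in_play):
    """Runs the team defense saved per batted ball of one type (GB, FB, or LD)."""
    return team_runs_saved / team_balls_in_play


def defense_adjustment(pitcher_balls_in_play, per_ball_rate):
    """Runs credited to the fielders, to be removed from the pitcher's ledger."""
    return pitcher_balls_in_play * per_ball_rate


# Figures from the comment: the team is +50 runs on 2000 ground balls.
gb_rate = runs_per_ball(50, 2000)

# Hypothetical pitcher who allowed 300 ground balls in front of that infield.
gb_adjustment = defense_adjustment(300, gb_rate)

print(f"{gb_rate:.3f} runs per GB, {gb_adjustment:.1f} runs removed")
```

The same two steps would be repeated for fly balls and line drives, each with its own team rate, and the three adjustments summed.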

      • Bearskin Rugburn says:

        It does seem like a lot of work, and if the consensus is that the effort is not worth the improvement then so be it. But it also seems odd that at a time of exponential improvements in quantitative defensive statistics DIPS metrics are still dependent on assuming that every ball in play is created equal (whether they correct for GB/FB tendencies or not).

        But I don’t want to come off as being ungrateful for the tremendous work done on this project by you, the Davids, and anyone else who contributed. This is a great start and whatever improvements are made can only be incremental.

  12. Sky says:

    Still a proponent of PZR for this type of thing. Just saying…

  13. bikozu says:

    I’m not sure if this is the right place to ask, but why is there no positional adjustment for pitchers’ batting values?
    Certainly there is some added value for a pitcher who can hit, but it’s rarely discussed when talking about win values of a pitcher. Those 70-80 PAs count for something, and they aren’t totally random.

  14. Samg says:

    Why not add projections for win values? In part to help estimate team performance, and in part because it would be awesome. And why not use a combination of Marcel, BJ, and CHONE to do that, and for all other projected measures on the site?

  15. David says:

    I’m not sure if I have this right, but I think that Statcorner has tRA* that regresses the appropriate amount for batted balls.

    Anyway, studies by MGL, THT, and others show that pitchers have limited abilities in the outcomes of batted balls, but have control over the percentage of batted balls that are LD’s, GB’s etc. tRA adjusts for this, FIP doesn’t. It seems to me that one is better than the other.

    Would it be possible for you guys to maybe include a separate win value calculation that uses tRA? That way, everyone would be happy.

  16. Jake says:

    I mean no disrespect to FIP–I use it all the time to determine pitchers’ relative values, remove past luck to evaluate fantasy pitchers, etc.–but I agree that including batted ball data would make a more universal metric; groundball pitchers like Wang and Webb and flyball pitchers like Perez and Wakefield are underestimated by FIP. I don’t mean to oversimplify things, but couldn’t one just add in linear weights values for the various batted ball types to the formula? Or perhaps IP could be replaced with
    ([league average of outs/FB]*[FBs allowed]+[league average of outs/GB]*[GB allowed]+[league average of outs/LD]*[LD allowed]+K)/3
    or something similar? Innings pitched is a measure of outs, and most outs are made by the defense behind the pitcher.
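
A rough Python sketch of the suggested replacement for IP, for illustration only. The function name is hypothetical, and the league out rates are placeholders chosen for round arithmetic, not real league averages.

```python
def expected_innings(fb, gb, ld, k, outs_per_fb, outs_per_gb, outs_per_ld):
    """Innings implied by league-average out rates on each batted ball type, plus strikeouts."""
    expected_outs = outs_per_fb * fb + outs_per_gb * gb + outs_per_ld * ld + k
    return expected_outs / 3


# Placeholder league rates (assumptions, NOT real league averages).
LEAGUE_OUTS_PER_FB = 0.80
LEAGUE_OUTS_PER_GB = 0.75
LEAGUE_OUTS_PER_LD = 0.25

# Hypothetical pitcher line: 100 FB, 100 GB, 40 LD, 60 K.
xip = expected_innings(
    fb=100, gb=100, ld=40, k=60,
    outs_per_fb=LEAGUE_OUTS_PER_FB,
    outs_per_gb=LEAGUE_OUTS_PER_GB,
    outs_per_ld=LEAGUE_OUTS_PER_LD,
)
print(f"expected innings: {xip:.1f}")
```

Dividing by this figure instead of actual IP would stop crediting the pitcher for outs his fielders converted at a better or worse rate than league average.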

    Also, I have been pondering whether HR or [league average HR/FB]*[FB allowed] would be more appropriate, to try and cancel out luck and park factors. Has anybody analyzed yet whether pitchers have significant control on HR/FB, beyond park factors of course?
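
To make the HR question concrete, here is a hedged Python sketch that swaps actual home runs for [league average HR/FB]*[FB allowed] in the FIP numerator from the article. The league HR/FB rate and the pitcher line are placeholders, the helper names are hypothetical, and the league-specific constant is left out.

```python
def fip_core(hr, bb, hbp, ibb, k, ip):
    """FIP before the league-specific constant: (13*HR + 3*(BB+HBP-IBB) - 2*K) / IP."""
    return (13 * hr + 3 * (bb + hbp - ibb) - 2 * k) / ip


def expected_hr(fb, league_hr_per_fb):
    """Home runs a pitcher would allow at the league-average HR/FB rate."""
    return league_hr_per_fb * fb


LEAGUE_HR_PER_FB = 0.10  # placeholder rate, not a real league average

# Hypothetical pitcher: 30 HR on 250 fly balls, 50 BB, 5 HBP, 5 IBB, 200 K, 200 IP.
line = dict(bb=50, hbp=5, ibb=5, k=200, ip=200)
with_actual_hr = fip_core(hr=30, **line)
with_expected_hr = fip_core(hr=expected_hr(250, LEAGUE_HR_PER_FB), **line)

print(f"actual HR: {with_actual_hr:.3f}, expected HR: {with_expected_hr:.3f}")
```

Whether the second number is the fairer one depends on the open question in the comment: how much control pitchers have over HR/FB once park is accounted for.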

  17. Mike says:

    Just a thought on why some pitcher’s FIPs may be noticeably different to their ERA over a given period. I’ve been looking at the UZR for different teams in the available data on Fangraphs separated into IF and OF. Obviously some teams have several years of outstanding numbers and several years of poor numbers (e.g. the Braves’ OF was generally brilliant while Andruw Jones was roaming CF). If a pitcher were to be with the same terrible fielding OF for several years (if the OF was the cornerstone of that team’s offense/franchise type players), or presumably a poor IF, then considering that FIP is Fielding Independent Pitching, you might see a consistent difference due to fielding.

    I have no idea if this is the case for the example of Vazquez given above, but it does show how and why the difference might occur.

    Further, re: snowshoe’s comments, in any academic field comments like ‘bug the crap out of me’ wouldn’t be considered good scholarly work. I realise Fangraphs is not a journal and Dave can react in any way he chooses to comments left on a blog entry, but sabermetrics are struggling to be well accepted by the general public. Any move towards friendlier dispositions and a more academic approach would help the field greatly.

    Lastly, snowshoe’s point suggests that he wishes the experts on these issues (guys like Dave, Tango and others at THT, StatCorner, etc.) to choose between FIP or tRA based upon their conceptual grounding. He doesn’t need to provide evidence for this opinion as he’s raising it as a concern and asking the authority to check it. He refined this point in his second post. Surely, the aim of sabermetrics (and any statistical analysis in general) is to find causation behind the numbers. Snowshoe’s ultimate concern, which can surely be answered, is which measure gives us the better explanation behind what causes the numbers. I have no idea what that may be, but it’s a question that can be considered by the writers of Fangraphs and others rather than shot down as in these comments.

    • Sky says:

      There ARE some pitchers with significantly different abilities to allow/prevent runs than just their K, BB, and HR rates would suggest. You just need a ton of data to show that differences between ERA (park-, league-, and defense-adjusted ERA, naturally) and FIP are significant. One, two, even three seasons aren’t enough to label a pitcher as someone with significantly different secondary skills than their FIP (or tRA* or xFIP or whatever) would suggest.

    • TangoTiger says:

      Mike: again, just so many summary conclusions here. Take for example your statement:

      “…but sabermetrics are struggling to be well accepted by the general public. ”

      You are providing no evidence here, just a summary opinion. What am I supposed to do with that? I take the opposite opinion. We both just made useless opinions.

      Struggling? You make it seem as if it is barely surviving, when I see it as thriving. I’m at the point where I can’t even keep up with the literature, and I devote a lot of time to this stuff. It’s a slow movement, but the public will eventually catch on.

      ***

      “He doesn’t need to provide evidence for this opinion as he’s raising it as a concern and asking the authority to check it. ”

      He NEVER asked a question.

      A person doesn’t need to provide evidence for his opinion. But the worth of that opinion is directly proportional to how much relevant evidence he can provide. If you provide zero evidence, and all you offer is a summary opinion, that opinion has no value whatsoever.

      If you want me to look into something, then ask a question. Then I’ll answer it as best I can, and do whatever work I think I can do on the subject or offer suggestions as to how to do that work.

      Making an opinion, as if it’s something that I need to refute, is not the way I proceed. I’d be spending 30 hours a day refuting people’s opinions with evidence if that’s the case.

      ***

      “Snowshoe’s ultimate concern, which can surely be answered, is which measure gives us the better explanation behind what causes the numbers. I have no idea what that may be, but it’s a question that can be considered by the writers of Fangraphs and others rather than shot down as in these comments.”

      Yes, I agree, that is his ultimate concern, if you strip out all the irrelevant passages. (Can I be honest without being a jerk in saying it that way? Insert smiley faces, then.)

      And no one shot down any concerns, as demonstrated by this thread itself. All metrics have their biases, they all have their assumptions, and all need to be used based on the question being asked, and the objective being targeted. There is no one right answer, because there are several dozen questions that can be asked.

      Ask the question, I’ll give you the answer. All the other stuff, let’s spare the readers the uselessness of half my posts in this thread, as it means absolutely nothing to them, other than them witnessing a philosophical sparring, of which I can’t fathom anything being more boring to them.

      • There are people who gave reasons that they believe tRA is better than FIP (and as Baseball Prospectus showed earlier this year, they were correct), but you never addressed it. You only reply to the comments that don’t offer “evidence”.

        If a pitcher pitches in a small ballpark and gives up a fly ball, that’s an undesirable outcome. If a pitcher pitches in a big ballpark and gives up a ground ball (with no one on), that is an undesirable outcome. The fact that FIP excludes certain fly balls, but not others, makes it a flawed statistic.

        You say that FIP is best to use because it doesn’t look at the performance of fielders. However, it also excludes the ballpark, and most pitchers pitch half their games at the same ballpark.

  18. Matt B. says:

    This certainly has the feel of throwing stones. Albeit with vast vocabularies being used, this has lost any ‘baseball’ feel… Can anybody tell me what the actual difference in Win Value would be for, say, Javy Vazquez with FIP and then with tRA? My guess is not enough to justify this debate! Although it’s been fun to read, to be honest…

    • It’s about half a win difference, so in other words, not a whole lot, at least for 2008.

      Maybe in some extreme cases you’d see a 2 win difference, but typically you’re looking at much less than that, probably averaging considerably less than a 1 win difference between tRA and FIP based win values.

  19. Colin Wyers says:

    I’m late to the party, I guess.

    There has not been a study that shows the superiority of tRA to FIP. We’d like to think it’s so – it’s more complex and a little “smarter,” which is nice. Until that time, it’s a Pepsi/Coke argument. If someone wants to bring data and have an argument we can do that.

    It’s arguing over table scraps, though – they’re both linear run estimators applied to pitching, which I confess distresses me out of proportion to its impact, but either one is going to distort the impact of pitchers on the high and low end of the scale.

  20. Mike says:

    ‘You are providing no evidence here, just a summary opinion.’

    That’s a fair point to which I have two responses. Firstly, whether or not sabermetrics is ‘struggling’ to be well received (I’ll accept the criticism of my word choice there) is not something easily assessable. You might suggest this is reason alone for us not to consider the question, but surely the fact that there is any question of this at all suggests that sabermetrics is not as well received as it could be. I could quote Keith Law’s or Rob Neyer’s comments surrounding their inductions into the BBWAA as evidence of this, or comments from a writer from the NY Times (I forget his name) quoted in a recent King Kaufman article. Surely we agree there is hostility towards sabermetrics and surely a more tempered, academic-quality approach would help the (as you say) inevitable surge towards near universal use of sabermetrics. To pose this point as a question: would sabermetrics benefit from refining their approach along academic guidelines?

    ‘If you want me to look into something, then ask a question.’

    My point is that snowshoe did raise a question. He may not have done it particularly well (as he apologised for) and he may not have raised it as a direct question, but he addressed concerns with whether or not FIP or tRA were conceptually a better statistic (where conceptual refers to the statistic that is more likely to answer questions about causality; the ultimate concern of statistics). I would argue you’ve violated the principle of charity in argument in asserting that no question or no issue (whatever word you want to use) was raised by snowshoe. However, as you say, you are clearly busy and everyone at this site appreciates the time you put in, so you’re more than welcome to leave these questions unanswered or answer later in some manner of your choosing. But to suggest that no question was raised is, I argue, unreasonable.

    ‘And no one shot down any concerns, as demonstrated by this thread itself. ‘

    Feel free to prove me wrong here (and I feel I may be) but I don’t recall any comments in this thread addressing the question of whether FIP or tRA better consider what is causally relevant to successful pitching. Consequently, I would argue this particular concern was not addressed here.

    Lastly, while I think it could have been of interest to readers if these conceptual concerns were addressed, I will agree that these questions are largely philosophical ones. You’re clearly busy, so I won’t bother you any further with this argument. Feel free to reply or let this argument wither.

    Though, calling philosophical sparring ‘boring’ seems unfair. I quote Matt.B’s comment as evidence.

    • TangoTiger says:

      Thank you for framing the post so that we can have it as a discussion.

      “would sabermetrics benefit from refining their approach along academic guidelines?”

      No, I don’t think so. Advancement in sabermetrics benefits far more from a “wisdom of crowds” approach than any other approach I’ve seen. There are two types of peer review: peer in terms of the researcher’s academic qualifications, and peer in terms of the subject matter experts (SME). The best peer review, for sabermetrics, is mostly SME, with some technical expertise. The alleged rigidness of academia on sabermetrics research is not all it’s cracked up to be. On my blog, we go through a lot of these academic papers, and, more often than not, the lack of SME puts a damper on the paper’s conclusions.

      ***

      “But to suggest that no question was raised is, I argue, unreasonable.”

      I don’t think it is unreasonable at all. He approached the issue completely backwards. He gave his final opinion, and did so repeatedly, with lots of adjectives. There was no meat at all to what he said. He could have said the complete opposite (just switch tRA for FIP), and it would have been equally meaningless.

      He used far too many words to say something that could have been said in one line. If he had made a one-line assertion, I would have ignored it. His using a sledgehammer to drive a tiny nail required an equally ridiculous response from me.

      ***

      “addressing the question of whether FIP or tRA better consider what is causally relevant to successful pitching. Consequently, I would argue this particular concern was not addressed here.”

      There are many questions addressed in this thread. We addressed a lot of those. This particular question was not asked, so of course, it was not addressed.

      To answer your question: I don’t know, but I would say that one barely beats the other.

      ***

      “Though, calling philosophical sparring ‘boring’ seems unfair. I quote Matt.B’s comment as evidence.”

      It is boring in the context of this thread. I would think the readers here come to this site to get information via David’s fantastic interface. And I would think readers come to the blog on this site to get further enlightenment, in data or players.

      Philosophical issues, as snowshoe and I have engaged in them, are boring and irrelevant on this blog. Specific issues within the broader philosophical issues, issues that have relevance to the reader, those are not boring.

      But, I will stand by the position that you can delete everything snowshoe has said, and virtually everything that I have said in this thread, and the reader is no worse off. And indeed, may be better off.

      ***

      My time is no more valuable than yours or the reader’s. When I say my time is limited, I mean that it is limited in terms of the complete waste of it, which I think I have done a lot in this thread.

      In terms of the 15 minutes I have spent typing in this thread, the ROI was zero. And the same for the readers reading my b.s.

      • TangoTiger says:

        And insert more smiley faces, so that one need not infer my mood, which many will surely try to conclude in a negative light.

      • pounded clown says:

        I believe snowshoe commits the fallacy of petitio principii (begging the question) which occurs when “an attempt is made to evade the burden of proving one of the premises of an argument by basing it on the prior acceptance of the conclusion to be proved.”

        While I enjoy the “wisdom of the crowd” or SME approach and I think it works best for this, it lacks the convenience of a codified structure that academia provides. I don’t think proofs are in order, but just showing how the formulas for more complicated metrics are derived might help a sap like me. It’s not a question of validity. At first it was, but as I stumbled around the site the last few months I concluded that, just by the site’s upkeep and the data volume, all done out of a passion for quantifying baseball, you know what you’re doing. For me, knowing how it ticks makes learning easier. Like, how was FIP derived? Is there one book that I can buy, a sabermetrician’s bible so to speak? Also, it can be daunting when you read about one metric and find a slew of other metrics it’s based on… for a beginner it’s frustrating. It would be great to have a “beginners start here” section, which is something the aforementioned structure provides, albeit with all the tedium and rigidity academia can pile on.

  22. cymbaliner says:

    Dave Cameron and TangoTiger come off fairly poorly here. Snowshoe and Mike come off as much more open to discussion and generally friendly. Snowshoe’s tone throughout, although critical, was nothing but congenial. Dave and Tango’s tone might be best described as “snarky,” although any choice of adjective is seemingly taken to town on this comment board.

    There’s no point in being a jerk on FanGraphs. There are plenty of sites that cater to jerks, but a sabermetrics site shouldn’t be one of those places. Dave Cameron’s work on this site stands on its own; he has nothing but my respect and that of countless others.

    That said, Dave / Tango, if you don’t think you’re being a jerk here, then you might want to reassess what being a jerk means. If you don’t care about being a jerk then I have nothing to add and please retract these comments. The thing that bugged me here is that both Dave and Tango seemed oblivious to their own condescending tone, as if they really had no clue that they were coming off as something less than friendly.

  23. ludger says:

    Can you tell me what xFIP is? What does the x stand for?

  24. Adam says:

    Couldn’t FIP be modified to include a line drive/groundball metric if you think that sort of thing is necessary? How hard would it be to subtract line drive % from groundball %, or simply include both?

  25. Adam says:

    And I agree with Cymbaliner, Dave and Tango are most certainly being “snarky”… It seems like they’re thinking, “This guy just doesn’t understand my wonderful statistic, and he is wrong!” rather than thinking, “Well, how could this stat more accurately reflect the things that the pitcher has control over?”

    I say include ball/strike ratio, not just BB, HBP and K, and find a way to successfully include the line drive/grounder numbers too.

  26. Adam says:

    Also, I don’t think that a walk is 1.5x as bad as a K is good…
    4 BBs for a run, but 3 Ks gets you out of the inning… If a pitcher had a 1/1 K/BB ratio, and walked 3 and K’d 3 every inning (assuming an unrealistic no contact rate) he would never allow a run. And if we assume a league average contact rate and a BABIP against of around .300, then that pitcher would have an ERA around 3 if a ball got put in play every inning (don’t jump all over the math, it’s just a thought experiment).

    • Matthias says:

      “If a pitcher had a 1/1 K/BB ratio, and walked 3 and K’d 3 every inning (assuming an unrealistic no contact rate) he would never allow a run.”

      That’s a limitation of linear weights metrics, but you addressed the problem yourself, I think. No pitcher records 3 Ks and 3 BBs per inning. Pitchers are generally found in ranges that can be modeled very well by linear regression. I think there have been some conjectures as to how pitchers at the extremes might not be valued as well by linear scaling, but I haven’t seen a full study/article. Extrapolation (going outside the range of the bulk of the data) can always present problems. Perhaps there is a non-linear regression out there somewhere that fits pitchers even better…
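
The thought experiment can be run through the FIP formula from the article to see the extrapolation problem in numbers. This is a minimal Python sketch: the helper names are hypothetical, the league-specific constant is left out, and the run rule assumes no contact at all, so a run scores only on a walk issued with the bases already loaded.

```python
def fip_core(hr, bb, hbp, ibb, k, ip):
    """FIP before the league-specific constant: (13*HR + 3*(BB+HBP-IBB) - 2*K) / IP."""
    return (13 * hr + 3 * (bb + hbp - ibb) - 2 * k) / ip


def runs_from_walks_only(walks_in_inning):
    """With no contact, the first three walks load the bases; each further walk scores a run."""
    return max(0, walks_in_inning - 3)


innings = 9
# Three walks and three strikeouts in every inning, nothing else.
linear_estimate = fip_core(hr=0, bb=3 * innings, hbp=0, ibb=0, k=3 * innings, ip=innings)
actual_runs = innings * runs_from_walks_only(3)

print(f"FIP before constant: {linear_estimate:.1f}, runs actually allowed: {actual_runs}")
```

The linear weights charge this pitcher 3.0 before any constant is added, while no run can actually score. No real pitcher lives at that extreme, which is why the linear fit works well inside the normal range and misleads outside it.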

  27. Dylan says:

    Would it make sense to give strikeouts a little more value since if a pitcher is missing that many bats, it would result in a lower BABIP, for which the pitcher is not collecting any bonuses in this formula?
