FanGraphs Baseball


  1. I’m surprised you didn’t mention the difficulty of competition. Clearly the Rays and Jays had a more difficult schedule than the Brewers. With the stiffer competition in the AL East, it stands to reason that a very valuable team wouldn’t finish with as many wins as they are “worth.”

    Although I’m a die-hard A’s fan, I can’t explain how WAR overvalued them by as much as it did. They had wild swings in starting pitching, outstanding relief pitching, terrible hitting, and slightly below-average fielding. Is there a way on the site to look at WAR values for all players on a team without going to the player page for each one?

    Comment by dustin — April 4, 2010 @ 3:03 am

  2. Try this, Dustin.

    Comment by sorry your heinous — April 4, 2010 @ 3:07 am

  3. Why bother, David? Boswell actively chooses to pretend that, fifty years ago, the apex of baseball metrics was frozen in time, and anything done since to improve those metrics is heresy. Those who respect his opinion on these matters are believers in the same.

    Comment by Dan — April 4, 2010 @ 9:10 am

  4. Well, I think from time to time it’s worth dispelling blatant misconceptions that the mainstream media is throwing out there about some of the stats on FanGraphs. Whether or not anyone pays attention is another story, but no reason not to try.

    Comment by David Appelman — April 4, 2010 @ 10:11 am

  5. Boswell makes up his own stats. The fact that his runs created stat (Runs Scored + RBI – HR) is worthless and nonsensical, and has been called as such, has hurt his feelings.

    Comment by Rob — April 4, 2010 @ 10:31 am

  6. I must be looking in the wrong place, but I see the Brewers at $116 too. Where are you getting the $128?

    Comment by James — April 4, 2010 @ 11:03 am

  7. or, rather, $117.

    Comment by James — April 4, 2010 @ 11:06 am

  8. I think many, many people misunderstand how to use the Dollar Values. I see them misused and abused everywhere.

    Comment by bookie — April 4, 2010 @ 11:21 am

  9. Someone help me out: I’m not clear on how those dollar values are calculated. Could AL teams be “worth” (or “cost,” or whatever) extra because they have one extra starter-quality hitter (the DH)? Would that make a difference? That, and, of course, the quality of the two leagues?

    Comment by Brian — April 4, 2010 @ 11:55 am

  10. Remember when Boswell was the brains behind Total Average?

    Yeah, now he’s written some of the most intellectually dishonest crap ever.
    Check out his infamous 2008 MVP boner of an article.

    Comment by Joe R — April 4, 2010 @ 12:42 pm

  11. Where an AL lineup has an edge over an NL, NL pitchers make up the difference.

    With obvious exceptions/tweaks, but still.

    Comment by Joe R — April 4, 2010 @ 12:43 pm

  12. Wait, wait, wait…
    Boswell actually PENALIZES players for hitting HR?
    And somehow, in 2009, despite MLB players scoring 22,419 runs, they “created” 38,741. Great stat, Bos

    Comment by Joe R — April 4, 2010 @ 12:50 pm

  13. You need to add in pitching to that too. That’s just for position players.

    This was just a simple oversight on his part, because he got the other teams correct, but I figured it was worth pointing out anyway.

    Comment by David Appelman — April 4, 2010 @ 1:11 pm

  14. The “sic” part cracked me up, because I once had a letter to the editor accepted (but not published, because I didn’t see their email asking me to confirm that it could be published) requesting permission to use one of his articles in my English classes as an example of how many, many usage errors undermine authorial integrity (not that he has much, but anyway…)

    Comment by William — April 4, 2010 @ 2:34 pm

  15. Joe, it’s not a penalty. Since a HR counts as both a run and an RBI, it is subtracted so that one HR equals only one run produced. And that stat shouldn’t equal runs scored, since it includes both runs and RBI.

    Comment by Zach — April 4, 2010 @ 2:59 pm

  16. Actually it is a penalty, since it credits a run twice (once on each side of the R/RBI ledger) except when the run is scored on a home run. That’s penalizing a home run.

    Comment by Kevin S. — April 4, 2010 @ 3:10 pm
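
The double-counting Kevin describes is easy to check with a quick Python sketch. The function name and stat lines below are made up for illustration; the formula itself is the Runs Scored + RBI – HR quoted above:

```python
def boswell_runs_created(runs, rbi, hr):
    """Boswell's formula as quoted above: Runs Scored + RBI - HR."""
    return runs + rbi - hr

# A solo home run: the batter gets 1 R and 1 RBI, so subtracting HR
# leaves exactly 1 "run created" -- counted once, no bonus.
solo_hr = boswell_runs_created(1, 1, 1)

# The same single run scored on a single: the runner is credited 1 R and
# the batter 1 RBI, so across the two players the run is counted twice.
runner = boswell_runs_created(1, 0, 0)
batter = boswell_runs_created(0, 1, 0)

assert solo_hr == 1
assert runner + batter == 2  # one actual run, two "runs created"
```

That asymmetry is also why the league totals Joe quotes diverge: a run driven in by a teammate is credited twice (once as an R, once as an RBI), while a run scored on your own homer is credited only once.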

  17. C’mon guys… As Michael Scott would say, “You don’t call a retarded person retarded; you call your friend who is acting retarded, retarded.” Thomas Boswell is nobody’s friend.

    Comment by Pat — April 5, 2010 @ 12:44 am

  18. Wow,

    I remember reading his stuff back when I was in college in the late ’80s, and he was considered “cutting edge” back then. What happened?

    That Pujols–Howard article is just plain embarrassing. I love how he tries to make the case for Howard with his RBI count while totally discounting the 100-point swing in BA.

    I love old-time writers like this who say that “stats” like VORP or WAR or OPS are meaningless and then make their case using RBI and HR, as if RBI and HR aren’t stats. Give me a break.

    And I don’t even want to bring up the K-Rod for MVP point.

    Comment by John Q — April 5, 2010 @ 6:51 am

  19. Now, I love WAR; it’s a great stat. And a 0.86 correlation with actual wins is good. (Although, as a systems engineer, for predictive purposes I would like to see it a bit higher.)

    However, it is worth mentioning that there are 30 major league baseball teams. If you have SIX outliers (Rays, Brewers, A’s, Jays, Strohs, and Pads) out of 30 data points, that’s not a great curve fit. I would venture to say that, in general, if 20 percent of your data points are outliers, there’s a better fit to be found.

    However, it isn’t clear from the discussion that that’s the case: where do I go to see the graphs?

    Comment by AnyEdge — April 5, 2010 @ 8:55 am

  20. Oh c’mon, this is clearly performance art. Or a sociological experiment.

    Comment by Cosmo — April 5, 2010 @ 10:20 am

  21. “And 0.86 correlation is good with actual wins. (Although, as a systems engineer, for predictive purposes I would like to see it a bit higher.)”

    It’s all relative, though. I could give you a correlation of 1.00 on a game level simply by using runs scored and runs allowed; that doesn’t make it informative. If WAR is explaining 86% of a team’s wins, and you don’t have a model to point to that’s clearly better, I don’t see the objection. So there’s more random error in wins in a one-season sample than you’d like; so what?

    ” If you have SIX outliers (Rays, Brewers, A’s Jays, Strohs and Pads) out of 30 data points, that’s not a great curve fit. I would venture to say that, in general, if 20 percent of your data points are outliers, there’s a better fit to be found.”

    Again, it might just be random error. WAR is deliberately context-neutral – it purposely leaves things out, because knowing those things is not useful. Your R^2 might go down by excluding variables, but that’s fine, because the things you’re leaving out (like “clutch performance”) that explain past wins actually make future predictions worse, since the added error they bring is greater than any additional predictive power they may have. Random error happens; it doesn’t make a model bad.

    I wonder if guys like Boswell do the things they do because they know it generates interest/hits, or they really are idiots who feel some sort of need to protect the way things were done in the past from superior ways for some odd reason….?

    Comment by B — April 5, 2010 @ 1:36 pm
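
One small clarification on the numbers being traded here: correlation (r) and variance explained (R^2) are different quantities – an r of 0.86 corresponds to an R^2 of roughly 0.74, i.e. about 74% of the variance in wins, not 86%. A minimal Pearson sketch (the win totals below are invented for illustration, not the real 2009 data):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Invented WAR-implied vs. actual win totals for five teams.
war_wins    = [95, 88, 84, 76, 70]
actual_wins = [97, 84, 86, 74, 68]

r = pearson_r(war_wins, actual_wins)
r_squared = r ** 2  # share of win variance the model "explains"
assert r_squared < r  # for 0 < r < 1, R^2 is always the smaller number
```

So reading “0.86 correlation” as “explains 86% of wins” overstates the fit; the two coincide only at r = 0 or 1.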

  22. Seems more like a scatological experiment to me.

    Comment by B N — April 5, 2010 @ 5:06 pm

  23. Who knows?
    Got butthurt because better analysts stole his fire, got older and more senile, maybe he’s ironically trying to sound stupid to increase his page hits.

    I just like how people are so mad that you can’t invent value anymore.

    (Random note: I was going to use Baseball Prospectus and the 2001 AL MVP vote as an example of invented value, but I think BP’s math is broken at the moment: apparently Orlando Cabrera was the best player in major league baseball in 2001 with a 58.0 WARP.)

    Comment by Joe R — April 5, 2010 @ 6:12 pm

  24. Also being an engineer (but admittedly not having a phenomenal stats background as probably some, or most, here have) – in many other areas a correlation of .86 would not be considered accurate or high enough to be useful for predictive purposes.

    And before I get flamed: yes, it might be the best available model, but let’s say the best predictive SABRmetric model instead had a 0.5 correlation factor… would you still use it and rely on it simply because you have nothing better? Would you still be calling points that don’t fit outliers?

    …so the question is where you draw the line, and when outliers stop really being outliers and are simply part of normal variation in a marginal correlation. (And I’m not saying that is the case here, but before dismissing this guy as an idiot, has anyone done this study?)

    I don’t think .86 should be dismissed, but it would be nice if people would at least consider the other side (even if this other author is an ‘idiot’). It would be nice to use the statistics to determine how many outliers are needed before they should no longer be considered outliers.

    Comment by Hank — April 5, 2010 @ 6:33 pm

  25. Well, first I’ll just say that if you want to get technical, calling them “outliers” was probably a poor choice of words. Outliers are generally points that don’t fall within the expected distribution (at least, not if the distribution is roughly bell-shaped, as we’d expect when talking about team wins vs. WAR), whereas these observations probably do fall within the expected distribution and are likely exactly what we’d expect. Also, R^2 isn’t a very good metric for “predictive” purposes. In engineering .86 might not be considered very high, but at the same time, in many fields of economics, for example, where you’re studying real-life data, .86 would be much, much higher than anything you’d expect to find.

    As for your last concern, I’m not really sure what you’re getting at. Decisions should be made on the best information available, and FG writers, the guys at The Book, THT, etc., do put a lot of effort into figuring out the best information… the question is “do you have a better model?”, and if not, you use the WAR model….

    Comment by B — April 6, 2010 @ 11:09 am

  26. Please do not give me a hard time about using the term outliers (and correct me on ‘technical’ grounds), when the author uses that exact term in the article and I chose to use the same term for consistency – that just tells me right away you are looking to argue or knock me down a peg in people’s eyes. If you feel the need to correct anyone – you should correct David for his incorrect technical use of the word outlier in his article, frankly it seems completely unnecessary and over the top as it is pretty clear what he (and subsequently I) meant by the term.

    My question is not whether there is a better model… if the model were terrible, would you use it simply because there is nothing better? That is absurd logic in my view. I appreciate the effort that goes into WAR, but frankly, effort is not a useful way of determining accuracy or usefulness. I think trying to understand outliers, and whether there are more outliers than expected, is the only way you will make the model better (as opposed to merely saying other models like Pythag have the same issue).

    And I reject the economic model example as you fail to discuss the predictive accuracy of those models – just because they may use lower correlation coefficients doesn’t mean the models are good for predictive purposes. If you had a model that had a correlation of .65 for stock prices would you rely on that model to buy stocks merely because a better model doesn’t exist?

    Getting defensive and saying “well, do you have anything better?” is really not a constructive way of improving models. Trying to understand the issues with the model, and how often they do “fail” versus how often you would expect them to fail (for lack of a better term, I’m referring to the number of ‘outliers’), should be looked at constructively, not defensively… The best way to discredit folks like Boswell is to put his ‘issues’ in context: is what he’s pointing out ‘normal,’ a one-year anomaly, related to one specific component of the model failing in one or two very specific circumstances, etc.?

    Comment by Hank — April 7, 2010 @ 4:32 am

  27. I think you misinterpreted a lot of my comment. I wasn’t criticizing you for using the term “outlier” – I fully understood you were just being consistent with the author, so, as you said, it was probably not the best choice of words on David’s part.

    As for the rest of it, I understand your concerns, and I’m not saying they’re not valid… but there are a lot of really smart people who understand what they’re doing and who have raised those types of concerns, and it’s hard to summarize all their research in a comment. If you look around, you’ll find a lot of these things discussed and addressed at FanGraphs, The Book, THT, BtB, and all the other saber sites out there. They’ll probably figure out some improvements to make in the future, but what they have right now is pretty good – if you test how FIP/wOBA and then WAR perform, I think you’ll end up agreeing that they’re quality models for what they’re designed to do: put a market value on players based on performance. Seeing a strong correlation between actual wins and WAR is one piece of evidence in that puzzle that they’re doing things well. So I guess I’ll apologize for not putting in the effort to give you some links to supplement that?

    As for Boswell himself, for me personally, it just gets tiresome seeing people completely misinterpret stats and make ridiculous claims that just don’t make sense….there are just a lot of people out there to correct. For instance, you look at a comment like this:

    “For example, they value the whole 84-win Rays team in ‘09 at a salry of $229M, but they think the 80-win Brewers, just four less wins, are “worth” only $116M using their stat methods.”

    It seems he doesn’t understand a couple of things – WAR dollars are free-agent dollars, which clearly don’t apply to all the players on a team, and actual wins and WAR wins are different. Since WAR ignores context aspects that are generally thought of as uncontrollable in the future (since a team should be paying for future performance), it purposely leaves out some factors that contribute to actual wins, because it makes sense to do so. There’s research out there on these aspects (the biggest one being “context”) to support why they’re excluded from the model – something like clutch hitting, which affects wins, isn’t a repeatable skill in any meaningful way going forward.

    Anyways, I don’t think I took a defensive attitude, it’s just hard to sum up so much that’s been done in a comment, and I’m more than willing to have a reasonable discussion on the issue if there are any concerns you want to flesh out….

    Comment by B — April 7, 2010 @ 11:06 am

  28. Based on a comment from AnyEdge above, I plotted the ’09 data (win totals vs. WAR dollars) – there were 9 points noticeably off the curve. With all points included, the R was ~.84 (in line with the .86 that David quoted as the correlation for everything since ’02). Without the 9 ‘outlier’ points (I’ll refer to them as outliers purely for simplicity), the R of the remaining 21 points was ~.95 (extremely good/tight) – yeah, I know statistically this is not the best thing to do. However, despite a “good” R of .84, the outliers were fairly obvious and made up 30% of the data set (9/30 points).

    Randomly (or maybe not randomly?), 6 of the outliers were the bottom win-total teams, and 8 of the 9 were in the bottom 11 in team win totals. (The only outlier with a relatively high win total was the Rays.)

    Also maybe of interest: the 9 “outliers” had a very high correlation with each other (over .95)… the ’09 plot almost looked like two separate sets of data. At the risk of earning the wrath of others, 9 points out of 30 seems high – I don’t know if it’s just a 2009 thing, but it might be worth looking into. I don’t have as much history with Boswell as many on this site apparently have, but at least in terms of the ’09 data it does not appear he’s merely ‘cherry-picking’ outliers if 30% of the data set might fall into that category.

    Comment by Joe — April 7, 2010 @ 5:08 pm
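
Joe’s with-and-without-outliers exercise can be reproduced in miniature. In the sketch below, all the data points are invented and `split_by_residual` is just an ad-hoc helper (not how FanGraphs computes anything): it fits a least-squares line, flags points whose residual exceeds the RMS residual, and compares the correlation before and after:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / math.sqrt(sum((x - mx) ** 2 for x in xs) *
                           sum((y - my) ** 2 for y in ys))

def split_by_residual(xs, ys, k=1.0):
    """Fit y ~ a*x + b by least squares, then split the points into
    (near the line, far from the line) by residual size."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys)) /
         sum((x - mx) ** 2 for x in xs))
    b = my - a * mx
    resid = [y - (a * x + b) for x, y in zip(xs, ys)]
    cutoff = k * math.sqrt(sum(r * r for r in resid) / n)  # RMS residual
    near, far = [], []
    for (x, y), r in zip(zip(xs, ys), resid):
        (near if abs(r) <= cutoff else far).append((x, y))
    return near, far

# Ten invented (WAR-implied wins, actual wins) points: eight near a line,
# two well off it, echoing the two-populations look Joe describes.
pts = [(65, 70), (70, 78), (75, 83), (80, 90), (85, 96),
       (90, 101), (95, 108), (100, 115), (84, 150), (62, 40)]
xs = [p[0] for p in pts]
ys = [p[1] for p in pts]

near, far = split_by_residual(xs, ys)
r_all  = pearson_r(xs, ys)
r_near = pearson_r([x for x, _ in near], [y for _, y in near])
# r_near jumps well above r_all once the off-line points are removed,
# mirroring Joe's ~.84 -> ~.95 observation (with made-up numbers).
```

Whether that improvement means anything is exactly Hank’s question: dropping the worst-fitting 20–30% of any data set will flatter the correlation, so the more interesting test is whether the same teams keep landing off the line year after year.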

  29. Actually, runs created is totally useless as a predictive metric for future results. However, it is not a bad measurement of past outcome – the “not what you hit, but when you hit it” school of thought. Yes, the best way to score and drive in runs over the long term is to excel at all of the components used by more sophisticated metrics. Players who out-produce their core performance cannot maintain this over a number of years and will regress to their mean. That does not change the fact that they had a great year.

    A player who created 200 runs in 150 games played was most likely more valuable (offensively) in that year than one who produced 180 in 150 games. If the more sophisticated metrics favor the latter player and they are the same age, I would want the latter player going forward. However, if you told me at the start of the season I could have the 200 or the 180 runs produced, regardless of the player’s name, which would you take?

    The largest flaw is not factoring in the cost of outs and the value of bases added that indirectly contribute to runs. As simple measurements go, it does cut through some noise. It emphatically does not penalize home runs – I have been having that argument with haters of runs created for at least 20 years. Think it through: in any given plate appearance, hitting a home run “creates” the most runs of any outcome, and is not dependent on the hitters who follow you.

    Comment by Frank Schuh — April 9, 2010 @ 10:24 am
