FanGraphs Baseball



  1. Couple of suggestions to possibly remove some noise:

    1) Seems to me like you should be controlling somehow for team performance, even if it’s just a dummy variable.

    2) Also, future analysis might want to consider improvement relative to recent performance (e.g., weight three-year average or something), as anecdotally it seems that voters are more likely to reward a good player having a great year than a great player having an average year.

    Comment by walt526 — December 15, 2010 @ 4:06 pm

  2. About your conclusion: Maybe the heart of it is that voters were better than we think 10+ years ago. I know the big clunkers get thrown around a lot (Bob Welch, etc…), but we probably focus too much on what the voters got wrong, and not enough on the fact that they often got it right.

    Comment by suicide squeeze — December 15, 2010 @ 4:10 pm

  3. As I said in the last thread: use non-parametric statistics. % of award vote is not normally distributed and so it is *incorrect* to use Pearson correlations.

    Comment by mettle — December 15, 2010 @ 4:14 pm

  4. I concur, this isn’t the appropriate way to be evaluating vote percentage. Non-parametric statistics are preferable when dealing with a variable of a non-normal distribution or with particularly small sample sizes.

    Comment by Hugh — December 15, 2010 @ 5:23 pm

  5. You’re still not eliminating the bias. Players who have high WARs or low FIPs will also have good conventional statistics. You need to run a regression on ERA – FIP per innings pitched or something.

    Comment by vivaelpujols — December 15, 2010 @ 5:52 pm
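One possible reading of the regression suggested in comment 5, as a minimal sketch: regress vote share on the gap between ERA and FIP, so that the part of ERA already explained by FIP drops out. The sketch uses the plain ERA minus FIP gap rather than any per-inning scaling, the `ols` helper is hypothetical, and every number below is invented for illustration, not real voting or pitching data.

```python
from statistics import mean

def ols(x, y):
    """Slope and intercept of the least-squares line y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical pitcher seasons (invented numbers, not real data).
era = [2.30, 2.90, 3.20, 2.70, 3.50]
fip = [3.00, 2.80, 3.10, 3.20, 3.10]
vote_share = [0.80, 0.35, 0.20, 0.55, 0.05]

# Negative gap: the pitcher's ERA was better than his FIP suggests.
gap = [e - f for e, f in zip(era, fip)]
slope, intercept = ols(gap, vote_share)
print(f"slope={slope:.3f} intercept={intercept:.3f}")
```

In this toy data a negative slope would mean voters reward pitchers whose ERA beats their FIP, which is the bias the comment is asking about.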

  6. The problem with the evaluations when using FIP or WAR, is that essentially, the stats are made up. There’s no way to really prove that a player’s WAR is accurate. Or that the statistic is even accurate. For this reason, the voters & writers pay no attention to it. Heck, UZR was an inaccurate stat for the first year the masses paid attention to it. OBP, average, RBI, etc. are easily measurable and easily proven, that’s why people pay attention to them. If someone can find a way to break down FIP and PROVE that it is correct, people might listen.

    Comment by heartbreakid — December 15, 2010 @ 6:26 pm

  7. Sounds like you’re arguing against the utility of WAR and FIP themselves. How do you ‘prove’ that FIP is ‘correct’? I dunno, in what way? ERA is ‘accurate’ in that it does exactly measure Earned Runs / Inning Pitched. Is it ‘accurate’ in terms of ‘pitcher quality’? We already know with a fair degree of confidence that FIP and xFIP are both better measures of pitcher true talent than ERA. What more do you want?

    Comment by NBarnes — December 15, 2010 @ 7:32 pm

  8. For pitchers, I prefer rWAR over fWAR because the reason Dave chose FIP was ultimately because he felt it necessary to use a context-independent stat since that was also the case for batting.

    My problem is that, especially with the elite pitchers, not all apparent luck is, well, “luck”. Felix, for example, has always had very good control of LOB and LD%, and has consistently dropped his BABIP and HR/FB, and I don’t think it’s sufficient to just assume that in all cases FIP or xFIP or whatever is a better indicator of true talent than ERA… mostly, yes, but again, please tell me why we should base the majority of fWAR on something that denies Felix’s apparent ability to “make his own luck”…

    Comment by William — December 15, 2010 @ 8:40 pm

  9. I disagree, I think the voters are getting smarter. The reason the correlation was just as high in the early 00s is because the choices were easier. No one was going to deny a 10+ WAR Barry Bonds the award in 2001-2004, but now the choices are less clear and they are making the right picks. The study is skewed by the fact that the races were much less close in the early part of the decade, making it easier for writers to choose without using much advanced analysis.

    Comment by gnick55 — December 15, 2010 @ 8:43 pm

  10. While I agree that a Spearman’s rho would be better, a Pearson’s correlation coefficient will likely yield similar results if the sample is very large (i.e. close to sampling the entire population).

    Comment by Eric Feczko — December 15, 2010 @ 9:51 pm
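To make the Pearson versus Spearman point in comments 3, 4 and 10 concrete, here is a minimal sketch in plain Python. Spearman's rho is computed as the Pearson coefficient of the ranks, with tied values sharing an average rank. The helper names and the `war` and `vote_share` lists are hypothetical, invented only to show a skewed, monotonic relationship.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(x, y):
    """Spearman's rho: the Pearson coefficient of the rank-transformed data."""
    return pearson(ranks(x), ranks(y))

# Hypothetical, heavily skewed vote shares (invented numbers, not real data).
war = [9.0, 7.5, 6.8, 6.1, 5.5, 5.0, 4.2, 3.9]
vote_share = [0.98, 0.45, 0.30, 0.12, 0.10, 0.04, 0.02, 0.01]

print("pearson :", round(pearson(war, vote_share), 3))
print("spearman:", round(spearman(war, vote_share), 3))
```

Because the toy relationship is perfectly monotonic but not linear, the rank-based coefficient reaches 1 while the Pearson coefficient does not. How far the two diverge on the actual voting data is not something this sketch can say.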

  11. To me, it is perfectly reasonable that the correlation between WAR and MVP voting is stable; Most Valuable Player can be interpreted as the most valuable player relative to their teammates, as opposed to the league. A player with a WAR 3 standard deviations from the mean of their teammates may get more votes for MVP than a player 2 standard deviations from the mean of their teammates, even if the first player has a lower WAR than the second. It would be interesting to me to see if such a metric correlates better with MVP voting than WAR itself, and if said correlation shows improvement over time or not.

    Another interesting idea would be to see how predictive RBI and WAR are together. Is the variance in MVP voting explained by RBI the same variance explained by WAR? My hunch is no.

    Comment by Eric Feczko — December 15, 2010 @ 10:05 pm
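The "standard deviations from the mean of their teammates" metric in comment 11 could be sketched roughly as follows. This assumes the comparison group excludes the player himself and uses the sample standard deviation; both choices are assumptions, as are the `teammate_z` helper and the invented rosters.

```python
from statistics import mean, stdev

def teammate_z(team_war, player):
    """Z-score of one player's WAR against the other players on his team."""
    others = [w for name, w in team_war.items() if name != player]
    return (team_war[player] - mean(others)) / stdev(others)

# Hypothetical rosters (invented numbers, not real data).
team_a = {"A1": 7.0, "A2": 1.0, "A3": 2.0, "A4": 3.0}
team_b = {"B1": 8.0, "B2": 4.0, "B3": 6.0, "B4": 2.0}

print("A1:", teammate_z(team_a, "A1"))
print("B1:", teammate_z(team_b, "B1"))
```

Here A1 has the lower WAR (7.0 against 8.0) but stands much further above his teammates, which is the situation the comment describes. Testing the idea would mean correlating this score with vote share across seasons.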

  12. I would also be interested to see something like highest percentage of team’s total WAR vs. MVP vote. There were definitely years where stacked Yankee teams had ideal MVP candidates, but the votes were split based on how deep those teams were.

    As for the guy who was saying that maybe the voters knew what they were doing 10 years ago… um, sure, that was easy with Bonds, but Tejada won in 2002 with the 3rd highest WAR on his own team; Ichiro in 2001 was a full 1.7 WAR below his teammate Bret Boone; in 2000 Giambi won despite having 1.8 fewer RAR than A-Roid did on a Seattle team that finished .5 games behind the A’s; Pudge won in 1999 with a WAR lower than 3 players on other 1st place teams.

    So, I respectfully disagree.

    Comment by dte421 — December 15, 2010 @ 11:20 pm
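The "percentage of team's total WAR" idea in comment 12 is simple enough to sketch. The `war_share` helper and both rosters are hypothetical, and the sketch assumes the team total is positive, since a share of a zero or negative total has no useful meaning.

```python
def war_share(team_war):
    """Each player's fraction of the team's total WAR."""
    total = sum(team_war.values())
    if total <= 0:
        raise ValueError("team WAR total must be positive to compute shares")
    return {name: w / total for name, w in team_war.items()}

# Hypothetical rosters (invented numbers, not real data).
deep_team = {"X1": 7.0, "X2": 6.5, "X3": 6.5, "X4": 5.0}
thin_team = {"Y1": 6.0, "Y2": 2.0, "Y3": 1.0, "Y4": 1.0}

print("X1 share:", war_share(deep_team)["X1"])
print("Y1 share:", war_share(thin_team)["Y1"])
```

The best player on the deep roster has the higher WAR but the smaller share, which is the vote-splitting effect the comment suspects on stacked teams.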

  13. Like my fellow parrothead above me, I’m not following this. What more do you wish to be proven? In FIP they use HRs given up, Ks, BBs, and many other traditional stats to form a sabermetric stat. It’s not “made up”. The only number out of the norm is the number used to make FIP resemble ERA, so that it’s easier to understand. Like wRC+ is on the 100 scale. We could prove that it is correct, but is it necessary to tell every Tom, Dick, and Harry individually what it is, or should we create a page that goes over a stat like WAR in depth that people can discover for themselves? Like this:
    Or this:

    Comment by My echo and bunnymen — December 15, 2010 @ 11:51 pm

  14. agreed

    Comment by Kevin Yost — December 16, 2010 @ 5:28 am

  15. the problem is the difference between b-ref WAR and fWAR. here on fangraphs, they use the made-up FIP, and you can’t argue that it’s not made up. they look at the rates things happen at, yes, but assign arbitrary values to each thing to make the people they like look better than they actually are, i.e. francisco liriano; whereas on b-ref their WAR results from, i believe, tRA or tERA, which you know, measures what happened other than walks, strikeouts, and home runs, because go figure, more happens on a baseball diamond than those things. this would be interesting to see done for bWAR rather than fWAR; i believe you would see higher correlation values.

    Comment by fredsbank — December 16, 2010 @ 12:04 pm

  16. there’s more to winning an MVP than simply posting the most WAR, you could post 15 WAR on a team of floating jackasses but if your team wins 6 games who gives a sh!t. WAR, and team wins might be good to look at, or % of MVP winners who were on a playoff team.

    Comment by fredsbank — December 16, 2010 @ 12:09 pm

  18. The OP is not wrong. How do we know that HR should be *13, BB should be *3 and K should be *2? We could easily weight those occurrences in other ways and come up with another result. So in that sense, it is made up. Who’s to say we won’t come up with a better version of FIP/xFIP going forward that more closely approximates true talent?

    Comment by AJS — December 17, 2010 @ 7:03 pm
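For reference, the weights discussed in comment 18 plug into FIP roughly as in the sketch below. The 13, 3 and 2 weights come from the comment; adding hit batters to the walk term follows the commonly published form of FIP, and the additive constant (set to 3.10 here) is an assumed placeholder, since it is recalculated for each league and season. The pitching line is invented.

```python
def fip(hr, bb, hbp, k, ip, w_hr=13, w_bb=3, w_k=2, constant=3.10):
    """Fielding Independent Pitching with adjustable event weights.

    `constant` is an assumed placeholder; the real value varies by
    league and season so that league FIP matches league ERA.
    """
    return (w_hr * hr + w_bb * (bb + hbp) - w_k * k) / ip + constant

# Hypothetical pitcher line (invented numbers, not real data).
line = dict(hr=20, bb=50, hbp=5, k=200, ip=200.0)

standard = fip(**line)
# Same line with a different, arbitrary set of weights.
reweighted = fip(**line, w_hr=11, w_bb=3, w_k=3)

print(f"standard weights: {standard:.3f}")
print(f"other weights:    {reweighted:.3f}")
```

Changing the weights does change the answer, as the comment says. Whether a given set of weights is arbitrary or reflects the relative run value of each event is the question the thread is arguing over.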
