MVP and Cy Young Voting Revisited

In the discussion of my last post about MVP and Cy Young voting, there were a few good suggestions for improving the analysis, so I decided to go back and revisit the data. I’m going to give the baseball writers the benefit of the doubt, try some different methodology, and see if I can find any evidence that they are doing a better job of filling out their awards ballots today than they were ten years ago.

One point of clarification from the last post: the correlations are based on the percentage of points a player received out of the maximum possible. This adjusts for ballot differences between the leagues and years.
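A minimal sketch of that adjustment (the point totals and ballot counts below are hypothetical, not actual results):

```python
# Normalize raw award-vote points to a share of the maximum possible,
# so ballots of different sizes and leagues are comparable.

def vote_share(points, n_ballots, max_per_ballot):
    """Fraction of the theoretical maximum points a player received."""
    return points / (n_ballots * max_per_ballot)

# e.g. 28 ballots, 14 points for a first-place vote (hypothetical numbers)
print(round(vote_share(points=350, n_ballots=28, max_per_ballot=14), 3))  # → 0.893
```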

I also think it is important to restate the prism through which these graphs should be viewed. The level of correlation is not the target of the analysis, but rather the trends of the series. If the writers truly embraced the statistics revolution that took place this decade, we should see some positive trend in the correlations between award votes and advanced statistics.

Looking Only At The Top Five

It is possible that voters are making statistics-driven decisions at the top of their ballots, but then reverting to qualitative or geographic factors at the bottom. The players receiving a vote or two could be throwing the correlations off. By looking at only the top five, we may see different results.

By reducing the sample size, the graph is much more erratic and volatile than it was previously. Through the volatility, there is still no evidence of a positive trend for either award.

It was very surprising to see a strong negative correlation in 2000. I double-checked the numbers, and it’s no mistake; it is simply an artifact of looking at only five players. That led me to the next iteration of this analysis:

Looking At All Qualified Players

Instead of looking at just the top five players, the correlations may be improved by including all qualified players from each season. Adding data points for players who put up mediocre statistics and, in turn, received no awards votes introduces more variance to the data set and might help home in on the relationship between votes and WAR that I’m looking for.
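As a sketch of what that correlation computes (all numbers hypothetical), each season’s data set pairs every qualified player’s WAR with his vote share, zeros included:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical season: every qualified player, including the ones
# who received no votes at all.
war = [8.2, 6.5, 5.9, 4.1, 2.3, 1.0]
vote_share = [0.95, 0.60, 0.40, 0.10, 0.0, 0.0]
print(round(pearson(war, vote_share), 3))
```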

This change was a definite improvement in methodology. The correlations were much more stable, not fluctuating wildly year to year. Also, there is a slight positive trend for the Cy Young over the decade, particularly over the past five years. This may be the only evidence that I have found on any level that the voters are improving, but it is hardly significant or conclusive.

Using Different Statistics

One assumption that has been made up until now is that WAR is the best statistic for judging the voters’ assimilation of advanced statistics into their balloting. I again want to make it clear that this analysis is not arguing that WAR is the only statistic writers should be using. It is one of many advanced statistics that could be used, but it is particularly convenient for this purpose because it is both a hitting and a pitching stat.

By breaking hitters and pitchers up, we can examine some position-specific statistics that may shed better light on the situation. I again used all qualified players in the correlations, but there is one major change to the pitcher data set: I only included starting pitchers. This was done in an effort to look at the pitchers on an apples-to-apples basis. For example, a starter with a 3.00 ERA is a Cy Young candidate, but a reliever with the same ERA probably isn’t.

For pitchers, I examined ERA and FIP (xFIP is not available for the whole decade). Granted, these are highly collinear statistics, but they are no more similar in 2000 than they are in 2010. And the slight differences between these statistics are exactly what separate conventional statistics from sabermetrics.

For this graph, remember that the more negative the correlation, the stronger the relationship, because lower ERAs and FIPs signify better pitchers.

The results are truly counterintuitive. Linear trend lines of these series show that the correlation with FIP grows slightly weaker over the decade, while the correlation with ERA grows slightly stronger. The writers were better at pairing Cy Young votes with FIP in the pre-Moneyball era than they are today.

This conclusion is the same when looking at hitter data.

When comparing the correlations of MVP votes with wRC+ and with RBI, there is simply little or no change between the turn of the century and today. In fact, there is again a slight negative trend for wRC+ over the time span.


In revisiting this analysis, I concur with my previous conclusion: writers are no better at picking MVPs and Cy Youngs today than they were ten years ago. The statistics revolution has, no doubt, changed the landscape of baseball. However, when it comes to filling out award ballots, the baseball writers have yet to truly embrace advanced statistics.


Jesse has been writing for FanGraphs since 2010. He is the director of Consumer Insights at GroupM Next, the innovation unit of GroupM, the world’s largest global media investment management operation. Follow him on Twitter @jesseberger.

18 Responses to “MVP and Cy Young Voting Revisited”

  1. walt526 says:

    Couple of suggestions to possibly remove some noise:

    1) Seems to me like you should be controlling somehow for team performance, even if it’s just a dummy variable.

    2) Also, future analysis might want to consider improvement relative to recent performance (e.g., weight three-year average or something), as anecdotally it seems that voters are more likely to reward a good player having a great year than a great player having an average year.


  2. suicide squeeze says:

    About your conclusion: Maybe the heart of it is that voters were better than we think 10+ years ago. I know the big clunkers get thrown around a lot (Bob Welch, etc…), but we probably focus too much on what the voters got wrong, and not enough on the fact that they often got it right.


  3. mettle says:

As I said in the last thread, non-parametric statistics. % of award vote is not normally distributed and so it is *incorrect* to use correlations.


    • Hugh says:

      I concur, this isn’t the appropriate way to be evaluating vote percentage. Non-parametric statistics are preferable when dealing with a variable of a non-normal distribution or with particularly small sample sizes.


      • Eric Feczko says:

        While I agree that a Spearman’s rho would be better, a pearson’s correlation coefficient will likely yield similar results if the sample is very large (i.e. close to sampling the entire population).
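To illustrate the distinction this thread is drawing (hypothetical numbers): Spearman’s rho is simply Pearson’s r computed on ranks, so it is insensitive to the skewed, non-normal shape of vote shares.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def ranks(xs):
    """1-based ranks, with ties receiving the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    out = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based position of the tied group
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out

def spearman(xs, ys):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Hypothetical data: vote share is a monotone but very nonlinear
# function of WAR, so rho is a perfect 1.0 while Pearson's r falls short.
war = [8.0, 6.0, 4.0, 2.0, 1.0]
share = [0.98, 0.30, 0.10, 0.02, 0.0]
print(round(pearson(war, share), 3), round(spearman(war, share), 3))
```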


  4. vivaelpujols says:

    You’re still not eliminating the bias. Players who have high WARs or high FIPs also will have high conventional statistics. You need to run a regression on ERA – FIP per innings pitched or something.


  5. heartbreakid says:

    The problem with the evaluations when using FIP or WAR, is that essentially, the stats are made up. There’s no way to really prove that a player’s WAR is accurate. Or that the statistic is even accurate. For this reason, the voters & writers pay no attention to it. Heck, UZR was an inaccurate stat for the first year the masses paid attention to it. OBP, average, RBI, etc. are easily measurable and easily proven, that’s why people pay attention to them. If someone can find a way to break down FIP and PROVE that it is correct, people might listen.


    • NBarnes says:

      Sounds like you’re arguing against the utility of WAR and FIP themselves. How do you ‘prove’ that FIP is ‘correct’? I dunno, in what way? ERA is ‘accurate’ in that it does exactly measure Earned Runs / Inning Pitched. Is it ‘accurate’ in terms of ‘pitcher quality’? We already know with a fair degree of confidence that FIP and xFIP are both better measures of pitcher true talent than ERA. What more do you want?


    • My echo and bunnymen says:

      Like my fellow parrothead above me, I’m not following this. What more do you wish to be proven? In FIP they use HRs given up, Ks, BBs, and many other traditional stats to form a sabermetric stat. It’s not “made up”. The only number out of the norm is the number used to make FIP resemble ERA, so that it’s easier to understand. Like wRC+ is on the 100 scale. We could prove that it is correct, but is it necessary to tell every Tom, Dick, and Harry individually what it is, or should we create a page that goes over a stat like WAR in depth that people can discover for themselves?


    • fredsbank says:

      the problem is the difference between b-ref WAR and fWAR. here on fangraphs, they use the made-up FIP, and you can’t argue that its not made up, they look at the rates things happen at yes, but assign arbitrary values to each thing to make the people they like look better than they actually are ie francisco liriano; whereas on b-ref their WAR results from i believe tRA or tERA which you know, measures what happened other than walks, strikeouts, and home runs because go figure, more happens on a baseball diamond than those things. this would be interesting to see done for bWAR rather than fWAR, i believe you would see higher correlation values.


    • AJS says:

      The OP is not wrong. How do we know that HR should be *13, BB should be *3 and K should be *2? We could easily weight those occurrences in other ways and come up with another result. So in that result, it is made up. Who’s to say we won’t come up with a better version of FIP/xFIP going forward that more closely approximates true talent.
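For reference, a sketch of the standard FIP calculation those weights come from (the constant varies by season so that league-average FIP matches league-average ERA; 3.10 here is illustrative, and the pitching line is hypothetical):

```python
def fip(hr, bb, hbp, k, ip, constant=3.10):
    """Fielding Independent Pitching with the standard 13/3/2 weights.

    Note the strikeout term is subtracted: strikeouts lower FIP.
    The constant is set each season so league FIP equals league ERA.
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant

# Hypothetical starter: 15 HR, 50 BB, 5 HBP, 200 K over 220 IP
print(round(fip(15, 50, 5, 200, 220), 2))  # → 2.92
```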


  6. William says:

    For pitchers, I prefer rWar over fWar because the reason Dave chose FIP was ultimately because he felt it necessary to use a context-independent stat since that was also the case for batting.

    My problem is that, especially with the elite pitchers, not all apparent luck is, well, “luck”. Felix, for example, has always had very good control of LOB and LD%, and has consistently dropped his BABIP and HR/FB, and I don’t think it’s sufficient to just assume that in all cases that FIP or xFIP or whatever is a better indicator of true talent than ERA… mostly, yes, but again, please tell me why we should base the majority of fWAR on something that denies Felix’s apparent ability to “make his own luck”…


  7. gnick55 says:

    I disagree, I think the voters are getting smarter. The reason the correlation was just as high in the early 00s is because the choices were easier. No one was going to deny a 10+ WAR Barry Bonds the award in 2001-2004, but now the choices are less clear and they are making the right picks. The study is skewed by the fact that the races were much less close in the early part of the decade, making it easier for writers to choose without using much advanced analysis.


  8. Eric Feczko says:

    To me, it is perfectly reasonable that the correlation between WAR and MVP voting is stable; Most Valuable Player can be interpreted as the most valuable player relative to their teammates, as opposed to the league. A player with a WAR 3 standard deviations from the mean of their teammates may get more votes for MVP than a player 2 standard deviations from the mean of their teammates, even if the first player has a lower WAR than the second. It would be interesting to me to see if such a metric correlates better with MVP voting than WAR itself, and if said correlation shows improvement over time or not.

    Another interesting idea would be to see how predictive RBI and WAR are together. Is the variance in MVP voting explained by RBI the same variance explained by WAR? My hunch is no.


  9. dte421 says:

    I would also be interested to see something like highest percentage of team’s total WAR vs. MVP vote. There were definitely years where stacked Yankee teams had ideal MVP candidates, but the votes were split based on how deep those teams were.

    As for the guy who was saying that maybe the voters knew what they were doing 10 years ago… um, sure, that was easy with Bonds, but Tejada won in 2002 with the 3rd highest WAR on his own team; Ichiro in 2001 was a full 1.7 WAR below his teammate Brett Boone; in 2000 Giambi won despite having 1.8 fewer RAR than A-Roid did on a Seattle team that finished .5 games behind the A’s; Pudge won in 1999 with a WAR lower than 3 players on other 1st place teams.

    So, I respectfully disagree.


    • fredsbank says:

      there’s more to winning an MVP than simply posting the most WAR, you could post 15 WAR on a team of floating jackasses but if your team wins 6 games who gives a sh!t. WAR, and team wins might be good to look at, or % of MVP winners who were on a playoff team.

