FanGraphs Power Rankings — Crowdsourcing Changes

When we created the FanGraphs Power Rankings this year, we didn’t know how they would play out. At the outset, there was scorn over the Indians’ ranking. As the season wore on, that changed to scorn about the Rockies’ ranking. By the end of the season, though, things seemed to work out pretty well. Eight of the top nine teams — with the Red Sox being the one exception — reached the postseason. That in and of itself is not a justification for the Rankings, mind you, but it seemed to show that we were on the right track.

Being on the right track doesn’t necessarily mean that things can’t improve, however. So what I’ve done is take a smattering of the more constructively critical comments from this season and compile them — lightly edited for typos — below, broken up by date of article. After you read the selected comments, there are a few polls for your voting pleasure.

Before you read the comments though, I ask that you take a minute and refresh your memory on the Rankings’ methodology, which you can find here.

From May 2nd:

Terence says:
May 2, 2011 at 3:44 pm
I guess in some measure of support for The Rhino, I like the idea of these rankings, but the methodology is very flawed. I like balancing previous success with realistic future expectations, but FAN% is a very poor tool for these purposes. In the case of the Twins, which The Rhino was quick to observe as overrated, the .537 FAN% reflects a team that doesn’t even exist on the field anymore. The Astros’ FAN% at .370 (.050 lower than any other team in the field) is absurd, and they will be #30 by this methodology for the entire season. A better system would use ZiPS(R)%, Marcel%, and Tango% to balance FAN% (if not completely replace it).

VivaAyala says:
May 2, 2011 at 4:08 pm
I really like this idea overall, but I agree with others that Fan% is probably not the best way to balance out WAR%. Perhaps Zips (ROS) or some equivalent would be preferable.

Nik says:
May 2, 2011 at 5:11 pm
P.S. Some credit should also be given for overall record, and especially the team’s performance during the most recent time period. These current rankings won’t have much credibility if, in the middle of August, a team that accumulated a bunch of WAR but then got hit by some major injuries is still floating near the top of the rankings for weeks on end.

glassSheets says:
May 2, 2011 at 6:02 pm
To help alleviate some of the concerns people have about the weighting… A square root rule could be used to assign the weighting, or credibility, of the games already played. For example, a team that has played 26 games would receive 40% from WAR (=sqrt(26/162)) and 60% from preseason fan rankings. This accelerates the weight given to performance in the season without going bonkers. Love the idea of this. I actually like the current weighting structure more, just trying to throw out some alternatives to help others in the comment section who want the Indians higher.
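As an illustration, glassSheets’ square-root rule could be sketched like this in Python (the function name and rounding are mine, not from the comment):

```python
import math

def sqrt_weights(games_played, season_games=162):
    """Credibility weighting: in-season WAR% gets weight sqrt(fraction of
    season played); the preseason FAN% gets the remainder."""
    war_weight = math.sqrt(games_played / season_games)
    return war_weight, 1.0 - war_weight

# A team 26 games in gets ~40% WAR% and ~60% preseason FAN%, as described.
war_w, fan_w = sqrt_weights(26)
print(round(war_w, 2), round(fan_w, 2))  # 0.4 0.6
```

Compared with the current linear ramp, the square root front-loads the in-season weight early without ever letting it exceed 1.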

Chris says:
May 3, 2011 at 9:24 pm
Very cool idea for rankings, certainly preferable to the standard power ranking, wherein they list the teams in order of records, with few differences.

Any thought about some kind of nonlinear function for the weights? I feel like a linear function underweights the WAR% – for example, I feel like halfway through the year, WAR% should be something closer to ~75% than 50%.

From May 9th:

odditie says:
May 9, 2011 at 4:35 pm
It isn’t that the fan standings should be replaced with ZiPS; it’s that what we thought two months ago is not a proper way of evaluating how good a team is.
It’d be like combining the November election results with the Gallup polls so we didn’t overrate one of the candidates based on how they were doing today. Things change over time.
Baseball has a large portion of luck involved in the results, and I’d hope that FanGraphs would find a way to rate a team based both on its actual results and its expected results by stripping luck out of its performance, instead of using some irrelevant rankings from before the season started to help keep teams grounded.
There also MUST be some sort of evaluation of how a team’s roster is made up. The Yankees are in a better situation than their performance suggests if they trade for Felix Hernandez. It cannot be power rankings without thought or it is no better than the rest.

From May 16th:

futant462 says:
May 16, 2011 at 8:32 pm
To be clear, I’m generally a fan of this ranking method. I like how it adjusts the %’s throughout the season and doesn’t overreact. However, I do feel like there was something missing, and a comment above illuminated it, I think. The missing element is “banked wins”. I know, I know, WAR% covers that, right?
Almost, but if WAR% is out of whack with the actual win total to date, that team IS more likely to make the playoffs than these rankings give it credit for. Something like: (% of Season Played * Current Real Winning %) + (% Remaining * [Existing Calculation]). That would weight games that have already happened appropriately and use your existing calculation to estimate winning % going forward, thus arriving (and eventually converging) at an estimated end-of-year W-L record.
Perhaps I’m wrong, but I believe this calculation would better embody the spirit of this exercise.
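futant462’s banked-wins blend could be sketched like this (the function name and the sample record are mine, for illustration only):

```python
def banked_wins_rank(games_played, wins, blended_pct, season_games=162):
    """Blend the actual record for games already played with the existing
    WAR%/FAN% calculation for the games remaining."""
    frac_played = games_played / season_games
    actual_pct = wins / games_played
    return frac_played * actual_pct + (1 - frac_played) * blended_pct

# e.g. a 45-36 team at the halfway mark whose existing-formula estimate is .520:
print(round(banked_wins_rank(81, 45, 0.520), 3))  # 0.538
```

As the season progresses, frac_played approaches 1, so the estimate converges to the team’s real end-of-year winning percentage, exactly as the comment describes.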

From May 30th:

John says:
May 31, 2011 at 3:09 am
Sorry for the long post but I have one suggestion that I think would improve the rankings and eliminate many complaints:
Give WAR% a weight of (% of season played)*2 instead of
(% of season played)*1 (giving WAR% double the weight it has now).
This is very similar to how Football Outsiders phases out the preseason projections for their DVOA rankings. This method completely eliminates the projections by the halfway point of the season. That makes more sense to me because the projections don’t reflect injuries (Mauer/Twins), and teams are often very different from their projections by this point. FO and FG are both great, but I think FO’s method makes more sense.
Giving WAR% more weight would move the Indians up and the Twins down, but not too far. I agree that it doesn’t make much sense to have them at 20 and 21, even if it’s only May 31st. Using WAR % *2 would hopefully put the Indians in the 10-15 range, and the Twins in the 25-30 range.
I would love to see the results with this tweak, and I think a lot of FG readers would agree that this method makes more sense. I think it would make the rankings more credible and the best out there much like FO’s DVOA rankings. Anyone else agree? Thanks

From June 13th:

VivaAyala says:
June 13, 2011 at 4:52 pm
Because these rankings are meant to dig deeper than win-loss record. We also got outscored while splitting the series against the Tigers.
The fairly pessimistic preseason expectations for the team are affecting the ranking, but also our current team fWAR is pretty low. However, our pythag win% is right around .500, with rWAR more or less agreeing (we have 14.3 rWAR).
With the current UZR samples being so unreliable, I wonder if they could be usefully regressed to zero based on the amount of season played (as MGL suggests in his UZR primer). Or, perhaps pythag could be incorporated into these rankings as an additional factor.

From June 27th:

matt w says:
June 28, 2011 at 1:38 pm
Here’s an attempt at a constructive criticism: Even if Cleveland isn’t the second-best team in the division, and Minnesota isn’t the first, there’s something wrong with their rankings. At this point in the season, should the system really rank the team with the worst WAR% ahead of one whose WAR% is over .500?
I’d like to hear more discussion of the methodology, how it should be arrived at, and how much preseason performance should be weighted and whether more recent performance should be weighted more as well; but I think the current formula pretty clearly overweights the preseason poll.
(For the record, I don’t have a dog in this fight; I’m a Pirates fan, and I’m not going to complain about a formula that ranks them 28th instead of 25th.)

From July 4th:

matt w says:
July 4, 2011 at 7:37 pm
I’m not going to complain about any specific rankings anymore (and I’m definitely not going to argue that my particular team hasn’t been somewhat lucky), but aren’t the WAR winning percentages rather Lake Wobegonish? I haven’t run the math, but I’m pretty sure that they average out to something well above .500.

From July 11th:

Thomas says:
July 11, 2011 at 10:35 pm
I did some power rankings of my own by combining four stats: WAR%, winning percentage against teams above .500, an average of SOS and expected winning percentage, and actual winning percentage. I gave each stat a weight. Since WAR% is the most important, it’s worth 40%; the average of SOS and expected winning percentage is worth 30%; winning percentage against teams >.500 is 20%; and winning percentage is 10%. These are the rankings:
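Thomas’s blend, written out as code (a sketch; the function name is mine and all inputs are assumed to be on a winning-percentage scale):

```python
def thomas_rank_score(war_pct, sos, expected_pct, win_pct_vs_500, win_pct):
    """Weighted blend: 40% WAR%, 30% average of SOS and expected win%,
    20% win% vs .500+ teams, 10% raw win%."""
    return (0.40 * war_pct
            + 0.30 * (sos + expected_pct) / 2
            + 0.20 * win_pct_vs_500
            + 0.10 * win_pct)
```

Teams would then be ranked by sorting on this composite score, highest first.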

From August 1st:

Viliphied says:
August 1, 2011 at 5:22 pm
@Paul that may be the way it is, but it’s a little asinine, don’t you think? If these rankings are meant to measure ONLY performance so far this season, then fan% should be 0 from week 1. If they’re meant to predict future performance, then the contributions of a player acquired through trade should come with him. Subtract out the contributions of whoever that player is replacing if you must, but the fact of the matter is Beltran will spend the rest of the year playing RF in SF, Pence will spend the rest of the year in PHI, and Jimenez won’t be pitching for COL in another game this year.
I can understand why you wouldn’t necessarily want to just go adding and subtracting WAR willy nilly, but unless you want the rankings to be a straight WAR leaderboard, you need to have some subjective element, and pre-season rankings are not nearly good enough.

From August 8th:

Jay Gloab says:
August 9, 2011 at 9:26 pm
Elsewhere I suggested the idea of a fully objective power ranking based on a weighted average, with each game weighted by its ordinal number during the season, i.e. the first game of the season is given a weight of 1, the second is weighted 2, and so forth to whatever game the team is on.
I would have done it simply by winning percentage, but WAR% works too for a kind of “second order” power ranking.
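Jay Gloab’s ordinal weighting is simple to sketch (the function is mine; the 1/0 win-loss results are a stand-in for whatever per-game value you feed it):

```python
def ordinal_weighted_pct(results):
    """Weighted average where game k of the season gets weight k, so recent
    games count most. `results` is 1 for a win, 0 for a loss (or a per-game
    WAR-based value for the 'second order' version)."""
    weights = range(1, len(results) + 1)
    return sum(w * r for w, r in zip(weights, results)) / sum(weights)

# A team that lost its first two games but won its last two:
print(ordinal_weighted_pct([0, 0, 1, 1]))  # 0.7
```

Note the raw winning percentage here would be .500; the ordinal weighting rewards the late wins.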

From September 26th:

MK says:
September 26, 2011 at 11:06 pm
One thing that has bothered me about the power rankings is that they do not reflect the best team NOW. By using WAR, which is closely tied to wins, we should not be surprised that the teams with the most WAR get into the playoffs. A thought for next year: what if it were the cumulative WAR of players on the active roster, instead of accumulated WAR by the team? So when the Phils trade for Hunter Pence, they get his WAR in their power rankings. Or when Clay Buchholz goes down for the year, the Red Sox lose his WAR. Obviously, there are flaws with that system as well, but I think it would be an interesting take (at least for the second half of the season).

Okay, now that we’re done rehashing why everyone thinks I’m an idiot, we’ll finish with some polls. I didn’t include a poll for FAN% versus an objective system because I already know the answer to that one. I realize it wasn’t a popular choice, and we will look at other options for regressing the in-season standings next season.

Aside from that, we still want your input, so let’s get to the polls. I tried to make these as comprehensive as I could, but by all means please elaborate in the comments below on these questions — particularly on the “‘X’ days” in the second question — and anything else you would like to see changed (or not changed, I suppose). Finally, I would like to thank you for taking the time to read Power Rankings this year, and for caring enough to make suggestions.


Paul Swydan is the managing editor of The Hardball Times, a writer and editor for FanGraphs and a writer for the Boston Globe. He has also written extensively for ESPN MLB Insider. Follow him on Twitter @Swydan.

17 Responses to “FanGraphs Power Rankings — Crowdsourcing Changes”

  1. Terence says:

    In retrospect, the methodology is not “very flawed”, only slightly flawed. Thanks for sticking with this project all year, and for the humility and willingness to make it better.


    • Yirmiyahu says:

      Indeed. Most of the criticism came early in the season, and the stubbornness of the rankings proved right in the end.

      As of May 23rd, Cleveland had the best record in baseball by a large margin, but was ranked #18 by the Power Rankings. In response to posters complaining about Cleveland’s positioning, I commented, “I’ll betcha, from here on out (i.e., rest of season), that Cleveland has closer to the 18th best record in baseball than the #1 record in baseball.” Three commenters accepted that “bet”.

      Turns out, over the rest of the season, Cleveland went 50-67, good for the 23rd best record in baseball over that stretch.


  2. Husker says:

    Change the name. “Power Rankings” is so Sports Illustrated.


    • glassSheets says:

      Many of the criticisms came from differing interpretations.

      It would help if the “Power Rankings” had a stated goal.
      1) Predict entire season wins
      2) Predict rest of season wins
      3) Predict entire season quality of team (so BAL gets a boost for playing the rest of the AL East, the Cardinals get dinged for playing the NL Central and the Royals, etc.)
      4) Predict rest of season quality (like in #3)

      It would also help if the goal had a stated measurement
      A) Squared error
      B) Absolute error
      C) Try to split the over/under ranking at 50/50 regardless of size (like a default on a mortgage model)

      Pick a goal, do some science to it, and make an objective choice. You’ll have a better response for critics than “well, a plurality preferred the 50% weighting to occur at Game XX”.


  3. MikeS says:

    It doesn’t really matter. You cannot create a system that will totally avoid some knucklehead complaining that you are hating on their team. Just like you cannot convince that same knucklehead that what some guy on the internet thinks of his team has nothing to do with whether or not they actually make the playoffs.


    • kylemcg says:

      That’s a fallacy. Even if people will still complain (and nobody said otherwise), you can still have a better justification for your decisions.

      A better system is possible, and I think a lot of the criticism above was legitimate and came from a thoughtful place. Not all knuckleheads.


    • jross says:

      If you build it, they will come


  4. jcxy says:

    first, i think it’s great that you’re taking feedback on this, and have clearly thought about this throughout the year. i think this is a great component to the fangraphs site and would be sad if it went away.

    IMO, i think the fan% does what it’s supposed to do for certain teams–yankees, phillies, red sox etc–teams where payroll acts as a normalizing factor through july 31st. however, is making sure those teams are “correct” in the rankings worth the cost of over/undershooting colorado or cleveland?

    …which is why i hesitate to vote for a previous 2-week sample to be included as a weight. “banked wins”, as a commenter above suggested, either by record or one of the pythags (my preference is actual record), would seem to correct best for the changing landscape for the first 1/3 through 1/2 of the season and in a way that WAR does not. to paraphrase DC’s phenomenal sox post from april: wins/losses in april, even in small samples, absolutely have an effect on the standings in september.

    the other suggestion i have relates to UZR in team WAR. is there any way to strip the current year UZR from WAR and replace it with a 3-year UZR avg for players where that data is available, or a 0 UZR for players without that experience (i say 0 because of Marcel’s success predicting rookie performance compared to other methods)? my concern is that when calculating a team WAR, the error associated with team UZR is the sum of the individual UZR errors. uzr might be a describer of what has happened, but it’s still a model, right? as a consequence, there is a pretty big error associated with team UZR.


    • Yirmiyahu says:

      I’ve long wished that fangraphs in general would use a regressed 3-year sample of UZR when calculating WAR.

      But the system you’re describing is far too complicated for something like these Power Rankings. Every week, you’d need to calculate the regressed UZR for every player in baseball, and then add them together to get each team’s total.

      And consider that we’re talking about team stats. Most of the error should quickly correct itself just based on the larger sample size. And, if you really wanted to tweak the WAR equation for this exercise, it’d probably be better/easier to just toss UZR and use something like defensive efficiency (percentage of balls in play the defense was able to turn into outs). Philosophically, it’s a more pure complement to FIP anyway.


      • jcxy says:

        two thumbs up on wholesale regressed UZR data included in WAR.

        and you’re probably right about feasibility, but…if you will it, dude, it is no dream.


  5. Jordan says:

    Why not replace FAN% with zips(R) (or equivalent) for all but the initial rankings? Continuing to rely on preseason projections when our best guess about future player performance, and perhaps more importantly future roster construction, changes as the season goes on, seems quite silly.

    Using actual record, zips(R), and some weighting of pythag records (perhaps giving more weight to more recent performance) seems the best way to go. This would basically mean using actual record as a starting point and then projecting ROS record using some combination of zips(R) and pythag. Sounds right to me.


    • Eminor3rd says:

      Why not use ZiPs even on the initial rankings?


      • Jordan says:

        My point was that it doesn’t make sense to keep relying on projections that may well be based on now false information (injuries, trades, unexpected performance). Your point that there are other problems with FAN% is well taken, but it is a separate point.


  6. Eminor3rd says:

    I think you ditch FAN% altogether. That way, when some Indians fan gets upset, the explanation is simple: the numbers show they are playing over their heads and they are regression candidates. FAN% introduces bias, and causes more problems than benefits.


  7. Jack says:

    Despite thinking these power rankings are flawed, I almost voted for the top option due to the great Seinfeld reference.

    My suggestion would be to use a few different components: Projections, WAR%, Pythag, and Reality in varying weights.
    Projections-I’d try an approach similar to the computers in the BCS. Use ZiPS%, Marcel%, Tango%, and Fan%, dropping the highest and the lowest (so the average of the two middle projections), so that crazy stuff like Colorado’s Fan% doesn’t completely ruin the rankings.
    WAR%-The Main component. No adjustments for recent changes in the team make-up, in order to calculate past events fairly as well. Should be the primary number fairly early on.
    Pythag and Reality-This number would be a weighted average of the pythag win % (as a predictive tool) and real win %. Both would be weighted toward more recent production to account for trades, signings, and injuries. True Win% or something like that.
    Then for the weighting–True win% and WAR% would be combined for the reality number and then they would be combined for a weighted average with the projection%. I voted for the projections% to be at 50% at about the 50 game mark so obviously I support that here.
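Jack’s drop-the-extremes projection average can be sketched like this (the function name and the sample percentages are mine):

```python
def trimmed_projection(zips, marcel, tango, fan):
    """BCS-style trimmed mean: drop the high and low of the four projection
    systems and average the middle two."""
    vals = sorted([zips, marcel, tango, fan])
    return (vals[1] + vals[2]) / 2

# An outlier Fan% (e.g. Colorado's) no longer skews the projection:
print(round(trimmed_projection(0.510, 0.505, 0.495, 0.600), 4))  # 0.5075
```
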


  8. adohaj says:

    for a third metric I would go with season record (actual wins) but also weight the last 14 games to give more of an impact to recent performance.

    something like .4*(season win%) + .6*(last 14 game win%)

    I chose actual wins because Pythagorean wins usually don’t vary far from actual wins. Why not make the results match up with the newspaper?
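adohaj’s recency blend as code (the function name and the sample record are mine):

```python
def recency_blend(season_win_pct, last14_win_pct):
    """Third metric built from actual record, with the last 14 games
    weighted heavily to capture recent form."""
    return 0.4 * season_win_pct + 0.6 * last14_win_pct

# A .550 team that has gone 10-4 (.714) over its last 14 games:
print(round(recency_blend(0.550, 10 / 14), 3))  # 0.649
```
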


  9. Tom says:

    I would like to see the impact of significant roster changes incorporated (injury and/or trade). Not quite sure of the ‘how’ though (and it wouldn’t be easy). Some (probably bad) thoughts:

    1) Adjust the WAR % or fan projection % for a key player move (either to or from a team)…. could use ZiPS? or simply prorate/extrapolate the current WAR over the rest of the season (though that might introduce significant noise if a player is performing well over/under expectation and gets traded).

    2) Injury would also be subjective, as I think you have to account for expected performance minus the projected replacement performance (shouldn’t just assume replacement level), but I think this should adjust the fan expectation portion. For example, the Cards lose Pujols 2 days into the season… obviously fan projections did not account for that, so it should be adjusted by the expected loss. While in the end the power rankings will reflect injuries (the fan projection weight goes toward 0), along the way the rankings will probably appear inflated for a team hard hit by injuries.

    I guess the main thought is that if the fan projections are way off due to a somewhat unforeseeable circumstance, the rankings will likely not reflect it until you get toward the end of the year, when the results become 100% of the rankings.

