Narratives From Formulas

Bill James’ discovery of the Pythagorean Win Expectation is one of the cooler findings of sabermetric research. You can read up on the details by following the given link. In short, what James found is that one can get a pretty good approximation of a team’s winning percentage given only their runs scored and allowed by using the following formula:

Win% = (Runs Scored)^2 / ((Runs Scored)^2 + (Runs Allowed)^2)

It works remarkably well, and more recent versions like PythagenPat are even more accurate. I won’t repeat the basics, which can be looked up elsewhere. Instead, I want to address the occasional misuse of the formula for building narratives of teams being better or worse.
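
For readers who want to see the arithmetic, here is a minimal Python sketch of both estimators. The function names and the sample run totals are made up for illustration, and the 0.287 power in the PythagenPat exponent is one commonly quoted value, assumed here rather than taken from this post.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Basic estimate: RS^x / (RS^x + RA^x), with x = 2 by default."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def pythagenpat_exponent(runs_scored, runs_allowed, games, power=0.287):
    """PythagenPat exponent: total runs per game raised to a small power.

    The default power of 0.287 is an assumption (a commonly quoted value).
    """
    return ((runs_scored + runs_allowed) / games) ** power


def pythagenpat_win_pct(runs_scored, runs_allowed, games, power=0.287):
    """Same formula as above, but with an exponent that tracks the run environment."""
    x = pythagenpat_exponent(runs_scored, runs_allowed, games, power)
    return pythagorean_win_pct(runs_scored, runs_allowed, exponent=x)


# Hypothetical team: 700 runs scored, 750 allowed over 162 games.
basic = pythagorean_win_pct(700, 750)
pat = pythagenpat_win_pct(700, 750, 162)
print(f"Pythagorean: {basic:.3f}, PythagenPat: {pat:.3f}")
```

In a typical run environment the PythagenPat exponent lands a bit below 2, so it pulls the estimate slightly toward .500 compared with the basic version.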

Graham MacAree actually addresses this problematic use of the Pythagorean Win-Loss formula in the previously-linked entry. I want to elaborate on Graham’s point from a slightly different angle, given that this is the time of the year when fans of teams, especially those that are rebuilding, are looking for hope and faith for the following season. A run differential that indicates a better-than-actual winning percentage is sometimes seen as a reason for hope (or, if the differential is worse than the team’s record, despair) for the team going forward.

This is problematic for several reasons. Leaving aside (controversial) ideas held by some regarding certain teams and managers possessing skills that allow them to outplay their Pythagorean expectation, run differential is itself subject to a great deal of random variation. Even if deviation from the expected win percentage is simply the product of “luck” (the usual sabermetric shorthand for random variation), runs scored and allowed are themselves products of events subject to variation from the true talent of the players involved.

While this may seem obvious, deviation from Pythagorean expectation is cited enough as intrinsic evidence for how talented a team really is that it is worth showing what sort of silly narrative can be constructed by simply relying on a team’s run differential as an indication of its quality and likely future performance. With that in mind, let’s look at five consecutive seasons of an actual team’s expected winning percentage according to PythagenPat and see what sort of “narrative” it implies.

Year 1: .454
Year 2: .442
Year 3: .403
Year 4: .395
Year 5: .460

A (fictional) person who thinks run differential is the golden road to a team’s true talent would think that this team was mediocre for a couple of years, then absolutely dreadful for a couple more, then returned to about its original level in the fifth season. While that might be a good description of this team’s observed performance in terms of actual won-loss records, I don’t think many would consider it an adequate, or even accurate, description of the state of this “randomly selected” team, the 2007-2011 Kansas City Royals. The simplistic “run differential narrative” would tell you (if you compare it to the actual records) that, e.g., the 2011 team is better than its record shows this year (they’ve underperformed their PythagenPat win expectation by about six games so far), but that the Royals haven’t made any real progress over the last five seasons.
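
To make "underperformed by about six games" concrete, here is a small Python sketch of how such a gap is usually computed: multiply the expected winning percentage by games played and subtract actual wins. The .460 figure is the Year 5 number from the list above; the function name and the won-loss record are hypothetical, chosen only to illustrate the arithmetic.

```python
def pythag_gap_in_games(expected_win_pct, actual_wins, games_played):
    """Expected wins minus actual wins.

    A positive result means the team has won fewer games than its
    run differential suggests.
    """
    return expected_win_pct * games_played - actual_wins


# .460 comes from the Year 5 line above; 60 wins in 144 games is a
# hypothetical record, not the team's actual one.
gap = pythag_gap_in_games(0.460, actual_wins=60, games_played=144)
print(f"{gap:+.1f} games")
```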

Of course, no informed observer would say that. Indeed, those who use run differential to say the Royals have improved this season know as well as anyone that there is a huge difference between the situation of the 2007 and 2008 teams, which were cobbled together from previously available “talent” and some free agent signings of varying wisdom, and the 2011 team, which is mostly cost-controlled, very young, and has more talent waiting for it in the minors. After all, the composition of teams themselves changes from one season to the next (which relates to the mistake of taking the current season to be a constant for the future, or even the playoffs), so why would one think the Pythagorean expectation from one season should apply to a different group of players?

In fact, I do think the Royals have made progress this season and that their future looks brighter than it has in a long time. But the Royals are just an example for the purposes of making the larger point: that run differential doesn’t add much, if anything, to our measuring of that progress. We know that observed performance of players varies from their true talent. We know that not only the composition of teams, but also the true talent of players themselves (particularly those at the extremes of the aging curve), generally varies a great deal from season to season. All of those factors are important in projecting how teams will perform in the future.

But what does examining a team’s past run differential contribute to the job of projecting a team’s future performance beyond those factors? Nothing, really, by itself. It does have its uses: for example, when one has projected runs allowed and scored, one can use the Pythagorean formula to estimate how many wins can be expected for the team. But let’s stop using the formula in combination with past run differential as a proxy for projection.
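
As a rough illustration of that legitimate use, the Python sketch below turns projected run totals into an expected win total. The projected totals are invented, the function name is hypothetical, and the default exponent of 1.83 is an assumption (a commonly cited alternative to 2), so treat it as a sketch rather than a projection system.

```python
def projected_wins(proj_runs_scored, proj_runs_allowed, games=162, exponent=1.83):
    """Expected wins from projected runs, using the Pythagorean formula.

    The inputs should be projections for next season's roster,
    not last season's observed totals.
    """
    rs = proj_runs_scored ** exponent
    ra = proj_runs_allowed ** exponent
    return games * rs / (rs + ra)


# Hypothetical projection: 720 runs scored, 740 runs allowed.
wins = projected_wins(720, 740)
print(f"Projected wins: {wins:.1f}")
```

The point of the design is that the formula only converts runs to wins; all of the real forecasting work lives in how the projected run totals were produced.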

Matt Klaassen reads and writes obituaries in the Greater Toronto Area. If you can't get enough of him, follow him on Twitter.

17 Responses to “Narratives From Formulas”

  1. So what you’re saying is that Pythagorean Expectancy is a flawed sign of hope because it only measures how well the players actually played rather than how cost-controlled and young the organization is. Well, yeah, okay. Who is the fictional person arguing that a Pythagorean Record is the be-all end-all? It tells you how luckily a team spaced out its runs, not how great your minor league system and payroll scheme are supposed to be.

    • Blob says:

      But how well you spaced your runs isn’t particularly relevant to how well you will do next year, unless you are that rare team with very little player turnover. And those teams tend to be winners anyway.

      For the poor non-contending sub-.500 team, much more than recent performance has to be taken into account because so many variables change each year, unlike the perennial division toppers.

  2. Telo says:

    In summary:

    1. Wins are dependent on Runs Scored, Runs Allowed and luck/randomness.

    2. RS and RA are dependent on True Talent and luck/randomness.

    3. Don’t make dumb assumptions.

    Good luck to anyone who didn’t know that.

  3. Jason says:

    I agree that the Pyth W% should not be used as the expectation for next year’s results. I do think that it is a better starting point for evaluating your future than actual W-L is, though. However, you need to adjust from your Pyth W% based on players lost in the offseason and expected results from both incoming and incumbent players. Yes, your team may have scored that many runs, but are you really expecting Joe Average to repeat his career year of .320/.390/.500, or will he hit like the .280/.320/.430 hitter he is? Maybe Joe Average’s career year made up for Adam Allstar’s down year and it’ll even out next year. Gotta look at all 25 roster spots to see if you’re expecting more or less production.

    Long story short, you’ve gotta plug in your projected runs scored and runs allowed to get your projected Pyth W%. Otherwise, you’re just telling last year’s story.

  4. Yirmiyahu says:

    Wait. Are there people who were using Pyth W% to predict next season’s results?

  5. RC says:

    Why do we keep using 2 for the exponent? I was under the impression that it was supposed to be closer to 1.8.

    • KJOK says:

      The correct exponent actually varies depending on the run environment, although 1.83 works best for an ‘average’ run distribution season.

  6. Daniel says:

    Can Fangraphs not be for everyone? Sure, a lot of people know the fallacies behind win expectancy and run differentials but there’s no harm in it being pointed out again.

    Do people want this kind of thought to become prevalent in baseball commentary or do they prefer to rail against Joe Morgan and J. Morgans because it makes them feel a bit special?

    Of course people are still constructing arguments around bare-bones run differentials. People are still constructing arguments around RBIs and Wins.

    Good article. Repetition is not harmful to the argument. We didn’t all arrive here on the same day.

  7. R_Magillicutty says:

    I don’t know many who would form an opinion or outlook on a team’s upcoming season from last year’s win expectation versus actual winning percentage. The gap in wins can only be so large, so it is unlikely that anyone would change their entire assessment of a team’s performance over a season because of run differential. The article is directed at nobody.

  8. CJ says:

    I think that looking at Pythag vs. actual W-L % has some value. I think there have been many teams which mistakenly believed that their W-L record indicated that they are “close” to competing. But, their deviation from Pythag should have told them that the W-L record is misleading. A team might have a .500 or moderate winning percent and think that they only have to add a key player or two in order to be playoff worthy. But, as an example, if their W-L record was 6 or 7 games better than the Pythag prediction, the team may be acting on a faulty premise that it needs to add only one or two key pieces.

  9. Matt P says:

    I disagree.

    As a fan of a bad team (the Orioles), there are times when people look at the potential of young players on a team and just presume that the team is going to improve as these players get older. While some players do reach their potential, others do not. This is one reason why the Orioles are fighting to avoid losing a hundred as opposed to trying to break .500 despite having promoted many top hundred prospects over the past three years.

    Using something like PyThag allows the user to temper his expectations. Maybe the given team (2012 Royals) does have hope for the future, but so far the young players haven’t improved the team and there’s no certainty that they will.

    Ultimately, PyThag tells you that so far there’s been no improvement at the Major League Level. That does have value.

  10. Xeifrank says:

    Aren’t two standard deviations for the simple Pyth Wins something like 10? I’m not sure, so I guess I am kind of asking. :)
    vr, Xei

  11. evo34 says:

    This article is terrible — I mean, bottom 10th percentile of fangraphs terrible. Why would you criticize a metric that adds information — which pyth win exp. clearly does? You are criticizing mythical people who are (apparently?) assuming that pythagorean win pcts. are perfectly clairvoyant. No one actually does this, btw. I can only assume that you also hate the way people use margin of victory based football power ratings to gain insight over raw W-L records?

  12. ColeHamels1510_740M/770V says:

    Do I lack the mathematical aptitude to understand this article?
