Strong Starts Don’t Mean That Much

Last Friday, I focused my weekly ESPN Insider column (which can also be read here on the site if you are a FanGraphs Plus subscriber) on the predictive power of a team getting off to a strong start in April. We know that at the individual level one month doesn’t mean much, but I wondered whether a dominating start to the season for an entire team might be more predictive of future success.

To find out, we looked at every team since 1974 that won at least 70 percent of its games in April (minimum 15 games), which gave us a sample of 45 teams. We then looked at how these teams performed from May through September to see how predictive a strong team start actually was. I was pretty surprised at just how little it mattered.

To summarize the results, the 45 teams combined for a .743 winning percentage in April but just a .549 winning percentage from May through September. The correlation between April record and May-September record was just .24, and the r squared was just .06, meaning that you could only explain six percent of these teams’ records in the final five months by their records in April.
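For anyone who wants to check the step from correlation to "six percent," here is a minimal sketch. The only input is the .24 correlation reported above; the variable names are just for illustration.

```python
# Correlation reported in the article between April record and
# May-September record for the 45-team sample.
r = 0.24

# The share of variance explained (r squared) is the square of the correlation.
r_squared = r ** 2

print(f"r = {r:.2f}, r squared = {r_squared:.4f}")
```

Squaring .24 gives a little under .06, which is where the six percent figure comes from.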

We even broke these 45 teams into quartiles based on ratio of runs scored to runs allowed to see if a pythag method would have done any better, but the correlation was an even weaker .19. In fact, the 12 teams with the worst run differential among the .700+ April clubs performed nearly as well over the remainder of the season as the 11 teams with the best run differential. Even teams that started the year winning games by mauling their opponents regressed heavily over the rest of the season, and knowing a team’s run differential didn’t help identify which teams would sustain more of their strong start than others.

That doesn’t mean April performance is worthless, of course. The fact that these teams won 54 percent of their May-September games shows that the sample was primarily made up of playoff contenders, so we shouldn’t pretend that a strong start to the season is meaningless. As a quick-and-dirty estimate of necessary regression, last week Tom Tango suggested adding 35 wins and 35 losses to a team’s record on any given day.

To test his method against the results of these early season barnstormers, we can add 1,575 wins and 1,575 losses to the April total for these 45 teams, which would bring the total number of adjusted wins and losses to 2,340-1,839, which works out to a .560 winning percentage. That’s just slightly higher than the .549 mark actually posted by these 45 teams over the rest of their season, so Tom’s shortcut seems to work pretty well on this sample of strong starting teams.

Applying that 35-35 regression to the Rangers and Dodgers, who both currently stand at 16-6 to begin the year, would leave you with an expected future winning percentage of .554. This method suggests that we haven’t actually learned all that much about the Rangers, as we were already pretty sure that they were good at baseball. Their first month confirms our preseason expectations, but shouldn’t change them all that much.
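The arithmetic in the two paragraphs above can be sketched in a few lines. This is only an illustration of the 35-35 shortcut as described here, not Tango's own code; the function name `regress_record` and its parameters are made up for the example, and the aggregate April record is backed out of the 2,340-1,839 adjusted total quoted above.

```python
def regress_record(wins, losses, prior_wins=35, prior_losses=35):
    """Pad a record with a fixed number of .500 games (the 35-35 shortcut)
    and return the resulting winning percentage."""
    return (wins + prior_wins) / (wins + losses + prior_wins + prior_losses)

# Aggregate April record of the 45-team sample, recovered by subtracting
# the 1,575 added wins and losses from the adjusted 2,340-1,839 total.
n_teams = 45
april_wins = 2340 - 35 * n_teams
april_losses = 1839 - 35 * n_teams

# For the whole sample, the padding is 35-35 per team.
sample_estimate = regress_record(
    april_wins, april_losses,
    prior_wins=35 * n_teams, prior_losses=35 * n_teams,
)

# A single team at 16-6, as the Rangers and Dodgers are now.
team_estimate = regress_record(16, 6)

print(f"Sample April pct:  {april_wins / (april_wins + april_losses):.3f}")
print(f"Sample, regressed: {sample_estimate:.3f}")
print(f"16-6, regressed:   {team_estimate:.3f}")
```

The padding is the whole method: the more games a team has actually played, the less the 70 added games pull its record toward .500.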

For the Dodgers, it’s tempting to say that perhaps they entered the year a tad bit underrated. Rather than regressing to the mean, Matt Kemp has doubled down on his terrific 2011 season, and quality performances from Andre Ethier and their collection of high walk/low power role players (A.J. Ellis, Mark Ellis, and Jerry Hairston have all been particularly good) have pushed the Dodgers out to an early lead in the NL West. Kemp can’t keep this up all year, and the Dodgers pitchers are due for some significant BABIP regression, but the Dodgers may be a little better than they were given credit for.

We should be careful not to overreact to the results of April performances, but also understand that they do carry some meaning, especially when viewed in the right context. A great first month to the season is mostly useful for putting wins in the bank that count in the final standings, but April performance can also help us understand a small part of a team’s expected future performance. April performance isn’t gospel, nor is it worthless. It’s data, and properly regressed, it can have some predictive value.

Dave is the Managing Editor of FanGraphs.

35 Responses to “Strong Starts Don’t Mean That Much”

  1. Chicago Mark says:

    Excellent as usual Dave. But would Kemp be doing as well if he were batting 6th or 7th? ;)
    Ps. So go the next step. WE probably gave the Dodgers a little less credit than they deserved. Do you think they can now make the playoffs in the NLW? I know that’s not exactly the object of the article but your thoughts would be welcome. Take the next step!!! :)

    • batpig says:

      of course they can! what a silly question! they are already 10 games above .500 and have a 4-game lead in the division, that means if they play .500 ball the rest of the way they will finish 86-76, and the regression “rule of thumb” above pegs them as a better than .500 team.

      I think it’s obvious they’d have to be the front runners right now, this isn’t a division that had a clear-cut favorite coming into the season.

      • Chicago Mark says:

        You’re a genius, batpig! I wasn’t asking your opinion though. I wanted to hear from Dave. That being said, I’d still peg the DBacks as the favorites. So what does DAVE think?

      • Anon21 says:

        Chicago Mark, you don’t seem to “get” the Internet. You can ask specific people who write widely-read articles to answer your specific questions, but mostly they will ignore you. Then you can either scorn your fellow readers’ answers to those same questions and resign yourself to eternal monologue, or you can try not to be a dick and just engage in conversation.

      • Sam Samson says:

        My guess is Dave thinks somebody already answered your question adequately.

  2. Jake says:

    Isn’t the point of getting off to a strong start to help your chances of making the playoffs? As a fan I’m not really too concerned with what the Dodgers’ or Rangers’ winning percentage might be for the rest of the year, but whether or not they’ll use that advantage to get to the playoffs. This article doesn’t address that at all. I’d be interested to see how many of those 45 teams made the playoffs and how many would have had they played in the current format (three division winners plus two wild cards).

    • Nick Lindner says:

      You’re mistaken. He did, in fact, address the importance of the first month in “banking wins” to help make the playoffs.

      • Jake says:

        I’ve now read this article three times to make sure, and I still don’t see it. The only reference to the playoffs or the post-season comes in the fifth paragraph: “The fact that these teams won 54 percent of their May-September games shows that the sample was primarily made up of playoff contenders, so we shouldn’t pretend that a strong start to the season is meaningless.”

        If you’re referring to the article he linked to, then I apologize. I don’t subscribe to either site, so I haven’t read that. In either case, I think it would be more useful to expand upon that snippet right there than to say that teams are unlikely to maintain a .700 winning percentage over the remainder of the year. It’s more of an impact over correlation thing.

      • vivalajeter says:

        Jake, it’s in the last paragraph. Dave writes “A great first month to the season is mostly useful for putting wins in the bank that count in the final standings”

      • bstar says:

        That’s not at all what Jake was talking about.

      • wahooo says:

        I think the “banking wins” statement is mostly wrong.

  3. Slartibartfast says:

    Yup. 22 games out of a 162 game season shouldn’t get anyone too riled up.

  4. Colin says:

    Well yes the correlation is low between hot starts and true talent. However, there is still a somewhat ok positive correlation and there is an effect on the requirements of performance going forward in order to overcome the start. If the team is .554 WP true talent and they start 16-6 that has a big impact as opposed to starting 6-16 because you can only assume true talent performance going forward. They don’t compensate by over performing later.

  5. BX says:

    This article should be mailed to Orioles fans everywhere.

    • Oliver says:

      Yeah! With a note that says, “Don’t enjoy the success your club is having after all those years in the wilderness.”

      (Or we could just let them enjoy it for a bit)

  6. Jason H says:

    The real question is “is the first month of games more predictive than a similar sample of games from other times in the season?” Using a restricted sample size, you’ve shown that a small sample of games is a weak predictor of a team’s ability to win games going forward. Honestly, everybody knew this. It’s a good part of the reason why the season is so long. The question is whether the first 22 games are more predictive than 22 games from elsewhere in the season.

    To properly address this question you need to look at the record of all teams (not just ones arbitrarily selected that have won 70%) and calculate the difference between their win expectancy (linear extrapolation) based upon the first 22 games and the actual number of games they win. You then need to compare this to the difference in win expectancy from other samples of 22 games (or windows of 22 games) chosen throughout the rest of the season. Where do the first 22 games fall in this distribution? ...probably smack in the middle.
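The comparison proposed in this comment could be sketched roughly as follows. This is one possible reading of the idea, not anything the article or the commenter actually ran: the function names are hypothetical, each 22-game window is extrapolated over all games outside it (an assumption), and the sample "season" is an invented win/loss pattern used only to exercise the code.

```python
def window_errors(results, window=22):
    """For every run of `window` consecutive games, extrapolate the window's
    winning percentage over the games outside it and return the absolute
    gap (in wins) between that projection and what actually happened."""
    n = len(results)
    total_wins = sum(results)
    outside_games = n - window
    errors = []
    for start in range(n - window + 1):
        window_wins = sum(results[start:start + window])
        outside_wins = total_wins - window_wins
        projected = window_wins / window * outside_games
        errors.append(abs(projected - outside_wins))
    return errors

def opening_window_percentile(results, window=22):
    """Fraction of windows that project the other games more accurately
    than the opening window does (about 0.5 means 'in the middle')."""
    errors = window_errors(results, window)
    return sum(e < errors[0] for e in errors) / len(errors)

# Invented 162-game season (1 = win, 0 = loss): a 16-6 start followed by
# strictly alternating results. Not real data.
season = [1] * 16 + [0] * 6 + [1, 0] * 70

errors = window_errors(season)
print(f"windows compared: {len(errors)}")
print(f"opening-window error (wins): {errors[0]:.1f}")
print(f"opening-window percentile: {opening_window_percentile(season):.2f}")
```

Run over every team-season rather than one made-up schedule, the distribution of that percentile would show whether April windows behave any differently from windows elsewhere in the year.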

    • vivalajeter says:

      Jason, with all due respect, you can decide the *real* question when you start writing your own articles. It’s silly to tell Dave that he’s asking and answering the wrong question, just because you might want there to be a separate article. Dave wanted to see how strong starts translate over the course of the season, so he looked at teams with strong starts. Why would he include all teams in the data, rather than just teams that had strong starts?

      • Jason H says:


        Ok that is fine. However, I don’t think Dave actually answered any question because he really didn’t compare his data to anything. He basically showed that small samples are not predictive.

  7. Mike Green says:

    There is one issue with the analysis. Among the clubs that go .700+ early are a disproportionate number that run away with their divisions and for whom the late games do not mean much. It can be a different game after September 1, and this is particularly so for teams which have a lead of 10 games or more. I suspect that the correlation would be a little tighter if you used May 1-August 31. But not much.

  8. philosofool says:

    I’m curious whether we can improve this regression algorithm using RS and RA. What’s the correlation between April Pythag and May-Sept. Pythag?

    Right now the Dodgers (Pythag = .60) and Mets (.40) are big over-achievers, while the Cardinals (.78) and Rangers (.76) are underperforming.

    • vivalajeter says:

      I don’t put much faith in Pythag records at this point in the year. Over the course of the season it might work out well because things even out, but in small samples they can get out of whack. The Mets gave up 13 runs in one inning in Colorado last weekend. Fluky events like that will have too much of an impact over the course of ~20 games.

      That’s not to say they’re not overachieving – they are, mainly because of their record in one-run games – but overall I don’t pay attention to Pythag until we’re further into the season.

      • philosofool says:

        The question is which is more reliable, not whether it is reliable in some (non existent) absolute sense.

        Also, the Mets have a -20 run differential, so you can’t chalk it all up to a 13-run inning.

  9. wahooo says:

    I don’t get it–so was the correlation only run with the teams with .700+ winning %? If I understand correctly, then you are comparing the teams that won 80% to the ones that won 70% to see if the ones that won 80% fare better than the ones that won 70% –so we’re talking about a 2-3 win difference from the high end to the low end–we shouldn’t be surprised that there isn’t much correlation. If you want to see if there is a correlation between the first month and the other months, why not use all the teams? What am I missing?

    Given that no team finishes the season with a .743 winning percentage, it is also unrealistic to expect the teams to continue this way–and the fact that they won .549 seems (probably only 30% of teams win this many over the year) to say that it is somewhat of a predictor of success.

  10. JWTP says:

    Did you do any work with slow starts? I only ask because the Angels are curious.

  11. jim mcAulife says:

    Brilliant! “We should be careful not to overreact to the results of April performances, but also understand that they do carry some meaning,”

  12. Todd says:

    I fail to understand how a ~.550 winning % going forward “doesn’t mean that much”. Seems to me that it means a lot. .550 is nearly a 90-win pace. Sure, if you expected a team that started hot to do well, you might think you hadn’t learned much. But take a team at random, then learn they did this well in April, and thus could be expected to finish out the year playing .550 ball. Surely that’s a lot of information? I was expecting you to say something like .510 or .520, in which case I’d have agreed with the article title.

    • wahooo says:

      exactly what I was trying to say above, but better summarized by Todd. I’m not sure what these statistics really tell us. It is unrealistic to think that a team that is winning at a 70% clip will continue to win at that rate–the fact that the teams with that winning percentage do pretty well seems to say there is some correlation.

      Also, I disagree with the premise that the games in the bank are more important. A team with a .743 winning percentage will win 6 more games than a .500 team over 25 games, whereas a .549 team will win 7 more games than a .500 team over 147. How is it that the banked games are more important?
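The comparison in this comment is easy to reproduce. A minimal sketch, using the comment's own figures (25 April games and 147 later games, which are the commenter's numbers rather than the article's); the helper name `wins_above_500` is invented for the example.

```python
def wins_above_500(win_pct, games):
    """Wins gained over a .500 team across the same number of games."""
    return (win_pct - 0.5) * games

april_edge = wins_above_500(0.743, 25)   # hot April, per the comment
rest_edge = wins_above_500(0.549, 147)   # May-September, per the comment

print(f"April edge over .500:         {april_edge:.1f} wins")
print(f"May-September edge over .500: {rest_edge:.1f} wins")
```

Both edges come out close to the 6 and 7 wins quoted, so the two stretches contribute about equally despite the difference in length.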

  13. Nick44 says:

    Strong teams regress. But by not including weak April records in your dataset, I’m not sure what is going on.

    I can take a glance at the all-time record wins for a season and figure out that you are not going to get a strong correlation (.743*162 = 120.366).

    If you did a Spearman’s rank correlation of April standings vs. May-September standings I think one might find a little more meaning in an early strong start.

  14. Boomer says:

    The 1984 Detroit Tigers started the season something like 25-5 (.833) and finished the season with a 79-53 (.598) record for a total of 104-58 (.642). This was a very good team that trailed off, but not that much to the point that they dogged it coming to the finish line. And to top it off, they won the WS vs. the Padres.

  15. sportsczar says:

    Does this mean I will stop hearing pundits tell me that the Rangers have the division sewn up? I am SO tired of hearing that the Angels have no chance at the division title now. Is it likely that they win it? Well, it’s less likely than it was on April 4th. I just need to turn the volume off when I have a game on. Former players are the worst. No, I have not contributed anything to this discussion. I’m okay with that.

    • bpdelia says:

      what are you talking about? It is well known that no team that was ever ten games out at any point has ever won its division. also former players bring much needed insight as to how various clubhouses (both by sq. ft. and locker size) as well as a particular city’s restaurant and golf scene impact playoff races.
