Strong Starts Don’t Mean That Much

by Dave Cameron

April 30, 2012

Last Friday, I focused my weekly ESPN Insider column (which can also be read here on the site if you are a FanGraphs Plus subscriber) on the predictive power of a team getting off to a strong start in April. We know that at the individual level one month doesn’t mean much, but I wondered whether a dominating start to the season for an entire team might be more predictive of future success.

To find out, we looked at every team since 1974 that won at least 70 percent of its games in April (minimum 15 games), which gave us a sample of 45 teams. We then looked at how these teams performed from May through September to see how predictive a strong team start actually was. I was pretty surprised at just how little it mattered.

To summarize the results, the 45 teams combined for a .743 winning percentage in April but just a .549 winning percentage from May through September. The correlation between April record and May-September record was just .24, and the r squared was just .06, meaning that April records explain only about six percent of the variation in these teams’ records over the final five months.
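For anyone who wants to check the step from correlation to r squared, here is a minimal sketch. It uses only the .24 figure quoted above; the variable names are mine, and the underlying team-by-team data is not reproduced here.

```python
# Correlation between April record and May-September record, as quoted above.
r = 0.24

# r squared is the share of variation in one variable that a linear
# relationship with the other can account for.
r_squared = r ** 2

print(f"r = {r:.2f}, r squared = {r_squared:.4f}")
print(f"Share of variation explained: about {r_squared:.0%}")
```

Squaring .24 gives a little under .06, which is where the "six percent" comes from.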

We even broke these 45 teams into quartiles based on ratio of runs scored to runs allowed to see if a pythag method would have done any better, but the correlation was an even weaker .19. In fact, the 12 teams with the worst run differential among the .700+ April clubs performed nearly as well over the remainder of the season as the 11 teams with the best run differential. Even teams that started the year winning games by mauling their opponents regressed heavily over the rest of the season, and knowing a team’s run differential didn’t help identify which teams would sustain more of their strong start than others.

That doesn’t mean April performance is worthless, of course. The fact that these teams won about 55 percent of their May-September games shows that the sample was primarily made up of playoff contenders, so we shouldn’t pretend that a strong start to the season is meaningless. As a quick-and-dirty estimate of necessary regression, last week Tom Tango suggested adding 35 wins and 35 losses to a team’s record on any given day.

To test his method against the results of these early season barnstormers, we can add 1,575 wins and 1,575 losses (35 of each for all 45 teams) to their combined April total. That brings the adjusted record to 2,340-1,839, which works out to a .560 winning percentage. That’s just slightly higher than the .549 mark actually posted by these 45 teams over the rest of their season, so Tom’s shortcut seems to work pretty well on this sample of strong starting teams.
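The arithmetic behind that check can be sketched in a few lines. The combined April record of 765-264 is not stated in the article; I am backing it out of the adjusted 2,340-1,839 total by subtracting the 1,575 added wins and losses, so treat it as a derived figure. The function name `regressed_win_pct` is just an illustrative label for the 35-35 shortcut.

```python
def regressed_win_pct(wins, losses, pad=35):
    """Add `pad` wins and `pad` losses to a record, then return the winning percentage."""
    return (wins + pad) / (wins + losses + 2 * pad)

N_TEAMS = 45
PAD_PER_TEAM = 35

# Adjusted totals quoted in the article.
adjusted_wins, adjusted_losses = 2340, 1839

# Derived (not quoted): combined April record before the padding was added.
total_pad = N_TEAMS * PAD_PER_TEAM               # 1,575
april_wins = adjusted_wins - total_pad           # 765
april_losses = adjusted_losses - total_pad       # 264

raw_april_pct = april_wins / (april_wins + april_losses)
pooled_estimate = regressed_win_pct(april_wins, april_losses, pad=total_pad)

print(f"Raw April winning percentage:  {raw_april_pct:.3f}")
print(f"Regressed (35-35 per team):    {pooled_estimate:.3f}")
print("Actual May-September mark:     0.549")
```

The derived April record reproduces the .743 figure cited earlier, which is a decent sanity check that the totals hang together.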

Applying that 35-35 regression to the Rangers and Dodgers, who both currently stand at 16-6 to begin the year, would leave you with an expected future winning percentage of .554. This method suggests that we haven’t actually learned all that much about the Rangers, as we were already pretty sure that they were good at baseball. Their first month confirms our preseason expectations, but shouldn’t change them all that much.
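Here is a rough sketch of that calculation for a single 16-6 team. The final-record projection at the end is my own extension, not something the method itself claims: it assumes a 162-game schedule and that the team plays at exactly its regressed estimate the rest of the way. The function name `project_final_wins` is hypothetical.

```python
def project_final_wins(wins, losses, pad=35, season_games=162):
    """Return (regressed winning pct, projected final win total).

    Assumes the team plays at its regressed estimate for every remaining game,
    which is a simplification rather than a forecast.
    """
    estimate = (wins + pad) / (wins + losses + 2 * pad)
    games_left = season_games - (wins + losses)
    return estimate, wins + estimate * games_left

# Both the Rangers and Dodgers stand at 16-6.
estimate, final_wins = project_final_wins(16, 6)

print(f"Regressed winning percentage: {estimate:.3f}")
print(f"Projected final wins:         {final_wins:.1f}")
```

Because the 16 wins already in the bank count in full, the projected season total sits well above what a .554 team would be expected to win from a standing start.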

For the Dodgers, it’s tempting to say that perhaps they entered the year a tad underrated. Rather than regressing to the mean, Matt Kemp has doubled down on his terrific 2011 season, and quality performances from Andre Ethier and their collection of high-walk, low-power role players (A.J. Ellis, Mark Ellis, and Jerry Hairston have all been particularly good) have pushed the Dodgers out to an early lead in the NL West. Kemp can’t keep this up all year, and the Dodgers pitchers are due for some significant BABIP regression, but the Dodgers may be a little better than they were given credit for.

We should be careful not to overreact to the results of April performances, but also understand that they do carry some meaning, especially when viewed in the right context. A great first month to the season is mostly useful for putting wins in the bank that count in the final standings, but April performance can also help us understand a small part of a team’s expected future performance. April performance isn’t gospel, nor is it worthless. It’s data, and properly regressed, it can have some predictive value.

35 Comments

Chicago Mark

11 years ago

Excellent as usual Dave. But would Kemp be doing as well if he were batting 6th or 7th? 😉
Ps. So go the next step. WE probably gave the Dodgers a little less credit than they deserved. Do you think they can now make the playoffs in the NLW? I know that’s not exactly the object of the article but your thoughts would be welcome. Take the next step!!! 🙂

batpig

Reply to Chicago Mark

of course they can! what a silly question! they are already 10 games above .500 and have a 4-game lead in the division, that means if they play .500 ball the rest of the way they will finish 86-76, and the regression “rule of thumb” above pegs them as a better than .500 team.

I think it’s obvious they’d have to be the front runners right now, this isn’t a division that had a clear-cut favorite coming into the season.

Reply to batpig

You’re a genius batpig! I wasn’t asking your opinion though. I wanted to hear from Dave. That being said, I’d still peg the DBacks as the favorites. So what does DAVE think?

Anon21

Chicago Mark, you don’t seem to “get” the Internet. You can ask specific people who write widely-read articles to answer your specific questions, but mostly they will ignore you. Then you can either scorn your fellow readers’ answers to those same questions and resign yourself to eternal monologue, or you can try not to be a dick and just engage in conversation.

Sam Samson

My guess is Dave thinks somebody already answered your question adequately.
