Over the last nine baseball seasons, at least one thing has been constant: the domination of Ron Gardenhire’s Minnesota Twins by the New York Yankees. Between the regular season and the postseason, the Twins have lost a staggering 57 of their last 75 contests against New York, including a sweep in last season’s ALDS that seemed all too familiar. These struggles have created a narrative: the little-guy Twins, despite all their regular season success, crumble under the pressure of the big-city Yankees. Is this just the magnification of a small 75-game sample, or is there something substantive in the Twins’ 18-57 record against the Yankees under Gardenhire?
Using a Bill James method known as log5, we can find an expectation for one team’s performance against another. Log5 predicts Team A’s win percentage against Team B with the following formula:
$$\text{WPct}(A \text{ vs. } B) = \frac{A - A \cdot B}{A + B - 2 \cdot A \cdot B}$$
The A in the above formula is Team A’s winning percentage and B is Team B’s winning percentage. Since 2002, the Twins are, including the postseason, 809-677 for a .544 winning percentage. The Yankees are 914-612 over that same time frame, for a winning percentage of .599. Therefore, the log5 method tells us that we should expect the Twins to win 44.4 percent of the time, or 33.3 games out of 75.
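As a quick check on the arithmetic, here is a minimal sketch of the log5 calculation using the records quoted above. The function name `log5` and the variable names are just illustrative choices, and the formula is the standard log5 form, assumed here to be the one the calculation relies on.

```python
def log5(a: float, b: float) -> float:
    """Expected win probability for a team with winning percentage `a`
    against a team with winning percentage `b` (standard log5 form)."""
    return (a - a * b) / (a + b - 2 * a * b)


# Records since 2002, postseason included, as quoted in the text.
twins_pct = 809 / (809 + 677)
yankees_pct = 914 / (914 + 612)

p_twins = log5(twins_pct, yankees_pct)
expected_wins = 75 * p_twins

print(f"Twins win pct:   {twins_pct:.3f}")
print(f"Yankees win pct: {yankees_pct:.3f}")
print(f"log5 expectation for the Twins: {p_twins:.3f}")
print(f"Expected Twins wins in 75 games: {expected_wins:.1f}")
```

Feeding in the unrounded winning percentages rather than .544 and .599 changes the result only in the fourth decimal place.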
So the Twins won 15.3 fewer games than we would expect, nearly as many as the 18 games the team actually won, although even that may not sound as shocking as it should. According to the binomial distribution, the probability of 18 (or fewer) successes out of 75 tries with a success probability of .444 is only 0.021%. For another measurement of just how outlandish this run has been, I ran 2,500 simulations of the 75 games assuming a Twins win probability of 44.4%. In only one of the simulations did the Twins win the 18 games that we’ve observed over the past nine years. The actual result of the games is a whopping 3.6 standard deviations from the mean expectation.
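For readers who want to reproduce these checks, here is a rough sketch using only the Python standard library. It computes the exact binomial tail, the distance from the mean in standard deviations, and a small Monte Carlo run. The helper names, the random seed, and the choice to count simulated seasons with 18 or fewer wins are my assumptions, not details of the original simulation, so the simulated count will not necessarily match the one reported above.

```python
import math
import random

N_GAMES = 75     # head-to-head games in the sample
P_WIN = 0.444    # log5 expectation for the Twins
OBSERVED = 18    # actual Twins wins


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def simulate(trials: int, seed: int = 0) -> int:
    """Count simulated 75-game sets in which the Twins win OBSERVED or fewer."""
    rng = random.Random(seed)  # fixed seed is an arbitrary choice for repeatability
    hits = 0
    for _ in range(trials):
        wins = sum(rng.random() < P_WIN for _ in range(N_GAMES))
        if wins <= OBSERVED:
            hits += 1
    return hits


tail = binom_cdf(OBSERVED, N_GAMES, P_WIN)
mean = N_GAMES * P_WIN
sd = math.sqrt(N_GAMES * P_WIN * (1 - P_WIN))
z = (OBSERVED - mean) / sd

print(f"P(18 or fewer wins) = {tail:.4%}")
print(f"mean = {mean:.1f}, sd = {sd:.2f}, z = {z:.2f}")
print(f"simulated sets with 18 or fewer wins: {simulate(2500)} of 2500")
```

With an event this rare, 2,500 trials will usually turn up zero, one, or two such runs, so the simulation is best read as a sanity check on the exact binomial figure rather than a separate estimate.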
Running these numbers using Pythagorean records since 2002 yields nearly identical results, as nearly nine years of data results in Pythagorean records similar to actual records. It’s also worth noting that the Pythagorean record of the Twins-Yankees games results in a record of 52.5 wins for the Yankees and 22.5 for the Twins – slightly closer, but still only a 0.8% chance according to the binomial distribution and still 2.5 standard deviations from the mean expectation of just over 33 wins. As one extra note, it is important to remember that although the odds of the Yankees dominating the Twins at this level are tremendously low, the odds of any one team dominating any one other team like this over some stretch are much higher, simply due to the large number of 75-game head-to-head samples.
Using this data, we can reject the hypothesis that the Twins are actually a .444 team against the Yankees – as we would expect given the log5 method – with 99.9841% certainty, well over the typical 95% level of statistical significance. This strongly suggests that some other factors are at play besides each team’s talent relative to the league. For some reason, the New York Yankees play much better or the Minnesota Twins play much worse (or both) when these two teams match up.
The Twins would have to be a roughly .350 team against the Yankees for the 18-57 actual result to be within two standard deviations of the mean expectation. The Yankees’ “extra advantage” over the Twins is at least in the realm of 100 points of winning percentage, if not more. Now the question is where this extra advantage comes from. I’d like to explore that in later posts, but for that I need some help. Is it something with the way each team distributes its talent? Are the teams built by the Yankees just better suited to play the Twins than the rest of the league (or vice-versa)? Is there something about Ron Gardenhire’s tactics against the Yankees which puts his teams at a disadvantage? Are the Yankees great at advance scouting the Twins? Is there anything else that I’m missing? Put your ideas in the comments and I’ll try to investigate some of them in the coming week.