The breaking of the style of baseball known as the dead-ball era is generally considered a phenomenon of Babe Ruth’s 1919 season, when he hit a record 29 homers for the Red Sox. That was a good year, but not a jaw-dropping one: three players had managed 25 or more homers by that point, and Ned Williamson’s record from 1884 was only two behind Babe. The unprecedented explosion came the next season, when Ruth redefined power by posting 54 home runs, doubling up anyone else who had ever played in the big leagues.
It only took a few years for the trajectory of offense, and especially home run production, to change drastically. In 1922 Rogers Hornsby hit 42, Ken Williams 39, and Tilly Walker 37, all besting The Bambino’s paltry 35 that season. Over the next several decades home run production kept shifting as power reshaped the game.
Skewness here is based on the Excel formula, where anything between -1 and 1 is treated as not skewed. Since we have no negatives here, we will focus on values above 1 to start, that is, positive skewness (a long right tail). As you can see, the peak of skewness in HR production was that 1920 season, where Ruth was an extreme outlier; see below:
You can see the skewness, a long right tail, and most of it is being driven by one observation. Positive skewness was always present in early baseball due to the large cluster of players at or slightly above 0 home runs, but this took it to a new level. If you go back to the previous chart, though, you will see that as the league started hitting more long balls the skewness quickly dissipated, and by the late 1940s it had gone away. Only twice since 1949 did we see a skewness above 1, in 1981 and 1981, where the skewness shows up as 1.05 and 1.04 respectively, so right on the dividing line between truly skewed and not. Interestingly, the skewness leaves and stays away shortly after the talent pool widened with an influx from the Negro Leagues, which may have cut out some of the lower end that was causing it.
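For readers who want to reproduce the statistic, here is a minimal sketch of the calculation, assuming "the Excel formula" means Excel's SKEW worksheet function (the adjusted sample skewness). The function name excel_skew and the home run totals are hypothetical and invented for illustration; they are not the real 1920 data.

```python
from statistics import mean, stdev


def excel_skew(values):
    """Sample skewness computed the way Excel's SKEW function defines it."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three observations")
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 in the denominator)
    if s == 0:
        raise ValueError("skewness is undefined when all values are equal")
    # n / ((n - 1)(n - 2)) * sum of cubed standardized deviations
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in values)


# Hypothetical season totals: a pack clustered near zero (made-up numbers)
pack = [0, 1, 1, 2, 2, 3, 3, 4, 5, 6, 8, 9, 11, 14]
# The same pack plus a single Ruth-like outlier
with_outlier = pack + [54]

print("pack only:   ", round(excel_skew(pack), 2))
print("with outlier:", round(excel_skew(with_outlier), 2))
```

Adding the single extreme value pushes the statistic well past the cutoff of 1 used here, which is the same effect one observation has on the 1920 distribution.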
One thing to keep in mind through all of this is that a lot of people look at the steroid era as another period where baseball was broken, with scientifically enhanced freaks blasting way more home runs than should be seen. Yet in the data we don’t see a large spike in skewness through that period. That, of course, leaves a lot of ambiguity and no answers, as you could read it in multiple ways, including the two extreme views:
1) See, EVERYONE was cheating in the steroid era, so the entire distribution shifted enough that even 1998’s home run chase, which ended with two players breaking the single-season record, did not produce a skewed distribution.
2) Despite the cheating, nothing was all that greatly affected. There happened to be a couple of cheaters who succeeded, but mostly the cheaters stayed with the pack, and thus we see no skewness.
So what did the distribution look like in 1998?
Rather than the highest frequencies sitting at 0 to 4 home runs and then tapering off quickly, as in 1920, we now see that every qualified batter came up with at least 1 HR and that the largest mass runs from 9 to 23 home runs. This means that Mark McGwire’s 70 HRs were about 3.5 times the average and median, which were 20.7 and 20 for the year. In comparison, Babe Ruth hit 10 times the average of 5.3 HRs in 1920 and 18 times the median of 3, so you can see how much farther from the pack he was.
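The comparison above is simple division, and a short sketch makes it easy to check. It uses only the figures quoted in this post; the dictionary layout and the helper name multiples are just for illustration.

```python
# Leader totals and league figures as quoted in the text
seasons = {
    1920: {"leader": "Babe Ruth", "hr": 54, "mean": 5.3, "median": 3},
    1998: {"leader": "Mark McGwire", "hr": 70, "mean": 20.7, "median": 20},
}


def multiples(season):
    """Return the leader's total as a multiple of the mean and of the median."""
    return season["hr"] / season["mean"], season["hr"] / season["median"]


for year, season in seasons.items():
    vs_mean, vs_median = multiples(season)
    print(f"{year} {season['leader']}: "
          f"{vs_mean:.1f}x the mean, {vs_median:.1f}x the median")
```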
Whether or not PEDs broke baseball again is not something I am prepared to answer here, but we can at least say they didn’t break it to the degree that Babe Ruth did when he signaled the end of the dead-ball era. What we can tell is that home run production seems to be distributed fairly evenly, and has been for more than half a century of baseball, during which time we have seen many changes to the game. In reality, all that leaves me with is more questions, and that is just fine by me.