Let’s face it, people love the home run. It’s why players like Mark Reynolds can find jobs. These days, we aren’t surprised when we see a couple of home runs in one game. It wasn’t always like this, however. Home runs used to be a rarity among baseball events. In the early 20th century, it wasn’t uncommon for a player to lead his league in home runs with just 10-15. This brings me to the question: how has the home run actually changed? Not in terms of its frequency, but in terms of its value, and more specifically, its value in runs. To answer this question without arduously parsing hundreds of event files, we need a way to frame the game of baseball mathematically that keeps things simple but doesn’t lose the most familiar parts of the game.
The first batter of the game steps to the plate and sees no runners on base with none out. He pops up. The second batter steps to the plate and sees the immediate result of the last plate appearance: one out. The second batter walks. The third batter then sees the immediate result of the second batter’s plate appearance: a runner on first base. This stream of batters, each placed into a state produced by the previous batter’s plate appearance, exemplifies the nature of a Markov Chain. When a batter steps into the batter’s box, the state he sees (whether it be an out situation, a base situation, or a base-out situation) depends only on the state the previous batter saw and what that batter did, not on anything earlier in the game. This is known as the Markov Property. Using this structure, we can simulate any baseball game we’d like. However, to keep our calculations simple, we should introduce some new rules.
The Rules of the Game
- A batter can only attain a BB, HBP, 1B, 2B, 3B, or HR.
- Outs only occur via a batter getting himself out.
- Any event other than those listed in the first rule is assumed to be an out.
- When a batter gets a hit, the runners on base advance by as many bases as the batter attains (e.g. a double with a runner on second scores the runner on second).
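The rules are concrete enough to write down as a transition function. Below is a minimal sketch, assuming base states are stored as 3-bit masks; the function names are hypothetical. The rules only describe runner movement on hits, so the handling of walks and hit-by-pitches (runners advance only when forced) is an assumption.

```python
# Base states are 3-bit masks: bit 0 = runner on first, bit 1 = second,
# bit 2 = third. So 0 is "bases empty", 0b101 is "runners on the corners",
# and 0b111 is "bases loaded".

HIT_BASES = {"1B": 1, "2B": 2, "3B": 3, "HR": 4}
WALK_EVENTS = {"BB", "HBP"}


def advance_on_hit(bases: int, n: int) -> tuple[int, int]:
    """Move the batter and every runner exactly n bases; return (bases, runs)."""
    # Shifting left by n moves every runner n bases; the batter lands on base n.
    moved = (bases << n) | (1 << (n - 1))
    runs = bin(moved >> 3).count("1")  # any bit pushed past third base has scored
    return moved & 0b111, runs


def advance_on_walk(bases: int) -> tuple[int, int]:
    """ASSUMPTION: on BB/HBP, runners move up only when forced."""
    if not bases & 0b001:
        return bases | 0b001, 0
    if not bases & 0b010:
        return bases | 0b011, 0
    if not bases & 0b100:
        return 0b111, 0
    return 0b111, 1  # bases loaded: the runner on third is forced home


def step(bases: int, event: str) -> tuple[int, int]:
    """Apply one plate appearance under the simplified rules."""
    if event in HIT_BASES:
        return advance_on_hit(bases, HIT_BASES[event])
    if event in WALK_EVENTS:
        return advance_on_walk(bases)
    # Every other event is an out, and runners never move on an out.
    return bases, 0
```

For example, `step(0b010, "2B")` returns `(2, 1)`: the batter ends up on second and the runner who started there scores, matching the example in the last rule.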
These are the rules of the game. There are no stolen bases, no scoring from second on a single, and no double plays. We have stripped the game down to only its essentials, while implementing certain changes for our own convenience. For our purposes, we don’t care about Mike Trout’s 33 stolen bases, only the fact that he mainly attains his bases through the events we allow.
The Out Chain
We assume that the probability of a batter getting a single at any point during a season is the number of singles he gets for the season divided by his plate appearances, and we do the same for each of our other events. With these probabilities, we can construct a simple Markov Chain where batters step to the plate with 0, 1, or 2 outs. This chain is irreducible, meaning that each state (0, 1, or 2 outs) eventually leads to every other state. Because the chain is irreducible and has a finite number of states, it has a unique stationary probability distribution on our state space of outs. It happens that this distribution is uniform: a batter starts his plate appearance with 0, 1, or 2 outs with probability 1/3 each. The reason is that every state behaves the same way. The batter either reaches base and leaves the out count unchanged, or makes an out and moves the count to the next state (with the third out sending the next batter back to 0 outs), and the probability of each outcome doesn’t depend on how many outs there are. The knowledge that outs are uniformly distributed over our game allows us to construct probabilities for a more complicated chain that should shed light on our original question.
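The uniform result can be checked numerically. The sketch below builds the three-state out chain for a hypothetical on-base probability `p_on_base` (0.318 is an illustrative value, not a figure from this article) and solves for its stationary distribution exactly with Python’s `fractions` module. The helper names are ours.

```python
from fractions import Fraction


def out_chain(p_on_base):
    """Transition matrix over 0, 1, 2 outs; row = current state, column = next."""
    q = 1 - p_on_base  # probability the batter makes an out
    return [
        [p_on_base, q, 0],
        [0, p_on_base, q],
        [q, 0, p_on_base],  # the third out ends the inning: next batter sees 0 outs
    ]


def stationary(P):
    """Solve pi P = pi with sum(pi) = 1 by exact Gauss-Jordan elimination."""
    n = len(P)
    # Row i encodes sum_j pi_j * P[j][i] - pi_i = 0.
    A = [[Fraction(P[j][i]) - (1 if i == j else 0) for j in range(n)] for i in range(n)]
    b = [Fraction(0)] * n
    # For an irreducible chain one balance equation is redundant;
    # replace it with the normalization sum(pi) = 1.
    A[-1] = [Fraction(1)] * n
    b[-1] = Fraction(1)
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col] / A[col][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
                b[r] -= f * b[col]
    return [b[i] / A[i][i] for i in range(n)]


print(stationary(out_chain(Fraction("0.318"))))
# [Fraction(1, 3), Fraction(1, 3), Fraction(1, 3)]
```

Any value of `p_on_base` strictly between 0 and 1 gives the same answer, because every column of this matrix also sums to 1 (the matrix is doubly stochastic), and the uniform distribution is stationary for any such chain.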
The Base Chain
We now turn to the stream of batters who see a certain base situation when they step to the plate. The transitions between base situations depend on the out situation, as can be seen when a batter makes an out with 1 out versus 2 outs. With 1 out, if the batter makes another out, the base situation stays the same for the next batter. With 2 outs, however, the inning is over and the base situation resets to the bases empty for the first batter of the next inning. Fortunately, we know that the probability of a batter seeing any number of outs when he steps to the plate is 1/3, so an out ends the inning and clears the bases one third of the time and leaves the bases unchanged the other two thirds. As with the Out Chain, every state in the Base Chain leads to every other state. The “runners on the corners” state eventually leads to the “bases loaded” state, which eventually leads to the “bases empty” state, and so on. Since there are finitely many base situations, there is a unique stationary probability distribution on the state space of base situations. That is to say, there is a probability of a batter stepping to the plate and seeing the bases empty, another for seeing a runner on first, and so on.
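The Base Chain can be written as an 8×8 transition matrix. The sketch below is one reading of the paragraph above, not necessarily the author’s exact construction: every non-out event moves the bases according to the rules of the game, and an out leaves the bases unchanged with probability 2/3 and clears them with probability 1/3. The rates in `RATES` are hypothetical placeholders, not real 2013 data, so the printed numbers will not reproduce the 2013 figures quoted in this article. The rules don’t say how runners move on a BB or HBP; the sketch assumes they advance only when forced.

```python
from fractions import Fraction

# HYPOTHETICAL per-plate-appearance event rates (illustrative, not real data).
RATES = {"BB": "0.080", "HBP": "0.010", "1B": "0.155",
         "2B": "0.045", "3B": "0.005", "HR": "0.025"}
HIT_BASES = {"1B": 1, "2B": 2, "3B": 3, "HR": 4}


def next_bases(bases, event):
    """Bases after a non-out event; bit 0 = first, bit 1 = second, bit 2 = third."""
    if event in HIT_BASES:
        n = HIT_BASES[event]
        # Every runner and the batter move exactly n bases; bits past third score.
        return ((bases << n) | (1 << (n - 1))) & 0b111
    # ASSUMPTION: on BB/HBP, runners advance only when forced.
    if not bases & 0b001:
        return bases | 0b001
    if not bases & 0b010:
        return bases | 0b011
    return 0b111


def base_chain(rates):
    """8x8 transition matrix over base states, using the 1/3 inning-end rule."""
    probs = {event: Fraction(p) for event, p in rates.items()}
    p_out = 1 - sum(probs.values())
    P = [[Fraction(0)] * 8 for _ in range(8)]
    for s in range(8):
        for event, p in probs.items():
            P[s][next_bases(s, event)] += p
        # An out is the third out (ending the inning) one time in three.
        P[s][s] += p_out * Fraction(2, 3)
        P[s][0] += p_out * Fraction(1, 3)
    return P


def stationary(P):
    """Solve pi P = pi with sum(pi) = 1 by exact Gauss-Jordan elimination."""
    n = len(P)
    A = [[Fraction(P[j][i]) - (1 if i == j else 0) for j in range(n)] for i in range(n)]
    b = [Fraction(0)] * n
    A[-1] = [Fraction(1)] * n  # replace a redundant balance equation
    b[-1] = Fraction(1)
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col] / A[col][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
                b[r] -= f * b[col]
    return [b[i] / A[i][i] for i in range(n)]


pi = stationary(base_chain(RATES))
expected_runners = sum(p * bin(s).count("1") for s, p in enumerate(pi))
hr_value = 1 + expected_runners  # the batter plus every runner on base
print(f"P(bases empty) = {float(pi[0]):.3f}, HR value = {float(hr_value):.3f}")
```

Exact fractions keep the linear solve free of rounding error. With only eight states, speed is not a concern.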
Using this method, a player in our universe who stepped to the plate in 2013 saw the bases empty with an approximate probability of .467. That same batter saw the bases full with a probability of .103 and a runner on first only with probability .210. If a team managed to load the bases, they’d find that they had to wait about 10 more plate appearances, on average, before they next loaded the bases. If they put runners on the corners, they had to wait 42 more plate appearances, on average, before they did so again. All of this leads us to some of our final conclusions. Since a home run drives in every runner on base plus the batter himself, its expected value is one run plus the expected number of runners on base. In the context of our rules, the expected number of runners on base in 2013 was .908, so the expected value of a home run was 1.908 runs. For every season we ran, this method generates home run values between 1.8 and 2.2 runs. The following is a table of the expected home run values this method generates for each of the last 25 seasons:
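The waiting times above follow from a standard fact about irreducible finite Markov chains (Kac’s formula): the expected number of steps needed to return to a state equals the reciprocal of that state’s stationary probability. A minimal sketch using only figures quoted in this paragraph; the function names are ours.

```python
def mean_return_time(stationary_prob: float) -> float:
    """Expected plate appearances until a base state recurs (Kac's formula)."""
    return 1.0 / stationary_prob


def implied_stationary_prob(mean_return: float) -> float:
    """Invert Kac's formula: a state revisited every m steps has probability 1/m."""
    return 1.0 / mean_return


print(round(mean_return_time(0.103), 1))      # bases loaded -> 9.7
print(round(implied_stationary_prob(42), 3))  # runners on the corners -> 0.024
```

So the .103 bases-loaded probability corresponds to a wait of about 9.7 plate appearances, which rounds to the 10 quoted above. Read backwards, the 42-plate-appearance wait for runners on the corners implies that batters saw that state roughly 2.4% of the time.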
In the last 25 years, we predict that a home run had the greatest value in 1999, at 1.972 runs. This is a reflection of the heavily offensive environment of the season, when big bats such as Sammy Sosa, Mark McGwire, and Barry Bonds were getting on base at staggering rates. The following is a graph of all of the home run values the system predicts from 1884 through 2013:
We see that this system predicts home runs were worth more from around 1889 to 1902, when the home run hovered at around 2.00-2.15 runs. While most players of this generation weren’t hitting home runs, they were certainly getting on base often. In 1894, 38 players had on base percentages greater than or equal to .400, compared to 7 players in 2013. When on base percentages are higher, more runners are on base, and this increases the expected value of the home run. Under our restrictions, however, the home run hasn’t been worth 2.00 runs since 1950, and these days it fluctuates between 1.90 and 1.93 runs. While these estimates all rest on our simplified rules and assumptions, this framework lets us generalize the game of baseball more easily while preserving its most important aspects. It’s this framework that lets us estimate that, while Chris Davis’ 53 home runs were probably worth 101 runs in 2013, they may have been worth 114 in 1894.
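The closing comparison is simply a per-home-run value multiplied by a home run total. A quick check using the 2013 value of 1.908 runs; the exact 1894 per-home-run value isn’t stated, so the sketch backs it out of the 114-run figure. The function name is ours.

```python
def runs_from_home_runs(home_runs: int, runs_per_hr: float) -> float:
    """Total expected runs credited to a batch of home runs under our rules."""
    return home_runs * runs_per_hr


davis_hr_2013 = 53
print(round(runs_from_home_runs(davis_hr_2013, 1.908)))  # 101
# Per-home-run value implied by "114 runs in 1894":
print(round(114 / davis_hr_2013, 3))  # 2.151
```

An implied value of about 2.15 runs sits at the top of the 2.00-2.15 range the system gives for 1889 to 1902.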