## Regression toward the Mean

In conversations about baseball statistics, the word “regression” comes up often, but it carries two distinct meanings, and it’s important to keep them separate. Colloquially, “regress” means to move backward; the dictionary definition is something like “returning to a former or less developed state.” You will absolutely hear people use the word this way to describe baseball players. If a good player gets worse, they can be said to have regressed. That is, their talent has declined.

However, this is not usually what we mean when we are talking about baseball statistics, so it’s important to be precise with your terminology. We are typically talking about the statistical concept known as “regression to/toward the mean.” Regression toward the mean (RTM for clarity in this article) is the concept that any given sample of data from a larger population (think April stats) may not be perfectly in line with the underlying average (think true talent/career stats), but that going forward you would expect the next sample to be closer to the underlying average than the first sample. Observations tend to cluster around the average value, even if the previous value is unusual.

Let’s use a concrete example. Imagine you have a player with a career OBP of .350. Over the last few seasons it’s been .340, .360, .340, .360, and .350. Let’s assume the league’s run environment has stayed the same and the player is around 28, so there is no particular reason to expect his talent level to change or for his OBP to spike due to a clear external factor. He is, as best as we can tell, a true talent .350 OBP hitter.

But now let’s imagine we observe his next 100 PA in which he posts a .300 OBP. What should we think about his next 500-600 PA based on the information we have? In other words, do those 100 PA at .300 OBP alter the way we think about the player and by how much?

Any sample of PA contains potentially useful information. Maybe he’s hurt, maybe he’s aging poorly, maybe the league learned to exploit a weakness. Maybe his true talent has changed. But when we are asked to assess this player, the previous five seasons carry a lot of weight. We don’t just forget about them because our player had a bad April. So to forecast his future performance, we need to consider RTM. It’s more likely that he will perform close to his career average (or some weighted version of it) than close to the sample of plate appearances immediately preceding the question.
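One rough way to picture that weighting is a plate-appearance-weighted average of the old and new samples. The sketch below is only an illustration, not a real projection method: the function name `blended_obp` and the figure of roughly 3,000 prior PA (five seasons of about 600 PA each) are assumptions made for this example.

```python
def blended_obp(prior_obp, prior_pa, recent_obp, recent_pa):
    """PA-weighted average of an established OBP and a recent sample."""
    total_pa = prior_pa + recent_pa
    return (prior_obp * prior_pa + recent_obp * recent_pa) / total_pa


# Assumed for illustration: ~3,000 PA of .350 history, then 100 PA at .300.
estimate = blended_obp(prior_obp=0.350, prior_pa=3000,
                       recent_obp=0.300, recent_pa=100)
print(f"{estimate:.3f}")  # prints 0.348
```

Under these assumptions the estimate barely moves, from .350 to about .348. A real forecast would typically weight recent seasons more heavily and account for things like age, so treat this only as a picture of why 100 PA cannot outweigh five seasons.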

RTM is neither positive nor negative. It’s a push toward average. If our player had posted a .400 OBP, the exact same properties would apply. To put it another way, any one small sample is less informative than a much larger sample, even if the larger sample is slightly older. So when a player gets off to a hot or cold start, we want to factor in RTM.

Keep in mind there is no single “correct” way to account for RTM in baseball. It’s a conceptual framework, and like most conceptual frameworks there are exceptions. Players’ underlying true talent does change from time to time based on a variety of factors. If a pitcher learns a new pitch, their history is still useful, but it’s much less useful than it is for a pitcher who is using the same arsenal as before.

The idea behind using RTM in baseball is that we can’t directly measure true talent; we can only infer it from observing outcomes on the field. Baseball has a lot of randomness that makes individual observations fluctuate around the player’s true talent. Picture a line drive being caught by a leaping defender and a weak grounder finding a hole. Because we can’t measure true talent directly, we can’t say for sure when it has changed and when we are simply observing a set of data points that differ from that talent level for unrelated reasons.

In other words, because of the randomness (factors unrelated to the talent of the player we care about) involved in generating baseball outcomes, it takes a long time for the statistics we create to tell us exactly how good a player is. This means that any one section of data might not be a clear reflection of the underlying average. So going forward, we expect the data to look more like the overall numbers than like a single, recent sample. We must regress any new data toward the mean.
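To get a feel for how easily randomness alone can produce a cold stretch, here is a minimal sketch that computes how often a true .350 OBP hitter would reach base 30 or fewer times in 100 PA. It assumes every plate appearance is an independent trial with the same .350 chance of success, which is a simplification of real baseball, and the helper name `prob_at_most` is hypothetical.

```python
from math import comb


def prob_at_most(successes, trials, p):
    """Exact binomial probability of `successes` or fewer in `trials` attempts."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes + 1)
    )


# Assumed model: each PA independently has a .350 chance of reaching base.
chance = prob_at_most(successes=30, trials=100, p=0.350)
print(f"Chance of a .300 OBP or worse over 100 PA: {chance:.1%}")
```

Under that simplified model, a stretch this cold is far from rare, even with no change in talent at all.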

As I noted, this is not a formulaic rule. Sometimes a player’s talent level changes. But RTM accounts for the fact that you can observe outcomes that are not in line with a player’s true talent simply due to randomness, and that going forward true talent is a better predictor. Think of it this way:

Outcomes = Talent + Randomness

We can only observe outcomes, but we care about talent. We sort out the randomness by letting it cancel itself out over a long period of time. Randomness is most likely to confuse you in short samples, so that’s why we use larger samples (i.e. regression toward the mean) to inform our opinions.
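The way randomness fades as the sample grows can be sketched with the standard deviation of a proportion. This again assumes independent plate appearances with a fixed .350 chance of reaching base, and the helper name `obp_noise` is made up for this example.

```python
from math import sqrt


def obp_noise(true_obp, pa):
    """Standard deviation of observed OBP over `pa` independent plate appearances."""
    return sqrt(true_obp * (1 - true_obp) / pa)


# Compare a hot or cold month, a full season, and roughly five seasons.
for pa in (100, 600, 3000):
    print(f"{pa:>5} PA: typical swing of about +/-{obp_noise(0.350, pa):.3f}")
```

Under these assumptions the typical swing is about 48 points of OBP over 100 PA, about 19 points over 600 PA, and about 9 points over 3,000 PA, which is why the longer track record is the better guide.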