Outliers, Breakouts, and the Owl of Minerva

December 2, 2009

As part of “projection week” here at FanGraphs, this post follows Monday’s by discussing two phenomena that are often brought up in relation to projections: “Outliers” and “Breakouts.” Although they contain elements of truth, these notions are often used in problematic fashion to show that projections are “wrong.”

An “outlier” is a season that appears to differ greatly from a player’s usual performance. Some will claim that said season should be ignored when projecting a player, since it “obviously” does not represent his real skill. A “breakout” season is one in which a (usually young) player greatly exceeds expectations and/or past performance. The season is seen as establishing a new level of performance such that prior performance should be weighted much less heavily or ignored.

You may have noticed the potential contradiction. While the “outlier theory” claims that a single season deviating from an apparently established level of performance should be thrown out, the “breakout theory” claims that a single season deviating greatly from earlier seasons means that it should be looked at to the exclusion of the others. This isn’t necessarily a contradiction, as one could hold that there are particular conditions for outliers and breakouts — outliers might only apply to players in their prime, or breakouts to young players. Still, it’s worth noting, as you’ll often see the same person assert both.

The deeper and more important point is that by looking at one-year deviations as establishing a new level of performance that thus takes on a greater weight (breakout!) or as being irrelevant and thus in need of exclusion (outlier), both positions implicitly assume they already know what we’re trying to find out when projecting a player: his “true talent.” Recall the “general formula for player performance” from Monday’s post: performance = true talent + luck. The various methods that projection systems use (regression, weighted averages, age adjustments, etc.) are meant to take the (limited) data we have for a player and filter out luck in order to estimate his current true talent. These methods are predicated on the fact that we can’t pinpoint the player’s true talent given the limited performance samples we have, so we make our best estimate based on probabilities.
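
As a rough illustration of what “filtering out luck” can look like, here is a minimal sketch of regression toward the mean for a single rate stat. The function name `regress_to_mean`, the `ballast` parameter, and all of the numbers are assumptions made up for this example; they are not taken from any particular projection system.

```python
def regress_to_mean(observed_rate, opportunities, league_rate, ballast):
    """Shrink an observed rate toward the league rate.

    `ballast` is a hypothetical tuning constant: the number of
    league-average opportunities blended in with the player's sample.
    The smaller the sample relative to the ballast, the more the
    estimate leans on the league rate.
    """
    total = observed_rate * opportunities + league_rate * ballast
    return total / (opportunities + ballast)


# Illustrative numbers only: a .400 season over 600 plate appearances,
# a .330 league rate, and 600 plate appearances of ballast.
estimate = regress_to_mean(0.400, 600, 0.330, 600)
print(round(estimate, 3))  # 0.365
```

With these made-up inputs the estimate lands halfway between the observed rate and the league rate. It moves closer to the observed rate as the sample grows, which is the sense in which the estimate is a best guess based on probabilities, not a direct reading of true talent.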

Labeling a single season as irrelevant or supremely relevant to estimating a player’s true talent implicitly assumes that one already knows that player’s true talent. One can certainly cite examples of each kind to support the case for a “breakout” or “outlier.” One could just as easily come up with (many more) examples of the opposite, where a perceived “breakout” or “outlier” turned out not to have the (in)significance assigned to it. But to do either obscures the important point. It is true that individual players age differently and deviate from expectations. However, projection systems only obtain the overall accuracy they have by projecting players as a whole based on the data on hand. An apparent “outlier” season from two years ago may weigh less heavily because of the passage of time and/or, say, BABIP being regressed more heavily than other skills. An apparent “breakout” by a young player may have more impact on the projection because of age adjustment, greater playing time, etc. But projection systems do not and should not treat such seasons specially beyond their standard adjustments.
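
To make the point about standard adjustments concrete, here is a hedged sketch of a recency-weighted projection. The function `project_rate`, the 5/4/3 season weights, and the 1,200-opportunity ballast are illustrative assumptions chosen for this example, not the settings of any specific system, and age adjustments are left out entirely.

```python
def project_rate(seasons, league_rate, weights=(5, 4, 3), ballast=1200):
    """Project a rate stat from up to three past seasons.

    `seasons` is a list of (rate, opportunities) tuples, most recent
    first. Each season is weighted by recency, then the weighted total
    is regressed toward the league rate by adding `ballast`
    league-average opportunities. All defaults are hypothetical.
    """
    weighted_sum = 0.0
    weighted_opps = 0.0
    for (rate, opps), weight in zip(seasons, weights):
        weighted_sum += weight * rate * opps
        weighted_opps += weight * opps
    return (weighted_sum + league_rate * ballast) / (weighted_opps + ballast)


# The same .400 season, once two years back and once last year.
old_spike = project_rate([(0.330, 600), (0.330, 600), (0.400, 600)], 0.330)
recent_spike = project_rate([(0.400, 600), (0.330, 600), (0.330, 600)], 0.330)
print(round(old_spike, 3), round(recent_spike, 3))  # 0.345 0.355
```

Nothing in the sketch decides whether the .400 season is an “outlier” or a “breakout.” The season simply counts for less when it is older and for more when it is recent, and in both cases the result is pulled back toward the league rate.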

A famous German sabermetrician once wrote, “the owl of Minerva begins its flight only with the onset of dusk.” Although in retrospect we can look back on the careers of particular players and identify certain seasons as “outliers” or “breakouts,” this can only be done years later when we have a perspicuous overview of a period of a player’s career as a whole. Projection systems work in the midst of player performance without the benefit of historical perspective, and have to do the best they can based on the information at hand. Doing anything more would abandon the humble presuppositions upon which player projection rests.
