Outliers, Breakouts, and the Owl of Minerva

As part of “projection week” here at FanGraphs, this post follows Monday’s by discussing two phenomena that are often brought up in relation to projections: “Outliers” and “Breakouts.” Although they contain elements of truth, these notions are often used in problematic fashion to show that projections are “wrong.”

An “outlier” is a season that appears to differ greatly from a player’s usual performance. Some will claim that said season should be ignored when projecting a player, since it “obviously” does not represent his real skill. A “breakout” season is one in which a (usually young) player greatly exceeds expectations and/or past performance. The season is seen as establishing a new level of performance such that prior performance should be weighted much less heavily or ignored.

You may have noticed the potential contradiction. While the “outlier theory” claims that a single season deviating from an apparently established level of performance should be thrown out, the “breakout theory” claims that a single season deviating greatly from earlier seasons means that it should be looked at to the exclusion of the others. This isn’t necessarily a contradiction, as one could hold that there are particular conditions for outliers and breakouts — outliers might only apply to players in their prime, or breakouts to young players. Still, it’s worth noting, as you’ll often see the same person assert both.

The deeper and more important point is that by looking at one-year deviations as establishing a new level of performance that thus takes on a greater weight (breakout!) or as being irrelevant and thus in need of exclusion (outlier), both positions implicitly assume they already know what we’re trying to find out when projecting a player: his “true talent.” Recall the “general formula for player performance” from Monday’s post: performance = true talent + luck. The various methods that projection systems use (regression, weighted averages, age adjustments, etc.) are meant to take the (limited) data we have for a player and filter out luck in order to estimate his current true talent. These methods are predicated on the fact that we can’t pinpoint the player’s true talent given the limited performance samples we have, so we make our best estimate based on probabilities.
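To make that formula concrete, here is a minimal simulation (an illustration only, not any projection system's actual code, with the .300 true talent and 550 at-bats as assumed round numbers): a hitter whose skill never changes still produces observed single-season averages that scatter well above and below that skill, which is exactly why one season can't pinpoint true talent.

```python
import random

random.seed(42)  # reproducible "luck"

TRUE_TALENT = 0.300   # hypothetical hitter's true batting-average skill
AB_PER_SEASON = 550   # one season's worth of at-bats

def simulate_season(true_talent: float, ab: int) -> float:
    """Observed performance = true talent + luck (binomial noise)."""
    hits = sum(1 for _ in range(ab) if random.random() < true_talent)
    return hits / ab

seasons = [simulate_season(TRUE_TALENT, AB_PER_SEASON) for _ in range(10)]
print(["%.3f" % s for s in seasons])
# True talent never moves, yet the observed averages typically spread
# across tens of points of batting average: that spread is the luck term.
```

Any one of those simulated seasons, viewed in isolation, would look like a modest "breakout" or "outlier" even though the underlying talent is identical every year.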

Labeling a single season as irrelevant or as supremely relevant to estimating a player’s true talent implicitly assumes that one already knows that talent. One can certainly cite examples to support the case for a “breakout” or an “outlier.” One could just as easily come up with (many more) examples of the opposite, where a perceived “breakout” or “outlier” turned out not to have the (in)significance assigned to it. But either move obscures the important point. It is true that individual players age differently and deviate from expectations. However, projection systems only attain the overall accuracy they have by projecting players as a whole based on the data at hand. An apparent “outlier” season from two years ago may weigh less heavily because of the time that has passed and/or because, say, BABIP is regressed more heavily than other skills. An apparent “breakout” by a young player may have more impact on the projection because of age adjustments, greater playing time, etc. But projection systems do not and should not treat such seasons specially beyond their standard adjustments.
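Those standard adjustments can be sketched in code. The toy projection below borrows the well-known Marcel scheme's 5/4/3 season weights and 1200 PA regression constant; the league-average wOBA and the example player are assumptions for illustration, and real systems add age adjustments and much more. Recent seasons count more, and the weighted rate is shrunk toward league average in proportion to how little data we have.

```python
LEAGUE_AVG_WOBA = 0.330      # assumed league-average rate
REGRESSION_PA = 1200         # Marcel's regression amount for hitters
SEASON_WEIGHTS = [5, 4, 3]   # most recent season first (Marcel's 5/4/3)

def project_rate(seasons, league_avg=LEAGUE_AVG_WOBA):
    """Project a rate stat from up to three (rate, PA) pairs, most recent first."""
    num = den = 0.0
    for (rate, pa), w in zip(seasons, SEASON_WEIGHTS):
        num += w * rate * pa
        den += w * pa
    if den == 0:
        return league_avg  # no data at all: project the league average
    weighted_rate = num / den
    # Rescale the weighted PA so the weights average to 1, giving an
    # effective sample size for the regression step.
    used = SEASON_WEIGHTS[: len(seasons)]
    eff_pa = den / (sum(used) / len(used))
    # Shrink toward league average: the less data, the more shrinkage.
    return (weighted_rate * eff_pa + league_avg * REGRESSION_PA) / (eff_pa + REGRESSION_PA)

# A hypothetical young hitter whose most recent season looks like a "breakout":
player = [(0.400, 600), (0.330, 600), (0.325, 550)]
print(round(project_rate(player), 3))
# The .400 season pulls the projection up, but nowhere near to .400 itself.
```

Note that the “breakout” year gets the heaviest weight simply because it is the most recent, and a thin track record is regressed hard toward the mean; no season is singled out for exclusion or canonization.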

A famous German sabermetrician once wrote, “the owl of Minerva begins its flight only with the onset of dusk.” Although in retrospect we can look back on the careers of particular players and identify certain seasons as “outliers” or “breakouts,” this can only be done years later when we have a perspicuous overview of a period of a player’s career as a whole. Projection systems work in the midst of player performance without the benefit of historical perspective, and have to do the best they can based on the information at hand. Doing anything more would revoke the humble presuppositions upon which player projection rests.

Matt Klaassen reads and writes obituaries in the Greater Toronto Area. If you can't get enough of him, follow him on Twitter.


Rut (Guest), 6 years 5 months ago

Ew, Hegel. Why?

Carson Cistulli (Editor), 6 years 5 months ago

Have you seen Hegel’s projections for this year?! He has all the German guys just crushing. I mean, Ryan Langerhans is a nice player, but I don’t see him posting a .430 wOBA. And he’s got Ross Ohlendorf posting a negative ERA. Is that even possible?

Daern (Member), 6 years 5 months ago

If I could thumbs up this multiple times, I would.

lookatthosetwins (Guest), 6 years 5 months ago

Maybe try a proxy?

Logan (Guest), 6 years 5 months ago

Some people just don’t get enough credit around here. Not talking about Langerhans or Ohlendorf either.

LTG (Guest), 4 years 14 days ago

Negative ERA is the sublation of offense and defense, which are the determinate negations of each other and, so, unintelligible except in so far as they give rise to a further concept that includes both of them together. In other words, negative ERA is pitcher’s RBI > pitcher’s ER.

JoeR43 (Member), 6 years 5 months ago

/slow clap

Mike Green (Member), 6 years 5 months ago

Marx really went ape-woolies on his Cincinnati projections. Jay Bruce to hit a very bourgeois .340/.450/.600? Micah Owings winning both the Triple Crown and the ERA title did seem a bit revolutionary to me. Maybe he’s on the opiate of the masses.

Toddk (Member), 6 years 5 months ago

So are you saying that Adrian Beltre isn’t going to repeat his ’04 season numbers?

arch support (Guest), 6 years 5 months ago

I must compliment your title. At first I thought I’d be reading a sabremetric analysis of an as-yet unpublished Harry Potter novel.

Bill (Guest), 6 years 5 months ago

Sometimes “breakout” seasons are not necessarily unpredictable… for example a player learning a new pitch…

Kevin S. (Member), 6 years 5 months ago

BPro was all over Justin Upton last spring. They probably weren’t alone in that, though.

lookatthosetwins (Guest), 6 years 5 months ago

Any and all information should be used when projecting a player. A completely computer-based projection is a good place to start, and adjustments can be made from there. If someone gained 50 pounds the same year he starts hitting bombs, maybe add a little to the HR projection. If a pitcher starts throwing a cutter and gets more strikeouts, add a little to the k total. If scouts are saying a player has improved his lateral movement and his UZR spikes, add a little to the UZR projection.

The problem is when you take those qualitative observations, and use them as an excuse to ignore prior evidence. You still need to expect regression, just maybe less regression than normal.

mickeyg13 (Member), 6 years 5 months ago

It sounds great and all to use all that other information, but I’m not sure there is much to gain from it. For instance, PECOTA incorporates information about players’ phenotypes, like height and weight, but it performs roughly as well as something like MARCEL, which is completely (and intentionally) ignorant of all that stuff.

Kevin S. (Member), 6 years 5 months ago

Out of curiosity, how do you define performing roughly as well? How do we rate the performance of projection models, period? I’m legitimately curious about these questions.

lookatthosetwins (Guest), 6 years 5 months ago

Kevin, look at this. Tom Tango also did a big thing this year, but I can’t find it with a google search right now.

http://www.hardballtimes.com/main/article/so-how-did-tht-projections-do/

Mickey,

Here’s Tango evaluating David Gassko’s pre-season “gut” predictions of which players would outperform or underperform their projections. It looks probable that doing this for certain players will improve their projections.

http://www.insidethebook.com/ee/index.php/site/comments/david_gassko_versus_tht_forecasting_system/

As far as PECOTA goes, it never seems to do very well in these comparisons, so even though it is taking these things into account, it doesn’t seem to be doing so very well.

Michael (Member), 6 years 5 months ago

To be fair, PECOTA struggled this season and may have suffered from some really bad league adjustments from the minors. Otherwise, it seems to do about as well as everyone else.

Tim B. (Guest), 6 years 5 months ago

Excellent post.

Nathaniel Dawson (Guest), 6 years 5 months ago

A more accurate way to state that formula would be “production = true talent + random occurrence”.

MBD (Guest), 6 years 5 months ago

I plan to have my breakout season in 2010.

Bah! (Guest), 6 years 5 months ago

And here I am reading “Philosophy of Right” like a chump!

dudley (Member), 6 years 5 months ago

It makes sense that one would weight a younger player’s unexpected production more heavily than an established player’s, because the sample represents a greater proportion of the total data set for that player. E.g., we weight Aubrey Huff’s great season less than Justin Upton’s, because we have more data on Huff and therefore know more about his “true” talent level than we do about Upton’s.

TCQ (Guest), 6 years 5 months ago

Well, yes, but I think the whole point of the article is to say that we do that too much. Just because a season makes up a large sample of a young player’s data set does not mean we shouldn’t be taking it with a grain of salt.

Low sample size just means the data is unreliable, not that we should be taking said sample as gospel.

Ben (Guest), 6 years 5 months ago

While there are no truly luck-based statistics, there are statistics that will naturally show higher variation due to their nature (BABIP in particular). Similarly, a derived statistic like FIP, which tries to take into account only statistics with naturally low variation, lets you tease apart the true-talent and luck portions of a player’s production. Obviously, using these statistics this way involves a necessary element of subjectivity, and relies on more information than just one or two statistics, but if it’s done correctly (or the numbers work out well) it can be pretty easy to call outliers outliers and breakouts breakouts.

For instance, my personal claim to fame amongst my baseball friends is calling Wandy Rodriguez’s “breakout” season last year. By comparing his FIPs to his BABIPs, along with information like PITCHf/x data, it seemed only natural that he’d “regress” to being a very good pitcher (in each of the three seasons leading up to this one his BABIP increased while his FIP decreased, so his ERA stayed roughly the same even though his pitch percentages and speeds held steady). If “talent-based” statistics improve while “luck-based” statistics don’t, it seems fair to label the increase in production an increase in talent. Obviously this isn’t an infallible system, as these are all just numbers derived from performance, but if you look at them right and take all the information you can into account to form a vivid picture, it can in fact be doable to separate the “breakouts” from the “outliers” with at least some consistency.

dxc (Guest), 6 years 5 months ago

1976-77 appeared to be a tremendous Breakout season for me with a AVCS+ of 2600.

Unfortunately the extent to which I wanted to play to the exclusion of all else became an issue and my mother prevented me from a subsequent Super Breakout season.

In no time it was common to see an AVCS+ of 5200, and it became clear that my Breakout season was instead an outlier, despite a brief resurgent Breakout 2000 with the Jaguars.
