Outliers, Breakouts, and the Owl of Minerva

As part of “projection week” here at FanGraphs, this post follows Monday’s by discussing two phenomena that are often brought up in relation to projections: “Outliers” and “Breakouts.” Although they contain elements of truth, these notions are often used in problematic fashion to show that projections are “wrong.”

An “outlier” is a season that appears to differ greatly from a player’s usual performance. Some will claim that said season should be ignored when projecting a player, since it “obviously” does not represent his real skill. A “breakout” season is one in which a (usually young) player greatly exceeds expectations and/or past performance. The season is seen as establishing a new level of performance such that prior performance should be weighted much less heavily or ignored.

You may have noticed the potential contradiction. While the “outlier theory” claims that a single season deviating from an apparently established level of performance should be thrown out, the “breakout theory” claims that a single season deviating greatly from earlier seasons means that it should be looked at to the exclusion of the others. This isn’t necessarily a contradiction, as one could hold that there are particular conditions for outliers and breakouts — outliers might only apply to players in their prime, or breakouts to young players. Still, it’s worth noting, as you’ll often see the same person assert both.

The deeper and more important point is that by looking at one-year deviations as establishing a new level of performance that thus takes on a greater weight (breakout!) or as being irrelevant and thus in need of exclusion (outlier), both positions implicitly assume they already know what we’re trying to find out when projecting a player: his “true talent.” Recall the “general formula for player performance” from Monday’s post: performance = true talent + luck. The various methods that projection systems use (regression, weighted averages, age adjustments, etc.) are meant to take the (limited) data we have for a player and filter out luck in order to estimate his current true talent. These methods are predicated on the fact that we can’t pinpoint the player’s true talent given the limited performance samples we have, so we make our best estimate based on probabilities.
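
To make the "weighted averages plus regression" idea concrete, here is a minimal sketch of how a very simple projection might blend several seasons and then pull the result toward the league average. It is not the method of any particular projection system. The function name `project_rate`, the 5/4/3 season weights, the 1,200 plate appearances of league-average "ballast," and the sample wOBA figures are all assumptions chosen for illustration.

```python
def project_rate(seasons, league_rate, weights=(5, 4, 3), regression_pa=1200):
    """Estimate a player's true-talent rate (e.g., wOBA) from recent seasons.

    seasons: list of (rate, plate_appearances) tuples, most recent first.
    league_rate: league-average value of the same rate stat.
    weights: hypothetical recency weights, most recent season first.
    regression_pa: hypothetical amount of league-average performance
        mixed in to regress the estimate toward the mean.
    """
    weighted_pa = 0.0
    weighted_total = 0.0
    for (rate, pa), weight in zip(seasons, weights):
        weighted_pa += weight * pa
        weighted_total += weight * pa * rate
    # Regression to the mean: add regression_pa of league-average play.
    return (weighted_total + regression_pa * league_rate) / (weighted_pa + regression_pa)


# A hitter with an apparent "breakout" (.400 wOBA) after two average seasons.
seasons = [(0.400, 600), (0.330, 600), (0.330, 600)]
projection = project_rate(seasons, league_rate=0.330)
print(round(projection, 3))
```

Under these assumed inputs the big season moves the estimate up, but only part of the way: the projection lands between the earlier level and the "breakout" level. The season is neither thrown out nor treated as the new baseline.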

Labeling a single season as irrelevant or supremely relevant to estimating a player’s true talent implicitly assumes that one already knows that player’s true talent. One can certainly cite examples of each kind to support the case for a “breakout” or “outlier.” One could just as easily come up with (many more) examples of the opposite, where a perceived “breakout” or “outlier” turned out not to have the (in)significance assigned to it. But to do either obscures the important point. It is true that individual players age differently and deviate from expectations. However, projection systems only obtain the overall accuracy they have by projecting players as a whole based on the data on hand. An apparent “outlier” season from two years ago may weigh less heavily because of the time that has passed and/or because, say, BABIP is regressed more heavily than other skills. An apparent “breakout” by a young player may have more impact on the projection because of age adjustment, greater playing time, etc. But projection systems do not and should not give such seasons any special treatment beyond their standard adjustments.

A famous German sabermetrician once wrote, “the owl of Minerva begins its flight only with the onset of dusk.” Although in retrospect we can look back on the careers of particular players and identify certain seasons as “outliers” or “breakouts,” this can only be done years later when we have a perspicuous overview of a period of a player’s career as a whole. Projection systems work in the midst of player performance without the benefit of historical perspective, and have to do the best they can based on the information at hand. Doing anything more would revoke the humble presuppositions upon which player projection rests.

Matt Klaassen reads and writes obituaries in the Greater Toronto Area. If you can't get enough of him, follow him on Twitter.

27 Responses to “Outliers, Breakouts, and the Owl of Minerva”

  1. Rut says:

    Ew, Hegel. Why?

  2. Carson Cistulli says:

    Have you seen Hegel’s projections for this year?! He has all the German guys just crushing. I mean, Ryan Langerhans is a nice player, but I don’t see him posting a .430 wOBA. And he’s got Ross Ohlendorf posting a negative ERA. Is that even possible?

    • Daern says:

      If I could thumbs up this multiple times, I would.

    • Logan says:

      Some people just don’t get enough credit around here. Not talking about Langerhans or Ohlendorf either.

    • LTG says:

      Negative ERA is the sublation of offense and defense, which are the determinate negations of each other and, so, unintelligible except in so far as they give rise to a further concept that includes both of them together. In other words, negative ERA is pitcher’s RBI > pitcher’s ER.

  3. JoeR43 says:

    /slow clap

  4. Mike Green says:

    Marx really went ape-woolies on his Cincinnati projections. Jay Bruce to hit a very bourgeois .340/.450/.600? Micah Owings winning both the Triple Crown and the ERA title did seem a bit revolutionary for me. Maybe he’s on the opiate of the masses.

  5. Toddk says:

    So are you saying that Adrian Beltre isn’t going to repeat his ’04 season numbers?

  6. arch support says:

    I must compliment your title. At first I thought I’d be reading a sabermetric analysis of an as-yet unpublished Harry Potter novel.

  7. Bill says:

    Sometimes “breakout” seasons are not necessarily unpredictable… for example a player learning a new pitch…

    • Kevin S. says:

      BPro was all over Justin Upton last spring. They probably weren’t alone in that, though.

    • First, it’s worth noting that my post doesn’t talk about _predicting_ breakout seasons, but rather about giving them undue weight relative to previous seasons of the player’s performance.

      That being said, there is something to the notion of looking at possible underlying causes for both “breakouts” and “outliers,” such as new plate approaches, new pitches, injuries, etc. I wanted to mention this, but I was already going on too long. I wanted to discuss how recent research into “heat zones,” pitch f/x and stuff might alter projections, but I would still say that, at least at this point, we (or at least I) don’t know how those should be incorporated into projections in general, for everyone. How can you isolate the change in performance due to the new pitch, plate approach, injury, etc. so that it’s quantifiable apart from the usual adjustments?

      So I’d say that while one might “predict” or explain a breakout season for a player with, e.g., a new pitch, what is really needed is a large study of all players who are somehow identified as having new pitches over a number of years, to see what the effect is. The classification, quantification, etc. of that is beyond me, but could be really useful. But at this point, simply knowing that a player “has a new pitch” is pretty vague without that other stuff; e.g., we’d at least need some sort of analysis of the sort that Mike Fast, Dave Allen, or others give, and even now I’m not sure how you’d incorporate that into a statistical projection, although I’d love to hear thoughts on that from people “in the know.”

    • lookatthosetwins says:

      Any and all information should be used when projecting a player. A completely computer-based projection is a good place to start, and adjustments can be made from there. If someone gained 50 pounds the same year he starts hitting bombs, maybe add a little to the HR projection. If a pitcher starts throwing a cutter and gets more strikeouts, add a little to the k total. If scouts are saying a player has improved his lateral movement and his UZR spikes, add a little to the UZR projection.

      The problem is when you take those qualitative observations, and use them as an excuse to ignore prior evidence. You still need to expect regression, just maybe less regression than normal.

  8. Tim B. says:

    Excellent post.

  9. Nathaniel Dawson says:

    A more accurate way to state that formula would be “production = true talent + random occurrence”.

  10. MBD says:

    I plan to have my breakout season in 2010.

  11. Bah! says:

    And here I am reading “Philosophy of Right” like a chump!

  12. dudley says:

    It makes sense that one would weight a younger player’s unexpected production more heavily than a more established player, because the sample represents a greater proportion of the total data set for that player. E.g., we weight Aubrey Huff’s great season less than Justin Upton’s great season, because we have more data and therefore know more about Huff’s “true” talent level than we do about Upton.

    • TCQ says:

      Well, yes, but I think the whole point of the article is to say that we do that too much. Just because a season makes up a large sample of a young player’s data set does not mean we shouldn’t be taking it with a grain of salt.

      Low sample size just means the data is unreliable, not that we should be taking said sample as gospel.

  13. Ben says:

    While there are no truly luck-based statistics, there are statistics that will naturally have higher variation due to the nature of them (thinking BABIP in particular). Similarly, a derived statistic like FIP that tries to take into account only statistics that have a natural low variation allows you to tease apart the true talent vs. luck portions of a player’s production. Obviously, this use of these statistics has a necessary element of subjectivity, and relies on more information than just one or two statistics, but if it’s done correctly (or the numbers work out well) it can be pretty easy to call outliers outliers and breakouts breakouts.

    For instance, my personal claim to fame amongst my baseball friends is calling Wandy Rodriguez’s “breakout” season last year. By comparing his FIPs to BABIPs along with information like pitchf/x data it seemed only natural that he’d “regress” to being a very good pitcher (in each of the last three seasons leading up to this one his BABIP increased while his FIP decreased, resulting in his ERA staying roughly the same even though his pitch %s and speeds stayed roughly even). If “talent-based” statistics improve, while “luck-based” statistics don’t, it seems fair to label the increase in production an increase in talent by virtue of that equality. Obviously this isn’t an infallible system as they’re all just numbers derived from performance, but if you look at them right and take all the information you can into account to provide a vivid picture, it can in fact be doable to separate the “breakouts” from “outliers” with at least some consistency.

  14. dxc says:

    1976-77 appeared to be a tremendous Breakout season for me with an AVCS+ of 2600.

    Unfortunately the extent to which I wanted to play to the exclusion of all else became an issue and my mother prevented me from a subsequent Super Breakout season.

    In no time it was common to see an AVCS+ of 5200, and it became clear that my Breakout season was instead an outlier, despite a brief resurgent Breakout 2000 with the Jaguars.
