The Humility of Statistical Projection

It’s “projection week” here at FanGraphs, which is a nice coincidence, since I was going to post about projections, anyway. While I dabble with my own projections (which probably will never see the light of day), no one wants to hear about that. Instead, I’ve just assembled some (very) non-technical reminders that might be helpful when looking at projections.

I’ve often heard the complaint that projections are “arrogant,” “put too much faith in the numbers,” or the classic “they rely on what a player has already done, but they don’t tell you what a player will do.” I want to emphasize that projection systems are not based on esoteric “tricks,” but rather on the fact that we don’t know very much about the player from the numbers.

Projection is not divination. I’ve sometimes heard that projection systems aren’t worth looking at because “after all, they projected an .800 OPS for player x and he ended up with an .850 OPS.” That’s a straw man, but it gets at the general point: projections are not prophetic divinations of the future, but attempts to measure the “true talent” of players at any given point in time. The “general formula” for player performance is: true talent + luck + environment. (I’ll table discussion of parks and aging for now.)

The problem is that we don’t know, at least from the raw stats, what exactly is “luck” and what represents a player’s “true talent.” Moreover, “luck” doesn’t just mean things like BABIP rates. Even a player getting 700 PA in a season will have varying levels of performance around his true talent, what we call “hot streaks” or “cold streaks.” (Cf. Willie Bloomquist, April 2009.) To single these streaks out begs the question: how do we distinguish the “streaks” from the “true talent” parts of the seasons from which the projections draw? Projection systems use different methods; here I’ll mention basic factors that are used by most good projection systems. This may be old hat, but they are worth discussing because of how often they are passed over.
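To get a feel for how much "luck" alone can move a full season, here is a minimal sketch. It assumes a hypothetical hitter whose true talent is a fixed .340 rate of success per plate appearance, treats every PA as an independent trial, and ignores environment entirely. The rate, the function name, and the number of simulated seasons are all illustrative assumptions, not anything a particular projection system uses.

```python
import random
import statistics


def simulate_seasons(true_rate, pa, n_seasons, seed=0):
    """Simulate observed per-PA success rates for a player whose
    true talent never changes (an assumption made for illustration)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    rates = []
    for _ in range(n_seasons):
        # Each plate appearance succeeds with probability true_rate.
        successes = sum(1 for _ in range(pa) if rng.random() < true_rate)
        rates.append(successes / pa)
    return rates


# Hypothetical .340 true-talent hitter, 700 PA per season.
rates = simulate_seasons(true_rate=0.340, pa=700, n_seasons=2000)
print(f"lowest season:  {min(rates):.3f}")
print(f"highest season: {max(rates):.3f}")
print(f"std deviation:  {statistics.stdev(rates):.3f}")
```

Even with talent held perfectly constant, the simulated seasons scatter around .340, which is why a single season's line cannot simply be read as true talent.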

Regression to the mean. This is a very important concept, so important that I’m leery of screwing up the explanation. The best introductory piece I’ve read is one by Dave Studeman. In short: given a lack of any other information about a player, our “best guess” is that he’s an average member of (some particular) population. The more data we have on the player, the more we can separate him from the “average” population. This is one place where sample size issues come into play. [Note that there is a great deal of debate about how to regress, e.g., what the "population" should be. For examples, search at The Book Blog or Baseball Think Factory.]
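One simple way to picture regression to the mean is to blend the player's observed rate with the population average, giving the population a fixed number of "phantom" plate appearances. This is only a sketch of that idea: the league average of .330, the 200 PA of regression, and the function name are assumptions chosen for illustration, and real systems argue at length about both numbers.

```python
def regress_to_mean(observed, pa, population_mean, regression_pa):
    """Blend an observed rate with the population mean.

    regression_pa is a hypothetical constant: the number of PA of
    league-average performance mixed in with the player's own data.
    """
    return (observed * pa + population_mean * regression_pa) / (pa + regression_pa)


LEAGUE_WOBA = 0.330    # assumed population mean
REGRESSION_PA = 200    # assumed amount of regression

# The same .400 wOBA is trusted more as the sample grows.
for pa in (100, 600):
    estimate = regress_to_mean(0.400, pa, LEAGUE_WOBA, REGRESSION_PA)
    print(f"{pa} PA at .400 -> estimate {estimate:.3f}")
```

With little data the estimate sits close to the population average; with more data it moves toward what the player has actually done.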

Weighted average. Say a projection involves the last three years of performance. Do you simply take the three-year average? Well, no: true talent can change from year to year, so more recent years are weighted more heavily (5-4-3 for hitters and 5-3-2 for pitchers are common weights). Playing time matters as well as recency. Alex Gordon had a .321 major-league wOBA in 2009, and a .344 in 2008. Do we automatically assume that .321 is closer to his true talent? No, because the .321 came in only 189 PA, while the .344 came in 571 PA.
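The Gordon example can be sketched in a few lines. This assumes each season is weighted by its recency weight multiplied by its plate appearances, which is one common way to combine the two ideas but not necessarily how any given system does it. Only the two seasons quoted above are used (with weights 5 and 4), so the result is an illustration, not a projection.

```python
def weighted_rate(seasons):
    """Average a rate stat over seasons, weighting each season by
    (recency weight * plate appearances).

    seasons: list of (rate, pa, recency_weight) tuples.
    """
    numerator = sum(rate * pa * weight for rate, pa, weight in seasons)
    denominator = sum(pa * weight for _, pa, weight in seasons)
    return numerator / denominator


# Alex Gordon's wOBA and PA as quoted in the text; weights are assumed.
gordon = [
    (0.321, 189, 5),  # 2009, most recent season
    (0.344, 571, 4),  # 2008
]

print(f"weighted wOBA: {weighted_rate(gordon):.3f}")
```

Even though 2009 gets the larger recency weight, the result lands nearer .344 than .321, because 2008 supplies roughly three times as many plate appearances.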

This isn’t all there is to projection, but you’d be surprised how much work those basic concepts do. Tom Tango’s Marcel works entirely from a weighted average, regression, and a very basic age adjustment, and it hangs in with the “big boys” pretty well. No projection system will ever be perfect, of course. Part of that is the influence of “luck” and the limited samples we have from all players. Part of it is also that some players don’t have that much information available on them. Players develop differently.

The point is that we simply don’t know ahead of time which players will be exceptions. Projection systems generally do better when judged on how they project groups of players, rather than on individual successes or failures, as in the case of Matt Wieters (ahem). The point I’ve been trying to make in a roundabout way is that regression, weighted averages, generic aging curves, etc. might miss out on certain players, but are based on studies that show how most players would do. They are humble confessions of ignorance on an individual level, but are still the best overall bet. Expecting anything more leads to folly.

One might express the difference as that between making a conservative, diversified investment and “just knowing” that Enron stock will continue to rise. Tough choice.

More later this week on “breakouts,” “outliers,” and other traps.

Matt Klaassen reads and writes obituaries in the Greater Toronto Area. If you can't get enough of him, follow him on Twitter.

12 Responses to “The Humility of Statistical Projection”

  1. JoeR43 says:

    Don’t forget that when dealing with continuous probability distributions, such as projecting a player’s worth, the probability of any individual point being correct is 0.

    Not to mention younger players will obviously have a bigger standard deviation in their projections.

    And of course, if each individual player has a 1 in 10 chance of hitting their 90th percentile mark or higher (obvious statement alert), then 1 out of 10 guys will do just that without any special circumstance causing it (such as increased playing time due to injuries, or constant favorable matchups, etc). Just because Marco Scutaro came out of nowhere to hit for a .351 wOBA doesn’t mean the system’s broken. That’s statistics.

    • fasho. Very much in line with the point I’m trying to make, but made more concisely and intelligently.

      I go on too long as it is, but I did want to emphasize that this is my “dumb guy’s” understanding, on a very simple level. I assume people can follow links and do searches if they’re interested in more — there’s tons out there for those interested.

      • JoeR43 says:

        It’s just obvious that people do not understand the point of using statistics and metrics to project.

        They think when guys try to use numbers, that said number-users are creating a certainty in their mind. Untrue. They’re trying to maximize the likelihood of a good result.

        In 2009, 284 players received enough playing time for 300+ PA’s. Most guys will give you relatively close to what you expect (A.J. Pierzynski, for instance). Some will have inexplicable bad years offensively (Rollins). Some slam down their highest expectations, like Mike Bourn.

    • Joser says:

      Or, in the wonderful words of MGL:

      “Let’s say that we had a sophisticated device for measuring the result of a coin flip. Let’s call it, the “looking at the coin lying on the floor” device. OK, we flip a coin 50 times and it comes up 28 heads and 22 tails. No big deal, right? Now we flip it again 50 times and it comes up 23 heads and 27 tails. Oh my God, there must be something wrong with our measuring device!

      Get my point? I hope so. “

  2. Rodney King says:

    Begs the question does NOT mean raises the question.

    • jthomas says:

      Matt correctly uses ‘begs the question’ here.
      Assuming that players have a true talent level separate from hot/cold streaks relies on the assumption that hot/cold streaks are not indicative of true talent level. Such a premise relies on circular logic, or a claim that ‘begs the question’.

      • Rodney King says:

        You’re right, he does- sorry for my quick trigger- I read the question posed right after his usage of the phrase, and took this to mean it was a question raised by the data, rather than the proper usage. I will now praise Matt for referring properly to the fallacy, and apologize for being so quick to judge and poor in my own wannabe editing; not to derail but the misusage of “beg the question” is probably the second most common annoying error among otherwise intelligent bloggers, behind “irony~coincidence”. I wish I could claim that my initial post was intended to be ironic, in that I was praising Matt for using the phrase correctly, but this is not the case. Sorry, and thanks to jthomas for pointing out my mistake.

        Matt: nice article, and interesting analysis of the various viewpoints w/r/t the usefulness of such projections. Some recent pieces on this site have not lived up to the established high standards of Fangraphs, but this one only serves to reinforce the lofty quality standards we have come to expect (and sadly, I admit, sometimes demand) from Fangraphs authors.

  3. greenback06 says:

    Another point to be made is that the law of large numbers doesn’t work so well when you’re dealing with, well, individuals. You don’t ask an actuary how long you have to live, you ask a doctor, specifically your doctor.

    • AxDxMx says:

      You could ask an actuary, and they could factor in all kinds of things to give you some probabilities on how long you’ll live. Your doctor would probably be able to guess about the same too. He’s just doing the fan’s projection, while the actuary is doing the more intensive projection. Large amounts of data on similar people with similar conditions should give rise to a regression analysis that would be a middle of the road projection for you personally representing your true talent. You could either stretch it out with healthy living, or completely tank it and have a coronary by age 30. Once we know what all the human genes do, getting your DNA sequenced should give you an even better idea of life expectancy, and at that point, I’m guessing the actuary is heavily involved with the calculations involved in that.

  4. NYRoyal says:

    Great points. These are the kinds of things everyone should remember when they feel the urge to criticize a particular projection that doesn’t look right. I’d also throw in “small sample size.” Projections come from data. And when there is little data (particularly little MLB data), projections are inherently less reliable.

  5. lazlo_toth says:

    I don’t think we’re talking about anything here that doesn’t happen in statistical analysis in any other field. The tighter the focus (i.e. a single player, or a particularly short period of time), the less accurate any kind of numbers you generate are going to be. You pull back and look at larger trends, and it tends to work pretty well. BP does a great job every year, and every projection they do lists a % chance for something crazy to happen, whether it involves a guy just going way over what they think or going under. It happens, and to use it to knock the process when it does says less about the supposed inadequacies of the analysis than it does about a lack of understanding of statistics on the part of the person doing the criticizing. These are educated guesses, we know that. But they are VERY educated guesses.

    And besides, if you really could predict this stuff with complete accuracy, there wouldn’t be a reason to actually play the games anymore.
