FanGraphs Baseball



  1. But if I flip a coin 10 times in a row and it comes up ‘heads’ each time, surely the next flip is more likely to be ‘tails’ because I’M DUE.

    Right? Right.

    Comment by Yirmiyahu — April 13, 2011 @ 2:57 pm

  2. I’m with you on everything you said, but I still think there is a missing element to the whole thing.

    No player performs in a linear path, and while Pujols has started well below his expected performance, that will happen again during the season.

    At some point, I agree, regression is the only thing at play, and his performance for the rest of the season will only pull his numbers back towards his “true talent”. Right now, though, it seems to me like this could just as well be the low portion of his non-linear path to his expected numbers.

    Comment by odditie — April 13, 2011 @ 3:09 pm

  3. Wait, but it’s a contract year! This can’t be happening!

    Comment by Norm — April 13, 2011 @ 3:37 pm

  4. Coin flips are a great way to explain regression, because coins have no memory of what came before. So the next coin flip is completely independent of the prior coin flip.

    Human beings don’t work like this. A batter who is 0 for 4 KNOWS that he’s 0 for 4, and may behave differently at the plate than a guy who’s 3 for 4. He may behave differently for the best or for the worst, but we cannot say that his fifth at-bat is independent of the first four.

    The recent book “Scorecasting” addresses this phenomenon. The authors of this book note that when a golfer tries to make a birdie, he is less successful than when he lines up the exact same putt for a par. Why? The authors call this “loss aversion”: human beings are more aggressive in trying to avoid losses than in accumulating gains. Evidently, this aggression helps golfers make putts. Might it also help Albert Pujols get hits? If so, then the “overcorrection fallacy” might not be a fallacy.

    My point is not that hitters who have gone 0 for 4 really are “due” when they come to the plate a fifth time. I tend to doubt that they are. I’m only saying that with human behavior, the past does affect the future in a way that could affect the regression path.

    Comment by Larry@IIATMS — April 13, 2011 @ 3:42 pm

  5. I would think golfers putting for birdie are more prone to sink the putt because they displayed a better skill set getting there in the first place. Guess I was wrong.

    Comment by neuter_your_dogma — April 13, 2011 @ 4:41 pm

  6. If a 5/10 free throw guy hit 0/10 in a game, I don’t expect him to have a 10/10 game tomorrow. But, I do expect him to have three 7/10 games in a row because that’s where his talent is: 5/10.

    Same rule applies to Pujols. We know he’s more than a .200 hitter. We know .300 is where he should be. I don’t expect him to hit .400 next month but I expect him to hit .350 for quite a long time.

    Comment by MX — April 13, 2011 @ 4:58 pm

  7. No, basically the book says that golfers are like “Okay, it’s not a huge deal if I miss this, because I can just get par on the next shot.”

    Comment by Arjun — April 13, 2011 @ 4:58 pm

  8. No. There is a 50% chance of tails each time. It’s called Bayer’s rule.

    Comment by drmike — April 13, 2011 @ 5:01 pm

  9. He isn’t any more likely to hit .350 just because he had a slump. If he has been a .320 hitter over the last few years, and everything else remains the same, then he will likely hit .320 in the near-term.

    Comment by drmike — April 13, 2011 @ 5:04 pm

  10. Approach shots and putting are two different aspects of golf. You are indeed wrong.

    Comment by powder blues — April 13, 2011 @ 5:06 pm

  11. “If a 5/10 free throw guy hit 0/10 in a game, I don’t expect him to have a 10/10 game tomorrow. But, I do expect him to have three 7/10 games in a row because that’s where his talent is: 5/10.”

    I can’t tell if you’re being sarcastic or not, but if you’re being serious, then you’re simply wrong. If his talent is 5/10, then why would you expect three 7/10 performances in a row?

    Comment by vivalajeter — April 13, 2011 @ 5:10 pm

  12. I think he was joking. Which means next time he won’t be joking.

    Comment by Matthias — April 13, 2011 @ 5:13 pm

  13. Albert Pujols has – WAR.

    I did not know my mouth was capable of forming those words.

    Now, I’ll probably never get to say them again.

    Comment by CircleChange11 — April 13, 2011 @ 5:15 pm

  14. What analysts often miss is that each at-bat is not an independent, random event. Psychological and physical factors can definitely change a player’s “true” talent level over time, even within the same season. An injured and pressing Pujols is not the same hitter that his career numbers show (not saying that is happening now). That will be the next level of sabermetrics, where we can analyze the mechanics of each swing along multiple dimensions in both space and time through motion tracking. It will give us real insights into whether the player is having a spot of bad luck or a fundamental change in performance.

    Comment by Phantom Stranger — April 13, 2011 @ 5:39 pm

  15. Only if he’s a fair coin.

    Comment by joser — April 13, 2011 @ 7:22 pm

  16. The authors of this book note that when a golfer tries to make a birdie, he is less successful than when he lines up the exact same putt for a par. Why? The authors call this “loss aversion”: human beings are more aggressive in trying to avoid losses than in accumulating gains. Evidently, this aggression helps golfers make putts.

    Is the expected number of strokes taken also lower? It sounds like they’re saying that a golfer will be more likely to hit the ball harder when trying to make par — but while that would increase the number of times they make the putt, it would also seem to increase the number of times they hit the ball past the hole and wind up three-putting.

    Comment by matt w — April 13, 2011 @ 7:52 pm

  17. That sounds about right to me, let’s roll.

    Comment by One Armed Bandit — April 13, 2011 @ 8:04 pm

  18. This.

    And on a side note, I swear I see no fewer than 10 articles with this basic theme each year, which act as if regression is somehow ingrained in reality. News flash: Probability models are… MODELS. They are not reality.

    In this case, assuming regression has two issues as a predictive model:
    1. Our estimate of true talent depends on the observed prior events. (So if a player hits a slump, this information should update our estimate of their true talent, to some degree)

    2. The events are not IID anyways. As Phantom notes, good evidence exists that the process of baseball player performance is non-stationary.

    Based on these two issues, I can think of a variety of models which would take Pujols’ performance as an indicator that his performance over the balance of the season would be better, the same, or worse than we would have predicted before the slump. (E.g., maybe he has a small injury, and after recovering is likely to rake. Or maybe he has a deteriorating knee, which is slowly eroding his ability. Or maybe it is just the result of purely random noise, so we can expect regression.) The moral of the story is: We just plain don’t know. To my knowledge, nobody has done a lot of reliable modeling that can tell when a slump is likely to precede a breakout, a breakdown, or a return to prior estimates.

    Rather than beating us over the heads with explanations of “regression” to a “true talent mean” (whatever that is), why are we not seeing more articles on the unexplained factors involved? The first factor that would seem to just beg for modeling happens to be injuries: Can we use news data, DL stints, days off, and types of injuries to learn more about patterns of player performance?

    Comment by B N — April 13, 2011 @ 8:22 pm

  19. I’m not sure about your last sentence, but I imagine people have already studied the independence of at bats. My guess would be at bats are more or less independent, but I don’t know. Can anybody else point to reputable work on this?

    Comment by DJG — April 13, 2011 @ 8:33 pm

  20. To be able to do an independence study, you need to study approach, not results. For example, suppose a hitter in a slump expands the strike zone in an AB compared to his earlier periods of success, and this expansion is repeated AB after AB. If that happens successively, then we may have a violation of the independence assumption. If not, then we cannot reject the independence assumption. You can also think of other metrics that might be useful provided the data exists (reaction time to a fastball, some measure of balance, etc.).

    In other words, scouting that is quantifiable. I don’t know of any data source that has quantified scouting data to perform such an analysis.

    Comment by Sam — April 13, 2011 @ 11:22 pm

  21. Some players do better in a contract year because they work harder in the off season. For those who have a contract year every year like Pujols, a “real” contract year could cause them to underperform if they press more.

    I expect Pujols is trying to do too much.

    Comment by pft — April 13, 2011 @ 11:58 pm

  22. This is an interesting way of looking at things. Can someone reply to this?

    Comment by Worry — April 13, 2011 @ 11:59 pm

  23. I had in mind a study like the following. Look at a hitter. Look at all the PAs in which they got on base. Then record what they did after this PA. This will give you an “after on base” OBP (or whatever metric we want to use).

    Then do the same thing for PAs in which they didn’t get on base to get an “after out” OBP.

    If the two OBPs agree (and agree with a player’s baseline OBP) then this suggests at bats are independent.

    This is a simplified example, but I’m wondering if this type of study has been carried out. I imagine that it probably has.

    Comment by DJG — April 14, 2011 @ 12:09 am
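
The comparison described in comment 23 can be sketched in a few lines of Python. This is only an illustration: the function name `conditional_obp` and the 0/1 encoding of plate appearances are assumptions, and the sample data is simulated under independence rather than taken from any real player.

```python
import random


def conditional_obp(outcomes):
    """Return (after-on-base OBP, after-out OBP, baseline OBP).

    `outcomes` is a chronological list of plate appearances,
    1 = reached base, 0 = out (a hypothetical encoding).
    """
    pairs = list(zip(outcomes, outcomes[1:]))  # (previous PA, next PA)
    after_on = [cur for prev, cur in pairs if prev == 1]
    after_out = [cur for prev, cur in pairs if prev == 0]

    def rate(xs):
        return sum(xs) / len(xs) if xs else None

    return rate(after_on), rate(after_out), rate(outcomes)


# Simulated hitter whose PAs really are independent (assumed true OBP of .400).
rng = random.Random(0)
simulated = [1 if rng.random() < 0.400 else 0 for _ in range(20_000)]
on, out, base = conditional_obp(simulated)
print(f"after on base: {on:.3f}  after out: {out:.3f}  baseline: {base:.3f}")
```

With independent plate appearances the three numbers should land close together. A real study would still have to ask how large a gap could arise by chance alone, and would have to deal with the outside variables raised in comment 29.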

  24. Vegas would love you. Regardless of how many times in a row you land on heads, the next ‘flip’ is completely independent of all the previous flips and the odds for every flip are always 50/50.

    Comment by Scott G — April 14, 2011 @ 1:12 am

  25. One thing people don’t realize about regression is that it’s the most basic application of the Law of Large Numbers. It’s harder to understand when we are dealing with numbers in the 10s or 100s, like in baseball, but when dealing in the 100,000s or higher, it becomes a lot easier to understand.

    So, say that for some test the independent probability of any trial being successful is 50%, and you are performing 100,000 trials. If after 1,000 trials your success rate is 100/1,000, that doesn’t mean that you are likely to have 49,900 successes in the next 99,000 trials to get your overall success rate to 50,000 out of 100,000. All it says is that over the next 99,000 trials, the expectation is that you will have success in half of those, or 49,500 successes. If this happens, your overall results will be 49,600 successes in 100,000 trials, or a rate of 49.6%, which is really close to 50%. As you increase the number of trials, this rate will continually approach 50%.

    In short, regression and the Law of Large Numbers don’t claim that more tests will “make up” for past tests; they just say that over a huge sample size, the rate of success will approach the independent probability of any one test. This might be a bit of a convoluted explanation, but it’s what helps me understand it.

    Comment by Fatalotti — April 14, 2011 @ 7:08 am
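
The arithmetic in comment 25 can be checked directly, and a quick simulation shows the same thing. The sketch below is a minimal illustration; the function name `simulate_rest` and the fixed seed are assumptions made for the example.

```python
import random


def simulate_rest(total=100_000, early=1_000, early_successes=100, p=0.5, seed=0):
    """Simulate the trials left after a cold start.

    Returns (success rate over the remaining trials, overall success rate).
    """
    rng = random.Random(seed)
    remaining = total - early
    later_successes = sum(rng.random() < p for _ in range(remaining))
    later_rate = later_successes / remaining
    overall_rate = (early_successes + later_successes) / total
    return later_rate, overall_rate


# Expected overall rate if the remaining 99,000 trials simply run at 50%.
expected_overall = (100 + 0.5 * 99_000) / 100_000  # 49,600 / 100,000

later_rate, overall_rate = simulate_rest()
print(f"later: {later_rate:.4f}  overall: {overall_rate:.4f}  expected: {expected_overall:.3f}")
```

The later trials hover around 50% with no tendency to “make up” the early shortfall; the overall rate ends up near 49.6% only because the cold start is diluted by the much larger sample that follows.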

  26. Regression is not about returning to a true talent level; it is about the fact that the exceptional were exceptional in part because (a) they were good and (b) they had good luck. At the Masters, half of the players don’t make the cut to play days 3 and 4 based on their scores. Then the average performance of those who did make the cut decreases relative to their day 1 and 2 performance. The announcers blame the course, but it is mean reversion: you selected a bunch of people who, more likely than not, had some good luck getting there. The ones who had bad luck were removed.

    For Pujols, mean reversion means that his 2011 is likely to be more like other players’ 2011 than his previous years were like other players’ previous years. So far, that prediction looks like it will be spot on.

    Comment by Barkey Walker — April 14, 2011 @ 9:17 am
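
The Masters example in comment 26 can be checked with a toy simulation. Everything numeric here is an assumption chosen for illustration (the field size, the spread of skill, the spread of round-to-round luck); the course is identical in every round, so any change after the cut comes from selection alone.

```python
import random


def simulate_cut(n_players=2_000, skill_sd=1.5, luck_sd=3.0, seed=1):
    """Cut the field in half on early-round scores; compare early vs. late averages.

    Returns (early average, late average) for the players who made the cut.
    Lower scores are better, as in golf.
    """
    rng = random.Random(seed)
    players = []
    for _ in range(n_players):
        skill = rng.gauss(72, skill_sd)  # hypothetical true average round score
        early = sum(rng.gauss(skill, luck_sd) for _ in range(2)) / 2
        late = sum(rng.gauss(skill, luck_sd) for _ in range(2)) / 2
        players.append((early, late))

    players.sort(key=lambda scores: scores[0])  # best early scores first
    made_cut = players[: n_players // 2]
    early_avg = sum(e for e, _ in made_cut) / len(made_cut)
    late_avg = sum(l for _, l in made_cut) / len(made_cut)
    return early_avg, late_avg


early_avg, late_avg = simulate_cut()
print(f"made the cut: early rounds {early_avg:.2f}, later rounds {late_avg:.2f}")
```

The players who made the cut score worse (higher) in the later rounds than in the rounds that got them there, yet still better than the field average of 72, because part of what got them through was skill and part was luck.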

  27. There are two theories of gambling. There is the “hot hand”, which means that if you streak, you are expected to keep streaking. I forget the name of the other, but it is basically that some things are “due” for good/bad luck based on previous luck of the other kind.

    You can see both in the lottery. People expect numbers that have already won not to win again (so there are books of previously winning numbers) and people expect gas stations that sell winning tickets to keep selling winning tickets. Both are incorrect.

    Comment by Barkey Walker — April 14, 2011 @ 9:20 am

  28. Could be, yes. Many things could be. Indeed, it wouldn’t be surprising at all. As a matter of probability, though, it’s not the most probable outcome.

    Comment by The Ancient Mariner — April 14, 2011 @ 9:22 am

  29. DJG: too much noise for that to be meaningful — too many independent variables in play. Sam’s right about what would need to be done, I think.

    Comment by The Ancient Mariner — April 14, 2011 @ 9:25 am

  30. “What if their current performance … is rooted in some sort of skill level change?”

    Aaron Hill might make a good case study here.

    It’s been oft-written that his abysmal .196 BABIP last year was “bad luck”. If so, he’s surely taking his time “regressing” to his “norm”, given his .200 BABIP so far this season (albeit based on a teeny tiny sample size).

    Whatever the cause(s) (mechanics, pitch recognition, pitching patterns), his problems are starting to look less aberrational.

    Comment by Lister — April 14, 2011 @ 11:40 am

  31. Your first sentence is, quite simply, flat out wrong. Regression compensates for luck, but the thing you regress to has to be an estimate of true talent.

    Regression is about returning to true talent level. Luck – good or bad – is just noise around that expected level.

    You’re misunderstanding why we regress and the inclusion of league average in regression.

    We regress to league average because we only learn so much about what to expect from a player with a certain number of ABs (or whatever appropriate trial type we’re concerned with), and the best (simple) assumption in that case is that they’re closer to average than they appear, so we regress them to league average a certain amount.

    This has nothing to do with cancelling out luck, though it does help to compensate for lucky events.

    Comment by Patrick — April 14, 2011 @ 1:50 pm
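
One common way to write the kind of regression comment 31 describes is a weighted average of the observed rate and the league rate, where the weight on the league grows as the sample shrinks. This is a sketch under stated assumptions, not necessarily the method any particular projection system uses: the function name, the constant `k`, and all sample numbers below are hypothetical.

```python
def regress_to_mean(observed_rate, n_trials, league_rate, k):
    """Shrink an observed rate toward the league rate.

    `k` is a hypothetical constant: the number of league-average trials
    added to the sample. Larger k means heavier regression.
    """
    return (observed_rate * n_trials + league_rate * k) / (n_trials + k)


# Illustrative numbers only: a .150 average over 40 AB, a .260 league, k = 200.
small_sample = regress_to_mean(0.150, 40, 0.260, 200)
# The same observed rate over 4,000 AB is regressed far less.
large_sample = regress_to_mean(0.150, 4_000, 0.260, 200)
print(f"40 AB: {small_sample:.3f}   4,000 AB: {large_sample:.3f}")
```

With only 40 at-bats the estimate sits close to the league rate; with thousands it sits close to what was observed, which is the “we only learn so much from a certain number of ABs” point in the comment.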

  32. Hmm. The authors did not discuss three-putting, and that’s a damn interesting question. But the point lies elsewhere: athletes take different approaches to a given performance based on loss aversion, and these approaches affect the performance.

    It might be the case that those aggressive putts for par sometimes end up in double-bogeys. Perhaps the frequency of those double-bogeys exactly counterbalances the greater frequency of making those par putts, so that everything is even-steven. But that would be remarkable! More likely, the more aggressive approach to par putting is either a good strategy or a bad strategy overall, and either case proves my point: with humans, the past affects the future in ways that could affect the regression path.

    Comment by Larry@IIATMS — April 14, 2011 @ 2:13 pm

  33. So, reversion to the mean is not about reverting towards… the mean?

    It is not the case that “the thing you regress to has to be an estimate of true talent.” You regress towards the mean.

    You are confusing two topics. (1) Over the long run, players will perform at a level consistent with their true talent. This is the definition of true talent. (2) Mean reversion, which says that those with higher measured values are luckier than those with low measured values.

    Sons will tend to be more average height than their fathers, and fathers will tend to be more average height than their sons.

    Sir Francis Galton was a eugenicist interested in how desirable characteristics pass from generation to generation. He learned that when he found parents with some great characteristics, their offspring tended to be more like the population as a whole (more average) than their parents. He called this process regression and invented regression analysis to study its extent.

    Applied to baseball, it means that if you take the batting average of the top 10 players in 2009, and then take their batting average in 2010, it will be lower, or (more precisely) closer to the average. The way that they got to be in the top group was partly that they were really good batters and partly that they were lucky. The average is not just some point chosen because it is there and useful; it is what you revert to!

    Comment by Barkey Walker — April 14, 2011 @ 3:17 pm

  34. What is the Pujols mean that we are talking about, a .320 hitter? What is the statistical probability of a .320 hitter having a 10-game slump like the one Pujols had to begin the season? I think we are assuming way too much knowledge to think that Pujols will or will not regress, and by how much. Don’t projections take into account 10 games of hitting under .200? It seems like I could pick out 10 games from any year in his career and see something like this, maybe not 10 consecutive games, but 10 games….

    Comment by jimiu — April 14, 2011 @ 6:38 pm
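
The probability question in comment 34 has a simple first-pass answer if each at-bat is treated as an independent trial, which is itself an assumption that other comments in this thread dispute. The sketch below uses a binomial tail; the figure of 40 at-bats for a 10-game stretch is a hypothetical round number, not Pujols’ actual total.

```python
from math import comb


def prob_at_most(hits, at_bats, true_avg):
    """Binomial probability of `hits` or fewer hits in `at_bats` independent at-bats."""
    return sum(
        comb(at_bats, k) * true_avg**k * (1 - true_avg) ** (at_bats - k)
        for k in range(hits + 1)
    )


# Hitting under .200 in 40 AB means 7 hits or fewer (8-for-40 is exactly .200).
p_slump = prob_at_most(7, 40, 0.320)
print(f"P(under .200 over 40 AB | true .320 hitter) = {p_slump:.4f}")
```

This is the chance for one specific 40-at-bat stretch. The chance of finding at least one such stretch somewhere in a full season is higher, since there are many overlapping windows to pick from, which is the point the comment makes about picking 10 games out of any year.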

  35. I wanted to reply to this, but I can’t regress to that low of an intelligence level.

    Comment by GiantHusker — April 16, 2011 @ 1:01 pm
