FanGraphs

We’re Going Streaking

We are happy to present a two-part guest post by Seth Samuels, who takes an in-depth look at a topic that is often a source of disagreement. Part two will run tomorrow.

Last summer, I was catching up with FanGraphs founder (and my elementary school classmate) David Appelman when he mentioned an interest in being able to identify streakiness in baseball players. Baseball announcers and writers are often criticized for psychoanalyzing a player’s current hot or cold streak, even though those streaks may often be a function of small sample sizes. A full season, however, is a much larger sample than five games. So it certainly seems reasonable that some players might tend to be streakier than others over the course of a full year.

As both a Mets fan and an occasional fantasy player, I’ve repeatedly seen my teams — both real and imaginary — bolstered by surges and short-circuited by cold spells. So being able to identify which players are most or least likely to go on streaks would be a useful tool.

I’m certainly not the first person to look at this question. Most notably, Jim Albert and Jay Bennett discussed the subject a bit in chapter five of their excellent book Curve Ball, using randomization to determine that Todd Zeile showed signs of having been a legitimately streaky hitter. They term this trait “streaky ability,” to distinguish it from “observed streakiness,” which they use to describe the small sample mistakes referred to above. In Curve Ball, Albert and Bennett randomly simulate Zeile’s 1999 season, and look at the fluctuations in Zeile’s moving batting average over eight-game stretches in those simulated versions of Zeile’s 1999 season. They then compare those fluctuations to the fluctuations from his actual 1999 season, and find that, in the first half of 1999, Zeile was streakier than simple randomness would suggest.

They argue that this indicates that Zeile has a great deal of “streaky ability.” However, Albert and Bennett are quick to point out that their study suffers from selection bias — they specifically chose Zeile because he had a reputation for streakiness. In 2008, Albert returned to the subject in a paper, “Streaky Hitting in Baseball,” in the Journal of Quantitative Analysis in Sports. In this paper, Albert looks at the streakiness of all players in 2005, using batting average, home run rate, and strikeout rate. However, he finds no relationship between streakiness in one category and streakiness in another. So what happens if we look for Albert and Bennett’s “streaky ability” using a more sophisticated metric and over a longer time period?

We can easily adapt the Albert and Bennett approach to all players, with a slight update to the methodology. In particular, I’ve chosen to use wOBA, rather than batting average, because, as most readers of this site are no doubt aware, it does a far better job of capturing a player’s actual value (and fluctuations thereof). For those who are not familiar with wOBA, it is a catch-all stat developed by Tom Tango, which measures a player’s overall contribution to run-scoring by placing extra weight on more valuable hit types, and which is scaled to look like on-base percentage. So a .400 wOBA is just as excellent as a .400 on-base percentage, and a .300 wOBA is just as poor as a .300 on-base percentage.
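For readers who want to see the arithmetic, here is a minimal Python sketch of a wOBA calculation. The weights are an assumption: they are roughly the linear weights published in The Book (Tango, Lichtman, and Dolphin), and the exact weights used in this post are not stated. The function name `woba` is hypothetical.

```python
# Linear weights roughly as published in "The Book" (Tango, Lichtman, Dolphin).
# ASSUMPTION for illustration only; the post does not say which weights it used.
WOBA_WEIGHTS = {"BB": 0.72, "HBP": 0.75, "1B": 0.90, "2B": 1.24, "3B": 1.56, "HR": 1.95}


def woba(event_counts, plate_appearances):
    """Weighted on-base average: each event's linear weight times its count,
    divided by plate appearances. Events with no weight (outs, strikeouts, ...)
    contribute zero."""
    if plate_appearances <= 0:
        raise ValueError("wOBA needs at least one plate appearance")
    weighted = sum(WOBA_WEIGHTS.get(event, 0.0) * n for event, n in event_counts.items())
    return weighted / plate_appearances


print(woba({"HR": 4}, 4))             # a homer in every PA: the 1.95 ceiling
print(woba({"1B": 1, "OUT": 3}, 4))   # one single in four trips: .225
```

Because the weights are already scaled to look like on-base percentage, the outputs can be read on the familiar OBP scale.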

Being a Mets fan, I’ll use David Wright’s seemingly consistent 2007 and seemingly streaky 2010 to demonstrate the process. As was recently discussed by Bill Petti, Wright seems to have evolved from a very consistent hitter early in his career to a very volatile one more recently, although there is some evidence that this may have been a fluke. I’ll note here that, though Bill and I have somewhat similar approaches, we arrived at them independently — a nice example of synchronicity. My analysis is a bit more technically involved and is ultimately applied to a larger sample, but I’d encourage you to read Bill’s work too.

Using data from Retrosheet, I started by calculating Wright’s moving wOBA for every seven-day period during the 2007 season. This is plotted below in blue, with the red line representing Wright’s full-season wOBA:
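A minimal sketch of a moving seven-day wOBA, assuming each plate appearance is stored as a (date, linear weight) pair with 0 for an out. The post doesn't say whether its windows trail, lead, or are centered on each date; this sketch assumes a trailing window ending on each calendar day, and the function name `moving_woba` is hypothetical.

```python
from datetime import date, timedelta


def moving_woba(pas, window_days=7):
    """Trailing moving wOBA for every calendar day of a season.

    pas: list of (game_date, woba_value) pairs, one per plate appearance.
    Returns (end_date, window_wOBA, window_PA) for each window of `window_days`
    consecutive dates ending on end_date; windows with no PA are skipped.
    """
    totals, counts = {}, {}
    for d, v in pas:
        totals[d] = totals.get(d, 0.0) + v
        counts[d] = counts.get(d, 0) + 1

    first, last = min(counts), max(counts)
    result = []
    end = first
    while end <= last:
        window = [end - timedelta(days=k) for k in range(window_days)]
        n = sum(counts.get(d, 0) for d in window)
        if n:
            result.append((end, sum(totals.get(d, 0.0) for d in window) / n, n))
        end += timedelta(days=1)
    return result


# Hypothetical two-day example: a single and an out, then a homer and an out.
pas = [(date(2010, 4, 5), 0.90), (date(2010, 4, 5), 0.0),
       (date(2010, 4, 6), 1.95), (date(2010, 4, 6), 0.0)]
for end, w, n in moving_woba(pas):
    print(end, round(w, 4), n)
```

Keying the windows on calendar dates rather than games means a stretch on the injured list simply produces windows with few or no plate appearances.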

For each point on the x axis, I then take the absolute value of the distance between the moving wOBA and the full-season wOBA. Next, I take the average of all of these distances for the whole season, weighted by the number of plate appearances in each seven-day window. The weighting serves two purposes: first, it helps to avoid placing too much emphasis on small samples — if a player comes to the plate thirty times in one window and only five in another, we care more about his performance in the first. Second, it means that if a player is injured and misses playing time, he will not be punished for his .000 wOBA during that time. Using dates instead of games or plate appearances also helps to account for injuries, as a player who goes on a hot streak and then misses a month with injury should not be viewed as continuing the same hot streak when he returns.
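The raw streakiness statistic described above can be sketched as a plate-appearance-weighted mean absolute deviation from the full-season wOBA. The input format (a list of (window wOBA, window PA) pairs) and the function name are assumptions for illustration.

```python
def raw_streakiness(windows, season_woba):
    """PA-weighted mean absolute gap between each window's wOBA and the
    full-season wOBA.

    windows: iterable of (window_wOBA, window_PA). A window with zero PA
    (e.g. a stretch on the injured list) gets zero weight, so missed time is
    never scored as a .000 wOBA.
    """
    num = den = 0.0
    for w, n in windows:
        num += n * abs(w - season_woba)
        den += n
    if den == 0:
        raise ValueError("no plate appearances in any window")
    return num / den


# Hypothetical: a 30-PA window at .500, a 10-PA window at .200, and an empty
# window, against a .400 season: (30 * .1 + 10 * .2) / 40 = .125
print(raw_streakiness([(0.500, 30), (0.200, 10), (0.0, 0)], 0.400))
```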

The resulting calculation is our raw streakiness statistic. In essence, this boils down to a weighted average area of the blue region in the plot below:

Wright’s raw streakiness in 2007 was .072. For streakier players, we would expect more frequent and extreme divergence from the full-season wOBA, and therefore a higher resulting streakiness statistic. As noted earlier, David Wright appears to have gotten streakier in recent years. Sure enough, in 2010 Wright’s raw streakiness came in at .101. Wright’s 2010 performance is plotted below:

As we can see, Wright’s performance peaked around the same place in both 2007 and 2010, maxing out at .656 in 2007 and .659 in 2010. However, because his full-season performance was so different in the two years, the peak represents a .292 deviation in 2010, compared with only .238 in 2007. Moreover, in 2007, Wright’s seven-day wOBA never dropped below .227. In 2010, it got as low as .113. Clearly, Wright was much streakier in 2010 than in 2007.
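Subtracting each peak deviation from the corresponding peak recovers the full-season wOBA that the red line marks in each year. This is simple arithmetic on the figures in the paragraph above, not a number the post reports separately:

```latex
\begin{aligned}
\text{2010: } & .659 - .292 = .367 \\
\text{2007: } & .656 - .238 = .418
\end{aligned}
```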

But just how streaky is Wright’s performance really? How does it compare to the rest of the league, for example? This question is more complicated than it may seem at first. The problem is that players with a greater range of results will tend to have greater variation in the value of their performance in general, and will therefore exhibit greater fluctuations. Wright will have his share of outs, singles, doubles, triples, and homers. Luis Castillo will rarely do anything other than single or make an out. So we need to make sure that we are not accusing Wright of streakiness just for being a better hitter than Luis Castillo. Therefore, in order to compare Wright’s streakiness to the rest of the league, we first need to compare it to random chance.
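To see why outcome mix matters even when the order is pure chance, here is a minimal sketch comparing two hypothetical .300-wOBA hitters over 600 PA: a slap hitter who only singles or makes outs, and a slugger whose value comes mostly from homers. The weights (1B = 0.90, HR = 1.95) are assumed, and a 30-PA window is used as a rough stand-in for a week. For a randomly ordered season, a window is a sample drawn without replacement, so the standard deviation of its wOBA is the per-PA standard deviation divided by the square root of the window size, times the finite-population correction.

```python
import math


def window_sd(value_counts, window_pa):
    """Mean and SD of the wOBA of a random window of `window_pa` plate
    appearances drawn without replacement from a season with this outcome mix.

    value_counts: {linear_weight: number_of_PA_with_that_weight}.
    """
    n_total = sum(value_counts.values())
    mean = sum(v * n for v, n in value_counts.items()) / n_total
    var = sum(n * (v - mean) ** 2 for v, n in value_counts.items()) / n_total
    fpc = (n_total - window_pa) / (n_total - 1)   # finite-population correction
    return mean, math.sqrt(var / window_pa * fpc)


# Two hypothetical .300-wOBA hitters over 600 PA (weights are assumptions).
slap_hitter = {0.90: 200, 0.0: 400}
slugger = {1.95: 84, 0.90: 18, 0.0: 498}
print(window_sd(slap_hitter, 30))   # SD of a 30-PA window: about .076
print(window_sd(slugger, 30))       # about .122, with the same .300 wOBA
```

Since raw streakiness is essentially an average deviation of window wOBA, the slugger would post a higher raw number from randomness alone, which is why each player needs his own chance baseline.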

The trick is to borrow (and modify) an idea from Albert and Bennett. Let’s go back to David Wright’s 2010 season. Using the Retrosheet data mentioned earlier, we can randomly simulate Wright’s 2010 season many times over, and see how the universe of simulated David Wrights compares to the real one. While Albert and Bennett used a random simulation method, which lets the season totals themselves (the bottom line) change from one simulation to the next, I prefer something called permutation inference. In our simulations, Wright’s overall 2010 performance does not change. He still has exactly 661 plate appearances (excluding intentional walks), 60 walks, 98 singles, 36 doubles, 3 triples, and 29 home runs. The only thing that will change in our simulations is the order in which those things occurred.
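A minimal sketch of a single permutation, assuming each plate appearance is a (date, outcome) pair: the outcomes are shuffled while every date stays in its original slot, so the season line cannot change. The post itemizes 226 of Wright’s 661 PA; the remaining 435 are lumped together here as a hypothetical `"OTHER"` category (mostly outs), and the integer "dates" are placeholders rather than Wright’s real schedule.

```python
import random
from collections import Counter


def permute_season(pas, rng):
    """Randomly reorder a season's plate-appearance outcomes while leaving the
    dates in their original slots (permutation inference)."""
    dates = [d for d, _ in pas]
    outcomes = [o for _, o in pas]
    rng.shuffle(outcomes)
    return list(zip(dates, outcomes))


# Wright's 2010 line as given in the post (661 PA, excluding intentional walks).
counts = {"BB": 60, "1B": 98, "2B": 36, "3B": 3, "HR": 29}
counts["OTHER"] = 661 - sum(counts.values())   # the 435 PA not itemized
outcomes = [event for event, n in counts.items() for _ in range(n)]
# Hypothetical placeholder dates: an integer day index, four PA per day.
pas = [(i // 4, o) for i, o in enumerate(outcomes)]

sim = permute_season(pas, random.Random(2010))
print(sim[:10])   # first ten PA of one simulated season
```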

We also assume that the dates remain constant. This will allow us to calculate simulated values for Wright’s seven-day wOBA, and compare his simulated streakiness to his actual result. So, for example, here are Wright’s first ten plate appearances of 2010, along with his first ten plate appearances in five different simulations:

Each of those results in the simulations corresponds to a true at-bat from Wright’s actual season. So, for example, in a given simulation, Wright’s first plate appearance may be replaced by his 247th, his second may be replaced by his tenth, and so on, with his actual first and second plate appearances showing up later. There are 661! possible permutations of Wright’s 661 plate appearances. That’s well over 10 × 10^100 and far more than we could possibly calculate. Fortunately, by randomly reordering Wright’s 2010 plate appearances a large number of times, we can closely approximate the actual distribution of possible streakiness scores that Wright could have posted. This allows us to figure out how streaky Wright was in 2010 compared to random chance.
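A quick check on the size of 661!, and on why sampling is enough. Python can compute the factorial exactly; `math.lgamma` gives its logarithm without building the big integer. The standard-error line is an added illustration: any share estimated from 10,000 random reorderings has a standard error of at most sqrt(0.25 / 10,000).

```python
import math

n_pa = 661
orderings = math.factorial(n_pa)                         # exact count of orderings
digits = len(str(orderings))                             # number of decimal digits
log10_orderings = math.lgamma(n_pa + 1) / math.log(10)   # about 1578.9

# Sampling instead of enumerating: a proportion estimated from n random draws
# has standard error sqrt(p * (1 - p) / n), which is largest at p = 0.5.
n_sims = 10_000
max_se = math.sqrt(0.25 / n_sims)                        # 0.005

print(digits, round(log10_orderings, 1), max_se)
```

So 661! has well over a thousand digits, yet 10,000 reshuffles pin down a percentile to within about half a percentage point.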

Let’s try simulating Wright’s 2010 season 10,000 times. It’s often easy to forget how streaky even pure randomness can be. So, just to give a sense of that, here’s the least streaky simulation of Wright’s 2010 season, in which he posts a raw streakiness of .053:

One might still consider that a pretty streaky ballplayer. That’s a particularly nasty cold streak in July, worse than any the real Wright endured in 2007. By contrast, here is the most extreme case, in which simulated David Wright’s raw streakiness is .126:

That’s a pretty volatile player, to say the least. Ultimately, however, we’re concerned about the distribution of simulations, not the extremes. In all, out of 10,000 simulations, Wright’s actual streakiness was more extreme than 9,312 of them, suggesting that he was streaky, even in comparison to the full range of possibilities. Here is the full distribution of possible streakiness values for Wright in 2010, with the red line representing his true result:

Our methodology finds that Wright’s 2010 season was streakier than 93.1% of possible seasons, given his performance. So we can assign Wright a true streakiness value of .931. Looking back at Wright’s consistent-looking 2007, we see that his true streakiness was .142. So, even after accounting for the possibility that Wright’s streakiness was largely a function of randomness, we find that he did indeed go from being very consistent in 2007 to extremely streaky in 2010.
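Putting the pieces together, here is a minimal end-to-end sketch: compute the real season’s raw streakiness, reshuffle the outcomes many times with the dates held fixed, and report the share of reshuffles the real season beat (9,312 of 10,000 would give .931). Assumptions: dates are integer day indices, windows trail each day, ties count as not beaten, and the toy seasons are hypothetical rather than anyone’s real data. It is plain Python, so a full 10,000-simulation run on a real season will be slow.

```python
import random


def raw_streakiness(pas, window_days=7):
    """pas: list of (day_index, woba_value) per plate appearance.
    PA-weighted mean |trailing-window wOBA - full-season wOBA|."""
    season = sum(v for _, v in pas) / len(pas)
    totals, counts = {}, {}
    for day, v in pas:
        totals[day] = totals.get(day, 0.0) + v
        counts[day] = counts.get(day, 0) + 1
    num = den = 0.0
    for end in range(min(counts), max(counts) + 1):
        window = range(end - window_days + 1, end + 1)
        n = sum(counts.get(d, 0) for d in window)
        if n:
            w = sum(totals.get(d, 0.0) for d in window) / n
            num += n * abs(w - season)
            den += n
    return num / den


def true_streakiness(pas, n_sims=10_000, seed=0):
    """Share of date-preserving reshuffles that are LESS streaky than the real season."""
    rng = random.Random(seed)
    actual = raw_streakiness(pas)
    days = [d for d, _ in pas]
    values = [v for _, v in pas]
    beaten = 0
    for _ in range(n_sims):
        rng.shuffle(values)                      # reorder outcomes, keep dates
        if raw_streakiness(list(zip(days, values))) < actual:
            beaten += 1
    return beaten / n_sims


# Hypothetical 60-day toy seasons, four PA a day, 60 singles each (.225 wOBA).
hot_then_cold = [(day, 0.9 if (day < 30 and pa < 2) else 0.0)
                 for day in range(60) for pa in range(4)]
steady = [(day, 0.9 if pa == 0 else 0.0) for day in range(60) for pa in range(4)]
print(true_streakiness(hot_then_cold, n_sims=500))   # close to 1
print(true_streakiness(steady, n_sims=500))          # close to 0
```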

So, now that we have a framework for assessing streakiness, we can apply that framework across the game, and see what it tells us about the volatility of all players. The use of permutation inference also means that, should we choose, we can apply this same method with other statistics — contact rate, slugging percentage, and ERA (for pitchers), just to name a few. Tomorrow, I’ll apply this approach to the entire league, and see how David Wright’s streakiness compares with that of other players.

The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at “www.retrosheet.org”.





63 Responses to “We’re Going Streaking”

  1. Oscar says:

    What an incredible article. Great work.


  2. Mark Kieffer says:

    Good work. I love this website. I check it out every day. It inspires me to do research on topics as well.


  3. Danny says:

    Brilliant. Very much enjoyed reading this article.


  4. Albert Lyu says:

    This is incredible. Looking forward to tomorrow’s article. Curious to see if Wright’s 2010 compared to 10,000 Wright simulation distribution would be similar to Wright in 2010 compared to all players in 2010. Very awesome.


  5. Alan says:

    always nice to see new and original endeavors…looking forward to the rest


    • CircleChange11 says:

      That’s what I was thinking. I’m not certain how reliable some of the newer stuff is, especially when compared to simulations.

      But, I could understand exactly what was being measured and compared, due to the quality of the writing and graphs.

      I appreciate that people are continually examining new situations and ideas.

      I think with some of the new FX technology, etc., there are going to be a lot of new things to look at and study.

      I appreciate the article.


  6. lex logan says:

    Interesting article. Wright’s 2010 season seems to have a “p-value” of about .07, or 1 chance in 14 that such a season could occur by chance alone — not a startling or statistically significant result. With on the order of 200 players per year having more or less full seasons, we could expect one player per season to exhibit a streakiness of around .995. So, pending part two, this approach appears to ratify the random chance model of streakiness.
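The commenter’s arithmetic can be checked in a few lines. The 9,312-of-10,000 figure comes from the post; the 200-player count is the commenter’s rough estimate; and treating each player’s true streakiness as an independent Uniform(0, 1) draw under pure chance is a modeling assumption. Under that assumption, the expected maximum of n such draws is n / (n + 1).

```python
# From the post: Wright's actual 2010 streakiness beat 9,312 of 10,000 reshuffles.
true_streakiness = 9312 / 10_000
p_value = 1 - true_streakiness        # share of reshuffles at least as streaky
one_in = 1 / p_value                  # about 14.5, i.e. roughly "1 chance in 14"

# ASSUMPTION: under pure chance, each player's true streakiness is an
# independent Uniform(0, 1) draw; the expected maximum of n draws is n / (n + 1).
n_players = 200                       # the commenter's rough count of regulars
expected_max = n_players / (n_players + 1)

print(round(p_value, 4), round(one_in, 1), round(expected_max, 3))
```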


  7. Greg says:

    Also would be interested to know what sort of year over year correlations exist in this streakiness factor, though I suspect that is not within the scope of your research for this article. Are there Todd Zeiles out there who consistently find their results on the high side of their own random streakiness distribution, or likewise the distribution across all of baseball?


    • Seth Samuels says:

      Greg,

      Fear not. That’s coming in part two.

      -Seth


    • Lee says:

      It’s a small point, but one worth noting: using a true random distribution of at bat outcomes is going to inherently, on average, be less streaky than reality. In the real world a batter has to face (potentially a really good) pitcher 3 or 4 times in a row, or play in a pitcher/hitter friendly ballpark 3-4 games in a row, or play against a poor defense 3-4 games in a row, or play with an injury many games in a row. Using the mathematically random distribution of at bat outcomes will make everyone look streakier than they actually are.

      The next logical step would be to compare a player to the rest of the league, using your pure random model as a baseline. This too has its drawbacks, as you’ve mentioned, in that a better hitter will inherently be more streaky than a weaker batter.

      It will be interesting to see if you can come up with a compelling factor for streakiness that doesn’t leave any logic holes.


      • filihok says:

        Excellent point.

        In the simulations David Wright faced a ‘David Wright Average’ pitcher in every at bat.

        In real life David Wright faced Roy Halladay 12 times and went 1 – 12 with 7 K’s.

        Still an amazing article though.


      • Pete says:

        This is a very important point. I’m willing to bet that a large chunk of “streakiness” is actually due to the quality of pitching faced. To the point where “streakiness” is probably almost always insignificant.


      • Seth Samuels says:

        Filihok and Pete,

        I don’t disagree that it’s important, but probably not as much as you’d think. Over a seven-day period a starting player will usually get somewhere between 25 and 40 plate appearances. If 4 or 5 of those come against Halladay or Lincecum and the next 4 or 5 come against Alay Soler (out of retirement!) it actually doesn’t have *that* big an impact unless the batter totally flops against the first two and then tees off on the scrub. And even if that does happen, there’s more going on than just the strength of the opposition.

        As for Pete’s comment on significance, well, check back tomorrow. You might be surprised.


      • CircleChange11 says:

        “This is a very important point. I’m willing to bet that a large chunk of ‘streakiness’ is actually due to the quality of pitching faced.”

        In a single series, yes. Over a 7-game stretch, probably not.

        In order for that to be legit, you’d basically have to play Philly, and then travel to SF … facing the top pitchers for each team.

        Back in college (early 90s) we used to say “Luckier than a fat chick in Atlanta,” referring to ATL’s pitching rotation as being an “immediate slump,” and well, Mark Grace informed all of us the proper way to end a bad streak (Slump-Buster).


  8. Lee says:

    Meant for my comment to start a new thread; doesn’t really matter, though.

    Also – regarding my comment about playing with injuries – this may be precisely the kind of thing you WANT to measure, but the other factors (pitchers 3 ABs in a row, parks 3/4 games in a row) are likely inescapable noise.


    • Seth Samuels says:

      Lee,

      That is an excellent point about parks and opposing pitchers. I think accounting for that would be beyond the scope of my research, unfortunately. With luck, it shouldn’t be an issue when comparing some players to others. As for playing through injuries, that strikes me as something of a Sisyphean task, since there’s no way of knowing how injured a player is, or whether they’re hiding an injury. Furthermore, if a player is someone who will play through injuries every year and be streakier as a result, it might be worth including that. At any rate, my main interest is in comparing players to each other. This is meant to be a best effort, not a perfect-world product.

      I’ll let you draw your own conclusions from tomorrow’s post.

      -Seth


      • Lee says:

        Agreed. Injuries are going to be noise no matter what, no way to quantify it. Even the pitchers/parks will probably be near impossible to work around. But maybe consider finding a sliding scale correlation between wOBA and streakiness, and use that to weight the league average of streakiness, when comparing a player to the league. So good hitters don’t look more streaky than poor ones.


      • Seth Samuels says:

        Well, the p-value (which I call “true streakiness” in the article) accounts for the wOBA. It’s possible I didn’t explain this clearly, but the random distribution is re-run for each individual player. It completely removes any correlation between wOBA and streakiness–the results bear this out.

        The problem with trying to model streakiness off of wOBA is that it’s not just the wOBA that affects it; it’s actually the whole distribution of possible outcomes. Players who are more extreme (meaning lots of homers, lots of outs, little else) might appear artificially streakier than a better player who mixes in more walks, singles, doubles, and triples. To give an example, when I look at the median of a player’s distribution, the most artificially streaky player of the last 10 years was Barry Bonds in 2001, who had a .539 wOBA. That shouldn’t surprise anyone. But ranked third is Mike Jacobs in 2008, with a .337 wOBA.


  9. Franco says:

    If you’re looking for long bouts of streakiness for an example, doesn’t Pat Burrell have the biggest rep for going months in either extreme each year?


    • Seth Samuels says:

      Perhaps, but I hate him.


    • Seth Samuels says:

      Sorry, I couldn’t resist that. In all seriousness though, I had never heard of Burrell’s reputation for streakiness. As a Mets fan though, I’ve heard a lot about Wright’s. My guess is that every team has a player that fans think of as being really streaky. I used Wright as an illustrative example of the method because he has a reputation among the Mets fan base, and because I’ve been a Wright fan since he was drafted with the pick-the-Mets-got-so-Mike-Hampton’s-kids-could-get-a-better-education. But the point is really the method, not the player, so I don’t think the particular choice of player is all that important to the overall theme.

      I can tell you that Burrell’s streakiness appears to be pretty unremarkable, with his 2007 true streakiness of .886 being the highest it gets. My guess is that as a guy who hits a lot of HR and strikes out a lot, he might appear streakier than he is.


  10. MikeM says:

    Great, great article. Really enjoyed reading it, and am looking forward to the rest.


  11. Danmay says:

    I can’t wait to see part two.

    Twenty days in, this is my favorite post of 2011.


  12. Jerry says:

    Gotta love statistics, this is a great article and great work.


  13. J-Doug says:

    Seth, you could adjust for park by adjusting the wOBA components for each park using JinAZ’s component park factors: http://www.beyondtheboxscore.com/2011/1/5/1915431/playing-in-parks-component-park-factors-2006-2010

    It’d be a bit more tedious, but it’s a good way to avoid park effects. I’d imagine they’re probably rather strong for the Mets, considering that the R/G average at Citi Field is near the bottom.


    • Seth Samuels says:

      J-Doug,

      It’s an interesting idea, but I think there’s a decent chance it would just add extra noise. Once you start adjusting not only for things that do happen, but things that *don’t* (e.g. adjusting for the fact that a park suppresses triples or home runs means giving credit for them on all balls in play), you’re introducing a lot of extra measurement error. At the end of the day, what matters is the ability to compare one player to the next. Without giving anything away, I think you’ll be satisfied that this doesn’t have much influence on the final results.

      -Seth


  14. Sunny Mehta says:

    Excellent article.

    Couple questions…

    How does Wright’s 2010 streakiness number change if you set the average at something other than his observed 2010 average wOBA (e.g., perhaps his career wOBA, his projected wOBA, etc.)?

    Also, it seems your randomization technique involves essentially Hypergeometric sims. My concern is this: I’d think the theoretical distribution of every player’s results would be pretty heavily left-tailed, i.e. they are likelier to underperform their expected wOBA by a larger margin than overperform, due to injury. Perhaps your model could account for that? (Maybe hypothesize a distribution, perhaps in Beta or some other form, and then sim from that?)

    Probably worth looking at what the rest of the population does, even if you surmise the players’ “batting styles” to all be slightly different.


    • Seth Samuels says:

      Sunny,

      I haven’t run it with either career wOBA or his projected wOBA. Honestly, I see no reason to think that either one would be any better, especially once you start generalizing to all players. Career doesn’t really make sense because players can change a lot from one year to the next. Projected wOBA is based on a best-guess, but it strikes me as pretty unlikely that a player’s projected wOBA coming into a season is a better representation of his true talent level than his performance during that season.

      As for hypothesizing an alternate distribution, that just seems to be moving way too far into theoretical territory for my tastes. The thing that’s nice about permutation inference is that it doesn’t rely on any guesswork about the theoretical distribution, it just uses the actual distribution. Ultimately, for streakiness, what we care about is the sequence, not the performance itself. It’s also worth noting that I don’t find any evidence that players are more likely to underperform than overperform.

      Anyway, as you’ll see tomorrow when I look at all players, I don’t think it will affect the results.


      • phoenix2042 says:

        Well, I think what Sunny meant was that a player who has a .350 wOBA can be injured and post a .000 wOBA, whereas they will never ever be able to post a .700 wOBA over a year. Basically, they have more room to fall than capability to rise. Although, I’m pretty sure this doesn’t have much to do with the article…


      • Seth Samuels says:

        Phoenix,

        That was my understanding as well, it just doesn’t seem to be true in practice–a player who misses significant playing time doesn’t have those days included in his streakiness anyway. And the variation seems to be more or less the same in either direction, on the whole. If anything, it might be easier to overperform, since the wOBA floor (0.000) is much closer to actual wOBA than the ceiling (1.950, if someone homered every plate appearance). But as I said, in practice this doesn’t seem to be an issue either way.


  15. Phil says:

    Raul Ibanez should definitely merit a look; he was ridiculously streaky while in Seattle.


  16. Vic Ferrari says:

    Great stuff Seth. Albert called this the Black Stat, because he coloured in the area between the mean (straight line) and moving average plot (squiggly line), the area in black represents the streakiness. You’ve done the same using a hypergeometric model (reminiscent of Bill James’ Batting Temperature) and wOBA. Then he simulated 10000 seasons for each player and determined their rank, as you have done graphically.

    My only quibble would be to question why games were used instead of PA.

    His next step was an order test. Wright would be a count in the tenth bin. If the next player you check has his actual streakiness for the season rank 4763 out of his 10000 sims, then he would count in the fifth bin. On and on for every hitter in MLB who had a reasonable number of PAs.

    You end up with a histogram that is surprisingly flat, leaning only slightly to the right. I assume that is what you will execute in Part II, and I would be shocked if the results are any different than Jim Albert’s.
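The order test described above can be sketched as a simple binning step: each player’s rank among his own reshuffles maps to one of ten bins, and under pure chance the bins should come out roughly equal. The helper name and the simulated "league" of ranks are hypothetical.

```python
import random
from collections import Counter


def decile_bin(rank, n_sims=10_000):
    """Bin 1-10 for a player whose actual streakiness beat `rank` of his own
    n_sims reshuffles (bin 1 = [0, .1), ..., bin 10 = [.9, 1.0])."""
    return min(int(10 * rank / n_sims) + 1, 10)


# The examples from the comment: Wright (9,312) and a player at 4,763.
print(decile_bin(9312), decile_bin(4763))   # 10 5

# Hypothetical league under pure chance: ranks roughly uniform, bins roughly flat.
rng = random.Random(0)
ranks = [rng.randrange(10_001) for _ in range(300)]
print(sorted(Counter(decile_bin(r) for r in ranks).items()))
```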

    An interesting study would be to use your methodology, which I think is terrific, and apply it from All-Star break to All-Star break, so it bridges the off-season gap. Probably best just to use hitters that stayed with the same team for that, too. My theory is that would yield a population histogram with the 8, 9, and 10 bins significantly overrepresented, though it’s just a hunch.

    BTW, I believe Dr. Albert is working on another paper assessing streakiness in baseball using a probabilistic forecast of a player’s p-value based on a geometric model. And I think the early returns show similar results to his recent paper and (likely) your Part II.


  17. Vic Ferrari says:

    Sunny -

    Yeah, I agree on all counts. Essentially you’re questioning the position of the straight line in Seth’s plot. The thinking is that the straight line represents his performance that year precisely, but, due to luck, not necessarily his innate ability during the season. Am I interpreting correctly?

    My concern with your suggestion would be that we would be adding in more noise, simply because the available forecasting models aren’t that good. This is mostly because of the transient nature of player ability in the off-season, a factor which is extremely difficult to pin down. Hell, it’s extremely difficult to get anyone other than a Bayesian mathematician to even acknowledge it is a problem.

    Applying Seth’s methodology to the back half of the 2009 season and the front half of the 2010 season, all treated as one continuous series of games, should shed light on the problem. Perhaps do more than that.

    You agree?


    • Sunny Mehta says:

      “Am I interpreting correctly?”

      Yup, that’s exactly what I meant. And I see what you’re saying about adding more noise. Definitely a binomial sim or anything that hypothesizes a distribution will in a sense be more “biased” than a pure hypergeometric one. Though I’d argue that’s preferable when dealing with a sample of one season of one player’s observed results.

      But if you’re right that everyone is f’ing up projections due to underestimating inter-season talent changes, I agree that it’s probably best to get back to basics by using hypergeometric sims spanning across seasons. But I just think it’s imperative to do it for multiple hitters to get some idea of the population shape. (Seth may be right that comparing Wright to the whole population is slightly unfair to Wright, but I think it’s more fair than NOT doing it at all. Plus, further decisions about inclusion/exclusion of certain players from specific populations can always be made down the road.)


      • Seth Samuels says:

        Sunny,

        Just in case it wasn’t clear, I’m not doing this just for Wright. Wright is the example I picked to explain the method. Tomorrow’s post looks at the whole distribution. But perhaps I’m misunderstanding your question. If your question is whether I should be comparing Wright’s raw streakiness to everyone else’s, the problem is that raw streakiness has a reasonably strong correlation with performance, so the permutation inference (well, really randomization inference) gets around that. I compare players to their own theoretical distributions, and then compare those comparisons across the league, if that makes sense.


  18. Vic Ferrari says:

    Sunny, just to add, that criticism applies to Albert’s paper, the one that Seth mentions in this article. Have you talked to Jim about that at all? I’d be interested in hearing his reasoning.


    • Seth Samuels says:

      For what it’s worth (and this applies to Sunny’s criticism too), I also tried running a LOESS on the full season, and taking the length of the curve. Doing it that way actually doesn’t make any assumptions about the baseline performance, it only looks at the fluctuation. The results were basically the same, so I went with this approach, which is much easier to explain to the mathematically disinclined.


      • Sunny Mehta says:

        Seth, see my comment to Vic above, but basically I think Vic is surmising that, while many people have (correctly) found very little evidence for streakiness by players in a given season (indicating no significant changes in a player’s true talent during a season), we’ve all underestimated that effect BETWEEN seasons. I.e., players’ true talent levels change significantly in the offseason. And we should be able to test that using your model here, but doing it for samples of [second half of year 1 + first half of year 2] instead of [full year 1] or [full year 2].


      • Seth Samuels says:

        I think I see what you’re getting at. The idea is using this to determine whether talent level changes across seasons? So, though that’s obviously a bit different from what I’m looking at, it’s a fair question. I guess I think two things about that: one is just that I think people understand that talent may change across seasons, but the best way we have to account for that is through aging curves. The issue isn’t so much that projections discount improvement as that improvements (and loss of skill) are very hard to predict from player to player. Point two is just that I think this particular approach seems like it would be conflating a few things if you used it for that. However, you could use the randomization inference and have a different “raw” stat, such as wOBA in the first half of year two minus wOBA in the second half of year one.

        It’s an interesting idea, certainly. Not something I’m likely to do right now (I’m a grad student and classes just started up again) but worth thinking about in the future.
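The alternative "raw" stat suggested above can be sketched as a standard two-sample randomization test: pool the per-PA wOBA values from the two half-seasons, reshuffle them into groups of the original sizes, and count how often the shuffled difference is at least as large as the real one. The two-sided comparison, the function name, and the toy data are all assumptions for illustration.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def half_season_shift_test(late_y1, early_y2, n_sims=10_000, seed=0):
    """Randomization test for a change in wOBA across the offseason.

    late_y1, early_y2: per-PA wOBA values (linear weights) for the second half
    of year one and the first half of year two. Returns (observed difference,
    two-sided p-value): the share of random relabelings whose difference is at
    least as large in magnitude as the observed one.
    """
    rng = random.Random(seed)
    observed = mean(early_y2) - mean(late_y1)
    pooled = list(late_y1) + list(early_y2)
    k = len(late_y1)
    as_extreme = 0
    for _ in range(n_sims):
        rng.shuffle(pooled)
        diff = mean(pooled[k:]) - mean(pooled[:k])
        if abs(diff) >= abs(observed) - 1e-12:   # tolerance for float round-off
            as_extreme += 1
    return observed, as_extreme / n_sims


late_2009 = [0.9] * 50 + [0.0] * 150     # hypothetical .225 wOBA
early_2010 = [0.9] * 100 + [0.0] * 100   # hypothetical .450 wOBA
print(half_season_shift_test(late_2009, early_2010, n_sims=2_000))
```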


  19. intricatenick says:

    This is great stuff. A massive result would be to perform this on every player for every year. The streaky players would be those who had the highest year to year correlation of above average streakiness.

    I think that actually doing permutation inference by day rather than AB might be interesting to see if those values differed in any way. Many ideas about streaks concentrate on being “hot,” and I would think an AB that took place in the same game may be different from an AB that takes place the next day in terms of “hotness.” If there is no difference in running randomized days (i.e., keep each day the same but mix their sequence up rather than each AB), that may say something about the “hot hand” hypothesis. I think they used this idea of game separation in the hot hand basketball paper.
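The day-level shuffle suggested above is a block permutation: each day’s plate appearances stay together as a unit, and only the order of the days is randomized. A minimal sketch, assuming PAs are (day_index, woba_value) pairs sorted by day; the helper `permute_days` is hypothetical. One design choice to note: a shuffled block lands on an existing calendar date, so a five-PA day can occupy a slot that originally held three.

```python
import random
from itertools import groupby


def permute_days(pas, rng):
    """Keep each day's block of PAs intact but shuffle which calendar day each
    block lands on. pas: list of (day_index, woba_value), sorted by day."""
    days = sorted({d for d, _ in pas})
    blocks = [[v for _, v in grp] for _, grp in groupby(pas, key=lambda x: x[0])]
    rng.shuffle(blocks)
    return [(day, v) for day, block in zip(days, blocks) for v in block]


# Hypothetical three-day example: 2 PA, then 1 PA, then 3 PA.
pas = [(0, 0.9), (0, 0.0), (1, 0.0), (2, 1.95), (2, 0.0), (2, 0.0)]
print(permute_days(pas, random.Random(1)))
```

Comparing streakiness under this shuffle with the PA-level shuffle would separate within-day clustering from day-to-day clustering.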


  20. mmoritz22 says:

    Wow, this is a fantastic post. I do have a question: how much do you think streakiness affects a player’s value, if at all? Also, do you think this stat will get a real name and eventually become a stat that is used in baseball?


  21. Seth Samuels says:

    I just wanted to thank everyone for the kind words and creative suggestions. I thought about going through and thanking each of those who complimented me individually, but I thought that might be even cheesier than doing it in one comment. But do know that I read every comment and appreciate them all (and am happy to respond if people have further questions).

    This makes me wish I had time to do this type of thing more often, but alas, my education pretty much takes up my whole life.

    Thanks again,
    Seth


  22. theperfectgame says:

    Truly fantastic article, Seth. The actuary side of me loves the technical statistical analysis, and the Met fan side of me loves your choice of case study subject.

    Can’t wait for Part 2!!


  23. Woods says:

    Interesting analysis. What would you think of an alternative approach (assuming the data sets cooperate): take a player’s season and look at the distribution of their performance over every X-game or Y-plate-appearance sample from the season? You could compare distributions and standard deviations across players as a potential measure of their “streakiness.”

    Just a thought.
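A rough sketch of this rolling-window idea, assuming game-level data in a hypothetical format of (wOBA numerator, plate appearances) per game: compute wOBA over every run of x consecutive games, then summarize the spread of those values with a standard deviation. Function names are illustrative only.

```python
from statistics import pstdev


def rolling_window_woba(games, x):
    """wOBA over every run of x consecutive games.

    games: list of (woba_numerator, plate_appearances) tuples (hypothetical format).
    """
    out = []
    for i in range(len(games) - x + 1):
        num = sum(g[0] for g in games[i:i + x])
        pa = sum(g[1] for g in games[i:i + x])
        if pa:  # skip windows with no plate appearances
            out.append(num / pa)
    return out


def window_spread(games, x):
    """Population standard deviation of the rolling-window wOBA values."""
    return pstdev(rolling_window_woba(games, x))
```

Two caveats: the windows overlap, so neighboring values are strongly correlated and the SD is a descriptive spread rather than a standard error; and hitters with fewer PAs per window will show larger spreads by chance alone, so comparing players still needs a chance baseline such as the randomization approach in the article.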


    • Seth Samuels says:

      Woods,

      I’m not totally sure I understand what you’re asking. If you’re basically asking why I’m not using standard deviation instead of “area under the curve,” the main reason is that standard deviation measures distance from the sample mean (i.e. the mean moving average), and I prefer distance from the full-season wOBA. SD also squares values, placing greater emphasis on more extreme deviations, which I don’t have a problem with. But it doesn’t actually make much of a difference (I’ve tried it both ways).

      Hope that answers your question.
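To make the distinction concrete, here is a small Python sketch contrasting the two measures: a summed absolute distance between the moving average and the full-season figure (one possible reading of “area under the curve”; the exact definition used in the article may differ) versus the standard deviation of the moving averages about their own mean. Inputs are per-PA values, the window length is a free parameter, and the function names are hypothetical.

```python
from statistics import pstdev


def moving_averages(values, window):
    """Average of each run of `window` consecutive per-PA values."""
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


def area_from_season_mean(values, window):
    """Summed |moving average - full-season average| (one reading of 'area under the curve')."""
    season = sum(values) / len(values)
    return sum(abs(m - season) for m in moving_averages(values, window))


def sd_of_moving_averages(values, window):
    """Population SD of the moving averages about their own mean (deviations squared)."""
    return pstdev(moving_averages(values, window))
```

Because PAs near the start and end of a season fall into fewer windows, the mean of the moving averages is generally not the full-season average; with values [1, 0, 0, 0] and a window of 2, the moving averages are 0.5, 0, and 0, whose mean is 1/6 while the season average is 0.25. That is one concrete way the two measures can diverge.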


  24. Vic Ferrari says:

    I wasn’t referring to aging, Seth, though that’s a terrific topic in its own right. I was talking about the transient nature of ability in the offseason, independent of age.

    Using your type of model (which is a Calvinist presentation of Albert’s work, nothing more or less, and that’s a good thing imo, despite Mehta’s concerns) … look at April vs September for the same population of hitters. Do this using the hypergeometric model and the order test (I assume the latter is coming in part II).

    I think you’ll see that ability changed precious little in the population.

    Now repeat the exercise for September to the following April … it’s off the hook.

    Brad Null articulated the phenomenon well, though he never tackled it. There was another paper on forecasting (I can’t remember the authors right now); its presentation of results vs PECOTA and Marcel seemed like a fishing expedition to me, so I never made note of the writers. They were Bayesian mathematicians, though. They tried to capture the effect by creating separate populations of hitters (priors) and detecting when a player was due to shift from one group to the other.

    Jim Albert told Sunny, indirectly, that the phenomenon predates PEDs as well. Specifically, his graduate class was studying Hank Aaron’s career arc and could not explain the season-to-season shifts in p-value, which were well outside the bounds of chance, while (as you’ll doubtless show indirectly later today) the in-season p-values are stunningly consistent for the population as a whole. Sunny will correct me if I’m wrong about that, I’m sure.

    No bugger is going through the looking glass until someone figures that out.

    You would seem to be set up to execute that, at least make headway, though I appreciate you’re busy with school.
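For readers unfamiliar with the hypergeometric comparison mentioned here, the sketch below shows one standard version of it: given a hitter’s hits and at-bats in two periods (say April and September), it computes a two-sided Fisher-style exact p-value for the hypothesis that both periods share a single hit rate, then reports the share of hitters whose p-value falls below a cutoff. If true talent were constant across the two periods, only about 5% of hitters (somewhat fewer, since the exact test is conservative) should fall below a 0.05 cutoff by chance. The data layout and function names are hypothetical, and the order test is not included.

```python
from math import comb


def exact_two_period_pvalue(hits_a, ab_a, hits_b, ab_b):
    """Two-sided exact test (Fisher-style, via the hypergeometric distribution)
    that two periods share one underlying hit rate."""
    total_hits = hits_a + hits_b
    total_ab = ab_a + ab_b
    denom = comb(total_ab, ab_a)

    def prob(k):
        # P(exactly k of the total hits land in period A), given the totals
        return comb(total_hits, k) * comb(total_ab - total_hits, ab_a - k) / denom

    lo = max(0, ab_a - (total_ab - total_hits))
    hi = min(total_hits, ab_a)
    probs = [prob(k) for k in range(lo, hi + 1)]
    p_obs = prob(hits_a)
    # Sum every outcome no more likely than the observed one; the small
    # tolerance guards against floating-point ties.
    return min(1.0, sum(q for q in probs if q <= p_obs * (1 + 1e-9)))


def share_significant(hitters, alpha=0.05):
    """hitters: list of (hits_a, ab_a, hits_b, ab_b) tuples (hypothetical layout)."""
    pvals = [exact_two_period_pvalue(*h) for h in hitters]
    return sum(p < alpha for p in pvals) / len(pvals)
```

Running `share_significant` once on April-vs-September pairs and once on September-vs-following-April pairs for the same hitters is one way to frame the within-season versus across-offseason comparison described above.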


  25. This is absolutely off the charts. Excellent research!


  26. Jason W. says:

    “There are 661! possible permutations of Wright’s 661 plate appearances.”

    I don’t think there are anywhere near that many permutations. Every out, single, double, etc. is the same as all the others. There are only six events for each possible place in the permutation, so 6^661 is an upper bound on the number of orderings, right?

    This would reduce the number of permutations from on the order of 10^1600 to on the order of 10^500. (Unfortunately, this is all a quibble because it doesn’t change the fact that it’s probably silly to deal with every single one of those.)
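The arithmetic in this exchange can be checked with log-factorials. The sketch below computes log10 of 661!, of 6^661, and of the multinomial coefficient 661!/(n1! n2! … n6!), which is the exact number of distinct orderings once the count of each event type is fixed. The event counts in the code are invented for illustration and are not Wright’s actual totals.

```python
from math import lgamma, log, log10


def log10_factorial(n):
    """log10(n!) via the log-gamma function, avoiding huge integers."""
    return lgamma(n + 1) / log(10)


def log10_distinct_orderings(counts):
    """log10 of the multinomial coefficient n! / (c1! c2! ... ck!)."""
    n = sum(counts)
    return log10_factorial(n) - sum(log10_factorial(c) for c in counts)


n_pa = 661
# Hypothetical counts: outs, singles, doubles, triples, HR, walks (sum to 661).
example_counts = [400, 150, 40, 3, 20, 48]
print(f"log10(661!)          ~ {log10_factorial(n_pa):.1f}")
print(f"log10(6^661)         ~ {n_pa * log10(6):.1f}")
print(f"log10(distinct seqs) ~ {log10_distinct_orderings(example_counts):.1f}")
```

On the question of whether the two views give the same distribution: shuffling PAs as if they were all distinct makes every distinct sequence equally likely, because each distinct sequence corresponds to exactly n1! n2! … n6! labeled permutations. So sampling labeled shuffles produces the same null distribution as sampling distinct sequences uniformly.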


    • Seth Samuels says:

      Jason,

      This is more of a conceptual question than anything else. The way I think about it, as a random re-ordering, the 661! holds. Basically, in my mind, even if the 1st and 101st appearances are both strikeouts, they are distinct. So, while some 661-appearance sequences will repeat, they represent distinct components of the distribution.

      As you note however, this doesn’t really affect the need to randomize. My guess would also be that the distribution as I’m thinking of it and as you’re thinking of it should be functionally the same, though I’m not sure if they’d actually be the same. The difference in the coding is that, if I’d done it the way you’re thinking, I would have taken only unique sequences, whereas in practice I did not. But again, this doesn’t have any discernible impact on the results.


  27. Beancounter1010 says:

    This reminds me of studies of the volatility of stock prices or commodity prices. Almost all stocks or commodities are more volatile than random chance would indicate. The problem is that we are assuming Bayesian probability. Instead, use power laws and fractals, and you’ll model the base behavior better. Individual streakiness needs to be compared to that base.

    The best book explaining this is by former Yale professor and IBM researcher Benoit Mandelbrot, who invented fractal geometry. The book, cowritten by WSJ editor and Harvard math major Rich Hudson, is called The (Mis)Behavior of Markets. I bet if you followed this fractal math, you’d get a better handle on the Misbehavior of Hitters.


    • Seth Samuels says:

      Bean,

      Wow, that actually sounds fascinating. I’m a first-year PhD student in Poli Sci, so I try to keep my pleasure-reading dumbed down after long days of dry academic lit (think wizards and vampires). But I’ll definitely check that out. It may have to wait until summer though. Thanks for the rec!




