## We’re Going Streaking (Again)

In the first part of this piece, I established a framework for evaluating streakiness, using David Wright’s consistent performance in 2007 and his streaky performance in 2010 as examples. Now that we have a methodology for assessing the streakiness of players, we can extend it to all players. I repeated the same process I applied to Wright for all 1,545 players with 500 or more PA in every year dating back to 2001. To save computer processing time, I only ran 1,000 simulations for all players, rather than the 10,000 I ran for Wright in the first part of this piece (this is the difference between the calculations taking days and their taking weeks). While this reduces our precision slightly, the distributions are nearly identical:

So, let’s just get right to the red meat. Here are the five most and five least streaky players in every year from 2001 to 2010:

So what do we see? Looking at 2010, there are some great names at both the top and bottom of the list. I don’t think anyone would be disappointed to have the streaky Carlos Gonzalez or the un-streaky Joey Votto on their team. And certainly, no one would take 2010’s super-steady Skip Schumaker over either one. Looking back at earlier years, Mark Kotsay looks pretty consistent. Not listed here, however, is Kotsay’s 2004, in which he posted a true streakiness of .987. Chone Figgins also goes from among the most consistent players in 2005 to the streakiest in 2007. Neither of the two David Wright seasons we looked at earlier makes the list, but his concussion-marred 2009 season was the streakiest in the league that year. So it seems that Wright is not the only one whose streakiness jumps around from season to season.

The surprising jumps we see from Wright, Figgins, and Kotsay, it turns out, are not a fluke. One way to assess the extent to which a statistic represents an inherent skill, as opposed to randomness, is to calculate the correlation coefficient across seasons. The correlation coefficient, represented by the letter r, tells you how closely related two variables are; in this case, that means how reliably you can predict a player’s performance in a given season based on what he did the year before. A correlation coefficient close to 1 suggests a strong relationship, and a correlation coefficient close to 0 suggests no relationship at all. Negative relationships are also possible, but shouldn’t be relevant in this case. A high correlation across seasons is a good indication that what you’re looking at is related to a player’s actual skill. Strikeout rates for pitchers, for example, tend to correlate across seasons at around 0.7 or 0.8. Voros McCracken’s groundbreaking work on BABIP found that it only correlates across seasons at about 0.3, which led to an increased emphasis on strikeout and walk rates instead of ERA. With streakiness, the correlation coefficient is -0.014, which is not statistically different from zero (p = 0.667). Here is a scatterplot of player streakiness, with the x axis reflecting a player’s streakiness one season, and the y axis showing his streakiness the next season:

In this sample, there were 938 players who had 500 plate appearances in consecutive seasons (a player can count more than once, e.g. playing from 2001-2003 means that both the pair from 2001-2002 and the pair from 2002-2003 are included). Each blue dot represents a player’s streakiness in two consecutive seasons. How far the dot is to the right indicates how streaky he was in the first season, and how far the dot is toward the top indicates how streaky he was the next season. This is just a sea of randomness. Clearly, there is no relationship at all between the two. Compare that to the scatterplot for a true skill like batter contact rate, defined as (Balls In Play)/(Balls In Play + SO), which has an extremely strong year-to-year correlation (r = 0.893):

We can see clearly that a high contact rate in one year almost guarantees a high contact rate the next year. At the most, players might shift by about 10% from year to year, but a high-contact player will almost never become a whiff artist, nor will a strikeout king close every hole in his swing. This means that if we know a player’s contact rate in one year, we can make an accurate guess about what it will be the next year.

With streakiness, however, it is quite the opposite: Knowing a player’s streakiness in one season effectively gives us no ability at all to predict his streakiness in the next. In fact, even knowing a player’s streakiness in three consecutive seasons gives us no ability to say anything about the fourth. Streakiness also appears random within a given season: correlation between streakiness from one month to the next (minimum 100 PA) is r = 0.013, which is, again, not statistically different from zero (N = 3,844, p = 0.413). In short, if we believe our methodology—which I personally have no reason to doubt, although I’m open to suggestions—streakiness among hitters appears to be completely random.
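For readers who want to reproduce the year-to-year comparison on their own data, here is a minimal sketch of how consecutive-season pairs could be built and correlated. The data layout (a dictionary keyed by player and year) and the scores are made up for illustration and are not the data behind this study.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def consecutive_pairs(scores):
    """Pair each (player, year) score with the same player's score in year + 1.

    `scores` maps (player, year) -> streakiness. A player with three
    straight qualifying seasons contributes two pairs.
    """
    pairs = []
    for (player, year), value in sorted(scores.items()):
        following = scores.get((player, year + 1))
        if following is not None:
            pairs.append((value, following))
    return pairs

# Hypothetical scores, invented for illustration only.
scores = {
    ("A", 2001): 0.10, ("A", 2002): 0.90, ("A", 2003): 0.40,
    ("B", 2001): 0.75, ("B", 2002): 0.20,
    ("C", 2005): 0.55,  # no qualifying 2006 season, so no pair
}
pairs = consecutive_pairs(scores)
r = pearson_r([a for a, _ in pairs], [b for _, b in pairs])
print(len(pairs), round(r, 3))
```

The same pairing works for month-to-month comparisons if the key is (player, month) instead of (player, year).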

While streakiness may be random for individual hitters, there is reason to think that streakiness overall is not. Here’s a histogram of the total distribution:

For those unfamiliar with histograms, this simply cuts the range of streakiness scores into 20 bins (i.e. 0.00-0.05, 0.05-0.10, etc.) and displays the number of players who fall into each bin. If streakiness were truly random, we would expect a uniform distribution, with roughly the same number of players in each bin and the bars forming a flat horizontal line. What we see, however, is a greater proportion of players in the top half of the distribution than in the lower half. This means that, on the whole, a greater proportion of players appear streaky than appear unstreaky. Moreover, this shift toward the streaky end of the spectrum, while not extreme (mean = 0.537, median = 0.566), is highly statistically significant (p < 0.0001, using both parametric and nonparametric methods). This suggests that players may tend to be streaky, on the whole, even if individual players are not. That said, as commenter Lee noted yesterday, this may also be a function of the non-random nature of the schedule; park effects and opposing pitchers undoubtedly play their part as well.
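As a rough check on that significance claim, the sketch below bins scores the way the histogram does and runs one simple parametric test against the uniform null. It assumes the scores would be Uniform(0, 1) under pure randomness (mean 0.5, variance 1/12) and that all 1,545 player-seasons are in the histogram. The z-test is a stand-in chosen for illustration, since the exact tests used are not named here.

```python
from math import erfc, sqrt

def bin_counts(scores, n_bins=20):
    """Count scores in equal-width bins on [0, 1]; 1.0 goes in the last bin."""
    counts = [0] * n_bins
    for s in scores:
        index = min(int(s * n_bins), n_bins - 1)
        counts[index] += 1
    return counts

def uniform_mean_z_test(mean, n):
    """Two-sided z-test of a sample mean against the Uniform(0, 1) mean of 0.5.

    Under the uniform null the standard error of the mean is sqrt(1 / (12 n)).
    """
    se = sqrt(1.0 / (12.0 * n))
    z = (mean - 0.5) / se
    p = erfc(abs(z) / sqrt(2.0))  # two-sided normal tail probability
    return z, p

# Reported mean, with the assumed sample size of 1,545 player-seasons.
z, p_value = uniform_mean_z_test(mean=0.537, n=1545)
print(round(z, 2), p_value < 0.0001)
```

Under those assumptions the reported mean sits about five standard errors above 0.5, which is consistent with p < 0.0001.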

I should add a couple of things about the methodology. Although I did not start this research knowing if I would find a strong measure of streakiness, I did set out to find something that would be useful for identifying streaky players. I was, to be honest, completely shocked by the utter absence of a relationship between a player’s streakiness in one year and his streakiness in the next. In an effort to find something, I tried this study several different ways. I tried increasing the size of the moving average window. I tried using different measures of streakiness, such as the difference between a player’s maximum moving wOBA and his minimum, variance in moving wOBA, or even using strikeout and home run rates instead of wOBA. I tried adjusting for luck on balls in play, giving extra credit for line drives over pop-ups. I also tried this with pitchers, albeit in fewer different ways (the calculations take much longer to run, for a variety of reasons). I used xFIP, which effectively gives a pitcher’s luck-and-defense-independent ERA, with fifteen-day windows for relievers and twenty-five-day windows for starters. Again, the correlation was basically zero. No matter how I sliced it, the results came back the same. Each time, there was no relationship from one year to the next.
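To make the general idea concrete, here is a minimal sketch of a shuffle-based streakiness score using one of the alternative measures mentioned above, the variance of a moving average. It is not the exact procedure from the first part of this piece. The per-game values, the 10-game window, and the function names are all hypothetical, and a real version would weight each game by plate appearances.

```python
import random

def moving_averages(values, window):
    """Simple moving average over every full window of consecutive games."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

def variance(values):
    """Population variance of a sequence."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def streakiness_score(game_values, window=10, n_sims=1000, seed=0):
    """Share of shuffled seasons whose moving average varies less than the real one.

    Shuffling keeps the season totals fixed and only changes the order of games,
    so a score near 1 means the real ordering was unusually clumpy.
    """
    rng = random.Random(seed)
    actual = variance(moving_averages(game_values, window))
    shuffled = list(game_values)
    below = 0
    for _ in range(n_sims):
        rng.shuffle(shuffled)
        if variance(moving_averages(shuffled, window)) < actual:
            below += 1
    return below / n_sims

# Made-up seasons with identical totals but very different orderings.
clustered = [0.450] * 40 + [0.250] * 40    # one long hot stretch, then a cold one
alternating = [0.450, 0.250] * 40          # good and bad games strictly alternate
clustered_score = streakiness_score(clustered)
alternating_score = streakiness_score(alternating)
print(clustered_score, alternating_score)
```

The two toy seasons land at opposite ends of the scale even though their season-long averages are identical, which is the property a streakiness measure needs.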

Furthermore, streakiness did not show any relationship at all with any conventional statistics (batting average, on-base percentage, slugging, wOBA, BABIP, walk rate, or strikeout rate), suggesting true randomness. The one relationship that was statistically significant was a weak negative correlation between streakiness and plate appearances (r = -0.061, p = 0.016). It is tempting to think that this may suggest that better players (who play more) are less streaky, but this is unlikely. The fact that streakiness shows no relationship with any other hitting statistics suggests that any relationship with plate appearances is, if anything, a function of strategic usage in response to streakiness. My best guess is that when a player has a streaky season, he is more likely to have a prolonged cold stretch and spend some time on the bench. But given the fairly large size of the sample and the weakness of the correlation, it may also just be a fluke.
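For a sense of why such a small correlation can still register as significant, the sketch below applies the standard t-statistic for a correlation coefficient, with a normal approximation for the tail probability (reasonable at these sample sizes). It assumes the plate-appearance correlation was computed over all 1,545 player-seasons, which is not stated explicitly.

```python
from math import erfc, sqrt

def correlation_p_value(r, n):
    """Two-sided p-value for H0: true correlation is zero.

    Uses t = r * sqrt((n - 2) / (1 - r^2)) and a normal approximation
    to the t distribution, which is close for samples this large.
    """
    t = r * sqrt((n - 2) / (1.0 - r * r))
    return t, erfc(abs(t) / sqrt(2.0))

# Streakiness vs. plate appearances (n is an assumption, see above).
t_pa, p_pa = correlation_p_value(r=-0.061, n=1545)
# Year-to-year streakiness, using the 938 consecutive-season pairs.
t_year, p_year = correlation_p_value(r=-0.014, n=938)
print(round(p_pa, 3), round(p_year, 3))
```

Under these assumptions both values come out close to the reported p = 0.016 and p = 0.667. With roughly 1,500 observations, even a correlation of about -0.06 clears the conventional 0.05 threshold.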

So what, ultimately, can we take away from all of this? Although the analysis is complicated, the lessons it teaches us are straightforward. Streaky seasons undoubtedly exist, but it appears that there is no such thing as a streaky or unstreaky player. Rather, the truth seems to be that all players are streaky players. Being human, they have their ups and downs, and they are inherently streakier than random chance would dictate. They are not dice, and they are not random number generators. If Murray Chass ever read Fangraphs, I’m sure he’d be thrilled to hear that. But, again, there is no evidence whatsoever to suggest that a player who is especially streaky in one season will continue to be so in the next. Is this the final word on this issue? Almost certainly not. But right now there’s just no reason to believe that a player’s inherent streakiness, even if it exists, will have any greater impact on his performance than random chance. So, perhaps the next time you hear another owner in your fantasy league complain about how streaky David Wright is, you can offer Skip Schumaker one-for-one, and see what happens.

A Google Doc containing the results of this study has been made available for your perusal. The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at “www.retrosheet.org”.


### 96 Responses to “We’re Going Streaking (Again)”

1. Pete says:

I’m not surprised by the lack of year-to-year correlation. Apparent “streakiness” is probably mostly the result of scheduling and luck.

• Matt says:

I think that’s oversimplifying it, especially in regards to cold streaks. The variables are close to endless. You can’t discount focus levels as the season wears on, habits formed and lost, motivation, etc.

Then you take in the fact that each streak could be caused by completely different things each time: undisclosed injury this week, upset stomach the next, recovery the week after, fighting with your wife after that. When you look at everything that could possibly affect an athlete’s performance, it’s no wonder that everyone is streaky and you can’t predict when or if they’ll streak.

• Seth Samuels says:

Matt,

I agree with your thoughts. To me the key fact is that everyone seems to be streaky to some extent, so there’s no way to discern who’s streakier.

I will say this though: when I checked the overall distributions of starting pitchers and relief pitchers (I did this after submitting the article), starters had a strong shift to the streaky side (much stronger than for hitters), and relievers had basically none. This may be evidence that strength-of-schedule (much more than park factor) is what’s driving this. One would expect a starter’s game-to-game shifts to be significantly impacted by the opposing lineup, whereas a reliever’s usage may be more consistent on a game to game basis (e.g. a lefty specialist always facing tough lefties) or more random (a closer coming in to pitch the ninth, regardless of who’s due up).

2. bsally says:

Man, this is absolutely fascinating. Brilliant analysis.

incredible stuff… great job

4. chel says:

amazing work

5. Oscar says:

Best all-time articles on Fangraphs since its beginning. I’m surprised they volunteered to be upstaged so thoroughly, to be honest, haha.

6. MikeS says:

Fantastic. That first scatter-plot is beautiful.

7. mike says:

totally fascinating. i had thought otherwise, using my limited observed and perception-impacted following of baseball; that there were such things as streaky and consistent players. i always thought it would help me to know this in my head to head weekly fantasy leagues. knowing these results, i will rest easy putting no stock into this factor.

i would be interested in the results, however, for fielding. i’m sure there would be inherent sample size limitations of using, say uzr, over a 7 day rolling period given that uzr is not thought to stabilize even over a full season, but there is the old thought that defense doesn’t slump. i wonder if the conclusions would be the same. or even the correlation between offensive ‘slumps’ and defensive ones. do players take their strikeouts into the field with them? it’s also said that speed doesn’t slump, so concluding that there is no correlation to player type is interesting as well.

i look forward to more work from you, and especially appreciate your involvement in the comments threads, discussing the feedback.

• Seth Samuels says:

Mike,

I agree that UZR would probably be pretty noisy. You’re dealing with extremely small samples and a lot of uncertainty. The other issue is that I don’t actually know where to get daily UZR data.

My suspicion though, would be that there’s not much of a difference. While we can identify isolated incidents, I doubt we’d find a big overall trend. My guess is the same with “taking it out to the field,” though it’s an interesting idea.

When FieldFX hits, that should make those things a lot easier to look at.

8. Jim Lahey says:

I wonder if the Red Sox use a similar analysis and try to pick up players that score lower on the streakiness scale so they have consistent production all season long

Trot Nixon, Mark Kotsay, Rocco Baldelli, David Ortiz, Victor Martinez*, Marco Scutaro, Edgar Renteria, Jacoby Ellsbury.. (i know some didnt play for the Sox at the time)

*VMart is very interesting.. one of the streakiest 2 years in a row then he makes an appearance at the bottom of the list the following. Weird.
Went from 2005 p=.998 to 2006 p=.973 to 2007 p=.019

• Ricky says:

Lahey, are you drunk again? The article just said that streakiness is random; why would the BoSox assign value to something that doesn’t exist? If they did any analysis on streakiness, they would have done something similar to this and known that streakiness isn’t a skill.

Vmart isn’t interesting except that he’s a good example of how streakiness isn’t a skill.

Lay off the liquor, Jim.

• Bubbles says:

F*ckin A, Ricky

• Ray says:

Bubbles, answer that. I gotta rock a piss off, buddy

9. isavage30 says:

Yes, very interesting. I have to say, when I clicked on this article, my first thought was: Victor Martinez is ridiculously streaky and has been every year except 2007, I wonder if he was ever one of the streakiest in the league? Sure enough, Victor was top 5 in two years, 2007 he was bottom 5. I’m curious where he ranked in ’09, which I’d have pegged as one of his streakiest years, when he put up a .455 wOBA in April and a .245 in July … so, is Victor not a streaky player? I think he’s so streaky, he might just go an entire year being one of the most consistent players in the game.

• Jim Lahey says:

Vmart
2005: .998
2006: .973
2007: .019
2008: —
2009: .912
2010: .922

So streaky he was … consistent for an entire season?

10. Daniel says:

You seemed to miss a bit there.

11. Vic Ferrari says:

Again, just wonderful stuff, Seth. I disagree with your conclusions though.

The scatterplot, essentially a test for real effects, an autocorrelation test in frequentist terms … it shows nothing.

The histogram (an order test) shows an obvious tendency, in the general population, towards streakiness. The skew can’t be missed with the eye.

Autocorrelation is a Big Apple Near The Bottom Of The Tree Detector, and the order test is much more sensitive.

I would think of the order test as being a seismograph, and autocorrelation being a count of the amount of stuff that fell off of shelves during a tremor. Given a powerful enough earthquake … the results will be near enough the same. Given a small tremor, autocorrelation is near useless.

I read your results as saying that streakiness very clearly exists in the population of MLB hitters, but pinning the quality on any player using math … like trying to corner a weasel in a round room. Perhaps scouts and managers have a better feel for it, especially if they know how the guy has been hitting in BP and if he’s been hitting the ball hard when he gets a hittable pitch. I dunno.

Slightly stronger indication for streakiness than Albert found, which is a credit to your adaptation of his model. Huge props, brother. Hands down the best sabermetric article I’ve read in ages.

• Travis says:

Wouldn’t the histogram just be reflecting that the mean streakiness is weighted a little bit towards being streaky? That is, the league as a whole is generally a little more streaky than a random distribution would imply. But I don’t think it tells us anything about any one player being more or less streaky across seasons, which would indicate it a skill.

• Seth Samuels says:

Travis,

I don’t want to speak for Vic, but I think he (and I) would agree with that interpretation.

• Vic Ferrari says:

Travis,

The histogram (order test) tells us that there is streakiness in the population. In other words, it tells us that some players are more streaky than others, in no uncertain terms.

Autocorrelation won’t pick that up. Gelman (who owns a much bigger brain than mine) has some good commentary on this subject lately. His blog is a good read, he’s an engaging writer, and I would guess he’s one of those rare academics that is also a good teacher.

The acid test is a model, at least to my mind. Build a population of hitters, some consistent, some streaky … see how it shakes out. Does autocorrelation (i.e. real effects, Seth’s scatterplot) help you find it? Does the order test (Seth’s histogram) shed light?

Since you have specifically assigned streakiness to players, and you know who they are … can you identify them with math?

These are good questions, I think.

• Seth Samuels says:

Vic,

I’m not familiar with the order test. Mind if I e-mail you to find out more?

I appreciate the kind words though. And I love the earthquake analogy (especially as a northern Cali resident).

12. Aaron J says:

Those two scatterpoint graphs tell such a great story even without the other graphs. Awesome work…thanks Seth.

13. Esoteric says:

“Streakiness” is reminiscent of “clutchiness”. I think there’s a belief that, while not a repeatable skill for most, clutchiness may be an actual skill for a few. Do you think this is true for streakiness as well?

• SteveM says:

I’d love to hear the answer to that question too.

• joser says:

What you’re asking, I think, is this: when looking at that random-seeming scatterplot of “Batter Streakiness Across Seasons,” are several closely-clustered dots different season pairs for the same guy? Or, put another way, are there individual players for whom r is high, despite it being low across the population as a whole?

• Seth Samuels says:

Esoteric,

I do think that’s at least theoretically possible. I had been thinking of it more like BABIP, where you have knuckleballers who are just qualitatively different, but clutchiness may be a good example too.

I do find a few guys who appear to be consistently in the same general range (e.g. Matt Holliday is in the .640-.800 range for the last six years, and Randy Winn has barely been below .700 since 2002).

I don’t personally know a good way to test this probabilistically, but if anyone can suggest a metric, I’ll try to find the time to run it.

14. Lee says:

Seth,

Really great stuff. The only thing I would contend with is your parting remark. You claim that all players are streaky, yet also concede that you can’t find any year to year correlation of streakiness on a per player basis.

Without any of the raw number in front of me back this assertion up, I would guess that all of this streakiness you’ve found (or atleast 98% of it) is due to the external factors brought up yesterday, factors that are just noise when trying to find a player’s “true streakiness” capture (parks, pitchers, injuries, etc.)

I say this because if you were to believe these numbers and take them at face value – that players do actually have streaky seasons above and beyond random distribution around the mean – then you would see SOMETHING year to year. Essentially, if something is under a player’s control (to any degree), you will find some year to year correlation, positive or negative, somewhere. It can’t be nonexistent. Otherwise, as is the case here, as I believe, we’ve just measured the amount of streakiness imparted on a player’s production, due solely to the nature of the game of baseball.

Really enjoyed it though. Looking forward to more!

• Lee says:

there are some extra words in there, sorry, kind of rushed that post. work sucks!

• Vic Ferrari says:

“would guess that all of this streakiness you’ve found (or atleast 98% of it) is due to the external factors brought up yesterday”

Lee:

The other side of that coin is that managers may be riding the hot hand, rightly or wrongly.

If a young LH batter, a fastball hitter mostly, is just tearing it up … what does the manager do when the next game’s starting pitcher is a cagey old LH junkballer? My guess is that he leaves him in, even if he has a useful RH vet on the bench, a guy he’d normally sub in for him in that instance.

If managers understood streakiness perfectly (and I doubt they do, though it’s always dangerous to assume others are fools) then they would be able to hide it perfectly from Seth’s math, as rigorous as it is.

If the same methodology were applied to only RH batters vs RH pitchers … would the skew of the order test increase?

I don’t know, but I would bet real money on the over.

i.e. The exaggerating factors are likely less than the mitigating factors, or so I suspect.

• Lee says:

I think the sample is just too big for part time players to have an impact on what’s going on here. And if I understand your example correctly, you are saying that if a manager leaves in a “hot” part time player, even if he is clearly hurt by the platoon split during the next game, this would make him less streaky. I think it would make his propensity for streakiness greater, because in reality, just because he is hot doesn’t mean he’s going to be any more likely to hit the crafty old southpaw. So, he has a bunch of hot games in a row, then a game with an expected wOBA of crap. This is what Seth is measuring as streaky.

I think there’s just way too much to wade through here to say with any certainty that players are streaky outside of the factors they can’t control.

• Lee says:

And to reiterate, or make my point clearer – the line I disagree with is:

Rather, the truth seems to be that all players are streaky players. Being human, they have their ups and downs, and they are inherently streakier than random chance would dictate.

Once you account for the X factors, I don’t think there’s actually any “streakiness” at all, as I assume we all are defining it (the ability to actually under or outperform your talent for extended periods of time.)

• Vic Ferrari says:

Lee: Was that a response to my post?

What if Seth modelled an imaginary league where there was significant streakiness with several players, but little with most others?

How would the scatterplot look? What would autocorrelation tell us? How would the order test look?

Good questions I think.

• Lee says:

Sorry, it was a response to mine, before I read yours.

I think.. the question you raise is very interesting, and I think it actually lies outside the scope of what Seth said, and what I objected to.

He claimed that everyone across the league exhibits some amount of streakiness, and it’s shown when looking at the league wide plots. I think that this baseline of streakiness actually represents the true mathematical random distribution of events, given the external factors, and that the league is perfectly unstreaky, or averagely streaky. I hope this makes sense.

However… you bring up an awesome point. From what we are looking at here, and I’m not sure by Seth’s work if this is true or not… would you even see those very streaky players (as you have created in your model league)? Would they poke their heads out from under all of the other average streakiness players?

I’m not sure if what Seth did would catch those players or not. I’m sure he could tell us.

I think you probably wouldn’t, but I wouldn’t be surprised if they really existed. However, I think the main point of this article was to say that the league exhibited streakiness, as a whole, and that’s where I really disagreed with him. The league is randomly distributing the events, but it’s these outside factors that lead Seth to conclude they are exhibiting streakiness.

• Seth Samuels says:

Lee,

I can’t disagree with you. To be honest, I hadn’t considered park/opponent effects until you brought it up yesterday (a major blindspot, I realize, but it happens). I added in the sentence mentioning that at the last minute, but the rest had already been written and submitted.

In truth, I think your 98% statement is probably a bit of an exaggeration. It’s hard to tell really. I will say (as I noted in a comment above), the effect is strongest for starting pitchers, strong for batters, and nonexistent for relievers, which may give some credence to your claim.

I left that particular line in because, honestly, it strikes me as probable and the data don’t necessarily refute it. If I had to guess, I would think that, in addition to park and competition effects, some players may be affected by, e.g., a bad breakup or a child’s illness or something (negative externalities strike me as more likely to have an influence than positive ones), while others might be less affected. These external events would be, for our purposes, completely random, so there’s a decent chance that it would still show up this way.

Vic’s idea about simulating a season is an interesting one. If anyone has the time to code something like that, it would be great to see. I can try to adapt my code to do it, but it might take a while (I have lots of homework these days).

• Lee says:

Yea, I think Vic’s idea of creating a model with streaky players would shed a lot of light on the situation. However, just thinking about the parameters of that model is an enormous task.

You could easily take 10-20 players and model them so that their true wOBA fluctuates every 15 games or so. .380 for 15 games, .300 for 15 games. But this still leaves all of our existing noise. Our data for this project is so laced with things we don’t want to measure.

After modeling our streaky players, we’d have data for 10 guys who were streaky in a vacuum, and a real data on a league full of players who are (as I believe) totally averagely unstreaky, but play with many factors that make them look streaky.

There just doesn’t seem like a good way to drill down to exactly what we are looking for. I wonder if tango has any thoughts on how to clear the noise. He posted part 1 yesterday on his blog. (and he said for the most part what I’ve been saying)

15. Albert Lyu says:

Fascinating.

16. SteveM says:

I love this site and read it religiously. In all of the countless (get it?) articles I’ve read here, however, this is by far the best. Interesting and highly applicable to the real world, and addresses one of the most common idioms in baseball. Add to it, the focus on the best player on my favorite team, and this is one brilliant article.
What are the odds your next will suck? No correlation I’m guessing!

17. notdissertating says:

Great work here, and fun to read. Is it clear from statistical theory what the distribution of p-values should look like? The implicit assumption here seems to be that they should be uniformly distributed under the null that streakiness is randomly assigned to players. I’m not convinced. It might be better to adjust the streakiness scores from the simulation for each player to mean 0 and variance 1 (instead of obtaining the p-value), and then plot the distribution of these normalized values, which, I think, under the null should be distributed standard normal. This test might be isomorphic to the route you take, but that is not immediately clear to me.

e.

• Seth Samuels says:

Notdissertating,

I’ve run it that way too, it just doesn’t look as interesting. I think it’s not quite isomorphic (though it’s been a while since I took linear alg), but it doesn’t much affect the results. CI for the mean is (.08,.19), for the median it’s (.07,.22). For the uniform distribution it’s (.52, .55) for the mean and (.53,.59) for the median.

• Barkey Walker says:

How did you get a CI on a median?

• notdissertating says:

As a sixth-year graduate student the username is perhaps more apt than it ought to be!

• Barkey Walker says:

The definition of p-values is that they are u(0,1). That is their one and only property. Yay Fisher for giving them to us!

• Seth Samuels says:

Barkey, median CI’s are bootstrapped. Sorry, should’ve noted that.

18. joser says:

Maybe I missed it in the first article (should go back and re-read it, I guess) but I still find myself wondering how much streakiness you get purely from random chance. I mean, if your typical high-in-the-order hitter gets at least 4 plate appearances per game and has a batting average better than .250, he should get a hit in every game — if he had no “streaks”. In other words, if (say) Ichiro had no streakiness whatsoever, not even the kind that comes from random noise, he’d get to his annual 200 hits by getting one in every game, plus an extra one about every fourth game. And no one would talk about DiMaggio’s 56 game hitting streak being perhaps the hardest record in sports to break, because many players would be shattering it every year.

So while players are not dice, there’s some amount of streakiness we should expect from them even if they were, and that’s really the baseline we should expect them to exceed (or not) when we’re talking about them being streaky or consistent. After all, the argument against the “hot hand” is just that: even random events can come in streaks, so you shouldn’t be making decisions based on it. So at what point can we say a player really (probably) is a “hot” or “cold” hand, at least temporarily? Or, more precisely, how likely is any given streak, and so how much can be attributed just to random chance?

• joser says:

(And yeah, I know, wOBA is a better measure of offense and we don’t care about batting average; I’m just trying to keep my question simple and brief, and getting hits is going to be one of if not the first thing people think of when you start talking about offensive streakiness)

• Seth Samuels says:

Joser,

I think yesterday’s column addresses your concerns. It lays out the method I use. The point is that this is after adjusting for that random variation (though it misses, as Lee has noted, the effects of park and opposing pitchers).

19. Vic Ferrari says:

Lee:

Thanks on the hockey props. Bye the bye, I have executed precisely the same math as Seth for NHL EV shooting percentage, both of us stealing from Albert and converting to hypergeometric (is it surprise I like Seth`s thinking :D ), this starting a couple of years ago.

The files at timeonice.com are penner.php and wolski.php I think. I`ve never published links to them, they are php files and run very slowly. Hopefully there is a url advice comment there. If not, let me know.

In any case, Hockey players would appear to be extraordinarily streaky. This because while manager`s actions in MLB mitigate, coaching actions in NHL exaggerate (i.e you play hot shooters with good linemates more).

The truth lies somewhere between the two. Either that or human nature has driven freakishly consistent personalities to baseball and erratic loons to hockey. That seems unlikely to me.

The fact that even Seth’s order test shows only a moderate right skew is a testimony to the quality of field management in major league baseball. Or so I think.

Somewhat ironically, this is the type of information generally used to display their foolishness.

Hockey is the polar opposite of baseball though, coaching decisions to play the hot hand exaggerate the effect, baseball managers decisions

• Lee says:

What events (we are using at bats) are you using to measure hockey players? Shifts?

• Vic Ferrari says:

I was using shots directed at net, the script actually crudely graphs the Wright-style rolling average plot for each player. The Black Stat. If memory serves the url appendage is `shottype=1` for shots only. `shottype=2` for missed shots included, and `shottype=3` for all shots directed at net.

I think the other url appendages, by way of example, are `team=PIT` and `player=87`… which would of course be Crosby. Let me know if that doesn’t work, I’m just going by memory.

• Lee says:

The nature of hockey, as you mentioned, is inherently more streaky. And while the management has something to do with it, I’d chalk most of it up to line chemistry, and in the case that you are using shifts as your time increment, you introduce all of the factors that come along with playing a specific team compared to another team (opposing shut-down lines, defense quality, goaltending quality, home/road, etc.)

This is the problem when you use the pure mathematical distribution of events. There are things that are natural to the game of baseball and hockey that look streaky when compared to the pure distribution, but this isn’t what we want to measure when we say “that guy is streaky.”

Now, baseball is fascinating for this type (and any other statistical work) because events can really be pinned down on a single player’s shoulders (way more than any other sport, but of course there are endless factors involved as opposed to a player hitting a pitching machine in a vacuum.)

This is why I think baseball appears to be way less streaky than hockey. There’s just less noise.

• Vic Ferrari says:

I disagree completely, Lee. Humans be humans.

This MLB writer is very good though, I hope he carries on.

• Lee says:

Well, your opinion is certainly respected by this commenter… but what do you say to the fact there is zero correlation year to year for these streaky players? That’s a hard pill to swallow.

20. glassSheets says:

Ron Gardenhire’s use of Jason Kubel and Brendan Harris are great examples of what Vic is talking about in his 1:16 PM post.

I know there were comments about park factors in the part 1 comments section. But what about just home/road? I thought of this when reading how streaky 2009-2010 David Wright was compared to David Wright sans Citi Field. I realize a sample of 1 was the impetus for my comment and not significant observations, but I’m still interested in how it would turn out.

21. Owen says:

Fantastic stuff. You might find it interesting to look at “hot hand” research from basketball. Yet another study has come out on it, one in a long line of studies. They point to a similar conclusion as your study. Human streakiness is random but no amount of explanation will get people to see it that way.

From True Hoop on Thursday:

“The book “Scorecasting,” by Tobias J. Moskowitz and L. Jon Wertheim is the latest to make a killer case that the hot hand really does not exist (or is far more scarce than most basketball players would admit). Their case may not matter, as they also include this line: “Amos Tversky, the famous psychologist and pioneering scholar who initiated the original research on momentum and the myth of the hot hand, once put it this way: ‘I’ve been in a thousand arguments over this topic. I’ve won them all, and I’ve convinced no one.'”

• Seth Samuels says:

Owen,

There is one factor in basketball that would be tougher to account for, which is that players who think they’re on a hot streak might be more likely to take bad shots. I’ve never seen a study that tries to adjust for this–I have heard of studies showing that it happens though. Maybe Scorecasting does (I plan to read it, for whatever that’s worth). In truth, I’m not sure you *could* properly account for this, since you’re dealing with, say, a probability shift from like 70% on a good shot to 20% on a bad shot, so you’re gonna get a lot of noise when you start valuing tough makes more than easy makes.

I suppose in baseball that could equate to swinging at bad pitches, but I suspect you’re less likely to swing at something a foot off the plate than to try to launch a fallaway three from the corner (see Bryant, Kobe). Also, in baseball, you hear about guys saying the ball looks like a grapefruit or what have you. If they’re seeing the ball better (or “better,” if you prefer), that should mean taking bad pitches more effectively, as well.

Anyway, I’m not at all prepared to say that streakiness is purely random. But there’s a distinction that’s not often made between something that is “random” and something that is “indistinguishable from randomness.” I’m inclined to believe that this falls in with the latter. Very little about our lives is individually random, but the overall picture has a whole lot of noise to it.

I’ve used your post as a jumping off point to something else, I guess, but anyway those are my thoughts.

22. Barkey Walker says:

I read the previous post and I still wonder, why does it take on numbers from 0 to 1? Why might we expect it to be uniformly distributed?

• notdissertating says:

Q: Why does it take on numbers from 0 to 1?
A: Because the metric is a probability: a 0.9 score means that the player’s season was streakier than 90 percent of possible seasons, while a 0.1 score means the season was less streaky than 90 percent of possible seasons.

Q: Why might we expect it to be uniformly distributed?
A: Good question. It makes some intuitive sense, I guess. See my post above in which I ask the same thing.

• Barkey Walker says:

So is it that, conditional on their season total, this is the percentile of simulated streakiness that their actual season falls into?

• notdissertating says:

yes. exactly. this is my understanding.

• Seth Samuels says:

If we’re really talking about something random, then (I think) it should be uniformly distributed because each of these statistics is a percentile, relative to the player’s individual distribution (which is approximately normal). So each “true streakiness” score can be thought of as a p-value relative to the player’s individual distribution. If we’re just going by randomness, then just as we should see a t score less than -1.65 about 5% of the time, we should see a p-value less than .05 about 5% of the time.

I realize I might be wrong about this assumption, even if the data bear it out, so I’m happy to accept corrections or comments.

• Barkey Walker says:

That is right. If the simulation generates the true distribution, then the distribution will be uniform on [0,1].
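A small simulation can illustrate why percentiles come out uniform when the simulation really does generate the true distribution. This is only a sketch: the normal null distribution and the parameter ranges are invented for illustration and are not the wOBA simulation from the article.

```python
import random
from bisect import bisect_left

def percentile_in_null(observed, null_sorted):
    """Fraction of simulated null values strictly below the observed value."""
    return bisect_left(null_sorted, observed) / len(null_sorted)

def simulate_scores(n_players=1000, n_sims=500, seed=1):
    """Percentile score per hypothetical player when the 'real' season is pure chance."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_players):
        # Each player gets his own null distribution (made-up mean and spread).
        mu = rng.uniform(0.02, 0.06)
        sigma = rng.uniform(0.005, 0.02)
        null_sorted = sorted(rng.gauss(mu, sigma) for _ in range(n_sims))
        # The observed season is just one more draw from that same distribution.
        observed = rng.gauss(mu, sigma)
        scores.append(percentile_in_null(observed, null_sorted))
    return scores

def decile_counts(scores):
    """Histogram of scores in ten equal-width bins on [0, 1]."""
    counts = [0] * 10
    for s in scores:
        counts[min(int(s * 10), 9)] += 1
    return counts

if __name__ == "__main__":
    print(decile_counts(simulate_scores()))
```

Even though every player has a different underlying distribution, the decile counts should come out roughly flat, which is the uniform-on-[0,1] behavior described above. A skew in the real data therefore points to something the simulation leaves out, such as opponents or parks.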

The problem is that you really need to remove opponent effects from this. To say it in a way that might appease the FG powers, if Longoria was playing third yesterday when I went to bat, then he is probably playing third today. A more traditional, pitcher-centric approach would say, if I was playing vs a Boston pitcher yesterday, I am probably playing vs one today. That story makes less sense because it could also lead to negative correlation, i.e. if I was playing vs the fifth pitcher in the rotation yesterday, I’m probably playing vs the first today. Also, if I faced a closer yesterday, I’m unlikely to face one today.

Obviously, park effects matter too. Fly balls are not worth as much at Safeco Field as they are at Coors. This could also change how I approach my at-bats, i.e. a guy who hits 10-15 HRs per year is not going to swing for the fence at Safeco, but might at Coors.

23. don says:

This is really cool.

I wonder if any of the minor streakiness effect is due to homestands and road trips. On a road trip or homestand of > 1 week you’d have a few days where the rolling 7-day window included a mix of home and road games, but also a few days where it was exclusively one or the other, and you’d expect them to hit slightly better at home and slightly worse on the road.

24. Dan says:

At work so maybe I didn’t get everything, but I wonder if you are dismissing your analysis too soon. You seem to say that there is no correlation between a player’s streakiness from one year to another, but is that true for every player? What I want to know is, do the same players show up near that y=x line consistently on the scatter plot? Or, what players have a low variance of streakiness?

25. Dan says:

| Player | Average Streakiness | Variance | Number of Years |
|---|---|---|---|
| Matt Holliday | 0.717 | 0.003 | 6 |
| Aaron Hill | 0.575 | 0.113 | 4 |
| Jose Lopez | 0.352 | 0.011 | 5 |
| Kenny Lofton | 0.764 | 0.015 | 5 |
| Randy Winn | 0.750 | 0.019 | 8 |
| Eric Chavez | 0.353 | 0.022 | 6 |
| Shannon Stewart | 0.235 | 0.024 | 5 |
| Juan Encarnacion | 0.298 | 0.025 | 5 |
| Scott Hatteberg | 0.637 | 0.026 | 5 |
| Rafael Furcal | 0.658 | 0.030 | 7 |
| Miguel Cabrera | 0.516 | 0.036 | 7 |
| Vernon Wells | 0.605 | 0.038 | 8 |
| Jose Vidro | 0.271 | 0.039 | 5 |
| Angel Berroa | 0.753 | 0.027 | 4 |
| Ron Belliard | 0.448 | 0.041 | 5 |
| Placido Polanco | 0.452 | 0.042 | 9 |
| B.J. Upton | 0.507 | 0.118 | 4 |
| Carl Crawford | 0.647 | 0.045 | 7 |
| Dan Uggla | 0.631 | 0.047 | 5 |

26. Dan says:

Sorry it is hard to read but above I provided the top 20 players in consistency of streakiness with 4 or more years in the analysis by Seth. The interesting players would be those with high or low average streakiness. We can say that these players are consistently streaky/unstreaky.

For example, the number 1 there, Matt Holliday, was extremely consistent at being streaky, and Jose Lopez was consistent at being, well, consistent.

• Seth Samuels says:

Dan,

Thanks for putting that together. I had actually been looking at that on my own computer when you posted. I guess the thing is, it’s hard to know whether those 20 players are that consistent because they really are, or because we’d just expect to see that happen sometimes in a big enough sample.

I’ll see if I can tease that out.

• Dan says:

Hmmm… I am guessing the sample would have to be really large to “expect” a player to have a variance of 0.003 over 6 seasons. Outlier? umm… maybe? If you do any testing on this subject, I think it would be even more interesting.

• Barkey Walker says:

I think this might be like the birthday problem: with only 23 people in a room, chances are better than even that two of them share a birthday, far sooner than you would think.

• Seth Samuels says:

I just ran it. Basically, Holliday looks like an outlier (we’d expect something that extreme about 0.0002 of the time). But there’s nothing in the distribution of variances to indicate non-randomness.

I just did a quick-and-dirty nonparametric test, but basically I took the variance of 5 randomly selected numbers in the uniform distribution, 6 randomly selected numbers and so on, calculating the distribution of expected individual player variances. I then compared a player for whom we had five seasons to the distribution of five-number samples, and so on, so I got a p-value for each player’s variance. The distribution is not significantly different from a uniform dist:

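| Low | High | Count |
|---|---|---|
| 0 | 0.1 | 14 |
| 0.1 | 0.2 | 7 |
| 0.2 | 0.3 | 14 |
| 0.3 | 0.4 | 15 |
| 0.4 | 0.5 | 14 |
| 0.5 | 0.6 | 16 |
| 0.6 | 0.7 | 10 |
| 0.7 | 0.8 | 14 |
| 0.8 | 0.9 | 10 |
| 0.9 | 1 | 15 |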

So, there’s nothing here that indicates to me that these guys are actually the rare consistently streaky players. I think if that existed, we’d see a slight peak at the bottom end, and then smoothness above that.
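A minimal sketch of the quick-and-dirty test described above, under some assumptions the thread does not spell out: seasons are treated as independent uniform draws on [0, 1], the variance uses the n-1 (sample) denominator, and the function names are hypothetical.

```python
import random
from bisect import bisect_right

def sample_variance(values):
    """Unbiased sample variance (n - 1 denominator); this choice is an assumption."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

def null_variances(k, n_sims=100_000, seed=2):
    """Sorted variances of k uniform draws: what chance alone would produce."""
    rng = random.Random(seed)
    return sorted(
        sample_variance([rng.random() for _ in range(k)]) for _ in range(n_sims)
    )

def variance_p_value(observed_var, null_sorted):
    """Share of simulated variances at or below the observed one."""
    return bisect_right(null_sorted, observed_var) / len(null_sorted)

if __name__ == "__main__":
    null_six = null_variances(6)
    # Holliday: variance 0.003 over 6 seasons (values taken from the table in the thread).
    print(variance_p_value(0.003, null_six))
```

Repeating this for every player, each against the null for his own number of seasons, gives one p-value per player. If no one is reliably streaky or unstreaky, those p-values should look uniform, which is what the bin counts above are checking.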

• Dan says:

Okay, let’s do this (and it has been a while since I took any prob/stats or did anything like this so I may be wrong):

For each player I took the max real streakiness – min streakiness.
This is the probability that a random event would occur between that max and min.
Now, exponentiate that to the number of years and you have the probability that the player would randomly have his streakiness within his max – min.

So for Matt Holliday it is (0.80-0.64)^6 = 0.00168%. I am using players with over 4 years, so a sample size of 130. That would not be expected.

Actually, to me if you look at it for individual players, and not the league as a whole, it seems there is some correlation for many players.

Please tear apart my logic and teach me a lesson.
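One way to check this arithmetic, sketched under the assumption that a player's seasons are independent uniform draws on [0, 1]. Raising the width to the number of years gives the chance that every season lands in one window fixed in advance. Here the window was chosen after seeing the data, so the relevant quantity is the chance that the range (max minus min) of n draws is at most that width, which for uniform draws is n·r^(n-1) − (n−1)·r^n. The function names are illustrative only.

```python
def prob_all_in_fixed_window(width, n):
    """Chance that n uniform draws all fall in one pre-chosen window of this width."""
    return width ** n

def prob_range_at_most(width, n):
    """Chance that max - min of n uniform draws on [0, 1] is at most `width`."""
    return n * width ** (n - 1) - (n - 1) * width ** n

if __name__ == "__main__":
    width, seasons = 0.80 - 0.64, 6
    print(prob_all_in_fixed_window(width, seasons))  # the 0.00168% figure
    print(prob_range_at_most(width, seasons))        # about 0.05%
```

The range-based figure is roughly thirty times larger than the fixed-window one, though still small for a single player. It does not account for having looked at 130 players.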

• Seth Samuels says:

Dan,

Down the rabbit hole we go. So yes, the p-value on Holliday’s season is about 1/5000. The thing is, if we focused on Holliday only, we’d be biasing the results. So, we calculate that p-value for everyone (I did it for the 129 players with 5 or more seasons). A hypothetical distribution if we had a league where *some* players were streaky would probably have a larger number of players concentrated at the bottom (since some would be there reliably while others would be there randomly) and be pretty uniform above that. This is, as I noted a little earlier, not something we see.

Sticking with Holliday’s results for a minute, the p-value on his variance is 0.02%. But we don’t have a sample of one, we have a sample of 129. So the probability of seeing an outcome as extreme as Holliday’s isn’t 0.02%. Rather it’s 1-(probability that a single player’s outcome is not as extreme as Holliday’s)^(number of players in sample), which translates to 1-(1-.0002)^129 = 1-.975 = .025. So the probability of seeing at least one result like Holliday’s in this sample is about 2.5%. That’s not a lot, to be sure, but it happens. We’d expect a result as extreme as Randy Winn’s (the next lowest p-value at .75%) about 62.1% of the time. This number very rapidly approaches 100% in a sample of this size.
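The multiple-comparisons arithmetic can be sketched in a couple of lines, assuming the 129 players' results are independent of one another (the function name is illustrative):

```python
def prob_at_least_one(p_single, n_players):
    """Chance that at least one of n independent players shows a result this extreme."""
    return 1.0 - (1.0 - p_single) ** n_players

if __name__ == "__main__":
    print(round(prob_at_least_one(0.0002, 129), 3))  # Holliday-level result
    print(round(prob_at_least_one(0.0075, 129), 3))  # Randy Winn-level result
```

With 129 players, a single-player p-value has to be well under 1% before seeing it somewhere in the sample becomes surprising.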

So basically, we’re left with a question of whether we had the unfair coin come up tails, despite a 2.5% chance that it would happen, or whether Matt Holliday is the only player in the last ten years who is either consistently streaky or consistently unstreaky. My guess is the former.

Hope that helps.

• notdissertating says:

great explanation of a really tough concept. i personally like the anecdote of the blade of grass on a golf course who is amazed and befuddled that of all the thousands upon thousands of blades of grass this little white ball landed right on top of her. what is the probability of that?! of course the golf ball had to land somewhere, so taken in context, it is only surprising from the perspective of any given blade of grass. matt holliday’s consistent streakiness is only surprising if you happen to be looking at it from his perspective – random chance suggests *someone* would exhibit crazy amounts of streakiness.

for the interested reader, this episode of NPR’s Radiolab has the golf ball anecdote among other enlightening discussions of stochasticity: http://www.radiolab.org/2009/jun/15/

27. Dan says:

I would say about 5% of that population have probabilities less than 50% and are interesting to look at. Time to go home, very interesting and was fun to play around. I’ll def read your stuff…

28. EWolf says:

Seth,

This has been a very interesting look at the topic of (probably) random fluctuations in player performance.

When I first read the articles I had the same question as Sunny (from yesterday’s post) regarding why a player should be regarded as “streaky” if they start the year or end the year at a considerably different wOBA than the overall average wOBA. That is to say, if a player has a down month (or a good one) are they “streaky”?

Let’s look at an absurd theoretical example of a player who plays through the All Star Break at a consistent 0.400 wOBA, but then plays the second half at a consistent, albeit lower 0.300 wOBA. We’ll assume that the same number of ABs occurred in both sections so that the weighting is equal. By your definition, this would seem to give a raw streakiness of 0.050 (as long as I’m interpreting it correctly), as the season wOBA was 0.350 and he spent the entire season either 0.050 above or 0.050 below the season figure.

However, I would argue that a similar player with a 0.350 season wOBA but who went through two up cycles and two down cycles would be streakier. That is, first 40 games at 0.400 wOBA, next 40 games at 0.300 wOBA, next 41 games at 0.400 wOBA and last 41 games at 0.300. The raw streakinesses of the two scenarios are both 0.050 (so the “true streakiness” will be the same figure as well).

For that matter, are the players above any more or less streaky than another 0.350 season wOBA player that spent 90% of the season at a respectable 0.378 and the other 10% sucking wind at 0.100?

Again, my understanding of the method and/or my fast-and-loose area calculations could be off a bit (not to mention it certainly would take a great deal of skill to perform in these exact fashions), but I think that perhaps I just disagree with the notion that the 1-norm is the most effective in determining streakiness. I am not particularly helpful since I don’t have any alternate suggestions, just thought I’d add those thoughts in there.
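The three hypothetical seasons can be compared with a short sketch. It uses the same simplification as the comment: a season is a set of flat segments, and raw streakiness is the weighted mean absolute deviation (1-norm) from the season wOBA, rather than the moving-average calculation from the article. The root-mean-square (2-norm) version is included only for contrast and is not something the article uses.

```python
import math

def season_woba(segments):
    """segments: list of (share_of_season, wOBA) pairs whose shares sum to 1."""
    return sum(share * woba for share, woba in segments)

def raw_streakiness_l1(segments):
    """Weighted mean absolute deviation from the season wOBA (1-norm)."""
    overall = season_woba(segments)
    return sum(share * abs(woba - overall) for share, woba in segments)

def raw_streakiness_l2(segments):
    """Weighted root-mean-square deviation from the season wOBA (2-norm)."""
    overall = season_woba(segments)
    return math.sqrt(sum(share * (woba - overall) ** 2 for share, woba in segments))

two_halves = [(0.5, 0.400), (0.5, 0.300)]
four_cycles = [(40 / 162, 0.400), (40 / 162, 0.300), (41 / 162, 0.400), (41 / 162, 0.300)]
one_bad_stretch = [(0.9, 0.378), (0.1, 0.100)]

if __name__ == "__main__":
    for name, season in [("two halves", two_halves),
                         ("four cycles", four_cycles),
                         ("one bad stretch", one_bad_stretch)]:
        print(name, round(raw_streakiness_l1(season), 4), round(raw_streakiness_l2(season), 4))
```

All three seasons come out at about 0.050 on the 1-norm. The 2-norm separates the one-bad-stretch season from the other two, but neither norm tells two cycles from four, since both ignore the order in which the segments occur.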

Additionally, it would be interesting to see whether the path length metrics you briefly mentioned yesterday show the same year-over-year independence that the “true streakiness” stat exhibits.

• Seth Samuels says:

EWolf,

That is, as it happens, the exact same thought I had myself. That’s why I tried using the length of a LOESS curve around the moving averages, which I mentioned in the comments at some point. But that yielded results that were no different. So, given that I knew I was going to end up with a null result, I felt like it was better to explain it in terms everyone could relate to, rather than trying to explain local regression curves. But yes, I do agree with your premise.

• EWolf says:

Fair enough. If the results are the same, then this is clearly a more accessible approach. It certainly got me thinking about the topic a bit.

29. George Purcell says:

Seth-

Interesting work. The first scatterplot clearly indicates that there’s nothing about players that correlates with streakiness. Yet you have that odd histogram.

You’ve combined 10 years of data into a single analysis, so my hunch is that this histogram is reflecting some sort of yearly variation.

I downloaded your data and recreated the bins. I then did a pivot table in Excel doing a count by year of the number of players in each of your bins. Finally I figured the mean and standard deviation for each year.

| Year | Mean | Std. Dev. |
|---|---|---|
| 2001 | 7.75 | 2.022895267 |
| 2002 | 7.6 | 2.72222819 |
| 2003 | 8.2 | 3.707744385 |
| 2004 | 8 | 3.094987459 |
| 2005 | 7.45 | 2.584875035 |
| 2006 | 8 | 2.492092758 |
| 2007 | 8.05 | 2.928534751 |
| 2008 | 7.2 | 3.001753873 |
| 2009 | 7.7 | 1.780005914 |
| 2010 | 7.3 | 2.154554539 |

There’s some pretty clear variation by year–especially 2003. It has the highest mean and the highest std. deviation. Whatever is causing the skew in your histogram is something that varies by year.

• Seth Samuels says:

George,

Yes, it does vary a bit from year-to-year. It doesn’t do so in any way that appears to be meaningful to me, but I’ll admit that I haven’t delved into that as closely as I did other things.

That said, looking at the numbers you posted, I’m not sure where you’re getting your means from. My guess would be that you’re using my raw streakiness stat rather than my true streakiness stat, but even still, your results are rather different from what I get. Perhaps I’m missing something in your calculations.

My quick and dirty analysis finds that, using nonparametric tests, the streaky-skew is highly significant in all years except 2003 and 2005. Perhaps something’s going on in those two years. I think it’s ultimately a function of opponent pitching, more than anything else.

• George Purcell says:

What I did was rank order true streakiness then place the measures evenly in 20 bins. I then did a simple count of the number of events in each bin that occurred in a given year and took the mean of the bin counts for each year.

I don’t have a way to post the pivot table, but take a look at the bin counts for 2003:

7 10 9 4 7 14 9 7 3 3 7 6 5 11 9 17 14

There has also been a decrease in the number of players qualifying–and the number of these players was highest in 2003 (164). By contrast 146 qualified in 2010 and 154 in 2009.
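A sketch of the binning procedure described above, run on made-up data (uniform random scores, with the 2003 and 2010 player counts quoted in this comment). The sample standard deviation is an assumption, since the thread does not say which version was used.

```python
import random
import statistics
from collections import defaultdict

def make_synthetic_records(players_per_year, seed=3):
    """Hypothetical (year, true_streakiness) pairs; scores are uniform random."""
    rng = random.Random(seed)
    return [(year, rng.random())
            for year, n in players_per_year.items() for _ in range(n)]

def bin_counts_by_year(records, n_bins=20):
    """Rank all scores together, cut into equal-size bins, count each year's players per bin."""
    ranked = sorted(records, key=lambda record: record[1])
    counts = defaultdict(lambda: [0] * n_bins)
    for rank, (year, _score) in enumerate(ranked):
        counts[year][rank * n_bins // len(ranked)] += 1
    return dict(counts)

def summarize(counts):
    """Mean and sample standard deviation of the bin counts for each year."""
    return {year: (statistics.mean(c), statistics.stdev(c)) for year, c in counts.items()}

if __name__ == "__main__":
    records = make_synthetic_records({2003: 164, 2010: 146})
    for year, (mean, sd) in sorted(summarize(bin_counts_by_year(records)).items()):
        print(year, mean, round(sd, 2))
```

The mean bin count for a year is always that year's number of qualifying players divided by 20 (164 / 20 = 8.2 for 2003), so differences in the means reflect how many players qualified rather than how streaky they were. Only the spread of the counts carries information about the shape of the distribution.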

• Seth Samuels says:

George,

Got it, thanks. At any rate, as I said, the skew is pretty reliable from year to year, so I think it’s probably still an underlying thing, rather than just year to year variation. Also, there’s no theoretical reason that this should vary much from year-to-year, since performance effects are stripped out. It would be one thing if this included the switch to the unbalanced schedule or something, but without that there’s not any reason I can think of for the population to vary from one year to the next other than random variation.

30. pft says:

Unfortunately, the noise is such that it tends to overwhelm any analysis. I am sure, for example, that many “streakiness” issues for a given year are due to players playing hurt or going through personal issues (the latter being J.D. Drew in 2007). This increases the amount of streakiness in the larger population, as do scheduling and park effects, not to mention seasonal effects, aging (older players seem to be more streaky, esp. on the cold side), etc., and this may mask some individuals’ true streakiness.

The data is imperfect and makes it difficult to find evidence of what you are looking for, as with clutch hitting, catchers’ impact on pitchers’ performance, etc. Sometimes one needs to look beyond the numbers, since the numbers are only as good as the data, and the data is not always correct or complete enough.

As they say, the absence of evidence is not proof something does not exist.

Nice study though, and at the very least I think it suggests that such true streakiness is not that frequent or significant.

• Seth Samuels says:

Pft,

I totally agree with your assessment. Somewhere earlier in the comments I noted the importance of distinguishing between something that is random and something that is mathematically indistinguishable from randomness. I don’t at all believe that this analysis shows the former. I do think that, for the time being, there’s no reason to think that any streakiness we see would be meaningfully different from randomness.

I have to ask though, you said older players seem to be more streaky. Is this something you’re pulling from my data? Or are you just saying that anecdotally?

31. Eric M. Van says:

Terrific and fascinating work, about which I may say more in a bit. But first this important caveat: as cool as it is, the streakiness metric doesn’t seem to capture all of our subjective sense of a player’s streakiness.

Johnny Damon in 2003 measures as consistent (.211), but he had a .703 OPS through July 7 and .812 afterwards. He was coming off a messy divorce and only the second half was consistent with his established talent, and I correctly argued at the time that only the second half was predictive.

Carlos Pena in 2003 measures as tremendously consistent (.094), but he had a .589 OPS in his first 143 PA and a .596 in his last 116, and a .955 in the intervening 256. A few years later I argued that that prolonged “hot streak” in the middle, as well as similar streaks in 2004 (which did measure as streaky at .869) and 2005 (not enough PA to measure) argued for tremendous upside as a hitter, and that was correct, too.

Todd Walker in 2003 was so streaky (subjective sense of those watching him every day) that the team sent him to a sports psychiatrist, but he measures at .220.

(It’s presumably a fluke that all three of these examples are from 2003!)

So it appears as if this streakiness metric is not good at characterizing seasons marked by prolonged streaks; it seems to nail micro-streakiness but not be good at macro-streakiness, as it were. I’d be curious to see what happened if the moving average was taken over a much longer time period.

• Seth Samuels says:

Eric,

Thanks for the comments. If it helps, one of the other ways I ran it was taking the distance between a player’s maximum moving average and his minimum, and doing it that way gives Damon a .733, Pena a .603, and Walker an .884. I don’t particularly like that measure though, because I don’t think of the difference between the absolute max and the absolute min as really being about streakiness. As far as Damon goes, I tried adjusting it to measure performance before and after July 7, just to see the probability of his having as extreme a first-half second-half difference as he did in reality (.043 by wOBA). The p-value comes up as .13, which is not that bad (and that’s one-sided, to boot). So Damon’s second-half improvement just wasn’t all that extreme. More generally, though Damon did have some second-half improvement, his ups and downs just weren’t very extreme aside from a really severe cold stretch at the end of the season.

With Pena, part of it is that the windows you’re using are pretty arbitrary (e.g. if you look at his last 145 PA, he had an .805 OPS), and most of what you’re seeing there could well be random fluctuation, given the small sample size. Seriously, if you can get your hands on it, take a look at Pena’s moving average (I’d post it, but I don’t have the ability to do that). It’s just not that streaky.

As for Walker, it’s kind of interesting, but he basically has an incredibly consistent season except for a massive cold spell for about a month and a half. But really, outside of that, the line is nearly flat. So, we’re left with a question of whether we want to place extra emphasis on his super-cold streak, even though he was so unstreaky for the remainder of the year. To be honest, I’m comfortable with the way the data treats him.

Nothing is going to be perfect with this kind of measure, I realize, but there’s nothing with these three players that makes me think it’s systematically getting anything wrong.

More to the point, as I’ve mentioned in earlier comments, I tried this with several different streakiness measures and still got the null result. So the main benefit of the way I presented it was that it was easiest to understand.

• Seth Samuels says:

Eric,

I should also add that I’m glad you brought up Damon’s relationship problems. I think that kind of thing would be a big part of what drives the streakiness we observe. Such off-field events occur at times that are, from a baseball standpoint, completely random, and may well be part of why there’s no identifiable individual streakiness–since we can’t control for off-field noise.

32. Eric M. Van says:

I don’t know why people are assuming that the schedule (park and opposing pitcher variations) adds to streakiness. It’s clear that it would make a perfectly consistent hitter appear to be streaky, but it seems just as clear that it would make a maximally inconsistent hitter appear to be *less* streaky. If you were absolutely locked in and you face Pedro in his prime in a big park with the wind blowing in, you’re probably going to go 0-4. If you were slumping terribly and faced a AAA callup with the wind blowing out a gale, your odds of going deep despite the slump are much higher. The schedule just adds *noise* and regresses the actual streakiness towards the streakiness inherent in the schedule.

33. Seth Samuels says:

It seems things have settled down here, so I’d like to thank everyone again for all your thoughts, comments, criticisms, and kind words. Hopefully I can find some time to do this more often, so this won’t be a one-off thing. If you have more questions or comments, I get an e-mail when you post here, so by all means continue to do so.

Thanks again.

-Seth

34. MattD says:

Hello,

First of all, thank you for the interesting study.

I seem to be a bit late to the discussion, but if you’re still around … have you looked into any correlation between streakiness and DL-time? My first thought upon seeing the skew of your last histogram was that it could be due in part to playing through/after injuries; this could also help account for the weak negative correlation with PA.

Thanks

• Seth Samuels says:

Matt,

I haven’t, but that’s a good idea. Unfortunately, I don’t have any data on DL time, nor do I know where to get it. I could definitely see that being an explanation for underlying streakiness. If you have any idea where I might get something like that, I can try to run it.

35. Kevin says:

Hi Seth,

I looked at a somewhat similar topic one day: the consistency of players. If you wanted to take a look here’s the link: http://theicebat.wordpress.com/2010/11/13/how-consistent-are-baseball-players/

Instead of taking differences between a player’s seven day length and his seasonal mean for a metric, I used a time series model approach. I think this would be interesting using wOBA and trying to predict a player’s next day wOBA based on his past seven days. What’s nice about time series would be the lag (of seven days in your case) determined parametrically. What do you think?

• Seth Samuels says:

Kevin,

Interesting piece. Honestly, I don’t think it would really work with wOBA for a few reasons. The first is just, most obviously, you’d have sample size issues, since your wOBA on the left side would be based on 4 or 5 plate appearances. Also, one of the main concerns with parametric methods is the need to rely on a particular set of assumptions about the distribution. I’d frankly be shocked if the error in a model like that were normally distributed. Not to say parametric methods are bad–they’re very useful. But it’s a weaker argument when assumptions don’t apply.

It’s also worth noting that I’ve tried some variation on this in the past (it was a while ago, so I don’t remember exactly, but I think I did a logit model on walks, based on like 20 previous PA’s) and didn’t find anything there. If you have the time and the interest, I would still encourage you to go ahead with it, or at least to keep tinkering. You never know what you might find.

36. Jon says:

Too many more articles like this and we’ll understand baseball and won’t need to read Fangraphs anymore.

37. Darryl Strawberry Fields says:

Here are a couple notes:
(1) We need to first establish an agreed-upon definition of streakiness, because player performance always varies in baseball. To me, streakiness is “statistically significant variations in player performance which occur in a random pattern.” Using this definition, at least two of your streakiest cases are not streaky at all, because the variation in their performances is not random. Therefore their data, and similar cases, should not be used in your year-to-year correlation.

(2) “The one relationship that was statistically significant was a weak negative correlation between streakiness and plate appearances (r = -0.061, p = 0.016). It is tempting to think that this may suggest that better players (who play more) are less streaky, but this is unlikely.” – Actually this correlation is just a statistical representation of something we know to be true. As n increases we approach the true mean of N.

This is a fancy way of saying that as the number of AB’s increases, every player’s averages in all stat categories will regress towards that player’s mean for each category. Since there is a maximum of 162 games, more AB’s in essence means more AB/Game. So if a player gets more AB/Game, then his performance, when viewed in a game-by-game or series-by-series manner, will appear more consistent. To look at it from a logical standpoint, if every player got 20 AB’s each game, then there would be a greater chance that they would post extremely consistent box scores.

When you look at it this way it makes incredible sense to see this correlation from a statistical standpoint. This correlation is not meaningless, but is totally anticipated and would be shocking if it were not statistically significant, or at least closely approaching statistical significance, every season.
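The sample-size point can be illustrated with the standard error of a rate over a seven-game window. This is a sketch only: it treats each plate appearance as an independent on-base/out trial with a hypothetical .350 success rate, which is cruder than wOBA, and the function name is made up.

```python
import math

def window_standard_error(rate, plate_appearances):
    """Standard error of an observed rate over a window of independent plate appearances."""
    return math.sqrt(rate * (1.0 - rate) / plate_appearances)

if __name__ == "__main__":
    for pa_per_game in (4, 5, 20):
        window_pa = 7 * pa_per_game  # seven-game rolling window
        print(pa_per_game, window_pa, round(window_standard_error(0.350, window_pa), 3))
```

More plate appearances per window means a smaller standard error, so the rolling average wanders less. Whether that carries over to the percentile-based “true streakiness” score depends on whether the simulations use the same plate-appearance counts as the real season.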

(DAVID WRIGHT 2009) “Neither of the two David Wright seasons we looked at earlier makes the list, but his concussion-marred 2009 season was the streakiest in the league that year.” – In this case your variable is not streakiness. You are determining a player’s streakiness by analyzing performance statistics. These statistics, and your streakiness coefficient, are being confounded by a third variable.

Wright was injured. Injury affects performance. Your streakiness coefficient identified his performance variations as significant and therefore tagged his performance as streaky. Unfortunately, your coefficient neglects the fact that these variations were not random. They were affected by a third variable which impacts performance. If you determined Wright’s streakiness coefficients for the smaller periods in between each of the injuries, you would likely get very different values.

When you minimize the effect of the confounding variable (injury), Wright’s 2009 streakiness coefficients are each likely to approach those in other seasons. You don’t expect performance statistics (especially counting statistics) to correlate well when comparing an injury-marred season to a healthy season, so why would you expect a streakiness statistic (if it exists) to correlate well from an injury-marred season to a healthy one?

(BRENNAN BOESCH 2010) Brennan Boesch’s 2010 season wasn’t streaky in the least, but you list him as the streakiest player of 2010. Boesch was the model of consistency. He was consistently brilliant in his first 30-40 games or so, and consistently horrible the rest of the season. This change was due to a change in the way pitchers approached him; there is nothing streaky about it.

If you isolated his performance from before pitchers adjusted to him and separated that data from his performance after they made those adjustments, you’d have two very different sets of statistics, and both would have similar streakiness coefficients. Boesch is not a streaky player because pitchers adjusted and he failed to adapt; he is just a younger player whose talent was negated by pitchers who had figured him out.

Here your measure is again being affected by a confounding variable, although I have trouble giving that variable a name. It is not that common for a player to be shut down as badly as Boesch was, but there was very little streak to it. Consistent brilliance and consistent darkness. Kind of like the extraordinarily predictable, and not streaky at all, rising and setting of the sun. So again your coefficient is identifying statistically significant differences, but failing to recognize that these differences are not occurring in a seemingly random pattern.
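The two-regime argument can be tried out on made-up data. The sketch below is not the article's method: `rolling_spread` and `streak_percentile` are a hypothetical stand-in that compares the spread of 10-game rolling hit totals with the same games in shuffled order. The talent levels (.340 for 40 games, then .165 for 90) and the fixed 4 AB per game are assumptions chosen only to mimic a hot start followed by a collapse.

```python
import random
import statistics

def rolling_spread(games, window=10):
    """Standard deviation of rolling-window mean hits per game."""
    means = [sum(games[i:i + window]) / window
             for i in range(len(games) - window + 1)]
    return statistics.pstdev(means)

def streak_percentile(games, shuffles=500, seed=1):
    """Share of shuffled game orders whose rolling spread is smaller than
    the observed one. A stand-in measure, not the article's coefficient."""
    rng = random.Random(seed)
    observed = rolling_spread(games)
    shuffled = list(games)
    below = 0
    for _ in range(shuffles):
        rng.shuffle(shuffled)
        if rolling_spread(shuffled) < observed:
            below += 1
    return below / shuffles

def simulate_games(n_games, hit_prob, rng, ab_per_game=4):
    """Hits per game for a hitter with a constant hit probability."""
    return [sum(rng.random() < hit_prob for _ in range(ab_per_game))
            for _ in range(n_games)]

rng = random.Random(42)
hot = simulate_games(40, 0.340, rng)    # assumed talent before the adjustment
cold = simulate_games(90, 0.165, rng)   # assumed talent after the adjustment
print("full season:", streak_percentile(hot + cold))
print("first 40 games:", streak_percentile(hot))
print("last 90 games:", streak_percentile(cold))
```

The comparison of interest is whether the full season scores high on this measure while each segment, taken alone, looks unremarkable. Any single simulated season is noisy, so it is worth rerunning with several seeds before reading much into the numbers.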

(4) So injury and other events in a player’s season are confounding your variables and really reducing the validity of this study. Without controlling for confounding variables, there is no way to use statistics to represent a player’s streakiness. I use statistics exclusively in making fantasy determinations because I can’t watch a lot of games and because my eyes too often lie to me, only seeing what I want to see. Numbers can also be used to lie, and while your intention was to uncover the truth, you need to combine statistics and common sense in order to do that.

I am thoroughly surprised that you didn’t see these problems when Boesch and an injury-riddled season popped up on your most streaky list. The only way to make any determination on the existence of streakiness requires using a very unscientific method. You need to pinpoint where in the season the so-called streaks which your coefficient has identified are occurring. Then you need to attempt to correlate them with isolated events which you deem to have an impact on the player’s performance in the statistical category being evaluated. If you can correlate the player’s streak to a confounding variable, you must control for that variable or exclude the data.

There is no way to do this objectively, and as such there is no way, IMO, that you can statistically evaluate streakiness. If you have any ideas for objectively eliminating (or significantly reducing) the confounding variables without eliminating too many valid cases, then please let me know, as I would be happy to brainstorm with you.

The way I see it, there is no truly valid point in time at which to say, “This is the game where pitchers figured Boesch out.” Additionally, how do you correct for playing a series in 100-degree Arlington? This is likely to affect performance, but should the data from this series be excluded or controlled for? You could make a case for it, but then you’d have to start controlling for playing in Minnesota in April. I just don’t see any way that you could statistically analyze streakiness, but I also see no reason to use this article and the data presented to dismiss the idea that some players are streaky. I know you say that this article is not meant to be absolute proof that streakiness doesn’t exist, and I agree with you. I may even go one step further and say that this article provides very little proof at all; however, the ideas are sound and it is an excellent premise for more statistical analysis.