Understanding Your Patterns

Sometimes, when I’m supposed to be working, I read things that don’t have anything to do with baseball. Sometimes I’m able to salvage that lost time by twisting my new information into a vaguely baseball-y angle. So it went this morning, when I scanned Erik Klemetti’s Eruptions blog. There’s a good new post up, focusing on the matter of trying to predict earthquakes around the globe. (Hint: don’t do it.) I can’t think of a way to write about baseball-y earthquakes. But within that post, toward the start, is a discussion about patterns, and the perception of them where they sometimes don’t exist. Now this, this could be something to put up on FanGraphs.

Contained within the post is a link to this piece at Scientific American. The author talks about “patternicity,” or, as he puts it, “the tendency to find meaningful patterns in meaningless noise.” This might be a pretty familiar concept to you, and the author advances an evolutionary argument for its existence. There’s a reason, it’s asserted, that we’re so good at finding patterns. There’s a reason we try to find patterns where no patterns exist.

I don’t need to review everything — the post isn’t long, and you should just read it. But we know that the human brain looks for patterns, all the time. The ability to spot patterns is selected for, on account of the benefits with regard to reproduction and healthy living. We’re all pattern-seekers, but we can’t and don’t really seek them selectively. We’re not just wired to look for significance when dealing with matters significant. We’re wired to look for trends everywhere, sometimes when it matters and sometimes when it doesn’t.

We might accept or reject a pattern, but we’ll accept most, because it’s better to accept too many than not enough. As Klemetti puts it:

Or, in other words, it is better to believe wrong and right things (and thus get all the right things) than accidentally miss some of the right things.

This is the terrible segue into talking about baseball. You’ve probably heard before about how eager we are to look for patterns in baseball data. It’s easy and inviting, because baseball generates so much information, and so many repetitions. There’s virtually limitless information for every player and every team, and plenty of people consider much of that information. In it, they see things. They see patterns, they see trends, they see “streaks” as we might call them. They see data that might be significant. Sometimes, it is significant. Sometimes, it is not.

In a way, you could define sabermetrics as the identification of baseball patterns, and the testing of their significance. That might be too narrow, but it’s a lot of what takes place. This guy’s streaking. This team’s underperforming. The league is doing more of this and less of that. What does it mean? Does it mean anything?

The reality is that we all look for patterns all the time, and we think of patterns as being meaningful, as being indicative of something. The reality is also that, at least with baseball, so much is just random noise. The reality is that we need to learn to accept randomness. The reality is that we’re wired not to.

It’s weird to talk about an evolutionary concept and relate it to baseball, since baseball doesn’t matter with regard to survival, but again, we’re not selective. We don’t only look for patterns in potential mates and when we’re picking out food. When confronted with information, we’ll look for an arrangement, and we’ll accept the pattern if the benefit outweighs the cost. Say Player X is on a hot streak. We can either accept the pattern or reject it. The cost of accepting it as legitimate is minimal; you don’t stand to lose anything by being wrong. By being right, you might experience some sense of satisfaction.

This could probably stand to include a better, real example. I don’t mean to pick on Ken Rosenthal, but I’m going to use him here. Rosenthal wrote about how the Angels should keep Mike Trout in center field, even when Peter Bourjos is healthy. Part of his argument is that, this year, Trout has posted a .738 OPS from left field, but a 1.055 OPS from center. Rosenthal has accepted this pattern as meaningful. If he’s wrong, it’s not a big deal; it’s just baseball stats being funny, and it doesn’t destroy his whole argument. If he’s right, Rosenthal has a more solid argument, and it’s fairly original, and it reflects well on him. The benefit outweighs the cost.

And it’s probably just randomness. Randomness that Rosenthal has interpreted as non-random. Last year, Trout posted a 1.040 OPS from left, and a .946 OPS from center. For his career, the OPS split is 62 points, based on small samples, and it’s almost entirely BABIP. Trout, probably, is not a meaningfully better hitter when he gets to play center, but the numbers are arranged in such a way that it’s appealing to us to see something. We want there to be something — that’s how we’re programmed, and it’s not our fault.
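If you want a feel for how big a split can get with nothing behind it, here is a minimal sketch, not anything Rosenthal or anyone else actually ran. It invents a hitter whose talent never changes, hands him two batches of plate appearances (call them "left" and "center"), and measures the OPS gap between them, over and over. Everything in it is an assumption for illustration: the outcome probabilities are made up and are not Trout's real rates, the batch sizes of 150 and 350 are hypothetical, and the function names are mine. The only numbers taken from above are the thresholds: the 62-point career gap and this year's 317-point gap (1.055 minus .738).

```python
import random

# Hypothetical per-plate-appearance outcome probabilities for one hitter.
# These are illustrative assumptions, not any real player's rates.
OUTCOMES = {
    "out": 0.62,
    "walk": 0.10,
    "single": 0.17,
    "double": 0.06,
    "triple": 0.01,
    "home_run": 0.04,
}
TOTAL_BASES = {"single": 1, "double": 2, "triple": 3, "home_run": 4}


def simulate_ops(plate_appearances, rng):
    """Simulate a batch of plate appearances and return the resulting OPS.

    Simplification: every plate appearance is a walk, a hit, or an out,
    so at-bats are just plate appearances minus walks.
    """
    results = rng.choices(
        list(OUTCOMES), weights=list(OUTCOMES.values()), k=plate_appearances
    )
    walks = results.count("walk")
    hits = sum(1 for r in results if r in TOTAL_BASES)
    bases = sum(TOTAL_BASES.get(r, 0) for r in results)
    at_bats = plate_appearances - walks
    obp = (hits + walks) / plate_appearances
    slg = bases / at_bats if at_bats else 0.0
    return obp + slg


def split_gaps(pa_left, pa_center, trials, seed=0):
    """Return the absolute OPS gap between two batches, once per trial.

    Both batches come from the same hitter with the same talent, so every
    gap in the returned list is pure randomness.
    """
    rng = random.Random(seed)
    return [
        abs(simulate_ops(pa_center, rng) - simulate_ops(pa_left, rng))
        for _ in range(trials)
    ]


def share_at_least(gaps, threshold):
    """Fraction of simulated gaps that are at least `threshold` OPS points."""
    return sum(g >= threshold for g in gaps) / len(gaps)


if __name__ == "__main__":
    # Hypothetical batch sizes; the real plate-appearance counts are not given above.
    gaps = split_gaps(pa_left=150, pa_center=350, trials=10_000)
    for threshold in (0.062, 0.317):
        share = share_at_least(gaps, threshold)
        print(f"gap of at least {threshold:.3f} OPS: {share:.1%} of seasons")
```

Shrink the batch sizes and watch the big gaps show up more often. The hitter in the simulation is exactly the same guy in both spots by construction, so whatever split comes out is the kind of pattern we're built to see and that isn't there.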

It’s interesting to apply the costs and benefits to people actually within the game of baseball itself. Take, for example, old-school managers, who love batter-vs-pitcher splits. It’s been demonstrated that batter-vs-pitcher splits are meaningless, in terms of predicting future events. But managers will sometimes make objectively wrong decisions because of what that data says to them. They’ll start the wrong guy or sit the wrong guy, based on the history against a given arm. We’re terrible at determining what’s actually significant, so we err on the side of pattern acceptance. You could argue that a manager might upset a player if he, say, doesn’t play him against a pitcher he’s hit in the past. That’s a clubhouse issue. If a manager listens to the numbers and it backfires, though, he has security in the data. Only a subset of fans would be upset, and to the manager, the fans don’t matter. This can all get very complicated and fascinating.

Stepping back, more generally: we all look for patterns. This is how we are, and there’s no getting around it, because it’s coded into us. The reason you’re sick of hearing about small sample sizes is that people love to point out patterns they observe over small sample sizes. This is why, every year, we have to write about the relative meaninglessness of spring-training stats. So much is random, and people refuse to see it that way. It’s critical, in baseball and in everything, to be willing and able to accept randomness. It won’t be your first instinct, but seldom should you just react instinctively. Be aware of the fact that you’re looking for the patterns you observe. You won’t be aware at the time, because it’s built right into you, but remind yourself before you try to draw conclusions. Any conclusions. Nobody lies to you more than you.


Jeff made Lookout Landing a thing, but he does not still write there about the Mariners. He does write here, sometimes about the Mariners, but usually not.


Couldn’t agree more! But how many times will it be written/said/tweeted, “I know this is a small sample, but..”

But I always keep this in mind whenever I’m about to say anything about sports: XKCD/904, and baseball in particular.


Better link, since it includes the mouseover text: http://xkcd.com/904/


Whoops, thanks, much better link!