Using Statistics to Forecast the Death of Baseball

In the idyllic indie film Terminator 2: Judgment Day, director James Cameron tells the story of Skynet, a computer which has been created to ease the tedious labor of a shambling, bone-weary humanity. Skynet is doing great, fixing routine traffic congestion and playing Zaxxon, until it attains self-awareness on August 29, 1997 and proceeds to nearly eradicate all life (if not for a couple of meddling humans). Though much of the film was realistic, particularly in its depiction of how cool mercury looks, this particular plot point was hard to swallow. After all, computers have been ruining things since long before 1997.

Take chess, for example. Chess has beguiled and tormented the great figures of history since it evolved from shatranj in the thirteenth century. For seven hundred years, people played chess according to various “styles”, having “fun” by playing risky gambits and discovering breathtaking and unforeseen combinations. This means they were playing suboptimally. Once the computer arrived, it took only a handful of decades to distill the game down to the memorization of thirty-five-move opening books and a demand for a heartless positional struggle slithering toward an inevitable rook-and-pawn endgame.

[Image: Garry Kasparov facing IBM’s Deep Blue]

Deep Blue and his pals aren’t necessarily killing baseball, because baseball is doing that itself, with its three true outcomes and five-hour games. But they do open up the possibilities of statistical calculation, which previously demanded far more arithmetic than the average person could do by candlelight. Now we can dump all the numbers of existence into a single spreadsheet, spend half an hour formatting the data, and arrive at the horrible truths that await us in a previously mystifying and vaguely interesting future.

If you aren’t taking the free Sabermetrics 101 course being offered online, here’s a quick run-down. By graphing a series of numbers over time or jersey color or another variable, we can establish the line that fits those points better than any other line. We do this by calculating the sum of the squares of the differences between each point and our line, and then picking the line that makes that sum as small as possible. Or, we right-click on the graph and tell the computer to do it, which is much easier. What we arrive at is what mathematicians call the “real” pattern, or the “true talent line.” Then, we can talk about how everything will regress, and that nothing interesting will ever last.
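
For anyone whose right-click button is broken, here is a minimal sketch of what the computer is doing when it fits that line. The function names and the numbers are invented for illustration (they are not real baseball data), and this is the textbook ordinary-least-squares formula rather than anything specific to the graphs in this post.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x, and how x and y move together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sum_squared_error(xs, ys, slope, intercept):
    """Sum of squared vertical gaps between the points and the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Invented toy numbers, not real baseball data.
seasons = [1, 2, 3, 4, 5]
totals = [10.0, 8.5, 7.5, 5.5, 4.0]

slope, intercept = fit_line(seasons, totals)
best_error = sum_squared_error(seasons, totals, slope, intercept)
# Any other line, such as one with a slightly different slope, does worse.
nudged_error = sum_squared_error(seasons, totals, slope + 0.1, intercept)
```

The fitted line is "best" only in the sense that no other straight line gives a smaller sum of squared gaps for these particular points.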

It’ll also spit out something called an R-score, which basically tells you how likely it is that your line is really correct, and not thrown off by things like outliers, or luck, or steroids. The numbers range from 0 to 1, with 1 being a perfect “correlation.” If this seems too complicated, don’t worry. Anything about a 0.3 or so is perfectly fine.
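
A rough sketch of where that number comes from, assuming it is the usual R-squared (one minus the leftover squared error divided by the total squared variation around the average). Read that way, it describes the share of the ups and downs that the line accounts for. The helper name and both data sets below are made up for illustration.

```python
def r_squared(xs, ys):
    """R-squared of the least-squares line through the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Squared error left over after fitting the line.
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    # Squared variation around the plain average (must not be zero).
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Invented toy numbers, not real baseball data.
seasons = [1, 2, 3, 4, 5]
steady_decline = [10.0, 8.5, 7.5, 5.5, 4.0]  # hugs a straight line
all_over_the_place = [3.0, 0.0, 7.0, 0.0, 3.0]  # no trend at all

tidy = r_squared(seasons, steady_decline)
messy = r_squared(seasons, all_over_the_place)
```

The first series sits close to 1 and the second sits at 0, because a flat line explains none of its bouncing around.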

Now that I’ve provided you with five credits at your local community college, let’s look at some future truths about the game of baseball. If you do not enjoy spoilers, and want to be less educated than other people as you wait to see how things turn out, I suggest that you stop reading and find some syndicated Tank McNamara strips to read.

Still here? Let’s start with the triple.

[Graph 1: triples since 1945, with linear trend line]

Triples have been disappearing at a steady rate since 1945, which is when I bothered to start charting this graph. “Why 1945?” you might ask. “Isn’t that an arbitrary endpoint?” The answer is no. Everybody uses this endpoint. There was a war and everything. Why wouldn’t you start there? That’s why they call it the Modern Era.

At any rate, the True Talent Line is very straight and has an R-squared of 0.82, which means you should probably go out and make some binding, long-lasting wagers with your bar buddies right now. The last triple will be hit in late 2100, and the word will be forgotten, along with all other non-numeric or acronym-based descriptors of sporting accomplishment, by 2185.
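
The arithmetic behind a prediction like this is just asking where a falling straight line reaches zero. A small sketch follows, with a hypothetical slope and intercept that are not the fitted values from the graph above (those are not given in the text).

```python
def zero_crossing(slope, intercept):
    """Return the x at which the line y = slope * x + intercept hits zero."""
    if slope == 0:
        raise ValueError("a flat line never reaches zero")
    return -intercept / slope


# Hypothetical trend line, invented for illustration only.
slope, intercept = -1.5, 11.6

last_season = zero_crossing(slope, intercept)
# A straight line does not stop at zero: one season later it goes negative.
one_season_later = slope * (last_season + 1) + intercept
```

Nothing in the formula knows that the quantity cannot drop below zero, which is one reason to treat extrapolations this far out as entertainment.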

[Graph 2: no-hitters and complete games over time, with trend lines]

Our second topic is when no-hitters will become more common than complete games. You’ll notice that the no-hitter line has an R-squared of 0.02, which seems really bad, because in a given year there might be seven or zero or three – there’s really no pattern to it. But we’re still looking at a number between zero and seven. Ben Revere’s home run totals have fluctuated wildly between zero and one over his career, but we can still kind of get a feel for him.

According to this graph, we’ve only got four more years of complete games left. If you attend one, you may want to take pictures. “But wait,” you might interrupt if you’re a smart aleck. “That blue thing doesn’t look like a line. It looks like a… look, high school math was a long time ago, and I spent all my class periods drawing centaurs on my spiral notebooks. But it’s supposed to curve, I think.”

Fine, I guess 0.8662 isn’t good enough for you. Have this instead.

[Graph 3: no-hitters and complete games over time, with a logarithmic curve]

This time we’ll use a logarithmic curve. In this case, Kevin Millwood-style seven-man no-hitters will become the norm in 2050, which means some of you won’t have to live with the indignity.
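
For the curious, one common way to fit a logarithmic curve is to assume the form y = a + b * ln(x) and run an ordinary straight-line fit against ln(x) instead of x. That form is an assumption here, since the post does not say which curve the spreadsheet used, and the names and numbers below are invented for illustration.

```python
import math


def fit_log_curve(xs, ys):
    """Least-squares fit of y = a + b * ln(x); every x must be positive."""
    logs = [math.log(x) for x in xs]
    n = len(xs)
    mean_l = sum(logs) / n
    mean_y = sum(ys) / n
    sll = sum((l - mean_l) ** 2 for l in logs)
    sly = sum((l - mean_l) * (y - mean_y) for l, y in zip(logs, ys))
    b = sly / sll
    a = mean_y - b * mean_l
    return a, b


# Invented points that lie exactly on y = 2 + 3 * ln(x), not real data.
xs = [1.0, 2.0, 4.0, 8.0]
ys = [2 + 3 * math.log(x) for x in xs]

a, b = fit_log_curve(xs, ys)
```

Because the toy points were generated from the curve itself, the fit recovers a = 2 and b = 3 up to rounding; real season totals would leave some error behind.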

Finally, a graph that may provide a little optimism.

[Graph 4: pitchers’ batting average and GIDP rate over time]

Much has been made of the pitcher, and his inability to pick up or say the word “bat” without hurting himself or loved ones. In an age of specialization, there is no time for the pitcher to do both halves of his job. Assuming, as we have and can, that all variables in the future will remain as they are today, that the DH will never reach the NL and that robots will never replace human beings, it seems inevitable that the pitcher is doomed to become a literal, rather than a figurative, automatic out.

Comparing a pitcher’s batting average to their GIDP rate, however, tells us that we have 140 years before the danger of making two outs on contact causes pitchers to go full Gaedel. We’ve got plenty of time to kick that can down the road.

So baseball is essentially in decline, and math tells us that each game will someday consist of three home runs, four walks, and twenty-seven strikeouts by sixteen pitchers per team. The team that wins will be the one that happens to distribute its walks directly in front of its home runs on a given day. But on the bright side, it’ll still outlive football.





Patrick Dubuque writes for NotGraphs and The Hardball Times, and he is the former Bill Spaceman Lee Visiting Professor for Baseball Exploration at Pitchers & Poets. Follow him on Twitter @euqubud.


14 Responses to “Using Statistics to Forecast the Death of Baseball”

  1. Art Vandelay says:

    Seems legit.


  2. frivoflava29 says:

    “If this seems too complicated, don’t worry. Anything about a 0.3 or so is perfectly fine.”

    These are the exact words my marriage counselor used when my wife was asked to rate our relationship.


  3. BenRevereDoesSteroids says:

    One of these days no-hitters will be more common than complete games, because teams will start pitching their pitchers one time through the lineup each. The “Starting” pitcher will only be allowed to pitch 3 innings max, if he is perfect. They’ll do this for a variety of reasons.

    1. Limit pitcher injuries.
    2. Capitalize on the phenomenon of pitchers performing worse every time through the order.
    3. Suppress salaries, particularly arbitration numbers.
    4. They will be able to pinch hit for the pitchers’ spot in the order EVERY time through the lineup.

    I’m telling you guys, it is coming. I’ve been calling this for years. Never mind the tin foil on my head, dammit, this thing is COMING!


    • joser says:

      4. They will be able to pinch hit for the pitchers’ spot in the order EVERY time through the lineup.

      Uh, dude, a bunch of teams have been doing that since 1973.


    • Randy Marsh says:

      “We… We didn’t listen!”

      “Why didn’t we listen?”


    • ReuschelCakes says:

      2. Capitalize on the phenomenon of pitchers performing worse every time through the order.

      Well, worse than his first-through-the-lineup self… but also possibly much better than the alternative pitcher(s)…


  4. scb says:

    I’m disappointed that there is no mention of the era of negative triples predicted by your line in the post-2100 (apocalyptic?) landscape.


  5. Grant says:

    Curvy things to the rescue?!

