Bartolo Colon’s Amazing Streak

We already talked about Cliff Lee and Matt Cain’s pitchers’ duel for the ages last night, but that wasn’t even the most remarkable thing that happened. Over in Anaheim, Bartolo Colon was doing something that we might not see again in our lifetime.

In the fifth inning of last night’s game, Colon threw a first pitch ball to Maicer Izturis. He wouldn’t throw another pitch that was called a ball until he faced Bobby Abreu in the eighth inning. Between Izturis and Abreu, he faced 11 batters and didn’t throw a single ball to any of them. His all-strikes, all-the-time approach lasted a remarkable 38 pitches. You can see all 38 of them in this video compiled by MLB.com.

How unlikely is that? We can estimate the chances of an event occurring 38 times in a row using a mathematical tool called the binomial distribution. Given the probability of an event on a single opportunity, the binomial distribution tells you how often you’d expect that event to happen a certain number of times over a fixed number of opportunities. In this case, the probability of Bartolo Colon throwing a strike on any given pitch is roughly 67%. In other words, out of every three pitches Colon throws, we’d expect two strikes and one ball.

Last night, we got 38 consecutive strikes without a ball. The binomial distribution tells us that, given Colon’s career strike percentage, the probability of that occurring is 0.67^38, or about 0.000000246. In other words, you’d expect to find one string of 38 consecutive strikes in a population of approximately 4.1 million strings of 38 pitches thrown by Bartolo Colon. One in 4.1 million.
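For readers who want to check the arithmetic, here is a minimal Python sketch of the calculation, assuming (as the post does) that every pitch is an independent trial with the same 67% strike probability. The helper name `binomial_pmf` is illustrative, not from any library; Python’s `math.comb` supplies the binomial coefficient. Thirty-eight strikes in 38 pitches is the k = n case of the binomial formula, which reduces to p^38.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p): exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


P_STRIKE = 0.67  # Colon's approximate career strike rate, as used in the post
STREAK = 38      # consecutive strikes

# All 38 pitches being strikes is the k = n case, which reduces to P_STRIKE ** 38.
prob = binomial_pmf(STREAK, STREAK, P_STRIKE)
one_in = 1 / prob

print(f"P(38 straight strikes) = {prob:.3g}")      # about 2.46e-07
print(f"roughly 1 in {one_in / 1e6:.1f} million")  # about 1 in 4.1 million
```

Changing `P_STRIKE` shows how sensitive the result is to the assumed strike rate, which is the main caveat raised in the update and the comments.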

Yeah. What Lee and Cain did was downright ordinary compared to what Colon did.

Update: As pointed out in the comments, I should have clarified that the binomial distribution assumes independence of events, where the result of one trial does not affect the probability of the next. It is not clear that balls and strikes are independent of the previous pitch, as batters are more likely to chase pitches out of the zone when they are behind in the count. Of course, pitchers are also less likely to groove one down the middle when they’re ahead in the count, so these effects may cancel out to some degree, but it’s not clear that the probability of a strike on each of those 38 pitches was indeed .67. So consider this a rough estimate based on one model’s assumptions, which may or may not hold in an MLB game.

Dave is a co-founder of USSMariner.com and contributes to the Wall Street Journal.

77 Responses to “Bartolo Colon’s Amazing Streak”

  1. fiyazkanji says:

    This estimation is NOT accurate. Binomial distribution assumes that the events are not statistically dependent. In this case, they clearly are.

    • Dave says:

      What’s your reasoning? Each pitch is an independent event. Thus, the odds of 38 consecutive strikes is .67^38 = 2.46×10^-7, as above.

    • after9 says:

      Dave, each pitch is an independent event, but a pitcher is more likely to throw balls or strikes depending on the count, so you can’t just say Colon has an x% chance of throwing a strike and leave it at that. His probability of throwing a strike does depend on what he’s thrown before (the events are not statistically independent).

      However, I’d argue that this fact makes it even more impressive, because a pitcher is more likely to throw a ball on, say, an 0-1 count than on a 1-0 count.

      Finally, I’d also like to suggest that it’s impressive how many times Colon adjusted the right side of his jersey after each pitch.

      • Dave says:

        Just in case it confused anyone, I should start by saying I didn’t write the article, I just happen to also be Dave. Should have chosen a different name, sorry. Anyhow, yes, it’s based on the assumption that he’s trying to throw a strike every time, rather than intentionally wasting some in 0-2 counts, etc. That assumption isn’t necessarily wrong, though.

    • Jonny's Bananas says:

      Dave (poster) is correct here. The events are definitely independent. What you could argue is that the underlying distribution may not actually have a q of 0.67, because Colon’s historical strike ratio might not have been the exact probability of him throwing a strike yesterday (for instance, if he had been throwing to that umpire for his entire career maybe his historical strike ratio would have been 0.75). Regardless, determining the true underlying distribution is impossible, so 0.67 works for me.

      The one (slight) change in the number is that the streak doesn’t start until Bartolo actually throws a strike. If you assume that he will throw a strike at some point during a game, the number in question is the probability of throwing 37 consecutive strikes given that he has already thrown one strike, or 0.67^37. This gives you 3.67 x 10^-7, or 1 in 2.7 million.

      • Jason says:

        The events definitely are NOT independent. Independent events are like flipping a coin. The result of the previous flip cannot possibly have an effect on the next outcome.

        Pitching in baseball is nothing like this. The results of the previous pitch and the game situation certainly dictate how a pitcher approaches his next pitch. We all know this to be true. Hitters especially. If pitches were independent events the pitch count wouldn’t matter to a hitter. Pitch counts matter because a hitter can better PREDICT what type of pitch he will see based on the count. The only reason he can do this is because the events are not independent.

      • Jonny's Bananas says:

        You (and everyone else) are right that the events are not truly independent (I wasn’t really thinking about the count changing desired pitch location), but I think that there are a large enough number of variables in play that you aren’t going to do better for an estimation of the strike rate than a “normal” strike rate.

    • Dave Cameron says:

      Right, I should have clarified how binomial distribution works in the post rather than just settling for the link to the description. I’ve added an update at the end.

      I’d imagine that the odds of a ball or strike being called don’t actually vary that much by count, though. It’s an interesting thing to look into.

      • Bobby says:

        Seriously? I’d bet a pitcher throws a ball outside the strike zone 80%+ of the time on a 0-2 count against a non-pitcher.

      • Johnny Come Lately says:

        But what part of that 80% is because the strike zone gets a lot smaller on an 0-2 count?

      • Dave Cameron says:

        The question isn’t really where the ball is thrown, though. They might throw it out of the zone a lot more often on 0-2 counts, but hitters will also swing at those pitches more often. Since we’re just measuring ball/strike, not in zone/out of zone, the outcome of the pitch is what’s important, not the location.

  2. This Guy says:

    Also, this is the probability that Colon, starting from one particular pitch, will throw 38 straight strikes. The probability that someone, at some point, does this is a different question. That makes it a whole lot more likely we will see it in our lifetime…but still not very likely.

  3. WarehouseWorthy says:

    Dave, shouldn’t it be: 1 in 4.1 million strings of 38 pitches instead of just 4.1 million pitches.

    Correct me if I’m wrong…

  4. Matt says:

    But given that there are about 300 pitches thrown per game and 2,430 games per season, something that only happens once every 4 million pitches should turn up every 6 years or so.
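Matt’s back-of-the-envelope estimate can be written out as a short Python sketch. It assumes, as the comment does, 300 pitches per game, 2,430 games per season, and that every pitch league-wide is an independent 67% strike; all of these are simplifications. Counting every pitch as the start of a fresh 38-pitch window also overstates how often a distinct streak begins, since overlapping windows share most of their pitches.

```python
P_STRIKE = 0.67          # strike rate from the post (applied to every pitcher here)
STREAK = 38              # consecutive strikes
PITCHES_PER_GAME = 300   # Matt's rough figure
GAMES_PER_SEASON = 2430  # 30 teams x 162 games, two teams per game

pitches_per_season = PITCHES_PER_GAME * GAMES_PER_SEASON  # 729,000

# As in the comment, treat every pitch as the start of a 38-pitch window,
# so one all-strike window is expected every 1 / p**38 pitches.
pitches_per_streak = 1 / P_STRIKE**STREAK  # about 4.07 million
seasons_per_streak = pitches_per_streak / pitches_per_season

print(f"{pitches_per_season:,} pitches per season")
print(f"about one streak every {seasons_per_streak:.1f} seasons")  # about 5.6
```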

  5. Jason says:

    Dave,

    One way to think about low probability events calculated from a model (binomial in this case) is that the assumptions of the model aren’t actually reflecting what is real. If you consider the binomial the null hypothesis, you might think that something else is actually going on.

    The situation that the binomial models is very simple. Each pitch is released and there is a 67% chance of it being a strike and a 33% chance of it being a ball. Each pitch is independent from all other pitches.

    In reality, the assumption of independence between pitches is false. Pitch counts and batter reactions certainly affect pitch choice and the location of subsequent pitches.

    Also, in reality, intent is important. Not every pitch is actually meant to be thrown in the strike zone.

    I’m not trying to argue that Colon’s strike throwing is not remarkable. It just probably isn’t as unlikely as you suggest.

  6. Kevin says:

    Pitch 39 looked pretty close. You’d think after 38 straight strikes the ump would give him the benefit of the doubt there!

  7. Lou Kemia Reeterns says:

    Odds?

  8. Big Daddy V says:

    You forgot the best part of this statistic.

    The previous record holder was… Tim Wakefield.

  9. Mick O says:

    [At some point in late July 2012, as Evan Longoria is trotting around third after hitting a three-run bomb in the first on an 0-2 fastball right down the middle]

    Kurt Suzuki: I called for something out of the zone. I thought we’d waste one there.

    Bartolo Colon: Surely you jest, sir. Why, we all know that the baseballs that I throw are governed purely by my statistical history. I implore you to revisit Cameron’s fine piece from back in April. My pitches are never a result of game situation or conscious decision on my part. I can no more choose to throw a ball in or out of the strike zone than I can choose the color of the gumballs that come out of that infernal contraption in the clubhouse when I deposit a hopeful solitary nickel. It is merely a function of the laws of the universe. There was a roughly 67% chance that pitch was going in the strike zone. And that, my esteemed colleague, is exactly what happened.

    • Bill says:

      Love it, but I think Dave does answer this objection even if he doesn’t substantiate it – batters are more likely to swing at a ball at 0-2 than they are at 0-0 or 0-1.

  10. Jason says:

    After watching the video, someone has to tell Bartolo to stop throwing so many 0-2 strikes! Way too many hard hit balls on 0-2.

  11. L.UZR says:

    Throw 39 pitches to Yuniesky and the record is yours.

    • chuckb says:

      That’s like 25 PAs though and he’s not going to get to hit 25 consecutive times.

      A team full of Yunis and Colon could probably throw a perfect game in less than 50 pitches.

    • Dave Cameron says:

      Actually, it looks like that query is just picking up games where one team threw exactly 38 strikes in a game, not 38 consecutive strikes.

      • This Guy says:

        If it is games with exactly 38 strikes being thrown (by one team), that would be interesting in its own right.

      • Oliver says:

        Oh weird, it linked to team streak rather than individual streak of 38 strikes.

        It can’t find a single instance where a player threw 38 consecutive strikes, presumably because Colon’s start isn’t in there yet, and it can’t find anyone with ≥38 strikes in a row. So what was the most prior to this start?

      • Oliver says:

        The previous “record” was held by Tim Wakefield, who threw 30 straight strikes in 1998.

        So Colon beat his record by quite a bit.

      • Johnny Come Lately says:

        I think it’s even more amazing that Wakefield was the previous record holder. A knuckleballer with 30 consecutive strikes! Wow.

  12. statmind says:

    Where does the 67% come from? Wouldn’t we be using his Zone% for this, which is 54% for his career (and 2012)?

  13. markatoolio says:

    Dave Cameron — instead of using 67% strike rate, shouldn’t you use 1 minus the odds that Colon throws a ball for the % you use in your calculation?

    It seems like you are not accounting for pitches put in play that may have been balls. I don’t know how the data works though and I’m sure you do. For example, if a batter is beaned, is that pitch considered a ball or if a pitch is put in play is it considered either a strike or a ball?

    I’m sure I just don’t know as much about the data you are working with so set me straight!

  14. markatoolio says:

    Also, what are two-strike pitches that are fouled off considered in your data?

  15. kdm628496 says:

    so who wants to find the correlation between a strike in pitch(n) and a strike in pitch(n+1)? not it.
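The question above amounts to a lag-1 correlation over a pitch-by-pitch sequence. Here is a minimal Python sketch, assuming pitches are encoded as 1 for a strike and 0 for a ball; the function name and the simulated data are hypothetical, not real pitch-by-pitch records. Under the post’s independence assumption the correlation should be close to zero, so a clearly nonzero value on real data would point to the count effects debated in this thread.

```python
import random
from typing import Sequence


def lag1_correlation(pitches: Sequence[int]) -> float:
    """Pearson correlation between pitch n and pitch n+1 (1 = strike, 0 = ball)."""
    x = pitches[:-1]  # outcome of pitch n
    y = pitches[1:]   # outcome of pitch n+1
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("need both strikes and balls to compute a correlation")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical data: 100,000 simulated pitches that are independent 67% strikes,
# i.e. exactly the post's model. Real pitch-by-pitch data would replace this list.
rng = random.Random(2012)
simulated = [1 if rng.random() < 0.67 else 0 for _ in range(100_000)]
print(round(lag1_correlation(simulated), 3))  # close to 0 under independence
```

A perfectly alternating sequence such as `[1, 0, 1, 0, 1, 0]` gives a correlation of -1, which is a quick sanity check that the function is wired correctly.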

  16. Rich Johnson says:

    Getting a bit off topic, but the former Angels Cy Young winner handed it to an Angels team that’s trying hard to find rock bottom in a different way each night. Scioscia is struggling with his juggling act concerning the bullpen and lineup. My statistical point is as follows: Colon’s performance against another silly-looking lineup by the Angels has me curious whether Scioscia will actually outdo Eric Wedge’s (SEA) 152 different lineups from 2011. Somehow Colon’s performance has me obsessing about lineup construction so much so that I mulled over manager tendencies in the 2012 Bill James Handbook for quite a while this morning. What is the record for most different lineups in a single 162-game season? Mike averages about 120 a year. I believe his high was 133 in 2010, but I’m thinking 140+ is a slam dunk this season.

  17. RJ says:

    Can we just agree it was VERY VERY impressive and VERY VERY rare and move on?

  18. mcbrown says:

    “consider this more of a rough estimate based on one model’s assumptions”

    I’d say it is more like a very generous lower bound. But still impressively rare.

    I’d venture a guess that the reason we don’t see it more often isn’t because it is hard (though I’m sure it’s somewhat hard) but because pitchers generally don’t find it advantageous to throw so many strikes consecutively, and therefore never give themselves the opportunity to accumulate streaks like this. If Roy Halladay decided he would not have a pitch called a ball in his next start, I’m confident he could make that happen.

    • Cream says:

      Roy Halladay is great. He has excellent control and command of his pitches.

      That said, this is a stupid assertion.

      • mcbrown says:

        Why? If the objective is not to minimize runs scored but simply to not have a ball called, do you really think someone like Halladay couldn’t accomplish the task?

      • Rick says:

        Colon did it without allowing a run

      • mcbrown says:

        @ Rick: As I said, it is impressive. I’m just suggesting that it’s more impressive in a “Hey, isn’t it kooky that we just witnessed that?” kind of way than in a “Bartolo Colon just accomplished something that no other pitcher in the history of the world has ever been able to do!!!” way.

      • Rick says:

        It was kooky. But we’ll see other things like perfect games time and again; this we have never seen, and never will again (I’m presuming). It was impressive in a kooky/truly unbelievable kind of way.

      • mcbrown says:

        That I can definitely agree with.

  19. Eric Cioe says:

    Almost 40 comments and no joke about “COLON STREAK”?!

  20. Michael says:

    Other than the obvious dependence problems, this is also a multiple comparisons issue; otherwise your probability is simply the probability that he ran off such a streak at that specific time. Since he’s thrown just under 25,000 pitches in his career (if I’m right), he’s had about that many strings of 38 consecutive pitches, so the odds that he’d do it at some point in his career are considerably higher (a conservative Bonferroni correction would be 25,000 × 1/4.1 million = approx. 0.6%). Since empirically this feat appears much rarer in reality (per pitcher with a career as long as Bartolo Colon’s) than that estimate, it might suggest that the probability you suggested is on the high side (i.e., the dependence actually makes it much harder to throw that many consecutive strikes). This might imply that the argument for dependence improving the probability (that a pitcher can control pitches and might get in a “groove”) is outweighed by the argument for dependence hurting the probability (that a pitcher will be likely to throw more balls when he gets into favorable waste-pitch counts).
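The multiple-comparisons point can be made concrete. Below is a minimal Python sketch, assuming independent pitches, the post’s 67% strike rate, and the comment’s rough figure of 25,000 career pitches; the function name is hypothetical. It computes the exact chance of at least one run of 38 or more strikes by tracking the current run length pitch by pitch, and compares it with the Bonferroni (union) bound of 25,000 × 1/4.1 million. Under these assumptions the exact figure comes out to roughly 0.2%, below the bound of about 0.6%, because overlapping 38-pitch windows share most of their pitches and a new streak can only begin right after a ball.

```python
def prob_run_at_least(n: int, k: int, p: float) -> float:
    """P(at least one run of >= k strikes in n independent pitches, each a strike with prob p)."""
    # state[r] = P(no k-run yet and the current run of strikes has length r), r = 0..k-1
    state = [0.0] * k
    state[0] = 1.0
    hit = 0.0  # probability mass that has already completed a run of k
    for _ in range(n):
        new = [0.0] * k
        for r, mass in enumerate(state):
            new[0] += mass * (1 - p)    # a ball resets the run to 0
            if r + 1 == k:
                hit += mass * p         # a strike completes a run of k
            else:
                new[r + 1] += mass * p  # a strike extends the run
        state = new
    return hit


P_STRIKE = 0.67
STREAK = 38
CAREER_PITCHES = 25_000  # the comment's rough career total

union_bound = CAREER_PITCHES * P_STRIKE**STREAK  # Bonferroni: sum over all windows
exact = prob_run_at_least(CAREER_PITCHES, STREAK, P_STRIKE)

print(f"union bound:               {union_bound:.2%}")
print(f"exact, under independence: {exact:.2%}")
```

The dynamic-programming approach keeps only 38 numbers per pitch, so it runs quickly even for a full career, and it is exact under the independence model rather than a bound.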

  21. Rick says:

    This streak only seems possible if Leslie Nielsen (aka Enrico Palazzo) is umpiring.
    Seriously, this streak is so freaking amazing, it’s nearly impossible to fathom. Going through the pitch-by-pitch for the rest of this same game for all other pitchers after the streak ended, I think the longest consecutive strike streak is 3 (that’s 35 fewer pitches).

  22. Dennis says:

    I believe Joe DiMaggio’s hit streak ends before Colon’s consecutive strike streak.

  23. Rob says:

    Colon’s “streak” is meaningless without compares.

  24. Jason says:

    The more I think about this streak, the more it suggests to me that Bartolo Colon is not a very intelligent pitcher. If you watch the footage, Bartolo is not dominating the Angels hitters. In fact he is getting hit quite hard. He gave up several hard-hit balls, some of which were on counts that were very favorable to him.

    The lack of independence is important. Not because Dave used an inappropriate model to calculate the probability of the event, but because it says something about baseball strategy. Pitchers do not try to throw all strikes (in fact I have no doubt that Mariano Rivera could break Colon’s record tonight if that was his goal and he didn’t care about run prevention). Pitchers try to prevent runs and this often means pitching on the corners and intentionally throwing pitches outside the strike zone.

    Colon’s streak is remarkable because he seemed completely oblivious to the pitch count. …the only pitchers I can think of who are oblivious to pitch count are knuckleballers. They throw a knuckleball every pitch and generally aim to throw it for a strike every pitch. In retrospect it is actually NOT surprising that Wakefield was the previous record holder despite how difficult it is to throw strikes with a knuckleball. He’s one of the only pitchers in baseball who tries to do it!

    My hypothesis predicts that Wakefield has many more long runs of consecutive strikes than other pitchers. It would be interesting to compare runs of strikes for Wakefield versus other pitchers known for their control (say Maddux or Halladay). We might be surprised to learn that Wakefield is actually more likely to reel off long runs of strikes.

    …of course, this makes you wonder, what the hell was Bartolo doing out there?

    • pas299 says:

      He had a four run lead through most of the streak, and only one runner got into scoring position during it. That matters too–as you said, lack of independence. I haven’t watched a million Colon starts, but I’ve watched somewhere between five and ten in the last two years, and I watched the game last night. He was hit hardest during the streak and definitely got lucky–if it wasn’t an April night at low elevation he might have given up two home runs–but I would imagine he would have pitched differently if he had ever been in real trouble. He tends to start throwing more low changeups and sliders late in close games, and when he has runners in scoring position. And while it’s hard to get a sense of pitching intelligence, because the combination of movement and command are not the same as intelligence but can seem like it, I’ve always thought Colon uses his two-seamer “intelligently.”

      Having said all that, I agree that he somewhat overdid it. No reason why he couldn’t mix in some 0-2 changeups in the dirt or fastballs off the plate and see if someone would bite. You may or may not be aware that Colon has gotten considerably more strikeouts looking and fewer strikeouts swinging than the average pitcher since coming back last year. I’m not sure how much that’s a fluke and how much it’s the two-seamer, the low percentage of offspeed pitches, and the general strike-throwing approach.

      Good thought about Wakefield btw; seems at least possible.

      • Jason says:

        pas,

        I got to watch Bartolo pitch all year last year. He still has really good stuff and top command. He did seem like he gave up a maddening number of 0-2 hits. …of course this may just be because he starts every batter off 0-2 and has more chances to give up 0-2 hits than other pitchers. …or it could just be my perception bias.

  25. Non Sequitur Guy Says says:

    It must be indicative of something, besides the redistribution of wealth.

  26. Ricky says:

    I could throw 38 straight 75mph fastballs that hitters could just crush every time.

  27. Jesse says:

    Hahaha, the announcers were calling strikes before they crossed the plate. Awesome.

    God I love baseball, and this site. I’m having a statgasm here.

  28. Austen says:

    I was at the game and I remember looking up during the 2nd inning and seeing Colon had a strike to ball ratio of something like 14:14 and became concerned that he was going to start walking guys. Then I remember looking up and he was at something ridiculous like 72:20 and suddenly realized I couldn’t remember the last ball he had thrown.
    It was remarkable, he was just throwing it right down the middle and the Angels hitters were just letting the first 2 go by them, AB after AB.
    Incredible. Between that and the Cespedes jack, all I need now is to watch a perfect game in person.

  29. Strikes schmikes.

    What’s the record for consecutive pitches called a ball?

    Will the name Mark Wohlers appear?

  30. MD says:

    Ok, so not really a helpful comment, but can anyone confirm the date of Wakefield’s 30 straight strikes? I think I was at that game, but I can’t find an actual date online (or even whether it was at Fenway).

  31. Bryan says:

    As a Red Sox fan, I’m waiting to see someone throw two consecutive strikes this year.

  32. Bartolo Colon 300/300 Club says:

    Guys have a hard time picking up the two-seamer with the release point just above his big gut.

  33. Gman says:

    I watched this game and it was pretty amazing. As I watched the Angels hitters keep taking that 89 mph heater for strike one time after time, I started thinking: has Colon figured out something here? The hitting trend toward being more disciplined and working counts is certainly at its peak in MLB history. There used to be a whole lot more first-pitch fastball swingers before Moneyball/OBP/OPS than there are today. Now it has become so frowned upon by teams (and fans) when a guy makes an out on one pitch that I think the pendulum has swung firmly in favor of pitchers doing exactly what Colon did.

    Colon looked like he knew he could just toss that thing in the strike zone with complete impunity. He threw some quality 1st pitch strikes but he also put a whole lot of them in very hittable spots. I was just amazed that no one was going up there ready to swing, especially once he got up to 12-15 straight strikes. So many of them were fastballs too. When it got to 25 and they were still taking I was dumbfounded.

    Everyone is focused on the statistical rarity of what Colon did, but the real question to me is how did he get away with it and why? There’s definitely a lesson to be learned by pitchers if they watch the replay of what Colon did. They’ll learn they can get a lot more of the plate with their first pitch against today’s hitters than they realize.

  34. eastsider says:

    Sorry for coming to this late. I think another way of looking at this would be what is the longest sequence of strikes you would expect to see given n number of pitches and p probability of a strike being thrown. The equation is:

    R = log_{1/p}(n) = ln(n) / ln(1/p)

    Using the 67% strike rate used in the article and a guess of 680,000 pitches in a year (2430 games with 280 pitches/gm) you’d expect to see 33.5 strikes in a row thrown every year.

    There are a lot of assumptions built in, both mentioned in the comments above and in the application of this equation. And, of course we don’t see 33 strikes in a row every season so all of those caveats – pitches are not independent, it isn’t beneficial to throw a strike every time, etc – are reasonable points.

    In case you are wondering, the number of pitches doesn’t matter much. It is a logarithmic equation, so multiplying the pitches tenfold, to 6.8 million, only raises the expected streak to about 39.3 strikes in a row. But the probability is a huge assumption. Put that at 50% and you’d expect to see just 19.4 strikes in a row every season.
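The rule of thumb in this comment is easy to evaluate directly. Here is a minimal Python sketch, assuming independent pitches and the comment’s own inputs (a 67% strike rate and 2,430 games × 280 pitches); `expected_longest_run` is a hypothetical helper name, and the formula is the common approximation for the longest run in n trials, not an exact expectation.

```python
import math


def expected_longest_run(n_pitches: int, p_strike: float) -> float:
    """Rule-of-thumb length of the longest strike streak in n independent pitches:
    R = log base (1/p) of n = ln(n) / ln(1/p)."""
    return math.log(n_pitches) / math.log(1 / p_strike)


PITCHES_PER_SEASON = 2430 * 280  # the comment's guess: 680,400 pitches

print(round(expected_longest_run(PITCHES_PER_SEASON, 0.67), 1))       # 33.5
print(round(expected_longest_run(10 * PITCHES_PER_SEASON, 0.67), 1))  # 39.3
print(round(expected_longest_run(PITCHES_PER_SEASON, 0.50), 1))       # 19.4
```

Because R grows with the logarithm of n, a tenfold increase in pitches adds only log_{1/0.67}(10), or about 5.7 strikes, while changing the strike probability moves the answer far more.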

  35. Aj Grands says:

    I’m trying to pass my stat class by taking all the concepts I can’t seem to understand and applying them to baseball. So I figured out binomial distribution by figuring out the probability of Colon’s streak, and then, after all that work…found this article.
