Bartolo Colon’s Amazing Streak

We already talked about Cliff Lee and Matt Cain’s pitchers’ duel for the ages, but that wasn’t even the most remarkable thing that took place last night. Over in Anaheim, Bartolo Colon was doing something that we might not see done again in our lifetimes.

In the fifth inning of last night’s game, Colon threw a first-pitch ball to Maicer Izturis. He wouldn’t throw another pitch that was called a ball until he faced Bobby Abreu in the eighth inning. Between Izturis and Abreu, he faced 11 batters and didn’t throw a single ball to any of them. His all-strikes, all-the-time approach lasted a remarkable 38 pitches. You can see all 38 of them in this video compiled by MLB.com.

How unlikely is that? Well, we can estimate the chances of an event occurring 38 times in a row using a mathematical tool called the binomial distribution. Essentially, the binomial distribution takes the probability of an event occurring on a single trial and tells you how likely that event is to happen a certain number of times in a given number of opportunities. In this case, the probability of Bartolo Colon throwing a strike on any given pitch is roughly 67 percent, based on his career strike percentage. In other words, out of every three pitches Colon throws, we’d expect two strikes and one ball.

Last night, we got 38 consecutive strikes without a ball. The binomial distribution tells us that the probability of that occurring, given what we know about Colon’s career strike percentage, is .67^38, or about 0.000000246. In other words, you’d expect to find one string of 38 consecutive strikes if you had a population of approximately 4.1 million 38-pitch strings thrown by Bartolo Colon. One in 4.1 million.
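
The arithmetic behind that figure can be checked with a short Python sketch. It assumes, as the estimate does, a constant 0.67 strike probability on every pitch and independence between pitches. The function name `binomial_pmf` and the variable names are illustrative choices, not part of any stats library.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

strike_rate = 0.67    # Colon's approximate career strike rate, per the article
streak_length = 38    # consecutive strikes thrown

# Probability that all 38 of 38 pitches are strikes.
p_streak = binomial_pmf(streak_length, streak_length, strike_rate)
one_in = 1 / p_streak

print(f"P(38 of 38 strikes) = {p_streak:.3e}")
print(f"Roughly 1 in {one_in / 1e6:.1f} million")
```

When the number of successes equals the number of trials, the binomial coefficient and the (1 - p) term both reduce to 1, so the binomial probability collapses to .67^38.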

Yeah. What Lee and Cain did was downright ordinary compared to what Colon did.

Update: As pointed out in the comments, I should have clarified that the binomial distribution assumes independence of events, where the result of one trial does not affect the probability of the next. It is not clear that balls and strikes are independent of the previous pitch, as batters are more likely to chase pitches out of the zone when they are behind in the count. Of course, pitchers are also less likely to groove one down the middle when they’re ahead in the count, so these effects may cancel out to some degree, but it’s not clear that the probability of a strike on each of those 38 pitches was indeed .67. So, consider this more of a rough estimate based on one model’s assumptions, which may or may not hold precisely true in an MLB game scenario.
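
To get a feel for how much the estimate depends on the assumed strike probability, here is a minimal sketch that recomputes the streak probability under a few different rates. Only the 0.67 figure comes from the article; the other rates are hypothetical values chosen for illustration, and the model still treats every pitch as independent.

```python
STREAK_LENGTH = 38

def streak_probability(strike_rate, length=STREAK_LENGTH):
    """Probability that `length` independent pitches are all strikes."""
    return strike_rate ** length

# 0.67 is the article's career figure; the other rates are hypothetical.
for rate in (0.60, 0.67, 0.70, 0.75):
    p = streak_probability(rate)
    print(f"strike rate {rate:.2f}: p = {p:.2e}, about 1 in {1 / p:,.0f}")
```

Shifting the assumed rate by a few points moves the answer by an order of magnitude or more, which is why the figure is best read as a rough estimate.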

We hoped you liked reading Bartolo Colon’s Amazing Streak by Dave Cameron!

Dave is the Managing Editor of FanGraphs.

Fiyaz Kanji
Member

This estimate is NOT accurate. The binomial distribution assumes that the events are statistically independent. In this case, they clearly are not.

Dave
Guest

What’s your reasoning? Each pitch is an independent event. Thus, the odds of 38 consecutive strikes are .67^38 = 2.46×10^-7, as above.

after9
Member

Dave, each pitch is a separate event, but a pitcher is more or less likely to throw a strike depending on the count, so basically you can’t just say Colon has an x% chance of throwing a strike and that’s it, because his probability of throwing a strike does depend on what he’s thrown before (the events are not statistically independent).

However, I’d argue that this fact makes it even more impressive, because a pitcher is more likely to throw a ball on, say, an 0-1 count than on a 1-0 count.

Finally, I’d also like to suggest that it’s impressive how many times Colon adjusted the right side of his jersey after each pitch.

Dave
Guest

Just in case it confused anyone, I should start by saying I didn’t write the article, I just happen to also be Dave. Should have chosen a different name, sorry. Anyhow, yes, it’s based on the assumption that he’s trying to throw a strike every time, rather than intentionally wasting some in 0-2 counts, etc. That assumption isn’t necessarily wrong, though.

Jonny's Bananas
Guest

Dave (poster) is correct here. The events are definitely independent. What you could argue is that the underlying distribution may not actually have a q of 0.67, because Colon’s historical strike ratio might not have been the exact probability of him throwing a strike yesterday (for instance, if he had been throwing to that umpire for his entire career maybe his historical strike ratio would have been 0.75). Regardless, determining the true underlying distribution is impossible, so 0.67 works for me.

The one (slight) change in the number is that the streak doesn’t start until Bartolo actually throws a strike. If you assume that he will throw a strike at some point during a game, the number in question is the probability of throwing 37 consecutive strikes given that he has already thrown one strike, or 0.67^37. This gives you 3.67 x 10^-7, or 1 in 2.7 million.
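
As a quick check of the arithmetic in this comment, the sketch below compares the 38-trial figure with the 37-trial figure that results from taking the first strike as given. It assumes the same constant 0.67 strike rate and independent pitches; the variable names are illustrative.

```python
STRIKE_RATE = 0.67  # career strike rate used in the article

p_38 = STRIKE_RATE ** 38  # all 38 pitches treated as trials
p_37 = STRIKE_RATE ** 37  # first strike taken as given, 37 more needed

print(f"38 trials: {p_38:.2e} (1 in {1 / p_38 / 1e6:.1f} million)")
print(f"37 trials: {p_37:.2e} (1 in {1 / p_37 / 1e6:.1f} million)")
```

Dropping one trial divides the probability by 0.67, so the two estimates differ by a factor of about 1.5.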

Jason
Guest

The events definitely are NOT independent. Independent events are like flipping a coin. The results of the previous flip can not possibly have an effect on the next outcome.

Pitching in baseball is nothing like this. The results of the previous pitch and the game situation certainly dictate how a pitcher approaches his next pitch. We all know this to be true. Hitters especially. If pitches were independent events, the count wouldn’t matter to a hitter. The count matters because a hitter can better PREDICT what type of pitch he will see based on it. The only reason he can do this is because the events are not independent.

Jonny's Bananas
Guest

You (and everyone else) are right that the events are not truly independent (I wasn’t really thinking about the count changing desired pitch location), but I think that there are a large enough number of variables in play that you aren’t going to do better for an estimation of the strike rate than a “normal” strike rate.