The Odds of Hitting for the Cycle

Last week, Mike Trout hit for the cycle. When asked for a comment, manager Mike Scioscia said, “If I’m a betting man, I’ve got to believe there’s another cycle in his career somewhere.” That got me wondering.

Whenever I was in a math class where probability was being discussed, the question often in the back of my mind was, “How can this be applied to baseball?” One of the things I love the most about baseball is how well it lends itself to situations of probability, compared to most sports. I’m not sure what that says about me. Anyway, I figured this would be the perfect opportunity to refresh my memory (and hopefully some of yours) on how to crunch the numbers on situations like this. Don’t worry — the principles work on useful things other than just calculating the odds of that gimmicky achievement we call the cycle.

OK, let’s get right down to the math. This won’t be too hard, really. Kind of long, but hopefully worth learning.

First example: say a batter gets a hit 40% of the time overall, and makes an out the remaining 60% of the time. Let’s break down the odds of how 2 plate appearances of his will turn out:

Possibility      PA #1 Result   PA #2 Result   PA #1 Odds   PA #2 Odds   Combined Odds
Possibility #1   Hit            Hit            40%          40%          16%
Possibility #2   Hit            Out            40%          60%          24%
Possibility #3   Out            Hit            60%          40%          24%
Possibility #4   Out            Out            60%          60%          36%
Total                                                                    100%

For example, the odds of both PAs resulting in hits are 40% multiplied by 40%, which equals 16%, or 0.16. But there are two ways (“permutations”) to get one hit and one out between the two PAs, and each has a 24% chance of occurring… so, together, there’s a 48% chance this player will bat .500 over his two PAs. The remainder is the 36% chance of going 0-for-2.
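Here is a minimal Python sketch of that two-PA breakdown. The variable names are just illustrative; the only inputs are the 40%/60% rates from the example.

```python
from itertools import product

# Per-PA odds from the example: hit 40% of the time, out 60% of the time.
probs = {"Hit": 0.40, "Out": 0.60}

# Enumerate every ordered sequence of two PAs and multiply the per-PA odds.
sequences = {}
for seq in product(probs, repeat=2):
    p = 1.0
    for outcome in seq:
        p *= probs[outcome]
    sequences[seq] = p

for seq, p in sequences.items():
    print(seq, f"{p:.0%}")

# Group the sequences by number of hits (0-for-2, 1-for-2, 2-for-2).
by_hits = {}
for seq, p in sequences.items():
    hits = seq.count("Hit")
    by_hits[hits] = by_hits.get(hits, 0.0) + p

print(by_hits)
```

The two orderings with one hit each contribute 24%, which is where the 48% chance of going 1-for-2 comes from.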

The example is a really simple one… it gets a lot more complicated when you’re dealing with, say, 7 PAs, and considering the odds of a single, double, triple, etc. This being math and all, of course there are formulas you can use as shortcuts for coming up with the number of permutations. The formula for coming up with the total number of permutations is: n^r (n to the power of r), where n is how many types of things we’re considering (in the simple example, it’s 2 — hits and outs) and r is how many events we’re looking at (2 PAs in the example). 2^2 = 4 total permutations here. If we were considering singles, doubles, triples, homers, and outs as the only possible outcomes (there are 5 of them), and were analyzing the possible ways these could be arranged in a span of 7 PAs, the answer would be 5^7 = 78,125. So, yeah, that wouldn’t be fun to calculate by hand.
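If you'd rather let a computer do the counting, a quick sketch like this one (Python, with outcome labels I made up for illustration) confirms that n^r matches a brute-force enumeration:

```python
from itertools import product

# The five simplified outcomes and the 7-PA span from the paragraph above.
outcomes = ["1B", "2B", "3B", "HR", "Out"]
n, r = len(outcomes), 7

# Shortcut formula: n to the power of r.
count_formula = n ** r

# Brute force: actually walk through every possible 7-PA sequence.
count_enumerated = sum(1 for _ in product(outcomes, repeat=r))

print(count_formula, count_enumerated)
```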

That formula, by the way, is specifically for situations where repeats are allowed (a.k.a. “with replacement”); since there’s nothing really making it impossible for a hitter to get several outs in a row, we can use this here. However, when it comes to breaking down the number of specific types of permutations (e.g. 1 hit and 1 out over 2 PAs), there’s another formula we should consider: r!/(r1! * r2! * … * rn!). By the way, I saw this formula written with n’s instead of r’s, but I think that’s just confusing, since the variable here is the number of events. The exclamation mark stands for factorial, which tells you to multiply that number by all the positive integers that come before it; e.g. 4! = 1 * 2 * 3 * 4 = 24 … in Excel, =FACT(4) will do the trick. The r in the numerator is the total number of events, and the different r’s in the denominator represent how many instances there are of each type of event. I think that could use an example:

So if we’re talking about a cycle happening over the course of six plate appearances, since the cycle is achieved in only four of those PAs, we have two “spare” PAs to consider. Let’s simplify the possible outcomes to 1B, 2B, 3B, HR, and non-hits. Possibilities for those two spares include:

• 2 singles
• 1 single, 1 double
• 1 single, 1 non-hit
• 2 non-hits

… and you can imagine the rest. But let’s look at each of those. If the two spares are both singles, then there are a total of three singles in the six-PA sample. There are only one each of doubles, triples, and HR, and no non-hits in that situation.

1! and 0! both equal 1, which means we can ignore everything but singles in the denominator. If we wrote it all out, though, it’d look like 6!/(3! * 1! * 1! * 1! * 0!) … notice the different r’s in the denominator add up to the big r in the numerator. Simplifying down, the formula we end up with is 6!/3! = 120 permutations. That means there are 120 possible sequences of 6 PAs that could result in 3 singles, 1 double, 1 triple, and 1 HR. You’ll see why that’s relevant in a second.
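For anyone who prefers code to Excel, here is a small sketch of the r!/(r1! * r2! * … * rn!) formula in Python. The function name `arrangements` is my own, not anything standard.

```python
from math import factorial

def arrangements(counts):
    """Number of distinct orderings: r! / (r1! * r2! * ... * rn!)."""
    total = factorial(sum(counts))   # r! in the numerator
    for c in counts:
        total //= factorial(c)       # divide out each type's repeats
    return total

# 3 singles, 1 double, 1 triple, 1 HR, 0 non-hits over 6 PAs
print(arrangements([3, 1, 1, 1, 0]))
```

Types that occur once or not at all divide by 1, which is why only the repeated outcome matters in the denominator.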

OK, let’s say we’re dealing with a hitter who singles in 20% of his PAs, doubles 5%, triples 1%, and homers 9% of the time. We start finding the odds of him hitting for the aforementioned combination by doing: .2 * .2 * .2 * .05 * .01 * .09 = 0.00000036. Not very likely, right? Well, that’s really the probability of each possible arrangement of that combination, and we found there are 120 of those. So multiply that result by 120 to show that he has an overall 0.0000432 chance (or 0.00432%) of hitting 3 singles, 1 double, 1 triple, and 1 HR over the span of 6 PA.

If we’re talking about a 6-PA sequence with 2 singles, 2 doubles, 1 triple, and 1 HR, that’s 6!/(2! * 2!) = 180 permutations. The odds are therefore .2 * .2 * .05 * .05 * .01 * .09 * 180 = 0.0000162.

If it’s 2 singles, 1 double, 1 triple, 1 HR, and 1 non-hit, then there’s only one repeat, and it’s 6!/2! = 360 permutations. The odds of a “non-hit” are 1 - .2 - .05 - .01 - .09 = 0.65. So our odds are .2 * .2 * .05 * .01 * .09 * .65 * 360 = 0.0004212. A lot likelier than going 6-for-6, right?

Finally, let’s consider that the 2 PAs other than the cycle are non-hits. The non-hits are the repeat this time, so it’s again 6!/2! = 360 possibilities. Now, .2 * .05 * .01 * .09 * .65 * .65 * 360 = 0.0013689. That’s by far the likeliest of these ways to get a cycle in 6 PAs.
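The four worked examples above can be reproduced with a short Python sketch. The rates are the example hitter's from the text; `combo_odds` and the dictionary layout are my own naming.

```python
from math import factorial, prod

# Example hitter: 20% singles, 5% doubles, 1% triples, 9% homers.
rates = {"1B": 0.20, "2B": 0.05, "3B": 0.01, "HR": 0.09}
rates["Out"] = 1 - sum(rates.values())   # the remaining 65% are non-hits

def combo_odds(counts):
    """Odds of a given mix of outcomes, in any order."""
    orderings = factorial(sum(counts.values()))
    for c in counts.values():
        orderings //= factorial(c)
    # Probability of one specific ordering of this mix.
    one_ordering = prod(rates[k] ** c for k, c in counts.items())
    return orderings * one_ordering

examples = {
    "3 singles":           {"1B": 3, "2B": 1, "3B": 1, "HR": 1},
    "2 singles, 2 2B":     {"1B": 2, "2B": 2, "3B": 1, "HR": 1},
    "2 singles, 1 non-hit": {"1B": 2, "2B": 1, "3B": 1, "HR": 1, "Out": 1},
    "2 non-hits":          {"1B": 1, "2B": 1, "3B": 1, "HR": 1, "Out": 2},
}
for label, counts in examples.items():
    print(f"{label}: {combo_odds(counts):.7f}")
```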

Now you have to repeat the process for combos involving things like 3 triples (not likely), 2 doubles & 2 HRs, etc., but hopefully you get the point. The permutations will follow the same patterns, but the odds calculations will differ. But after you figure them all out, you add them all up and it gives you the total odds of that player hitting for the cycle given that many PAs.
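Repeating the process for every combination is tedious by hand but easy to loop over. Here is a rough sketch, again for the example hitter rather than any real player, that walks through every mix containing at least one of each hit type and adds up the odds. The function name `cycle_odds` is hypothetical.

```python
from itertools import product
from math import factorial, prod

rates = [0.20, 0.05, 0.01, 0.09]   # 1B, 2B, 3B, HR for the example hitter
out_rate = 1 - sum(rates)          # non-hits

def cycle_odds(n_pa):
    """Total odds of at least one 1B, 2B, 3B and HR in n_pa PAs."""
    total = 0.0
    # Every hit type must occur at least once; leftover PAs are non-hits.
    for hits in product(range(1, n_pa + 1), repeat=4):
        outs = n_pa - sum(hits)
        if outs < 0:
            continue               # more hits than PAs, not possible
        orderings = factorial(n_pa)
        for c in list(hits) + [outs]:
            orderings //= factorial(c)
        p = prod(r ** c for r, c in zip(rates, hits)) * out_rate ** outs
        total += orderings * p
    return total

for n in range(3, 8):
    print(n, f"{cycle_odds(n):.6f}")
```

With only 3 PAs no combination qualifies, so the total comes out to zero, and the odds climb as the number of PAs goes up.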

The next step is doing the same sort of procedure for different given PA levels. You know that a cycle is impossible if you only get 3 PAs in a game, so you can skip that. At 4 PAs, it’s really simple: there’s only one combination that allows you to get the cycle, and there are no repeats in it, so there are 4! = 24 permutations of {1B, 2B, 3B, HR}. At 5 PAs, we’ll either have 1 non-hit in the mix (5! = 120 permutations) or 1 repeat of a 1B, 2B, 3B, or HR (5!/2! = 60 permutations for each type of repeat). And so on. After that, we need to find out how likely a player is to get 4 PAs, 5 PAs, 6 PAs, etc. in a game. I just did an analysis on 2012 Retrosheet data and found (by lineup position):

# of PAs in Game Leadoff 2nd 3rd 4th 5th 6th 7th 8th 9th Average
2 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.1% 0.1% 0.1% 0.0%
3 0.2% 0.5% 1.4% 3.2% 5.8% 10.7% 17.0% 25.7% 34.8% 11.0%
4 44.5% 52.6% 60.1% 65.7% 69.6% 69.9% 67.7% 62.7% 56.2% 61.0%
5 48.6% 41.5% 34.2% 27.8% 21.8% 17.0% 13.3% 9.8% 7.4% 24.6%
6 5.5% 4.3% 3.4% 2.7% 2.3% 1.9% 1.6% 1.4% 1.2% 2.7%
7 0.9% 0.7% 0.6% 0.5% 0.4% 0.4% 0.3% 0.3% 0.3% 0.5%
8 0.3% 0.2% 0.2% 0.1% 0.2% 0.1% 0.1% 0.1% 0.1% 0.2%
9 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0%
10 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0%

As you can see, hitting towards the top of a lineup can make a huge difference in how often a player will get those crucial 5+ PA games.
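To show how the PA table gets combined with the per-PA-level odds, here is a sketch that weights each PA level by how often it happens. Two assumptions to flag loudly: it uses the example hitter's rates from earlier (not the MLB averages behind the results below), and it computes the odds at each PA level with an inclusion-exclusion shortcut instead of listing every combination as the walkthrough does. The two approaches should agree; the shortcut is just shorter to write.

```python
from itertools import combinations

# Example hitter from the walkthrough (assumption: NOT the MLB averages).
rates = [0.20, 0.05, 0.01, 0.09]   # 1B, 2B, 3B, HR

def cycle_odds(n_pa):
    """Chance that all four hit types show up at least once in n_pa PAs."""
    total = 0.0
    for k in range(len(rates) + 1):
        for missing in combinations(rates, k):
            # Inclusion-exclusion over which hit types never occur.
            total += (-1) ** k * (1 - sum(missing)) ** n_pa
    return total

# Share of games by number of PAs, copied from the table (3 through 7 PAs).
pa_share = {
    "Average": {3: 0.110, 4: 0.610, 5: 0.246, 6: 0.027, 7: 0.005},
    "Leadoff": {3: 0.002, 4: 0.445, 5: 0.486, 6: 0.055, 7: 0.009},
}

def per_game_odds(spot):
    """Weight the odds at each PA level by how often that level occurs."""
    return sum(share * cycle_odds(n) for n, share in pa_share[spot].items())

for spot in pa_share:
    p = per_game_odds(spot)
    print(f"{spot}: {p:.4%} per game, about once every {1 / p:,.0f} games")
```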

The Results

Using the 2012 PA breakdowns from above and MLB averages for 2010 through last week, I found the average odds of hitting for the cycle for a hitter with an “average” lineup spot to be about 0.0044% per game, or about once every 23,000 games. Well, maybe you can bump those odds up a little bit, because I didn’t consider the results of 8 PAs in a game and beyond. For a leadoff hitter (with the same MLB average stats, not with typical leadoff-hitter stats), it would be about 0.0071% per game, or close to once every 14,000 games.

But it turns out that Trout does indeed appear to be the likeliest batter in the majors to hit for the cycle, holding lineup position equal. He’s been hitting 2nd in the lineup recently, but if he were hitting first, you might figure him for a (relatively) whopping 0.0375% chance of the cycle per game, or better than once per 2,700 games, based on his career rates. Add in the fact that he’s in a good lineup, and should therefore get more PAs than average, and things look even better for him. But even if we optimistically put him at a cycle per 2,500 games, that’s about once every 16 seasons, on average. Since triples are the hardest part of hitting for the cycle, we have to wonder how easy it will be for a bulky Trout to leg out a triple as he advances in age. It’s not going to get any easier, that’s for sure. So, sure, there’s a pretty good chance he’ll have another cycle, relative to most players, but it’s probably a 50-50 shot, at best.
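One way to sanity-check that 50-50 guess is to ask how many games it takes before the chance of at least one cycle reaches 50%. This sketch assumes every game is independent and that the optimistic 1-in-2,500 rate never changes, which (given the point about aging and triples) is probably generous.

```python
from math import log

# Optimistic per-game figure from the paragraph above (assumed constant).
p_game = 1 / 2500

def at_least_one(games):
    """Chance of one or more cycles in a given number of games."""
    return 1 - (1 - p_game) ** games

# Number of games needed for an even chance of at least one cycle.
games_for_even_odds = log(0.5) / log(1 - p_game)

print(f"{games_for_even_odds:,.0f} games for a 50% chance")
print(f"{at_least_one(2500):.1%} chance of at least one in 2,500 games")
```

Note that "once per 2,500 games on average" does not mean a cycle is certain within 2,500 games; under these assumptions it is a bit under two chances in three.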

Oh, and 4th place on that list, by the way, is Bryce Harper, at better than once per 3,600 games. The projections see him hitting fewer triples than he showed us last year, though. Here are your top 25 by chance of a cycle per game (based on 2010-present historical numbers, signified by “H”, or by the average of the updated Steamer and ZiPS 2013 projected numbers, signified by “P”, with a 400 PA minimum):

Player Leadoff (H) Leadoff (P) Mid-lineup (H) Mid-lineup (P)
Mike Trout 0.03754% 0.04023% 0.02331% 0.02495%
Tyler Colvin 0.03452% #N/A 0.02133% #N/A
Carlos Gonzalez 0.03360% 0.03402% 0.02090% 0.02112%
Bryce Harper 0.02824% 0.01958% 0.01748% 0.01212%
Carl Crawford 0.02759% 0.01398% 0.01712% 0.00867%
Jose Reyes 0.02351% 0.02053% 0.01461% 0.01276%
Josh Hamilton 0.02349% 0.01411% 0.01459% 0.00871%
Dexter Fowler 0.02271% 0.02206% 0.01405% 0.01364%
Seth Smith 0.02102% 0.00653% 0.01300% 0.00404%
Yoenis Cespedes 0.02065% 0.01379% 0.01279% 0.00852%
Shane Victorino 0.02062% 0.01106% 0.01276% 0.00685%
Ryan Braun 0.02028% 0.01888% 0.01262% 0.01172%
Corey Hart 0.02027% #N/A 0.01256% #N/A
Carlos Gomez 0.02012% 0.03331% 0.01245% 0.02067%
Todd Frazier 0.01996% 0.01141% 0.01234% 0.00704%
Curtis Granderson 0.01885% 0.01216% 0.01163% 0.00750%
Robinson Cano 0.01842% 0.00807% 0.01145% 0.00501%
Logan Morrison 0.01800% #N/A 0.01111% #N/A
Peter Bourjos 0.01797% #N/A 0.01110% #N/A
Will Venable 0.01792% 0.01449% 0.01107% 0.00894%
Brett Lawrie 0.01754% 0.01707% 0.01086% 0.01054%
Stephen Drew 0.01736% 0.01452% 0.01072% 0.00895%
Troy Tulowitzki 0.01699% 0.01187% 0.01055% 0.00738%
Andrew McCutchen 0.01676% 0.01215% 0.01039% 0.00753%
Melky Cabrera 0.01604% 0.01220% 0.00997% 0.00757%

Here are some actual, historical stats:

And a downloadable spreadsheet for you, if you want to see all of my calculations (watch out — it’s not pretty, and the numbers are from last week):

Some Caveats:

• We can’t really be sure how relevant a player’s past rates are to their future odds (especially triples, since there are so few of them). You can try out Steamer or ZiPS projections in place of past performance, if you download the spreadsheet.
• Park and lineup effects can make a big difference (changing teams can change the odds).
• I’m sure PA frequency breakdowns aren’t entirely consistent between years, yet mine are based on only 2012 data.
• I only worked this out through 7 PAs in a game. Obviously, if you get 8 or 9 PAs in a game, your odds of a cycle go up considerably… but that rarely happens, especially over 9 innings.
• Triples rates are basically the deciding factor here, and they are hard to predict.

Steve is a robot created for the purpose of writing about baseball statistics. One day, he may become self-aware, and...attempt to make money or something?

Thufir

Babe Herman laughs at your calculations. From the grave.

Excellent post, my money is on Trout.