## Could Chris Davis Match Roger Maris?

Chris Davis, with 37 home runs so far this season, has been generating a lot of buzz lately — both on the field and more recently with some comments he made during the All-Star break. When he was asked about the all-time home run record, Davis said:

“In my opinion, 61 is the record, and I think most fans agree with me on that.”

I have no idea if most fans agree with him, but it probably shouldn’t be surprising that a guy within spitting distance of a 61 home run season would view that as the mark to beat — rather than 73 home runs, which is essentially out of range. So, just for fun, let’s figure out what Davis’ chances are of reaching Roger Maris.

At Tom Tango’s website, there was a discussion that tried to put a number on Davis’ chances of reaching that mark. Tango performed a “quick back-of-envelope calculation” to do so, but today, I’ll be providing you with an interactive tool that might make it easy for you to perform a more sophisticated calculation for situations like this (and many other types of situations).

Retracing my footsteps: Davis needs at least 24 home runs to reach or surpass 61. That’s the first entry that needs to be made, in the “number of successes” box. The next step is to enter the number of trials — in this case, plate appearances (PA). Next to that box, you’ll enter your best guess as to the probability of the player getting that number of PAs. You can enter as many as 10 possibilities, or as few as one; just make sure the probabilities you enter add up to 100%. Repeat the same procedure for the “true rate estimate,” which for this exercise is home runs per plate appearance.
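For readers who want to see the mechanics outside the spreadsheet, here is a minimal Python sketch of the kind of calculation described above: a binomial "at least k successes" probability, weighted over every combination of trial count and true rate. The function names and the input distributions are hypothetical, chosen only for illustration; they are not the assumptions used in this article, and the sketch assumes PAs and true rate are independent of each other.

```python
from math import comb

def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def chance_of_at_least(k, trial_dist, rate_dist):
    """Weight the binomial tail over every (trials, rate) pair.

    trial_dist and rate_dist map a value to its probability; each must sum to 1.
    Trials and rate are treated as independent (an assumption of this sketch).
    """
    assert abs(sum(trial_dist.values()) - 1) < 1e-9
    assert abs(sum(rate_dist.values()) - 1) < 1e-9
    return sum(
        w_n * w_p * binom_tail(k, n, p)
        for n, w_n in trial_dist.items()
        for p, w_p in rate_dist.items()
    )

# Hypothetical inputs for illustration only, NOT the article's actual assumptions.
pa_dist = {200: 0.25, 250: 0.50, 300: 0.25}
rate_dist = {0.055: 0.30, 0.065: 0.50, 0.075: 0.20}

print(f"{chance_of_at_least(24, pa_dist, rate_dist):.2%}")
```

Swapping in your own guesses for `pa_dist` and `rate_dist` mirrors filling in the calculator's boxes.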

For the assumptions I used, the calculator figures there’s only an 8.56% chance of Davis hitting at least 61 homers this season. But feel free to change the assumptions, either on this page or by downloading the spreadsheet.

### What is Davis’ True Home Run Rate?

First of all, what is a “true rate?” Well, for a fair flip of a fair coin, we intuitively know the true rate of heads is 50%. That doesn’t mean that if you flip a coin 100 times, it will come up heads 50 times; in fact, the binomial distribution that my calculator is based on says there’s only about an 8% chance of exactly 50 heads coming up over 100 flips. It also says there’s almost a 37% chance of the number of heads being at least five away from 50, after 100 flips.

You can try that yourself in the calculator by entering “100” as the number of trials, giving that a 100% probability, and setting “.5” as the true rate (also at 100% probability), with “55” as the number of successes (heads, in this instance). The answer box will say that there’s an 18.41% chance of getting at least 55 successes. You can just double that number to get the chances that it will be at least five away from 50 in either direction.
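The same coin-flip check can be sketched in a few lines of Python using only the standard library. This is an independent illustration, not the calculator's own code; the variable names are made up for the example.

```python
from math import comb

n, p = 100, 0.5  # 100 flips of a fair coin

def pmf(k):
    """P(exactly k heads in n flips)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

p_exactly_50 = pmf(50)
p_at_least_55 = sum(pmf(k) for k in range(55, n + 1))
# Doubling works only because the distribution is symmetric when p = 0.5.
p_five_or_more_away = 2 * p_at_least_55

print(f"exactly 50 heads:        {p_exactly_50:.2%}")
print(f"at least 55 heads:       {p_at_least_55:.2%}")
print(f"5+ away from 50 heads:   {p_five_or_more_away:.2%}")
```

The three printed values should line up with the roughly 8%, 18.41%, and almost 37% figures quoted in the text.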

Davis is not a coin, unfortunately, which makes it a lot harder to intuitively pinpoint his true home run rate. If he were a coin, his true rate would probably be very close to zero, since coins are very bad at hitting home runs. Anyway, here are some of his particulars to consider:

Season or projection | HR/PA |
---|---|
2013 to date | .094 |
2012 | .059 |
Steamer RoS | .060 |
ZiPS RoS | .062 |
Career | .056 |

“RoS,” by the way, is the updated projection for the rest of the season for each projection system. Clearly, ZiPS and Steamer aren’t buying that he can keep up this pace. For further context, the MLB leader in HR/PA last season was Josh Hamilton, at 0.068. Davis’ 0.059 was good enough for seventh in the majors. For my assumptions, I stuck pretty close to the ZiPS and Steamer RoS numbers that I believe to be the best guesses of his true ability — though I tried to err slightly on the side of, “He may have legitimately made a big improvement,” so the probability weighted average of my assumptions comes out to about .064.

One factor you have to worry about is that pitchers may start pitching around Davis or intentionally walking him more often as his reputation grows, which will of course hurt his HR/PA. I haven’t done the research to project how that might affect him, but yeah, it matters.

### Projecting PAs

Chris Davis has been regularly hitting in the fifth spot in the Orioles’ lineup. If that continues, he’ll have fewer plate appearances to work with than if he’d hit higher in the lineup. All else equal, getting fewer PAs would certainly hurt his chances at reaching the 61-homer milestone. The Book says the five-spot in the American League gets 4.39 PA per game. That could use a bit of updating, though, since it was based on numbers from the 1999 to 2002 seasons (basically the most offense-heavy era in modern baseball). The 1999 to 2002 AL teams averaged a .340 OBP; the 2013 Orioles have a .316 OBP. Clearly, there will be fewer PAs to go around when players aren’t getting on base.

Some comparisons, which include the third lineup spot that Davis could hypothetically be moved up to:

Team | OBP | PA per Game, 3rd Spot | PA per Game, 5th Spot |
---|---|---|---|
2013 Orioles | .316 | 4.39 | 4.16 |
2012 Orioles | .311 | 4.45 | 4.25 |
1999-2002 AL | .340 | 4.61 | 4.39 |

You may be wondering why the 2012 Orioles had more PAs despite a lower OBP than this year’s team. I think the most obvious explanation is that last year’s Orioles got into a ridiculous number of extra-inning games — 11.1% of their total games, compared to only 8.3% this year. Last year, the Orioles pitched 9.15 innings per game, compared to 8.91 this year. I looked at Baltimore’s double play and caught stealing numbers on offense to see if they also contributed, but it turns out they’ve actually improved substantially in the GIDP department this year: 1.58% of PAs this year vs. 2.47% last year.

Anyway, the weighted average of my assumptions comes out to 253 remaining PAs for Davis this season, which means I expect him to average around 3.83 PAs per game over the team’s remaining 66 games. That’s the result of me unscientifically factoring in the chance that he’ll get some days off or get injured. Steamer and ZiPS are more pessimistic, figuring him for 199 and 241 PAs, respectively.
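To put those PA totals on a per-game footing, here is a small Python sketch of the arithmetic. The "full-time workload" baseline (playing all 66 games at the 2013 Orioles' fifth-spot rate of 4.16 PA per game) is an assumption added for comparison, not a figure from the projection systems.

```python
games_left = 66
pa_per_game_5th = 4.16  # 2013 Orioles, fifth lineup spot (from the table above)

# Remaining-PA estimates quoted in the text.
projections = {"article": 253, "ZiPS": 241, "Steamer": 199}

# Assumed baseline: every remaining game played, always batting fifth.
full_time_pa = games_left * pa_per_game_5th

for name, pa in projections.items():
    per_game = pa / games_left
    share = pa / full_time_pa
    print(f"{name}: {per_game:.2f} PA/game, {share:.0%} of a full-time workload")
```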

### Results

As you see in the web app above, my assumptions predict an 8.56% chance of Davis reaching or surpassing that 61-homer mark this season. Here’s a more complete list of some of the other HR levels it expects for him:

Davis’ Final 2013 HR Minimum | Estimated Probability |
---|---|
51 | 67.5% |
56 | 30.2% |
60 | 11.4% |
61 | 8.6% |
62 | 6.4% |
66 | 1.7% |
71 | 0.2% |

### Thoughts

I hope you’ll be able to get some use out of the web app here, since I think it has a lot of uses in baseball stats (and other things). Maybe I didn’t look hard enough, but the ability to account for uncertainty in your estimates seems like something other online binomial calculators don’t have. If you’re wondering: yes, entering the whole distribution can make a big difference over simply using the weighted averages. Using only the weighted averages of my assumptions would produce just a 3.43% chance of Davis hitting at least 61.

Of course, there’s uncertainty within the uncertainty, especially with the assumptions I made in this exercise. These were just semi-educated guesses, without a ton of research put into them.

As always, thoughts are welcome. I’ll be hanging around the comments section where I’ll attempt to answer questions.


Nice article Robot Steve. My head just exploded. Anyway, why don’t you give YOUR best wag as to how many Davis ends up with? And please, no reference to Steamer or ZiPS. We already have that. Even if it might be the most accurate in your opinion.

He did. 8.56%. See above.

That wasn’t the question.

It was answered in the article. He projected .064 HR/PA with 253 PAs. That projects Davis to hit 16.192 HRs the rest of the season.

Using this I would say Steve is projecting Davis to end up with 53 HR’s

Thanks! So people actually do read those bio lines! Maybe I’ll have to think of a real one now…

Yeah, like Kogoruhn says, a little over 53 is the average for my assumptions. Since I tend very slightly towards optimism, I’ll round up and say 54, though.

Also, no major leaguer has ever hit exactly 53 home runs in a season, while there are 7 seasons with exactly 54.

Ha! Nice catch. There’s a record he can shoot for.

With a little modification to the calculator, it’s telling me there’s about a 7.7% chance of him hitting exactly 53 homers, based on my assumptions. So apparently, hitting 61 is a slightly more obtainable record for him than hitting exactly 53. You know, assuming he keeps trying after hitting #53…

He got it!

Haha, awesome!

Can a head explode twice? I was going to ask why the totals of the probabilities added to >100%. But after reading this it seems pretty clear I wouldn’t understand anyway. Ha! However, isn’t 8.56% the probability he hits 24 HRs? And now looking a little closer I see the 0.0637*253=16…

Although I wasn’t 100% certain Steve would agree with his own math and might simply have his own opinion. That sounds a little insulting. It isn’t meant to be.

Do we revisit after the season to maybe explain any outlier that may have occurred?

Good stuff and thanks

8.56% is supposed to be the probability that he hits 24 or more homers the rest of the season, if that’s unclear. It adds up the probabilities that he hits exactly 24, or 25, or 26, etc. That’s called the cumulative binomial distribution.

The main thing I did in that response to Ramorda, FYI, was to change the “TRUE”s in my BINOMDIST formulas to “FALSE”s (by find and replace) because “TRUE” makes it use the cumulative binomial distribution. I added 1 to the number of successes, then looked at cell S17 for the answer (long story). That method tells me there’s a 2.2% chance he hits exactly 24 the rest of the way.
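The difference between "exactly k" and "k or more" can be sketched in Python as follows. In Excel, BINOMDIST with the last argument FALSE returns the probability of exactly k successes, and with TRUE it returns the probability of k or fewer. The helper names below are made up for the example, and the inputs are only the weighted averages mentioned in this thread (253 PAs, .0637 HR/PA), not the full set of assumptions, so the output will not match the 2.2% and 8.56% figures.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X == k); the analogue of BINOMDIST(k, n, p, FALSE)."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(k, n, p):
    """P(X <= k); the analogue of BINOMDIST(k, n, p, TRUE)."""
    return sum(binom_pmf(i, n, p) for i in range(0, k + 1))

def at_least(k, n, p):
    """P(X >= k) = 1 - P(X <= k - 1)."""
    return 1.0 - binom_cdf(k - 1, n, p)

# Single-point inputs: the weighted averages only (an illustrative simplification).
n, p = 253, 0.0637
print(f"exactly 24: {binom_pmf(24, n, p):.2%}")
print(f"24 or more: {at_least(24, n, p):.2%}")
```

The "24 or more" line should come out near the 3.43% the article quotes for the weighted-averages-only case.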

Which probabilities are you talking about that add to >100%?

In the “results” sheet. But now I think those are the % he hits at least 51 or 56…So I’m catching on.

Thanks again

This is neat – thanks for sharing the Excel spreadsheet and giving us the opportunity to fiddle around with the parameters.

You’re very welcome! Minutes of fun, right? Hope it serves everybody well.

Very cool stuff. I wanted to see how much Davis improved his odds with his four-straight games with HRs before the break, so I input 28 successes and 260 PAs. That put his odds at 2.4%.

That’s a pretty huge improvement over four days. If he has another binge or two like that the rest of the way, these odds could go up significantly.

Those 4 home runs are almost 7% of 61 home runs. That it increased his probability of getting to 61 by 6% isn’t surprising.

Wait, I’m going to have to disagree on the grounds of logic. The All-Star break has always been the ceremonious “halfway point” of the season. Therefore, by simply taking 37 (current HR) x 2 (halfway point of season), we can all safely say there’s a good chance he will hit at least 74 home runs (37×2). This is all assuming he stays healthy and maintains his will to win.

yeah half of 162 is 90 something

good one math major

Very nice subtle trolling there champ.

Including that “will to win” bit at the end there makes me think that this comment was posted with tongue planted firmly in cheek. Or at least, I certainly hope so.

Thank you, I was hoping it was pretty obvious.

Your implicit assumption that all FanGraphs’ readers have a sense of humor is not supported by objective analysis. Please factor that in to your calculations next time.

You mean “Fangraphs’ readers have senses of humor.”

*Haughtily pushes taped-up glasses up the bridge of the nose*

just curious, but considering his little hr streak before the break artificially inflated his hr/pa numbers, what sort of projection would he have if we were to lay his hr and pa on a graph and get a best fit line to better estimate his hr/pa? its fun to say, look davis is on pace for 62 hrs, but really, he isnt

“artificially inflated”

How is hitting actual HRs considered artificial?

^^this. Can’t take away the ones already banked.

Unless you also take away, say 20 or 25 AB HR-less strings, since they’re “artificially low”.

(But I wouldn’t recommend doing either.)

I don’t follow your logic. His HR/PA is what it is. Are you just saying you expect that figure to regress? That’s accounted for in the article.

ya i did a crappy job explaining what i meant…yes his total hr/pa is what it is. all im saying is that he is coming off a 4 game hot streak, so his hr/pa is at a high right now. maybe the question should be; how many hrs is he trending to?

That’s perfectly fair and I feel like the article takes a good stab at estimating what his regression will be in HR/PA. The writer has it dropping more than 33%.

Wait what hot streak? He broke out of his season long slump the last three games before the All Star break. As long as he doesn’t fall into another slump, he’s going to hit 72 more giving him a total of 111, if my math is right.

As a statistician, I appreciate the level of analysis that went into this post. Good work.

As somebody who wishes he took more statistics classes in college and is now trying to learn as he goes along, I thank you!

This is awesome, my favorite FG article in a long time.

Wow, thank you! Ironically, it was the article that took the least effort of anything I’ve done here.

you clearly haven’t spent enough time on this website.

I read virtually everything that is posted on this website. What do you suggest, reading slower?

Hit Tracker provides some evidence respecting projected HR rates for the rest of the season. It shows 8 no-doubters and 9 just-enoughs for 2013. That, combined with Davis’ historical record, suggests some regression, as this article and ZIPS suggest.

I do think that the regression level suggested is awfully high, subjectively. Davis has a different approach at the plate in 2013, which manifests itself in several ways. His GB rate is down, his pop-up rate is down, his line drive rate is steady, his doubles rate is way up and his opposite-field home run rate is way up. Personally, I would use a figure something in the .07 to .075 range for his projected HR rate.

This tool needs more levels.

We need probabilities for PA’s, Fly ball%, and HR/FB %

Davis’ opinion is pretty silly. MLB still recognizes 73 as the record, whether it’s got an asterisk or not.

Just because individuals have an opinion that 61 is still a true or untainted record, doesn’t mean it’s still a record. Until MLB takes the 73 out of the record books, 73 is the number to beat. 62 could still be celebrated as an amazing feat, and absolutely should be, but to celebrate it as a record would be false.

Well, it would be a record. Just an AL record, not an MLB record.

Actually, celebrating 73 homers as a record is false. Up here in Heaven we don’t count anything that Sosa, McGwire, Bonds, Lance Armstrong, or Marion Jones ever did — before or after they started drugcheating. That’s how Heaven rolls. My single-season record, Bi-yatches!

Did your own asterisk not follow you to heaven Roger?

I’m too drunk and on too many uppers to see something as small as an asterisk.

stop hogging all the good stuff, roger.

“If he were a coin, his true rate would probably be very close to zero, since coins are very bad at hitting home runs.”

What is the factual basis for this assertion? Do you have any data or first-hand observations of coins attempting to bat?

Go here:

http://www.fangraphs.com/leaders.aspx?pos=all&stats=bat&lg=all&qual=y&type=8&season=2013&month=0&season1=1871&ind=0&team=0&rost=0&players=0&coin=y

Only seven coins show up in the list of those with over 50 career HR, with the all-time leader being a 1972 Kennedy half-dollar from the Denver mint, which collected 86 career HR from 1983-1992. Probably could have gone on to reach 100 but it was pulled out from behind a kid’s ear by his grandpa and has been in a piggy bank for the last 20+ years.

Sure it doesn’t sound like that many, but when it started hitting homers the Kennedy half was only eleven. How many major league homers did YOU hit at eleven? Exactly.

You guys are absolutely right; that was prejudiced of me. I just made assumptions based on their small stature and non-existent arms. More research needs to be done on this topic. I’ll apply for a grant immediately.

67 hrs. Bank it, Dano.

Great Article Steve! Thanks!

Can someone explain to me why it is incorrect to just take the weighted averages of the # of trials and rate and just plug it into the Binomial Distribution? Isn’t this probability just as legit? I know the outputs are different, but why is one more correct than the other?

“If you’re wondering — yes, entering the whole distribution can make a big difference over simply using the weighted averages — using only the weighted averages of my assumptions would produce just a 3.43% chance of Davis hitting at least 61.”

Just to clarify, why do we prefer entering several distributions with all the possible pairs of parameters rather than just entering the weighted averages of the parameters into one distribution?

Because f(g(x)) =/= g(f(x)). Where f() = expected value function (average) and g() = multiple of HR rate * PA.

ex.

f(x) = 2x

g(x) = x^2

f(g(x)) = 2x^2

g(f(x)) = (2x)^2 = 4x^2

Hi Dusty Gator, thanks for your response. My question was not clear enough.

I understand that those two are not equal. (As I stated above when I said that I know the outputs are different).

What I’m asking is this: Why not use the weighted average of PA’s and home run rate as the two parameters in your binomial distribution?

Above, Steve had the weighted averages as PA’s = n = 253 and rate = p = .0637.

Why not use f(x|n,p). Where f is the binomial dist. with parameters n, p, as a function of x successes (homeruns)?

Steve did this (I think):

w_11*f(x|n_1,p_1) + w_12*f(x|n_1,p_2) + etc…

Both clearly give different outputs. Why is the second method preferred to the first?

Yeah, that’s an interesting one. Well, the only way I know how to explain that is by showing an example.

I’ll hold the true rate constant at 0.05, and will look for the chance of 20 successes over a weighted mean of 250 trials:

Trial 1: 100% probability of 250 trials –> 2.71% chance of 20+ successes

Trial 2: 25% chance of 300 trials, 50% chance of 250 trials, 25% chance of 200 trials –> 4.40% chance of 20+ successes.

What’s happening in Trial 2 is this:

300 trials = 11.90% chance of 20+ successes

250 trials = 2.71% chance of 20+ successes

200 trials = 0.27% chance of 20+ successes

Now, weight those by the chances of getting that many trials: 0.25*11.9% + 0.5*2.71% + 0.25*0.27% = 4.40%

So, it’s the asymmetry of 11.9 vs. 2.71 vs. 0.27 that’s doing that.

The reverse is true if you’re talking about a high-percentage thing — e.g. same circumstance, but you’re looking for the chance of 8+ successes: Trial 1 would give 93.5%, but Trial 2 gives 91.02%. Now the uncertainty works against you, as the downside of getting only 200 trials is considerably worse than the upside of getting 300 is helpful.
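The example above can be reproduced with a short Python sketch. It is an independent re-computation with the standard library, not the spreadsheet itself, and the names are chosen for the example.

```python
from math import comb

def at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(max(k, 0), n + 1))

rate = 0.05
trial_dist = {300: 0.25, 250: 0.50, 200: 0.25}  # weighted mean is 250 trials

results = {}
for k in (20, 8):
    fixed = at_least(k, 250, rate)  # all weight on the mean trial count
    mixed = sum(w * at_least(k, n, rate) for n, w in trial_dist.items())
    results[k] = (fixed, mixed)
    print(f"{k}+ successes: fixed n=250 -> {fixed:.2%}, uncertain n -> {mixed:.2%}")
```

For the unlikely target (20+), spreading out the trial count raises the probability; for the likely target (8+), it lowers it, which is the asymmetry described above.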

Hi Steve, thanks for the example.

So are you weighting the different binomial distributions rather using the average of the parameters so that the extreme parameters will have more say in the final probability?

To me, the 4.40% chance in trial 2 is no more correct than the 2.71% chance in trial 1.

Hi Steve,

Sheesh, I think I finally figured out why you picked method 2. Thanks for your response (and yours too gator). Again great article. Sorry that I kept asking about this.

No problem — it’s not that intuitive of a concept. I was really scratching my head when I first encountered it.

What happens to the probability if you just use league average HR/PA?

Why on earth would you do that?

If you use 0.028 HR/PA, around the AL average, it says the chance is about 0.000065%. So, not great.

Juan Pierre and Ben Revere, draggin’ down the man. AGAIN!

I disagree with one part of this analysis. Strikeout and walk rates are generally among the first values to stabilize for hitters, and Davis is making MUCH better contact this year than last year. It might be interesting to look at Davis’ HR/FB instead of his HR/PA, which includes his insanely high strikeout rates from earlier seasons.

Well, he’s batting the ball in 60.3% of his PAs this year, compared to 62.1% last year. He is, however, hitting fly balls a lot more often — 26.5% of his PAs, vs. 23.3% last year, so good catch there. His HR/FB is way up as well. Will he keep that up, though? Who knows.

The most absurd part of this article was:

“Chris Davis has been regularly hitting in the fifth spot in the Orioles’ lineup. If that continues, he’ll have fewer plate appearance to work with than if he’d hit higher in the lineup.”

He may become the first player ever to hit over .300 and 50 HRs from the 5 hole.

I’d love to see a complete list of players with a minimum of, say, 35 home runs at the All-Star Break from 1962 (i.e., the year after Roger Maris’ 61 home run season) to the present, along with what each player’s final tally wound up being. Also! I’d love to see each of these players’ career WAR. There seem to be a fair number of rather unlikely “Will he catch Roger Maris?” candidates over the years. I’d do it myself but I’m not sure how to access this information.

Barry Bonds / 2001 / 39 / 73 / 164 fWAR

Mark McGwire / 1998 / 37 / 70 / 66 fWAR

Chris Davis / 2013 / 37 / ?? / 6 fWAR

Reggie Jackson / 1969 / 37 / 47 / 73 fWAR

Ken Griffey Jr / 1998 / 35 / 56 / 77 fWAR

Luis Gonzalez / 2001 / 35 / 57 / 55 fWAR

Frank Howard / 1969 / 34 / 48 / 39 fWAR

It may be more instructive to see how these players compared after, say, 90 games and then at year-end, since the All-Star break bounces around the calendar a bit (sometimes after 80 games, sometimes after 90, so not all pre-break totals are created equal).

If we’re saying that Bonds doesn’t count because of the roids, then Maris doesn’t count because of the 8 extra games. The real record is Babe with 63, since we want fair comparisons and whatnot.

Is that a typo or are you assuming the Babe would have hit 3 more home runs had he played 8 additional games? Either way, Ruth played prior to the breaking of the color barrier and thus his record can’t be taken seriously. The true home run champion lies somewhere between 1947 and 1960 (American League) and 1961 (National League). My god I just put in a lot of effort to make a terrible joke. As you were…

Those extra 3 HR are based on his HR/PA extrapolated to a 162 game season. If you are really going to bring era into the discussion, Bonds faced juicing pitchers and had far fewer at-bats with actual pitches to hit as pitchers pitched around him. He had a 26.1% walk rate versus Maris’ 13.5% walk rate, causing a disparity of 476 ABs to 590 ABs. There really isn’t a way to claim the record is 61; it’s either 63 or 73, which means that Davis is looking at somewhere between a 3% and 5% chance or less than a 0.1% chance.

Don’t worry, Ben Revere will shatter this HR record any day now.

This is very nice work, Steve. If Davis stays on pace to challenge Maris, I’d love to see you revisit this around the 3/4 point of the season.

I’m perfectly happy in the case of Chris Davis to suspend my disbelief and impatiently wait for the unlikely and improbable.

The curse of the HR Derby ended any chance of that. He will not hit 50 HR this year, let alone more than 61.

Will he chase Maris or Reggie? Maris had The Mick driving him ’til he got hurt; Reggie hit 10 after the AS break. As a Jays’ fan, I say ‘a pox on you, Davis!’ As a baseball fan, I want him to hit 62, 67, or 74.

An extreme example to show why the second method might be a better choice than the method of using just the weighted averages as the parameters:

Let’s say the number of trials for flipping a coin is determined first by flipping a coin. You’re going to do 50 trials if you get a heads or 50 if you get a tails. Therefore, you have a 50% chance of it being 150 and a 50% chance of it being 50. The weighted average for the number of trials would then be 100 (.5*150 + .5*50).

If you wanted to know the probability of getting 110 heads, if you used the weighted average the probability would be zero. Prob of 110 heads in 100 flips = 0.

If you used the method that Steve used, it would be .5*Prob of 110 heads in 150 flips + .5*Prob of 110 heads in 50 flips = .5(non-zero value) + .5(0)

^^ meant to say you either do 150 trials or 50 trials, sorry for the confusion
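That extreme case (either 150 flips or 50 flips, each with 50% probability) can be sketched in Python like this. The helper name is made up for the example, and it looks at exactly 110 heads, as in the comment.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(exactly k successes in n trials); zero when k is impossible."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * p**k * (1 - p)**(n - k)

trial_dist = {150: 0.5, 50: 0.5}
mean_trials = sum(n * w for n, w in trial_dist.items())  # 100.0

k = 110
# Method 1: plug the weighted-average trial count into a single binomial.
averaged = binom_pmf(k, int(mean_trials), 0.5)
# Method 2: weight the binomial result for each possible trial count.
mixed = sum(w * binom_pmf(k, n, 0.5) for n, w in trial_dist.items())

print(f"using the average of 100 flips: {averaged}")
print(f"weighting 150 and 50 flips:     {mixed:.3g}")
```

The first method returns zero because 110 heads cannot happen in 100 flips, while the second keeps the small but real chance that comes from the 150-flip branch.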

Great example, thanks. Trying out your example made me realize my calculator goes haywire when you enter something impossible, like 110 successes in 50 trials, or a success rate over 1 or under 0… I’ll try to fix that.

Thanks Braves Fan. That’s exactly what I was asking about.

“If he were a coin, his true rate would probably be very close to zero, since coins are very bad at hitting home runs.”

<3