Off to the Races

I find these early season posts harder to write than the off-season posts. Part of it is the desire to write about the games occurring, while at the same time trying to provide something worth reading. There are only so many ways to write about the improbability of Yuniesky Betancourt’s at-bat against Justin Verlander yesterday*. With that in mind, I’m going to shamelessly steal a topic from Dave Cameron’s 2009 early season posts: Wins in the bank.

Dave’s original post was followed by Sky Kalkman expanding on the topic by applying it to the CHONE standings through that point. The concept is explained by both, but I’ll rephrase it here for originality’s sake. Say the Orioles start off 6-4. CHONE projected the Orioles to win 75 games, or 46% of their games. That 60% win rate seems to be overachieving, but don’t trip into the Gambler’s Fallacy line of thinking that the Orioles will go 3-7 at some point to ‘even things out’.

No, instead you give the Orioles credit for those earned wins while respecting the projections heading forward. 46% of the remaining 152 games equates to 70 wins. Add the six they already racked up, and the Orioles’ solid start has improved their expected record by a whole game. Of course, this can be applied at just about any time throughout the season. Say the Rays go 35-20 and we still have reason to expect them to play like a 90-win team the rest of the way; they would improve their expected record by four games. In that division, in that race, that’s a huge swing.
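
The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name and the 162-game default are assumptions of the sketch, and it uses the unrounded projection (75/162 rather than 46%), so the results come out a touch higher than the rounded figures in the text.

```python
def wins_in_the_bank(wins, losses, projected_wins, season_games=162):
    """Return (expected final wins, change versus the preseason projection).

    Assumes the team plays at its projected winning percentage in every
    remaining game, which is the simplification the post describes.
    """
    projected_pct = projected_wins / season_games
    games_left = season_games - (wins + losses)
    expected_final = wins + projected_pct * games_left
    return expected_final, expected_final - projected_wins


orioles_final, orioles_gain = wins_in_the_bank(6, 4, 75)
rays_final, rays_gain = wins_in_the_bank(35, 20, 90)

print(f"Orioles: {orioles_final:.1f} expected wins ({orioles_gain:+.1f})")
print(f"Rays: {rays_final:.1f} expected wins ({rays_gain:+.1f})")
```

Under those assumptions the Orioles gain a bit more than one win and the Rays a bit more than four, in line with the numbers above.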

It’s early and easy to get swept away in some paranoia and hyperbole. But yeah, the results matter, and they can make a difference.

*I posed the question: How improbable was that?

Betancourt has 2,473 career plate appearances. He has 32 home runs, which means 1.3% of his total plate appearances have ended in jogs … okay, that’s not true, let’s say trots. Per Baseball-Reference, Betancourt has 141 plate appearances that went to a full count. That’s, oh, 5.7%.

On to Justin Verlander. He’s faced 3,580 batters throughout his career and has allowed 81 homers, or a shade over 2%. He’s gone to a full count 500 times, or 14%.

Multiply that out and the probability that all of it happens during one plate appearance is roughly .0002%.
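
For anyone who wants to check the multiplication, here is a minimal Python sketch using the career counts quoted above. The variable names are made up for the example. It treats the four rates as independent events, which is a strong simplification: a batter's and a pitcher's full count rates describe the same plate appearance, so the product is best read as a back-of-the-envelope figure rather than a true probability.

```python
import math

# Career counts quoted in the text.
betancourt_pa, betancourt_hr, betancourt_full = 2473, 32, 141
verlander_bf, verlander_hr, verlander_full = 3580, 81, 500

rates = [
    betancourt_hr / betancourt_pa,    # Betancourt home run rate
    betancourt_full / betancourt_pa,  # Betancourt full count rate
    verlander_hr / verlander_bf,      # Verlander home run rate allowed
    verlander_full / verlander_bf,    # Verlander full count rate
]

# Assumption: the four events are independent, so the rates simply multiply.
naive_probability = math.prod(rates)

print(f"{naive_probability * 100:.4f}%")
```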

Rusty
Guest
6 years 4 months ago

That math is atrocious.

Betancourt going to a full count and Verlander going to a full count are NOT independent events when they’re facing each other, so you can’t just multiply the probabilities!

Even going to a full count and hitting a home run are not completely independent.

A reasonable estimate of the probability of Betancourt hitting a full count home run in that particular at-bat would be ~10% chance of count going full, times ~1% chance of a homer ~= .1%. About 500 times more likely than your calculation.

jscape2000
Guest
6 years 4 months ago

Verlander has allowed 11 full count homers in his 3578 PA career (.003).
Betancourt has 2 full count homers in his 2478 PA career (.0008).
Can I just multiply those numbers together?

Rusty
Guest
6 years 4 months ago

No.

But if you assume that .003 and .0008 are their true probabilities, and you know the league rate of full count homers, you can come up with an estimate.

leagueRate * (.003/leagueRate) * (.0008/leagueRate)

which reduces to

(.003 * .0008) / leagueRate

DavidCEisen
Guest
6 years 4 months ago

@Rusty: I don’t think that’s correct either. You can’t multiply the probability of hitting a home run by the probability of giving up a home run.

To simplify: Say a player has a 10% chance of hitting a HR and a pitcher has a 10% chance of giving up a HR. There isn’t a 1% chance of a HR occurring, because these aren’t independent events.

I’ll put forward my own guess as to the best way to figure this out. First we need the rate at which Verlander gets into a full count compared to the league average. Then, since Betancourt gets into a full count 5.7% of the time (assuming this means against the league average pitcher), we can find a way to normalize Betancourt’s rate against Verlander’s. I don’t feel like thinking this through, so I’m not going to write a formula. This would give us the probability of getting into a full count in this situation.

After this we would need to figure out the probability of hitting a home run in a full count situation. This is more difficult. It might be that we could find the rate at which Verlander gives up home runs in full count situations compared to the league, but this might be a small sample size. So maybe we should just consider home run rates (like R.J. did). Either way, at the end we would have two probabilities: one for a full count, the other for a home run.

These may or may not be independent: are home runs more or less likely to occur in full count situations, or does it not matter at all? If they are independent, multiply away. If not…

Rusty
Guest
6 years 4 months ago

@DavidCEisen

Yes, it is correct if you are dividing by the league rate.

To use your own example, say a player has a 10% chance of hitting a HR and a pitcher has a 10% chance of giving up a HR. If the league rate is also 10%, the probability is (10% * 10%) / 10% = 10%.

If the league rate is lower, say 8%, then the batter is better than average and the pitcher is worse than average, so the probability is higher: (10% * 10%) / 8% = 12.5%.

If the league rate is higher, say 12%, then the batter is worse than average and the pitcher is better than average, so the probability is lower: (10% * 10%) / 12% = 8.3%.

So if you wanted to assume that a batter’s chance of hitting a home run on a given pitch is unaffected by the count, so that reaching a full count and hitting a home run on the full count pitch are independent, you can come up with a formula:

VF = Verlander full count rate
YF = Yuni full count rate
LF = League full count rate
VH = Verlander home run rate
YH = Yuni home run rate
LH = League home run rate

Chance of a full count home run =

((VF * YF) / LF) * ((VH * YH) / LH)
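
A minimal Python sketch of the formula above, in case anyone wants to plug in their own rates. The function names are invented for the example, and it inherits the comment's assumption that reaching a full count and homering are independent. One caveat worth knowing: (pitcher * batter) / league is a rough approximation and can exceed 100% when the rates are extreme.

```python
def matchup_rate(pitcher_rate, batter_rate, league_rate):
    """League-adjusted matchup rate: (pitcher * batter) / league."""
    return pitcher_rate * batter_rate / league_rate


def full_count_homer(vf, yf, lf, vh, yh, lh):
    """((VF * YF) / LF) * ((VH * YH) / LH), using the names defined above.

    Assumes the full count and the home run are independent of each other.
    """
    return matchup_rate(vf, yf, lf) * matchup_rate(vh, yh, lh)


# The toy cases from the comment: a 10% batter against a 10% pitcher.
for league in (0.10, 0.08, 0.12):
    print(f"league {league:.0%}: matchup {matchup_rate(0.10, 0.10, league):.1%}")
```

The loop reproduces the 10%, 12.5% and 8.3% figures worked out by hand above.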

vivaelpujols
Guest
6 years 4 months ago

The odds of Verlander going to a full count and the odds of Betancourt going to a full count are inclusive, so instead of multiplying them together, you should do a weighted average. So the odds of a Verlander/Betancourt at bat going to a full count is 10.5%. You then figure out the odds of a Verlander/Betancourt at bat ending in a home run, which is 1.9%. Multiply the two together to get .2%. That assumes that the odds of a 3-2 count ending in a home run are the same as in all counts, but whatever.

When I use Rusty’s method,

((.14 * .06) / .13) * ((.02 * .01) / .03)

I get .04%

Jon
Guest
6 years 4 months ago

..and the math jocks come out of the woodwork to flex their skills. haha

kamikaze80
Member
6 years 4 months ago

more or less agree. the math in the article is completely wrong.

to illustrate it by example, lets say the rays have a 60% chance of winning a given game and the white sox have a 60% chance of losing a given game. you would NOT multiply 60% by 60% to get a 36% chance of the rays winning a game against the chisox.

MikeS
Guest
6 years 4 months ago

Hey! I don’t expect a world championship out of the White Sox, but they better win more than 40% of their games this year.

Attractive Nuisance
Guest
6 years 4 months ago

“Multiply that out and the probability that all of it happens during one plate appearance is roughly: .0002%.”

I’m no statistics major, and I know this is not the point of your post, but I do not believe this is the correct process for determining the probability of a Yu-Betcha homerun on a 3-2 count off Verlander. That would be the correct process if all those happenings were independent, but they are not.

The logical fallacy you have employed would be the same as me saying the following: The odds of the Mariners winning today are roughly 50%. The odds of the A’s losing today are roughly 50%. The odds of Snell getting a win today are 30%. So the odds of the Mariners winning, the Athletics losing, and Snell getting the win are .075.

The variables you mentioned do not correlate as highly as the variables I mentioned, but they still correlate highly, especially the odds of Yu-Bet and Verlander both working the count full in a given at bat between them.

You would be better off determining if there is any correlation between full counts and Yu-Bet homeruns. If Yu-Bet hits homeruns at the same clip on full counts that he does in general, take the odds that Yu-Bet hits a homerun off pitchers with Verlander’s characteristics (you will need to define a pitcher with his characteristics, possibly by FIP?) and multiply that by the odds of Yu-Bet working the count full against a pitcher with Verlander’s characteristics.

Waiting
Guest
6 years 4 months ago

Look. He didn’t take Stat 100 in college. That’s all.

kamikaze80
Member
6 years 4 months ago

then he shouldnt write articles revolving around SAT level statistics.

lincolndude
Guest
6 years 4 months ago

You could do the same thing with stats for individual players. For example, Garrett Jones hit 2 homers yesterday, so he’s got two in the bank now. Baseball’s a weird thing.

don
Guest
6 years 4 months ago

Placido Polanco finished his first game back with the Phillies 3-5 with 6 RBIs and a WPA which was both (1) negative and (2) lower than every other batter on the team, including Halladay.

The odds against that particular series of events seem pretty long, too.

Thomas J.
Guest
6 years 4 months ago

Hahaha, you people are all psychos. Statistical math is just a construct anyway.

intricatenick
Guest
6 years 4 months ago

Both perception and reality are constructs. Actually, saying something is a construct doesn’t have any independent, unique meaning, since everything is. Math involves buying into assumptions, just like everything. If you don’t buy assumptions you could say everything is a construct, but you bought into the assumptions enough to actually use language, so be quiet.

Steve K.
Guest
6 years 4 months ago

The AL has had a 12.6% full count rate and a 2.7% home run rate since 2006. That gives about a 6.1% chance of Verlander running a full count on Betancourt and a 0.8% chance of Yuni homering once there, given that home runs are about 25% less frequent with a full count. So the chance of this happening is somewhere around .05%, or 2000 to 1.
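
A rough Python check of those figures. The comment doesn't say exactly how they were derived, so this sketch assumes the league-adjusted (pitcher * batter) / league formula from earlier in the thread, the career counts from the article, and a flat 25% discount on home runs in full counts. The constant names are invented for the example. It lands in the same neighborhood rather than matching to the decimal.

```python
# League rates quoted in the comment (AL since 2006).
LEAGUE_FULL, LEAGUE_HR = 0.126, 0.027

# Career rates built from the counts in the article.
VERLANDER_FULL, VERLANDER_HR = 500 / 3580, 81 / 3580
BETANCOURT_FULL, BETANCOURT_HR = 141 / 2473, 32 / 2473

# Assumption: "about 25% less frequent" applied as a flat multiplier.
FULL_COUNT_HR_DISCOUNT = 0.75

p_full = VERLANDER_FULL * BETANCOURT_FULL / LEAGUE_FULL
p_hr_given_full = (
    VERLANDER_HR * BETANCOURT_HR / LEAGUE_HR * FULL_COUNT_HR_DISCOUNT
)
p_both = p_full * p_hr_given_full

print(f"full count: {p_full:.1%}")
print(f"homer once there: {p_hr_given_full:.1%}")
print(f"both: {p_both:.3%}, or about 1 in {1 / p_both:,.0f}")
```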

ryan b
Guest
6 years 4 months ago

I disagree with the “Wins in the Bank” as presented here and in Dave’s post from last year.

A big problem with this is strength of schedule. Detroit starts out the year with a series with KC. Most of us, I think, would not be surprised if Detroit sweeps this early three-game series. Now, for simplicity, let’s say we had them pegged as a .667 win percentage team at season’s end. According to Wins in the Bank, we should add a win to their season-end total, as we would only have expected them to win 2 out of 3 from the Royals. This makes no sense. We don’t expect them to have a .667 win percentage at the onset of every new series. We would expect them to probably win all three against the Royals, while not sweeping the Twins, White Sox and other teams.

Wins in the Bank completely ignores the opponents faced so far.

B N
Guest
6 years 4 months ago

I was thinking the exact same thing. I’m also not quite so sure what the point of these kinds of posts is anyway. To my knowledge, most of us are already aware of various approaches to estimating projected team wins that are a heck of a lot better than “assume an equal probability of winning each game, and then the expected wins will be = winsInBank + gamesLeft*winProbability”. I’m willing to assume that CHONE wouldn’t use that math, for instance.

The premise is sound: trying to evaluate how much a win today is likely to mean in the total standings. But unless one can adjust for quality of competition, it’s not worth all that much. There’s also a second monkey wrench: injuries. Not all injuries are drawn from some memoryless distribution. There are plenty of walking wounded or fragile veterans where the question is not if, but when, they get injured. If the Braves win a lot of games to start the season with Chipper Jones playing, you CAN actually assume a regression at some point. Chipper will get injured, he will miss games, and his replacement will be mediocre. In your aggregate estimates, this is taken into account. When you start disaggregating and then assuming that the same “true talent” will exist, but forgetting that you’re a few games closer to a DL trip, you’re off base.

While some injuries might be random, like blowing out an elbow, others are the result of a progressive wear and recovery cycle. To me, this suggests that a hot start is probably worth more for a team with memoryless injury rates than for one whose injuries become more likely over time.

matt w
Guest
6 years 4 months ago

“Wins in the Bank completely ignores the opponents faced so far.”

This seems mysteriously universal in sabermetric stats — and this isn’t a slam at RJ or Dave, it really is absolutely universal. The stats don’t take into account quality of opposition, so a pitcher gets the same credit for a K whether he strikes out Brian Bixler or Albert Pujols. Why don’t we use metrics that take into account the difference between a player’s performance and the average performance you’d expect from someone facing those opponents?

I disagree with your actual numbers though — even if we’d expect a true-talent .667 team facing a true-talent .333 team to win 3 out of 3 more often than not (and I’m not sure if we would), we still wouldn’t expect them to win 3 out of 3 all the time. Their average expected wins would be somewhere between 2 and 3. So if they sweep the series, they should have a fraction of a win in the bank.
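
To put rough numbers on that, here is a small Python sketch. It leans on Bill James's log5 formula for the single-game matchup, which nobody in this thread actually invoked, so treat that choice as an assumption of the sketch. It also assumes the three games are independent and ignores home field and starting pitchers.

```python
def log5(team_pct, opponent_pct):
    """Log5 estimate of the chance that `team` beats `opponent` in one game."""
    numerator = team_pct * (1 - opponent_pct)
    return numerator / (numerator + (1 - team_pct) * opponent_pct)


p_game = log5(2 / 3, 1 / 3)        # a .667 team facing a .333 team
p_sweep = p_game ** 3              # assumes the three games are independent
expected_wins = 3 * p_game
banked_by_sweep = 3 - expected_wins

print(f"single game: {p_game:.1%}")
print(f"sweep: {p_sweep:.1%}")
print(f"expected wins in the series: {expected_wins:.1f}")
print(f"banked by a sweep: {banked_by_sweep:.1f}")
```

Under those assumptions the better team wins about 80% of the time, sweeps a little more than half the time, and banks roughly six-tenths of a win with a sweep.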

Jason B
Guest
6 years 4 months ago

You don’t have to remember too far back for a real life example of this. Many people pointed this out last year when the Jays got off to a hot start; they had done so by facing relatively weaker opponents and had played the Yankees, BoSox, and Rays little, if at all, during the first third of the season or so. Once they started playing many more intradivision opponents, they predictably faded. It was the “Yes, but…” that many commenters and pundits alike recognized in their quick start.

Padman Jones
Member
6 years 4 months ago

I like the idea that a pitcher would get more credit for performing well against better hitters, but I think the assumption that we just sort of have to roll with is that such things even out over a full season or career. If you throw 1000 innings, you’re bound to face a fairly representative sample of talent, and the fact that you may have performed worse against the upper ends of the hitters’ bell curve (assuming talent is normally distributed) should be balanced out by facing the easy competition as well.

B N
Guest
6 years 4 months ago

I’m hard pressed to say they do even out, Padman. Let’s look at the course of a career. If you’re pitching in the NL, you are generally going to have better stats than you would in the AL. That difference wouldn’t go away over the course of a career, because 1/9 of your hitters are going to have an OBP which is close to half that of their AL counterparts (comparing pitchers vs DH’s).

Within a season, it’s even worse for looking at team outcomes. Even if over the course of a season, your opponents were normally distributed in their talent levels, within the season they won’t be. You could easily end up with 2 months to end the season with either easy opponents or hard opponents. Moreover, talent isn’t even static. You’re probably best off playing your “easy” opponents at the end of the season because they’re going to rest their hurting players and they may have traded away some of their better players at the trade deadline. Since projections generally don’t account for player movement, the estimates get even further removed from the reality.

John R. Mayne
Guest
6 years 4 months ago

Wins in the Bank also ignores the level of confidence we had a priori (before the season started) and the effect of winning on expectations. Winning or losing absolutely should affect expectations, even in very small sample sizes, unless we had massive confidence in our pre-season predictions. Given the SD for even very good predictors, we don’t have that kind of confidence.

I recall reading studies about teams with very hot 10-game starts, and I believe they routinely outperform pre-season expectations over the rest of the season. Bayes’ Theorem once again pops up; there is *some* value even in small sample sizes. Saying that a hot – or cold – start should mean *nothing* to the evaluation is simply mistaken. Obviously, some media gets overexcited, but the response to that should not be to ignore the reality that it matters.

–JRM
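
One simple way to picture that kind of updating is a beta-binomial sketch in Python. Everything specific here is an assumption made for illustration: the function name, the idea of expressing preseason confidence as an equivalent number of games (`prior_games`), and the three values tried. It is not how CHONE or any other projection system works.

```python
def updated_win_pct(prior_pct, prior_games, wins, losses):
    """Posterior mean winning percentage under a Beta prior.

    `prior_games` is a hypothetical knob: how many games of evidence the
    preseason projection is treated as being worth.
    """
    alpha = prior_pct * prior_games + wins
    beta = (1 - prior_pct) * prior_games + losses
    return alpha / (alpha + beta)


preseason_pct = 75 / 162  # the 75-win projection from the article

for prior_games in (50, 100, 400):
    posterior = updated_win_pct(preseason_pct, prior_games, wins=6, losses=4)
    print(f"prior worth {prior_games} games: {posterior:.3f}")
```

The weaker the confidence in the preseason number, the further a 6-4 start pulls the estimate above .463. Only with a very heavily weighted prior does the start mean next to nothing.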

MichaelCoughlin
Member
6 years 4 months ago

Is it too much to just say, “That’s like a one in a bajillion chance of happening” and leave it at that? I know that this is the kind of place where you want to really know the true probabilities and all, but sometimes… :)

oPlaiD
Member
6 years 4 months ago

Couldn’t we expect the Orioles to be 6-4 during a certain stretch of games where they play mostly at home against low-level opponents, even if they are still a 75 win true talent team?

B N
Guest
6 years 4 months ago

No, because the Orioles don’t play well at home either. ;)

Patrick
Guest
6 years 4 months ago

Oh my gosh, thank you Rusty for pointing that out. I was just about sobbing to myself over here.

geo
Guest
6 years 4 months ago

Maybe I’m wrong and the point of this was really to examine the statistical probabilities of a particular hitter/pitcher matchup, but I read it as examining the believability of a crummy player homering off a good pitcher. I was more curious about the list of pitchers Yuni has hit one or more of his 32 career homers against. Cherry-picking some names, that list includes: John Lackey, Francisco Liriano, Daisuke Matsuzaka, Johan Santana, Mark Buehrle (twice), Josh Beckett, Gavin Floyd, Jon Lester, John Danks (twice – what’s with those Chisox pitchers anyway?), BJ Ryan, Derek Lowe, Kenny Rogers (presumably in his dotage), Joe Blanton (twice), David Wells, JP Howell…and assorted other pitchers of various ages, sizes, and ilks. But there are some pretty good names on that list. So what does that mean? Heck if I know.

Matt C
Guest
6 years 4 months ago

What I think is even crazier is I’m pretty sure he was in an 0-2 hole to begin with, then fought back to a full count, then fouled off a bunch of pitches before the homer. So I wonder how many HRs Verlander gives up after going ahead 0-2 and how many Betancourt hits after falling behind 0-2? I bet those odds are pretty low too.
