Shin-Soo Choo’s Batting Average on Balls in Play
Shin-Soo Choo is a BABIP wizard. Sort for batting average on balls in play since 2008, and sitting there in second place is Choo. After four years of BABIPs in the high .300s, he dropped back to earth in 2011 but still finds himself above the league norm these days. Even with over 2000 plate appearances under his belt, it’s fair to ask: what does Choo’s true-talent BABIP look like?

Just ahead of him on that list is Austin Jackson, who might fit the high-BABIP ideal a little better. After all, he has the wheels to make infield hits out of grounders to the hole. Choo has some speed, but by Bill James’ speed score, he’s been slower than league average two of the last three years. It’s not speed that’s powering this train.
Instead, it’s line drives. His career line drive rate (21.9%) is well above the league average of around 19%, and he’s above 22% most years. But it’s not all line drive rate: sort by line drive rate since 2008, and Choo is only 39th.
It’s also about Choo’s batted ball profile. He’s hit 1.32 ground balls for every fly ball so far, and we know that the batting average on ground balls is better than that on fly balls. His 1.25 GB/FB ratio is actually 112th highest since 2008.
If you take his GB/FB ratio, and combine it with his line drive ratio, you start narrowing down his list of comps, though. We’ll cut the list off at 95% of Choo’s contributions in both categories (1.188 GB/FB and 20.7% line drive rate), and there’s a list of 34 qualified batters since 2008 that have done something similar in those two categories. These 34 players have a BABIP of .3217. Choo’s career BABIP is .352. There’s still something missing.
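The comp filter described above can be sketched in a few lines. This is only an illustration, not the query that produced the list of 34: the two cutoffs come from the text, while the `Batter` type, the function names, and the four sample rows are invented placeholders.

```python
from dataclasses import dataclass

# Cutoffs quoted in the text (95% of Choo's GB/FB and line drive rate).
GB_FB_CUTOFF = 1.188
LD_CUTOFF = 0.207


@dataclass
class Batter:
    name: str
    gb_fb: float    # ground balls per fly ball
    ld_rate: float  # line drives as a share of batted balls
    babip: float


def find_comps(batters, gb_fb_min=GB_FB_CUTOFF, ld_min=LD_CUTOFF):
    """Keep only batters who meet or beat both cutoffs."""
    return [b for b in batters if b.gb_fb >= gb_fb_min and b.ld_rate >= ld_min]


def mean_babip(batters):
    """Unweighted average BABIP of a group (a simplification)."""
    return sum(b.babip for b in batters) / len(batters)


# ASSUMPTION: hypothetical rows for illustration only; not real players.
sample = [
    Batter("Hitter A", 1.40, 0.215, 0.330),
    Batter("Hitter B", 0.90, 0.230, 0.310),  # fails the GB/FB cutoff
    Batter("Hitter C", 1.25, 0.190, 0.300),  # fails the line drive cutoff
    Batter("Hitter D", 1.20, 0.210, 0.320),
]

comps = find_comps(sample)
print([b.name for b in comps])
print(round(mean_babip(comps), 3))
```

Run against a real leaderboard export, the same two-condition filter would give the comp group whose average BABIP is then compared with Choo's .352.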

Let’s look at Choo’s work year-by-year. Using slash12’s xBABIP calculator, we can see an interesting trend emerge from Choo’s batted ball mix.
| Year | GB/FB | LD% | GB% | FB% | IFFB% | HR/FB | IFH% | BUH% | xBABIP | BABIP |
|---|---|---|---|---|---|---|---|---|---|---|
| 2008 | 1.14 | 0.228 | 0.411 | 0.361 | 0.08 | 0.161 | 0.04 | 0.5 | 0.3300 | 0.367 |
| 2009 | 1.17 | 0.216 | 0.423 | 0.361 | 0.038 | 0.127 | 0.043 | 0.75 | 0.3333 | 0.370 |
| 2010 | 1.29 | 0.197 | 0.452 | 0.35 | 0.026 | 0.146 | 0.041 | 1 | 0.3297 | 0.347 |
| 2011 | 1.39 | 0.224 | 0.451 | 0.325 | 0.026 | 0.104 | 0.065 | | 0.3465 | 0.317 |
| 2012 | 1.44 | 0.257 | 0.438 | 0.305 | 0.063 | 1 | 0.3454 | 0.333 | ||
| Career | 1.32 | 0.219 | 0.445 | 0.337 | 0.036 | 0.129 | 0.045 | 0.714 | 0.3371 | 0.352 |
There’s something about Choo’s mix of power and speed, paired with his batted ball angle, which makes him well-suited for good BABIPs. This makes sense intuitively, after all — if you’re a slugger muscling up for fly balls, you’re more likely to hit cans of corn than if you have a level swing designed for line drives. This is how you get a career xBABIP over .335, anyway.
But look closer at the yearly xBABIP/BABIP split, and you’ll see something strange. When Choo was BABIP’ing around .370, he ‘deserved’ a BABIP closer to .330, a bit below his career xBABIP. He hit more fly balls those days, and fewer line drives. He still ‘should’ have had an excellent BABIP, but maybe not one near .370. Now that he’s hitting the ball on the ground more, and reaching new heights in line drive ratio, though, he ‘deserves’ a BABIP near .350 and is receiving something close to his career xBABIP. It has a nice symmetry to it. The feeling that he was a little lucky before is balanced by the surprise that he might be a little unlucky now.
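To make the split concrete, here is a small sketch that subtracts xBABIP from BABIP for each row of the table. The numbers are copied from the table, reading the four-decimal figure in the 2011 and 2012 rows as xBABIP and the figure after it as BABIP. The function and variable names are just for illustration.

```python
# (xBABIP, BABIP) pairs copied from the table above.
seasons = {
    "2008": (0.3300, 0.367),
    "2009": (0.3333, 0.370),
    "2010": (0.3297, 0.347),
    "2011": (0.3465, 0.317),
    "2012": (0.3454, 0.333),
    "Career": (0.3371, 0.352),
}


def babip_gap(xbabip, babip):
    """Actual minus expected BABIP; positive means he beat his batted ball mix."""
    return babip - xbabip


gaps = {label: round(babip_gap(x, b), 4) for label, (x, b) in seasons.items()}

for label, gap in gaps.items():
    print(f"{label:>6}: {gap:+.4f}")
```

The gap is positive from 2008 through 2010 and negative in 2011 and 2012, which is the symmetry described above.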
Still, we wonder what this means for his career going forward. Our own Bill Petti found that BABIP has one of the worst year-to-year correlations of any stat — and that line drive rate had the worst. In his ‘good’ BABIP years, Choo has had a wRC+ around 140. In the ‘bad’ BABIP years, he’s had a wRC+ of 108 and 128 respectively.
Is he really 40% better than league average, or more like 10-15% better? His career xBABIP tells us that his current BABIP is the more likely one, even if his xBABIP this year is much nicer. On the other hand, with his power down below career levels this year, it’s a little unfair to say that he’s a BABIP-dependent player. If he got his power back to career levels this year, he’d be able to show his career wRC+ (131). Well, he just hit a home run today, and with it his up-to-the-minute wRC+ is 135, so maybe all this talk of his BABIP is best left for the fantasy blog.
But you still get the feeling that Shin-Soo Choo’s BABIP will continue to be interesting. I asked Cleveland consultant Keith Woolner about Choo’s BABIP over lunch one day. He smiled and said nothing. Maybe he knows Choo’s true xBABIP?
I drafted him in the 5th round (keeper league) this year and was pretty frustrated in him until about 1-2 weeks ago. He has really taken off recently and his average sits at .270 as I type. He was still getting BB and a few SB when not hitting, so he is a guy I would look to buy low. He already homered off of Verlander today with a BB thrown in.
Pfft. Choo hits too many popups.
What if this was actually Joey Votto posting? How crazy would that be? Talking smack on fangraphs. Ha.
I invented that joke
This other poster is an imposter.
But he’s right about the pop-ups. I laugh at Choo’s 19 career IFFBs! Hah!
I drafted him low two consecutive years (’09 and ’10) but passed him up the last two years. I don’t think he’s been quite the same since the DUI.
I agree that the DUI affected his performance last year, as he admitted to pressing in order to prove his value. Though I think that is all behind him at this point.
great work.
You have a distribution of observations. You have a mean. You have a standard deviation. If the distribution is normal(ish) the empirical rule tells you what percentage of observations should be within x standard deviations of the mean. Even if the distribution is far from normal Tchebycheff’s theorem tells you the minimum percentage of observations within x standard deviations of the mean.
So how many standard deviations is Choo above the mean? And given the population of MLB season stats is BABIP a trait (high year-to-year correlation) or a state (high year to year variation; low year-to-year correlation)?
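For what it's worth, the calculation being asked for looks roughly like the sketch below. The league mean and standard deviation are placeholders, not measured values, so the printed z-score is illustrative only. Choo's .352 career BABIP is from the article.

```python
import math


def z_score(value, mean, sd):
    """Number of standard deviations a value sits above the mean."""
    return (value - mean) / sd


def chebyshev_min_within(k):
    """Chebyshev lower bound on the share of observations within k SDs.

    The bound 1 - 1/k^2 only says something for k > 1, so return 0 otherwise.
    """
    if k <= 1:
        return 0.0
    return 1.0 - 1.0 / (k * k)


def normal_within(k):
    """Share of observations within k SDs if the distribution is exactly normal."""
    return math.erf(k / math.sqrt(2))


# ASSUMPTION: placeholder league mean and SD, chosen only to show the mechanics.
league_mean = 0.300
league_sd = 0.020
choo_career_babip = 0.352  # from the article

k = z_score(choo_career_babip, league_mean, league_sd)
print(f"z = {k:.2f}")
print(f"Chebyshev: at least {chebyshev_min_within(k):.1%} within {k:.2f} SD")
print(f"Normal:    about {normal_within(k):.1%} within {k:.2f} SD")
```

Swapping in the real mean and SD for qualified hitters would answer the first question. The trait-versus-state question needs year-to-year correlations, which this does not address.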
I don’t have the SD numbers on Babip, but I made clear that he was an outlier in the piece. I also linked to research that shows that BABIP has a low year-to-year correlation in the piece.
Tchebycheff’s theorem? whoa!
First name: Pafnuty
The problem is that we know BABIP does not have a stationary expectation across batters – hitters with different profiles (GB vs. FB prone, fast vs. slow) will obviously have different “true talent” BABIPs. Comparing to the mean doesn’t necessarily tell you whether a hitter with a high BABIP has been lucky or is just a high BABIP hitter, because the mean BABIP of all players is not a particularly good predictor of a given player’s expected BABIP.
Your last comment about Keith Woolner makes me salivate. I have always wondered what sorts of info the teams are hoarding and I don’t mean that in a pejorative sense as I would do the same.
not as much as you think. they have hit fx and field fx and that probably covers most of it.
the gap in information between the mlb offices and the public has closed and closed quickly.
Hit/FX alone makes the gap huge. It is the difference between this being a speculative piece about Choo the hitter and an actual definition of Choo the hitter.
“I asked Cleveland consultant Keith Woolner about Choo’s BABIP over lunch one day. He smiled and said nothing.”
What a dick.
F’N LASER SHOW
“We be drivin’”
http://youtu.be/EkZ8jc8_hOo
Hey Eno, maybe you can help me out. Have you looked into the discrepancy in BABIP between hard and soft contact? It’s startling. The criteria is kinda subjective but I am more and more convinced that BABIP isn’t being used properly if the contact intensity dimension is ignored. BABIP needs to be refined with some kind of contact F/X. I hate the cliched luck/unluck conclusion. Seems much of it IS in the batter’s control. See below (from mastersball):
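| Year | Batted ball | All | Hard | Weak |
|---|---|---|---|---|
| 2011 | Line Drive | 0.714 | 0.730 | 0.683 |
| 2011 | Ground Ball | 0.238 | 0.576 | 0.185 |
| 2011 | Fly Ball | 0.139 | 0.404 | 0.045 |
| 2010 | Line Drive | 0.708 | 0.723 | 0.691 |
| 2010 | Ground Ball | 0.239 | 0.552 | 0.187 |
| 2010 | Fly Ball | 0.158 | 0.426 | 0.062 |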
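A quick sketch of the size of that discrepancy, using only the figures quoted above. The dictionary layout and function name are just one way to arrange them.

```python
# BABIP by batted ball type and contact quality, copied from the figures above.
# Keys are (year, type); values are (all, hard, weak).
babip_by_contact = {
    (2011, "Line Drive"): (0.714, 0.730, 0.683),
    (2011, "Ground Ball"): (0.238, 0.576, 0.185),
    (2011, "Fly Ball"): (0.139, 0.404, 0.045),
    (2010, "Line Drive"): (0.708, 0.723, 0.691),
    (2010, "Ground Ball"): (0.239, 0.552, 0.187),
    (2010, "Fly Ball"): (0.158, 0.426, 0.062),
}


def hard_minus_weak(row):
    """BABIP on hard contact minus BABIP on weak contact."""
    _, hard, weak = row
    return hard - weak


gaps = {key: round(hard_minus_weak(row), 3) for key, row in babip_by_contact.items()}

# Print the largest gaps first.
for (year, kind), gap in sorted(gaps.items(), key=lambda kv: -kv[1]):
    print(f"{year} {kind:<11} hard - weak = {gap:.3f}")
```

In both years the gap is well over .350 for ground balls and fly balls but under .050 for line drives, so contact quality seems to matter most on exactly the batted balls that LD% does not capture.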
The calculator takes ‘hard contact’ into account by using LD%, but the problem is that LD% is not really correlated from year to year. So it’s hard to give a batter credit for lots of line drives if they don’t correlate well from year to year. Obviously, in this case, it looks like he’s a line drive hitter. So it’s not surprising that he’s got a .330+ xBABIP for his career. But then there are times when he’s had a good LD% and a bad BABIP like last year. So luck is a part of it.
A big issue is that the way marginal batted balls get classified depends on whether it was an out or not. If the outfielder catches it, it gets ruled a fly ball, but if he doesn’t, it’s a line drive. It causes problems with xBABIP because LD% becomes a partial proxy for BABIP. If you get luckier on marginal FB/LD’s you post a higher LD% and a higher BABIP, even though nothing is really different about the batter.
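A toy sketch of that effect, assuming the scoring rule really works the way it's described here. All counts are invented for one hypothetical batter who makes identical contact in both scenarios; only the number of marginal balls that drop changes.

```python
def observed_rates(total, clear_ld, clear_hits, marginal_hits):
    """Return (LD%, BABIP) when marginal balls are scored by their outcome.

    A marginal ball that drops is scored a line drive and a hit; one that is
    caught is scored a fly ball and an out. clear_ld and clear_hits count only
    the unambiguous balls in play.
    """
    ld_rate = (clear_ld + marginal_hits) / total
    babip = (clear_hits + marginal_hits) / total
    return ld_rate, babip


# ASSUMPTION: invented counts. 400 balls in play, 30 of them marginal.
TOTAL, CLEAR_LD, CLEAR_HITS = 400, 70, 110

unlucky = observed_rates(TOTAL, CLEAR_LD, CLEAR_HITS, marginal_hits=10)
lucky = observed_rates(TOTAL, CLEAR_LD, CLEAR_HITS, marginal_hits=20)

print(f"unlucky: LD% = {unlucky[0]:.3f}, BABIP = {unlucky[1]:.3f}")
print(f"lucky:   LD% = {lucky[0]:.3f}, BABIP = {lucky[1]:.3f}")
```

Ten extra marginal balls dropping in raises both LD% and BABIP by the same .025, so an xBABIP built on LD% would partly "expect" the luck it is supposed to be measuring against.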
MangoLiger…. heh
Eno, I’m assuming you didn’t look at the info I posted? It separates out hard and weak for FB , GB and LDs on their own. Not all LDs are hard hit and not all grounds are soft. Urge you to look at that data. Hard hit GBs go for hits 60% of the time vs 20% for weak etc. I can’t take BABIP seriously without contact analysis. Can you?
Stefan, that is some revealing data, possibly a very nice bridge to once hitf/x becomes available. Do you have a link to the actual stats?
I always enjoy when year to year LD% comes up. I feel like that isn’t well known enough. People love to point to LD rate as an indicator of something when it is really almost useless. Nice job staying away from it here.
And the funny thing is that Choo was quite the sabermetric darling (as best as I can recall) when I suspect a good deal of his value was coming from what seems like it might have been an unsustainably high BABIP. In 2009-2010, he put up a total of 71 runs in value from his bat alone on respective BABIPs of .370 and .347. What’s a good guess for his batting runs if his BABIPs were down around .300? Maybe 45 batting runs?
Actually, looking at Josh Willingham’s 2007 season, which is pretty similar except for a .308 BABIP, he only was worth 17 batting runs… so it could be Choo’s BABIP “skill” is worth 1-2 wins per season.
Is there a chance that Choo’s BABIP doesn’t come down to the typical numbers and averages? A chance that he is in control of the ball’s destiny?
I’m not an expert on swing mechanics, and maybe it is something I was able to more easily pick up on while spending a couple years watching Choo’s countrymen play ball, but his swing looks different. Even when Choo is “powering up” he keeps two hands on the bat right through his swing.
Take a look at this video during Spring Training. Two hands on the bat right through. Almost like a golfer’s swing, actually.
Now, take a look at this video of Ortiz. Even when he was missing, his top hand seemed to be just going for a ride more than contributing to the swing.
So possibly we are looking at the wrong things when evaluating BABIP. Possibly we need to look at bat control.
I agree with you about Choo’s swing. I’ve always thought of it as different, and a month or so ago, I saw another guy swing (don’t remember who) and thought “Wow, he is the first guy I’ve ever seen whose swing reminds me of Choo’s.”
Love him hitting leadoff + hope he’s in Cleveland for years to come. F Boras.
I think it’s something that’s seen more often in Asian players. Fukudome comes to mind, Ichiro too somewhat, but I remember seeing a very similar swing from Kosuke during his stint in Cleveland. It may be the two hands going through all the way, but to me he always looks like he’s rotating his whole body a lot more than other players too.
This is where scouts earn their paychecks, yes?
The line drive correlation, or lack thereof, still really bothers me. Both grounders and flies correlate year-over-year at a steady clip, but line drives, for reasons that I’d imagine are largely dependent on classification, express wild volatility.
With that said, I’ve always had a sneaking suspicion that hit dispersion, location specifically, contributed to BABIP. Reasons Provided: None.
I think part of the discrepancy here is that you need to break down flys into fliners and vanilla flys. A guy who hits lots of LD probably makes more fliners than a guy who is looking to pop balls over the fence.
None of this is really complete until we start looking at the competition these guys face. It’s entirely possible that LD% (Talent) is completely consistent, and we’re just seeing a fluctuation in the quality of pitchers faced.