FanGraphs Fantasy Baseball


  1. Love this piece. Great time to try to get Kid Weeks low. In some leagues, get him for free off the wire.

    Comment by pianow — June 19, 2012 @ 1:26 pm

  2. Said this before, will say again: BABIP is unhelpful in this context unless you break down contact intensity for each type of batted ball. In 2011 hard hit GBs went for hits 58% of the time while weak hit ones only 19%. Hard hit fly balls 40% of the time vs 5% for weak … We do have Contact/Batting F/X numbers available. You can learn the speed off the bat and ball trajectory. The guy who left BP to work for the Astros did a series on this 11/2011. Please refine BABIP to include contact intensity and quality before trying to use BABIP as a predictive stat!!!

    Comment by stefangfg — June 19, 2012 @ 1:40 pm
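A minimal Python sketch of the contact-quality idea in comment 2, using only the four 2011 hit rates quoted there. The batted-ball counts for the two hitters are hypothetical, and whether the quoted fly-ball rates include home runs is not stated, so treat the result as an illustrative weighted hit rate rather than a true BABIP.

```python
# Hit rates by batted-ball type and contact quality, as quoted in the comment (2011).
HIT_RATE = {
    ("GB", "hard"): 0.58,
    ("GB", "weak"): 0.19,
    ("FB", "hard"): 0.40,
    ("FB", "weak"): 0.05,
}


def expected_hit_rate(counts):
    """Weighted hit rate on GBs and FBs for a dict of {(type, quality): count}."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no batted balls")
    expected_hits = sum(HIT_RATE[key] * n for key, n in counts.items())
    return expected_hits / total


# Hypothetical hitters with the same 50/50 GB/FB mix but different contact quality.
hard_contact = {("GB", "hard"): 30, ("GB", "weak"): 20,
                ("FB", "hard"): 30, ("FB", "weak"): 20}
weak_contact = {("GB", "hard"): 20, ("GB", "weak"): 30,
                ("FB", "hard"): 20, ("FB", "weak"): 30}

print(f"hard-contact hitter: {expected_hit_rate(hard_contact):.3f}")
print(f"weak-contact hitter: {expected_hit_rate(weak_contact):.3f}")
```

With identical GB% and FB%, the two made-up hitters end up with clearly different expected hit rates, which is the gap the commenter argues plain BABIP cannot see.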

  3. I’ll post this here and see if anyone has any insight, re: Jemile Weeks.

    When he stole his 10th base of the season, he was leading the AL in SB (if I remember correctly). Since then, he has gone 28 games without a SB. He has gotten on base 43 times in 126 PA over that span, for a .341 OBP. But he’s had 0 SB and 3 CS.

    He had a “minor ankle injury” just before his last SB, and then a hip strain in early June. I’m not sure if he’s still not fully healthy, or what happened to his speed? Without the SB, he becomes *much* less useful in fantasy.

    Comment by JR — June 19, 2012 @ 1:48 pm

  4. The minute you add hard hit / soft hit into anything, you add even more stringer bias than you do with line drive percentage in there. If anything, I’d like to see a speed factor in there that’s not IFH, because that seems really random. Put Bill James speed score in there over IFH and it’s better.

    Put hard hit / softly hit, it’s more bias. If you’re talking about velocity and angle, you’re talking hit f/x, which is not available to the public.

    Comment by Eno Sarris — June 19, 2012 @ 1:51 pm

  5. Bautista’s BABIP is crazy. I get that hitting infield flies will suppress that, but still, .199? Sheesh.

    Granderson’s LD rate is 26% – could a BA rebound (to say .270 or higher) be coming?

    Comment by kid — June 19, 2012 @ 2:05 pm

  6. Maybe this has already been done, but I wonder what sorts of BABIPs three-true outcome guys generally have. Cause none of those three outcomes affects BABIP.

    Maybe everything Bautista hits hard this year is going over the fence? :-)

    Comment by JR — June 19, 2012 @ 2:17 pm

  7. What you call stringent I call refinement I guess. LDs btw are not always hard hit balls. Weakly hit LDs however tend to go for hits at almost the same rate as hard hit ones: in 2011 73% vs 68%. But weakly hit GBs and FBs rarely go for hits. Of course there is some bias if you are not actually using the hit f/x, but the discrepancy between weak and hard contact for GBs and FBs is SO HUGE if you discount the accuracy of the hard/weak determination by 50%, you still have a statistical refinement. And people won’t be scratching their heads about why their guy is so lucky/unlucky. Guys who hit the ball harder will get hits 25% more of the time on GBs and FBs.

    So there is some bias without access to hit f/x (is it released after the year ends?), but it is better than using a stat, BABIP, that’s devoid of a crucial variable and leads to silly and misleading conclusions, i.e., Hos my god he has been so unlucky for the past two months, how can that happen? The correct deduction is that Hoz is hitting weak ground balls, trying to pull the ball. You’ll confirm that deduction by looking at his Pull vs Oppo data and comparing it to last year.

    Comment by stefangfg — June 19, 2012 @ 2:21 pm

  8. This is excellent, Eno.

    Comment by Andrew — June 19, 2012 @ 2:51 pm

  9. ‘Stringer,’ not ‘stringent.’ As in, you have to pay someone to watch the game and decide which balls are weakly hit ground balls. I guarantee you that adds noise. If we watched the same game and recorded what we thought were weak line drives and hard hit line drives, you’d see very different results. Look at BIS. They have Line Drive (Fly ball), Fliner (liner), Fly (fliner) and all sorts of crazy definitions between a fly ball and a line drive… and the more is decided by an eyeball, the worse the confusion will be. If we could get batted ball angle off bat (vertical angle) and velocity, we’d be getting somewhere. I’ve seen batted ball distance and horizontal angle, but I’ve never seen this weak grounder vs hard-hit grounder you’re talking about, and if it comes from a stringer, I’m skeptical it refines much.

    Comment by Eno Sarris — June 19, 2012 @ 2:54 pm

  10. I do think three true outcome affects BABIP a little — in the denominator. Fewer balls in play. Also, a TTO hitter probably is a fly-ball hitter, which means lower BABIPs (see Bautista’s low BABIP).

    As for Granderson, I think there’s some chance he betters his BA for sure.

    Comment by Eno Sarris — June 19, 2012 @ 2:56 pm

  11. “And in the case of Cabrera in particular, if his BABIP falls closer to his .347 xBABIP, his fantasy value will plummet. He’s a bit of an empty batting average.”

    I enjoyed the article, but the above kind of comment is a pet peeve of mine, as it fails to distinguish between the regression that is obviously going to occur, and a truly worrying fall in value. Obviously, Melky Cabrera, like every other player in the majors, is unlikely to hit .364 for the rest of the year. But according to ZIPS, he should hit .304 for the rest of the year– and that’s based on a projected BABIP of .333. If his BABIP instead matches his xBABIP of .347, and he otherwise meets his ZIPS projection, he will hit about .315 for the rest of the year, with an additional 8 HR and 10 steals, for a final line of around .340/97/14/73/20. That’s pretty damn valuable. It’s actually better than I would have expected from eyeballing his standard stats.

    Comment by AF — June 19, 2012 @ 3:04 pm
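A rough Python sketch of the arithmetic behind comment 11: how a change in BABIP flows through to batting average. The relation used is AVG = (HR + BABIP × BIP) / AB with BIP = AB − K − HR + SF. Only the 8 HR and the two BABIP values (.333 and .347) come from the comment; the at-bat and strikeout totals are hypothetical stand-ins, not the ZiPS projection.

```python
def average_from_babip(babip, at_bats, strikeouts, home_runs, sacrifice_flies=0):
    """Batting average implied by a BABIP, given AB, K, HR (and optionally SF)."""
    if at_bats <= 0:
        raise ValueError("at_bats must be positive")
    balls_in_play = at_bats - strikeouts - home_runs + sacrifice_flies
    hits = home_runs + babip * balls_in_play
    return hits / at_bats


# Hypothetical rest-of-season line: only the 8 HR comes from the comment;
# the AB and K totals are made up for illustration.
AT_BATS, STRIKEOUTS, HOME_RUNS = 300, 40, 8

avg_at_projection = average_from_babip(0.333, AT_BATS, STRIKEOUTS, HOME_RUNS)
avg_at_xbabip = average_from_babip(0.347, AT_BATS, STRIKEOUTS, HOME_RUNS)
print(f"{avg_at_projection:.3f} -> {avg_at_xbabip:.3f}")
```

The change in average is just the change in BABIP scaled by the share of at-bats that end in a ball in play, so under these assumptions a 14-point BABIP bump is worth a bit under 12 points of average, in the same ballpark as the .304 to .315 move described in the comment.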

  12. Great column, but is there any way you can round off those xBABIP and xBABIP-BABIP numbers to 3 or 4 decimal places? All of those extra digits don’t add any information and simply make the numbers hard to read.

    Comment by Dingbat — June 19, 2012 @ 3:24 pm

  13. Jemile Weeks has already been owned by 5 teams (myself included) in my shallow bench 12 team roto league. Seems like a lot of guys want to believe that he’ll get better, but he just can’t be started right now. If he’s not even stealing bases, he is utterly useless.

    Comment by BStu185 — June 19, 2012 @ 3:30 pm

  14. Here’s the link:

    With GBs and FBs you have a 40% hit discrepancy to deal with between hard and weak contact. You can drive a truck through that much “noise” and still come out with valuable cargo. Subjective, sure, but don’t fall prey to the fallacy of only looking for your keys under the streetlight. There are valuable insights out there in the subjective darkness. At the very least, these empirical observations should force you to heavily qualify your use of BABIP, to recognize its limitations and understand how faulty a lucky/unlucky deduction is. Cameron came close to recognizing this with his Timmy piece. Timmy is getting hit HARD so his BABIP is gonna be high. He’s not unlucky, he’s serving up meatballs and guys are making hard contact quite a bit.

    Aren’t you guys powerful enough to get access to hit f/x? Time to push for refinement or get left behind I think.

    Comment by stefangfg — June 19, 2012 @ 3:33 pm

  15. Looks like Coco Crisp just missed the cutoff with a disparity of .059. His LD% is down and GB% and FB% are up, which must take a little off its value, but a .219 BABIP must suggest some sort of progression. Does anyone see any stats suggesting anything different?

    Comment by Nate — June 19, 2012 @ 3:53 pm

  16. OK, I know I’m probably missing something obvious here…but how can a BABIP be *lower than* a guy’s batting average? Bautista is sporting a BABIP well below his real BA and this is befuddling me. Help a confused brother out…???

    Comment by MikeInNJ — June 19, 2012 @ 10:09 pm

  17. To get BABIP, you subtract HR from hits and K’s+HR from at-bats. So if you have a lot of homers and not too many strikeouts, it’s possible to have a BABIP lower than BA.

    For example, say you hit .250 with 11 HR and 20 Ks in 100 AB. Your BABIP is (25-11)/(100-11-20) = 14/69 = .203, which is less than your .250 batting average.

    Comment by AF — June 19, 2012 @ 10:20 pm
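A small Python sketch of the calculation in comment 17, assuming the simplified formula given there, (H − HR) / (AB − K − HR). The commonly published version also adds sacrifice flies to the denominator, so the function takes them as an optional argument that defaults to zero. The function and variable names are illustrative.

```python
def babip(hits, home_runs, strikeouts, at_bats, sacrifice_flies=0):
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    balls_in_play = at_bats - strikeouts - home_runs + sacrifice_flies
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (hits - home_runs) / balls_in_play


# The example from the comment: a .250 hitter with 11 HR and 20 K in 100 AB.
hits, home_runs, strikeouts, at_bats = 25, 11, 20, 100
batting_average = hits / at_bats
example_babip = babip(hits, home_runs, strikeouts, at_bats)

print(f"AVG {batting_average:.3f}, BABIP {example_babip:.3f}")
```

Because home runs count as hits in the numerator of batting average but are removed from both sides of BABIP, a hitter with many homers and few strikeouts can post a BABIP below his average, which is the Bautista case asked about in comment 16.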

  18. Ahh, thanks. Good to know I wasn’t missing something obvious. That’s a bit different than what the glossary says which implies BABIP=H/BIP.

    “Batting Average on Balls In Play (BABIP) measures how many of a batter’s balls in play go for hits. While typically around 30% of all balls in play fall for hits, there are three main variables that can affect BABIP rates for individual players:”

    Comment by MikeInNJ — June 19, 2012 @ 10:25 pm

  19. I have no idea what this means “In the name of full disclosure, I only have data classified as hard and soft for my personal use.”

    Who gave him that data? Did he classify it himself? Where did it come from?

    Here’s the great thing about ground balls and fly balls: they are easy to classify, and they are predictive.

    Here’s the bad thing about line drives: they are really hard to classify, and they are not predictive.

    So you want to take ground balls and make them harder to classify? A weak versus a hard ground ball is by nature harder to classify. To me, that makes them more like a line drive. Which is less predictive.

    You can say that a hitter has hit a bunch of weak ground balls and that’s why his BABIP is bad. It’s another thing entirely to say that he will CONTINUE to hit weak ground balls and will continue to be bad. And then a third different thing to say that what you classified as a weak ground ball will continue to be classified as a weak ground ball.

    I barely want line drive rate in the xBABIP formula because line drive rates are not predictive and therefore muss the formula up. If you want to make ground balls and fly balls less easy to determine — I return to the fact that you and I would have really different ideas about what a weak ground ball, strong ground ball and a line drive are — then I doubt that they are more predictive because of it.

    If Todd wants to show that x degrees of vertical angle and x mph of speed off the bat predict that the hitter will continue to hit at x degrees of vertical angle with x mph of speed (and have xBABIP), then that might be interesting. That doesn’t seem to be what he’s saying. I have no idea what a weak ground ball is to him.

    Comment by Eno Sarris — June 20, 2012 @ 2:14 am

  20. This is a great point. Don’t have much to add.

    Comment by Eno Sarris — June 20, 2012 @ 2:16 am

  21. I left those digits because I find them mostly irrelevant. Just look at the general placement. This xBABIP is not so refined that we ‘know’ it to even four decimal points.

    Comment by Eno Sarris — June 20, 2012 @ 2:17 am

  22. How does speed help your BABIP outside of IFH and BUH? If it’s a ground ball out of the infield, it doesn’t matter how fast you are, it’s a hit. I think having IFH and BUH set up the way they are in the equation, it does reward players using speed to their advantage.

    Comment by slash12 — June 20, 2012 @ 8:57 am

  23. Taking LDs out of the equation really cripples it. Guys like Joey Votto and Matt Kemp regularly put up huge LD%s and that’s the reason they consistently hit for high average. Yes it fluctuates a LOT, but there’s still some skill to hitting line drives. If you take out LD%, almost everyone is going to fall in the (roughly) .290 to .320 range or so; GB% and FB% just don’t have a large predictive power towards BABIP.

    Comment by slash12 — June 20, 2012 @ 9:03 am

  24. It’s true about line drives. I guess what I’m saying is that if we want to refine anything, I’d like to refine line drives so that they become more predictive and stable. I’m guessing stringer bias is at the heart of it.

    As for speed, I’m not saying that speed helps outside of IFH or BUH; I’m saying that speed is not necessarily accurately reflected by IFH and BUH because there are inconsistent opportunities for IFH and BUH. So if we used, just as an example, Bill James’ speed score, that might be more stable and more accurately reflect their true talent speed.

    I didn’t mean to speak ill of your work. It’s a great formula, and obviously I like it. This commenter had a specific issue so I talked about where I might go with the formula if I had better multi-variable regression skills.

    Comment by Eno Sarris — June 20, 2012 @ 12:03 pm

  25. I note your comment on Bryan LaHair, which makes almost the opposite point, even though xBABIP predicts about the same amount of regression: “The slide might not be as bad as some think.” What the LaHair comment gets right and the Melky comment gets wrong is making the call in relation to what “some think.” Everyone knows that hot hitters will regress. What the advanced stats help you do is assess whether a particular hot hitter is likely to regress more or less than what you’d predict just by eyeballing his conventional stats.

    Comment by AF — June 20, 2012 @ 12:07 pm

  26. Mike, what you’re missing is that HRs are not balls in play. So it’s not H/BIP, it’s (H-HR)/BIP.

    Comment by AF — June 20, 2012 @ 12:08 pm

  27. Ah Eno you keep harping on the subjective bias of the information, which is just a circumstantial thing. We can take that out of the equation with access to hit f/x. As far as predictive power, it should interest you to know that adjusted BABIP taking into account speed off bat and angle has a split-half correlation (r) double that of “dumb” BABIP, so double the predictive power. 80 mph seems to be the breaking point between hard and weak contact. Please read the two-article series by Mike Fast over at BP in Nov of last year. See this conclusion:

    “Such an adjusted BABIP metric has a better split-half correlation for pitchers with at least 300 balls in play in 2008, with a correlation coefficient of r=0.26, compared to r=0.13 for normal BABIP. In fact, if we further adjust the BABIP by also crediting the pitcher with .536 of an out for every ball in play hit harder than 80 mph, we can improve the split-half correlation coefficient to r=0.34.”

    Comment by stefangfg — June 20, 2012 @ 12:39 pm
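A hedged Python sketch of what a split-half correlation like the one quoted in comment 27 measures: each pitcher's balls in play are split into two halves, BABIP is computed in each half, and the two halves are correlated across pitchers. The even/odd split and the function names are assumptions for illustration, not the method used in the quoted study. The last lines use only the quoted r values: the figures are correlation coefficients, so doubling r corresponds to a fourfold gain in r squared, and the two measures should not be mixed up.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / sqrt(var_x * var_y)


def split_half_r(outcomes_by_pitcher):
    """Correlate BABIP on even-indexed vs odd-indexed balls in play.

    outcomes_by_pitcher: list of lists of 0/1 values (1 = hit), one list per
    pitcher, each with at least two balls in play.
    """
    first_half, second_half = [], []
    for outcomes in outcomes_by_pitcher:
        even, odd = outcomes[0::2], outcomes[1::2]
        first_half.append(sum(even) / len(even))
        second_half.append(sum(odd) / len(odd))
    return pearson_r(first_half, second_half)


# Correlations quoted in the comment: plain BABIP vs speed-adjusted BABIP.
r_plain, r_adjusted = 0.13, 0.26
ratio_r = r_adjusted / r_plain
ratio_r_squared = r_adjusted ** 2 / r_plain ** 2
print(ratio_r, ratio_r_squared)
```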

  28. I have no problem with using Hit F/x to refine BABIP. It is my understanding that MLBAM is keeping Hit F/x numbers for its members alone. I do know that Baseballheatmaps has batted ball distance (but not time, so we don’t know the arc), and that HitTrackerOnline has batted ball velocity (and some other data), but the ‘whole picture’ that I’m looking for from Hit F/x is not available to non-baseball entities from my inquiries.

    Comment by Eno Sarris — June 20, 2012 @ 12:44 pm

  29. This may be a dumb question but I am assuming this linear regression for xBABIP uses the data for all MLB players for multiple seasons and not a select group of hitters. Then this would suggest it represents an equation for the interaction between all hitters and pitchers. If this is the case, can the same concept, xBABIP-BABIP be applied to pitchers to analyze individuals? I feel like I’ve read an article on here that articulates the notion that pitchers have more control over BABIP than hitters (i.e. inducing weakly hit grounders), if true it would limit the usefulness of this metric, but are there some limited statistical applications of xBABIP to pitchers?

    Comment by jdm — June 22, 2012 @ 5:44 pm

  30. I am guessing that it would be a poor metric to use, considering that BABIP is highly variable and that ERA in relation to BABIP would be highly dependent on the interaction of LOB% and BABIP. Isolating BABIP’s effect on ERA and considering xBABIP in a season wouldn’t be all that great of a predictor or all that meaningful, but I’m just a little curious about your perspective on it.

    Comment by jdm — June 22, 2012 @ 5:59 pm

  31. I think what could be even more useful, particularly for pitchers, in evaluating players and determining whether BA will regress or improve is taking a close look at BABIP by league and even division.

    Especially for pitchers, NL vs AL could make a difference, or AL East vs AL West could also make a difference, particularly since AL East parks such as Fenway and Yankee Stadium play hitter heavy (doubles off the Monster, anyone? That could drive a BABIP).

    I guess what I’m saying is that if we are trying, in a way, to predict a player’s BABIP over the course of the rest of the season in order to try and figure out whether he will sustain a BA or not, it might be helpful to know that a particular player is going to play a certain percentage of his remaining games against a certain set of opponents, in a certain set of parks, and versus a certain set of pitchers.

    Say player A is going to play 60% of his games at Fenway and Yankee Stadium versus Petco and Safeco. Or say a guy is gonna get a bunch of games in the second half versus the Royals pitching staff versus the Nationals staff.

    Considering this might shed light on why a guy has a .368 BABIP versus another guy with a .250 BABIP. Sometimes it’s who you’re playing: 30-40 AB a year in the AL Central versus Verlander can’t be good for these kinds of metrics, you know?

    Comment by Sam — June 25, 2012 @ 7:22 pm

  32. Agree with Dingbat on this one. There is something to be said for 1) cleaning up the clutter to make it easier to read, and 2) presenting batting averages in the customary format (three decimals and no leading zero). In Excel that’s easy enough to do with the #.000 format, but I don’t know where this data is copied from (I’m assuming you didn’t manually enter all 9 decimal places).

    Comment by schmenkman — June 27, 2012 @ 7:18 am

  33. *three decimal places, that is

    Comment by schmenkman — June 27, 2012 @ 8:39 am
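For anyone not working in Excel, the formatting asked for in comments 12, 32 and 33 (three decimal places, no leading zero) can be sketched in Python roughly as follows. The function name is made up, and the handling of negative values is an assumption meant to cover xBABIP minus BABIP differences.

```python
def format_average(value, places=3):
    """Format a rate stat the customary way: fixed decimals, no leading zero."""
    text = f"{value:.{places}f}"
    if text.startswith("0."):
        return text[1:]          # "0.347" becomes ".347"
    if text.startswith("-0."):
        return "-" + text[2:]    # "-0.059" becomes "-.059"
    return text                  # values of 1 or more are left as they are


print(format_average(0.347123456))
print(format_average(-0.059123456))
```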

  34. Taking linedrive% out of xBABIP essentially makes it useless. The reason that Joey Votto is good is because he hits linedrives. Just because data are qualitative doesn’t mean that they aren’t useful.

    Comment by jessef — June 30, 2012 @ 10:12 pm

  35. Sam,

    Who is getting 30-40 ABs a year vs Verlander? At the most they might see him 4 times (12-16 ABs). More likely in the 6-12 range.

    Comment by Josh — July 18, 2012 @ 3:39 pm
