Some Thoughts on Batted Ball Data

Colin Wyers wrote a post today about potential bias in batted ball data. While I don’t have anything in particular to say about the results of his bias study, I have to disagree with his conclusion and debunk some of the information provided about the differences between StatCorner tRA and FanGraphs tRA, which he uses to illustrate his point.

For starters, the difference in tRA between FanGraphs and StatCorner is a poor stat to illustrate GB/FB/LD bias because there are other differences in the way the two sites calculate it. Let’s take Felix Hernandez this year, for whom BIS and Gameday have very similar batted ball profiles for 2010.

        GB%     LD%     FB%
BIS     67.6    13.5    16.6
GD      65.8    13.2    15.8

Now, here’s the difference in FanGraphs tRA vs. StatCorner tRA:

FG – 4.62
SC – 5.05

That’s almost half a run of difference. Why are they so different? I would imagine it’s probably the component park factors, mainly on LD% and HR%.

Actually, I’ll plug both of those stat lines into the FanGraphs tRA calculator and see what I get: 4.62 and 4.70. So, about .08 of the difference is because of GB/FB/LD differences and the other .35 is park factors (or potentially slightly different weights).
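
To make that split explicit, here is a quick sketch of the arithmetic in Python. The variable names are my own, and the only inputs are the three tRA figures quoted above; nothing here reproduces the actual tRA calculation.

```python
# tRA figures quoted above for Felix Hernandez, 2010.
fg_tra = 4.62            # FanGraphs tRA (BIS batted ball line)
sc_tra = 5.05            # StatCorner tRA (Gameday batted ball line)
fg_calc_with_gd = 4.70   # FanGraphs calculator fed the Gameday batted ball line

# Total gap between the two published numbers.
total_gap = sc_tra - fg_tra

# Part of the gap that appears when only the batted ball inputs change.
batted_ball_gap = fg_calc_with_gd - fg_tra

# Whatever is left over (park factors, or possibly different weights).
other_gap = total_gap - batted_ball_gap

print(f"total gap:       {total_gap:.2f}")
print(f"batted ball gap: {batted_ball_gap:.2f}")
print(f"everything else: {other_gap:.2f}")
```

With these numbers, the batted ball inputs account for a bit less than a fifth of the gap between the two sites.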

Furthermore, if you look at individual player GB% correlation from 2003 to 2008 between BIS and Retrosheet data, you get .94. That’s among all players, whether they pitched 1 inning or 200 innings. Here are all three:

GB% – .94
FB% – .85
LD% – .72

It’s not like the two data sources are telling you completely different things. For the most part, they agree, especially on GB%.
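
For anyone who wants to run this kind of check themselves, here is a minimal sketch of a Pearson correlation between two sources’ GB% for the same pitchers. The `pearson` helper is my own illustration, and the five values per source are made up; they are not BIS or Retrosheet numbers.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares of the deviations from the mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical GB% for the same five pitchers as recorded by two sources.
source_a_gb = [52.1, 44.3, 38.7, 47.9, 60.2]
source_b_gb = [50.8, 45.0, 37.5, 49.1, 58.6]

r = pearson(source_a_gb, source_b_gb)
print(f"GB% correlation: {r:.2f}")
```

The same function can be pointed at FB% or LD% columns; all it needs is one value per player from each source, in the same player order.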

Baseball Info Solutions also rotates its scorers to try to avoid any scorer bias, as Ben Jedlovec stated here:

BIS Scorers are assigned “randomly”. We’re not using a random number generator, but it’s almost as effective. Scorers have a designated number (Ex. Scorer #11) which are then rotated through different slots in the schedule. If scorers 7 and 8 are scoring the late (west coast) games one day, they’ll be rotated to early games the next time around. There’s some miscellaneous switching to accommodate vacation, etc. too. In the end, everyone’s getting a good mix of every team in every park.

We also have several different quality control methods in place to make sure that scorers are consistent with their hit locations and types. We added some new tests this season using the hit timer to flag the batted ball data, so the 2009 data is better than ever.

Ben continues with:

BIS gets an almost entirely new set of video scouts each season. If you’re seeing the same “bias” in the same parks year after year, I can’t see how it would be related to the individual scorer.

It’s also important to note that BIS has an additional classification of batted ball data, Fliners, which is not displayed on FanGraphs and is instead lumped in with Line Drives and Fly Balls. Fliners come in two varieties, Fliner-Line Drives and Fliner-Fly Balls.

Colin has tackled the line drive issue before at The Hardball Times, where Cory Schwartz of MLBAM responded:

our trajectory data is indeed validated as thoroughly as all of our other data: not just once, but three times: first, by a game-night manager who monitors the data entered by the stringer, second by a next-day editor who reviews trajectories against video, and third by Elias Sports Bureau. We take great care in the accuracy of all our data, including trajectories.

None of this is to say that your original premise is not true: line drive vs. fly ball is indeed a somewhat subjective distinction that may be influenced by a number of factors, not just press box height. But I disagree with your assertion that the accuracy of our quality is inferior in this (or any other) regard.

Now we know that there is subjectivity in batted ball stats, but in Colin’s conclusion he writes:

In the meantime, consider this my sabermetric crisis of faith. It’s not that I don’t believe in the objective study of baseball. I’m just not convinced at this point that something dealing with batted-ball data is, at least wholly, an objective study. And where does this leave us with existing metrics that utilize batted-ball data? Again, I’m not sure.

For me, this is a bit of an extreme conclusion to make. For stats like GB% I think there is little to be concerned about, but once you get to LD%, I think you should realize there is some subjectivity involved. Is it worth disregarding entirely or having a “sabermetric crisis of faith” over? In my opinion, probably not.

We all want the best data possible, and there are some exciting projects underway to collect more granular and precise data, but in the meantime, I don’t see any reason to dismiss the data that is currently available. Better batted ball data will certainly lead to more accurate results; I don’t think it will show completely different results.

Author’s Note: This was an expansion on my thoughts from a comment I posted on insidethebook.com.




David Appelman is the creator of FanGraphs.


29 Responses to “Some Thoughts on Batted Ball Data”

  1. Sky says:

    “So, about .8 of the differences is because of GB/FB/LD differences…”

    Should be .08.

    — —

    Side note: anyone else find it amusing when data analysts who are great with concrete data start debating subjective/soft issues? It’s so damn frustrating.

  2. Mike Fast says:

    There are clearly problems with the data. Isn’t the right thing to do to investigate these problems and attempt to clearly define where they exist and quantify the impact they have? That is what Colin is doing. I don’t think he’s advocating throwing out all the batted-ball analysis that has been done in the last five years or so.

    It was drilled into my head as a physics student that no measurement has meaning unless you also know the error on that measurement. Unless we know the error on UZR, +/-, tRA, etc., we don’t really know what they mean.

    There are different ways of approaching the quantification of the error. A WOWY study like Tango did with UZR is one good way. But that method does not necessarily tell us if particular players, like Ichiro Suzuki in Seattle or Andruw Jones in Atlanta, are subject to biases that are important for them or players on their teams but negligible when considered in relation to the overall group of fielders.

    Also, I’m very skeptical of the defense of the data in the comments you provided from MLBAM and BIS. Comments about rotation of video scouts and training of stringers do nothing to help us quantify the error. We know there is error, and Colin’s effort is what is needed to help us find out how much. He should be encouraged and assisted rather than shouted down.

    • Sky Kalkman says:

      I’m guessing everyone’s a lot closer to agreement than it currently appears based on the discussions at various sites. And that they wouldn’t disagree much with you.

      It appears a big source of disagreement is in interpreting the degree to which Colin thinks current batted ball data is usable based on his article. (Funny enough, this is the same sort of interpretation issue that batted ball data suffers from.)

      And I think one of the main points that David A. has addressed is the tRA by FG/SC thing. He was quoted in Colin’s article as saying…

      “There are a couple things which are different between the StatCorner version of tRA and the version implemented on FanGraphs. The main difference is we’re using Baseball Info Solutions batted ball stats instead of Gameday batted-ball stats. The other difference, though probably not as major is we’re using different park factors.”

      Based on THIS article by David A., the park factors are, apparently, MORE major than the batted ball disparities (or at least more major than the quote implies.)

      • Mike Fast says:

        I think the tRA issue needs clarification, and I’m glad David A. responded with his doubts about Colin’s conclusions about that. However, Colin did present 3 years of data for two pitchers, and David showed us maybe one or two starts’ worth of data for one pitcher. Colin’s sample size was small enough to begin with; David’s is extremely small. I’m not ready to accept that the batted ball disparities are minor for tRA simply on that basis.

      • Mike, that’s fair too. I can take a look.

    • I’m not saying the right thing to do isn’t to investigate what the problems in the data might be and it’s not my intention to “shout down” Colin’s work on trying to uncover potential issues with the data.

      I had no issues with the research, I thought the point illustrating the differences between tRA on FanGraphs and Stat Corner was a poor one, and I thought the conclusion was dramatic.

      “I don’t think he’s advocating throwing out all the batted-ball analysis that has been done in the last five years or so.”

      I don’t think he’s actually advocating that either, and I may be in the minority, but that’s kind of what I got from the conclusion. A “you can’t trust any of this stuff” type of conclusion.

      • wobatus says:

        I think an issue is with the degree to which folks treat this stuff as gospel. It is obviously useful info.

  3. Mike Fast says:

    Thanks, Dave. (This is a reply to your 1:24pm post, but I can’t get a Reply button to show up there.)

    I’d particularly be interested to see about Wandy Rodriguez 2008-2009, where the LD% discrepancy between Gameday and BIS is huge (roughly 20% vs. 10%).

    • Mike, and Colin for that matter, just so you know, I greatly respect both of your work, and when it comes to data accuracy, I think we’re all in the same boat because we’re all using the same data.

      The point of this post was not to bury my head in the sand and say the data is fine; it was to try to provide more information on a matter that has been discussed a number of times and to try to clear up the tRA issue, which I’m taking a closer look at now.

    • So just some quick work with removing the park factors and using BIS/Retrosheet data from 2003-2008 shows a .95 correlation between all pitchers on tRA with the different batted ball data sources.

      Between the two there’s an RMSE of about .5 when looking across data sources for pitchers with at least 100 innings pitched, but I think they are scaled slightly differently (showing an average difference of about .4). Retrosheet data tends to show a lower tRA, I think because of the way popups are classified.

    • Peter Jensen says:

      “I’d particularly be interested to see about Wandy Rodriguez 2008-2009, where the LD% discrepancy between Gameday and BIS is huge (roughly 20% vs. 10%)”

      Mike – I don’t have the discrepancy as being anywhere close to what you state. FanGraphs has the BIS LD rate for Wandy at 23.3% in 2008 and 17.9% in 2009. Retrosheet has him having 84 LD out of 408 hit balls in 2008 (20.5%) and 96 of 589 in 2009 (16.3%). The difference in each year is 9 extra LDs by BIS or about .25 extra runs per 9 innings. That is in absolute runs. Normalized to the league wide differences as Colin describes in his article would make the bias half of that number.

  4. dave says:

    I mentioned this in the comments in the BP article, and certainly it won’t address the historical data (although it can be used to compare both sets of BIP data), but when HitFX becomes available, won’t that provide a much more reliable BIP trajectory dataset? It seems like the solution to the measurement issue is already almost here.

    Does the April 2009 HitFX teaser data have BIP stuff or just speed off the bat/trajectory? We could even check between that and the subjective BIP data for April 2009 and see what it looks like.

  5. Phrontiersman says:

    As one of said video scouts for this season, I can attest to the constant avoidance of team bias. Our schedules are extremely varied, and we never see the same team more than 5 times in any given month. As for batted ball subjectivity, we are provided with guidelines and examples after which we model our scoring decisions. The fact-checking and “proofreading,” if you will, is constant and multi-layered. What we are producing is high quality data, as accurate as we can make it after many viewings of every single ball in play. I can understand the concern; we are human, not robots. But rest assured against bias and unreliable data. I can vouch for it, just as any other scout this season.

    • Colin Wyers says:

      And as noted in the article above, MLBAM can and has made similar assurances about the quality controls on their data as well. But when we look at the data, we still see evidence that someone has a park bias.

      The whole point of sabermetrics – at least as I understand it – is to, at the end of the day, trust what the data is telling us, rather than what we think the data should be telling us.

    • Mike Fast says:

      Video scouts have no control over the camera angle, and stringers at the park have no control over the press box height. No one here, I don’t think, is accusing the scouts/stringers of laziness or ineptitude, or the data providers of purposefully providing shoddy data.

      The photogrammetric research by Matt Thomas at Busch Stadium has shown that a lower altitude for the observer leads to increased errors. It is not surprising to see that reflected in the data. There may be other systematic problems with the data that have nothing to do with effort.

      You all can be doing the best you can do, and there can still be significant issues with the data. Identifying them is the first step to correcting them and getting better data for us all.

  6. Steve says:

    Why would there be any discrepancy in the ground ball data? Seems like it should be pretty easy to classify a ball as a grounder.

    • Mike K. says:

      If a SS is playing his normal position, and a LD hops 3″ in front of him before he snags it, is it a line-drive or a ground-ball? After all if the SS was playing a half-step further in he would have caught it.

      • Steve says:

        Cool man I guess that makes sense. Just seemed like it would be relatively easy to determine one from the other but that scenario didn’t cross my mind.

  7. Neil says:

    Really basic question – what’s the basis for classifying a ball as a line drive and not a fly ball (or the reverse), anyway? Or as a fliner, for that matter? What are their parameters? And how can I tell the difference, by sight?

    • AlexPoterack says:

      I’m no expert, but as far as I know, it’s about what it sounds like; if the ball’s trajectory is mostly vertical, and a fielder has lots of time to get under it, it’s a fly ball; if its trajectory is mostly horizontal, and it takes a perfectly positioned fielder or a great play to snag it, it’s a line drive; if it’s somewhere between the two, it’s a fliner. If that sounds very subjective, it is, which is the source of the whole problem here.

      To tell it by sight, I guess a good rule of thumb is that if it seems like a hit right off the bat, it’s probably a line drive.

    • neuter_your_dogma says:

      Line drives, like porno, are hard to define, but I know them when I see them.

  8. OlSalty says:

    Overall I think his point is a pretty good one, there is clearly some park bias going on here. I don’t think people are accusing the scorers of not doing their due diligence, but maybe the effects of park factors on a ball in flight are just too complex for an eye to discern and can affect the way balls are classified differently for different parks, amongst other possibilities.

    I don’t think we should go throwing out all of the batted ball metrics, the premise behind them is sound but the data is just not so good right now. But it’s worth considering that the conclusions we draw from the current data should be expressed with perhaps a bit less certainty about what they’re actually measuring until these issues with the data collection can be resolved.

  9. chene says:

    Maybe someone with access to the data can help determine this but where would you most likely find the difference in value in a fielder, especially an OF? My guess would be on the plays that are hit slightly higher than a typical line drive and slightly lower than a typical fly ball. My reasoning behind this is that a typical well hit line drive will either be hit within a few steps of the fielder and thus caught, or find a gap somewhere quickly. A fly ball will probably be run down due to the hang time.

    Unfortunately, these types of plays are probably where you find the greatest variance in how they’re scored. So if this is the case, the resulting error is likely to be much bigger than if you’re adjusting for data issues in total.

    Of course, I may be completely off base with this hypothesis.

    In any case, since more plays are ruled line drives by BIS, does this mean that the value of fielders is inflated using those stats versus those using GD as their data source?

  10. pft says:

    The only difference between GB and LD is that the GB hits the ground earlier.

    What you want to measure is velocity, or just time for the batted ball to get past the IF. < 1 sec is hard, 1-2 is medium, 2-3 is slow, over 3 or not at all is very slow. For FB over an IF'er head, they should measure air time and classify in a similar manner.

    Uncertainty exists, but efforts need to be made to minimize and estimate it.

    I much prefer dealing with observable events like BIP than theoretical run estimators which assume every team to be league average with a lineup to be made of league average hitters at every spot in the lineup.

    The only thing we can measure w/o uncertainty are things that can be counted.
    The only rate stats without uncertainty are rate stats based on observable counting stats.

    Adjustments and models that oversimplify need to be used with caution. I LOL when I see someone argue that Player A is better because his WAR is 3.5 and the other guy playing for a different team and league is only 3.2. The uncertainty is at least +/- 1, but I have never seen anyone try to estimate it for this or other stats.
