Evaluating defense using HITf/x

This is a look at what’s possible, not a serious attempt at a defensive evaluation metric. We’ll get there someday (and hopefully by someday I mean “some day this month”), just not today.

Our own Harry Pavlidis has the best look I’ve seen so far at the sheer depth of data available from the preview HITf/x data we’ve been given courtesy of Sportvision. It’s the most data that I’ve seen made available to the public about what happens to a batted ball after it leaves the bat. But how do we get from there to an evaluation of defense?

What we knew about batted balls before HITf/x

The answer is, not very much. Typically data providers put a batted ball into one of four buckets:

  • Ground ball
  • Line drive
  • Fly ball
  • Pop-up

This is simply not very descriptive for our purposes, as I’ve stated before. And if that’s not bad enough, different data providers often don’t agree on the difference between a fly ball and a line drive. For example, is a Texas Leaguer over the infield into shallow right a fly ball or a line drive? What if the outfielder is able to race up and snag it? As best we can tell, the former is more likely to be called a line drive and the latter a fly ball, even if they follow the exact same flight path.

What we want to know about a batted ball

That’s simple. To evaluate the play of an outfielder, we would preferably know the following about batted balls hit to the outfield:

  • What direction the ball is hit.
  • How far the ball is hit.
  • How fast it gets there.

Can we get there from what we have available to us via HITf/x?

Right now, the answer is: sorta. I took a look earlier in the week, and what we have right now is the angle (horizontal and vertical) as well as the speed off the bat of batted balls. What we don’t have is spin. How important is the spin? Here’s an example of the path of a batted ball, launched at 35 degrees with an initial velocity of 95 mph:


The blue line is the path the ball would take if there was no spin; the red line is the path the ball would take if there was 2000 rpm of backspin. With spin, the ball travels almost 50 additional feet, and stays in the air about a second and a half longer. That’s a significant difference.
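For readers who want to experiment with the effect themselves, here is a minimal sketch of a two-dimensional trajectory with drag and a backspin (Magnus) lift term. It is not the spreadsheet used for the graph above. The air density, drag coefficient, launch height and the saturating lift formula are all assumptions chosen for illustration, and the spin is held constant for the whole flight, so the numbers it produces will not match the chart exactly.

```python
import math

# Assumed constants for illustration only (not taken from the article).
RHO = 1.2            # air density, kg/m^3
MASS = 0.145         # ball mass, kg
RADIUS = 0.0366      # ball radius, m
AREA = math.pi * RADIUS ** 2
G = 9.81             # gravity, m/s^2
CD = 0.35            # assumed constant drag coefficient
MPH_TO_MS = 0.44704
M_TO_FT = 3.28084


def simulate(speed_mph, launch_deg, backspin_rpm, dt=0.001):
    """Return (distance_ft, hang_time_s) using simple Euler integration."""
    v0 = speed_mph * MPH_TO_MS
    theta = math.radians(launch_deg)
    vx, vy = v0 * math.cos(theta), v0 * math.sin(theta)
    x, y, t = 0.0, 1.0, 0.0                  # assumed contact height of 1 m
    omega = backspin_rpm * 2.0 * math.pi / 60.0
    k = 0.5 * RHO * AREA / MASS
    while y > 0.0:
        v = math.hypot(vx, vy)
        s = RADIUS * omega / v               # spin factor
        # Assumed saturating lift model; swap in another if preferred.
        cl = s / (0.4 + 2.32 * s) if s > 0 else 0.0
        # Drag opposes velocity; backspin lift acts perpendicular to it.
        ax = -k * v * (CD * vx + cl * vy)
        ay = -G - k * v * (CD * vy - cl * vx)
        x += vx * dt
        y += vy * dt
        vx += ax * dt
        vy += ay * dt
        t += dt
    return x * M_TO_FT, t


no_spin = simulate(95, 35, 0)
backspin = simulate(95, 35, 2000)
print("no spin:  %.0f ft, %.2f s" % no_spin)
print("backspin: %.0f ft, %.2f s" % backspin)
```

Under these assumptions the backspin case should carry farther and hang longer than the no-spin case, which is the same qualitative picture as the graph.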

(This of course only takes into account the spin of the ball along the flightpath, ignoring any spin to the sides. Sidespin is of course very important to the path of a batted ball—picture a long, deep drive that you just know would be a home run, if it wasn’t slicing into the stands and ending up as a much less exciting foul ball.)

Can we estimate spin? There has been some helpful progress made in this regard, almost entirely by people who aren’t me. (I hope to learn more in this regard at this weekend’s PITCHf/x Summit.) Until then, we’re left with an imperfect picture of the flight of a batted ball.

What we can tell from an imperfect picture

First, we’ll look at the effect of flight time on DER. (This differs from the chart earlier in the week in that 2000 rpm of backspin were included in the estimates.)
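As a rough illustration of how a chart like this can be built, the sketch below groups balls in play into half-second flight-time buckets and computes DER (outs divided by balls in play) for each. The function name, the bucket width and the sample records are hypothetical; real input would be the estimated flight times and the play outcomes.

```python
from collections import defaultdict


def der_by_hang_time(balls, width=0.5):
    """balls: iterable of (hang_time_seconds, was_out). Returns bucket -> DER."""
    totals = defaultdict(lambda: [0, 0])     # bucket start -> [outs, balls in play]
    for hang_time, was_out in balls:
        bucket = int(hang_time // width) * width
        totals[bucket][0] += 1 if was_out else 0
        totals[bucket][1] += 1
    return {b: outs / n for b, (outs, n) in sorted(totals.items())}


# Hypothetical sample data, for illustration only.
sample = [(2.1, False), (2.4, False), (3.2, True),
          (3.4, False), (4.6, True), (4.8, True)]
print(der_by_hang_time(sample))
```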


And from another point of view, we’ll look at distance in feet travelled:


Obviously there is some substantial overlap between the two; the correlation between time and distance is very robust.
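One way to put a number on that overlap is a plain Pearson correlation between estimated flight time and estimated distance. The sketch below writes the calculation out by hand; the function name and the sample values are made up for illustration and are not the HITf/x estimates.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical flight times (seconds) and distances (feet).
times = [2.0, 3.1, 3.9, 4.8, 5.5]
distances = [180.0, 250.0, 300.0, 340.0, 390.0]
print(round(pearson_r(times, distances), 3))
```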

What we still need

So how do we get from here to a defensive metric? The first thing we need is the direction the ball is hit laterally, which HITf/x helpfully provides. The next thing we need is an idea of who was on the field when each batted ball is struck. This can presumably be parsed from the Gameday XML data that is freely available.
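Here is a minimal sketch of that parsing step, using Python's standard XML library. The layout is an assumption: the element and attribute names (`team`, `player`, `game_position`) and the sample roster are placeholders for illustration and should be checked against the actual Gameday files. It also only reads a starting alignment and does nothing about substitutions during the game.

```python
import xml.etree.ElementTree as ET

# Hypothetical roster snippet; the real Gameday layout may differ.
SAMPLE = """
<game>
  <team type="home">
    <player id="1001" last="Example" game_position="CF"/>
    <player id="1002" last="Sample" game_position="LF"/>
    <player id="1003" last="Bench"/>
  </team>
  <team type="away">
    <player id="2001" last="Visitor" game_position="CF"/>
  </team>
</game>
"""


def fielders_by_position(xml_text, team_type):
    """Map position code -> player id for one team (assumed schema)."""
    root = ET.fromstring(xml_text)
    positions = {}
    for team in root.iter("team"):
        if team.get("type") != team_type:
            continue
        for player in team.iter("player"):
            pos = player.get("game_position")
            if pos:                          # skip players not in the field
                positions[pos] = player.get("id")
    return positions


print(fielders_by_position(SAMPLE, "home"))
```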

Probably the biggest thing we are missing is simply more HITf/x data. We need it to establish a baseline to compare a fielder against: the more data we have, the finer we can slice it and the more precise that baseline becomes.
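To make the baseline idea concrete, the sketch below compares a fielder's actual outs with the outs an average fielder would be expected to make on the same balls, and shows how the uncertainty in a bucket's out rate shrinks with sample size. The bucket names, the baseline rates and the plays are invented for illustration; this is one possible framing, not a finished metric.

```python
import math


def outs_above_baseline(plays, baseline):
    """plays: list of (bucket, made_out); baseline: bucket -> average out rate."""
    expected = sum(baseline[bucket] for bucket, _ in plays)
    actual = sum(1 for _, made_out in plays if made_out)
    return actual - expected


def standard_error(out_rate, n):
    """Binomial standard error of an out rate estimated from n balls."""
    return math.sqrt(out_rate * (1.0 - out_rate) / n)


# Hypothetical baseline out rates and one fielder's chances.
baseline = {"shallow": 0.9, "medium": 0.6, "deep": 0.3}
plays = [("shallow", True), ("medium", True), ("medium", False), ("deep", True)]
print(round(outs_above_baseline(plays, baseline), 2))

# Four times the data in a bucket halves the uncertainty of its out rate.
print(standard_error(0.5, 100), standard_error(0.5, 400))
```

The second part is why more data matters: slicing into smaller buckets leaves fewer balls in each one, and the standard error of each bucket's baseline grows accordingly.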

And of course, as mentioned above, our estimates of the flight path of a batted ball can improve. But we now are a lot closer to having that sort of information than we ever were before.


Why is this a big deal?

One of the most vexing problems in sabermetrics is how to split responsibility for hits and runs between a pitcher and his defense. Our understanding has advanced only slightly, in fits and starts, since McCracken opened up the whole can of worms to begin with.

Simply having better data won’t solve this problem by itself, but it will give us a powerful new set of tools for at least finding the right questions to ask. I’m very, very excited about this, and I hope you are, too.

I am still learning an awful lot about this myself, and I plan to learn a lot more this weekend. Hopefully I’ll be recovered enough by this time next week to pass on what I’ve learned. We may not have a defensive metric that uses HITf/x yet, but we’re very close, and I’m confident we will soon.

References & Resources
Probably the greatest resource for anyone looking to study baseball using physics is Professor Alan Nathan’s website. Pay special attention to his course notes on the subject – you get PowerPoint slides, Excel spreadsheets and more.

The graph in the article of the flight path was generated using one of the spreadsheets available at that site.

Also invaluable is Robert Adair’s book, The Physics of Baseball.

As an aside that only a few of you will care about – I do believe I’ve figured out how to parse the required data about who was playing what position from the Gameday XML data provided. I am not, however, certain of this. As of this writing, the final query to put the data all together has been running for a solid hour and likely will not be done for a while longer. If I am correct and the data checks out I will be more than happy to share it with interested parties.

Peter Jensen
Colin – I am glad you are going to make it to the Summit this year. I look forward to meeting you. Last year I worked with the Sportvision raw footage for 2 games in an attempt to manually calculate the Hit f/x parameters and also to try and estimate spin and hit ball landing location from those parameters. I was successful in showing that Hit f/x parameters could be calculated from the existing footage, a result that I presented at last year’s Summit. I was unsuccessful at the latter task, my conclusion being that there was too much variation…
Just curious, why can’t you just use a metric of where the ball lands and how long it takes to get there? Wouldn’t that take all the guesswork out of the whole classification of fly ball, liners, etc? It seems to me that with the advanced tracking systems we have we can plot every point on a given ballpark, and also calculate pretty accurately how long a ball takes to land on that area, and from that we should be able to determine how often such a ball becomes a single, double, out, etc. From there it can easily be…
Alan Nathan
Actually, I’ll have quite a bit to say about spin at the summit on Saturday. By combining hitf/x and hittracker data, I can back out the two spin components (backspin and sidespin). I am finding backspin values for home runs looking more or less like a normal distribution, centered at about 2000 rpm and with an rms of about 600 rpm. I was not successful in finding a simple relationship between the backspin and the initial velocity magnitude or direction. The backspin more or less increases with launch angle, but there is a lot of scatter about the general trend. …


Peter, for some reason, I remember reading an article lately that I thought said that obstacle had been overcome and some company was actually tracking that information. If I come across it, I’ll post it here.


I guess this is the article I thought about.  It looks to be a year or two away from the common people.

Colin Wyers

jedlovec3 – I can’t speak to the BIS data specifically. When it comes to hit location data, all I’ve ever had direct access to is the Retrosheet and Gameday data. That said, I may have mis/overstated my claims there, even as regards that data.

To everyone who will be at the summit: I look forward to meeting you all.

Alan Nathan

Re:  DaninPhilly—From my perspective, the issue is how well we can predict hang time and landing point from hitf/x data alone.  As Colin pointed out, to do that requires some knowledge of the spin of the batted ball.  Of course, if we have the full trajectory or even just the landing point and hang time, the issue of spin is a moot one from the point of view of baseball analysis (although still an interesting issue from the point of view of baseball physics).

Peter Jensen

DaninPhilly – No one is currently mapping where a ball lands. MGL has a project to determine hang time but the information will not be publicly available. Eventually we will have access to both pieces of information, perhaps in the next year or two, and, as you suggest, that will be all we need to know.  What people are talking about here is how to do the best we can with what information we have now.

Scot Gould
I have suspected that the Rays (and possibly some other teams) have used vectorizing of the flight of the ball, which the game tracking information provides, rather than hit charts, to realign their outfield defenses. By measuring both the eventual landing site of a hit and its time of flight, from multiple years of data at the Trop, they have calculated the position for the outfielders that maximizes the probability of catching any fly ball, weighted by how much value the hit would have should it drop in. This has allowed them to bring their outfielders in to catch the more frequently…