## xOBA and Using Statcast Data To Measure Offense

For many years, I couldn’t wait to get my hands on batted ball velocity and launch angle data. When Statcast data became public last year, I wasted no time jumping in and playing around with it. Last season, I began to develop a series of stats using information from Statcast, the fruits of which are what I call xOBA and xBABIP, with various spin-off stats such as xRA (expected runs average for pitchers).

Over time, I hope to delve deeper into these stats, how I think they may be useful, and some interesting results I have found, but before all of that, I bet you’re curious how they are calculated.

I begin by splitting the launch angles for every batted ball into 5 degree by 5 degree windows. The vertical angle is provided by the Statcast data, while the horizontal component must be estimated using the Gameday location coordinates. Using Gameday to estimate the horizontal angle isn’t the best system out there, and it certainly introduces some error, but it is the best I have to work with at the moment.

Once split into 5 degree x 5 degree windows, I am left with a grid that is 36 units wide and 36 units tall (180 degrees/5 = 36). However, obviously half of those horizontal grids are far out of play, so it actually drops to about 20 wide by 36 tall, giving me about 720 windows.

Next, I split each of these windows into exit velocities using 2mph buckets. Batted balls seem to top out around 120mph, so that makes for 60 buckets per window, and when multiplied by the 720 windows, it amounts to around 43 thousand theoretically usable buckets to work with. However, in practice this drops to around 15 thousand or so, since many of the buckets are highly improbable combinations.
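As a rough sketch of the bucketing described above, the snippet below maps one batted ball to a (vertical window, horizontal window, speed bucket) index and reproduces the bucket-count arithmetic. The function name `bucket_key`, the choice to measure both angles from -90 to +90 degrees, and the clamping at the top edge are assumptions made for illustration, not details taken from the actual pipeline.

```python
import math

ANGLE_STEP = 5   # degrees per window, as described in the text
SPEED_STEP = 2   # mph per bucket, as described in the text
MAX_SPEED = 120  # approximate top exit velocity, as described in the text

def bucket_key(vertical_deg, horizontal_deg, exit_mph):
    """Return (vertical window, horizontal window, speed bucket) indices.

    Assumes both angles run from -90 to +90 degrees (hypothetical convention).
    Values on the upper edge are clamped into the last window.
    """
    n_angle = 180 // ANGLE_STEP         # 36 windows per axis
    n_speed = MAX_SPEED // SPEED_STEP   # 60 speed buckets
    v = min(math.floor((vertical_deg + 90) / ANGLE_STEP), n_angle - 1)
    h = min(math.floor((horizontal_deg + 90) / ANGLE_STEP), n_angle - 1)
    s = min(math.floor(exit_mph / SPEED_STEP), n_speed - 1)
    return (v, h, s)

# Bucket count quoted in the text: about 20 x 36 in-play windows, 60 speeds each.
windows = 20 * 36
speed_buckets = MAX_SPEED // SPEED_STEP
total_buckets = windows * speed_buckets
print(windows, speed_buckets, total_buckets)  # 720 60 43200
```

Note that this index covers all 36 horizontal windows; only about 20 of them are in play, which is where the 720 figure comes from.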

The largest buckets, at the moment, have several hundred to a thousand balls in them, while many buckets have 1 or 2. That is the nature of the process, though. I decided to accept that many of the buckets will be underrepresented in order to have higher resolution for the more likely combinations. I suppose in some future iteration, the smaller buckets could be combined in some manner, but for the moment they are not.

For each bucket, I find how many go for 1, 2, 3 or 4 bases, where reaching on an error is considered 1 base, and divide each by the total population of the bucket. I then multiply these odds by their associated linear weights, which I grab from the FanGraphs guts page, to give me an xOBA value for each batted ball. Walk and hit by pitch events are given their appropriate values as well in the process. I also sum the probabilities for 1, 2 and 3 bases for each bucket, giving me an xBABIP value for each.
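A minimal sketch of the per-bucket calculation follows. The `WEIGHTS` values are placeholders chosen for illustration only; the real linear weights would come from the FanGraphs guts page. The function name `bucket_values` and the input layout are also assumptions, and walks and hit by pitches are left out.

```python
from collections import Counter, defaultdict

# Placeholder linear weights keyed by bases gained (1 = single or reached on
# error, 4 = home run). Illustrative numbers only, NOT the FanGraphs values.
WEIGHTS = {1: 0.9, 2: 1.25, 3: 1.6, 4: 2.0}

def bucket_values(batted_balls):
    """batted_balls: iterable of (bucket_key, bases) with bases in 0..4.

    Returns {bucket_key: (xOBA, xBABIP)} using the share of each outcome
    within the bucket.
    """
    counts = defaultdict(Counter)
    for key, bases in batted_balls:
        counts[key][bases] += 1

    values = {}
    for key, outcome_counts in counts.items():
        total = sum(outcome_counts.values())
        probs = {b: outcome_counts[b] / total for b in (1, 2, 3, 4)}
        xoba = sum(probs[b] * WEIGHTS[b] for b in probs)
        xbabip = probs[1] + probs[2] + probs[3]  # home runs excluded
        values[key] = (xoba, xbabip)
    return values

# Toy bucket: 24 of 30 balls are home runs, the other 6 assumed to be outs.
sample = [("demo", 4)] * 24 + [("demo", 0)] * 6
xoba, xbabip = bucket_values(sample)["demo"]
print(round(xoba, 3), round(xbabip, 3))
```

Every batted ball that falls into a bucket would then be assigned that bucket's pair of values.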

After doing all of these steps, I am left with a table containing information about the batter, pitcher, all of the PITCHf/x and Statcast information, plus these xOBA and xBABIP values for every plate appearance dating from the beginning of 2015 up through today. I can then find averages for these values using player IDs, teams, all sorts of splits including pitch type or location, lefty/righty, home/away, early/late. Just about anything you can imagine.
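Averaging over any split might look something like the sketch below. The row layout and the field names (`batter`, `stand`, `xoba`) are hypothetical and the rows are made up; the real table has many more columns.

```python
from collections import defaultdict

def average_by(rows, split, value="xoba"):
    """Average `value` over rows grouped by the `split` field."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        sums[row[split]] += row[value]
        counts[row[split]] += 1
    return {key: sums[key] / counts[key] for key in sums}

# Made-up plate appearances, one row each.
rows = [
    {"batter": "A", "stand": "L", "xoba": 0.9},
    {"batter": "A", "stand": "L", "xoba": 0.0},
    {"batter": "B", "stand": "R", "xoba": 0.3},
]
by_batter = average_by(rows, "batter")
by_stand = average_by(rows, "stand")
print(by_batter, by_stand)
```

The same function works for any column in the table, so a team, pitch type, or home/away split only requires changing the `split` argument.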

On to the big question: why use xOBA? Well, it appears to me that batters have no control over the fate of their batted balls once they leave the bat, but they do have a great amount of control over how the ball leaves the bat. Good hitters hit the ball harder, and successful hitters hit the ball on angles that avoid fielders and fall for base hits. These stats, xOBA and xBABIP, attempt to measure the degree of skill a batter exhibits through his batted balls, while assuming league average results for those behaviors. I believe, although I may be wrong, that this process is defense and park neutral, so I hope the only aspect of the game being measured is the batter’s skill, specifically the skill to hit the ball both hard and on the correct angles.

Okay, so let’s compare xOBA and wOBA and see how well the numbers match up. Taking all of the batters with 10 or more plate appearances in 2015, I created the following chart comparing xOBA and wOBA for the 2015 season.

I think we can agree xOBA and wOBA match up well. Unfortunately, wOBA becomes very wonky when batters have fewer than 10 plate appearances, and even going a bit higher than that, wOBA can give pretty chaotic results. If I push the plate appearance threshold up to 30, the R-squared value goes to .848, and if I get rid of the plate appearance requirement altogether it drops to .645. I’m not exactly sure what accounts for the remainder of the variation, especially as plate appearances go up above 30 and 50. Perhaps it is shifting, park effects, or error introduced by the way horizontal launch angle is calculated.
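For readers who want to try the same kind of check, here is a sketch of computing R-squared between xOBA and wOBA at different plate appearance thresholds. The batter rows are made up purely to show the mechanics, and the function names are hypothetical; none of this reproduces the figures quoted above.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)

def r_squared_with_min_pa(batters, min_pa):
    """R-squared of xOBA vs wOBA for batters with at least min_pa PA."""
    kept = [b for b in batters if b["pa"] >= min_pa]
    return r_squared([b["xoba"] for b in kept], [b["woba"] for b in kept])

# Made-up batters: the 5 PA row has a chaotic wOBA, as small samples often do.
batters = [
    {"pa": 5, "xoba": 0.300, "woba": 0.900},
    {"pa": 40, "xoba": 0.300, "woba": 0.310},
    {"pa": 60, "xoba": 0.350, "woba": 0.345},
    {"pa": 80, "xoba": 0.400, "woba": 0.410},
]
no_minimum = r_squared_with_min_pa(batters, 0)
min_30 = r_squared_with_min_pa(batters, 30)
print(round(no_minimum, 3), round(min_30, 3))
```

In this toy data, dropping the one low-PA batter raises the R-squared sharply, which is the same direction of effect described above.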

Alright, enough of all that, how about looking at a few actual players? As of the morning of May 4th, excluding players with fewer than 80 plate appearances, here are my top 10 batters ranked by xOBA:

| Name | xOBA | wOBA | Difference |
| --- | --- | --- | --- |
| David Ortiz | .474 | .426 | .048 |
| Daniel Murphy | .458 | .444 | .014 |
| Joe Mauer | .440 | .378 | .062 |
| Chris Carter | .439 | .437 | .002 |
| Aledmys Diaz | .435 | .474 | -.039 |
| Brandon Belt | .431 | .400 | .031 |
| Manny Machado | .428 | .451 | -.023 |
| Josh Donaldson | .424 | .438 | -.014 |
| Michael Conforto | .422 | .421 | .001 |
| Bryce Harper | .421 | .402 | .019 |

There is a lot of agreement between xOBA and wOBA for many of these players. The biggest standout is Mauer, so I pulled up his batted ball values for the season.

Several of his batted balls that had a high likelihood of being a home run were caught as fly balls. For instance, he hit a fly ball in the top of the 8th inning on April 7th versus the Orioles: 99mph off the bat, 32 degree launch angle, -24 degrees horizontal angle (left field). Batted balls hit like this are a home run 80% of the time (24/30). Mauer has had many batted balls suffer a similar fate. On May 2nd, he hit a ball that had an 85% chance of being a double and a 12% chance of being a single, but it was caught for an out. On April 15th, he hit a ball that had a 67% chance of being a double and an 11% chance of being a homer, but it was also caught for an out. If these three batted balls had landed for a home run and two doubles, his wOBA would be .425 rather than .378. By the standards of xOBA, Mauer should be having a better season, and if he continues to hit balls the way he has to this point, he should, hopefully, produce even better numbers than he has to date.

This is a small taste of the sort of information that can be gleaned from the xOBA stat. Next time I hope to discuss its sister stat, xBABIP.


Andrew Perpetua is the creator of CitiFieldHR.com and xStats.org, and plays around with Statcast data for fun. Follow him on Twitter @AndrewPerpetua.

I am interested in seeing the outliers on the wOBA and xOBA graph.

For instance, I would bet that Ortiz’s lack of speed is mostly to blame for the difference between his xOBA and wOBA. It is interesting that compared to wOBA, xOBA is the more pure hitting stat, because wOBA muddies the water with baserunning and BABIP. Faster players will stretch singles into doubles, while players like Ortiz will do the opposite. xOBA won’t see this of course.

Very interesting read. I look forward to hearing about xBABIP!

Great insight with the speed element. That is definitely something that should be looked into, although I’m not sure where to get the speed information from. I know Statcast measures speed around the bases, so maybe I can dig around and see if I can get my hands on any of that. Working in home to first, home to second, and home to third speed numbers might add a lot of the missing information. With Ortiz, perhaps the Green Monster holds him back as well.

Regarding outliers, some of the biggest are eliminated by pumping up the plate appearance threshold. Here is what it looks like bumping up to a minimum of 30 PA. http://imgur.com/G6rwkIm

I haven’t looked at it, and maybe I should have already, but many of those very low ranking people are probably pitchers, and eliminating them could take away even more of those low end outliers.

Thank you for the support!

Park effects would be interesting to incorporate as more data is collected in future years. Saying that 80% of the balls hit like Mauer’s go for a home run is nice, but being able to say that number is x% at the ballpark where Mauer hit the ball would be great information as well.

It would also account for things such as a giant 37 ft wall in left field. There are a lot of balls hit at Fenway that are home runs anywhere else, but have a 0% chance of going out there because they don’t have enough height to get over the wall.

What about the shift hurting Ortiz’s wOBA?