As you’ve probably noticed, David unveiled split data as the newest addition to the site yesterday. This is something that has been in the works for quite a while, and David worked long and hard on getting this on the site. For the first time, we’ll be able to really break down how a player performs against different pitcher types, as things like xFIP by handedness of batter have not previously been available.
However, as RJ noted a bit this morning, we do want to encourage wise use of split data, because these are the types of numbers that can be abused at times. In practice, any split is a smaller subset of a larger sample, and when you reduce your sample size, you increase the amount of noise in the number. There’s no way around that.
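To put a rough number on that noise, here is a minimal sketch that treats each at-bat as an independent coin flip with a fixed chance of a hit. That is a simplification, and the .270 hitter and the at-bat counts are made-up illustrations, not figures from the site.

```python
import math

def batting_average_standard_error(true_avg: float, at_bats: int) -> float:
    """Binomial standard error of an observed average over `at_bats` trials.

    Assumes every at-bat is an independent trial with the same hit
    probability, which real at-bats only approximate.
    """
    return math.sqrt(true_avg * (1.0 - true_avg) / at_bats)

# Hypothetical .270 hitter: a full season versus a one-quarter-size split.
full_season = batting_average_standard_error(0.270, 600)
vs_lefties = batting_average_standard_error(0.270, 150)

print(f"600 AB: +/- {full_season:.3f}")
print(f"150 AB: +/- {vs_lefties:.3f}")
print(f"noise ratio: {vs_lefties / full_season:.2f}")
```

Under these assumptions the noise grows with the square root of the reduction in sample size, so a split with a quarter of the at-bats carries twice the uncertainty of the full-season number.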
In fact, you can slice and dice numbers enough ways to always find some way that a player performed abnormally. Whether it’s batting average against lefties on Tuesdays or FIP in alternating months, these are the kinds of numbers that really mean nothing. They are the kinds of splits that give rise to things like the “lies, damn lies, and statistics” cliche. When looking at split data, we’d suggest limiting your conclusions to effects that are well known – platoons, parks, pull or opposite field results, etc.
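A quick sketch of why slicing enough ways always turns something up. It assumes each split is an independent check with a 5 percent chance of looking abnormal by luck alone. Real splits overlap, so treat this as a rough guide, and note that the function name and the cutoff are illustrative choices.

```python
def chance_of_false_alarm(num_splits: int, alpha: float = 0.05) -> float:
    """Probability that at least one of `num_splits` independent splits
    looks abnormal purely by chance, when each has an `alpha` chance."""
    return 1.0 - (1.0 - alpha) ** num_splits

for num_splits in (1, 5, 20, 50):
    print(f"{num_splits:>2} splits: {chance_of_false_alarm(num_splits):.0%}")
```

With 20 such splits, the chance that at least one looks unusual by luck is roughly 64 percent, which is why sticking to well-known effects is the safer habit.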
Finally, you also want to keep the overall performance of the league in a specific situation in mind when looking at split data. We’ll get league averages by situation on the site in the not too distant future, but here’s a sneak peek at some batted ball league averages (2002-2009), listed as AVG/OBP/SLG and wOBA, so that you can compare players against a baseline for each type of batted ball:
Bunts: .376/.376/.377, .336 wOBA
Grounders: .231/.231/.253, .214 wOBA
Flies: .217/.212/.602, .328 wOBA
Liners: .727/.723/.974, .734 wOBA
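As a small illustration of comparing a player against these baselines, the sketch below hard-codes the league wOBA figures above and subtracts them from a player's number. The function name and the .250 grounder wOBA are hypothetical, not data from the site.

```python
# League wOBA by batted ball type, 2002-2009, copied from the list above.
LEAGUE_WOBA_2002_2009 = {
    "bunt": 0.336,
    "grounder": 0.214,
    "fly": 0.328,
    "liner": 0.734,
}

def woba_vs_league(batted_ball_type: str, player_woba: float) -> float:
    """Player wOBA minus the league baseline for that batted ball type."""
    baseline = LEAGUE_WOBA_2002_2009[batted_ball_type]
    return round(player_woba - baseline, 3)

# Hypothetical player with a .250 wOBA on ground balls.
print(woba_vs_league("grounder", 0.250))
```

A positive result means the player beat the league on that batted ball type, though the sample size caveats still apply to the player's side of the comparison.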
It really is stunning how important hitting line drives is. Unless you’re regularly pounding fly balls over the wall, any other batted ball type is just not very productive. In fact, when you look at the BABIP split for fly balls, you see that 87 percent of non-HR flies result in outs. Line drives are where it’s at.
We’ll have more on the proper way to use split data over the next few days. Enjoy the splits and find the interesting nuggets hidden away, but also remember to use them judiciously. You don’t want to voluntarily cut your sample size in half if you don’t have a reason to.