When You Should Ignore the Data

When Jim Leyland was setting his lineup for Game 3 of the ALDS, he looked to data for guidance. What he found was that Ramon Santiago was 7-for-24 in his career against CC Sabathia, giving him a .292 average against the Yankees ace. How much that played into his decision to hit Santiago second, we can’t say for sure, but he did mention this fact to reporters before the game and he did hit Santiago second last night. It’s probably safe to assume that Santiago’s history against Sabathia played some role in his placement in the lineup.

When Ken Rosenthal reported this on Twitter, I threw out a response about batter/pitcher match-up data in general, saying “Specific batter vs pitcher data is probably the worst use of statistics in the entire sport.”

A lot of people took umbrage at this comment, and when Ramon Santiago proceeded to go 2-for-3 off Sabathia — including a double that momentarily gave the Tigers the lead — many were happy to point out that Leyland’s move to insert Santiago worked, and thus, his decision to look to batter/pitcher match-up data was justified. There are quite a few problems with this scenario, however.

1. Santiago’s “success” against Sabathia relies on one viewing offensive capability through the lens of batting average. Santiago did enter the game hitting .292 against Sabathia, but he had never drawn a walk against him and had just one extra base hit, so his overall line against Sabathia was .292/.292/.333, good for a .625 OPS. Unless we’re still evaluating hitters like it’s 1884, Santiago’s previous performances against Sabathia should not have convinced anyone that he was likely to do well against him last night.

2. Batter/Pitcher match-up data has been shown to have no predictive value. In The Book, Tango/Lichtman/Dolphin devote an entire chapter — Ch 6, “Mano a Mano” — to looking for evidence that previous results of specific batter/pitcher match-ups would predict future results in those same match-ups. It wasn’t there. Despite looking at the 30 most extreme examples of matched-pairs where the batter had dominated the pitcher over a three-year period, the group was barely better than average in the fourth season against those same pitchers. When looking at the flip side, where pitchers had dominated the hitters, the results were the same. Most interesting is that there was little difference in actual future performance by the 30 hitters who had dominated their rivals versus those who had been dominated by opposing pitchers. Even at the extremes, specific batter/pitcher data showed no real usefulness in projecting future results.
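
To see why even the most extreme match-up histories can wash out, here is a toy simulation in Python. It is not the method used in The Book. It simply assumes that every batter/pitcher pair has the same true .260 average and gets 25 at-bats in each of two samples (both figures are invented for illustration), then checks how the 30 luckiest and 30 unluckiest pairs from the first sample fare in the second. The function name and every parameter are hypothetical.

```python
import random


def simulate_matchups(n_pairs=5000, at_bats=25, true_avg=0.260, top_n=30, seed=1):
    """Toy model: every pair shares the same true average (an assumption)."""
    rng = random.Random(seed)

    def hits(n):
        # Count hits in n independent at-bats at the shared true average.
        return sum(rng.random() < true_avg for _ in range(n))

    # Each pair gets a "history" sample and a separate "follow-up" sample.
    pairs = [(hits(at_bats), hits(at_bats)) for _ in range(n_pairs)]

    # Rank pairs by their history only, then take both extremes.
    pairs.sort(key=lambda p: p[0])
    dominated, dominators = pairs[:top_n], pairs[-top_n:]

    def avg(group, idx):
        # Pooled batting average for a group in the chosen sample.
        return sum(p[idx] for p in group) / (len(group) * at_bats)

    return {
        "dominators_before": avg(dominators, 0),
        "dominators_after": avg(dominators, 1),
        "dominated_before": avg(dominated, 0),
        "dominated_after": avg(dominated, 1),
    }


if __name__ == "__main__":
    for name, value in simulate_matchups().items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions the two groups look wildly different in the first sample, yet both should land near .260 in the second, because the first-sample gap was produced by chance alone. Real match-ups may carry some true skill difference; the point is only that extreme small-sample histories appear even when none exists.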

In reality, we shouldn’t be overly surprised that this data doesn’t really tell us anything. Even when looking at multiple years, you’re generally ending up with something in the 20-30 plate appearance realm, a ridiculously small number of confrontations from which to be drawing conclusions. But the problems with batter/pitcher data go even deeper: in order to get a larger sample, you generally have to find players who have been matching up against each other for many years, and neither player stays the same over that span.

For instance, 16 of the 26 plate appearances Santiago had against Sabathia before last night came in 2002/2003, back when Sabathia was an inexperienced thrower trying to establish himself with the Indians. He’s a massively better pitcher now than he was then, and it’s hard to believe that anyone should care about what happened between those two nearly a decade ago. In fact, in the last four years, the two had faced off just three times, and Santiago had gone 0-for-3 and hit into a double play. Not only was the batter/pitcher match-up data of questionable use, most of it came from a time when the two players were at very different points in their careers.
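
To put a rough number on how little 24 at-bats can tell us, the sketch below computes a 95% Wilson score interval for a 7-for-24 line. It treats each at-bat as an independent trial with a fixed hit probability, which is a simplifying assumption rather than anything the match-up data guarantees, and the function name is made up for illustration.

```python
import math


def wilson_interval(hits, at_bats, z=1.96):
    """Return (low, high) Wilson score bounds for a hit rate.

    Assumes independent at-bats with a constant hit probability.
    z=1.96 corresponds to a two-sided 95% interval.
    """
    p_hat = hits / at_bats
    z2 = z * z
    denom = 1 + z2 / at_bats
    center = (p_hat + z2 / (2 * at_bats)) / denom
    half = z * math.sqrt(
        p_hat * (1 - p_hat) / at_bats + z2 / (4 * at_bats ** 2)
    ) / denom
    return center - half, center + half


if __name__ == "__main__":
    low, high = wilson_interval(7, 24)
    print(f"7-for-24: {low:.3f} to {high:.3f}")
```

For 7-for-24 the interval runs from roughly .149 to .492, a range wide enough to cover both a badly overmatched hitter and one of the best in the game.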

This is the kind of data that just isn’t useful, which is why I decried its usage on Twitter. However, I want to make clear that I’m not saying that there are no scenarios where I believe a specific batter could have an advantage over a specific pitcher, or vice versa. We know certain hitters do better against certain pitch-types, and that platoon splits are very real, so we’d expect a left-handed masher to do very well against a right-handed side-armer. I’d even be open to hearing good arguments about why a specific player could have success against a specific pitcher beyond generalities like handedness and velocity.

However, I’d suggest that this is an area where the evidence would need to be based on something other than the data. Like high school statistics, the numbers are essentially useless, which is why no one spends any time quoting the results of a player’s high school performance in the draft room. That doesn’t mean that we can’t differentiate between amateur players, but that we’ve recognized that we need other tools beyond their performance to help us understand who is likely to succeed and who is not.

The same is true here. If you want to make a case that a specific batter has an advantage over a specific pitcher, go ahead and make that case. We’re not saying that there are no situations where that reality exists — we’re just saying that relying on the past results of batter/pitcher confrontations is not going to help you find those specific situations. The data tells you what happened in the past, but it shines no light on what will happen in the future, and for the purpose of deciding who should play and who should not, it should just be ignored.

We hoped you liked reading When You Should Ignore the Data by Dave Cameron!

Dave is the Managing Editor of FanGraphs.

mister_rob
Guest

who was the alternative and what are his numbers vs CC/CC type pitchers?

jake
Guest

Raburn…career OPS vs lefties much better than Ramon. Been the third best hitter on the Tigers since the AS break this year.

mister_rob
Guest

The same Ryan Raburn who holds a 167/231/250 slash line with 9 K’s in 26 PAs against Sabathia?

Makes Ramon’s mid 600’s ops look Miggy-ish in comparison

T-Roll
Guest

Wow, way to completely miss the point, Mr. Rob.

mister_rob
Guest

What point?
Leyland chose Santiago over Raburn. Santiago career OPS is 200 points higher vs CC than Raburn’s. And Santiago went on to go 2-3 against CC
So, does anyone HONESTLY think that Raburn, who had previously put up a Ted Lilly-esque slash line in his career vs CC, and has had 4 hits EVER vs Sabathia would have done better than Santiago did? It’s not like he benched Miggy. We are talking about Ryan Raburn, he of the sub300 obp on the year

Don’t see how anyone could come to that conclusion. Therefore Leyland’s decision was the correct one

Matthias
Guest

Individually against CC, both these hitters have an incredibly small sample size. And, as mentioned, many of Santiago’s PAs came against Sabathia nearly 10 years ago.

However we can look at how these batters do against similar pitchers to get a bigger sample size and more confidence in our result, and it sounds like that data points to Raburn.

mister_rob
Guest

And which lefties would you consider “similar” to CC? Because to me, what Raburn has done against guys like Rowland-Smith means nothing in this discussion
How about Cliff Lee? Raburn has a 263 ops vs him in 20 PAs
How about David Price? Raburn has a 607 ops vs him in 14 PAs
Kazmir before he was bad? Raburn has a 643 ops in 14 PAs

Seems to me Raburn has a real problem hitting hard throwing lefties. But hey he lights up Rowland-Smith, Bruce Chen, and Aaron Laffey. That’s got to count for something

mister_rob
Guest

And gee, I just looked and (albeit small samples) Ramon Santiago has beat up on guys like Price and Kazmir
So maybe, just maybe, Santiago is a better hitter vs hard throwing lefties than Raburn is

CircleChange11
Guest

This, to me, is getting to where it’s at.

While the broadcaster says “Santiago vs. Sabathia”, doesn’t it seem logical that teams compile stats on their players versus certain pitchers/types/pitches?

Wouldn’t it be fairly obvious to them which players can’t hit a curveball or changeup?

The pitchers mentioned that Santiago does well against are non-curveball lefties. So as long as it doesn’t move a lot, Santiago may be able to make good contact. When it’s slower, with more bend, perhaps not.

I was re-reading the chapter in BTN last night about PECOTA and how it views player types.

It probably would not take long to look at how Santiago does against the fastball-slider-cutter lefties and how he does against the curveball-changeup lefties. We may find something useful.

I just don’t have much affinity for the “ignore it because it’s a small sample” stuff regarding situations that will never accumulate enough PAs to have high confidence levels. It’s a simplified (perhaps lazy) way of saying “we can never know”. Well, for a manager that has to make a decision “Aw shucks, it’s all luck anyway” is not a satisfactory reason.

This is where statistics gets interesting to me, how players do against various player types. Since baseball essentially comes down to a pitcher-batter confrontation, it is perhaps the most important data in the game … but we act like we can never know much about it. I think the more we dig, the more we’ll find that we do detect patterns.

I’m doubtful that in Dave Duncan’s binder he has page after page of “It doesn’t matter, it’s just small sample size” or “anything can happen”.

CircleChange11
Guest

Sorry, the point I was getting at is that we should not inherently self-limit our research and commentary to “versus lefties”. To me, that does not represent advanced analysis.

Certainly, we know that all lefties are not equal, let alone similar.

While it’s true that we’re all great lovers, we throw different pitches at different speeds with varying levels of ability.

Likewise, as we can even see from our pages, there are batters that absolutely kill fastballs but little else, and other guys that murder changeups, but cannot hit a slider to save their life. Wouldn’t that play into the situation as much, if not more, than handedness?

mister_rob
Guest

That is what I was trying to get at, and earned a bunch of dislikes for my efforts. Maybe Dave should have presented who the alternative was, what he had done vs CC, what he had done vs hard throwing lefties, and what Santiago had done vs other hard throwing lefties. But he didn’t, maybe because it went against his point
The dislikes should be directed to Dave on this one. At the very least a lazy piece of writing

and someone let me know the next time Raburn goes 2-3 against CC. Even if it is July of 2017

dnc
Guest

“So, does anyone HONESTLY think that Raburn, who had previously put up a Ted Lilly-esque slash line in his career vs CC, and has had 4 hits EVER vs Sabathia would have done better than Santiago did?”

No, the chances are that Raburn wouldn’t have done better. Of course, if you play that game again, the chances are very strong that Santiago wouldn’t be able to do that either.

You can’t take a 2-3 in one game and use it as some kind of evidence. The performance is an outlier (just as Raburn going 2-3 would have been).

CircleChange11
Guest

“You can’t take a 2-3 in one game and use it as some kind of evidence.”

If it’s not evidence then what is? It IS evidence … just not as much as we want, nor the quality we seek.

What managers are looking for is successful performance in various situations. We can say whatever we want, but Santiago continues to perform well against CC Sabathia.

Does that mean he’ll continue to hit well against him in the future? Well, it’s no guarantee … but I like his chances versus a similarly talented player that has had terrible success against him.

I’m trying to figure out what you guys want. You HAVE to play SOMEONE against CC Sabathia. Someone has to bat against him. You have Raburn and Santiago. If there’s a game 5 and for some reason CC starts, who do you play? Santiago or Raburn? Why?

——————————-

Dave brings up an interesting point that I wish he would delve into.

He mentioned that half of Santiago’s at bats are against the 2002 version of CC Sabathia. The insinuation is that because of this, the overall results don’t matter or are inconclusive.

Couldn’t one look at it like “He could hit him in 2002 and he can still hit him in 2011?”. CC has gotten a lot better over the last 8 years, and Santiago is having some success against him.

I don’t know either way, just pointing out how there are multiple ways of perceiving and analyzing small sample data.

Like I mentioned earlier, all of us would love to have loads of data that make the confidence level very high, but in baseball we rarely have that. So, you start with true talent … which is why guys like Howard and Granderson still play against lefties, but when true talents are about equal (or both below average), then you start looking for the “little edge” you can gain.

We say that we don’t value small sample data or that we shouldn’t base decisions on it, but if we’re in the manager’s spot, we likely do the very same thing.