Do we use luck and randomness as a crutch?

On Wednesday, I posted an (entirely self-serving) article over at the CardRunners site, which could be summed up as half whining about my bad luck and half praising myself for entirely rebuilding my roster to compensate. In other words, nothing you’ll find interesting.

One of the commenters on my post, however, was leaguemate Chris Liss of RotoWire. One of his comments in particular stuck out, so I wanted to repost it here and then respond:

Dave Cameron had a good post about randomness in which he used the clearly random example of the NFC winning 14 straight coin tosses to illustrate that Dan Haren’s .350 BABIP could in fact be dumb luck and not have any cause, e.g., bad mechanics, tipping pitches, etc. that people tend to ascribe in those situations. And that’s entirely true. BUT it’s also wrong to assume that his .350 BABIP must be dumb luck. It might well be, and it might not be. There could be a problem with his location, mechanics, etc. that partially or entirely explains it.

I think a mistake that a lot of the sabr community makes is to assume that bad pitcher BABIP is always bad luck, or bad HR/FB rate is always bad luck. Sometimes, there is something wrong.

In fact, Todd Zola sent me BABIP data by count, and BABIP goes up reliably as the count gets more hitter-favorable: .315 on 3-0 and .285 on 0-2, for example. It’s .305 on the first pitch. So let’s say a guy like Haren (or Aaron Harang or Dave Bush) gets a rep as an extreme strike thrower. Batters might then swing more often at the first pitch, rather than take a pitch and get behind.

There are probably better examples out there, too—just wanted to point out that not everything that might be luck is in fact luck. The hard part is figuring out which is which.

I’ll absolutely agree with Chris that what may look like bad luck can, sometimes, be a legitimate problem. What often goes unnoticed is that part of what we “sabr” folk consider statistical regression to the mean is, in actuality, players making adjustments. If a player truly is tipping his pitches, he’s either going to be out of the big leagues before we can see him “regress” or he’s going to make the necessary adjustments to stick around long enough for us to actually see him “regress.” Of course, that’s not all that regression to the mean is. Part of it truly is pure statistical chance in the mold of Dave Cameron’s coin flip analogy. But adjustments are definitely a part of it.
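To put a number on the coin flip analogy, here is a minimal sketch in Python. It assumes every toss is a fair, independent 50/50 event, and the function name is just something I made up for illustration.

```python
from fractions import Fraction


def streak_probability(n_tosses: int, p_win: Fraction = Fraction(1, 2)) -> Fraction:
    """Chance that one pre-specified side wins n_tosses independent tosses in a row.

    Assumes every toss is independent with the same win probability p_win.
    """
    return p_win ** n_tosses


# The NFC's 14 straight coin-toss wins, treated as fair 50/50 flips.
nfc_streak = streak_probability(14)
print(nfc_streak)         # exact fraction
print(float(nfc_streak))  # same value as a decimal
```

Under those assumptions the streak works out to 1 chance in 16,384. That is rare, but with enough streaks to look at, something that rare is bound to turn up somewhere without needing any cause behind it.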

The Haren example

In Chris’ comment, he says, “I think a mistake that a lot of the sabr community makes is to assume that bad pitcher BABIP is always bad luck, or bad HR/FB rate is always bad luck. Sometimes, there is something wrong.” While this is absolutely true, Chris has made it seem (at least to me) that the instance where “something is wrong” comes along more often than it actually does, or more often than we’re truly able to identify it.

Perhaps he’s just taking this stance because he perceives the guys on the other side of the argument to hold the polar opposite view, and his actual views are more balanced, but I think, in this instance, Chris is overstating how often the “there is something wrong” scenario actually occurs. To continue using Haren as our example, let’s lay out what we know:

  1. Haren has posted a .355 BABIP to this point in 2010
  2. With a career BABIP of .305 over 1,400 innings, his 2010 figure is very abnormal
  3. It takes roughly six full seasons for BABIP to normalize

Given these known facts—six full seasons!—you’re going to need to show me some very compelling evidence that Haren will not regress. That’s not to say that the evidence doesn’t exist, just that if we’re not buying into a Haren regression, we’re either being extremely foolish or we have some very convincing evidence at our fingertips. It’s entirely possible he’s tipping pitches or is having trouble with his mechanics, but the odds of him regressing are simply too great to ignore if we don’t have proof to the contrary.
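For a rough sense of how much BABIP can wander on chance alone, here is a small sketch in Python. It rests on assumptions of mine, not on anything measured: every ball in play is treated as an independent trial with a hit probability equal to Haren’s career .305, and the ball-in-play totals (200, 400, 4000) are hypothetical round numbers, not his actual counts. Real BABIP also depends on defense, park and the like, so treat this as a simplified model only.

```python
from math import ceil, exp, lgamma, log


def prob_babip_at_least(observed: float, true_babip: float, balls_in_play: int) -> float:
    """Exact binomial tail: P(hits / balls_in_play >= observed) given true_babip.

    Simplifying assumption: each ball in play is an independent trial that
    becomes a hit with probability true_babip (must be strictly between 0 and 1).
    """
    n = balls_in_play
    # Round before ceil so floating-point noise in the product cannot add a hit.
    min_hits = ceil(round(observed * n, 9))
    total = 0.0
    for k in range(min_hits, n + 1):
        # Binomial probability computed in log space to avoid overflow for large n.
        log_pmf = (
            lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(true_babip)
            + (n - k) * log(1.0 - true_babip)
        )
        total += exp(log_pmf)
    return total


# Hypothetical sample sizes: a partial season, a bigger chunk, several seasons.
for bip in (200, 400, 4000):
    chance = prob_babip_at_least(0.355, 0.305, bip)
    print(f"{bip} balls in play: P(BABIP >= .355) = {chance:.4f}")
```

The thing to watch is how quickly the probability shrinks as the sample grows. A .355 mark from a true .305 pitcher is plausible over a few hundred balls in play and essentially impossible over several seasons’ worth, which is the statistical half of why we expect regression.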

The one other thing we need to consider is that if Haren is indeed tipping pitches or struggling with his mechanics, it’s highly unlikely that it would only manifest itself in his BABIP. Analysts will often say that BABIP is all luck, but that’s not really the case. If we were to put a Little Leaguer on the Diamondbacks and allow him to throw 200 neutral-luck innings, I guarantee you he’s posting a BABIP above .500. BABIP is in large part luck only once we’re given that the pitcher in question is a bona fide big leaguer.

In the case of the Little Leaguer, that .500 BABIP is going to come along with a 0.0001 K/9 and a 15.0 BB/9. He’s not a legitimate big leaguer, so a high BABIP is expected. Haren, though, is posting monster strikeout and walk numbers. Guys who post monster peripherals don’t consistently have high BABIPs. It just doesn’t happen. I defy you to show me one example in the history of baseball of a pitcher with a 9-plus K/9 and sub-2 BB/9 but whose BABIP stayed over .350 in the long run.

And even if we’re only talking about the short term here, if Haren’s high BABIP is a result of tipping pitches or doing something that bona fide big leaguers can’t get away with, it’s highly unlikely that he’d also have peripherals worthy of a 3.32 xFIP, because those problems would affect his other numbers too! While it might be “wrong to assume that his .350 BABIP must be dumb luck,” it’s highly, highly probable that it is dumb luck. Unless you can show me evidence that it isn’t.

More musings on luck and randomness

This ties in with another of Chris’ comments on the CR post:

I will take issue with one premise though that I think is not entirely true—when your players play worse or better than they have historically that is not bad luck… it seems like people are alleging that buying a breakout player is dumb luck. It’s not. Maybe you couldn’t predict the extent to which he’d break out, but for example, as loathsome as it is for me to give Eric any credit, he deserves it for rostering Josh Hamilton. And he’s entitled to whatever massive numbers Hamilton puts up even if he didn’t specifically foresee them because that was part of the bargain he made when he bought him—that possibility.

Without getting too heavily into this (I disagree that players over- or underperforming projections is completely independent of chance), I wanted to delve just a bit into distinguishing when we are truly predicting breakouts from when we’re merely getting lucky. Telling one from the other is no easy task.

I think fantasy analysts (and I’m implying no one in particular here) sometimes fall into a confirmation bias trap of seeing their breakout picks pan out and automatically calling it a success, even if the original analysis supporting the pick was shoddy.

While I’m picking on Chris (kidding; I’m not really picking on Chris), one example of a breakout player that jumps to mind is Ricky Romero, whom Chris drafted and whose success he has trumpeted. Not to imply that the analysis was “shoddy” here (I don’t know what Chris’ analytical process with Romero was), but if we’re going to take credit for predicting Ricky Romero’s breakout, I think we need to make it clear why we thought he would break out. And it needs to be more than just “the ground ball rate last year really jumped out at me.” Dana Eveland had a better GB percentage than Romero last year, but he hasn’t broken out (quite the opposite, actually).

Again, this isn’t meant to be a shot at Chris in the slightest. I’ve made it clear in the past that I have a lot of respect for Chris, and he is the one winning the CR league right now. I’m quite sure there was more to it with Romero than just “he has a good groundball rate.” I would be interested in hearing about it, though.

My point is that I think we, as fantasy analysts, should be held accountable for our analysis and predictions. Or at the very least, we should need to explain our reasoning if we take credit for predicting a breakout.

Radio appearance

For those interested, I’ll be appearing on RotoWire’s radio show today at 11:30 am EST to talk with Chris Liss about these sorts of things.


Mike Podhorzer

I love the “we should need to explain our reasoning if we take credit for predicting a breakout” line. I’ve always felt this way as well. Tell me why you expected a player to perform the way he has that proves you “right” and then I will determine if you deserve credit or not based on your answer.

You thought Josh Hamilton would not only rebound, but have the best season of his short career so far, because his swing is too sweet not to? Buzz, you lose.

Mike Podhorzer

Another problem with the non-stats guys or “balanced” guys is they don’t seem to realize that there actually are stats available for many of the things they mention.

Chris brings up Haren and some other pitchers possibly throwing so many first pitch strikes that get swung at and are inflating their BABIP. Isn’t that easily checkable using PitchF/X and gasp…stats?? Instead of endless speculating, once again we could check the facts.

eric kesselman
I don’t think anyone’s suggesting breakouts are “completely independent of chance.” Very little is. I think the point (and perhaps people are being a bit too prone to jumping to extremes or accusing the other side of jumping to extremes) is that there is some skill involved here. Next, sadly, I’m going to make a Lissian quant/quaint type objection. I’m all for using stats and having quantifiable reasons for our beliefs. However, sometimes we have strong intuitive senses about player break outs. This feeling may not occur often, but when it does, it is extremely accurate. I’ll spare you the…
Derek Carty

Agreed, Mike.  I checked into that today, actually http://www.hardballtimes.com/main/fantasy/article/how-much-do-counts-affect-babip/

Eric,
TRAITOR! :)

Chris Liss
I liked Romero’s robust ground ball rate in a tough division with lots of power hitters, but obviously, I was aware of his very solid K rate, that he was in a major growth phase (second full season in the bigs), was a good prospect, etc. But the ground ball rate to me jumped out as an insurance policy of sorts, and sure enough, he’s allowed just 8 HR so far this year. But it’s not just stats as Eric says – it’s a feeling about a player from watching him, from tracking his games, from seeing what he did…
eric kesselman

Yeah, I wasn’t happy about it either.

Also, I’m not at all suggesting you can’t make meaningful conclusions entirely based on numbers, reasons, or analysis. I’m just saying we should ALSO be relying on our intuitive sense when we have a strong feeling.

Derek Carty
Thanks for the reply, Chris. I know you said there were better examples than count-based BABIP, but since that was the one you gave, I thought it’d be interesting to check it out. If you have better examples that you think would be testable, I’d be happy to run a study on them. I also have a follow-up question for you that I’m curious to hear your response to. Do you feel the same way about “lucky” BABIPs as you do about “unlucky” BABIPs? That is, if you believe it’s possible for a guy like Haren’s high BABIP to be…
Derek Ambrosino
I think I’ll write a bit more in depth about the idea of “taking credit” for predicting a break out for Wednesday’s column, but in a nutshell, it’s very tricky. Branch Rickey famously said that luck is the residue of design. Now, one way to interpret that in the context of this discussion is in relation to players’ seemingly lucky or unlucky performances. But, the other way of looking at this is from a fantasy GM point of view. To win, you need some luck, some legit break outs, so to speak, and some smoke and mirrors. Making wise decisions…
Nick Steiner
Fantastic article Derek, you hit the nail on the head with Haren. The same thing happened with Smoltz last year – people readily accepted his BABIP as a reflection of his skill because they had a good explanation (he’s old, has lost his stuff, etc). But guys simply don’t strike out 8 per 9 and walk 1.5 per 9 in the AL East if they are old and lost their stuff (well Smoltz was old and lost his stuff, but he was still damned good, as evidenced by his time in the NL). I think the other thing people…
aweb
6 years to normalize BABIP is clearly far too long to take it seriously, if that is the case. 6 years is a long, long time for a pitcher – velocity drops, movement might change, new pitches get learned, defenses change, injuries happen. How is this 6 year figure determined? 6 full seasons is a pretty good career. “I defy you to show me one example in the history of baseball of a pitcher with a 9-plus K/9 and sub-2 BB/9 but whose BABIP stayed over .350 in the long run.” – this statement is set up to be impossible,…
Oscar Heller
Excellent article in general, and the part about taking credit for breakout seasons deserves an article(s) of its own. The point I really want to drive home is that in a very real way, results do not matter for evaluating decisions: what player you draft, what trade your favorite team’s GM makes, etc. To evaluate the quality of a decision-maker, the only thing you should look at is the decision-making process, not the result. If a GM signs a pitcher because (and only because) he won 15 games the past year despite a terrible ERA and peripherals, and the pitcher…
matt
I think one of the problems is that a lot of people in the sabr community tend to be falling into the Duellish mindset of “Everything that can be invented has been invented.” Now, the real smart thinkers continue to push the stats forward, bettering them regularly, but many people seem to look at something like xFIP and say, hey, it’s better than ERA, it must be perfect. I think Chris is merely acknowledging the fact that we don’t know everything, everything hasn’t been invented, so if Pangloss is going to give me what player x is worth in…
Blair

For interest’s sake:

For the top 700 SP in IP (in history), I ran the correlations K% : BABIP, BB% : BABIP, and K/BB : BABIP.

There is a minor positive correlation between K% : BABIP, and a larger negative correlation between BB% : BABIP. The largest correlation (.3) is between K/BB : BABIP.

What does this tell us?

Pitchers that have great K/BB rates have a higher expected BABIP, and are less likely to regress to “league average”. 

So, yes, Haren should regress, but when we see pitchers who spike in K/BB & BABIP simultaneously, we can’t attribute the BABIP to “chance” alone.

Nick Steiner

Blair – the correlation by itself doesn’t mean anything.  The slope (and p-value) are more important.  Can you please post those numbers?

Matt
@Oscar, But results do matter, besides in the obvious real way, they matter to the decision making process. Unless we know everything (and if there’s ever a time where it’s safe to use never it’s to say that we’ll never know everything) we can’t be sure that our decision making process is in fact sound. If I pick tails in a coin toss and lose 14 times in a row, that’s a weird coincidence. If I pick tails and lose 114 times in a row, I need to be aware of that and maybe pick heads next time because maybe…
Derek Carty

Matt,
I was just having this conversation with Eric Kesselman.  Results are important, but – as you imply – more in a macro-perspective where we can look back and see a large sample of what processes produced what results.  In that respect, results are extremely important.  But the result of a single event, which a lot of people tend to focus on and make judgments off of, is relatively meaningless.

eric kesselman

I agree with most of the above. Results DO matter, but you need the right sample size. I think people might be a bit too hasty to use short term results to prove something, and I also think people might be too dismissive of some results, claiming we ‘know’ where the guy’s value really is.

Question: How are we to deal with results over short sample sizes? Having no experience with say, football, how would someone use football stats to say anything meaningful?

Tim
I find the whole results v. evaluative process discussion to be extremely interesting. And an important piece of this discussion, to my mind, is draft picks. In any sport, GMs are routinely harassed for making “poor picks.” It’s important to look at those picks with the convenience of 20/20 hindsight. Greg Oden, at the time he was drafted, was considered to be a potential franchise center. Durant was more of a wild card. We can’t criticize the Portland front office for what was, at the time, very likely the right pick. (Sorry for the non-baseball reference; the draft just has…
Chris Liss
*A player’s season is like rolling a die. It can only end one way, and when it does end with a defined result, it obscures the fact that the season was still in one very real way a random result. If I roll a die once, it can’t accurately communicate its full range of outcomes the way a hundred rolls would.* To me this is a fundamental misunderstanding that many in the sabr community have. A player’s season is NOT like rolling a die. A player isn’t an All-Star baseball or stratomatic card with certain defined probabilities in each at-bat…
eric kesselman
The point of the quote is merely that we care about the expectation of results, not the actual results. That’s pretty hard to disagree with, and I don’t think you do. Your point (which you’ve made before in other forums) is just that we can’t KNOW in baseball what the expectation of results is. Unlike a die roll, a poker hand, or a strato simulation, we can’t just mathematically solve it for the real answer because we don’t have perfect information in baseball. Instead, we are trying to infer who players ‘really’ are, and this often involves heavily relying on…
Derek Carty
Agreed with what Eric says, Chris. Additionally, you say “A player’s season is NOT like rolling a die,” but in many ways, it is. Even if we were to know with absolute certainty that Player X has true talent Y, we’re not going to say that there is a 100% chance he plays at Y level, because he won’t. Even if we know true talent with absolute certainty, there is still going to be variation around Y, because we’re looking at a finite sample size of 600 AB. That’s simply how the world works. The reason a lot of analysts…
Chris Liss
Yes, that’s what I’m doing – I’m aware that Haren’s BABIP is probably bad luck for the most part, but leaving open a small sliver of possibility that it’s the result of a decline in some skill. But Haren’s kind of an easy case. What about Aaron Harang or Dave Bush – I’m leaving open a much larger possibility that those guys aren’t just unlucky but actually bad. How much do you discount those guys off their xFIP? Also, can we please dispense with this kind of garbage: *Even if we know true talent with absolute certainty, there is still…
Chris Liss
*I understand this is a tricky problem, but I always feel like you want to just shrug your shoulders, say we can’t ever know perfectly, and therefore the inquiry isn’t worthwhile. I don’t agree.* Seriously, Eric – I’m shrugging my shoulders? No, I have my own way of determining whether I think a player will deviate strongly from his previously demonstrated skills. The quants treat breakouts of that sort as its own kind of variance. They just think I was lucky to get Ricky Romero. I think there’s more to it than that. In fact, I’d argue that they shrug…
eric kesselman
While I often feel you skim my replies, you didn’t miss that. I don’t want to be forced to defend anyone else, or their techniques. That being said, I don’t think your characterizations are fair. The quants are perfectly capable of taking shots at break out candidates, and using methods similar to yours IN ADDITION to theirs. You keep trying to force them into one box. Bill and Robert for example had Feliz, Avila, Scherzer, Matuz, Wade Davis on their auction roster. I don’t think those players were picked because of past performance. Now maybe you are better at picking…
eric kesselman
I’m sorry you’re annoyed, but you singled out a quote whose whole point was about focusing on expectation, and not results. I don’t think it’s unfair when you get stuff back about basic expectation and variance. Now I admit your point wasn’t really to counter that principle (as I noted in my last post), but I do feel that you tend to present your objections more in a ‘We can’t know the truth’ vein rather than in a ‘this is a tricky problem- how can we approach it?’ That’s all I mean by my ‘shrugging the shoulders’ comment. Anyway, here…
Chris Liss
And by the way, the post I cited was arguing that results don’t matter, only the odds going in. Which of course presumes that it’s like roulette or poker. But it’s not, and results are one of the ways we determine how many sides the die had, or how many zeroes were on the roulette wheel in baseball. Because we can’t simulate infinitely many results and find the true odds, the results we have are important. They would only not be important if somehow we knew the odds without them. When Bautista hits 40+ HR this year, that says something…
Chris Liss
Also *but I do feel that you tend to present your objections more in a ‘We can’t know the truth’ vein rather than in a ‘this is a tricky problem- how can we approach it?’ That’s all I mean by my ‘shrugging the shoulders’ comment.* Isn’t exactly the opposite true? I actually try to figure out how to solve the problem of players having unpredictable breakouts, whereas it’s the quants who chalk it up to variance. They’ve given up! So instead they work at the margins, trying to squeeze more efficiency out of the pricing of last year’s stats.…
eric kesselman
If you read what I wrote a few comments up, I think you’ll find we’re largely on the same page regarding results. I wrote: “Instead, we are trying to infer who players ‘really’ are, and this often involves heavily relying on the results they’re actually generating. While there’s a ton of variance in there, we all know that there’s a correlation in there too. So we use results to try to figure out what we can meaningfully say about a player. The questions become: which results? over what time periods? How strong a statement can I make, and with what…
Chris Liss
*Bill and Robert for example had Feliz, Avila, Scherzer, Matuz, Wade Davis on their auction roster. I don’t think those players were picked because of past performance.* Yes they were. Past performance plus historical projected trajectory (they’re not unaware that top prospects do take a leap). But for them to deviate from that would be to make a choice and not to be agnostic. But Bill acknowledged that he was 100 percent agnostic and didn’t even see the other side of the argument. How can an “agnostic” have a feeling about a particular player above and beyond his expected career…
Chris Liss
First off, do you not acknowledge that the agnostics are punting on the “tricky problem,” not those who actually try to anticipate breakouts for particular players? I must have skimmed over your retraction. *Not to re-open the whole CR debate, but when you go past step 1 (which I feel you’ve articulated well), you’re kind of this black box. What is going on? How can I replicate it? What can I learn from it? What hypotheses can I test?* We’ve been over this time and again, but you don’t seem satisfied with my answers even though in practice you obviously…
eric kesselman
ARggh. Just had a long post eaten. Let me try to re-create: I think you make some interesting arguments. I do think the genius/agnostic thing is a bit confusing at times. I just checked my spreadsheet pre-auction. I had Hamilton at $25.3. I suppose this is at least $5 over the market valuation. I guess that does make me a ‘genius’ on him, although I was ‘agnostic’ in the sense I didn’t target him, and wouldn’t go over my valuation. I just expected to get him given my valuation was likely on the high side. I think you could make…