Accepting Randomness

Most of the conversations about the Dan Haren trade boil down to how a person feels about pitcher evaluation. There are clearly still a lot of people who simply believe that whatever happens is the pitcher’s responsibility, so if he gives up a bunch of hits and some home runs, he’s doing something wrong and that should be held against him. A high BABIP or HR/FB rate is evidence that he is throwing too many hittable pitches, or that his stuff has deteriorated, or that his command isn’t as good as it was, or of some other explanation that we haven’t yet figured out. But, whatever it is, it’s definitely something, and it’s definitely real.

These opinions are generally held because of an outright refusal to accept randomness. The idea that something could happen repeatedly, without cause, is very hard for a lot of people to swallow. But it’s true, and it’s a very important concept to buy into when trying to project the future performance of baseball players. Random happens.

For instance, did you know that the NFC has won 14 consecutive coin-tosses in the Super Bowl? Since 1997, the AFC has been on the losing side of the flip every single time. The odds of that happening are 1 in 16,384, and yet, it’s happened. Do you think the NFL is weighting coins? Do you think the AFC is perpetually hiring players who are terrible at guessing coin flips? Or do you think it’s just luck?
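For anyone who wants to check that figure, here is a minimal sketch in Python. It assumes a fair coin and independent tosses, and the name `streak_probability` is purely illustrative.

```python
from fractions import Fraction

def streak_probability(n_flips, p_win=Fraction(1, 2)):
    """Chance that one named side wins n_flips independent tosses in a row."""
    return p_win ** n_flips

# Assumption: a fair coin, so each toss is a 1-in-2 proposition.
nfc_streak = streak_probability(14)
print(f"1 in {nfc_streak.denominator:,}")  # prints "1 in 16,384"
```

That is the chance of one named conference winning 14 straight. The chance of either conference putting together such a streak is twice as large, 1 in 8,192.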

I’d imagine that most of us agree that it’s the latter. Because a coin has no ability to control what side it lands on, we are willing to agree that the result of a flip is random. However, as a culture, we don’t like to apply that same belief to people. They can make choices, adapt, and do things that affect the outcomes they are involved in, and so many of us assume that nothing that happens to a person is ever random.

Haren’s BABIP has been abnormally high in four of the last five months, dating back to last September. For many people, that’s enough to say that there’s a pattern that rules out any kind of randomness, and that the fact that he’s been giving up hits for what amounts to 2/3 of a season is evidence enough that he’s doing something wrong. However, when you look at the actual odds of that happening by random chance to some pitcher in MLB, you’ll find that it’s not unusual at all.

Using the binomial distribution, we can see that the odds of a pitcher with a true talent level BABIP of .300 randomly posting a .350+ BABIP in any given month (of 115 BIP) are about 10 percent. Thus, the odds of that same pitcher posting a .350+ BABIP in any four out of five months are about 1 in 2,200. Those seem like really long odds (though nothing compared to the Super Bowl coin, of course) until you remember just how many different five-month stretches of pitching there are in Major League Baseball, especially once you introduce selective endpoints, where the time-frame is defined by looking for the beginnings of a potential pattern.
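A minimal sketch of that calculation in Python follows. The 115 BIP, the .300 true talent level and the .350 cutoff come from the paragraph above; the helper name `binomial_tail` is an illustrative assumption, and the four-of-five step deliberately plugs in the rounded 10 percent figure, since that is what produces a number near 1 in 2,200.

```python
from math import ceil, comb

def binomial_tail(n, p, k_min):
    """P(X >= k_min) when X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))

BIP_PER_MONTH = 115   # balls in play per month, from the text
TRUE_BABIP = 0.300    # assumed true talent level, from the text

# A .350+ BABIP on 115 balls in play means at least 41 hits.
hits_needed = ceil(0.350 * BIP_PER_MONTH)

p_hot_month = binomial_tail(BIP_PER_MONTH, TRUE_BABIP, hits_needed)
print(f"P(.350+ month) = {p_hot_month:.3f}")

# Assumption: use the rounded 10 percent and treat months as independent.
p_rounded = 0.10
p_four_of_five = comb(5, 4) * p_rounded**4 * (1 - p_rounded)
print(f"Exactly four hot months in five: about 1 in {1 / p_four_of_five:,.0f}")
```

Because the exact monthly tail is not precisely 10 percent, plugging `p_hot_month` in for `p_rounded` moves the four-in-five figure somewhat, so 1 in 2,200 is best read as a ballpark number.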

Given the number of potential different five-month stretches we could look at across 350 pitchers using selective endpoints, it’s not a surprise at all that we can find a guy who has performed in a way that looks to be a rarity. The sheer quantity of players in the game, and the number of games they play, means that we will always see performances that had little chance of happening. On its own, such a performance is not evidence that randomness can be ruled out.
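To get a feel for how much the number of pitchers and the choice of endpoints matter, here is a rough simulation sketch in Python. The 350 pitchers and the 10 percent monthly chance come from the text; the 12-month span (roughly two seasons), the assumption that every pitcher has the same true talent, and the assumption that months are independent are all simplifications chosen for illustration, as are the function names.

```python
import random

def has_hot_stretch(months, window=5, needed=4):
    """True if any run of `window` consecutive months has >= `needed` hot months."""
    return any(
        sum(months[i:i + window]) >= needed
        for i in range(len(months) - window + 1)
    )

def league_has_hot_stretch(rng, n_pitchers=350, n_months=12, p_hot=0.10):
    """Simulate one league; True if at least one pitcher shows a hot stretch."""
    for _ in range(n_pitchers):
        # Each month is independently "hot" (.350+ BABIP) with probability p_hot.
        months = [rng.random() < p_hot for _ in range(n_months)]
        if has_hot_stretch(months):
            return True
    return False

def estimate_share(n_leagues=500, seed=0):
    """Share of simulated leagues containing at least one such pitcher."""
    rng = random.Random(seed)
    hits = sum(league_has_hot_stretch(rng) for _ in range(n_leagues))
    return hits / n_leagues

share = estimate_share()
print(f"Share of simulated leagues with at least one such pitcher: {share:.2f}")
```

Letting the five-month window slide across the whole span is a crude stand-in for selective endpoints: the window is allowed to start wherever the pattern looks strongest.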

Maybe Haren is doing something wrong. Maybe there is a reason for all these no-hitters. Maybe there’s an explanation for Brady Anderson’s 1996 season. We don’t know enough to conclusively say in any of these cases, but neither can you rule out that it may just be randomness at work. If you’re not willing to accept that, you’re going to see a lot of patterns where they don’t exist, and create explanations for things where there are none.







Dave is a co-founder of USSMariner.com and contributes to the Wall Street Journal.


212 Responses to “Accepting Randomness”

  1. Kevin says:

    This could be a good introductory post for those who are just getting into sabermetrics, or those who choose to ignore them.


  2. Joel says:

    I love how you could replace Haren’s name with Joba Chamberlain’s and it’d convey the exact same point. Except that people seem far more willing to throw Joba under a bus for 42 innings of bad results.


  3. Choo says:

    Actually, per an article I will never forget (SI maybe), Brady Anderson attributed his 1996 season to a strict regimen of rollerblading around Baltimore in spandex while rocking out to U2’s Achtung Baby.

    I am not making that up.


    • Piccamo says:

      I remember reading an article where Anderson said that it was essentially all luck, saying that it wasn’t that many more HR’s than he had hit if you go week by week. Didn’t he also have appendicitis and a torn quad that year?


    • Mike B. says:

      1996 was also the Year of the Magical Sideburns for Brady. Sure, he had those before and after, but ’96’s chops were just mystical.


  4. I’m pretty sure Brady had that season due to some form of performance enhancing drugs.

    Maybe they were so special that they made him rollerblade around Baltimore in spandex while listening to U2.


    • Choo says:

      Who knows for sure, but in Anderson’s case I would put my money on tropical fruit smoothies loaded with creatine and ecstasy, probably in one of those priapic-shaped water bottles you see women sucking on during Bachelorette parties. He was a man’s man . . . which is to say, if a man kept another man locked in a trunk in the basement of his pawn shop.


    • aweb says:

      Except that there would have been no reason for him to stop taking his magic elixir in 1997. Testing wasn’t happening, and there was a lot more money to be made by repeating that year (beyond his contract, which he signed after that year).

      Anderson may well have taken performance enhancers, but that doesn’t explain 30 extra HRs. If hittracker had the videos, I’d love to know how many “just enough” HRs he hit that year…I don’t remember him turning into a tape measure shot guy. I think it was a remarkable year of luck, no matter his new training regimen.


    • Billy says:

      So you are saying Brady used steroids for the 1996 season ONLY? Ridiculous! No way in hell did the guy get on the juice, hit 50, and stop right away, when no one in the public was talking about steroids in baseball.

      Perhaps he was on roids. Maybe even a good chance. But NO chance it was for only one season.


      • Billy says:

        And before there was Brady Anderson, there was Davey Johnson.

        Johnson hit 43 HRs in 1973. Only had 93 other career HRs, in over 5,000 plate appearances. Not sure why his name is never mentioned?


      • Choo says:

        I think what he meant was that Anderson got bored of the Achtung Baby album prior to the 1997 season. He tried Zooropa and Pop, but as we all know, they just weren’t the same. Hence began the downfall of Brady Anderson. Thanks a lot, Bono!


      • Nobody takes steroids for one season and succeeds, one season of roids won’t help you enough. You’ll lose the muscle mass you gain since you no longer have the capacity to work out as much. Look what’s happened to guys like McGwire and Sosa after their playing careers.

        That wasn’t my point. Anderson was still a good hitter after that year.


      • JK2 says:

        “Except that there would have been no reason for him to stop taking his magic elixir in 1997. Testing wasn’t happening,”

        Maybe his testicles were shrinking, he developed back acne, his head started to blow up, and/or he developed roid rages and his wife or significant other told him to cut it out or she’d leave him. Or maybe it was one of the other known side effects like high blood pressure or jaundice. I would think that any of these common side effects of steroids might have been reason enough for him to stop.

        “Nobody takes steroids for one season and succeeds, one season of roids won’t help you enough. You’ll lose the muscle mass you gain since you no longer have the capacity to work out as much. Look what’s happened to guys like McGwire and Sosa after their playing careers.”

        You know that for a fact how? Steroids are known to have powerful and immediate effects and so it’s easy to see how a player who takes them and works out shortly thereafter could benefit greatly from them within one year. You also don’t know when Anderson might have started his cycles. And, I’ve been told by bodybuilders that the body mass built up by steroids can last well after steroids are stopped — depending on how much the athlete continues to work out after stopping the steroids.

        What I don’t accept is that power is randomness. Where a ball falls in relation to the placement of fielders can be randomness, but not how far a ball is hit with consistency. That’s a matter of skill/power.


      • JK2 says:

        As for Davey Johnson, his record certainly warrants scrutiny in regards to steroids. However, I don’t believe steroids were as plentiful and available as they have been the last 15-20 years.


      • Nathaniel Dawson says:

        @JK2, then why can a player hit 3 home runs in one game, but then later in the season go an entire month without hitting one? If it’s all about skill/talent, why doesn’t he hit 3 home runs every game? There’s just a lot of random occurrence in baseball that influences results, at bat to at bat, game to game, and year to year. And as Dave said, with the number of players playing and the sheer magnitude of the number of events that happen over the course of a baseball season, weird things are going to happen all the time. Because of the nature of probabilities, most players are going to end up around the mid-point, but you’re always going to get outliers in both directions that just don’t match up to what you would expect.

        Sabermetrics hates this, because it makes it damned difficult to come to firm conclusions about many things, but that’s just the nature of the game. Realizing this at least gives you some firm footing when looking at what has happened.


      • JK2 says:

        Nathaniel, keep in mind I’m not saying that randomness has zero impact on HRs. Of course it might … but for the MOST part, HRs are a result of skill/power/strength and not randomness.

        Otherwise David Eckstein is just unlucky to have averaged 4 HRs/season for his career while Pujols is just lucky to have averaged 39? Is that what you are arguing? That this is just a result of randomness? Tell me you’re not saying this and then maybe we can go on to have a true discussion … because I don’t think I get your point.

        Obviously players can be streaky, but over the course of a year and in the absence of other factors like injuries or advancing age, things like power will tend to even out and players will produce close to their career norms. And, of course there can be outliers: years in which a player greatly exceeds or greatly falls short of his career norms. But in the steroids era, when you have a year in which a player greatly exceeds career norms, especially in the power department, you have to question whether steroids were a factor.


    • max says:

      Maybe Brady Anderson was a perennial 50-hr hitter who had one normal season and 10 unlucky ones. Unlearn!


  5. mcneo says:

    Random is simply the human condition of not knowing all the variables. If we knew all the variables, nothing would be “random”. Random is just an illusion. Of course, we don’t know all the variables in any real event, so we just average out the results and call it good. I certainly would like Dan Haren to be on my team. If a guy can get a strikeout once per inning, there’s a good chance he’s suffering from some bad randoms. But it could also be that his fastball isn’t what it used to be while his off speed stuff is still good. I’m only guessing because his fastball and cutter are both negative values this year and they were huge for him last year.


    • His fastball isn’t any slower than it was last year. Less movement could be probable. Maybe it’s just bad location.


      • Alex says:

        I honestly think the fact that so many pitches that were previously classified as sliders are now being classified as cutters is a major part of the problem. Looking at the profile of the two pitches, cutters tend to have a higher positive Z value, meaning they are generally flatter. Flat sliders (which are in fact being classified as cutters) could be an explanation for part of the BABIP woes and most of the HR/FB problems.


      • That’s a very good point Alex. Hanging sliders usually go a long way.


    • Nick says:

      If random is an illusion, what’s the variable we don’t know that is giving the NFC all those coin toss wins?


      • hairball says:

        That’s easy. Height of toss, wind speed, length of grass at landing spot, etc. etc. etc.


      • Wally says:

        But we know the flipper can’t control or take account of pretty much every one of the conditions that cause the outcome of a flipped coin. Thus, even if you know many or all of the things that control it, but you yourself can’t actually control it, what’s the real point in claiming “nothing is random?” It’s a distinction without a difference.

        Then you can apply the same thing to Haren. We know the kinds of things that lead to getting hits from balls in play (hit location, fielder location, fielder execution, even your wind and grass length, etc.), but does Haren possess any kind of control over most of those factors? The answer is basically no. He has *some* control in ground ball rate or maybe line drive rate, but relative to his career those numbers have not changed nearly as much as his BABIP has over almost the last 12 months. In fact, his LD% is flat and his GB% has gone down. And while GB% is a good thing for pitchers because it limits extra-base hits, GBs actually go for hits more often than FBs. So of the things we know Haren can control, nothing has changed that explains his BABIP. Now it’s possible Haren can control something we don’t know of that also does not affect other stats that we know he has more direct control over, but I believe that to be a remote possibility that would only explain a small fraction of his huge jump in BABIP.

        So what’s the difference between luck and not having control over all the events that cause the outcome?


      • The Hit Dog says:

        But the thing you forget is that there are factors which we cannot measure, like hit speed off the bat, that *could* be within the pitchers’ control. We’d just need better tools (HitFX) in order to determine whether those factors are indeed within pitchers’ control.


      • baty says:

        …As Fangraphs aficionados love to mock the player evaluators who clinch arguments with W/L records, ERA, and WHIP today, sooner or later the same community will begin mocking the reliance on statements using WAR, xFIP, and so on. It’s inevitable that stronger statistical comparisons to evaluations will eventually develop, even if each step is only the difference between 1 in a million and 1 in 975,000.


      • Wally says:

        The Hit Dog,

        That’s true. It would be foolish to think we know everything there is to know, but we still don’t know them. So, to attribute HR/FB changes, particularly on the short time scale we see in Haren, to things such as you mention, is to argue from ignorance.


    • joser says:

      Dr. Heisenberg would like to have a word with you.


    • Travis L says:

      You’re confusing your words. Deterministic versus nondeterministic is very different than random. The “we don’t know all the variables” is an explanation for deterministic.

      The two will often overlap (see modern encryption based on “random” keys). The keys are often generated by a pseudorandom number generator. They’re effectively nondeterministic, although some proof of concept cracks have been shown to exist.

      Further, the Heisenberg uncertainty principle negates your view. True randomness does seem to exist, at least on that level.

      But take solace, for men smarter than you have been unable to accept this. Quantum uncertainty (Heisenberg uncertainty principle) bedeviled Einstein from the date of its publication; he famously said, “God does not roll dice,” in response to Dr. Heisenberg.


      • Eric M. Van says:

        A significant percentage of physicists, including Nobel Laureates, believe that Einstein was correct. For an explanation that Roger Ebert adored, go to the link I claimed to be my website (it’s actually his blog archive) and search for my name (last bullet point of my first post and the long second post).


    • Rob says:

      Tell that to Heisenberg.


  6. Rick says:

    This is a great post, Dave. There is a fundamental gap between those who understand randomness and those who do not. All of the work being done in behavioral economics bears this out. Getting this message out, educating the media & average fan about this dynamic and closing the gap, would do more for sabermetrics than just about anything else I can think of.

    It’s that idea that even if something has a 1 in 1000 chance of occurring, if there are 10s of 1000s of events, we’re going to see dozens of that thing happening. It would be interesting to see how many stretches of such bad luck we’ve seen over the last year & half. Can we establish that Haren is one of the X number of pitchers who we would expect to go through such a stretch due to randomness?


    • Alex says:

      As someone who does understand randomness, I also think that some in the sabermetric community are starting to place too much emphasis on it (though a good portion of these people are the newer mainstream converts that just want a simple answer for everything). I’m more than willing to acknowledge that a portion of Haren’s problems are due to randomness. I would however hope that smarter people on that side of the debate could also realize that in all likelihood there is also a scouting explanation for Haren’s seemingly impossibly high BABIP and HR/FB rates. While the chances of him maintaining BABIPs that high for so long may only be 1 in about 2200, I’d also hazard a guess that the chances of the combination of BABIP and HR/FB as high as his has been for the past 5 or 6 months is a good bit less than 1 in a million. If it was just one or the other I’d be far more willing to accept that it was just luck. The combination of the two makes me think there is something very real that has changed, even if luck is also part of the equation.


      • Jeremiah says:

        I’ve also noticed the tendency to chalk things up to randomness. The other mistake I think some people make can be illustrated by the coin example. I suspect that there is some place where people can bet on the outcome of the Super Bowl coin toss, and people are putting more money on the AFC lately. The thinking goes that because the NFC has been winning for so long, the “luck” has to even out, so the AFC is going to win this time. But that’s called the Gambler’s Fallacy; the odds of the AFC winning this year are exactly 50%.

        In Haren’s case, if his natural BABIP is .300, then that should be the prediction from this point forward. However, some people would expect him to have a BABIP closer to .250 so that, by the end of the season, it would “even out” to about .300.


  7. Jon S. says:

    I think the Joba/Haren comparison is that they both could be experiencing some rotten luck. 42 innings is a sample size whose data is just begging to be f’ed up by randomness.


  8. baty says:

    Yes… I like this…

    The conflict I deal with mostly is when to accept randomness as almost pure randomness… as sites like Fangraphs further develop more sophisticated methods in documentation, I always hope for a more descriptive reason to be unlocked, shedding even more light on what might be occurring… For instance being able to measure more complex occurrences, instead of just measuring based on what a player has the most control over.

    Thanks


    • pounded clown says:

      Well put. Furthermore, just how well do the parameters defining a stat like BABIP actually describe the conditions measured? At what point are your definitions tailored to the data available to you, esp. when the technology exists to make more accurate measurements?


  9. intricatenick says:

    You have to regress that coin to its true talent level.


    • intricatenick says:

      Isn’t the idea of a “true talent level” a fictional construct that imagines a world in which randomness does not rear its ugly head?

      I may be wrong about it, but the true talent level has no deviations from centrality – which sounds like a pretty certain world free from unpredictable effects – a lack of randomness if you will….

      I believe that Dan Haren would be awesome in that world.


      • Patrick says:

        But in a random world, you need a mean around which your measured quantity fluctuates in order to have a useful model.

        The “true talent” is just a statement saying that we have a mean expectation around which this variable we’re measuring fluctuates.

        This is an idealized construction, but it’s a very relevant one.


      • intricatenick says:

        I am not discounting its value as a useful fiction. It’s just that you can never know exactly what it is and can only approach it asymptotically as your sample size increases. When some metric stabilizes is really just a shared assumption by those who use it. Much like an alpha of 0.05 in statistics. I like to ask my fellows in the trade at conventions, “what’s so special about 5%?” It usually gets a laugh.

        I agree. We have to use something to measure centrality and the idea has allowed us to make better and better predictions. All in all, I would agree that the utility of an idea should be its worth.


  10. wobatus says:

    Babip is one thing, but I don’t think HR/FB rate is something that is strictly random around the same average 11% for all pitchers, at least I haven’t been convinced. Brandon League, 6 seasons in a row above 12.5%, with an average of 18.8% career wise, and yeah, some tiny samples (the 12.5 was the smallest since 4 innings his rookie year), but he just seems mistake pitch prone.

    Rafael Betancourt, again small samples, but 8 years in a row below 11% (career rate of 7.4%)? Even in Colorado. And he is an extreme flyballer (League an extreme groundballer). Sure, coins come up heads 8 times in a row, but I don’t think the hr/fb average rates are as applicable across the board as the .290ish babip allowed.

    Johan has also only had one year above 11%, and other than that below.

    It’s not that I don’t accept randomness. For babip allowed, i am convinced, although maybe some pitchers have some extremely slight effect there.

    BTW, i wonder why it is that for pitchers babip allowed is random around the .290 mean, but hitters have a greater effect.


    • Wally says:

      Like with Haren’s high BABIP over 5 of the last 6 months, the times you observe large deviations from the average HR/FB is subject to cherry picking endpoints and having a large number of pitchers from which to observe these deviations. The way to test this would be to see if we find more pitchers than we would expect from random chance having these large deviations, and/or testing to see if HR/FB is relatively constant within a player’s career but looks random over players as a whole.

      To my knowledge people have done this and not found much evidence to support pitchers having control over HR/FB. This is why we have the stat xFIP in the first place. Like BABIP, HR/FB probably has a relatively small component that is under the pitcher’s control, but we don’t understand it as much as we do with BABIP (for which we know GB/FB tendencies and knuckleballers have something to do with it). So where is the logic in jumping to the conclusion that it is not random in the absence of any evidence to prove otherwise?


      • Alex says:

        Pitchers almost certainly have some measure of control over HR/FB rates. The question isn’t so much about whether or not it exists as it is about whether or not the results can be seen through the fog of random variance (check out James’s “Underestimating the Fog”). Sure, a large amount of the year to year change in HR/FB is due to random variance, but that doesn’t prove that pitchers have no control over it. Take Matt Cain for example. I’d have to think the chances of him maintaining the HR/FB rates he has for over 1000 innings are incredibly small, especially since they have been consistently low. And no, it’s not just his home park, as his away HR/FB rates are still incredibly low. At this point I think it’s safe to assume that he will almost always have a well below average HR/FB rate, and if it does indeed go above the league average for a year or something it’s almost certainly due to random variance.

        That’s really the problem with a stat like xFIP. I know it works great for the entire population of pitchers. That’s what we would expect of a model that completely eliminates a skill that the vast majority of pitchers only have a marginal (at best) control over. The problem is that it may be very inaccurate for the outliers who do seem to have some control over HR/FB. Remember there was a time when everyone agreed pitchers had no control over BABIP, then suddenly we realized they just had limited control (between .280 and .320 basically). I don’t know why people don’t want to accept the same thing for HR/FB (say between 8% and 12%).


      • Wally says:

        Alex,

        We can go in circles with this forever. I know what you THINK to be true, and I know you can point to a handful of players that have stretches of high or low HR/FB. But that is not proof that pitchers do have control over HR/FB. And with Haren we aren’t even dealing with a pitcher that has had several seasons of high or low HR/FB, we’re looking at a change that has happened over less than a full year.

        “Remember there was a time when everyone agreed pitchers had no control over BABIP, then suddenly we realized they just had limited control (between .280 and .320 basically). I don’t know why people don’t want to accept the same thing for HR/FB (say between 8% and 12%).”

        Because you’re arguing from ignorance. I leave the door open that it’s possible pitchers can control HR/FB to some extent, but as of yet I have not seen any proof that they do control it. Science and human knowledge in general have been wrong about a ton of things in the past; that doesn’t mean they’re wrong today about any one particular thing. You gotta do better than that.

        Do you have something more to present, Alex? Because if it’s just look at pitcher A, B and C, or how we were wrong about X, Y or Z, then you’re never going to convince me.


      • Alex says:

        The fact that you think I’m arguing from ignorance is a huge mark against you. Go read Underestimating the Fog and get back to me. The truth of the matter is that just because something exists, that doesn’t mean you can mathematically prove it. That’s just not how all data works. With HR/FB data, the overall variance is (likely) too great to mathematically prove that pitchers have control over HR/FB rate. However, as Bill James will tell you, that isn’t actually proof that pitchers have no control over HR/FB rates.

        This problem is especially bad in this case, because we’re looking at a fairly small selection of data. There’s a chance that once we have better BIP data for an extended period of time, we’ll be able to show some of this stuff better. Instead, Fangraphs currently only has Batted Ball data going back to 2002. Since the data is limited, it’s even harder to prove the point I’d like to.

        And for the record, I couldn’t care less about trying to convince you. You’re clearly the sort of person that likes to use these stats so you can play yourself up as being smarter than the casual fan. I’m as sabermetrically inclined as they get, but I also am freely willing to admit that our knowledge at this point is still extremely limited. I’m not going to completely dismiss scouting reports and chalk everything I can’t immediately explain up to luck. But hey, go ahead and do just that if it makes you happy, just realize it doesn’t actually make you smarter than the person you’re arguing with.


      • baty says:

        Yeah, i mean… My argument is always..

        The fact that we don’t have proof to determine the cause doesn’t prove that the cause is absent.


      • Wally says:

        Alex, stop pounding your fists over this “Underestimating the fog” thing. I read it, it doesn’t say what you think it does. And yes you’re making an argument from ignorance. You’re citing anecdotal evidence that only suggests your theory MIGHT be true, but hardly proves it. You’re telling me HR/FB is under a pitcher’s control without ANY direct proof of that. If you’re ignorant of how that’s arguing from ignorance, I suppose we’ve reached a knowledge gap we can’t bridge in this format.

        “With HR/FB data, the overall variance is (likely) too great to mathematically prove that pitchers have control over HR/FB rate. However, as Bill James will tell you, that isn’t actually proof that pitchers have no control over HR/FB rates.”

        Exactly. I’m not saying they don’t have control over it. I’m saying YOU CAN’T PROVE THEY DO! And further I’m pointing out that it is incorrect to say “Pitchers almost certainly have some measure of control over HR/FB rates.” without ANY evidence to prove it so.

        “You’re clearly the sort of person that likes to use these stats so you can play yourself up as being smarter than the casual fan.”

        Hmm, you’re not resorting to personal attacks…. That’s nice. I suppose this goes along with your knowledge of logic…

        “I’m as sabermetrically inclined as they get, but I also am freely willing to admit that our knowledge at this point is still extremely limited.”

        Agreed. I by no means believe this is settled and pitchers don’t have control over HR/FB, I’m simply saying no one has proven they do.

        “I’m not going to completely dismiss scouting reports and chalk everything I can’t immediately explain up to luck.”

        That’s nice, what ever made you think I do? Is this the kind of response you give people when they ask for the data and analysis that supports your statements of fact? Create strawmen and make personal attacks?

        “But hey, go ahead and do just that if it makes you happy, just realize it doesn’t actually make you smarter than the person you’re arguing with.”

        Yes, yes, and all the appeals to ridicule and ad hominems you can come up with will prove this statement of yours: “Pitchers almost certainly have some measure of control over HR/FB rates.”

        This whole argument is rooted in my basic response to that sentence above, which is, PROVE IT. If you’re as sabermetrically inclined as you say, you’d recognize that you can’t prove it, would admit as much, and not resort to personal attacks.


      • Alex says:

        You’re asking for proof of something that is unprovable based on the current data available. If you did in fact read “Underestimating the Fog,” I would hope you realize how somethings that are true are still unprovable. In that sort of case, you don’t really have anything to point to except basic reasoning and anecdotal evidence. It’s not dissimilar to catcher’s defense and what James says about it in that paper.

        You’re acting like I didn’t bring anything to the table to back up my point beyond, oh, I think this is what’s happening to Haren. As I said elsewhere, the chances of Haren’s BABIP and HR/FB being as high as they are while being independent of one another is well over 1 in a million. Taking that into account along with the scouting reports and the pitch f/x data, which seems to point to Haren’s breaking stuff being flatter, I think we have an easy explanation for what’s going on.

        I don’t have the time to run a mathematical study on HR/FB rate right now, but perhaps I will in the future and I’ll share it with everyone. It’ll be nice when we have more freely available data on batted balls. You just seem way too sure in your opinion, when all you’re basing it on is the lack of hard evidence. When we have anecdotal examples, scouts saying it, and a good explanation, I just don’t think the lack of proof is a good reason to dismiss something.


      • Dave Cameron says:

        You do realize that your “one in a million” number is totally made up out of thin air, right? You haven’t yet talked yourself into thinking that whatever you think the odds are reflects reality, have you?

        If you want people to take your argument seriously, stop telling everyone to read one article that we’ve all already read and go do the math yourself. Calculate the standard deviations away from the mean that Haren’s last year represents, then come back and we can talk.
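For anyone who wants to run the calculation being asked for here, a minimal sketch follows. It assumes each ball in play is an independent trial with the league-average hit probability, which is a simplification. The function name and the sample numbers are hypothetical placeholders, not Haren’s actual totals.

```python
import math

def binomial_z_score(hits: int, balls_in_play: int, league_rate: float) -> float:
    """Standard deviations separating an observed hit count from the league expectation."""
    expected = balls_in_play * league_rate
    sd = math.sqrt(balls_in_play * league_rate * (1.0 - league_rate))
    return (hits - expected) / sd

# Hypothetical inputs for illustration only (not real data for any pitcher).
z = binomial_z_score(hits=170, balls_in_play=500, league_rate=0.300)
print(f"{z:.2f} standard deviations above the league rate")
```

The same function would work for HR/FB by passing home runs, fly balls, and the league HR/FB rate instead.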


      • AJS says:

        @Wally: “Agreed. I by no means believe this is settled and pitchers don’t have control over HR/FB, I’m simply saying no one has proven they do.”

        But has anyone proven they don’t? Or that individual pitchers can’t? Why should Alex have to prove the affirmative when you can’t provide proof for the negative? Why is your position considered the default position?


      • Wally says:

        Alex,

        “You’re asking for proof of something that is unprovable based on the current data available.”

        So it’s unprovable, get over it.

        “If you did in fact read “Underestimating the Fog,” I would hope you realize how somethings that are true are still unprovable.”

        Your only escape is in the word “somethings” here. Yes, it would be naive not to think “somethings” are true that are currently unprovable. However, that is not evidence that THIS thing that is currently unprovable is true.

        “As I said elsewhere, the chances of Haren’s BABIP and HR/FB being as high as they are while being independent of one another is well over 1 in a million.”

        Is it? Could you show me how you calculated that?

        “Taking that into account along with the scouting reports and the pitch f/x data, which seems to point to Haren’s breaking stuff being flatter, I think we have an easy explanation for what’s going on.”

        We’re now morphing this discussion past that of pitchers controlling HR/FB and into what is going on with Haren. Things may be happening with Haren, but you can’t connect the dots between flat sliders and increased HR/FB. You don’t have that data.

        “You just seem way too sure in your opinion”

        That’s because it isn’t an opinion. You can’t prove pitchers have control of HR/FB. NOT AN OPINION!

        “when all you’re basing it on is the lack of hard evidence.”

        YES! Lack of evidence means YOU CAN’T PROVE ANYTHING!

        “When we have anecdotal examples, scouts saying it, and a good explanation, I just don’t think the lack of proof is a good reason to dismiss something.”

        This is total BS. You have a couple of anecdotes. NEAT. I’m sure I could find dozens of anecdotes that suggest the opposite, but I’m not even going to look; it’s pointless. Scouts saying something is equally stupid. It’s observation without any formal record keeping of the required data to prove this hypothesis. Like I said, it’s watching spaghetti fly by your window and saying, well, “X person said that spaghetti is evidence of the FSM.” And then your reasoning is just handwaving. A few sentences of what sounds good to you isn’t evidence for anything. Particularly when it’s based on such flimsy data.

        I mean, flat sliders, ok, so are those HRs coming off of the flat sliders? How many of them is it? Are there significantly more flat slider HRs than HRs off other pitches? Can you even tell me that? If you can’t, your argument doesn’t get off the ground.

        And you tell me I’m sure of my opinion? Sheesh, it’s not even an opinion, but what you got here… That’s an opinion. And you seem awfully upset I don’t buy it…. Look in the mirror, pal.


      • Alex says:

        It’s not made up at all, Dave. The fact that you would act like it is is incredibly disappointing. It’s a very conservative estimate based on the fact that the chance of the BABIP data being random is 1 in 2200. Therefore, if we’re to believe that the BABIP and HR/FB data are independently that high due to pure randomness, the odds of his HR/FB data being completely random only have to be longer than 1 in 455 for my 1 in a million estimate to be conservative. Are you really going to sit here and act like the odds of posting 5 HR/FB rates over 12.5% in 6 months, 4 of which were higher than 16.5%, are shorter than 1 in 455?

        I went with an extremely conservative estimate because I don’t have easy access to the raw data to make a correct calculation. If the odds of those HR/FB rates are shorter than 1 in 455, feel free to prove me wrong.
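The arithmetic behind this estimate can be spelled out in a short sketch. The two fractions are the figures quoted in the comment, and multiplying them is only valid if the BABIP and HR/FB runs really are independent. The second half addresses a separate question raised by the article, namely how likely it is that some pitcher in the league shows such a run; `n_pitchers` is a hypothetical count chosen purely for illustration.

```python
from fractions import Fraction

# Figures quoted in the comment above.
p_babip = Fraction(1, 2200)      # stated chance the BABIP run is random
p_hrfb_bound = Fraction(1, 455)  # threshold claimed for the HR/FB run

# Multiplying is only valid if the two runs are independent (an assumption).
joint = p_babip * p_hrfb_bound
print(joint)  # 1/1001000

def prob_at_least_one(p: float, n: int) -> float:
    """Chance that at least one of n independent pitchers shows an event of probability p."""
    return 1.0 - (1.0 - p) ** n

# Hypothetical number of pitchers, for illustration only.
n_pitchers = 150
print(prob_at_least_one(float(joint), n_pitchers))
```

The per-pitcher figure and the league-wide figure answer different questions, which is part of what the thread is arguing about.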


      • Alex says:

        I bow before you, Wally. You’re clearly my intellectual superior because you can say you don’t believe in HR/FB being controllable without hard proof, while I make the mistake of believing in it without hard proof to the contrary. Clearly your opinion is far more reasonable than mine.


      • Wally says:

        AJS,

        “But has anyone proven they don’t? Or that individual pitchers can’t? Why should Alex have to prove the affirmative when you can’t provide proof for the negative? Why is your position considered the default position?”

        For one, I’m not claiming they can’t. I’m claiming we don’t know they can. It’s different.

        Secondly, how would you prove that individual pitchers can’t control HR/FB? Do we have a database with every pitcher that will ever exist? You generally just can’t prove negatives. Like I said before, this is because you don’t know what you don’t know. You’re never going to be able to include every possible factor that could cause pitchers to control HR/FB, because you’ll never know everything to test. This is why science focuses on proving positives.


      • Wally says:

        Good lord Alex,

        “I bow before you, Wally. You’re clearly my intellectual superior because you can say you don’t believe in HR/FB being controllable without hard proof, while I make the mistake of believing in it without hard proof to the contrary. Clearly your opinion is far more reasonable than mine.”

        It’s not that I don’t BELIEVE in anything, whatever it is. It’s that there is no evidence to prove it. This is not an opinion.


      • Alex says:

        Let me ask you one question Wally. Do you actually believe that we should expect Haren’s HR/FB rate to regress to the league average of around 10% or whatever from here on out? Even though he has an almost 1400 inning sample size that points to his HR/FB rate being above average?

        What about Matt Cain? Despite over 1000 innings of a HR/FB rate of 6.7%, do you think we should expect him to regress to league average from here on out?

        My position is that pitchers have a narrow range of control over their HR/FB rate, just as they do with BABIP. If you disagree with that, you should obviously expect everyone to regress to the league average regardless of past performance. Do you really believe that?
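The disagreement here is essentially about how far to regress an observed rate toward the league average. One common way to express the range of positions is a weighted average of the two, sketched below. The weight `prior_fly_balls` is a hypothetical tuning constant, not a value established anywhere in this thread, and the fly-ball count is a placeholder; only the 6.7% and the roughly 10% league figure come from the comment.

```python
def regress_to_league(observed_rate, fly_balls, league_rate, prior_fly_balls):
    """Weighted average of a pitcher's observed HR/FB and the league rate.

    prior_fly_balls is how many league-average fly balls the observed
    sample has to outweigh (a hypothetical tuning constant).
    """
    total = fly_balls + prior_fly_balls
    return (observed_rate * fly_balls + league_rate * prior_fly_balls) / total

# Placeholder fly-ball count; the prior weights span "no regression" to "nearly full regression".
for prior in (0, 1000, 100000):
    estimate = regress_to_league(0.067, 1200, 0.10, prior)
    print(prior, round(estimate, 4))
```

A prior weight of zero takes the pitcher’s record at face value, and a very large one amounts to expecting league average regardless of past performance. The "narrow range of control" position sits somewhere in between.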


      • Alex says:

        There is no evidence to disprove either. I haven’t argued there is evidence that proves it exists. I’ve argued for why I believe it exists despite the lack of mathematical proof. Call it a leap of faith if you want, but a decade ago you would have mocked me for saying the same thing about BABIP. Who would have ended up being right on that one?

        Sorry I don’t think mathematical proof should be a prerequisite to believe in something.


      • AJS says:

        Posted this in the wrong place below — apologies!

        @Wally,

        I’m failing to see the distinction you draw. Obviously we can’t know what any pitcher ever will do. But we have a pretty robust data set in terms of what baseball pitchers have done so far. And the fact that some individual pitchers, year after year, seem to put up higher than average HR/FB and other pitchers seem to put up lower than average HR/FB can’t simply mean nothing. Remember, we thought that pitchers had no control over BABIP. Now we’ve found they have some control. Why couldn’t the same apply to HR/FB?

        Yes, I can’t prove that it does apply. But if you can’t prove it doesn’t apply (or, since you claim you can’t prove negatives, if you can’t prove it hasn’t applied in the data set we have; and by the way, you clearly can prove negatives: I’m sure you’d agree scientists have proved the earth is not flat), why should we go with your position as the default position? Shouldn’t the default be neutral?


      • Wally says:

        Alex,

        “Call it a leap of faith if you want, but a decade ago you would have mocked me for saying the same thing about BABIP. Who would have ended up being right on that one?”

        This is a very juvenile attitude. You may in the end be proven right, but that doesn’t make my current position, which is to strictly go by the statistical proof, incorrect.

        “Do you actually believe that we should expect Haren’s HR/FB rate to regress to the league average of around 10% or whatever from here on out? Even though he has an almost 1400 inning sample size that points to his HR/FB rate being above average?”

        I would expect him to post HR/FB rates equal to that of what is seen in the ballparks he plays in. However, this being only one pitcher and a stat that fluctuates pretty wildly, it will likely not be exactly average.

        “My position is that pitchers have a narrow range of control over their HR/FB rate”

        Ok, prove it.


      • Wally says:

        Alex,

        “Sorry I don’t think mathematical proof should be a prerequisite to believe in something.”

        Missed this.

        This is another juvenile attitude. We’re not talking about what you “believe” we’re talking about what we can prove and what we know to be true. If you can’t prove pitchers can control HR/FB, going forward you have to assume they will be league average, weighted for the parks they play in. You simply don’t have any evidence to support doing something different.


      • Alex says:

        That doesn’t mean you’re wrong to go solely on statistical proof in this case. That wasn’t the point. The point was a decade ago you would have been going on and on about how there was no statistical proof that pitchers had control over BABIP and you would have ended up very wrong. Maybe you shouldn’t be as reliant on statistical proof as you currently are.

        So then you’d expect it is as likely that he will post below league average HR/FB rates (adjusted for his home ballpark) as above average? You realize you’re ignoring a rather large amount of his personal data to come to that conclusion. I’m pretty sure it’s far more likely he’ll be below average than above average.

        As I continue to say over and over again, I can’t prove my position. As you’ve begrudgingly admitted, that doesn’t mean that my opinion is wrong. Is it that hard to accept that someone has a different opinion than you without statistical proof? It’s not like you have statistical proof on your side either.


      • Alex says:

        —————————————
        “Sorry I don’t think mathematical proof should be a prerequisite to believe in something.”

        Missed this.

        This is another juvenile attitude. We’re not talking about what you “believe” we’re talking about what we can prove and what we know to be true. If you can’t prove pitchers can control HR/FB, going forward you have to assume they will be league average, weighted for the parks they play in. You simply don’t have any evidence to support doing something different.
        ————————————–

        No, I don’t, and I’m pretty sure some of the more popular projection systems out there would disagree with you on this. I don’t see ZiPS regressing everyone back to the same HR/FB rate adjusted for their home park. They clearly base their HR/9 calculation in part on past HR rates. Then again, I assume you don’t ever look at those since they use assumptions that haven’t been proven statistically.


      • Wally says:

        Alex,

        “The point was a decade ago you would have been going on and on about how there was no statistical proof that pitchers had control over BABIP and you would have ended up very wrong. Maybe you shouldn’t be as reliant on statistical proof as you currently are.”

        Err, what proves pitchers do have a little control over BABIP? Statistical proof. So am I not supposed to be reliant on stat. proof for BABIP, but *believe* you for HR/FB? Your position is getting absurd. You’re telling me to do one thing because something like it was wrong once, though now doing that thing would mean ignoring the proof that this other thing is right now. My fucking head is going to explode with your logic. The reason people doubted BABIP was controlled was because there was no proof that it was. They were not wrong in maintaining that. If people believed it was possible BABIP played a role, they investigated it and found stat. proof to the contrary. At which point we all went “ah-ha, that’s good to know.” That’s the way it works, but up to that point, you have to, at the very least, be very reserved in your “belief” that something exists without some real proof.

        “You realize you’re ignoring a rather large amount of his personal data to come to that conclusion. I’m pretty sure it’s far more likely he’ll be below average than above average.”

        You realize you’re ignoring far more data from the league as a whole right?

        “I can’t prove my position. As you’ve begrudgingly admitted, that doesn’t mean that my opinion is wrong.”

        “Begrudgingly?” I’ve maintained the stance that we don’t know if pitchers control HR/FB this whole time.

        “Is it that hard to accept that someone has a different opinion than you without statistical proof? It’s not like you have statistical proof on your side either.”

        “My side” is only that of what stat. evidence we have, which is nothing affirmative. So yeah, we actually do have that evidence… People have tried to find correlations with HR/FB; they didn’t find anything. Thus, the creation of xFIP. Does that prove they are right? No. Like I said, you can’t prove a negative, you can only say X, Y and Z don’t predict A. If you can come up with M that does predict A, great! Do it. I’d love to see it. Prove me “wrong” in standing by all the current evidence, I love learning new things.


      • Alex says:

        I guess all I can say is be happy being “right” until we get sophisticated enough statistics to prove you wrong. You’re likely always going to be behind the curve if something has to be statistically proven to you before you accept it. Theories are generally made before there is good enough data to help prove them.


      • wobatus says:

        League has an 18.3% hr/fb ratio in his career. Every year, 6 years running, he has allowed more than 12.5% (and that was in the very smallest sample, 11 2/3 innings; one would expect that with small samples every year, in one of those years he’d get lucky, like the 0 homers in 4 innings his rookie year).

        League has pitched 253 innings in his career. Since records on this have been kept, since 2002, only 3 pitchers have had a full season with a higher hr/fb. Brandon Webb, 18.9% in 2005. Derek Lowe, 18.8% in 2005 (Tim Hudson was also high that year). Odalis Perez, 19.7% in 2003.

        In fact, only 4 seasons were above 18%. Throw in Greg Maddux at 18.2% in 2004.

        What do all 4 of those seasons have in common, along with League’s career? All 5, groundball rates above 50%. Hmm, i think we may be onto something here. Maybe some groundballers tend to have extreme swings, maybe when they are more mistake prone, hanging sinkers?

        So League’s career is quite the outlier. He does it every year, too (not over 18%, but in the teens, given enough innings).

        I am sorry, I am not convinced that it is just random luck with League. I know it MAY be. If he is down at 5% next year, I am going to assume THAT’s the outlier, or he learned something new.

        And everyone told me Jason Bay really was putrid in the field last year, his UZR said so, and it turned out that was “wrong.” :) I kid, I kid.

        BTW, I ask again, if I am allowed to pitch, with my 65 mph fastball, am I going to allow more hr/fb? Is this ALL pitchers, or just major leaguers? Even Ollie Perez and Scott Kazmir (both above 11% btw)?


      • Wally says:

        Alex

        “I guess all I can say is be happy being “right” until we get sophisticated enough statistics to prove you wrong. You’re likely always going to be behind the curve if something has to be statistically proven to you before you accept it. Theories are generally made before there is good enough data to help prove them.”

        This right and wrong BS is so juvenile. I’d be thrilled if people figured out what explains X% of the variation in HR/FB, like they did with BABIP. That’s great, the more knowledge the better. But sticking with the current evidence doesn’t make me wrong or behind the curve. It makes me LOGICAL. You have a hypothesis, some people share it, but right now it’s just that, an unproven hypothesis. It’s not my job, nor sensible to try to learn every unproven hypothesis and evaluate its merits based on flimsy anecdotes and back-of-the-napkin reasoning. If you’ve really got something, prove it to us.


      • Wally says:

        wobatus,

        It would be fairly easy to test your hypothesis. Does GB% correlate with HR/FB%? What if corrected for park? What if corrected for some sort of skill factor like K/BB, K/9, or BB/9?

        This isn’t any sort of absolute proof, but just doing a quick correlation between GB% and HR/FB in 2009, we get a flat line with an R^2 of basically 0.

        Not looking hopeful, but you could keep looking.
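A rough sketch of the quick check described above, using only the standard library. The two lists are made-up placeholder values, not the 2009 data; swapping in real per-pitcher GB% and HR/FB figures would reproduce the test. The function name is illustrative.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)

# Placeholder values for illustration only (not real pitcher data).
gb_pct = [38.0, 42.5, 45.0, 48.0, 51.5, 55.0]
hr_fb_pct = [10.5, 9.0, 12.0, 8.5, 11.0, 10.0]
print(f"R^2 = {r_squared(gb_pct, hr_fb_pct):.3f}")
```

An R^2 near 0 only says there is no linear relationship across the whole population. It would not rule out an effect confined to a few extreme pitchers, nor a nonlinear one.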


      • Alex says:

        It doesn’t make you logical, it just makes you reliant on statistical proof. Those two aren’t the same thing, as much as you’d like them to be. Statistics is more limited than you apparently want to admit, especially when we’re not actually talking about truly random independent events. Baseball is the sport most tailored to statistics, but that doesn’t make them a perfect match. Have fun living in your utopia where statistics perfectly capture everything in baseball. I’ll be living in the real world.

        Look, I was a lot like you a few years ago. Everything had to be statistically proven to be meaningful. Then I realized that there’s some things statistics can’t easily explain. I still love sabermetrics, don’t get me wrong, but it’s much more fallible than you want to admit. Good luck to you.

        I have never once said to you that it’s your “job…to try to learn every unproven hypothesis and evaluate its merits based on flimsy anecdotes and back-of-the-napkin reasoning.” I simply tried to present what I believe to be the case, with some reasoning behind why I’ve come to believe it (whether you want to believe it or not, the chances of Haren randomly going through his BABIP and HR/FB struggles are extremely small unless they aren’t actually independent). You’ve just continually bashed me because I can’t prove it statistically. That wasn’t what I set out to do. Sorry I can’t prove something that isn’t possible with the currently available data.


      • wobatus says:

        Wally, it may not be all gb pitchers. And it may be that they are prone to spikes, but overall the rate is the same. It just seems to me that 5 examples since 2002 of guys over 18%, ALL of them groundballers, doesn’t seem that random. It may be, but I still am not sure. And League is that way every year. Again, I get the point. It could be someone calling a coin successfully 14 years running.


      • Wally says:

        Alex,

        “It doesn’t make you logical, it just makes you reliant on statistical proof. Those two aren’t the same thing, as much as you’d like them to be.”

        This is a question that lends itself very nicely to being answered using statistical analysis. It would be quite foolish to ignore the best tool for the job, whether you like that or not. If you don’t have the data to test your hypothesis, well, that’s unfortunate. Presumably, other people do, however (not the hitF/X, but much of the other data brought up here). And if there was something to be found regarding pitchers controlling HR/FB, I would guess they have looked, given how much discussion HR/FB has gotten in the statistical community.

        “Statistics is more limited than you apparently want to admit, especially when we’re not actually talking about truly random independent events.”

        Given how well you’ve displayed your knowledge of statistics in our discussion, I’m left only to assume that you think methods of statistical analysis are more limited than they actually are.

        “Have fun living in your utopia where statistics perfectly capture everything in baseball. I’ll be living in the real world.”

        WOW, what a zinger…. I guess it’s this kind of BS, telling me in some unquantified sense that stats are limited and getting in some insult, that will prove your point of view logical? Are you this naive?

        “I simply tried to present what I believe to be the case, with some reasoning behind why I’ve come to believe it ”

        Yes, and you’ve pushed this pet hypothesis quite hard while simultaneously insulting me and brushing off data that doesn’t fit your hypothesis. I suppose that is something that those who “believe” something do however.

        “You’ve just continually bashed me because I can’t prove it statistically. ”

        I’ve actually been quite reasonable with you. It was you, not I, that broke off into appeals to ridicule and bald-faced insults.

        “That wasn’t what I set out to do. Sorry I can’t prove something that isn’t possible with the currently available data.”

        Exactly, you can’t prove it. You can point to a few anecdotes that support your case, I can point to some that don’t. Yet somehow I’m this close-minded statistical hawk you need to ridicule in order to pump up your self-worth or god knows what.

        You can’t prove it, your anecdotes are weak and easily countered, get over it. There is no need to make yourself look like a child by throwing around insult after insult because I don’t believe you.


    • djw says:

      But the appropriate way to approach this isn’t to merely identify outliers and wave your hands at them. The approach would be to identify how many outliers we should expect to see as far out as League/Betancourt given a binomial distribution. If the number is similar, then their existence doesn’t demonstrate anything more interesting than 14 straight NFC coin tosses.
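The approach described above can be sketched roughly as follows. It treats every fly ball as an independent trial with the same home run probability for every pitcher, which is exactly the "no skill" assumption being tested. The pitcher count, fly-ball total, and threshold below are hypothetical placeholders, and the function names are illustrative.

```python
from math import comb

def binom_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def expected_outliers(num_pitchers: int, k: int, n: int, p: float) -> float:
    """Expected number of pitchers allowing k or more homers on n fly balls by chance alone."""
    return num_pitchers * binom_tail(k, n, p)

# Sanity check against the article's coin-toss example: 14 wins in 14 fair flips.
print(binom_tail(14, 14, 0.5))  # equals 1/16384

# Hypothetical inputs: 300 pitchers, 100 fly balls each, league HR/FB of 11%.
print(expected_outliers(num_pitchers=300, k=18, n=100, p=0.11))
```

Comparing that expected count with the number of outliers actually observed is the test being proposed. A real version would use each pitcher’s own fly-ball total rather than a single `n`.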


      • wobatus says:

        I’m not going to be the person to do it (the research you discuss). I pick the outliers and someone else has to satisfactorily explain them away to me.

        I am not pointing at Betancourt and League and saying it proves pitchers have control over it, but I don’t think giving up a lot or suppressing, year after year, is the same as a coin toss either. And I really don’t think that if there are as many outliers as you’d expect to see it refutes the proposition that some pitchers may have control as well. The number of outliers you expect might simply randomly overlap with the number that actually do have some degree of control.

        I have read elsewhere, maybe Hardball Times, that groundballers may have more homers to flyball than average. Their flyballs allowed tend to be mistake pitches anyway. League is at an extreme. 18%, and consistently high (with small annual reliever samples you’d expect more variance, not less, same with Betancourt in the opposite direction). Webb is at 13.2% (even 11.9% on the road, which is only a tad high), Lowe is at 12.7%. But looking at the leaders in groundballs, it does seem completely random. So that scotches that theory (or at least I won’t bother looking into it more).

        Outliers again? Maybe. I don’t know. I’ll let someone else do the research. I assume most people here haven’t doublechecked the research on home runs to flyballs done by others and just assume that it is so. I lack the skill set. But from what I have read some pitchers may have some tiny control with babip, there may be some tiny amount of clutch hitting ability, the Angels do seem to have some ability to win more than pythag would suggest (we don’t know why, at least that’s what a FanGraphs article mentioned), and I am just not as convinced by the argument as far as hr/fb. And I am well aware League, Betancourt, Johan Santana could be outliers. Not sure how easy it is to prove that they in fact are. I doubt it, but I don’t have definitive proof.

        But like Alton Benes said, I don’t need the weatherman to tell me it’s raining, I just stick my head out the window. Kidding. OK, next for my collection of Potter Stewart quotes.

        Tell ya what. I will go toss batting practice to Josh Hamilton. I saw that HR derby at the break a couple of years ago. I bet my home runs to flyballs allowed would be pretty damn high. I imagine that what we are saying is major league pitchers give up 11%, not out of shape desk-jockeys.

        BTW, I am also not arguing that Haren is homer prone. Just that in general I have not been as convinced by the hr/fb being completely random (not talking about park effects).

        Burton Malkiel famously argued that stock price movements were random, that you can’t beat darts thrown at the WSJ quotes. Of course, Warren Buffett and a few quants are just fabulously wealthy outliers.

        Look at the top 10 for fewest homers allowed per fb this year. Santana, Liriano, Josh Johnson, Danks, Kershaw, Verlander are in there (with Vargas, Fister, some others). Look at the bottom 10. Hamels, Haren and Shields are the only 3 that approach the general rep of the top guys in the top 10 (actually, they evidently don’t, maybe due to the homer rates). Maybe the high home run rate is what suppresses the rep, but the rest of the bottom 10 is fairly putrid: Correia, Moyer, Bannister, Rodrigo Lopez, Nick Blackburn, Wolf, Millwood. I am not sure that’s all just bad luck, as their entire careers have been fairly mediocre, some stretches for Millwood and Wolf and Moyer, of course.

        Again, all incredibly unscientific. Apologies. I still can accept babip randomness more than hr/fb. I think some guys have an effect, can’t prove it and don’t think the coin flip has satisfied me as far as the outliers. I am a stubborn hidebound old man. Now get off my lawn.


      • wobatus says:

        BTW, what are the odds that, of all the seasons since 2002, there would be 5 seasons of guys with 18% or higher hr/fb (actually 4, but I am tossing in League’s career as a season), and they’d all have 50% or higher gb rates?


      • wobatus says:

        Wait, I forgot, all those seasons, because those guys are groundballers, are smaller samples of flyballs. Maybe that’s why they are outliers.
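The sample-size point can be made concrete with a small sketch. If every fly ball had the same chance of leaving the park, the spread of an observed HR/FB rate would shrink with the square root of the number of fly balls, so pitchers who allow fewer fly balls would land at the extremes more often. The 11% rate is the league figure mentioned earlier in the thread; the fly-ball counts are hypothetical.

```python
import math

def rate_sd(true_rate: float, fly_balls: int) -> float:
    """Standard deviation of an observed HR/FB rate over a given number of fly balls."""
    return math.sqrt(true_rate * (1.0 - true_rate) / fly_balls)

# Hypothetical fly-ball counts, from a reliever-sized sample up to a full starter's season.
for fly_balls in (60, 120, 240):
    sd = rate_sd(0.11, fly_balls)
    print(f"{fly_balls:>3} fly balls: 11% +/- {100 * sd:.1f} points (1 SD)")
```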


      • djw says:

        “I assume most people here haven’t doublechecked the research on home runs to flyballs done by others and just assume that it is so.”

        So are you actually saying you think the many different sabermetric studies that come to the conclusion that by and large HR/FB rate is not a repeatable skill are cooking the books, and no one has bothered to point this out? That’s a pretty strange and outrageous view, but I’m certainly not in a position to refute it.

        “I saw that HR derby at the break a couple of years ago. I bet my home runs to flyballs allowed would be pretty damn high. I imagine that what we are saying is major league pitchers give up 11%, not out of shape desk-jockeys.”

        Sure, but how is this relevant to the population of pitchers who might reasonably get playing time in the majors? I’d be a pretty big outlier in height if I were Laotian. But I’m not, so it’s irrelevant if we’re talking about Laotians.

        Your comments on this subject don’t actually contain anything beyond hand-waving, speculation and non sequiturs.


      • wobatus says:

        I am not saying that a bunch of people are cooking the books. I am saying that people are accepting it as a given. I am sure that the research says what it says, but I doubt, and like you and most folks here I am not in a position to say one way or the other, that there is some definitive answer, based on what we know, to the question of whether home runs to flyballs are completely random. In that same vein, a lot of people accepted UZR, I questioned it in regards specifically to Jason Bay and his season in Fenway last year, and lo and behold adjustments were made that confirmed my suspicions were correct, even though I didn’t have the data or the skills to confirm my suspicions on my own (btw, I think that is a credit to FanGraphs and the folks that made the adjustments and do have those skills).

        All i can do is point to areas that make me scratch my head and say “huh” and maybe others will have the skills or data (well, I suppose I could acquire them, but I’m a lazy SOB) to say one way or the other whether there is anything to what I am scratching my head about.

        And you are right, my post was a lot of non sequiturs, etc. I just jot down what pops in my head sometimes. I am bad at self-editing.

        I get your point that I am not someone who might reasonably get playing time in the major leagues. There’s a pretty big spread between my abilities and any major leaguer’s. But there’s also a pretty big spread between Ollie Perez’s current abilities and those of the people we might reasonably expect to pitch in the major leagues (absent a 36 million contract the Mets are trying to avoid having to completely eat). Ollie’s only giving up hr/fb at a 14% rate (with Citi Field as his home park). That’s not a huge jump, but it certainly seems conceivable to me that someone like that may have devolved to the status of batting practice (absent his customary wildness being a little less than ideal for batting practice), and perhaps homer prone.

        To concede that there’s any effect at all based on the quality of the pitcher (me, or Ollie Perez, versus Francisco Liriano, say) means we are just talking about degrees. People that were major league caliber can lose skill slowly (aging), or sometimes suddenly (injury, for example), and they sometimes are allowed to keep playing until their ineffectiveness is assured. The worst pitcher in baseball, over a given stretch, is going to have a pretty wide spread in his skill level from Roy Halladay or Cliff Lee. So while hr/fb may be mostly random, I do suspect that at a certain level there’s some pitcher causation, independent of his flyball tendencies.

        I still like the League example, although it could easily be random, I understand. Career of 18.3%, always in the teens, only 4 pitchers with seasons that high since the data started being collected, all of those seasons by guys with a well higher than average groundball rate. I know that therefore the samples are all small, making the highs (and lows) more likely in this group. Betancourt on the opposite extreme. I am sure someone can calculate what the odds are of League being so high, year after year, and Betancourt being so low, year after year, and I am sure it is within the realm of what we’d expect by randomness. But that may simply dovetail with the distribution we’d expect from there being some small (or maybe not so small) skill set involved, for some pitchers. But I’d think the odds of there being 5 seasons, or equivalents to a season (in League’s case, his career), being above 18% hr to fb over the last 8 years, and each of them being from heavy groundballers, being simply a random coincidence, is probably low. Maybe I will teach myself how to figure that out. Jeez, that sounds like a pain in the ass.

        It simply happens too often (there’s no such thing as clutch; wait, there’s some tiny effect. Pitchers can’t control babip; wait, there may be a tiny effect for some. The Angels outperforming their pythag expected record is random; wait, maybe it isn’t completely random. Jason Bay was awful last year in left field; wait, maybe he wasn’t) for me to just blithely accept that HR/FB is definitively random.

      • wobatus says:

        DJW, I thought some more about your response to my comment about what would be the hr/fb ratio if I were allowed to pitch (or the guy who tossed batting practice to Hamilton in the 2008 all-star game).

        As you say, I am not among the reasonable universe of major league starters.

        So I thought about guys who are at the tail end of their careers. They eventually wash out and retire. Or guys that simply never make the major leagues. Maybe Clayton Tanner does, maybe he doesn’t, but he gave up a lot of homers last year (not sure how many hr/fb).

        Perhaps in the universe of major league pitchers there’s almost pure randomness due to the very selectiveness of the group. If you are in that group, you are very unlikely to be any better than anyone else at suppressing your hr/fb rate. Perhaps the greater your distance from inclusion in that group, the more you would move away from a high degree of randomness. Randomosity?

        Anyway, if one looks at guys who retired after 2008, tiny samples, but you have Matt Morris at 16.7%, Eric Gagne at 20.4%, Tom Glavine at 18.4%. Now, the rest of their peripherals were sliding as well, although Glavine and Morris had had awful years with the peripherals before and been more successful.

        So, is it the case that guys like that are just running out of steam, and guys that lose the ability to suppress the hr/fb ratio are removed from the pool?

        Certainly you can’t tell from my cherry-picked examples (and there were other guys who have retired and their hr/fb rate stayed low, with performance suffering elsewhere).

        But, if these guys’ high HR/FB rates were just random flukes, it may be that teams were giving up on them too soon. Given the peripherals that likely is not the case: they sucked, were hurt, etc. Still, there will likely be examples of guys that teams give up on too quickly due to a random spike in HR/FB, if the randomness of hr/fb extends to the edge of the major league talent pool and beyond.

        Sorry for yet another inarticulate non sequitur, but it certainly is intriguing to think about the implications of randomness and whether there is a point at which, ability wise, things become less random.

      • djw says:

        It wouldn’t exactly blow my mind or shock me if it turned out that the very bottom of the talent pool had a HR/FB rate somewhat higher than the rest of the league. (I recall reading something in The Hardball Times or some such place that determined knuckleballers were different, too.) Still, I don’t see much value in pointing at outliers that confirm my suspicions, or speculating in general. You take your intuition a lot more seriously than I would take mine; it’s a question that can only really be determined by research.

      • wobatus says:

        I don’t take my intuitions or speculations seriously at all.

        The idea, though, is somehow that people are refusing to accept randomness. That’s not true of everyone who questions the extent of the randomness (even those of us who are not especially conversant in probability). I get that it is mostly random. The value of the non-random aspect, if any, may not be worth teasing out, and maybe it isn’t all that possible to establish. Maybe it’s hard to say definitively that League and Betancourt are not outliers. They fall within the expected distribution of guys who have strings of successive seasons on one end of the average or the other. Going by the coin flip example, I guess it could be 14 years in a row, and it wouldn’t prove anything.

        BTW, I never really questioned in any way the randomness of babip allowed, maybe because with 9 guys in the field I always thought that hit balls falling in is very largely related to luck and fielding skill.

        And while I accept that, I am very curious as to why batters have a measure of control and pitchers mostly don’t. Why would that be? I don’t know.

    • AJS says:

      @Wally,

      I’m failing to see the distinction you draw. Obviously we can’t know what any pitcher ever will do. But we have a pretty robust data set in terms of what baseball pitchers have done so far. And the fact that some individual pitchers, year after year, seem to put up higher than average HR/FB and other pitchers seem to put up lower than average HR/FB can’t simply mean nothing. Remember, we thought that pitchers had no control over BABIP. Now we’ve found they have some control. Why couldn’t the same apply to HR/FB?

      Yes, I can’t prove that it does apply. But if you can’t prove it doesn’t apply (or, since you claim you can’t prove negatives (which, by the way, you clearly can; I’m sure you’d agree scientists have proved the earth is not flat), if you can’t prove it hasn’t applied in the data set we have), why should we go with your position as the default position? Shouldn’t the default be neutral?

      • Wally says:

        AJS,

        “But we have a pretty robust data set in terms of what baseball pitchers have done so far. And the fact that some individual pitchers, year after year, seem to put up higher than average HR/FB and other pitchers seem to put up lower than average HR/FB can’t simply mean nothing.”

        Sure it can. We have a lot of pitchers in MLB, and some could randomly put up high or low numbers.

        “Remember, we thought that pitchers had no control over BABIP. Now we’ve found they have some control. Why couldn’t the same apply to HR/FB?”

        Just because that was true with BABIP doesn’t mean it is true of HR/FB. Remember, I’m not saying HR/FB is completely random, I’m saying we don’t know that it isn’t.

      • Alex says:

        “Remember, I’m not saying HR/FB is completely random, I’m saying we don’t know that it isn’t.”

        Maybe I missed something, but it sure seems like you’re arguing that it’s completely random when you’re telling me I’m wrong for believing that it isn’t.

      • Wally says:

        Alex,

        “Maybe I missed something, but it sure seems like you’re arguing that it’s completely random when you’re telling me I’m wrong for believing that it isn’t.”

        It’s the “believe” part that causes problems. You’re free to believe whatever you want, of course, but you have no proof that what you believe is actually true.

        So your belief isn’t exactly correct or incorrect, it is simply not supported by the current evidence. You may ultimately be proven right or wrong, but that’s not the point here.

      • Alex says:

        Your opinion isn’t supported by the current evidence either. It just hasn’t been excluded. My opinion hasn’t been excluded either. It’s like arguing with a damn brick wall.

      • Wally says:

        Alex,

        My “opinion” as you seem to want it to be, even though it is definitely not an opinion, is backed by all current evidence, because it IS all current evidence. In fact, that’s why it is not an opinion.

        Talk about a brick wall…

    • Wally says:

      Alex,

      “Its a very conservative estimate based on the fact that the chance of the BABIP data being random is 1 in 2200.”

      Not true. Because as Dave mentioned you had some 350 possible pitchers to hit that 1/2200.

      “If the chances of those HR/FB rates is below 1 in 455 feel free to prove me wrong.”

      The general practice of Alex:

      I make up something that sounds good to me, you prove me wrong.

      • Alex says:

        I thought you understood probability and statistics. The probability of having both BABIP and HR/FB rates at that level (assuming independence of course, which we should if we think this is all random) would therefore be the probability of the first multiplied by the probability of the second.

        1/2200 * 1/455 = 1/1,001,000

        Tell me where that math is wrong.

        As I said, I don’t have easy access to the data necessary to do the exact calculations, but do you really think the chances of Haren’s HR/FB rates being as high as they have over the last 6 months are less than 25% of the chance of his BABIP being that high due to randomness over the past 4 months?
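
The multiplication in this comment can be checked exactly. What follows is only a minimal sketch: it assumes, as the comment does, that the two events are independent, it takes the 1 in 2200 and 1 in 455 figures from the thread rather than recomputing them from data, and the function name `joint_probability` is made up for illustration.

```python
from fractions import Fraction


def joint_probability(p_first, p_second):
    """Probability that two independent events both occur."""
    return p_first * p_second


p_babip = Fraction(1, 2200)  # figure quoted from the article
p_hr_fb = Fraction(1, 455)   # figure assumed in the comment, not derived from data

p_both = joint_probability(p_babip, p_hr_fb)
print(p_both)  # 1/1001000
```

The arithmetic holds only under the independence assumption. If BABIP and HR/FB share a cause, the product understates the joint chance, which is the very point under dispute in the thread.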

      • Wally says:

        But if you have several hundred chances to observe something, then the random chance of observing an event that would be expected to happen 1 in 2200 times in an individual case is greater than 1 in 2200 for the entire population. Right? Like I flip a coin, and it’s 50:50 that it’s heads, right? But what if I flip the coin 10 times? What are the chances I get at least one that comes up heads? That’s the situation you are in.
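
The "at least once in many tries" idea can be put in numbers. This is a rough sketch that assumes the trials are independent and equally likely, which real pitchers are not; the 350 figure is the pitcher count mentioned earlier in the thread, and `prob_at_least_once` is a hypothetical helper name.

```python
def prob_at_least_once(p, n):
    """Chance that an event with per-trial probability p happens
    at least once in n independent trials."""
    return 1 - (1 - p) ** n


# Ten coin flips: chance of at least one head.
coin = prob_at_least_once(0.5, 10)

# A 1-in-2200 event with about 350 pitchers each getting one chance.
league = prob_at_least_once(1 / 2200, 350)

print(f"{coin:.4f}")    # 0.9990
print(f"{league:.3f}")  # about 0.147
```

Under those assumptions, an event that is a 1 in 2200 shot for any one pitcher turns up somewhere in the league roughly one time in seven.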

      • Alex says:

        I’m not sure I’m quite following your point.

        I didn’t say the actual chances of observing an event like that would be 1 in a million, I said that’s about what the chances are that Haren’s current run of BABIP and HR/FB were both due entirely to luck. I’m following the exact same reasoning Dave used when he quoted the 1 in 2200 number. He’s not saying that’s the chance of ever seeing that, he’s saying that’s the chance of it randomly happening in this particular case. I’m saying the chance of both happening independently due to luck in this specific case is 1 in a million.

      • BIP says:

        But you’re capitalizing on random variation after the fact.

      • Alex says:

        Not really. If I had just brought up some random guy maybe you’d have a point, but this just happens to be the case with the example used in this article. I’d say the chances of both of these things happening simultaneously and independently is so small that it points to them actually both being related in part to an underlying problem with Haren right now. Sure luck is probably exacerbating that underlying problem, but it seems a lot more reasonable to me than just saying he had this terrible of luck in two (supposedly) unrelated categories at the exact same time.

      • Wally says:

        Alex,

        Right, if it were 4 out of the last 6 months, you’d still count it. Same with 4 out of 5 over .345 BABIP. You have a huge unquantified fudge factor by doing this ex post facto probability calculation.

        “I said that’s about what the chances are that Haren’s current run of BABIP and HR/FB were both due entirely to luck. ”

        That’s not true. You’re assuming you only had one chance at observing this “random event,” but you actually have many, many more than that, due to the number of players in the league and the squishy thresholds and end points you defined after the fact.

      • BIP says:

        But the article is based on observation after the fact. If it were Cliff Lee with these HR/BABIP troubles, the article would be about Cliff Lee. If it were Ubaldo, it’d be about Ubaldo. Every pitcher in the league had a chance to put up these unusual numbers, and as far as I can tell, it just happened to be Haren who did.

      • Wally says:

        Alex,

        “Not really. If I had just brought up some random guy maybe you’d have a point, but this just happens to be the case with the example used in this article.”

        But the whole reason he’s in the article is because he’s this random guy this happened to. You’re just kicking the can down the street.

        “I’d say the chances of both of these things happening simultaneously and independently is so small that it points to them actually both being related in part to an underlying problem with Haren right now.”

        Prove it. Prove the actual chances. If you don’t know how to do that, learn.

      • Alex says:

        This really isn’t worth it. I just carried the calculation that Dave did in the original article to another level. The chances of Haren having this sort of run of bad luck in both categories without there being some sort of underlying connected reason (like balls being hit harder) is quite small. Now it could well be due to luck, as there have been millions of 6 month stretches over the history of baseball, but that just doesn’t seem like the most likely explanation when we have scouts saying his stuff is flatter and pitch f/x saying a bunch of his former sliders have turned into flatter cutters. I’ve never once claimed that to be a fact, I’ve just advanced it as an alternative hypothesis.

        Wally,

        I don’t know what the hell you mean by this…“Right, if it were 4 out of the last 6 months, you’d still count it. Same with 4 out of 5 over .345 BABIP. You have a huge unquantified fudge factor by doing this ex post facto probability calculation.” That’s why we use something like the binomial distribution that allows us to calculate the probability of x successes in n trials. What fudge factor is there in calculating the probability of 4 (x) months over 16.5% BABIP (successes) over a period of 6 (n) months? For someone that supposedly makes his living off of statistical methods, you seem to misunderstand a lot of the basics.

        To the rest of your posts, I’m not assuming I had only one chance of observing this random event. I’m calculating the probability of this particular event happening due entirely to luck. It’s no different than the probability calculation Dave did in the article. If I said the chances of anyone ever doing what Haren has done were 1 in a million you’d have a point. All I said was the chances of this particular event were 1 in a million or worse.

        As for this little gem “’I’d say the chances of both of these things happening simultaneously and independently is so small that it points to them actually both being related in part to an underlying problem with Haren right now.’

        Prove it. Prove the actual chances. If you don’t know how to do that, learn.” as I’ve told you, I don’t have easy access to the necessary data. I’m also not sure how you can “prove” how one possibility is more likely than another, especially when one possibility doesn’t have a probability distribution associated with it. I would point to what I’ve seen on pitch f/x and what scouts say, but that doesn’t count to you. However I’ve estimated the chances of both happening without some connected underlying reason at 1 in a million. Do you really believe that’s a more likely scenario than his slider/cutter being more flat?
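
For readers who want to see the binomial calculation being argued over, here is a minimal sketch. It is not the calculation from the article: the per-month probability `p_month` is a placeholder chosen only for illustration, and the model assumes every month is an independent trial with the same chance of clearing the threshold.

```python
from math import comb


def binom_tail(x, n, p):
    """P(at least x successes in n independent trials),
    each trial succeeding with probability p."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x, n + 1))


# Hypothetical: chance that a single month clears the chosen threshold by luck alone.
p_month = 0.10

# Chance of at least 4 such months out of 6 under that assumption.
print(binom_tail(4, 6, p_month))
```

The result swings heavily with `p_month`, and `p_month` itself depends on where the threshold is drawn, which is where the disagreement about "fudge factors" comes in.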

      • Alex says:

        Also, let’s put the number of possible 6 month pitching runs we see into perspective here. It’s not like we should expect to see a 1 in a million event take place relatively often just because there are a lot of pitchers with a lot of 6 month stretches. Let’s just assume 12 pitchers per team (we won’t even limit it to starters), so that’s 360 pitchers a year. There are 6 months per season, so that’s 6 possible 6 month stretches ending per year (previous May to current April, previous June to current May, etc.), so that gives us just 2,160 total 6 month stretches ending each season. So yeah, it shouldn’t be surprising to see a pitcher have the run of BABIP or HR/FB luck Haren has had this year every single year. However, seeing one pitcher with both runs of bad luck over the same stretch is a pretty rare event.
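
The counting argument in this comment is easy to reproduce. The sketch below uses the comment’s own assumptions (12 pitchers per team, which with 360 pitchers implies 30 teams, and 6 window end-points per season); `expected_hits` is a hypothetical helper name, and the probabilities are the ones quoted in the thread.

```python
pitchers_per_team = 12
teams = 30              # implied by the comment's 360 pitchers
months_per_season = 6   # one 6-month window ends in each month

windows_per_season = pitchers_per_team * teams * months_per_season


def expected_hits(p_event, n_windows):
    """Expected number of windows in which an event of probability p_event occurs."""
    return p_event * n_windows


print(windows_per_season)                              # 2160
print(expected_hits(1 / 2200, windows_per_season))     # close to 1 per season
print(expected_hits(1 / 1_000_000, windows_per_season))  # about 0.002 per season
```

An expected count does not require the windows to be independent, but overlapping windows for the same pitcher are strongly correlated, so the figures are best read as rough orders of magnitude.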

      • Wally says:

        Alex,

        “That’s why we use something like the binomial distribution that allows us to calculate the probability of x successes in n trials. What fudge factor is there in calculating the probability of 4 (x) months over 16.5% BABIP (successes) over a period of 6 (n) months? For someone that supposedly makes his living off of statistical methods, you seem to misunderstand a lot of the basics.”

        The problem is that you’ve created an arbitrary number of thresholds and time periods that aren’t necessarily predictive of anything, they just help you highlight this one random event. What if we changed the criteria only slightly? How many possible pitchers do we exclude and then include? Basically, does it matter that it was 4 out of 5 at .350? What if it’s 3 out of 5 at .360 and 2 out of 5 at .330? How about 2 at .370+ and 3 at .320? Etc. You’ve defined specific end points for this one particular case that don’t necessarily have ANY meaning. You could further define this event of Haren’s down to having impossibly small chances quite easily.

        His BABIP and HR/FB going from most recent to furthest away by month are
        .396/17.2
        .343/5.6
        .370/18.5
        .310/18.2
        .352/12.8
        .301/17.4

        Within those numbers is a combination that only Haren has ever achieved. Let’s say he needs 4 out of 6 months to be over .340, but 2 need to be the most recent months, and the other two need to be between .300 and .310. Then he needs 4 out of 6 to have HR/FB rates above 17%, and again 3 of them had to come in the last 4 months and one had to be below 6%, and the other had to be above 12.7%.

        I took this to an extreme, quite obviously, but it should illustrate why just looking at a purely random event is not instructive. One can draw up arbitrary end points and thresholds to tell them just about whatever they want. Particularly when you hand wave away data to the contrary. Like why his BB/9 rate isn’t affected if he’s losing control of certain pitches. Or why his LD% isn’t affected if he’s giving up more hard hit balls. Or why his K rate has increased even if his slider is now effectively a split, taking away a weapon.

        You’re cherry picking data all over the place to support your conclusion.
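
The threshold sensitivity described in this comment can be seen directly in the six monthly figures listed above. This is only an illustrative sketch: the cutoffs tried here are arbitrary choices, and `months_at_or_above` is a hypothetical helper name.

```python
# Monthly figures as listed in the comment, most recent first.
babip = [0.396, 0.343, 0.370, 0.310, 0.352, 0.301]
hr_fb = [17.2, 5.6, 18.5, 18.2, 12.8, 17.4]  # percent


def months_at_or_above(values, threshold):
    """Count how many monthly values meet or exceed the threshold."""
    return sum(v >= threshold for v in values)


for cutoff in (0.340, 0.345, 0.350, 0.360):
    print(f"BABIP >= {cutoff:.3f}: {months_at_or_above(babip, cutoff)} of 6")

for cutoff in (16.5, 17.0, 18.0):
    print(f"HR/FB >= {cutoff:.1f}%: {months_at_or_above(hr_fb, cutoff)} of 6")
```

Moving the BABIP cutoff from .340 to .345 drops the count from 4 months to 3, and moving the HR/FB cutoff from 17% to 18% drops it from 4 to 2, so the "x of n" fed into any binomial calculation depends on where the line is drawn.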

      • Alex says:

        How did I cherry pick data all over the place? I’m looking at the two pieces of data used in the article. I haven’t chosen anything myself. You’d be free to look at the probabilities of any number of x successes at whatever level you want to determine is successful in n tries and the binomial distribution will give you an idea of how rare such an event happening due to random luck should be (though you can’t jump around to different levels while doing so as you seem to suggest, though I assume you know that). I’m only doing what Dave initially did. I’m not saying it’s the best way to go about things or anything like that (as I’ve argued elsewhere, the binomial distribution isn’t necessarily the best model for this sort of stuff), I’m just saying that using Dave’s reasoning this is an incredibly bad run of luck for Haren. Like historically bad. Like shouldn’t have ever happened in baseball history bad. Unless of course, you know, there is some underlying correlation between Haren’s bad HR/FB rate and BABIP, which would make this way less unlikely.

      • Wally says:

        Alex,

        Dave used those thresholds to prove an entirely different point from yours. And yes, even those were cherry-picked for no good reason other than as an illustrative example of, generally, what NOT to do.

        You’ve taken them and run with them in exactly the opposite direction. Yet, they are still cherry-picked…second hand cherry-picked? Maybe so, but cherry-picked nonetheless.

        “Like historically bad. Like shouldn’t have ever happened in baseball history bad. ”

        See, this is you failing to get the point yet again. You can cherry pick just a few thresholds and windows and get something that sounds extraordinarily impossible, when something quite similar actually happens all the time. That’s why I brought up the point regarding running the numbers with similar combinations. Plus, you’re ignoring the huge amount of opportunities you have to find this event. Even 1 in a million isn’t a very big number here. We have ~100 years of baseball, and say 200 pitchers per year. That’s 20,000 pitching years. I haven’t even introduced the sliding 6 month window yet. Since 6 months is basically a season we have 120,000 pitching 6 month windows. Then we get into issues of the arbitrary cut offs. Like what’s the real difference between 5 of 6 at .350+ BABIP (which Haren didn’t even qualify for BTW) and 5 of 6 at .345? You could slide these scales around in any number of ways without making a real difference in terms of total performance over the 6 month span, but making huge differences in the number of players you include or exclude. This is your fundamental mistake when making claims like this being historically bad.

        “Haren’s bad HR/FB rate and BABIP, which would make this way less unlikely.”

        This is something that would be easy enough to test right? A correlation between BABIP vs. HR/FB. Just looking at 2009 though…not much there.
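
A correlation test of the kind described here could be sketched as follows. The league-wide 2009 data mentioned in the comment is not reproduced in this thread, so the example only runs on the six Haren months listed earlier, which is far too small a sample to settle anything; `pearson_r` is a hand-rolled helper written for this sketch.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Haren's six months from earlier in the thread (illustration only).
babip = [0.396, 0.343, 0.370, 0.310, 0.352, 0.301]
hr_fb = [17.2, 5.6, 18.5, 18.2, 12.8, 17.4]

r = pearson_r(babip, hr_fb)
print(f"r = {r:.2f}")
```

To test the claim properly, the same function would be fed one (BABIP, HR/FB) pair per qualifying pitcher-season rather than six months from a single pitcher.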

    • Rich says:

      “. It would be quite foolish to ignore the best tool for the job.”

      And it would be even more foolish to ignore the fact that “the best tool for the job” still isn’t a very good one. In terms of baseball stats, we’re working with sharpened rocks here, not a CNC.

  11. Jared says:

    I loved this article and I think it’s worth mentioning just how many factors can contribute randomness to a pitcher’s results. If BABIP is mainly a function of LD%, there is still the matter of HR/FB (which, if not totally random, is also partly a function of something the pitcher can’t control, like park conditions). And other factors that can further drive a pitcher’s results away from what his skill alone would dictate obviously include the performance of the batters they’ve faced on the given days they’ve faced them (which on its own fluctuates partly randomly) and the performance of the defense backing them on the given day (ditto to that being partly random).

    It’s very hard for people to grasp that a player’s results can undermine his performance over the course of an entire season (or multiple seasons), much less in a single game.

  12. xeifrank says:

    One of your better articles. Good job.
    vr, Xei

  13. dave says:

    This is quite a concept, and in my opinion goes well beyond baseball. Religion…

  14. Alex says:

    Alternatively, it’s wrong to assume it is in fact totally based on luck. Sure that could be the case, but I think a lot of the people advancing that possibility are simply getting tired of some saber inclined individuals who seem to claim that any such occurrence is due to luck.

    As for the probability you calculated for Haren posting the BABIP he has recently, I think it may be more informative to find the probability of that sort of run of BABIP and his extreme HR/FB rate as both seem to point towards the same problem. Personally, it just seems far more likely that he is in fact giving up more hard hit balls, than all of this just being luck based. That’s not to say luck isn’t playing some part, but rather that it’s not the only cause.

    One other problem I see at times is that people want things to be more black and white than they are. It seems that every single time we see someone with a high BABIP for a period of time that goes back to normal eventually, people want to chalk it up to luck and in many cases it is. However, you can’t ignore that sometimes it was simply a pitcher who had a mechanical flaw, causing him to leave the ball up and get hit harder, who eventually fixed the flaw and got back to his old self. It’d be nice if everything was always black and white, but with something like baseball (as opposed to coin tosses) it’s really not that simple.

    • Wally says:

      Alex,

      “Personally, it just seems far more likely that he is in fact giving up more hard hit balls, than all of this just being luck based.”

      If that’s the case, why do we not see his LD% going up as well?

      I agree it’s reasonable to think it might not all be random, and that we gain knowledge in all kinds of fields by trying to explain what might seem random at first, but it’s absolutely incorrect to assume it is not random in the absence of evidence that the player has some sort of control over these stats.

      It’s the difference between coming up with a hypothesis you want to test and actually having the evidence to prove the validity of your theory.

      • Alex says:

        Because LD aren’t the only hard hit balls? Maybe, he’s leaving the ball up more leading to more hard hit FB that are going for hits (perhaps because they’re flying over the OFs heads)?

        It is not incorrect to assume that it’s not all random in the absence of evidence that players have some control over these stats. Not being able to prove there is a correlation is not the same thing as proving there is no correlation. I’d advise you to take a look at Bill James’s article entitled “Underestimating the Fog.” Many of the newer people to sabermetrics seem to have glossed over this point. Just because we can’t prove something exists with the numbers doesn’t mean we’ve proved it doesn’t exist. These are very different concepts.

        Some things we’d like to prove we simply can’t because the data isn’t good enough. The people who really understand sabermetrics get this, it’s the newer converts that seem to overestimate the abilities of the models we’re using. There was a time when everyone thought every pitcher should regress to a .300 BABIP. Nowadays, the guy who came up with the theory initially has publicly stated he was wrong. Do you really not believe the same thing is possible with something like HR/FB?

      • Wally says:

        “Because LD aren’t the only hard hit balls? Maybe, he’s leaving the ball up more leading to more hard hit FB that are going for hits (perhaps because they’re flying over the OFs heads)?”

        So, he’s giving up hard hit balls that aren’t LDs? That’s your hypothesis? Seems rather unlikely that hard hit balls would show up everywhere except LDs.

        “Not being able to prove there is a correlation is not the same thing as proving there is no correlation.”

        Again, you’re arguing from ignorance and creating a false burden of proof. You’re making a claim that pitchers DO control HR/FB, you have to prove it. I simply do not believe you’ve done so adequately. Thus, I’m not comfortable rejecting the null-hypothesis.

        “Just because we can’t prove something exists with the numbers doesn’t mean we’ve proved it doesn’t exist. These are very different concepts.”

        Right, and I’m not saying it certainly does not exist. I’ve specifically said several times now that no one has proven it does exist. I suppose you’d also argue that there is a flying spaghetti monster because no one can prove it does not exist? This is essentially the argument you are making, Alex. It should be clear why science and logic do not operate by attempting to prove things do not exist.

      • Alex says:

        You’re missing the point. Look at the rate of 2B that Haren is giving up this year. It’s far higher than it has ever been previously in his career. It’s not just that a bunch of bloop singles are falling in against him and that’s what’s causing his high BABIP, he’s giving up more XBH, which backs up the idea he’s leaving balls up that are getting tattooed. And that data you’re looking at isn’t perfect. Remember the raw data has 2 different classifications of fliners that are getting pushed into LD and FB data. It’s entirely possible a higher percentage of his FB data is fliners, which would back up what I’m saying.

        “Again, you’re arguing from ignorance and creating a false burden of proof. You’re making a claim that pitchers DO control HR/FB, you have to prove it. I simply do not believe you’ve done so adequately. Thus, I’m not comfortable rejecting the null-hypothesis.”

        I’m not arguing from ignorance man, you’re just ignorant of how these things work. Like I keep saying, go read “Underestimating the Fog.” It’s possible that while I’m correct, I can’t mathematically prove it because of the volatility of the data. This problem is exacerbated by the fact that we don’t have a ton of data to start with anyway, as FanGraphs batted ball data only goes back 8 full seasons. There will be a much better chance of proving it once more data is readily available.

        “Right, and I’m not saying it certainly does not exist. I’ve specifically said several times now that no one has proven it does exist. I suppose you’d also argue that there is a flying spaghetti monster because no one can prove it does not exist? This is essentially the argument you are making, Alex. It should be clear why science and logic do not operate by attempting to prove things do not exist.”

        Yeah, you’re leaving yourself a little wiggle room, but you seem completely unwilling to entertain the idea seriously until it can be mathematically proven and that just isn’t how you should approach something like this. It is too hard to prove things mathematically to simply exclude everything that can’t be proven mathematically.

        And seriously, nice reductio ad absurdum comparing my argument to the flying spaghetti monster. There is data that points towards pitchers having some control over HR/FB rates, there is no data pointing to the FSM existing. Professional scouts certainly believe control over HR/FB exists based on their observations. Please tell me about the observations pointing to the existence of the FSM.

        And if you believe that science and logic don’t try and prove things don’t exist, you’re clearly much more ignorant than you realize. Much of scientific pursuit is based on trying to prove that theories are wrong. If you can prove a theory is wrong, then you have to move on to a different theory. No one has proven that evolution is right for example. Instead, people have failed time and time again to prove it wrong, therefore the preponderance of evidence points to the theory of evolution being correct, even though it cannot be proven.

        Do you not see how that’s similar to my argument for HR/FB? I can’t prove it correct, so you tell me I must be wrong because I don’t have proof. The theory of evolution hasn’t been proven correct, but you still believe in it don’t you, based on observations and experiments. Those aren’t the same thing as formal proof, which really only applies to the science of mathematics.

      • Wally says:

        Alex,

        I can hardly keep up with all these unfounded assumptions.

        “Look at the rate of 2B that Haren is giving up this year. Its far higher than it has ever been previously in his career. ”

        OK, is it significantly higher? Particularly once you adjust for a ~.050 increase in BABIP? Looking at 2B/H, the numbers seem pretty consistent around 21%. And his 2010 is lower than his 2008. I don’t see much here.

        “It’s entirely possible a higher percentage of his FB data is fliners, which would back up what I’m saying.”

        That would be interesting, why don’t you get that data?

        “Its possible that while I’m correct, I can’t mathematically prove it because of the volatility of the data.”

        Yes, it’s POSSIBLE, not proven. How many times do I have to say it while you keep insulting me and calling me ignorant of statistical methods, with which I probably have an order of magnitude greater knowledge and experience than you? This is my job. And you’re telling me I don’t know how things work because, well, you want to keep telling me pitchers have control over HR/FB without ANY data to prove it. Sorry, Alex, I can assure you, it is you who fails to understand.

        “Yeah, you’re leaving yourself a little wiggle room, but you seem completely unwilling to entertain the idea seriously until it can be mathematically proven and that just isn’t how you should approach something like this.”

        Yes it is. You can’t prove it. That’s all there is. You can come up with a few anecdotes, and I’m sure if I cared to look I could come up with a few counter-anecdotes. But all that wouldn’t prove anything one way or another. So right now, the only knowledge we have is that HR/FB is not known to be under the control of a pitcher. Strictly speaking, that’s all you can say.

        “Much of scientific pursuit is based on trying to prove that theories are wrong. ”

        This is different than proving something does exist and I suspect you know that and have become disingenuous. If I want to disprove F=ma, I would have to find a situation where that theory breaks down, and come up with another theory that fits all known data. I’m still proving a positive, not a negative. That is much different from coming up with evidence that the flying spaghetti monster does not exist. Also, logically, you’re making the claim, you prove it. You don’t make a claim and wait for someone to disprove you. Again, this should be obvious.

        “It is too hard to prove things mathematically to simply exclude everything that can’t be proven mathematically. ”

        This is also not a logical argument. Something being difficult to prove doesn’t make not proving it any more valid.

        “There is data that points towards pitchers having some control over HR/FB rates, there is no data pointing to the FSM existing. Professional scouts certainly believe control over HR/FB exists based on their observations. Please tell me about the observations pointing to the existence of the FSM.”

        Not all data is valuable in reaching a particular conclusion. If your data is absolutely meaningless, such as what scouts say without any sort of recording of data or statistical analysis, then you might as well have nothing. It’s the equivalent of saying I saw a piece of spaghetti fly by my window, so there is evidence for the FSM. Even in the best case, your data is simply (weak) suggestive evidence, not proof, of your conclusion. That has been my point all along, and you seem to wish to obscure it with all this talk about how hard it is to actually prove it. But it doesn’t matter if it’s hard or impossible; if you can’t prove it, you still can’t prove it.

        “Instead, people have failed time and time again to prove it wrong, therefore the preponderance of evidence points to the theory of evolution being correct, even though it cannot be proven.”

        No, you don’t prove that something isn’t wrong. This is a double negative. You’re creating evidence that fits the current theory. It’s still proving a positive, not proving a negative. You’re trying to play games with words without knowing what is actually done. Proving a negative is something like “pitchers don’t control HR/FB” or “there is no FSM.” This is nearly impossible in science because you will always run into the problem that you don’t know what you don’t know. If you’re in science, you’d know this. The vast, VAST majority of papers show positive results, because negative results could mean literally anything. Just because I don’t find a correlation between HR/FB and all the different things I think a pitcher can control, there is still the possibility that I don’t know of the right predictor variable. Which is essentially your point. If we had better data, like hitF/X, maybe we would be able to prove pitchers have more control over these different types of balls hit. But the absence of data to prove something doesn’t mean that thing is not true.

        “I can’t prove it correct, so you tell me I must be wrong because I don’t have proof. ”

        I never said you must be wrong. Don’t create strawmen.

        “Do you not see how that’s similar to my argument for HR/FB?….The theory of evolution hasn’t been proven correct, but you still believe in it don’t you, based on observations and experiments.”

        Did you just try to say that you have experiments or observations that show HR/FB is under the pitcher’s control? I’d be interested to see those.

      • AJS says:

        @Wally — I’ll post this again down here in case you didn’t see it above.

        “And you’re telling me I don’t know how things work, because well, you want to keep telling me pitchers have control over HR/FB without ANY data to prove it.”

        But has anyone proven pitchers don’t have any control over HR/FB? Or, more to the point, that individual pitchers can’t control HR/FB? Why should Alex have to prove the affirmative when you can’t provide proof for the negative? Why is your position considered the default position?

      • Alex says:

        You’re clearly so much smarter than I am. I guess I’ll just have to take solace in my mathematics degree from MIT.

        Clearly we’re arguing past each other instead of with each other. It’s not worth the time from either of us to continue, as we’ll keep going around in a circle. If it makes you feel good to think you won this argument, go ahead. Like I’ve said, the data as it currently exists isn’t good enough to prove my point. I hope once we get to the point where this is proven, you’ll realize how ridiculous you looked making this counter-argument based solely on the lack of mathematical proof.

        I haven’t been trying to argue that you’re ignorant of statistical methods. I’ve been trying to argue that you’re applying them incorrectly if you’re basically unwilling to seriously consider anything that can’t be proven statistically.

        Oh and the reason I don’t use the BIS fliner data is I don’t have access to it. My understanding is you have to pay them to get access.

      • Bob says:

        Quick R squared analysis on the 82 pitchers who show up on the career leaders board. When running it on the HR/FB and GB%, the R2 is 0.16. When you adjust GB% by the IFFB% (GB% * (1-IFFB%)), the R2 is 0.17.

        Maybe when a batter lifts a sinker baller, the contact’s better. Or stronger batters are more likely to be able to lift (or try to lift) a sinker ball and thus carries the ball further.

        Or all of this is BS.

      • Wally says:

        Bob, is the slope significantly different from zero? With a R^2 so low, I’d like to know that.
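
Wally's question can be answered roughly from the numbers Bob already gave. The sketch below assumes a simple one-predictor least-squares fit across the 82 pitchers with the usual regression assumptions; the function name is made up for illustration. In that setting the t statistic for "slope = 0" depends only on R² and the sample size.

```python
import math

def slope_t_statistic(r_squared: float, n: int) -> float:
    """|t| for H0: slope = 0 in a one-predictor least-squares fit.

    Uses the identity t^2 = r^2 * (n - 2) / (1 - r^2).
    """
    df = n - 2
    return math.sqrt(r_squared * df / (1.0 - r_squared))

n_pitchers = 82  # size of the career leaderboard sample Bob used
for r2 in (0.16, 0.17):  # the two R^2 values Bob reported
    t = slope_t_statistic(r2, n_pitchers)
    print(f"R^2 = {r2:.2f}: |t| = {t:.2f} on {n_pitchers - 2} degrees of freedom")
```

Both values come out near 4, well past the two-sided 5% cutoff of about 1.99 for 80 degrees of freedom, so under these assumptions the slope would differ from zero. That says nothing about why the relationship exists, and an R² of 0.16 still leaves most of the variation in HR/FB unexplained.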

        Also, we know curveballs are easier to hit out given the same type of contact because they will leave the bat with backspin, giving the ball some lift. Maybe, we could look at CB% and HR/FB? Just a guess.

        Alex, I’m not applying stats incorrectly. If you can’t show me some sort of predictor variable for HR/FB, it is very unlikely that HR/FB is something pitchers can control. And regardless of what you might find, the sheer noise in the stat leads me to believe any short-term change in HR/FB is likely just randomness and not a true talent change.

        “you’re basically unwilling to seriously consider anything that can’t be proven statistically.”

        You’re creating a strawman if you think my entire argument against your hypothesis centers around the statistical studies for HR/FB. I’ve also brought up several raw stats that conflict with your anecdotes, which you summarily handwaved away. For example, it’s rather foolish to think Haren is suffering from more hard hit balls, yet they don’t show up in LD% because ALL those hard hit balls are getting classified as fliners and thus FBs on FanGraphs. You pointed out, for example, that they could be fly balls that go over fielders’ heads. Yet his 2B/H ratio is unchanged. It is even lower than in several other years. So whatever is causing his BABIP increase, it seems to be not one specific type of hit that is increasing, but all of them. So, I’ve not simply said “you can’t statistically prove it,” I’ve also attacked the anecdotes which you believe suggest your hypothesis is reasonable. Your response to most of them has been to handwave them away with generally unlikely reasoning.

    • scottj27 says:

      OK, so if Dan Haren’s had a run of worse pitching… then why are batters striking out more frequently against him?

      • Alex says:

        Seriously? Because there is more to pitching than striking out batters. Oh, and the fact that his BABIP is so high actually helps raise his K/9. Since fewer of the balls in play off of him have gone for outs, it allows him the opportunity for more of his outs to come on strikeouts. Also, did you ever consider the possibility that he changed his pitching style in order to get more strikeouts and the offshoot has been that he leaves more hittable pitches up and in the zone, leading to more hits on balls in play and more HR?

        If you think it’s really as cut and dried as “he’s striking out more guys, so he’s clearly pitching as well,” then you’re completely missing the gray area.

      • Nick Steiner says:

        BABIP, however, is usually very correlated with strikeouts. Guys who have good stuff and good control will usually induce weaker contact on balls in play.

        http://www.baseballprospectus.com/article.php?articleid=10281

      • Alex says:

        I don’t have a BP subscription, so I can’t read the meat of the article. The part I can see, though, points to examples of guys with low K rates and low BABIPs. General rules don’t prove things for specific cases either.

    • jkrell1212 says:

      i feel like this article presents a false choice between ‘accepting randomness’ and asserting haren’s skills have deteriorated/hitters have adjusted to what he’s doing. why can’t it be a combination of the two?

      or am i completely missing it?

  15. Joe says:

    To say balls in play is the equivalent to a coin flip, well, just doesn’t sit right with me.

    • xeifrank says:

      They are both examples of a binomial distribution.

      • Alex says:

        No, one is certainly a binomial distribution, while the other mimics it to a large degree. I think we should be able to agree that new ball in play data isn’t completely independent of past ball in play data, when we’re talking about the same pitcher.
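
To make the binomial comparison concrete, here is a minimal sketch using only the Python standard library. It assumes every trial is independent with the same success probability, which is exactly the assumption Alex is questioning for balls in play; the function name and the ball-in-play figures are illustrative, not taken from Haren's record.

```python
from math import comb

def binomial_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) when X ~ Binomial(n, p): at least k successes in n trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# The coin-toss streak from the article: 14 wins in 14 fair flips.
streak = binomial_tail(14, 14, 0.5)
print(f"14 straight coin-toss wins: 1 in {1 / streak:,.0f}")

# Hypothetical ball-in-play example: 37 or more hits on 100 balls in play
# for a pitcher whose true hit rate on balls in play is assumed to be .300.
print(f"37+ hits in 100 BIP at .300: {binomial_tail(100, 37, 0.30):.3f}")
```

If successive balls in play are correlated, the true tail probability will differ from what this model gives, so treat the second figure as a baseline rather than an answer.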

  16. highlander59 says:

    I made a very similar point in a post about Joba last night. The post is in the FG forum and is entitled “Ease Up on Joba”.

    The evolution of human intelligence involved the ability to quickly discern patterns in data. Most animals learn to come in out of the rain; only humans have the pattern-recognition “software” to note the connection between gathering storm clouds and rain. Rapid pattern recognition allowed our primitive ancestors to simplify a complex world and spend less time analyzing and re-analyzing potentially life-threatening events.

    However, the adaptive value of using pattern-recognition abilities to stay alive has an asymmetric element. Recognizing the connection between clouds, rain, and lightning has the potential to save lives. Overgeneralizing (Og gets bit by a snake; Og dies; all snakes are poisonous) is likely to have some negative effects but at a lower amplitude than the positive effects of rapid pattern recognition.

    This has led to a human race wired to see patterns where there are none; i.e., to interpret clusters within random data as meaningful. Hence, the belief that all cancer clusters must have some nefarious environmental cause.

    We are masters at recognizing patterns and we are masters at constructing “just so” explanations to explain the existence of the pattern.

    Our creation of gods and our explanation for the “poor” pitching of Joba both arise from the runaway pattern recognition ability and the explanatory prowess that was so adaptive to our cave-dwelling ancestors.

  17. Joe Morgan says:

    Me neither. The NFC has shown remarkable consistency in winning coin tosses.

  18. Dave Evans says:

    If Brady hit 50 while listening to Achtung Baby, imagine how many he’d hit listening to The Joshua Tree.

  19. TIMMY says:

    If it is all random, why keep track of the stats?

    Coming from an engineering background, I cannot merely accept that it is randomness until I have exhausted all other possibilities. This “process” is out of control statistically. The reason for that is most certainly not “randomness”.

    • Alex says:

      And this really gets to the heart of the problem. The people who want to chalk all of these things up as randomness are the ones without a math or engineering background. They’ve suddenly found all these cool things and they don’t completely understand what they’re working with, but it just seems right and it’s fun to seem more knowledgeable, so they run with it. Recently I’ve likened it to the downfall of Wall Street. You had a bunch of untrained people using advanced mathematical models they really didn’t understand, and they just placed way too much faith in them.

      • Nick Steiner says:

        Alex, you are quite wrong. Nearly everyone involved in sabermetric research would tell you that randomness is incredibly prevalent in baseball. That doesn’t mean that they are correct, but to say that the only people who accept randomness do not have a math background is laughable.

        The funny thing is that Dave used math in his argument to show that it’s very possible that Haren’s high BABIPs are completely caused by randomness! You, on the other hand, have not backed up any of your points.

      • Alex says:

        I didn’t say that only the people who accept randomness do not have a math background. That’s you putting words in my mouth so you can attack me. I said that “The people who want to chalk ALL of these things up as randomness are the ones without a math or engineering background.” Do you see the difference there? Dave, for example, doesn’t chalk this all up as randomness; he’s arguing that it’s a reasonable explanation, at least for some of it.

        Yeah, Dave did use math in his argument to show that Haren’s high BABIPs could be completely caused by randomness (and remember, my comment wasn’t even directed at him). He completely left out the high HR/FB rates though. The chance of both of those things happening simultaneously and independently due entirely to luck is well beyond 1 in a million (as the chance of the HR/FB rates being due to luck is well beyond 1 in 1,000). He likely left that out because it makes his point tougher to make. It’s easy to say, well hey, it was only a 1 in 2,200 chance, so it’s not that unlikely over all these stretches. It’s much harder to make that point when you’re dealing with something in the 1 in millions neighborhood. I think that does back up my points.
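
The arithmetic behind that "1 in millions" figure is just a product, and it only holds if the two events really are independent. A quick sketch using the two odds quoted in this comment (the variable names are illustrative, and 1 in 1,000 is treated as a stand-in since Alex only says the HR/FB odds are "well beyond" that):

```python
from fractions import Fraction

p_babip = Fraction(1, 2200)  # odds quoted for the high-BABIP stretch
p_hr_fb = Fraction(1, 1000)  # stand-in for the HR/FB odds (assumed value)

# Multiplying is valid only when the two events are independent.
p_joint = p_babip * p_hr_fb
print(p_joint)  # 1/2200000
```

If high BABIP and high HR/FB share a common cause, whether that is a real change in the pitcher or the same run of bad luck on hard contact, the joint probability is larger than this product.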

      • TIMMY says:

        There is inevitably “randomness” in everything. But I have to agree with a poster above that we call things random because we do not understand the variables. A pitcher can make the perfect pitch and still have it hit out of the park. That would be considered “unlucky”, and unfortunately “random” by some.

        To me it is very lazy to just determine that any process has been affected by randomness and leave it at that. Yes the odds of certain things were outlined above, but that is the chance of something happening based on known variables. If, for example, we found out that Haren was tipping his fastball the variables would have changed… odds would not have. One could go back to where Haren started tipping the pitch and determine if that is where the process went out of control (this is all hypothetical… I have no knowledge of pitches being tipped).

        If the author of the post came out with the pitch data showing that velocity, movement, pitch selection, pitch location were all statistically similar as Haren’s previous seasons… then I would be much more inclined to feel that Mr. Haren is a victim of circumstance.

      • Wally says:

        Oh yes, this is just like THE DOWNFALL OF WALL STREET!

        Give us a break Alex.

      • Wally says:

        Alex,

        “I didn’t say that only the people who accept randomness do not have a math background. That’s you putting words in my mouth so you can attack me. I said that “The people who want to chalk ALL of these things up as randomness are the ones without a math or engineering background.””

        And who exactly is claiming it is certainly ALL random? And I’m sure you know everything about the background of ALL those people….More BS from our resident Math major from MIT…

      • Alex says:

        LOL…you must live a very fulfilling life. Have fun continuing the attempts at mocking me. “OMG he believes pitchers have a measure of control over HR/FB without statistically significant proof!!!” The funny thing is I guarantee you Dave agrees that pitchers have some control over it. That’s why he says it’s better to look at all the metrics instead of focusing solely on xFIP.

      • Wally says:

        Ha, Alex, you’ve been attempting to mock me all day. I guess we should say the same about you? No…

        And regardless of my mocking tone, you dodged the question. On what grounds do you base your claim that “The people who want to chalk ALL of these things up as randomness are the ones without a math or engineering background”?

        I’d love to hear it.

      • Alex says:

        My intention was never to mock you, but rather to try and point out the flaws of your way of thinking. As I’ve said, I was where you are now a few years ago, but since I’ve come to realize there is more to this game than what can be statistically proven. Also, I quit posting hours ago, but you decided you just had to dig up other posts I made and continue with attempts to mock. To use a term you’ve thrown around a lot with me, that seems rather juvenile.

        I didn’t claim to have any sort of special information with regards to that opinion. I didn’t realize when I posted to agree with the opinion of another poster I had to type IMO before every sentence. My point was, you shouldn’t simply chalk things up to randomness just because you don’t have a statistically proven explanation. That’s never how I was taught to approach a situation like this. Just because you don’t completely understand how something works, that doesn’t make the results completely random. IMO (there does that make you happier) the people who tend to do this are those without math or engineering backgrounds since they don’t seem to understand the difference between failing to find proof that something exists and proving that something doesn’t exist.

        If you want to continue to mock me for whatever reason, have at it. I’m mature enough not to care what some faceless person has to say to my first name over the internet.

      • Wally says:

        Alex, you haven’t just “mocked” me, you’ve thrown bald-faced insults at me. And you continue to be condescending with this attitude that you were once like me but have somehow seen the light, while you continue to create this strawman that I only deal with the strictest of statistical evidence (as well as this most recent concept that those who disagree with you don’t have math/engineering backgrounds….btw I was a physics and bio major, and am now finishing a PhD involving a lot of bioengineering….). I do not. I recognize that the strict stat evidence is the most powerful tool, yet leave the door open to possibilities that are not yet known, while weighing the evidence you’ve presented for one of those possibilities. However, that evidence is extremely weak. It relies on cherry-picked data, ignoring all data presented to the contrary for very unlikely reasons (such as the belief that LD% is hiding hard hit balls because they are ALL fliners…yeah, right).

        “Just because you don’t completely understand how something works, that doesn’t make the results completely random.”

        It does, however, make them random according to current knowledge. And it is illogical to attribute nonrandomness when very, very weak evidence, at best, suggests it might not be random. It’s further illogical to insult someone who points that out.

        “IMO (there does that make you happier) the people who tend to do this are those without math or engineering backgrounds since they don’t seem to understand the difference between failing to find proof that something exists and proving that something doesn’t exist.”

        And what is the evidence for this “opinion?” Or are you just being judgemental and elitist based on nothing? Yeah….this is a really, really stupid opinion.

        “I’m mature enough not to care what some faceless person has to say to my first name over the internet.”

        Ah, yes, and you’re also mature enough to throw insults at a faceless first name….You’re such a big man, Alex….Please….

      • Alex says:

        Go back and read what’s been written. I don’t think there is any point where I came out and mocked you. You continually pointed out how I was arguing from ignorance and how I was acting juvenile. I simply tried to put my opinion into perspective. I tried to point out how I used to be where you are, but I’ve since changed my views. That wasn’t meant to put you down in any way; I only wanted to give you an idea of how I got to where I am. And yeah, at one point maybe I went too far in talking about how you live in a fantasy world where stats can be trusted to explain everything, but that’s about all I can see, and even then I think I have a point. Stats can’t explain everything as much as we’d like them to. If you really believe that we shouldn’t believe in anything until it has been statistically proven, I think you’re living in a fantasy world.

        As for the “evidence” for my opinion, I wasn’t aware that we were required to present evidence for every opinion we have. Based on what I’ve seen on these things around the internet, the people without math and engineering backgrounds are more willing to accept randomness or luck as an answer, as opposed to looking deeper for an alternate explanation. It’s an opinion, maybe it’s wrong, but it happens to be what I think.

      • Wally says:

        You never intended to insult or mock me, Alex? Come on… You asked me to go back and read what was written; well, you’re not going to like the results:

        “You’re clearly the sort of person that likes to use these stats so you can play yourself up as being smarter than the casual fan.”

        “You’re clearly my intellectual superior because you can say you don’t believe in HR/FB being controllable without hard proof…Clearly you’re opinion is far more reasonable than mine.”

        “Then again I assume you don’t ever look at those since they use assumptions that have been proven statistically.”

        “Have fun living in your utopia where statistics perfectly capture everything in baseball. I’ll be living in the real world.”

        “I thought you understood probability and statistics.”

        “I’m not arguing from ignorance man, you’re just ignorant of how these things work.”

        “You’re clearly so much smarter than I am. I guess I’ll just have to take solace in my mathematics degree from MIT.”

        “The people who want to chalk all of these things up as randomness are the ones without a math or engineering background.”

        All of these are some kind of ad hominem or appeals to ridicule. I didn’t actually count, but I’d guess this is nearly 1/2 the posts you’ve addressed to me, or in one case, talking generally about “people” that have said something similar to what I’m saying.

        If you haven’t meant to mock/insult me, well then you’re not very good at understanding what you are actually saying.

        Also, pointing out that something you’re advocating is juvenile is not the same as an insult or ad hominem in general. You diverged off into this tirade about how I’ll look foolish when you’re eventually proven right. That isn’t the point. You may well be right, I recognize that. But there is little to no evidence to prove you’re actually correct right now. That’s the point and why your statements are juvenile. It’s immature or juvenile to not recognize this fundamental difference between what my position actually is and what you’re trying to make it out to be, not to mention the repeated insults and attempts to mock me.

        “If you really believe that we shouldn’t believe in anything until it has been statistically proven, I think you’re living in a fantasy world.”

        See, it’s this whole idea you’re advocating that is immature. For one, I’m not relying solely on the evidence from statistical studies; I also find your anecdotes to be very weak. Your evidence is just awful, Alex, and it is likely for this very reason that you feel you have to defend yourself with these kinds of statements where I’m “living in a fantasy world.” If you had decent enough evidence, you’d just give your case, recognize it’s not absolute proof but a decent start, and leave it at that. But instead you resort to this ridicule.

        “I wasn’t aware that we were required to present evidence for every opinion we have. ”

        Again, a juvenile attitude. Of course you don’t HAVE to have great evidence for any particular opinion. You can say what you want, but understand that depending on the strength of your evidence people will be more or less inclined to “believe” your case. In this instance, your case is weak, I called you on that, and more or less you’ve been ridiculing me for only relying on the strongest stat evidence. In truth, I’ve been doing that in part, but I also believe your anecdotes are practically worthless, have shot holes in them, and given a few specific counters. So, we’ve spent the better part of 24 hours talking about this strawman, constructed as an appeal to ridicule, that you’ve been perpetuating and trying to justify.

        “Based on what I’ve seen on these things around the internet, the people without math and engineering backgrounds are more willing to accept randomness or luck as an answer, as opposed to looking deeper for an alternate explanation.”

        Alex, this may be just your opinion, but it’s absurd. Do you have any idea what the math/stat/engineer/science background is for every poster that said stuff like this? My guess is you’re making assumptions in a HUGE majority of the cases. Like you said, we’re generally faceless first names, if that much, who you also likely only have a handful of posts’ worth of contact with. I’ve told you my background, and you’ve told me yours, but we don’t exactly have each other’s transcripts here, and how many people even give you that much? I’m guessing it’s an impossibly small fraction to attempt to create this opinion from. Making such statements, and especially sticking by them when people call you out for them, just makes you look like a fool trying to bang his hands on the desk claiming he’s smarter than everyone else when people don’t agree with him. Just stop. It’s JUVENILE.

  20. Avery says:

    Great post Dave. Randomness can indeed be very difficult to accept, but it is certainly ever present in our world.

    WNYC Radiolab did a fabulous podcast on the subject, including a segment on whether or not athletes behave like flipped coins (i.e. randomly). If anyone is looking for a more thorough exploration of the topic, or just a great and entertaining hour of listening, check it out:

    Stochasticity: http://blogs.wnyc.org/radiolab/2009/06/15/stochasticity/

    and a follow-up specifically about the notion of athletes as flipped coins,

    Are we coins?: http://blogs.wnyc.org/radiolab/2009/06/29/are-we-coins/

  21. John Clay says:

    One Tree Hill indeed.

  22. String Theory says:

    There is no such thing as random.

  23. mettle says:

    The “until you remember just how many different five month stretches of pitching there are in Major League Baseball” essentially refers to the stat’l method of correcting for multiple comparisons or something similar. It’s a great point, but I wish FanGraphs writers were equivalently responsible as Haren-knockers are now required to be on that point. I can’t remember the last time I read an FG article that corrects for multiple comparisons in that way. Consider, for instance, the article right after this, “The 2004 Phillies and Fly Balls”… completely silly in the same way.
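
The multiple-comparisons point can be sketched in a few lines. This assumes each stretch is an independent trial with the same per-stretch probability, which overlapping five-month windows are not, so it is only a rough illustration. The 1 in 2,200 figure is the one quoted earlier in this thread; the stretch counts and the function name are hypothetical.

```python
def prob_at_least_one(p_single: float, n_trials: int) -> float:
    """Chance that an event with per-trial probability p_single
    happens at least once in n_trials independent trials."""
    return 1.0 - (1.0 - p_single) ** n_trials

p_stretch = 1 / 2200  # per-stretch odds quoted in the thread

# Hypothetical counts of pitcher-stretches to check against.
for n in (100, 1000, 2200, 10000):
    print(f"{n:>6} stretches: {prob_at_least_one(p_stretch, n):.3f}")
```

With as many stretches as the odds denominator, the chance of seeing at least one such run is already about 63%, which is why a "1 in 2,200" event somewhere in the league is unremarkable. The same correction applies to any article that scans many players or seasons for an outlier.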

  24. baseball says:

    Check this out:

    His BABIP Last Five Seasons:

    Year Before After
    2010 .339 .370*
    2009 .233 .315
    2008 .256 .375
    2007 .234 .357

    Can we just came him Mr. Random and Mr. Unexplainable Phenomena?

    • Nathaniel Dawson says:

      Before and after what? I don’t understand your post.

    • Nathaniel Dawson says:

      Loved this article, Dave. There were some comments that made reference to scientific theories that address such an issue. Chaos Theory, quantum uncertainty, synchronicity, whatever. Any and all of those are so complicated and esoteric that trying to apply them to baseball would probably lead in directions away from anything that could give a better understanding of what is going on. There’s no way we could possibly know all the millions of minute, discrete influences such things might have on events during a baseball game, so we treat the unknowable as “randomness”, because at this level, it adequately encapsulates those influences in a way we can deal with.

      Randomness happens, it creates unexpected results, and a realization and acceptance of that helps our understanding of the game.

  25. baseball says:

    call*

  26. baseball says:

    I wouldn’t at all be surprised to see him post one of his usual, unsustainably low BABIPs (like he does in separate halves every year, pretty much) for the next two months of the season, to even out whatever caused the .350+ BABIP.

    *shrugs*

  27. dbroncos31 says:

    Dan Haren K%
    2008: 23.4
    2009: 24.5
    2010: 23.2

    So while he may have a better K/9, he is not striking out more batters and in fact is striking out fewer than he ever has in the National League. I’m not saying that his BABIP will stay this high going forward, I’m just correcting anyone saying that he’s striking out more batters and is therefore pitching “better.” His numbers have declined across the board this year: lower K%, higher BB%, and of course the higher HR%. This is why people are worried. He’s having worse results while also pitching worse in the three categories he has some control over. If I was an AL team I’d be a little worried about giving up top talent for him.

    Either way the Angels still got a steal and he’ll likely be a good pitcher for them but I’m not surprised that Dave’s sources in the game aren’t as high on him as some of the comments here.

    • S says:

      There isn’t any kind of noticeable difference there, certainly not one that would make us believe he has declined whatsoever. His xFIP is still 3.31, or top 5 in the majors.

      Only time will tell about his BABIP, but I wouldn’t bet any money on it continuing for much longer. It’s more likely than not just random. He also has had these spikes in his BABIP (first half/second half) for a while now.

      • dbroncos31 says:

        Yeah, I’m not saying he’s not a good pitcher, this was more in response to someone saying earlier that Haren’s stuff hasn’t declined because he’s striking out more batters.

  28. Jake says:

    The main point here, to me at least, is this: I can accept that randomness exists, but that does not mean that Haren’s BABIP rates are random, nor does it mean that his second-half swoons are random. I certainly can’t prove that they’re not, but you certainly can’t prove that they are. So it basically just goes to belief, and if some scouts choose to believe that they’re not random, that doesn’t necessarily make them ignorant.

    • Jake says:

      Note, that was meant more to apply to his second-half swoons rather than his BABIP, although I guess they’re somewhat interrelated.

      • Matthias says:

        The definition of randomness doesn’t imply any causation, like you’re saying. Randomness is defined as unpredictability. So Haren’s BABIP may be high for some reason, but without knowing that reason, it is simply random to us as observers. We cannot predict which years it will be high or low, because we do not know if there exists a measurable cause for extreme BABIPs. When a BABIP appears high, regardless of why that might be, we can only predict that it will fall back to normal levels because that is what almost always happens with high BABIPs.

      • Alex says:

        According to your definition it would seem as long as there is an explanation for them, they aren’t random. Thus, while they may appear random to us as observers, that does not in fact necessarily make them truly random. We just don’t have the ability to find the cause currently. If we do in fact figure it out in the future, the act of figuring it out isn’t what made them not random. They were never random in the first place; they just seemed that way because we lacked the understanding.

      • Matthias says:

        High BABIPs are still random occurrences to us because we don’t know the cause, or if there is a cause at all. That doesn’t mean we can’t someday pin the cause on some bad-pitching aspect, but for now it’s unpredictable. The only thing that seems to be predictable, based on thousands of pitchers’ data, is that BABIPs well above .300 tend to fall. Perhaps Haren is doing something wrong, or perhaps he’s just unlucky, but either way we can expect his BABIP to come down, and with it his ERA.

    • baseball says:

      Then explain why every year (from 2006-2009) he had an unsustainably low BABIP in the first half of the season, but followed it up with a crazy, unsustainably high one afterwards? Giving us, ultimately…a league average BABIP? Couldn’t we easily just point towards that in comparing his (unsustainably high) BABIP through 100 games this season? That he experienced all of this before? Over and over again? And that it proved to be nothing but random?

      He’s known for these weird splits. Would it surprise anyone to see him post a .230 BABIP during these next two months, evening out the .350 BABIP? No. Would you expect it happening? Of course not, but hey…strange sh*t happens.

      • Alex says:

        Because he starts to fatigue later in the season, as he’s personally discussed?

      • baseball says:

        Except there is nothing to suggest that. Nothing in his velocity. Pitch usage. Batted ball data. K/9. BB/9. You name it. Nothing changes in the way he pitches…nothing noticeable, anyway. That just may have been his explanation in an attempt to find one that wasn’t there.

        Honestly, he has had these stretches before. 80 consecutive games or 15 starts of a .350+ BABIP is nothing new for Haren. Neither is a .230 BABIP.

        All proved to be nothing. Random. Strange, bizarre, yes…but random.

      • joe says:

        Ugh…this is the problem with people solely reliant on existing statistics to measure things.

        Fatigue doesn’t have to impact velocity or pitch selection… fatigue can easily impact the consistency of a pitcher’s delivery, which in turn might just have some impact on control and command? Similarly fatigue may not impact OVERALL velocity but could impact some pitches.

        Would fatigue that causes issue with command possibly impact BABIP?

        When you say nothing noticeable.. you seem to be reliant on aggregate statistics – how many times have you watched him pitch in these stretches?

      • baseball says:

        I don’t deny the possibility. I’m talking about things that we can notice/see through data and stats. That’s all I meant there.

        I haven’t watched him pitch. I’m sure most here haven’t, either. For all we know, what you could be saying is true. Then again, it may not. Who knows?

        But the point is…he had an unsustainably low BABIP in the first half of the season from 2006-2009. He then followed that up with an unsustainably high one right after in the second half. What’s the explanation there? Why does it occur exactly at the same time? Like clockwork. If we are to assume it isn’t random, explain how he’s able to post an unsustainably low BABIP in the first half, then. Explain that. Why is he able to post an unsustainably low BABIP in the first half? Why does the opposite happen right after (like clockwork) in the second half?

        I think it’s random. That’s just the logical explanation here. Just one of those bizarro, freak occurrences that can’t be explained…but happens.

      • Jake says:

        You seem to be arguing the Monte Carlo fallacy almost exactly, unless I’m misreading you. If his BABIP is unsustainably low in the first half, you should just expect his BABIP going forward to be at about .300, not become unsustainably high so as to perfectly even out his overall numbers.

        Again, I’m not saying that it’s not random, my point was that someone (here, the scouts who view him more negatively than the author, and thus allowed the Angels to get him for what seems like well below fair value) might think that it was fatigue (or some similar variable). Esp. if it follows a pattern (I don’t think anyone has ever argued before that the fact that something follows a pattern proves that something is random).
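
The Monte Carlo fallacy point can be checked with a small simulation. This is only a sketch under stated assumptions: a made-up pitcher whose true BABIP is exactly .300, 250 balls in play per half, each ball in play independent, and a .260 cutoff for a "low" first half. None of these numbers come from Haren's actual record.

```python
import random

def simulate_half(true_babip: float, balls_in_play: int, rng: random.Random) -> float:
    """BABIP for one half-season where every ball in play is an independent coin flip."""
    hits = sum(rng.random() < true_babip for _ in range(balls_in_play))
    return hits / balls_in_play

def second_half_after_low_first(true_babip=0.300, balls_in_play=250,
                                seasons=4000, cutoff=0.260, seed=0) -> float:
    """Average second-half BABIP, counting only seasons whose first half fell below cutoff."""
    rng = random.Random(seed)
    second_halves = []
    for _ in range(seasons):
        first = simulate_half(true_babip, balls_in_play, rng)
        second = simulate_half(true_babip, balls_in_play, rng)
        if first < cutoff:
            second_halves.append(second)
    return sum(second_halves) / len(second_halves)

print(round(second_half_after_low_first(), 3))
```

Under pure randomness the second halves that follow a low first half average out near the true rate, not near an offsetting high number. A pitcher whose second halves repeatedly swing far above it would be showing something this model does not produce on average.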

      • Jake says:

        Okay, upon further review, I’ve decided that I did misread you to an extent. So apologies for that.

      • joe says:

        The problem I have is how people just associate “who knows” with randomness. I don’t know why when you put 2 hydrogen atoms together you get an immense release of energy, but I’m not going to chalk it up to randomness. Just because conventional SABER stats can’t find an explanation, doesn’t logically prove randomness… it just proves there is no explanation with current SABER stats.

        You state that there is nothing to support it being fatigue related and rely on some fairly thin first order measurements. With all due respect, that is either lazy or just shows a lack of knowledge of pitching.

        Is he running more hitter’s counts as he gets fatigued? Is it an issue with a specific pitch as opposed to velocity? Is it maybe he tires in a game earlier than when he’s ‘rested’ (for lack of a better word) in the beginning of the season?

        Randomness is not the logical explanation, it just happens to be your explanation (and is as good as any explanation). The problem is your “data” and “stats” are limited and you are looking at aggregated #’s and not looking at enough detailed stats and/or context. How is the break on pitches, % of each pitch type thrown as strikes, counts he runs, first pitch strike %, etc.?

        Again just because there is not an obvious explanation doesn’t mean it’s simply randomness…I prefer to keep an open mind and am willing to consider that while randomness is certainly a component of the variance, it may not be the only thing in play here. Unlike Dave’s characterization and generalization of those that disagree with him, there is a middle ground between pure randomness and no randomness.

      • baseball says:

        Well, my point was…Haren has experienced this before. More than once. Hello, 2006-2009. All you have to do is look at some 80+ game stretches from 2006-2009 to see what I’m talking about. You’ll get 80 consecutive games/15+ starts of a really high BABIP. You’ll also get 80 consecutive games/15+ starts of a really low BABIP.

        A .350 BABIP this year through a little more than half a year?… been there, done that. Check 2006-2009. That was my point.

        Haren has experienced this before. Nothing we’re seeing now is any different than, say…what he experienced from 2006-2009. You see what I’m saying? That’s all I’m trying to say here.

        And if some want to believe it’s anything but random, then I ask you to explain what happened from 2006-2009. Explain those 80 game stretches where he had a .230-.250 BABIP for 15+ consecutive starts. Since the focus is on his BABIP this season…at the moment….let’s go a little back and explain some similar/comparable incidents that he experienced while we’re at it. It only makes sense.

        I don’t claim to know the answers. Nobody knows what the answer is. We can just speculate. I’m willing to look at both sides of this thing. But if you’re going to focus on ONE year, this one…then you have to look at all his other comparable occurrences throughout the years. How is any of this different than what happened from 2006-2009? When he went through half a year+ (every year, literally) with a really high BABIP? It isn’t. That’s my point. His career BABIP is .305. It always “corrected” itself and he ended up with a league average BABIP. Always. For all we know, he’ll put up a .230 BABIP during these next two months, finishing with a league average one.

        Anyway. I’m inclined to believe it’s just random (a bizarre, freaky random occurrence).

      • joe says:

        Two things:

        - There’s a significant difference between the concept of regression and randomness. Just because someone’s #’s are likely to regress doesn’t mean the results were random. It is possible to think that Haren’s #’s are not random (or only partially random) and still think they will regress in the future.

        - Statistics have improved but they don’t measure everything and many of them lack significant contextual measurements. When one says the stats are the same, so it must be randomness, the underlying assumption is that the statistics will catch non-random changes. That might be true 50%, 75% or 90% of the time, but I have a hard time believing it catches everything. I’m also not certain that all of the statistics outside BABIP are indeed the same (but then again I wasn’t the person claiming NOTHING has changed and the logical explanation is randomness).

        So why was his first half bad this year when over the past 3-4 years it’s been a 2nd half effect or randomness thing?
        - Maybe it is just variation
        - Maybe as he’s gotten older his body type has evolved and he’s having more difficulty repeating mechanics this year (again this may not be an effect on every pitch, it may mean more frequently missing desired location leading to a greater percentage of easier to hit pitches)
        - Maybe he wasn’t in the same shape coming into this year as he has been in the past (if you put any credence into fatigue playing into his recent 2nd half #’s)
        - Maybe he has been pitching in more predictable patterns and the scouting on him has improved
        - Maybe he’s pitching with a minor injury

        I’m inclined to believe it’s a mix of factors (though not necessarily limited to the ones above) and not pure randomness… the fact that I don’t know the explanation doesn’t mean the logical explanation is randomness. I think your last sentence summed things up – you’re inclined to believe. In the end there is no one logical explanation, it comes down to opinion and belief – the statistics just aren’t good enough to prove causation (doesn’t mean they are bad or are useless, it just means they have limits).

      • Wally says:

        Joe

        “Is he running more hitter’s counts as he gets fatigued? Is it an issue with a specific pitch as opposed to velocity? Is it maybe he tires in a game earlier than when he’s ‘rested’ (for lack of a better word) in the beginning of the season?”

        It seems rather illogical to think that running more hitter’s counts, etc., wouldn’t show up in other stats, however. For example, running into more hitter’s counts has got to hurt your K and BB rate.

        However, look at his GB/FB ratio and LD%. Here’s what they do:

        April: 1.16/18
        May: 1.10/18.5
        June: 1.06/19.6
        July: 1.46/22.1
        Aug: 1.36/21.9
        Sept: 1.28/20.1

        Both GB and LDs go for more hits than FB, so this could help explain the BABIP difference 1st half to second. However, this is just kicking the can down the road. What causes this? Is this trend typical?

  29. Matthias says:

    Dave: I love the point you’re making about looking at the bigger picture. Every year somebody is doing something crazy, whether it’s Raul Ibanez’s first half of 2009 in Philly, Brady Anderson’s 1996 season, or Doug Fister’s 0.240 BABIP at one point earlier this season. To just look at one player performing unusually well (or poorly) is disregarding the wealth of players who are performing within a standard deviation of expected.

    Here’s a nice analogy: Say you’re a blade of grass on the fairway, and Tiger is teeing off. What are the chances you’re the lucky one who gets to support Tiger’s golf ball? Likely pretty low, say 1 in 5,000,000. But what are the chances that Tiger hits SOME blade of grass on that fairway? Any one of them? Probably closer to 4,500,000 in 5,000,000. We’re not surprised when he hits the fairway, so why should we be surprised that it’s any one blade of grass over another?

    Someone is going to have an unsustainably high BABIP this season, and that is guaranteed by randomness. Thank you for your enlightenment and acceptance of randomness as a powerful force in this world :-)

    • xeifrank says:

      Right. Same thing with pythag win/loss records. Just based on randomness a few teams are going to either out- or underperform their pythag records by 2+ standard deviations. People will try to put some causation to it, but that causation likely is just random noise.
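
As a rough check on how many teams that might be, here is a small Python sketch. It assumes the gap between actual and pythag record is normally distributed and independent across 30 teams, which is a simplification and not something established in this thread.

```python
import math

def two_sided_tail(z: float) -> float:
    """P(|Z| > z) for a standard normal Z."""
    return math.erfc(z / math.sqrt(2))

teams = 30                    # MLB teams
p = two_sided_tail(2.0)       # chance one team lands 2+ SD from its pythag record
expected_teams = teams * p    # expected count of such teams per season

print(f"P(|Z| > 2) = {p:.4f}; expected teams per season = {expected_teams:.2f}")
```

Under these assumptions the expected count is somewhere between one and two teams a season from chance alone, so seeing one or two such teams each year needs no special explanation.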

  30. baseball says:

    Is there any doubt that it’s random when you look at his history? Go check out his first half BABIP from 2006-2009. Then check out his second half BABIP. Complete opposites. One being unsustainably low, the other unsustainably high. All evened out to league average in the end.

    He had long stretches before where he allowed a .350+ BABIP through 80+ games. Then he had long stretches where he had a .230 BABIP. It never proved to be anything. It just proved to be random. A bizarre occurrence, no doubt…but random.

  31. joe says:

    Can you explain how Haren has been over .350 in 4 of the last 5 months? (I’m going off FanGraphs split data.)

    Jul – .398
    Jun – .343
    May – .370
    Mar/Apr – .310
    Sept/Oct ’09 – .352

    As usual Dave is playing fast and loose with the #’s…. let me guess, .343 is close enough to .350? (and doesn’t change the overall point… so he’ll chalk it up as a ‘minor’ mistake and ‘not want to get into it’). What are the odds of a true talent .305 BABIP pitcher having 3 out of 5 months above .350… still 1 in 2200? Or maybe a .305 BABIP pitcher being above .340 in 4 out of 5 months?

    The disconcerting problem is he probably didn’t need to tweak the #’s to get the point across…but it wouldn’t sound as impressive?

    It’s the same as the sweeping generalization that all who disagree with him must refuse to accept randomness… perhaps there is a good portion of people who think there is a randomness component as well as possibly controllable elements?
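
The "what are the odds" questions above can be roughed out with a binomial model. This is a sketch, not the article's calculation: it assumes 100 balls in play per month (a made-up round number), treats every ball in play and every month as independent, and counts a month at or above the threshold as "above". It also ignores that the five-month window was picked after the fact.

```python
from math import comb

def binom_tail(n: int, p: float, k_min: int) -> float:
    """P(X >= k_min) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))

TRUE_BABIP = 0.305     # true-talent rate quoted in the comment
BIP_PER_MONTH = 100    # assumption: balls in play per month
MONTHS = 5

def month_above(threshold_thousandths: int) -> float:
    """Chance a single month's BABIP is at or above the threshold (350 means .350)."""
    min_hits = -(-threshold_thousandths * BIP_PER_MONTH // 1000)  # integer ceiling
    return binom_tail(BIP_PER_MONTH, TRUE_BABIP, min_hits)

cases = [("3 of 5 months at .350+", 350, 3),
         ("4 of 5 months at .350+", 350, 4),
         ("4 of 5 months at .340+", 340, 4)]
for label, threshold, k in cases:
    p = binom_tail(MONTHS, month_above(threshold), k)
    print(f"{label}: about 1 in {1 / p:.0f}")
```

The answers move a lot with the balls-in-play assumption and with the threshold, which is the commenter's point: .343 versus .350 changes the stated odds even if it does not change the overall argument.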

    • Ray says:

      A single .007-point difference is fast and loose now? Dave didn’t even say .350+, he said “abnormally high”. The later example did not explicitly refer to Haren.

  32. theinternet says:

    Great article.

  33. Carlos says:

    This is a fantastic piece, really, kudos.

  34. Gamblor says:

    Can’t it be both? If there is no skill in keeping fly balls in the park you could pull a guy out of a beer league baseball roster and get him to face big league hitters. His HR/FB rate would be the same if there is no skill to it. I just don’t buy that.

    I think that just like BABIP, some guys are going to have slightly different HR/FB rates. Ichiro’s BABIP is .357 over 6500 ABs. That is way above league average. I do think there is a ton of randomness involved there. He was at .333 in 2003 and .399 in 2004. Did he miraculously get better that year? I’m much more inclined to believe that was due to luck. I don’t claim to know the exact breakdown of luck vs skill – and it’s probably heavily skewed in favour of luck. But just like BABIP I bet there is some skill to HR/FB too.

    The stat I’d really like to see is batted ball speed. I bet that would shed some light on things.

  35. CircleChange11 says:

    Accepting the “luck” or “randomness” variable in BABIP and HR/FB seems, to me, to be two different things.

    As a former college pitcher, and lifelong baseball guy (player, coach, fan), randomness in BABIP matches what I observe. You get the batter to hit a groundball, and whether it’s a hit or not is largely dependent on what direction the grounder is hit and where the fielders are positioned (and also the quality of fielder). We’ve all observed this. The difference between a grounder for a hit and a grounder for an out is just millimeters of contact on the bat.

    But luck or randomness on HRs is something completely different IMO. We’re talking about balls hit anywhere from 330 feet to 450+ feet. There’s not a whole lot of luck on a ball crushed like that, and not many “good pitches” get blasted (very few batters hit a low and away change/slider/etc for a bomb … or deep). Most HRs are hit on pitches that are over the plate, thigh high. A pitcher has control over that. I know when I’ve given up a homer, I never felt unlucky, because I saw where the pitch was located. There have been plenty of times where I felt more responsible for the HR than I felt the batter was (as I did a helluva job hitting the sweet spot of the bat, or “laying it in there”).

    In order to give up a lot of homers, you have to contribute. The quality of hitters is high enough that if you’re consistently missing in the zone, hitters will tattoo it. It’s not like many hitters get fooled and drive a ball 380 feet. You only need to observe BP to see what most hitters can do to a pitch right over the plate. It’s not a great mystery.

    IMO, Haren is likely being inconsistent. He has great stuff and is always around the plate (High K, low BB), but he is also likely missing enough over the plate (instead of off of it) that when he does, a decent % of the time, hitters are driving the ball … and driving it well enough to leave the park.

    The aspect I would like to see reflected in pitching stats is a “quality of opposition” stat. It has been my experience as a player/coach that the biggest difference in results for a pitcher is “who he is pitching against” (lineup).

    I cannot imagine that it would be difficult to look at the HRs Haren has given up over the last calendar year and see [1] the location of the pitch, [2] the pitch type, [3] movement of the pitch, and [4] quality of hitter.

    Even when you watch prominent HR hitters like Howard or Pujols, you don’t often see them hitting a pitcher’s pitch 400 feet … but you do see them capitalize on almost every “mistake”.

    HRs are most often a combination of [1] pitcher mistake and [2] hitter skill/ability. I think the contribution of the pitcher is minimized far too much in the stat of HRA. Hitters just don’t accidentally or randomly crush a pitch to the warning track or better.

    • Wally says:

      The thing I find wrong with your nonrandom HR/FB case is basically that those 330-400 ft shots require inputs that are not particularly well controlled by the pitcher. For instance if you hit a ball with the force and angle to go 330 feet, you need to hit it in a very specific direction to get out of the park. And even then it’s likely only a HR in a fraction of MLB parks. If the wind is blowing in, forget it. But if the wind is blowing out, and that 330 shot becomes a 350 shot, the number of directions and parks it would be a HR in increases. Maybe it’s particularly hot that day, so the hot air rising off the park helps lift the ball out. So, the only kind of flyballs that are always HRs are at least of the 400+ ft kind. I don’t have the numbers, but I’d guess those make up a fairly small fraction of HRs, and that most are in the 330-400 range, which requires factors that aren’t well controlled. Like hit direction. A change of just a few degrees left or right can make huge differences in the distance a ball has to travel to be a HR, but has anyone ever shown pitchers have much control of that? I don’t think so, but I don’t know that anyone has checked either.

      • CircleChange11 says:

        I’m gonna say “bullshit” on A LOT of those factors significantly affecting distance. How much wind blowing in (or out) is required to push a ball an extra 20 feet?

        Also, most hitters’ power is pull, so they hit the balls the farthest to the shorter part of the ballpark.

        It’s not like power hitters are these spray guys that just hit the ball in the air and hope for the right weather conditions.

        If baseballs were to react as extremely due to weather conditions, as you seem to think, then every day and in every city (outside of SD, that is) would play like Candlestick.

        Pitchers cannot “control” anything other than where the pitch is going and how hard they throw it. But, there are many other things they can “influence”, such as contact quality, probability of groundball, etc … they can do this by the pitch type (movement) and location of the pitch. The things they cannot influence are [1] park factors, and [2] hitter quality. As we see repeatedly, pitchers even influence “contact” by the quality of their stuff.

        Control and influence are two different things. I think we should drop the idea of control since it so rarely actually comes into play in MLB. Everything is “shades of influence”.

      • CircleChange11 says:

        Here’s what I mean by “Pitcher Influence” on OBA, HRs, K’s, etc …

        [1] We know that different hitting zones have different “BA’s”, and even each hitter has their own “hot” and “cold” zones.

        [2] We know that certain zones in the hitting area are more likely than others to result in HRs. I’d be willing to bet that if we pitch charted the location for all the HRs hit in MLB, there would be a very strong correlation, or a concentrated zone, of the hitting area for HRs.

        So, pitchers, by where they throw the ball, can influence how well it is hit.

        But, there still is an amount of “chance” or “randomness” when dealing with probabilities. For example, if we know that Pujols hits .458 on balls in the center of the plate, and that 29 of his 42 HRs came on balls in the center of the zone, we could still get him out sometimes by throwing “cockshots” … but there is also greater probability that he’ll be successful, than if we pitched him in a “different zone”. That’s what I mean by influence.

        Pitching can be difficult because you likely need to make 2-3 quality pitches to get a strikeout. Heck, you could make 3 quality pitches to a batter (1 fouled off), and then give up a HR on a pitch that misses its location, or a hanging slider, etc.

        This is where someone like Haren could also have high K’s, low walks, and STILL give up a lot of homers. When he misses, it could be in the center far too often. Even if he doesn’t miss that much (in terms of %), those misses, if they are in a certain zone, could be deadly even at a rate of 33-50%.

        If a pitcher gives up 20 HRs on balls that are low and away, we could say that he’s experienced bad luck. But if we’re looking at a pitcher that is giving up a lot of homers due to too many pitches in the center of the zone, then we could say that the pitcher is pitching “as expected” because we know pitches in the center have a high probability of being hit a long way. The “chance” would also depend on batter quality, even though most major leaguers can tee off on a pitch in the middle of the zone. Just as pulling in the infield raises a batter’s chance of a hit, so does throwing a pitch down the middle. Pitcher influence.

      • CircleChange11 says:

        In the opening line I mention OBA (Opponents Batting Average). In order to keep from confusing this with On Base Average, I should have called it BAA (Batting Average Against).

      • Wally says:

        “How much wind blowing in (or out) is required to push a ball an extra 20 feet?”

        Not very much actually. Say a ball is in the air for 5 seconds, which isn’t that atypical. A 10 MPH wind, which is a pretty good breeze, travels about 14.7 ft/second. So, and this is a bit simplistic, the air the ball is traveling through could move a total of 73 ft in 5 seconds. Now, the wind isn’t going to be blowing at a constant speed relative to the ball for its whole journey (it would typically blow hardest at the ball’s highest point, for example). And we have issues with vector addition, meaning the ball is unlikely to be hit in the exact direction the wind is blowing. This of course can also be compounded if the wind moving around the stadium acts to lift the ball as well. But you can tell we’re on the right order of magnitude here. A decent breeze can put something like 20 ft on or off a ball hit in the air. As a somewhat simple experiment, throw a ball straight up in the air on a windy day and see how far it moves.
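
The back-of-the-envelope arithmetic here is just a unit conversion, sketched below in Python. The function names are made up for illustration, and the result is how far the air itself moves during the flight, which is an upper bound and not the ball's actual drift.

```python
FEET_PER_MILE = 5280
SECONDS_PER_HOUR = 3600

def mph_to_fps(mph: float) -> float:
    """Convert miles per hour to feet per second."""
    return mph * FEET_PER_MILE / SECONDS_PER_HOUR

def air_displacement_ft(wind_mph: float, hang_time_s: float) -> float:
    """Distance the air mass travels while the ball is in flight (upper bound on drift)."""
    return mph_to_fps(wind_mph) * hang_time_s

# 10 MPH breeze, 5 seconds of hang time, as in the comment.
print(f"{mph_to_fps(10):.1f} ft/s, {air_displacement_ft(10, 5):.0f} ft")
# prints "14.7 ft/s, 73 ft"
```

A ball never picks up the full wind speed, so the real push is some fraction of that 73 ft. The claim of roughly 20 ft is at least the right order of magnitude.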

        “Also, most hitters’ power is pull, so they hit the balls the farthest to the shorter part of the ballpark.”

        Eh, sorta. A lot of guys hit bombs to center of pull as well. Which is often in the 360-380ft range to the fence. Go take a look at some HR charts for any particular hitter. Then look at different years. You’ll see a great deal of variation. Sometimes they get lucky and hit several “just enoughs” out in the short part of the parks, and sometimes they don’t.

        Anyway, it’s now clear to me from reading your posts that you’re quite ignorant of a great many things, particularly how to stay away from building strawmen, as your little Candlestick example shows. So I’m done here.

      • CircleChange11 says:

        I pitched college and semi-pro ball in Chicagoland. I’m very familiar with the effects of wind on balls in the air (blowing out, blowing in, sideways, etc), as well as high humidity.

        As I said, over a long season the amount of “wind blown” HRs a pitcher gives up is going to be a small percentage.

        The overall point that I was making is that any ball hit for 350 feet (or more) will most often be on a “hittable pitch” (in the center or toward the center of the zone).

        Also, the balls that will be most affected by wind will be balls that are higher in the air. To hit a ball both HIGH and FAR, it must be struck VERY well. You don’t often do that on a pitcher’s “pitch”.

        I love looking at spray charts. I think if you had a “leaguewide spray chart” and a “leaguewide pitch location” for HRs, one would find:

        [1] Most HRs are pull.
        [2] Most HRs are off pitches near the center of the zone.

        I fail to see what is complicated about this. It is both likely supported by data (at least the data I have seen), and common sense (what pitch — type and location — can all batters hit well? – relative)

      • Wally says:

        Change,

        Wind blowing in/out/up/down/left/right, does not even out just because you say it does. A starting pitcher only throws in something like 28-35 days a year, and often only one day or none in a variety of away parks. That’s not a very big sampling of weather conditions, again, particularly for away parks.

        Take a look at Dan Haren’s (since he’s been the topic of discussion here) starts and the weather conditions.

        6 MPH oR
        10 MPH oR
        1 MPH iL LA
        7 MPH RtL
        11 MPH iR COL
        10 MPH oC CHC
        14 MPH oL HOU
        19 MPH LtR
        10 MPH RtL ATL
        19 MPH RtL
        9 MPH oC COL
        6 MPH oL LAD
        0 MPH
        8 MPH iR
        12 MPH iC BOS
        0 MPH
        9 MPH LtR STL
        0 MPH
        0 MPH
        9 MPH LtR SD
        0 MPH
        6 MPH oR LAA

        Now my little code here: oR means out to right, LtR means left to right, iL means in from left, etc. I also listed the park whenever it was anything other than AZ. You can tell when the AZ park started closing the dome, but for the most part those other numbers are all over the place. I find it terribly hard to believe all those kinds of combinations “even out” over just 30-some starts. Maybe it evens out over a few years, but not over one year. In fact, just looking at in vs. out, Haren has pitched in seven games with the wind blowing out and four with the wind blowing in. Also, the average speed of the wind blowing out was about 9 MPH, while blowing in it was 8. That may not sound like much, but it does help skew an already skewed distribution.

        All this talk about pitches and whatnot is rather beside the point. We’ve already assumed the ball was struck hard; now it’s about how far it will go due to things the pitcher can’t control. Like wind, or park shape, the direction within just a few degrees, things like that.
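
        For anyone who wants to check that in-vs-out tally, here is a minimal Python sketch. The readings are copied from the list above; the `starts` list, the `summarize` helper, and the choice to record calm (0 MPH) starts with an empty direction code are assumptions made for illustration, not anything from the original data source.

```python
# Wind readings copied from the list of Haren's starts above: (speed in MPH, direction code).
# Codes follow the comment's scheme: o* = blowing out, i* = blowing in, LtR/RtL = crosswind.
# Starts listed as "0 MPH" carry no direction, so they get an empty code (an assumption).
starts = [
    (6, "oR"), (10, "oR"), (1, "iL"), (7, "RtL"), (11, "iR"), (10, "oC"),
    (14, "oL"), (19, "LtR"), (10, "RtL"), (19, "RtL"), (9, "oC"), (6, "oL"),
    (0, ""), (8, "iR"), (12, "iC"), (0, ""), (9, "LtR"), (0, ""), (0, ""),
    (9, "LtR"), (0, ""), (6, "oR"),
]


def summarize(prefix):
    """Return (number of starts, mean wind speed) for codes beginning with `prefix`."""
    speeds = [mph for mph, code in starts if code.startswith(prefix)]
    return len(speeds), sum(speeds) / len(speeds)


out_games, out_avg = summarize("o")
in_games, in_avg = summarize("i")

print(f"out: {out_games} games, avg {out_avg:.1f} MPH")  # out: 7 games, avg 8.7 MPH
print(f"in: {in_games} games, avg {in_avg:.1f} MPH")     # in: 4 games, avg 8.0 MPH
```

        The counts match the seven and four quoted above, and the averages round to the 9 and 8 MPH figures.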

        And no shit most HRs are to pull. That’s where both the park is short and bats are fast….

      • CircleChange11 says:

        I don’t think wind is a big factor, I was stating just the opposite.

        I think the biggest factors in “balls crushed” are [1] pitch location, and [2] quality of hitter.

        All I have been saying in this thread and others is:

        [1] Pitchers influence BABIP and HR rate by where they throw the ball. How much influence would take some math that I cannot do. We have data on teams and players and their BA on BIP from the “9 different zones” of the strike zone. Same thing with HRs. By pitching in those BA and HR zones, you’re lowering the probability of a BIP being an out.

        [2] Quality of opposition should be figured into pitcher value.

        That’s it. I think you’re arguing against some things that I am not stating.

        My big point is that you cannot “pitch to the center” of the zone and then when more than 3 out of 10 balls on average go for hits, state “bad luck” on those balls. One should expect a BA and wOBA of far greater than the average of all the zones.

    • randomness is for tools says:

      Ground balls can be hit just as hard as home runs. Someone like Luis Castillo isn’t going to start hitting home runs just by hitting the ball on the right part of the bat. Miguel Cabrera can hit screaming ground balls almost right at someone and still make it through the infield.

    • randomness is for tools says:

      Where are the metrics for how hard a ball is hit? Ground balls or singles can be hit just as hard as fly balls or home runs. Someone like Luis Castillo isn’t going to start hitting home runs just by hitting the ball on the right part of the bat. Someone can hit a screaming ground ball almost right at a defender, and it will make it through the infield. An average hit ground ball at that same spot would be an easy out. But a weakly hit ground ball in that same spot might be an infield hit if the runner has enough speed to beat it out. All this stuff wouldn’t be so random if you had data for how hard a ball is hit and data for how fast someone runs to 1st base. IMO, sabermetrics is stuck in a fog and some of you guys think everything you can’t explain is randomness.

      • CircleChange11 says:

        This is a point that I often try to convey to the fan that did not play baseball at a high level (college or greater).

        MLB baseball features 6’3 200 pound pitchers throwing 90+ mph to 6’3 200 pound hitters swinging a bat with bat speed anywhere from 70-100 mph.

        When they hit a hard ball with a hard bat, the result is a high velocity baseball. A routine 3-hop grounder to MLB fielders would be the equivalent of me hitting fungoes at the average joe with a velocity they could not handle. MLB fielders are simply amazing. They handle 80mph grounders as if they were a cat playing with a mouse (on average). I often say that MLB fielders are their own worst enemies because they make it look so easy, routine even.

        Now, the closer the ball is to the center of the zone, the better/harder the hitter will strike it, due to barrel location, leverage, etc. Harder-hit balls have higher velocity. Anything moving at a higher velocity is harder to field and travels further. It’s why MLB SS can start off their “creep” or “prep steps” while standing in “short LF”. A 3-hop grounder travels so fast and their arms are so strong that they can field a ball that deep, and still get an MLB runner/batter out at first at an extremely high rate.

        If you give up line drives or extremely hard hit grounders, the only way you get the batter out is if it is hit right at a fielder or they make a tremendous one half-step and dive play. The odds of that happening are not good. Something like 70% of liners go for hits, and if they’re in the gap, they’re almost guaranteed extra bases.

        Look at “comebackers” that are line drives on balls thrown to the center of the plate: the P considers himself lucky if he can just get his glove up in time to protect his face. Batters don’t generally get that type of contact on pitches located at the corners of the zone.

        Pitchers do influence batter contact quality and BABIP … we just cannot currently quantify it. It is also possible for a high K pitcher to negatively influence BIP if he misses “in the zone” too often. No pitcher hits their spots all the time, but the best ones miss “out of the middle” more than they “miss in the center”. Guys that feed the ball over the center consistently don’t enjoy MLB careers for very long.

  36. Robert Zimmerman says:

    haha The Joshua Tree is a good album

  37. pft says:

    Randomness exists, but not everything is due to random chance. Just because you can not find statistical proof that there is some other cause for events, does not mean this is proof it is random.

    This post is a classic example of faulty logic and why folks are questioning some of the sabermetric conventional wisdoms.

    The idea that someone should just accept random chance as an explanation for events and not seek alternative explanations is a sad commentary of our times, and extends far beyond baseball.

    • philkid3 says:

      Re-read the last paragraph.

      • pft says:

        Seven paragraphs selling randomness, followed by a maybe it’s not random, but if you don’t accept it is random, you are probably seeing things that don’t exist and you will provide explanations for that which has no explanation. Kind of implies you should just accept randomness.

        Given a large enough population, randomness tends to be seen, and models based on large populations perform well. However, not all of the individuals within a large population behave in a random manner. The biggest problem with some analysts is their assumption that if the larger population exhibits random behavior, then individuals within that population do as well, and that if you cannot prove an individual’s performance above or below expectation is not random, you should accept randomness.

      • Wally says:

        pft,

        “if you don’t accept it is random, you are probably seeing things that don’t exist and you will provide explanations for that which has no explanation. Kind of implies you should just accept randomness.”

        It’s saying that it’s random insofar as we know today. Like the coin flip example. There are certainly things that control the coin flip, but we can’t account for all of them, nor can the flipper control them, so it appears random, when in fact there is a perfectly good physical explanation for how many rotations a coin will make before it comes to rest.

  38. rick11p says:

    Most animals learn to come in out of the rain; only humans have the pattern-recognition “software” to note the connection between gathering storm clouds and rain.

    Not much of an outdoorsman, huh?

  39. JH says:

    Davey Johnson meet Jose Bautista

  40. Wallyisatool says:

    Wally is a tool.

  41. Mike says:

    It seems that some commenters are implying that Dave is labelling Haren’s inflated HR/FB and BABIP as purely random.

    He makes no such claim above. His point is just that we haven’t got a conclusive reason for Haren’s altered performance and, at this stage, it could simply be a random variation.

    That this could be random variation doesn’t imply it is. It doesn’t even imply that it probably is. It’s basically an expression of our frustrating epistemic position.

    • CircleChange11 says:

      It could be viewed just as “randomness” because we don’t have any more data than just his season line. So “randomness” could just be baseball’s version of “God”, where that’s the default answer for things we don’t know or cannot explain.

      For example, if we had Haren’s charts for [1] BABIP for the strike zone (broken into 15 zones, 9 strike, 6 balls) for the 2010 season, [2] same chart for his career, and [3] a chart for leaguewide data (or whatever) … there might be something REALLY revealing.

      Perhaps we find out that he’s throwing the ball in the “5 zone” (dead center) much more often than he has for his career. Since balls in zone 5 have a higher BA and HR rate (by looking at hitters charts for the zones), that could be a well-accepted explanation.
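
      As a rough illustration of that idea, here is a minimal Python sketch of expected BABIP as a location-weighted average over a 3x3 strike zone. Every number in it is invented for illustration: the `ZONE_BABIP` table, the two location profiles, and the `expected_babip` helper are assumptions, not measured data for Haren or the league.

```python
# Hypothetical BABIP by zone for a 3x3 strike zone, numbered 1-9 with 5 as dead center.
# These values are made up for illustration; they are NOT real league or player data.
ZONE_BABIP = {
    1: 0.270, 2: 0.290, 3: 0.270,
    4: 0.300, 5: 0.350, 6: 0.300,
    7: 0.270, 8: 0.290, 9: 0.270,
}


def expected_babip(zone_weights):
    """Average the zone BABIPs, weighted by how many balls in play come from each zone."""
    total = sum(zone_weights.values())
    return sum(ZONE_BABIP[zone] * w for zone, w in zone_weights.items()) / total


# Two made-up location profiles, expressed as relative counts of balls in play per zone.
career_profile = {zone: 1 for zone in ZONE_BABIP}  # evenly spread across the zone
season_profile = dict(career_profile)
season_profile[5] = 3                              # three times as many from dead center

career_expected = expected_babip(career_profile)
season_expected = expected_babip(season_profile)

print(f"career-like profile: {career_expected:.3f}")
print(f"center-heavy profile: {season_expected:.3f}")
```

      With these invented inputs the center-heavy profile comes out higher, which is the kind of shift the zone charts would have to show for location to count as the explanation.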

      But, we don’t have any of that (I’m sure MLB teams do), we just have his season stat line (and slightly more advanced metrics) compared to his career. When you think about it, we really don’t have that much info, nor do we have the BEST info.

      Think about when you are ill and go to the doctor, it “could be” lots of things, but the more data you collect, the better the doctor’s analysis can be. The more data you have, the more accurate the “diagnosis” will be.

      • CircleChange11 says:

        If we could have BABIP info for a pitcher that included [1] location, [2] count, [3] pitch type … that would be really informative.

        If we could add in “pitch sequence” that would be even better … but let’s get 1, 2, and 3 first. *grin*

      • CircleChange11 says:

        FWIW, I do accept some randomness.

        One could throw Pujols (or any MLB batter) 20 identical pitches right down the middle at 90 mph. Whether they go for HRs, doubles, pop ups, etc is not something the pitcher can control.

        But throwing those 20 pitches in the center increases the probability of a hit on BIP more than throwing those same pitches to zone 1 “low and away”. Even 20 pitches in zone 1, Pujols may hit a couple of homers.

        Throw in good movement with good location and effective changing of speeds, and the pitcher drastically lowers the probability of good contact (or the probability of a hit on a BIP).

        Randomness happens, I accept. But, sometimes randomness is just “not having enough information”. Knowing which situation is which is where it’s at, IMHO.

        It would be nice sometimes to just be able to chalk it up to “randomness”, but my mind doesn’t operate in such a manner … it always wants more information to find a cause/influence/probability/etc.

      • CircleChange11 says:

        I also suppose some genius can break, or already has broken, a standard baseball field into “areas”, viewed average velocity of BIP, trajectory, and a host of other factors, and arrived at a probability or BABIP that has to be some type of MIN or MAX, and that number may actually be .300.

        Meaning that even if a pitcher was able to throw the “right pitch” in the “right location” at the “right time”, they still might not be able to get a batter out every single time … as I said, a “minimum” BABIP simply due to “open area” in the defense, and human limitations of speed of movement, reaction, etc.

        Ah, I’m just rambling now … and in an old discussion no less. Hello? Anyone out there? Is this thing on?

  42. baty says:

    So, I’m a total hack, but I thought the idea was that often times occurrences are unexplainable because it’s impossible to account for absolute pattern (randomness). And while the formulas that are used here account for many of those immeasurable occurrences (by developing probable outcomes) to a reasonable degree, the formulas don’t have the ability to fully explain or define many of those (immeasurable) occurrences that are being accounted for.

    Or am I just confusing myself… haha
