Don’t Trust Stats This Week

The hardest part of being a fan or analyst is resisting the urge to make definitive claims from small samples. Letting unreliable information shape an opinion or serve as the foundation for an important decision is a mistake. This is especially common in the opening week of the season, when the statistics produced are essentially meaningless and everyone, out of sheer joy that baseball has resumed, is prone to reading too much into the numbers. Just because Ramon Hernandez went 4-5 yesterday does not mean he has regained his stroke and will have an all-star caliber season. The same can be said if he starts the season 12-20, or 21-40.
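To put rough numbers on that noise, the sketch below treats every at-bat as an independent trial with a fixed true hit rate and computes the binomial standard error, sqrt(p(1 - p) / n). That independence is a simplifying assumption (real at-bats vary by pitcher, park, and count), and the .270 true average is a hypothetical figure, not Hernandez's actual talent level.

```python
import math


def rate_standard_error(p: float, n: int) -> float:
    """Binomial standard error of an observed rate over n independent trials."""
    return math.sqrt(p * (1.0 - p) / n)


# Hypothetical hitter with a true .270 batting average (illustrative value only).
TRUE_AVG = 0.270

for at_bats in (5, 20, 40, 500):
    spread = 2 * rate_standard_error(TRUE_AVG, at_bats)
    low = max(0.0, TRUE_AVG - spread)
    high = min(1.0, TRUE_AVG + spread)
    # Under a normal approximation (poor at very small n), roughly 95% of
    # samples of this size would land inside this band.
    print(f"{at_bats:>3} AB: about {low:.3f} to {high:.3f}")
```

Even at 40 at-bats, this hypothetical .270 hitter's band still runs from about .130 to .410, which is why a 12-20 or 21-40 start says little on its own.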

We already know enough about his true talent level that 20 trips to the dish are a mere blip in the dataset. But when can we be confident that a trend to open the season actually signals a meaningful change?

The answer is not cut and dried: it depends on the specific statistic being examined. As I wrote back at the beginning of 2009, statistics “stabilize” at different thresholds of plate appearances, so a trend can be considered significant for certain statistics sooner than for others. Knowing this can prevent rash decisions in fantasy leagues, and can help us avoid anointing players as having breakout years or writing them off completely. The results in the prior article were based upon the tremendous work of Russell Carleton, my former colleague at Statistically Speaking, and current analyst for the Cleveland Indians.

The method used was split-half reliability, which measures the correlation between different parts of the same dataset. For example, you would separate each hitter’s even-numbered plate appearances from his odd-numbered ones (Matt Holliday’s among them), calculate the statistic for each bin, and then correlate the odd-bin values with the even-bin values across all of the hitters in the sample. Correlations run from -1 to 1, and once the correlation between the bins reaches roughly 0.7, the statistic can be considered useful in forming opinions and noting trends. That cutoff is a convention rather than a test of statistical significance: at r = 0.7, r² is about 0.49, so roughly half of the variation in one bin is accounted for by the other.

By ‘useful’ I mean that our expectations going forward can be more narrowly defined. The goal, then, is to find the lowest PA total at which the correlation reaches that threshold. Holliday has averaged around an 11% walk rate over the last three seasons. If his walk rate drops to five percent over his first 200 PAs this season, the split-half reliability results tell us how closely we should expect his next 200 PAs to mirror the first.

If the correlation is close to 1.0, we can say with more confidence that his walk rate will stay near five percent. Even though his track record is known, that specific statistic can signal a real change at around the 200 PA mark. When using numbers to form opinions, isn’t that type of reliability the desired result? Nobody knows for sure what will happen over those next 200 PAs, but tests like this narrow the range of possibilities and eliminate some of the guessing in the dark.
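One common way to turn a reliability figure into an expectation is to shrink the observed rate toward a baseline in proportion to its unreliability: expected = baseline + r × (observed - baseline). The textbook version (Kelley's formula) shrinks toward the population mean; using Holliday's own three-year 11% walk rate as the baseline is a simplifying assumption here, and the r values are illustrative, not Carleton's results.

```python
def regressed_estimate(observed: float, baseline: float, reliability: float) -> float:
    """Shrink an observed rate toward a baseline; reliability r in [0, 1] is the weight on the observation."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return baseline + reliability * (observed - baseline)


# Holliday-style scenario from the text: ~11% baseline, 5% over his first 200 PA.
BASELINE_BB = 0.11  # assumption: his three-year walk rate stands in for the baseline
OBSERVED_BB = 0.05

for r in (0.3, 0.7, 0.95):
    estimate = regressed_estimate(OBSERVED_BB, BASELINE_BB, r)
    print(f"r = {r:.2f} -> expected walk rate about {estimate:.1%}")
```

With r near 1.0 the estimate stays close to five percent, matching the point above; at r = 0.7 it lands around 6.8%, still well below his usual rate but not all the way down at the observed figure.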

The plate appearance thresholds at which the various statistics offered on this site stabilize are as follows:

 50 PA: Swing %
100 PA: Contact Rate
150 PA: Strikeout Rate, Line Drive Rate, Pitches/PA
200 PA: Walk Rate, Groundball Rate, GB/FB
250 PA: Flyball Rate
300 PA: Home Run Rate, HR/FB
500 PA: OBP, SLG, OPS, 1B Rate, Popup Rate
550 PA: ISO

How can this information be used? Well, it’s unlikely that anyone will rack up 50 plate appearances in the first week of the season, so it will take at least two weeks before the first trend on offense can truly be tracked. A hot start from an unexpected player might be interesting to flag for future evaluation, but cannot be considered noteworthy until much more of the season has passed.
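To keep the list handy during the season, the thresholds can go into a simple lookup table. The function names below are hypothetical, and the sketch assumes a player's plate appearance count is the only input; it just checks which statistics have passed their listed marks.

```python
# PA thresholds as listed above.
STABILIZATION_PA = {
    "Swing %": 50,
    "Contact Rate": 100,
    "Strikeout Rate": 150,
    "Line Drive Rate": 150,
    "Pitches/PA": 150,
    "Walk Rate": 200,
    "Groundball Rate": 200,
    "GB/FB": 200,
    "Flyball Rate": 250,
    "Home Run Rate": 300,
    "HR/FB": 300,
    "OBP": 500,
    "SLG": 500,
    "OPS": 500,
    "1B Rate": 500,
    "Popup Rate": 500,
    "ISO": 550,
}


def stats_worth_discussing(pa: int) -> list[str]:
    """Return the statistics whose threshold a player with `pa` plate appearances has reached."""
    return [stat for stat, threshold in STABILIZATION_PA.items() if pa >= threshold]


def pa_still_needed(stat: str, pa: int) -> int:
    """How many more plate appearances before `stat` reaches its threshold (0 if already there)."""
    return max(0, STABILIZATION_PA[stat] - pa)


print(stats_worth_discussing(25))         # -> [] (a typical first-week total)
print(stats_worth_discussing(160))
print(pa_still_needed("Walk Rate", 160))  # -> 40
```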

My goal isn’t to be a wet blanket or a Debbie Downer, but rather to shed light on seminal research that can help everyone keep some perspective in the opening week(s) of the season. Fans and analysts should not only avoid forming opinions about a player from the numbers produced at the start of the season; they should not even consider those statistics worth discussing until they surpass the plate appearance marks listed above.

Eric is an accountant and statistical analyst from Philadelphia. He also covers the Phillies at Phillies Nation and can be found here on Twitter.


43 Responses to “Don’t Trust Stats This Week”

  1. CircleChange11 says:

    Just because Ramon Hernandez went 4-5 yesterday does not mean he has regained his stroke and will have an all-star caliber season.

    But he is clutch, right? Heh heh.

    Good on ya for posting the thresholds. Useful information.

  2. Joe says:

    Good article.

    “Wet blanket” and “seminal research” shouldn’t be used in the same sentence, though…

  3. MikeS says:

    You mean I can’t extrapolate Ramon Hernandez’s 0.3 WAR day into a 48.6 WAR season? Time to dump him from my fantasy team. Maybe I can trade him for that Pujols guy and his wRC+ of -100.

  4. Cabrone says:

    Great exercise. Can you rerun it for pitchers, or do you think the numbers would be about the same?

  5. Mile says:

    I just learned about split-half reliability the other day, so this is especially awesome. Is there any difference between career trends and in-season trends? If so, did you find thresholds for them?

  6. dutchbrowncoat says:

    april fools! these stats are actually useful and albert pujols will be the worst hitter in baseball this year!

  7. powder blues says:

    A uniquely Fangraphs article – love it.

  8. B N says:

    Hmm… looks like there’s an argument brewing between Seidman and Appelman:
    http://www.fangraphs.com/blogs/index.php/2011-stats-working/

    So are they working or should we not trust them?! Inquiring minds want to know! (For the record, I side with Seidman. I’m pretty sure that Kimbrel won’t put up a -2.28 ERA or FIP for the balance of the season. … Though I’ve been wrong before. Maybe the man just erases runs!)

  9. phoenix2042 says:

    that’s going to be really helpful for fantasy managers. And i was convinced the nats weren’t going to score a run all season, darn. And it kind of looked like matt kemp was going to take 486 walks this year (3 per game)!

  10. Ari Collins says:

    Amusing that this is posted right after “2011 Stats Working!”

    As an English major, I’m naturally a bit unsure of what I’m seeing, though. Which stats are for pitchers, which for hitters? Is there a difference between the thresholds for similar statistics whether they’re for pitchers or hitters? Like, say, strikeout rate?

  11. Kapellmeisters says:

    Great stuff here

  12. phoenix2042 says:

    when does pitch velocity become significant?

    • Eric Seidman says:

      Yeah, the big thing with pitchers is that around 150 PA we gain some idea about the change in groundball rate. This was helpful in identifying Cliff Lee’s breakout a few years ago.

  13. Jeremy says:

    How many PA before BABIP will stabilize?

  14. dudley says:

    we might not be able to trust the stats, but we can trust our eyes, right? e.g., nagging injuries don’t seem to be sapping braun’s power anymore.

  15. Jeff Zimmerman says:

    We can begin trusting them next week right?

  16. Wraithpk says:

    Could be longer. Austin Jackson (since he is the poster-boy for the dangers of babip) had 675 plate appearances last year, and put about 450 balls in play. At that rate, it would take him 8-9 years to reach 3800 balls in play.

    What this says to me is that babip never truly stabilizes, because a player won’t be the same at the end of that eight years that he was at the beginning. It takes you eight years of data before you can say “ok, I have a good idea of this guy’s true talent.” Only, you can’t say that, because what you know is his average over that eight year span. He might not be that player anymore.

    What I would be interested in seeing is how long it takes for xbabip to stabilize.

    • CircleChange11 says:

      And players with horrible BABIP stats won’t be in the league that long.

      So, just like with pitchers, what we end up doing is comparing the BABIPs of guys who have been in the league for 10 years … so we’re trying to find differences in the top 5% of an already elite field.

      Not surprisingly we come to the conclusion that “there’s no real difference”.

  17. mcneo says:

    I can just see it now…

    Cardinals Announcer:
    “Tony La Russa is now pinch hitting Chris Carpenter for Albert Pujols because Chris Carpenter has a .333 BA in even-numbered at bats with a slugging percentage of 1.200!! Albert is about to have an odd-numbered at bat and he only hits .311/.357/.516 in odd-numbered at bats…”

  18. Detroit Michael says:

    Do you no longer write for BaseballProspectus.com? I don’t recall their regular contributors usually writing for multiple websites (not that I mind).

  19. Chip Buck says:

    Wait, so you’re saying I shouldn’t have dropped Albert Pujols after his three double play performance yesterday???

  20. don says:

    ChiSox are going to score 5000 runs this year.

  21. woodman says:

    J.P. Arencibia is now on pace for 324 homers. The Blue Jays are on pace for 648 homers. Sounds reasonable.

  22. pft says:

    SSS stats are simply small pieces of new information that if used wisely may be useful in some cases. Any SSS stats must be coupled with observation to provide any meaning.

    SSS stats are not predictive; however, unusually good or bad numbers could signal a breakout year or collapse, or suggest a hidden injury (or be evidence that a player’s recovery from a previous injury is not complete).

    In any event, while not predictive, the stats are a good indicator of how a player has performed over a short period, even if some of that may be due to good or bad luck. We get excited over no-hitters and batters hitting for the cycle, and these are SSS stats in the extreme, and influenced by luck.

  23. Kevin says:

    Another joke about how I shouldn’t judge Albert Pujols this year based on his performance yesterday.
