I Fought Marcel And Marcel Won

As with my work last year on pitcher aging and velocity decline, I am always looking for reliable indicators or signals of change in players. One thing I've been interested in trying to better understand is whether changes in performance might signal, or herald, a large drop-off in performance the following year.

Projection systems do a very good job of predicting performance, but my thought was there must be some way to better predict the 2011 Adam Dunns of the world.

So, one Saturday morning I decided to do some statistical fishing.

I took three-year trended data for hitters with more than 300 plate appearances from 2007 through 2012 and identified hitters whose wOBA declined by at least .030 points from one year (YR2) to the next (YR3). I then backed up and looked at the change in various statistics between YR1 and YR2 to see if there was a way to better predict which players were poised to experience a large decline in their wOBA in YR3.
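The decline-identification step can be sketched in a few lines of pandas. This is a minimal illustration with a toy DataFrame, not the actual data set; the column names (`player`, `season`, `woba`) are hypothetical:

```python
import pandas as pd

# Hypothetical season-level data: one row per player-season,
# already filtered to hitters with more than 300 plate appearances.
df = pd.DataFrame({
    "player": ["A", "A", "A", "B", "B", "B"],
    "season": [2007, 2008, 2009, 2007, 2008, 2009],
    "woba":   [0.360, 0.355, 0.310, 0.330, 0.335, 0.340],
})

# Line up each season (YR2) with the player's next season (YR3)
# by shifting wOBA within each player.
df = df.sort_values(["player", "season"])
df["woba_next"] = df.groupby("player")["woba"].shift(-1)

# Flag a YR2 -> YR3 decline of at least .030 points of wOBA.
# (NaN comparisons in the final season evaluate to False.)
df["declined"] = (df["woba"] - df["woba_next"]) >= 0.030

decliners = df.loc[df["declined"], "player"].unique()
```

Here player A falls from .355 to .310 and gets flagged, while player B improves and does not.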

I even had a nifty nickname ready for the eventual statistic–CLIFFORD. (The idea being that the statistic would help predict hitters likely to “fall off a cliff” offensively. Hey, when you do this kind of work as much as I do, it just comes with the territory.)

I calculated quartile values for the year-over-year change in about 40 statistics, including various plate discipline and PITCHf/x metrics. I then calculated the relative risk that a hitter would put up a wOBA at least .030 points lower in YR3 if their performance in each of those statistics decreased from YR1 to YR2 by at least as much as the 25th percentile.
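Relative risk here is just the decline rate among "exposed" players (those whose change in a metric was at or below the 25th percentile) divided by the decline rate among everyone else. A small sketch with made-up boolean arrays:

```python
import numpy as np

def relative_risk(exposed: np.ndarray, declined: np.ndarray) -> float:
    """Decline rate among exposed players divided by the
    decline rate among unexposed players."""
    exposed = np.asarray(exposed, dtype=bool)
    declined = np.asarray(declined, dtype=bool)
    risk_exposed = declined[exposed].mean()
    risk_unexposed = declined[~exposed].mean()
    return risk_exposed / risk_unexposed

# Toy example: 4 of 8 exposed players decline (50%),
# versus 2 of 8 unexposed players (25%).
exposed = np.array([1]*8 + [0]*8, dtype=bool)
declined = np.array([1]*4 + [0]*4 + [1]*2 + [0]*6, dtype=bool)
print(relative_risk(exposed, declined))  # 0.5 / 0.25 = 2.0
```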

Here are the metrics with a relative risk over 1:

Metric (YR2 – YR1 change)    Relative Risk of YR3 Decline
Z-Contact% (pfx)             1.45
UBR                          1.29
FA% (pfx)                    1.27
Spd                          1.24
K%                           1.19
GB%                          1.16
Contact% (pfx)               1.15
FO% (pfx)                    1.12
IFFB%                        1.11
Swing% (pfx)                 1.09
Zone% (pfx)                  1.08
FB%                          1.03
Z-Swing% (pfx)               1.01

I then decided to see if there was some combination of metrics that would provide decent predictive power. So, I tried a few combinations, looking to classify hitters as “cliff” candidates if they had at least three of the following four:

  • YR2 – YR1 at or below the 25th percentile in Z-Contact%(pfx)
  • YR2 – YR1 at or below the 25th percentile in UBR
  • YR2 – YR1 at or below the 25th percentile in Spd
  • YR2 – YR1 at or below the 25th percentile in FA%

Why did I choose these four? First, I needed to start somewhere. Second, they were the four metrics carrying the highest relative risk of a .030-plus decline. And third, they made sense as a starting point. Players that suffer a large decline in zone contact could be experiencing a decline in bat speed. The two base-running metrics (UBR and Spd) might help predict a bit of aging, but more importantly a general decline in speed and quickness that would impact performance. Four-seam fastball percentage also made sense to me, since hitters that fared better against straight fastballs might have trouble adjusting if pitchers stopped throwing them as much.
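The tagging rule above (flagged on at least three of the four metrics) can be sketched as follows. The deltas and player names here are invented for illustration:

```python
import pandas as pd

# Hypothetical YR2-minus-YR1 changes for each candidate metric.
deltas = pd.DataFrame({
    "z_contact": [-0.05, 0.01, -0.06, -0.04],
    "ubr":       [-2.0,  0.5,  -3.1,  -1.8],
    "spd":       [-1.5,  0.2,  -2.0,   0.3],
    "fa_pct":    [-0.04, 0.00, -0.05, -0.03],
}, index=["Player1", "Player2", "Player3", "Player4"])

# A player is flagged on a metric if their change is at or below
# that metric's 25th percentile across the sample.
flags = deltas.le(deltas.quantile(0.25))

# CLIFFORD candidate: flagged on at least three of the four metrics.
clifford = flags.sum(axis=1) >= 3
```

With these toy numbers only Player3, who declined sharply on all four metrics, gets tagged.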

I then calculated the relative risk of a player's wOBA declining by more than .030 for players tagged as CLIFFORD candidates compared to those who were not tagged:

Group                      % Declining by more than .030
CLIFFORD Candidates        53%
Non-CLIFFORD Candidates    25%
Relative Risk              2.1
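The relative risk in the table is just the ratio of the two decline rates, which checks out:

```python
clifford_rate = 0.53      # decline rate among CLIFFORD candidates
non_clifford_rate = 0.25  # decline rate among everyone else

relative_risk = clifford_rate / non_clifford_rate
print(round(relative_risk, 1))  # 2.1
```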

Here's where I got excited. This methodology identified players who were twice as likely to see their offensive performance decline sharply. Not only did I get excited, I also got ahead of myself. I took to Twitter, teasing that I had this wonderful new toy that could help identify decline candidates. I had the article all written in my head. I was all set to go.

But then I stopped and thought: well, maybe projection systems already do as good a job of predicting this type of decline. Maybe even Tango's Marcel does–I had just assumed they didn't. The hypothesis, then, was that CLIFFORD would produce a higher relative risk for declining players than Marcel.

So I went back and matched up Marcel projections for the players in the data set, identified those players that Marcel thought would decline by at least .030, and then calculated the relative risk compared to players not tagged by Marcel as decliners:

Group                    % Declining by more than .030
Marcel Candidates        58%
Non-Marcel Candidates    24%
Relative Risk            2.4

Stymied, thwarted, mission aborted.

For all my calculations and gyrations, simply looking at Marcel projections would have done just as good a job–in fact, a slightly better one–of identifying cliff candidates*.

Why am I sharing this negative (in some sense, embarrassing) result with you fine readers?

For starters, we don’t do enough of this. By we, I mean any researcher in just about any discipline–not just baseball analysis. More and more, journals, newspapers, online forums, and conferences are filled with the reporting out of positive results (“hey, look, I confirmed my hypothesis! I discovered a new thing!”). And while positive results are interesting and important in their own right, negative results are (or, should be) just as important. Any discipline progresses in large part by falsifying hypotheses, replicating results, and figuring out what doesn’t work so it can focus on what might.

Of course, I’m just as guilty of this as the next guy. Hopefully this will inspire more people to write up and post their own failures, since they can move us forward just as much as the successes.


*A few commenters have rightly noted that given the lack of overlap between CLIFFORD and Marcel decline candidates there is still a chance of something interesting there. I still think the “failure” story is worth telling, but I will be looking into what use we might still be able to pull out of CLIFFORD in the near future.

Bill works as a consultant by day. In his free time, he writes for The Hardball Times, speaks about baseball research and analytics, consults for a Major League Baseball team and appears on MLB Network's Clubhouse Confidential. Along with Jeff Zimmerman, he won the 2013 SABR Analytics Research Award for Contemporary Analysis. Follow him on Tumblr or Twitter @BillPetti.

22 Responses to “I Fought Marcel And Marcel Won”

  1. David says:

    This wasn’t clear to me from reading this article, but are CLIFFORD’s candidates and Marcel’s candidates the same group of players, or at least largely so?

    If they are, then yeah … stymied. But if it’s a wholly different set of players, then not so much

    • Bill Petti says:

      No, I didn’t mention that but it’s a good point. Of the 34 CLIFFORD candidates, only 3 were also identified by Marcel. So, yes, there may be something there to investigate. I just haven’t gotten back to it. So, not completely a loser, but still thought the overall “failure” story might be worth telling.

      • Matt Hunter says:

        Wow, only 3? That’s very interesting, and definitely makes me think it wasn’t a failure at all. If you can identify an entirely different set of players poised to decline, then that’s fantastic!

      • cass says:

        Based on what you’ve said, if you create a new metric that adds together all of the CLIFFORD candidates and all the Marcel candidates, then this new metric would be much, much better than Marcel. You haven’t failed at all.

        To be more elegant about it, of course, you’ll want to see how Marcel is identifying these candidates and incorporate it into your own metric. Or, perhaps more accurately, projection systems like Marcel will eventually want to incorporate your new findings, I imagine.

      • MDL says:

        We definitely need to see a follow-up article at the end of the season to see how well CLIFFORD and Marcel decline projections fared.

      • juan pierres mustache says:

        Out of curiosity, who were the three overlap players?

  2. John R. Mayne says:

    Positive outcome publication bias does occur, but it leads to more interesting papers. To quote OCP executive Dick Jones, “Who cares if it works or not?”

    You could have also solved the lack-of-results problem rather easily in a well-established social science method delineated in this paper: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2114571 .

    Result fabrication leads to much more interesting conclusions than “I have found nothing interesting.” (You can also run correlations of enough data to get a nice positive result; did you know that sales of organic foods have a brilliantly high correlation to autism diagnoses?)

    Applying effort to find the right answer is hardly going to get you a grant. As it stands, Lennay Kekua does not like your article.

  3. Bill Petti says:

    @Matt and @cass: I totally agree–for this I wanted to make sure I reported out on one type of failure (re: the hypothesis that relative risk would be better with CLIFFORD than Marcel), but there is likely something to be had by combining the two together.

  4. AC_Butcha_AC says:

    Am I missing something or did you leave age related decline out of this? Pretty sure Marcel uses age as a factor of decline.

    This leads me to an assumption:
    I think the marcel candidates are strongly correlated to age, while it seems that you found skillsets that are prone to decline.

    This is where the difference in your candidates comes from IMHO.

    could you check this? and correct me if I’m wrong

  5. Jaker says:

    Interesting stuff!

    Seems that you’ve jumped the gun twice.

    A lot of your CLIFFORD factors seem common sense as well.

    In terms of future progress:

    – Did you re-run the relative risk including the 5th parameter, change in K%? That might boost your relative risk even higher. It’s probably correlated with contact% but unless that correlation is perfect you’re likely losing some predictive power.
    – Have you considered including more years in your data set?
    – I’m somewhat new to Sabermetrics so perhaps this has been answered before, but is there a reason you’ve not built a linear regression model with these data? Wouldn’t this help you refine your model even further?

    Thanks! Not a failure at all.

  6. matt w says:

    Are the Marcel candidates generally guys who regress back to their norms after a career year? From the way I understand Marcel to work, that would pretty much have to be it.

    Because if CLIFFORD finds a way to identify players who are suddenly going to drop off from their career norms, that would be interesting.

    • evo34 says:

      Yeah, I would think Marcel is just picking the low-hanging fruit (predicting regression for guys coming off ridiculous years), whereas your system clearly is not doing so, but rather is filtering for guys who are coming off a relatively bad year. Look forward to more info.

  7. Bill Petti says:

@juan pierres mustache: Miguel Olivo in 2009, Robinson Cano in 2011, and Ryan Zimmerman in 2011–all three were predicted by both systems, but not all actually declined by more than .030.

    • tylersnotes says:

      i wouldn’t give up just yet either. I would want to know A) of the differences between clifford and marcel, which has a higher success rate? B) if clifford isn’t taking age factors into account, do you surpass marcel once you do add these factors in? you use UBR and SPD to account for aging but i’m not sure that’s sufficient unless changes in those stats from year to year are largely attributable exclusively to age. C) what happens if you expand the categories?

      If you feel like Marcel effectively disproved your hypothesis and that this research doesn’t warrant further pursuit, i don’t think you proved that in this article.

      I love the “inside baseball analytics” stuff, but i see this more as “showing your work” than “acknowledging a complete bust.” Maybe this could be part 1 in a series about developing a better prediction? Make some adjustments and report back, etc. Even if clifford doesn’t pan out, it could be used to test against existing projection systems and compare how, say, marcel does at predicting decline vs other systems

  8. rotobanter says:

    And i’m sure Bill, that your peripherals in question are often associated with the “natural” declines/trends per Marcel’s system – i mean even the KPI’s you referenced: z-contact, UBR, speed, & FA/catching up with the FA, etc. all usually decline with age (at least that’s a good assumption). I think if you can disassociate variables with age trends, your research would have significant utility. I think it still has great significance because even accounting for age, a combination of certain variables will associate moreso with a visible decline.

    With that said, your system should still show value if it points to red flags that dont often correlate significantly with age trends, but it’s incestual if the relative risks you start with all relate to age.

    I always enjoy when someone works relentlessly to wind up going against their own hypothesis. It shows integrity.

  9. Moonraker says:

    It seems like even if your decline predictions had already been found by Marcel, that doesn’t make your result a negative one. It still would have been a positive result (you confirmed your hypothesis), it’s merely that you would have replicated the positive result found by Marcel.

  10. MikeS says:

    For starters, we don’t do enough of this. By we, I mean any researcher in just about any discipline–not just baseball analysis. More and more, journals, newspapers, online forums, and conferences are filled with the reporting out of positive results (“hey, look, I confirmed my hypothesis! I discovered a new thing!”). And while positive results are interesting and important in their own right, negative results are (or, should be) just as important. Any discipline progresses in large part by falsifying hypotheses, replicating results, and figuring out what doesn’t work so it can focus on what might.

    This is called “Reporting bias” and is a huge problem in all fields that I am familiar with. Boring negative results don’t get reported and when they do, they don’t get the press releases or the headlines in the lay press that the sexy positive ones do. You could have 14 people try to prove a hypothesis and fail, then one “proves” it correct with a p value of 0.1 and guess which one people remember? Now you have to do the original 14 studies over again as “confirmatory” studies and then they get reported. It’s a waste of resources.

  11. GordieDougie says:

    “I have not failed. I’ve just found 10,000 ways that won’t work.” -Thomas Edison
