Starting Pitcher DL Projections (Part 1)

This study is a first look at injury projections. It is far from perfect, but I hope it gets the ball rolling and gives people some initial numbers to mull over. For now I am looking only at the chances of a starting pitcher going on the DL; projected time lost will come later.

The main problem with projecting the chances of going on the DL is deciding which pool of players to look at going into a season. All players could be lumped together, but there are definite differences in DL trips between pitchers and position players. Also, among pitchers, starters and relievers have different usage patterns. Finally, the injury information (Josh Hermsmeyer’s injury database for 2002 to 2009 and mine for 2010) covers only major league DL trips and doesn’t include any minor league information. For this reason, I am looking only at players who were established major league starters and had little chance of going to the minors during the season.

After talking to a few people (Thanks – Tommy, Justin and Sky), I decided to limit my first look to pitchers who threw over 120 innings and made 20 starts in the previous season (41% of this group of pitchers spent some time on the DL). These two numbers seem to divide the pitchers into established starters and the rest of the league’s pitchers.

Once the parameters were set, I ran a logistic regression on the data. A logistic regression estimates the chances of an event happening (going on or not going on the DL) given a set of conditions (age, previous trips, etc.). For example, if ten pitchers had the same set of conditions and 3 of them went on the DL, each of the 10 would project to have a 30% chance of going on the DL. After running several regressions with different inputs (innings thrown, GS, trips to the DL, time on the DL, types of injuries) from 2006 to 2009, I ended up using the following categories:

Number of Years with DL Trips (Max 3) – In the past 3 years, in how many of those years did the player go on the DL? I didn’t look at the total number of trips because many trips are back to back for the same injury. So a pitcher being projected for 2011 who went on the DL twice in 2008 and once in 2010 would have a value of 2 for this variable.

Games Started in the past 3 years – I am using GS to tell the difference between a pitcher in his first year in the league, when his durability is unknown, and pitchers who have been starting for a few years.

Age – The age used is the pitcher’s age in the season when he threw 120 innings and had 20 GS. For example, for a pitcher going into 2011 at age 25, the age of 24 would be used.
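The three inputs described above can be sketched in a few lines of Python. This is only an illustration: the `history` layout, its keys (`gs`, `dl_trips`, `age`), the function name `model_inputs` and the sample numbers are all assumptions made for the example, not the format of the actual injury database.

```python
def model_inputs(history, target_year):
    """Return (years_with_dl, gs_last3, age_prev) for a projection of target_year.

    `history` is a hypothetical dict: season -> {"gs", "dl_trips", "age"}.
    """
    # The three seasons before the one being projected.
    window = [target_year - k for k in (1, 2, 3)]
    seasons = [history[y] for y in window if y in history]

    # Count seasons with at least one DL trip, not the number of trips.
    years_with_dl = sum(1 for s in seasons if s["dl_trips"] > 0)

    # Total games started over the window; missing seasons add nothing.
    gs_last3 = sum(s["gs"] for s in seasons)

    # Age in the previous (qualifying) season, not the projected season.
    age_prev = history[target_year - 1]["age"]
    return years_with_dl, gs_last3, age_prev


# Made-up pitcher: two DL trips in 2008, none in 2009, one in 2010.
history = {
    2008: {"gs": 25, "dl_trips": 2, "age": 22},
    2009: {"gs": 32, "dl_trips": 0, "age": 23},
    2010: {"gs": 28, "dl_trips": 1, "age": 24},
}
inputs_2011 = model_inputs(history, 2011)
print(inputs_2011)
```

For this made-up pitcher the DL variable is 2 (two seasons with a trip, even though there were three trips), and the age used is 24 for a 2011 projection.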

With those variables, here is the equation I generated to predict the chance of going on the DL (again, 41% is the league average for this pool of players):

Chance of going on the DL = 1/(1+e^(-z))

where:
z = (0.2209)(Years with Trips to DL) + (-0.0040)(GS in last 3 years) + (0.0509)(Age in previous season) - 1.7692
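For anyone who wants to plug in their own numbers, here is a minimal Python sketch of the equation above. The coefficients are the ones printed in the equation; the function name `dl_probability` and the sample pitcher are made up for illustration.

```python
import math

# Coefficients as printed in the equation above.
B_DL_YEARS = 0.2209   # per year (max 3) with a DL trip
B_GS = -0.0040        # per game started over the last 3 years
B_AGE = 0.0509        # per year of age in the previous season
INTERCEPT = -1.7692


def dl_probability(years_with_dl, gs_last3, age_prev):
    """Chance (0 to 1) of a DL trip according to the logistic equation."""
    z = (B_DL_YEARS * years_with_dl
         + B_GS * gs_last3
         + B_AGE * age_prev
         + INTERCEPT)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical pitcher: one DL year, 90 GS over three years, age 28 last season.
p = dl_probability(years_with_dl=1, gs_last3=90, age_prev=28)
print(f"{p:.1%}")
```

This made-up pitcher comes out at roughly 38%, a little under the 41% average for the pool.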

Using the equation, here is an example of the values of an imaginary pitcher and his chances of going on the DL:

As the pitcher gets established in the league, his DL chances go down even though he is aging. Once he gets injured, his chances of going on the DL begin to go up at a decent clip. Generally, here are the changes in percentage for a player going on the DL for each of the 3 events:

One year older = +1%
33 more games started = -3%
1 year of Injuries = +8%
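
The size of each step can be checked directly against the printed coefficients. The sketch below starts from the 41% pool average, which is an assumption about the starting point (the changes differ slightly at other starting probabilities), and shifts z by one event at a time. The helper names are made up for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    return math.log(p / (1.0 - p))


BASELINE = 0.41          # pool average; assumed starting point
z0 = logit(BASELINE)     # the z value that corresponds to 41%

# Shift in z for each event, using the printed coefficients.
events = {
    "one year older": 0.0509 * 1,
    "33 more games started": -0.0040 * 33,
    "one more year with a DL trip": 0.2209 * 1,
}

# Change in DL chance, in percentage points, for each event on its own.
changes = {name: (sigmoid(z0 + dz) - BASELINE) * 100 for name, dz in events.items()}
for name, change in changes.items():
    print(f"{name}: {change:+.1f} points")
```

From this starting point, the age and games-started steps land close to the +1% and -3% listed above. The injury-year step comes out nearer 5 points than 8 with the coefficients as printed, so the listed figure may come from a different calculation than this sketch.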

Well, that is it for today. Tomorrow I will list the 2011 projections, answer some questions from this article and go over some possible future improvements I can make to projecting DL time.

Jeff writes for FanGraphs, The Hardball Times and Royals Review, as well as his own website, Baseball Heat Maps with his brother Darrell. In tandem with Bill Petti, he won the 2013 SABR Analytics Research Award for Contemporary Analysis. Follow him on Twitter @jeffwzimmerman.


16 Responses to “Starting Pitcher DL Projections (Part 1)”

  1. peachesnnuts says:

    I’d love to see the derivation for the constants in your z equation.

  2. J-Doug says:

    So many great uses for logistic regression in sports analysis, so few implementations. Love this.

  3. shthar says:

    How bout running these projections for the 2010 season? So we can see if they work?

    • Zac says:

      He includes the equation, it’s easy enough to figure things out for any year where that data is available.

      Highest chance of going on the DL in the 2010 season (50% or higher):
      Tim Wakefield: 62%, Jamie Moyer: 55%, Carl Pavano: 55%, Vicente Padilla: 54%, Brian Moehler: 53%, Rich Harden: 51%, Ryan Dempster: 51%, Kevin Millwood: 51%, Jeff Suppan: 51%, Jose Contreras: 50%
      6 of the 10 did end up on the DL in 2010.

      Lowest chance of going on the DL (30% or lower):
      Matt Cain, John Danks, Clayton Kershaw, Chad Billingsley, Edwin Jackson, Rick Porcello, Tim Lincecum, Justin Verlander, Trevor Cahill, John Lannan.
      1 of the 11 did end up on the DL in 2010.

      • Jeff Zimmerman says:

        The best way to see how well it holds up is to add up the % and compare to the total numbers.

        So for the first group 5.3 should go on the DL and 6 did.

        The second group should be near 3 players on the DL and 1 did.
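
The check described in this reply (add up the percentages and compare with the number who actually went on the DL) can be sketched as follows. The percentages and the observed count of 6 are the ones quoted in the comment above; the variable names are made up for the example.

```python
# Projected 2010 DL chances (in percent) quoted in the comment above.
top_group = {
    "Tim Wakefield": 62, "Jamie Moyer": 55, "Carl Pavano": 55,
    "Vicente Padilla": 54, "Brian Moehler": 53, "Rich Harden": 51,
    "Ryan Dempster": 51, "Kevin Millwood": 51, "Jeff Suppan": 51,
    "Jose Contreras": 50,
}

# Summing the individual probabilities gives the expected number of DL trips.
expected_dl = sum(top_group.values()) / 100.0

# Number of these pitchers reported as going on the DL in 2010.
observed_dl = 6

print(f"expected {expected_dl:.1f}, observed {observed_dl}")
```

The sum works out to about 5.3 expected DL trips for the ten pitchers, against 6 observed.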

  4. MikeS says:

    40% risk of DL time for anybody throwing 120 innings? No wonder some GM’s won’t give long term contracts to pitchers.

  5. Hank says:

    Interesting effort. I also would be curious to see this applied to pre-2010 to see what it would have predicted for the 2010 season

    Would it also be possible to publish a standard dev? (or provide a range with confidence level on the #’s?)

    And probably too small a sample size, but it might be interesting to look at pitcher types down the line if things look promising
    - physical differences… ex height/weight buckets (CC vs Lincecum)
    - pitch repertoire (either pitch types: slider heavy vs cutter, high curveball %’s, etc…. or fastball speed)

    All this would probably lead to small samples (and probably be tough to draw conclusions on), but if the equation shows some promise it might be an interesting next step (especially when hearing “his slight frame scares me” or “he has a sturdy build” observations that are often thrown out for prospects)

    • Socrates says:

      I agree strongly with trying to add body size and pitch types. There is no doubt that certain pitches are putting more stress on shoulders and elbows.

  6. bob says:

    So injury proneness seems to be statistically proven?

  7. [...] I went through the formula used for predicting which starting pitchers have the greatest chances of going on the DL in …. Now here are the projections for 2011. Besides revealing the list, a few other points and possible [...]

  8. Joe says:

    It’s definitely a nice try at something valuable. You warn not to put much stock in it which is probably good advice.

  9. Z says:

    Mechanics. Mechanics. Mechanics. Mechanics. However, since there’s no consensus on correct mechanics, I guess this method is as good as any.

  10. [...] disabled list in 2011. His methodology isn’t terribly complex, so check out his two posts (part one, part two) for an explanation. He essentially based the projection on age and the pitcher’s [...]

  11. [...] of arm problems, and projecting pitcher injuries is never easy, but Jeff Zimmerman has recently done some interesting work on the likelihood of various pitchers landing on the DL. One of the first things you might notice [...]

  12. [...] recently posted a projection formula (here and here) that estimated the chance of a starting pitcher spending time on the disabled list. To [...]
