This study is a first attempt at injury projections. It is far from perfect, but I hope to get the ball rolling and give people some initial numbers to mull over. For now I am looking only at the chances of a starting pitcher going on the DL; I will look at projected time lost later.
The main problem with projecting the chances of going on the DL is deciding which pool of players to look at going into a season. All players could be lumped together, but there are definite differences in DL trips between pitchers and position players. Also, among pitchers, starters and relievers have different usage patterns. Finally, the injury information (Josh Hermsmeyer’s injury database for 2002 to 2009 and mine for 2010) covers only major league DL trips and doesn’t include any minor league information. For this reason, I am only looking at players who were established major league starters and had little chance of going to the minors during the season.
After talking to a few people (Thanks – Tommy, Justin and Sky), I decided to limit my first look to pitchers who threw over 120 innings and made 20 starts in the previous season (41% of this group of pitchers spent some time on the DL). These two numbers seem to divide the pitchers into established starters and the rest of the league’s pitchers.
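The cutoff above can be sketched as a simple filter. This is only an illustration: the `PitcherSeason` record, the sample rows and the player labels are hypothetical, and reading "made 20 starts" as "at least 20" is an assumption.

```python
from dataclasses import dataclass


@dataclass
class PitcherSeason:
    """One pitcher's totals for the previous season (hypothetical record layout)."""
    name: str
    innings: float
    games_started: int


def is_established_starter(season, min_innings=120.0, min_starts=20):
    # "Over 120 innings" is read as a strict cutoff; "made 20 starts" is read
    # as at least 20. Both readings are assumptions about the thresholds.
    return season.innings > min_innings and season.games_started >= min_starts


# Made-up rows, for illustration only.
seasons = [
    PitcherSeason("Starter A", 198.1, 32),
    PitcherSeason("Swingman B", 125.0, 14),
    PitcherSeason("Reliever C", 68.2, 0),
]

pool = [s.name for s in seasons if is_established_starter(s)]
print(pool)
```

Only pitchers who clear both thresholds stay in the pool, so a swingman with enough innings but too few starts is left out.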
Once the parameters were set, I ran a logistic regression on the data. A logistic regression estimates the chances of an event happening (going on or not going on the DL) given a set of conditions (age, previous trips, etc.). For example, if ten pitchers had the same set of conditions and 3 of them went on the DL, each of the 10 would project to have a 30% chance of going on the DL. After running several regressions with different inputs (innings thrown, GS, trips to the DL, time on the DL, types of injuries) from 2006 to 2009, I ended up using the following categories:
Number of Years with DL Trips (Max 3) – In the past 3 years, in how many of those years did the player go on the DL? I didn’t look at the total number of trips because many trips are back to back for the same injury. So for a pitcher being projected for 2011 who went on the DL twice in 2008 and once in 2010, this variable would have a value of 2.
Games Started in the past 3 years – I am using GS to tell the difference between a pitcher in his first year in the league, whose durability is unknown, and pitchers who have been starting for a few years.
Age – The age used is the pitcher’s age in the season when he threw 120 innings and had 20 GS. For example, for a pitcher going into the 2011 season at age 25, the age of 24 would be used.
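As a rough sketch, the three inputs could be built from per-season records like this. The function name, the dictionary layout and the start totals are hypothetical; only the DL example (two trips in 2008, one in 2010, projecting 2011) and the age example come from the descriptions above.

```python
def build_features(projection_year, dl_trips_by_year, starts_by_year,
                   age_in_projection_year):
    """Return (years with DL trips, GS in last 3 years, age in previous season)."""
    past_years = [projection_year - k for k in (1, 2, 3)]

    # Count seasons with at least one DL trip, not the number of trips,
    # so back-to-back trips for the same injury are not double counted.
    years_with_dl = sum(1 for y in past_years if dl_trips_by_year.get(y, 0) > 0)

    # Total games started over the same three seasons.
    gs_last_3 = sum(starts_by_year.get(y, 0) for y in past_years)

    # The age used is the age in the previous (qualifying) season.
    age_prev = age_in_projection_year - 1

    return years_with_dl, gs_last_3, age_prev


# DL history from the example above; the start totals are made up.
features = build_features(
    projection_year=2011,
    dl_trips_by_year={2008: 2, 2010: 1},
    starts_by_year={2008: 25, 2009: 30, 2010: 28},
    age_in_projection_year=25,
)
print(features)
```

The two trips in 2008 count as a single year with a DL trip, which is why the first value comes out as 2 rather than 3.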
With those variables, here is the equation I generated to predict the chance of a DL trip (again, 41% is the league average for this pool of players):
z = (0.2209)(Years with Trips to DL) + (-0.0040)(GS in last 3 years) + (0.0509)(Age in previous season) - 1.7692
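To turn z into a percentage, a logistic regression normally uses the conversion p = 1 / (1 + e^(-z)). The sketch below assumes that standard conversion, since only the equation for z is given here. The function name and the sample pitcher are hypothetical.

```python
import math

# Coefficients copied from the equation for z.
COEF_DL_YEARS = 0.2209
COEF_GS = -0.0040
COEF_AGE = 0.0509
INTERCEPT = -1.7692


def dl_probability(years_with_dl, gs_last_3, age_prev):
    """Estimated chance of a DL trip for one pitcher."""
    z = (COEF_DL_YEARS * years_with_dl
         + COEF_GS * gs_last_3
         + COEF_AGE * age_prev
         + INTERCEPT)
    # Assumed standard logistic conversion from z to a probability.
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical pitcher: no DL years, 90 starts over the last three seasons,
# age 27 in the previous season.
p = dl_probability(years_with_dl=0, gs_last_3=90, age_prev=27)
print(f"{p:.1%}")
```

Changing one input at a time shows the direction of each effect: more years with DL trips or a higher age raise the estimate, and more games started lower it.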
Using the equation, here is an example of the values of an imaginary pitcher and his chances of going on the DL:
As the pitcher gets established in the league, his DL chances go down even though he is aging. Once he gets injured, his chances of going on the DL begin to go up at a decent clip. Generally, here are the changes in the chance of going on the DL for each of the 3 events:
One year older = +1%
33 more games started = -3%
1 year of injuries = +8%
Well, that is it for today. Tomorrow I will list the 2011 projections, answer some questions from this article and go over some possible future improvements I can make on projecting DL time.