Starting Pitcher DL Projections (Part 2 of 2)

Yesterday, I went through the formula used for predicting which starting pitchers have the greatest chances of going on the DL in a given year. Now here are the projections for 2011. Besides revealing the list, I will discuss a few other points and possible improvements to the process.

First, here are the five most and least likely starting pitchers (>20 GS and >120 innings in 2010) to go onto the DL in 2011 (creating these projections is still a work in progress, so no one should put too much stock in them right now):

There are no real surprises on the list, with young, experienced pitchers ~25% less likely to go on the DL than older pitchers with an injury history. A complete list of players can be found in this Google Doc.

The projections estimate that 45 of the pitchers will go on the DL sometime during the season, which is 39% of the pitchers being examined. For 2010, the projections would have predicted 43 pitchers going on the DL; 37 actually did, or 34% of the pitchers examined.
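A minimal sketch of how a projected total like this can be produced, assuming the total is simply the sum of each pitcher's individual DL probability. The five probabilities below are made up for illustration and are not the actual projections; only the 2010 figures (43 predicted, 37 actual) come from the text above.

```python
def expected_dl_count(probabilities):
    """Expected number of pitchers hitting the DL: the sum of per-pitcher probabilities."""
    return sum(probabilities)

# Hypothetical DL probabilities for a five-pitcher pool (not real projections).
probs = [0.27, 0.35, 0.39, 0.42, 0.51]
expected = expected_dl_count(probs)
share = expected / len(probs)

# Backtest figures quoted in the text for 2010.
predicted_2010, actual_2010 = 43, 37
over_prediction = predicted_2010 - actual_2010
relative_miss = over_prediction / actual_2010

print(f"expected on DL: {expected:.2f} of {len(probs)} ({share:.0%})")
print(f"2010 backtest: over-predicted by {over_prediction} ({relative_miss:.0%})")
```

Comparing the summed probabilities against the actual count, as in the 2010 backtest, is a quick check of whether the projections run high or low overall.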

Knowing just the chances of going on the DL is not the entire picture. The number of days lost also needs to be known, but I have not figured out a good way to project the days lost yet. Instead, here is a chart showing the number of days lost for each trip to the DL for this group of starting pitchers:

One encouraging sign from the preceding graph is that, once on the DL, the pitcher has less than a 20% chance of missing more than 90 days in the season.
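A figure like "less than 20% miss more than 90 days" is an empirical share read off the distribution of days lost. As a rough sketch, with made-up stint lengths rather than the real DL data, the calculation looks like this:

```python
def share_longer_than(days_lost, threshold):
    """Fraction of DL stints that lasted more than `threshold` days."""
    if not days_lost:
        raise ValueError("need at least one DL stint")
    return sum(1 for d in days_lost if d > threshold) / len(days_lost)

# Hypothetical days lost per DL stint (illustrative only, not the real data).
stints = [15, 15, 18, 22, 30, 41, 60, 75, 88, 120]

share_over_90 = share_longer_than(stints, 90)
print(f"share of stints over 90 days: {share_over_90:.0%}")
```

The same function can be evaluated at other thresholds (15, 30, 60 days) to rebuild the whole curve.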

Besides figuring out the possible days lost, I may look into a couple of other improvements in the future, as follows:

1. Use Tom Tango’s fan playing-time projections. Instead of looking at what a pitcher did the year before, the formula would look at how much fans think the pitcher will pitch in the coming year. He has only generated them for the past two years, so the data set used to create the projection would be limited.

2. Look at the pitcher’s fastball velocity. It seems that pitchers who throw harder are more likely to end up injured than soft tossers. I may add these pitch speeds in when I get around to using Tom’s playing-time projections.

With the starting pitchers done for now, I will be moving on to relief pitchers next. Hopefully, I will have some data in the next couple of weeks.






Jeff writes for FanGraphs, The Hardball Times and Royals Review, as well as his own website, Baseball Heat Maps, with his brother Darrell. In tandem with Bill Petti, he won the 2013 SABR Analytics Research Award for Contemporary Analysis. Follow him on Twitter @jeffwzimmerman.


Blue
Guest
5 years 6 months ago

The lowest chance is 27 percent. Wow.

Barkey Walker
Guest
5 years 6 months ago

These numbers need confidence intervals.

Barkey Walker
Guest
5 years 6 months ago

I guess I’ll note that these are not entirely straightforward to build, but they are easier than you think. You just use the linear part of the model and build CIs like you usually do, and then map those CIs back to the 0/1 space.

Barkey Walker
Guest
5 years 6 months ago

I’m telling you that approximate CIs are easier than you think. The program that you use should be able to pop out prediction intervals, but if it can’t, you can use the variance-covariance (VC) matrix from the last iteration of the weighted least squares to construct them just like you would for a regular weighted regression. The only difference is that you then have to take those predictions and run them through the inverse logit (logistic) function at the end to get the results on the interval [0,1].
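The approach described in this comment can be sketched roughly as follows: build a normal-theory interval on the linear predictor, then push both endpoints through the logistic (inverse logit) function. The coefficients, covariance matrix, and feature values below are hypothetical placeholders, not the fitted DL model.

```python
import math

def logistic(z):
    """Inverse of the logit: maps a real-valued score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def probability_interval(x, beta, cov, z_crit=1.96):
    """Approximate CI for a fitted logistic-regression probability.

    x    : feature vector, with a leading 1.0 for the intercept
    beta : fitted coefficients, same length as x
    cov  : variance-covariance matrix of beta (list of lists)
    """
    n = len(x)
    eta = sum(xi * bi for xi, bi in zip(x, beta))            # linear predictor
    var = sum(x[i] * cov[i][j] * x[j] for i in range(n) for j in range(n))
    se = math.sqrt(var)                                      # std. error of eta
    # Interval on the linear scale, mapped back to the probability scale.
    return logistic(eta - z_crit * se), logistic(eta), logistic(eta + z_crit * se)

# Hypothetical model: intercept, age, prior DL trips (illustrative values only).
beta = [-2.0, 0.04, 0.3]
cov = [[0.25, -0.006, 0.0],
       [-0.006, 0.0002, 0.0],
       [0.0, 0.0, 0.01]]
x = [1.0, 30.0, 1.0]  # a 30-year-old with one prior DL trip

low, point, high = probability_interval(x, beta, cov)
print(f"P(DL) = {point:.3f}, approx. 95% CI ({low:.3f}, {high:.3f})")
```

Because the logistic function is monotonic, the mapped endpoints always stay inside [0,1], which a symmetric interval built directly on the probability would not guarantee. Note that this is an interval for the estimated probability, not for the 0/1 outcome itself.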

baty
Guest
5 years 6 months ago

Yeah I feel like this list just sort of restates the obvious of the past…

I find it hard to believe that…
Ubaldo
Lincecum
Lester
Volstad
Cain

5 very different pitching builds, different pitch selections, different pitching strains, would all be lumped together within a 0.3% probability.

Also I would tack on an extra 15% to any pitcher that has to throw for Dusty Baker, haha.

camisadelgolf
Guest
5 years 5 months ago

Again with Dusty-Baker-destroys-arms propaganda . . . Please name a single pitcher with decent pitching mechanics (i.e. not Kerry Wood or Mark Prior) that Baker has destroyed with overuse.

Jesse Wolfersberger
Member
5 years 6 months ago

Might also be worth adding another age variable. If you include both age and age-squared, you can get a parabolic shape on the variable.

This is purely conjecture, but I think that very young pitchers have an increased risk of injury just as very old pitchers do. Might be worth examining.
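To illustrate the age and age-squared suggestion: with both terms in the linear predictor and a positive coefficient on age-squared, the modeled risk becomes U-shaped, with its minimum at age = -b_age / (2 * b_age2). The coefficients below are invented for the illustration and are not estimates from any DL data.

```python
import math

def dl_probability(age, b0, b_age, b_age2):
    """Logistic model with a quadratic age term (hypothetical coefficients)."""
    eta = b0 + b_age * age + b_age2 * age ** 2
    return 1.0 / (1.0 + math.exp(-eta))

# Made-up coefficients; b_age2 > 0 produces the U shape.
b0, b_age, b_age2 = 7.6, -0.58, 0.01

# Vertex of the parabola in the linear predictor: the lowest-risk age.
lowest_risk_age = -b_age / (2 * b_age2)

for age in (22, 29, 36):
    print(age, round(dl_probability(age, b0, b_age, b_age2), 3))
```

With only a linear age term, the model could never show higher risk at both ends of the age range, so the quadratic term is what lets the data confirm or reject the conjecture.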

Also, any thought to including innings pitched instead of games started? Or maybe innings per start? Seems like this is usually used for this type of equation instead of games started.

Good work though, I’m looking forward to seeing the results with pitch speed as well.

baty
Guest
5 years 6 months ago

Stuff that I wonder about when it comes to the difference between myth and reality…

innings per season?
innings per start?
pitches per start?
pitches per inning?
average days rest between starts?

For younger pitchers I always wonder how their progressed use throughout their minor league career contributes…

Is there any difference whatsoever between a drafted high school pitcher’s history and a drafted college pitcher’s history?

Off-speed pitch selections… Breaking ball usage compared to change-up usage… And is there an increased risk when you transition between particular breaking ball thresholds? for instance the risk of increased velocity with a 12-6 slow curve… the slurve… a slider transitioning to a cut fastball…

(I guess that could be something like the velocity differentials between certain pitches… Is a pitcher putting increased strain on his arm when they have unusually high or low pitch velocity differentials with breaking balls and fastball types.)

baty
Guest
5 years 6 months ago

To clarify… I wonder how Clayton Kershaw might have affected his injury probability with his pitch selection shift…

Going from a reliance on that big curve with a 20 MPH difference to his fastball to using a frequent slider. It could be twofold… pitch stress and a more probable increase in control/command development (efficiency)?

Holier
Guest
5 years 6 months ago

So Hudson is extremely likely because of his TJ surgery a few years ago? Part of it would be age of course, but I think that probably overestimates his probability in your projections.

I like the process behind this, and I think when you look at the results (projections) and see stuff like this, it can make you question your formulas, which can lead you to tweak or confirm your methodology, neither of which, of course, is bad. I think there’s something interesting here though, even if what you have now is early in the process.

Jack
Guest
5 years 6 months ago

Any chance we could get the team names in a column next to the pitcher’s name? That way you could see which rotations are most at risk. Then maybe in a later post you can see which clubs have the best minor league help and compare that to DL risk.

WilsonC
Guest
5 years 6 months ago

Tim Hudson seems a bit off as one of the highest risk candidates.

Looking at Hudson, he’s been quite durable over his career, but missed a lot of games over 2008-2009 due to TJ. Chris Carpenter’s overall playing time statistics over the past three years match Hudson’s closely, and he’s about the same age, but his injury history is more extensive.

In Hudson’s case, he’s hit for 2 DL trips for one injury, and a lot of missed starts, but how does past TJ surgery actually affect a pitcher’s future risk once he’s made a recovery? With pitchers, TJ is such a common injury that causes enough missed time that it may need to be treated as a separate case in order to prevent confounding the results.

baycommuter
Guest
5 years 6 months ago

It needs a better mechanics component than prior DL trips. For example, does anyone really think there’s a 37% chance Barry Zito goes on the DL? The guy was taught perfect mechanics by Randy Jones when he was 10 years old and practiced them every day in the mirror for years so they’re constantly repeatable. One of the reasons the Giants gave him that contract is that Boras pushed hard on the “never been injured” theme. In fact, it’s kind of a problem because the Giants can’t hide him on the DL during his periods of ineffectiveness. (Zito’s protege, Dan Haren, who copied his warmup techniques in Oakland, is also unlikely to be injured).

Zac
Guest
5 years 6 months ago

Just want to point out that not all injuries are due to poor mechanics. Over the last 3 years, players have ended up on the DL for: anxiety, appendectomy/appendicitis, blisters, bone chips, blood clots, bruises, bursitis, concussions, fractures, infections, shin splints, and shingles, among many others.
Does it all add up to a 37% chance of ending on the DL? Probably not. But it’s not 0% either. How much it should be is up for debate, and I’m sure Jeff would love to get more accurate numbers if he could find a way to do so.

Adam
Guest
5 years 6 months ago

Wow, only a 42% chance Johan Santana ends up on the DL! And here I was, thinking he was recovering from surgery and would be on the DL until at least August. Phew!

Hank
Guest
5 years 6 months ago

One possible future suggestion. Maybe instead of using DL trips in the equation, that number could be weighted by injury type?

This of course would lead to a big increase in subjectivity going into the formula, but does a sprained ankle deserve the same weighting as a shoulder injury in terms of number of DL trips? (I realize this gets factored in a bit as it will impact games started)

I’m not sure how you would come up with the weightings, but I just thought I’d throw that out there as a possibility.
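One way this suggestion could look in practice is to replace the raw DL-trip count with a weighted sum. The injury categories and weights below are entirely hypothetical; choosing them is exactly the subjective step the comment warns about.

```python
# Hypothetical weights: arm injuries count more, minor leg injuries less.
INJURY_WEIGHTS = {"shoulder": 1.5, "elbow": 1.5, "ankle": 0.5}

def weighted_dl_trips(trips, weights=INJURY_WEIGHTS, default=1.0):
    """Sum of per-trip weights; injury types not in `weights` count as `default`."""
    return sum(weights.get(injury, default) for injury in trips)

# Illustrative history: two elbow stints and one sprained ankle.
history = ["elbow", "elbow", "ankle"]

raw_count = len(history)
weighted_count = weighted_dl_trips(history)
print(raw_count, weighted_count)
```

The weighted value would then stand in for the DL-trip variable when refitting the formula.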

hamandcheese
Member
5 years 6 months ago

I could be wrong, but it seems like a decline in fastball velocity should be taken into account as well.

Sodomojostrikesback
Guest
5 years 6 months ago

I ran the equation for Erik Bedard, just for kicks. 75%!

Zac
Guest
5 years 6 months ago

It’s not fair to run the numbers on Bedard because the calculations specifically excluded people who started fewer than 20 times the year before. We have no idea if the numbers calculated for that subset can be extrapolated to a more general population.

Sodomojostrikesback
Guest
5 years 6 months ago

I didn’t mean to imply that Erik Bedard was a fair use of this calculation. It simply was out of curiosity since he is comically lacking in the three variables used. Obviously someone who doesn’t pitch the entire year (at least in the majors) won’t generate statistically significant data.

statistician
Guest
5 years 6 months ago

“I will tell you that I am 100% confident that the player will either be on the DL or won’t be.”

JZ’s response here is simply incorrect. It’s true that it’s not easy in practice to get the confidence interval for an estimated percent from logistic regression, but it’s incorrect to suggest that it’s not possible or that the question doesn’t make sense.

The presentation as a whole leaves a lot to be desired; as it is, I’m highly skeptical that what JZ did was done appropriately.

Matt
Guest
5 years 1 month ago

““I will tell you that I am 100% confident that the player will either be on the DL or won’t be.”

JZ’s response here is simply incorrect”

Are you saying that the chance that the player will either be on the DL or won’t be is NOT 100%? I hope you don’t work on any statistics I’m reading, in that case. The sum of probabilities of the sample space is generally accepted to be 100%.

statistician
Guest
5 years 6 months ago

JZ: would you be willing to post the data somewhere so I can take a look at it (I don’t know why sabermetricians don’t do this more often)? I realize my last comment may have come across as overly antagonistic, and I think it’d be more productive for me to take a stab at what you’re working on than to sit around whining.

trackback

[…] Zimmerman is working on a DL prediction system for pitchers, and guess who’s expected to be the 2nd most fragile pitcher in the Majors this year? Our own […]

Ben
Guest
5 years 6 months ago

Wouldn’t this be something that might be better off modeled as a Decision Tree using a binary variable of “will player X go on the DL” or “will player X go on the DL for more than Y total days”?

Love the dataset and the avenue you are pursuing.

trackback

[…] there’s another virtual certainty for Harden – he’ll hit the DL at some point. Per Jeff Zimmerman’s excellent work on predicting DL stints for starters, Harden has a 51% chance of hitting the DL in 2011. To some, that may sound low, but the bottom end […]

trackback

[…] recently posted a projection formula (here and here) that estimated the chance of a starting pitcher spending time on the disabled list. To say the […]
