I think BMI would be more useful with three categories (fat, thin, and medium), even though this would make the sample sizes smaller. We always need some sort of measure of statistical relevance: standard deviations, error bars, etc. Except possibly for age, this looks to me just like random variation.
Anything can happen, and it will be interesting to see if this model is predictive, but if Mark Buehrle ever makes fewer than 30 starts, I’m going to start waking up early every day to make sure the sun rises in the east.
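To make the three-category idea concrete, here is a rough Python sketch that bins pitchers by BMI and attaches a standard error to each group's DL rate. The cutoffs (24 and 27), the function names, and the sample records are all made up for illustration; none of them come from the article's data.

```python
import math

# Hypothetical BMI cutoffs, chosen only for illustration.
def bmi_group(bmi, low=24.0, high=27.0):
    """Assign a BMI value to one of three groups."""
    if bmi < low:
        return "thin"
    if bmi < high:
        return "medium"
    return "fat"

def dl_rate_with_se(records):
    """Return {group: (n, rate, standard_error)} for (bmi, went_on_dl) pairs."""
    counts = {}
    for bmi, on_dl in records:
        group = bmi_group(bmi)
        n, k = counts.get(group, (0, 0))
        counts[group] = (n + 1, k + int(on_dl))
    summary = {}
    for group, (n, k) in counts.items():
        p = k / n
        # Standard error of a binomial proportion: sqrt(p * (1 - p) / n)
        se = math.sqrt(p * (1 - p) / n)
        summary[group] = (n, p, se)
    return summary

# Made-up sample: (BMI, went on the DL?)
sample = [(22.5, True), (23.1, False), (25.0, False), (26.2, True),
          (25.8, False), (26.9, False), (28.4, True), (29.0, False)]

for group, (n, rate, se) in dl_rate_with_se(sample).items():
    print(f"{group}: n={n}, DL rate={rate:.2f} +/- {se:.2f}")
```

With groups this small the error bars swamp any difference between the rates, which is the sample-size trade-off mentioned above.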
Thanks for providing the data; I fired up SAS to take a look at it.
I ran a Logit regression on whether a pitcher went on the DL or not. None of the variables were explanatory.
I next ran a linear regression on DL length of stay for those who were on the DL. That yielded a one-variable model that was not significant at the alpha=.05 level (although it did make the alpha=.10 cut). The R-squared is terrible (.006). The variable that was included was BMI; higher BMI values were weakly associated with shorter stays on the DL.
Conclusion: whatever causes variance in propensity to go on the DL or length of stay on the DL isn’t apparent in this data, at least on a first cut model.
BMI is a flawed characterization of body size! It’s an old-school stat, much like the way we used to use RBIs to measure the productivity of a baseball player. Joe Carter had 12 straight years of 100+ RBIs, but we all know he shouldn’t be in the HOF and benefited from having some high-OBP guys in front of him. I have a BMI of 28.2, but my wife (who is a physiotherapist) says that I am skinny. I am wide but muscular and thin. BMI accounts only for height, not the width of a person. Being in the 95th percentile of width vastly skews my BMI: I have more mass per cm of height, but my width per cm of height is way higher than most. Apparently % body fat (the sabermetric of body characterization) is a much better indicator of physical fitness/body type and might be a better indicator of strain put on a body. Although it’s not a readily available stat, I would love to see these stats re-calced with % fat as the “counting” stat.
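For reference, BMI is just weight in kilograms divided by height in meters squared, so nothing about build or body composition enters the calculation. A tiny Python sketch, with a hypothetical function name and made-up inputs:

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by height in meters squared."""
    return weight_kg / height_m ** 2

# Any two people at 90 kg and 1.80 m get the same number,
# whether the weight is muscle on a wide frame or body fat.
print(round(bmi(90.0, 1.80), 1))  # 27.8
```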
Hurrah! Someone actually looking at statistical significance. I applaud Mr. Zimmerman for investigating this and especially for making the data available, even though the apparent conclusion is “we can’t predict who will go on the DL or for how long.”
Comment by lex logan — January 22, 2011 @ 10:37 pm
[…] Starting Pitcher Disabled List Analysis (1 of 3) (Jeff Zimmerman, FanGraphs) — see also Part 2 and Part 3 […]