# Improving a Starting Pitcher Injury Model With Multiple Classifier Algorithms

Starting pitchers are an expensive commodity in modern Major League Baseball. They pitch only every fifth day, yet they command salaries as high as those of position players who play nearly every game. Because of the stress placed on their arms, they also get injured at a higher rate, despite appearing in far fewer games. It would clearly be valuable to predict when a starting pitcher is likely to get injured. Many people have tried to develop models to do just that, including Carl Wivvag in a June 2017 publication. While his model is quite complete and delivers accurate results, we will attempt to improve upon it by using its multiple classifier algorithms in tandem.

### Review of previous model and work

Wivvag developed his model and project as part of the Insight Health Data Fellows program, a well-known fellowship in the data science domain that trains PhDs and MDs for careers in data science. Although Wivvag describes himself as a “casual baseball fan,” he’s likely just being modest, as his model incorporates many different pitching factors and statistics on several different time scales (including single-game, seven-game, and career records).

He delivered a very complete project that covers the full data science and application development spectrum: from scraping MLB.com for injury information, to developing the model, to offering a user-friendly web interface for viewing its results. Overall, I was impressed with his work, and I would like to point out that the objective of this article is not to criticize any of it. Rather, as the title suggests, my goal is to help improve upon his results and take them further.

The root problem Wivvag is trying to solve is a binary classification problem. There are two possible classes (injured or not injured), and for each pitcher supplied as input, the model must decide whether the output is “injured” or “not injured.” He actually built three different models to make predictions, using logistic regression, random forests, and neural networks, all three of which are commonly used to solve this type of problem.

In evaluating the performance of each algorithm, Wivvag chose to use the area under the ROC (receiver operating characteristic) curve, or AUC, which is a common choice in this space. An ROC curve plots a model’s true positive rate against its false positive rate as the decision threshold varies. AUC ranges from 0 to 1: 0.5 is the result of random guessing (using no model at all), 1.0 is a perfect model that always classifies correctly (which doesn’t exist in the real world), and anything below 0.5 is worse than guessing. The closer to 1.0 the AUC is, the better the model is. Wivvag reported these AUC values for his models:

- Logistic regression – 0.71
- Random forests – 0.73
- Neural networks – 0.77
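To make the metric concrete, here is a minimal pure-Python sketch of how an ROC curve and its AUC can be computed from a classifier’s scores. The helper names `roc_points` and `auc`, and the eight toy scores and labels, are hypothetical illustrations rather than Wivvag’s code or data; label 1 stands for “injured” and 0 for “not injured.”

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Point]:
    """Return the (FPR, TPR) points traced out as the decision threshold drops.

    Assumes labels are 1 (injured) or 0 (not injured) and both classes occur.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    # Highest scores first: lowering the threshold admits them one by one.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        # Tied scores cross the threshold together, so consume the whole tie group.
        current = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == current:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points: Sequence[Point]) -> float:
    """Area under a piecewise-linear ROC curve, using the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical model scores for eight pitchers (1 = later injured).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0, 1, 0]
toy_auc = auc(roc_points(scores, labels))
print(toy_auc)
```

On this toy data the AUC comes out to 0.75, which equals the fraction of (injured, not injured) pairs in which the injured pitcher received the higher score; that pairwise-ranking reading is a standard property of AUC.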

Out of these models, the neural network classifier seems to have the best performance. I decided to extract the ROC data from Wivvag’s models and analyze them in Python. Here is a plot of all three ROC curves on the same axes:

You can see that the neural network curve has the greatest area under it, confirming the previously stated result. However, you can also see that it is not always the best choice. At certain false positive rates, either the logistic regression or the random forests curve has a higher true positive rate, and in those regions the neural network is not the best choice.
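The observation that the curves cross can be checked numerically by interpolating each curve’s true positive rate at a chosen false positive rate. The sketch below uses three made-up curves (not values read off Wivvag’s plot), and the helpers `tpr_at` and `best_curve_at` are hypothetical.

```python
from typing import Dict, List, Tuple

Curve = List[Tuple[float, float]]


def tpr_at(curve: Curve, fpr: float) -> float:
    """Highest TPR a piecewise-linear ROC curve reaches at the given FPR.

    Assumes the curve's points are sorted by FPR and run from (0, 0) to (1, 1).
    """
    best = 0.0
    for (x0, y0), (x1, y1) in zip(curve, curve[1:]):
        if x0 <= fpr <= x1:
            if x1 == x0:
                # Vertical segment: every TPR between y0 and y1 is on the curve.
                y = max(y0, y1)
            else:
                # Linear interpolation along the segment.
                y = y0 + (y1 - y0) * (fpr - x0) / (x1 - x0)
            best = max(best, y)
    return best


def best_curve_at(curves: Dict[str, Curve], fpr: float) -> str:
    """Name of the curve with the highest TPR at the given FPR."""
    return max(curves, key=lambda name: tpr_at(curves[name], fpr))


# Hypothetical ROC points (FPR, TPR) for three classifiers.
curves = {
    "logistic regression": [(0.0, 0.0), (0.1, 0.4), (0.5, 0.7), (1.0, 1.0)],
    "random forests": [(0.0, 0.0), (0.2, 0.3), (0.4, 0.8), (1.0, 1.0)],
    "neural network": [(0.0, 0.0), (0.1, 0.5), (0.6, 0.85), (1.0, 1.0)],
}
for fpr in (0.1, 0.4):
    print(fpr, best_curve_at(curves, fpr))
```

With these toy curves the neural network wins at a false positive rate of 0.1, but the random forest wins at 0.4, even though the neural network has the largest area overall.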

### Using multiple detection curves

This leads us to wonder whether we can use multiple ROC curves in tandem to obtain a better result than we would achieve with any single ROC curve. The answer is yes. One paper that explores this idea was published by Foster Provost and Tom Fawcett, who proposed building what they call a “ROCCH-Hybrid” classifier. The example study and data they use deal with detecting heart disease rather than pitcher injury, but the same general concept applies.

Perhaps the biggest takeaway of the paper is illustrated in the graph on page eight:

The graph shows four classifiers, A, B, C, and D, along with the convex hull of their curves: the convex boundary that runs along the outermost (upper-left) points of the individual curves. The paper shows that the best performance obtainable from a set of individual classifiers is given by the convex hull of their ROC curves. The hull’s vertices are operating points of individual classifiers, and any point on a segment between two vertices can be reached by randomly choosing between those two classifiers in the right proportion. The math behind this proof is quite detailed, so we will accept the result as-is; if you want to see it, I direct you to the paper.
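For ROC points, the hull is simple to construct: pool every operating point from all classifiers and keep only the upper convex boundary. Below is a minimal sketch using Andrew’s monotone chain algorithm in pure Python; the function names and the three toy curves are hypothetical, not taken from the paper or from Wivvag’s models.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) x (b - o); negative means a clockwise (right) turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def roc_convex_hull(*curves: Sequence[Point]) -> List[Point]:
    """Upper convex hull of all ROC points from several classifiers.

    The trivial classifiers "never injured" (0, 0) and "always injured" (1, 1)
    are always available, so they are included explicitly.
    """
    pooled = {(0.0, 0.0), (1.0, 1.0)}
    for curve in curves:
        pooled.update(curve)
    hull: List[Point] = []
    # Sweep left to right by FPR; pop any point that would not make a right
    # turn, so the chain keeps only the upper boundary.
    for p in sorted(pooled):
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


# Hypothetical ROC points (FPR, TPR) for three classifiers.
logistic = [(0.0, 0.0), (0.1, 0.4), (0.5, 0.7), (1.0, 1.0)]
forest = [(0.0, 0.0), (0.2, 0.3), (0.4, 0.8), (1.0, 1.0)]
neural = [(0.0, 0.0), (0.1, 0.5), (0.6, 0.85), (1.0, 1.0)]
hull = roc_convex_hull(logistic, forest, neural)
print(hull)
```

On these toy curves the hull keeps the neural network’s (0.1, 0.5) point and the random forest’s (0.4, 0.8) point and drops everything else, so it lies on or above every individual curve.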

### Computing and evaluating the convex hull

So now, to compute the optimal pitcher injury ROC curve, we are left with the task of computing the convex hull of the neural network, logistic regression, and random forests ROC curves created by Wivvag. Fortunately, there are ready-made software packages for that task, so we don’t have to write the code from scratch. I used the scipy.spatial.ConvexHull class from the SciPy library for Python. No joke, this class allowed me to compute the convex hull in one line of code!
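Here is a minimal sketch of the SciPy route, using three hypothetical toy curves rather than Wivvag’s data. Two details are assumptions of this sketch, not necessarily how the original computation was set up: the extra corner point (1, 0) is added so that the hull’s bottom edge is the false positive rate axis, and the AUC is read from `ConvexHull.volume`, which SciPy documents as the enclosed area for 2-D input (`ConvexHull.area` is the perimeter in 2-D).

```python
import numpy as np
from scipy.spatial import ConvexHull

# Hypothetical ROC points (FPR, TPR) for three classifiers; not Wivvag's data.
curves = {
    "logistic regression": [(0.0, 0.0), (0.1, 0.4), (0.5, 0.7), (1.0, 1.0)],
    "random forests": [(0.0, 0.0), (0.2, 0.3), (0.4, 0.8), (1.0, 1.0)],
    "neural network": [(0.0, 0.0), (0.1, 0.5), (0.6, 0.85), (1.0, 1.0)],
}


def trapezoid_auc(curve):
    """AUC of one piecewise-linear ROC curve (trapezoid rule)."""
    xy = np.asarray(curve, dtype=float)
    return float(np.sum(np.diff(xy[:, 0]) * (xy[:-1, 1] + xy[1:, 1]) / 2))


# Pool every operating point (dropping duplicates) and add the corner (1, 0)
# so the enclosed area is exactly the area under the ROC convex hull.
points = np.unique(
    np.array([p for c in curves.values() for p in c] + [(1.0, 0.0)]), axis=0
)
hull = ConvexHull(points)

# In 2-D, ConvexHull.volume is the enclosed area (ConvexHull.area is the perimeter).
hull_auc = hull.volume
individual = {name: trapezoid_auc(c) for name, c in curves.items()}
print(individual, hull_auc)
```

Without the extra (1, 0) corner, the hull’s area would be measured from the lower boundary of the points instead of from the axis, so it would no longer equal an AUC.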

Here is a plot of the three ROC curves along with the convex hull:

Clearly, the convex hull has a greater AUC than any of the individual curves. The SciPy convex hull class also allows computing the AUC with one line of code. It turned out to be 0.82, a nice improvement over the previous best mark of 0.77 set by the neural network curve.

This result means it is possible to build a ROCCH-Hybrid classifier that achieves an AUC of 0.82 on these ROC curves, and thus a better trade-off between true and false positives than any of the individual classifiers offers by itself. The steps involved in this process are too detailed for this article, so again I will direct you to the paper for more details.
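The paper remains the place to go for the full ROCCH-Hybrid procedure, but its core interpolation step can be sketched briefly. Assuming the operating condition is expressed as a target false positive rate, the sketch finds the hull segment containing that rate and randomly chooses between the two classifiers at the segment’s endpoints, so that the expected (FPR, TPR) lands on the hull. The hull vertices, the function names `hull_mixture` and `hybrid_predict`, and the stand-in classifiers are all hypothetical.

```python
import random
from typing import Callable, List, Tuple

Point = Tuple[float, float]


def hull_mixture(hull: List[Point], target_fpr: float) -> Tuple[Point, Point, float, float]:
    """Locate target_fpr on the hull and return (left, right, p_right, expected_tpr).

    Each hull vertex stands for one classifier at one threshold. Using the right
    vertex's classifier with probability p_right, and the left one otherwise,
    gives expected FPR target_fpr and expected TPR expected_tpr.
    """
    for (x0, y0), (x1, y1) in zip(hull, hull[1:]):
        if x0 <= target_fpr <= x1 and x1 > x0:
            p_right = (target_fpr - x0) / (x1 - x0)
            return (x0, y0), (x1, y1), p_right, y0 + p_right * (y1 - y0)
    raise ValueError("target FPR is outside the hull's FPR range")


def hybrid_predict(
    left_clf: Callable[[dict], int],
    right_clf: Callable[[dict], int],
    p_right: float,
    pitcher: dict,
    rng: random.Random,
) -> int:
    """Randomized hybrid: defer to one of the two vertex classifiers per prediction."""
    clf = right_clf if rng.random() < p_right else left_clf
    return clf(pitcher)


# Hypothetical ROC convex hull vertices (FPR, TPR).
hull = [(0.0, 0.0), (0.1, 0.5), (0.4, 0.8), (1.0, 1.0)]
left, right, p_right, tpr = hull_mixture(hull, target_fpr=0.2)
print(left, right, round(p_right, 3), round(tpr, 3))
```

In this toy case, operating at a 20% false positive rate means using the (0.4, 0.8) classifier about one time in three and the (0.1, 0.5) classifier otherwise, for an expected true positive rate of 0.6.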

### Conclusion

It seems like everybody in the baseball analytics community is trying to invent or discover the “next big thing” in sabermetrics. Some ideas, like Voros McCracken’s theory of BABIP, come out of nowhere and make a huge impact. However, I believe that we can also make improvements to existing ideas by combining things that we already know, and the above analysis of pitcher injury prediction outlines one specific example of that. This process can be extended to predictions of any kind, and perhaps there are other domains in sabermetrics that could also benefit from using multiple classifiers in tandem.

## Comments

Great article. Do you see yourself building the ROCCH-hybrid classifier in the future? In practice, how much would an improvement of AUC from 0.77 to 0.82 improve the classification of pitcher injury?

Thanks. Unfortunately I don’t know of a good way to make a quantitative prediction on how much the pitcher injury classification results improve without having access to the code and model.

Note that my previous comment meant that I’m not sure how the accuracy improves with the hybrid classifier (which I assume was the intent of the question). The increase in AUC is itself a measure of improved performance, as is the increase in true positive rate at each false positive rate.

Even before completing the article, my answer to the headline is a resounding yes. Right now, our methodology appears to be: is he a starting pitcher? Does his fastball go above 90 mph? Yeah, he is likely going to need Tommy John surgery at some point.

I notice that most of the convex hull’s gained area would come from convexifying the NN curve alone. What does this procedure *do* over a single classifier?

How does the hybrid classifier address overfitting? Some of the exact jittery shape of a given curve is noise, and stitching pieces of those together will not gain everything it thinks it’s gaining.

Great piece of work. I wonder if this wasn’t akin to the system the Pirates have employed?

I know some of these words.

This was very cool.

I went to check out Carl Wivagg’s original article, and I noticed that the link to the web interface for the model seems to be broken. Does anyone know if it still exists in some form? http://www.baseballinjurypredict.tech/