Improving a Starting Pitcher Injury Model With Multiple Classifier Algorithms

Max Scherzer is one of the most durable and highly paid starting pitchers in the game. (via Arturo Pardavila III)

Starting pitchers are an expensive commodity in modern major league baseball. They only pitch every fifth day, but they command as high a salary as position players who play nearly every game. Due to the high level of stress put on their arms, they also tend to get injured at a higher rate, despite playing in many fewer games. Obviously, it would be valuable to be able to predict when a starting pitcher is likely to get injured. Many people have tried to develop models to do just that, including Carl Wivvag in a June 2017 publication, which can be found here. While this model is quite complete and delivers accurate results, we will attempt to improve upon it by using its multiple classifier algorithms in tandem.

Review of previous model and work

Wivvag developed his model and project as part of the Insight Health Data Fellows program. This is a very well known fellowship program in the data science domain that trains PhDs and MDs for careers in data science. Although Wivvag describes himself as a “casual baseball fan,” he’s likely just being modest, as his model incorporates many different pitching factors and statistics, on several different time scales (including single-game, seven-game, and career records).

He delivered a very complete project, which spans the full data science and application development spectrum, from scraping MLB.com for injury information, to developing the model, and finally to offering a user-friendly web interface to view the results of the model. Overall, I was impressed with his work, and would like to point out that the objective of this article is not to criticize any of the work he did. Rather, as the title of the article suggests, my goal is to help improve his results and take them further.

The root problem Wivvag is trying to solve is a binary classification problem. There are two possible classes (injured or not injured), and for each pitcher supplied as input, he needs to decide whether the output is “injured” or “not injured.” He actually built three different models with which to make predictions, using the logistic regression, random forests, and neural network algorithms, all three of which are commonly used to solve this type of problem.

In evaluating the performance of each algorithm, Wivvag chose to use the AUC (area under the curve) metric of ROC (receiver operating characteristic) curves, which is a common choice in this space. For any useful model, AUC is a number between 0.5 and 1.0, where 0.5 is the result of random guessing (using no model at all) and 1.0 is a perfect model that always classifies properly (which doesn’t exist in the real world). The closer to 1.0 the AUC is, the better the model is. Wivvag determined these AUC values from his models:

  • Logistic regression – 0.71
  • Random forests – 0.73
  • Neural networks – 0.77
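To make the AUC metric concrete, here is a minimal sketch of how the area under an ROC curve can be computed with the trapezoid rule. The function name and the sample points are illustrative assumptions, not Wivvag's code or data; the two examples are just the random-guessing diagonal and an idealized perfect classifier.

```python
def auc_trapezoid(fpr, tpr):
    """Area under an ROC curve whose points are sorted by increasing FPR."""
    area = 0.0
    for i in range(1, len(fpr)):
        width = fpr[i] - fpr[i - 1]
        avg_height = (tpr[i] + tpr[i - 1]) / 2.0
        area += width * avg_height
    return area


# Hypothetical curves (not taken from Wivvag's models):
# the diagonal is random guessing, the step curve is a perfect classifier.
random_guess = auc_trapezoid([0.0, 1.0], [0.0, 1.0])
perfect = auc_trapezoid([0.0, 0.0, 1.0], [0.0, 1.0, 1.0])
```

The diagonal yields 0.5 and the step curve yields 1.0, matching the two endpoints described above.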

Out of these models, the neural network classifier seems to have the best performance. I decided to extract the ROC data from Wivvag’s models and analyze them in Python. Here is a plot of all three ROC curves on the same axes:

You can see that the neural networks curve has the greatest area under it, confirming the previously stated result. However, you can also see that it is not always the best choice. For certain false positive rates, either the logistic regression or the random forests curve has a higher true positive rate, and in those cases, neural networks are not the best choice.
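The point about curves crossing can be checked numerically. The sketch below, which uses made-up curves and hypothetical helper names (`tpr_at`, `best_at`), interpolates each curve at a chosen false positive rate and reports which one is on top there.

```python
def tpr_at(fpr_points, tpr_points, target_fpr):
    """Linearly interpolate one ROC curve's TPR at target_fpr."""
    for i in range(1, len(fpr_points)):
        x0, x1 = fpr_points[i - 1], fpr_points[i]
        if x0 <= target_fpr <= x1:
            if x1 == x0:
                # Vertical segment: take the higher of the two endpoints.
                return max(tpr_points[i - 1], tpr_points[i])
            frac = (target_fpr - x0) / (x1 - x0)
            return tpr_points[i - 1] + frac * (tpr_points[i] - tpr_points[i - 1])
    raise ValueError("target_fpr is outside the curve's FPR range")


def best_at(curves, target_fpr):
    """Name of the curve with the highest TPR at target_fpr."""
    return max(curves, key=lambda name: tpr_at(*curves[name], target_fpr))


# Made-up curves as (FPR list, TPR list); not the article's data.
curves = {
    "model_a": ([0.0, 0.2, 1.0], [0.0, 0.6, 1.0]),
    "model_b": ([0.0, 0.5, 1.0], [0.0, 0.9, 1.0]),
}
low_fpr_winner = best_at(curves, 0.1)
mid_fpr_winner = best_at(curves, 0.5)
```

With these invented numbers, one model wins at a low false positive rate and the other wins at a moderate one, which is the same crossing behavior visible in the plot.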

Using multiple detection curves

This leads us to wonder whether it is possible to use multiple ROC curves in tandem to obtain a better result than we’d achieve with a single ROC curve. The answer is yes. One paper that explores this idea is this one, published by Foster Provost and Tom Fawcett. They proposed building what they call a “ROCCH-Hybrid” classifier. The example study and data they use deal with detecting heart disease instead of pitcher injury, but the same general concept applies.

Perhaps the biggest takeaway of the paper is illustrated in the graph on page eight:

The graph shows four classifiers—A, B, C, and D—and also the convex hull of these curves, which is a composite curve connecting the outer boundaries of each individual curve. The paper proves that the optimal classifier performance that can be obtained from a set of individual classifiers is equivalent to the convex hull curve of that set of classifiers. The math behind this proof is quite detailed, so we will accept the result as-is. If you want to see the math, I direct you to read the paper.
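For readers who want to see what the convex hull of a set of ROC curves looks like in code, here is a small pure-Python sketch using a standard left-to-right upper-hull sweep. The function names and sample points are assumptions for illustration; this is not code from the paper.

```python
def cross(o, a, b):
    """Cross product of OA and OB; negative means a clockwise (right) turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def roc_convex_hull(points):
    """Upper convex hull of (FPR, TPR) points, running from (0, 0) to (1, 1)."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull = []
    for p in pts:
        # Drop the last vertex while it sits on or below the line to p.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


# Hypothetical points pooled from several classifiers' ROC curves.
pooled = [(0.2, 0.6), (0.3, 0.5), (0.5, 0.9)]
hull = roc_convex_hull(pooled)
```

In this toy input the point (0.3, 0.5) lies below the line joining its neighbors, so it is dropped and the hull keeps only the outer boundary.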

Computing and evaluating the convex hull

So now, to compute the optimal pitcher injury ROC curve, we are left with the task of computing the convex hull of the neural network, logistic regression, and random forests ROC curves created by Wivvag. Fortunately, there are ready-made software packages to handle that task, so we don’t have to write the code from scratch. I used the scipy.spatial.ConvexHull class from Python’s SciPy package. No joke, this class allowed me to compute the convex hull in one line of code!

Here is a plot of the three ROC curves along with the convex hull:

Clearly, the convex hull has a greater AUC than any of the individual curves. The SciPy convex hull class also allows computing the AUC with one line of code. It turned out to be 0.82, which is a nice improvement over the previous best mark of 0.77 set by the neural networks curve.
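The article does not include the code itself, so the following is only one plausible way to get both the hull and its AUC from scipy.spatial.ConvexHull. The ROC points are invented, and adding the corner point (1, 0) is an assumption made here so that the hull's enclosed area equals the area under the upper boundary.

```python
import numpy as np
from scipy.spatial import ConvexHull

# Hypothetical (FPR, TPR) rows pooled from several classifiers.
roc_points = np.array([
    [0.0, 0.0],
    [0.2, 0.6],
    [0.3, 0.5],
    [0.5, 0.9],
    [1.0, 1.0],
])

# Assumption: add the corner (1, 0) so the hull encloses everything
# under the curves, down to the FPR axis.
closed = np.vstack([roc_points, [1.0, 0.0]])
hull = ConvexHull(closed)

# For 2-D input, `volume` is the enclosed area; `area` is the perimeter.
hull_auc = hull.volume

# Indices into `closed` of the points that lie on the hull.
hull_indices = set(hull.vertices.tolist())
```

Note that for two-dimensional input it is the `volume` attribute, not `area`, that holds the enclosed area.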


This result means it is possible to build a ROCCH-Hybrid classifier that will achieve an AUC of 0.82, and thus achieve more accurate injury predictions than using individual classifiers by themselves. The steps involved in this process are too detailed for this article, so again I will direct you to the paper for more details.
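One ingredient of the hybrid approach is easy to sketch: a point on a hull segment between two classifiers can be reached, in expectation, by randomly choosing between them with a suitable probability. The code below illustrates only that interpolation idea with invented operating points and hypothetical function names; it is not the full ROCCH-Hybrid procedure from the paper.

```python
import random


def mixing_weight(point_a, point_b, target_fpr):
    """Probability of using classifier B so the blend's expected FPR is target_fpr."""
    fpr_a, fpr_b = point_a[0], point_b[0]
    if fpr_a == fpr_b or not (fpr_a <= target_fpr <= fpr_b):
        raise ValueError("target_fpr must lie between two distinct FPR values")
    return (target_fpr - fpr_a) / (fpr_b - fpr_a)


def expected_point(point_a, point_b, weight_b):
    """Expected (FPR, TPR) when B is used with probability weight_b, else A."""
    fpr = (1 - weight_b) * point_a[0] + weight_b * point_b[0]
    tpr = (1 - weight_b) * point_a[1] + weight_b * point_b[1]
    return fpr, tpr


def choose_classifier(weight_b, rng=random):
    """Pick which classifier handles the next pitcher."""
    return "B" if rng.random() < weight_b else "A"


# Hypothetical adjacent hull vertices as (FPR, TPR).
point_a = (0.2, 0.6)
point_b = (0.5, 0.9)
weight_b = mixing_weight(point_a, point_b, 0.35)
blended = expected_point(point_a, point_b, weight_b)
```

Here the target false positive rate sits halfway between the two vertices, so each classifier is used about half the time and the expected operating point falls on the segment joining them.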

Conclusion

It seems like everybody in the baseball analytics community is trying to invent or discover the “next big thing” in sabermetrics. Some ideas, like Voros McCracken’s theory of BABIP, come out of nowhere and make a huge impact. However, I believe that we can also make improvements to existing ideas by combining things that we already know, and the above analysis of pitcher injury prediction outlines one specific example of that. This process can be extended to predictions of any kind, and perhaps there are other domains in sabermetrics that could also benefit from using multiple classifiers in tandem.


Roger works as a software engineer by day, writes for The Hardball Times and FanGraphs by night, and has also worked for a Major League club.

9 Comments on "Improving a Starting Pitcher Injury Model With Multiple Classifier Algorithms"

RangingRandle

Great article. Do you see yourself building the ROCCH-hybrid classifier in the future? In practice, how much would an improvement of AUC from 0.77 to 0.82 improve the classification of pitcher injury?

v2micca

Even before finishing the article, my answer to the headline is a resounding yes. Right now, our methodology appears to be: Is he a starting pitcher? Does his fastball go above 90 mph? Yeah, he is likely going to need Tommy John at some point.

Jetsy Extrano

I notice that most of the convex hull’s gained area would come from convexifying the NN curve alone. What does this procedure *do* over a single classifier?

How does the hybrid classifier address overfitting? Some of the exact jittery shape of a given curve is noise, and stitching pieces of those together will not gain everything it thinks it’s gaining.

channelclemente

Great piece of work. I wonder if this wasn’t akin to the system the Pirates have employed?

bensnider94

I know some of these words.

Kervin

This was very cool.

I went to check out Carl Wivagg’s original article, and I noticed that the link to the web interface for the model seems to be broken. Does anyone know if it still exists in some form? http://www.baseballinjurypredict.tech/