Does Pitching Deep into Games Lead to More Wins?

Predicting pitcher wins is a capricious exercise; few factors have been shown to correlate meaningfully with win percentage (W%). Still, to predict wins one should consider a pitcher’s ERA, offensive support, bullpen strength, quality of the defense behind the mound, and innings pitched (IP) in a season.

In fact, research has shown that IP and ERA are the only two factors with a correlation above .30, and the two are nearly identical. In a sample of pitchers from 2003-2013, both correlations eclipsed .40.
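For concreteness, the correlations discussed here are ordinary Pearson coefficients. A minimal sketch with numpy, using made-up season totals rather than the actual 2003-2013 sample:

```python
import numpy as np

# Hypothetical per-pitcher season totals (invented, not the 2003-2013 sample).
ip = np.array([220.0, 180.0, 150.0, 210.0, 120.0, 195.0])   # innings pitched
win_pct = np.array([0.65, 0.55, 0.40, 0.60, 0.35, 0.50])    # W%

# Pearson correlation coefficient between IP and W%
r = np.corrcoef(ip, win_pct)[0, 1]
print(round(r, 3))
```

With real multi-season data the same one-liner produces the .40-ish figures cited above; squaring `r` gives the r² values used later in the piece.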

Obviously, pitching more games leads to more wins in a season, but many fantasy experts insist that pitching deep into games is an important part of earning a win as well. The theory, which I’ve seen taken for granted by experts at ESPN, CBS, Baseball Prospectus, and Rotographs, is that a starting pitcher who pitches into the 8th or 9th inning and leaves with a lead intact is more likely to be credited with the W.

However, to earn a win a starter need only pitch 5 innings. Since we know that starters are often less effective after 75 pitches or so, pulling a pitcher early and relying on a fresh bullpen that is at least league average should, in theory, be more effective than keeping the starter in the game. Dave Cameron articulated this point when drawing up a game plan for the Pirates’ all-important play-in game in October 2013, suggesting Liriano be pulled after only 3 innings. The chart below reinforces the obvious point that, except for walk rate, relievers generally eclipse starters in most skill metrics.

Figure 1

In 2013 Shelby Miller started 31 games and came away with the W a total of 15 times, earning a W% that ranked 22nd in the majors, right behind Clayton Kershaw and Anibal Sanchez. That’s impressive, but also consider that the innings-limited rookie pitched an average of 5.5 innings per start, racking up only 13 quality starts (QS), which ranked 86th in the league. QS, after all, require at least 6 innings of work while allowing no more than 3 earned runs (a 4.50 ERA or better).
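Since the quality start is a purely mechanical definition, it can be expressed directly; a small sketch (the function name and the plain-decimal innings convention are mine):

```python
def is_quality_start(innings: float, earned_runs: int) -> bool:
    """A quality start requires at least 6 IP and no more than 3 earned runs."""
    return innings >= 6 and earned_runs <= 3

# 6 innings and 3 earned runs is exactly a 4.50 ERA, and still qualifies.
print(is_quality_start(6.0, 3))   # True
print(is_quality_start(5.2, 1))   # False: not enough innings
```

Note that the worst possible quality start works out to a 4.50 ERA, which is why Miller's 5.5 IP/GS capped his QS total despite a strong W%.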

Why, then, is innings pitched per start (IP/GS) relatively so important when considering W%? I hypothesized that pitchers who are given the leeway to pitch deep into games, and hence give their bullpen a rest, are generally better at run prevention than their peers, i.e., they sport a lower ERA.

In healthcare research, where we don’t write particularly well, we love simple diagrams to explain hypothesized effects. Below is a diagram showing how one might view the relationship between various factors like ERA, IP, defense, offensive support and bullpen ERA. The perceived link between IP/GS and Pitcher Wins is confounded by ERA, which has an effect on both factors.

Pitch Efficiency

Before examining the theory that ERA accounts for the correlation between IP and W%, let’s look at another possible explanation: perhaps pitch efficiency is the key. Jordan Zimmermann was the 3rd most efficient starter (14.5 P/IP) in the majors last year and was tied for the 8th-highest W% (.68). However, the table below shows the correlation between W% and P/IP, ERA, and IP/GS among starters from 2009-2013:


Table: correlation of W% with P/IP, ERA, and IP/GS among starters, 2009-2013
While ERA and IP/GS appear to be almost equally correlated with W%, the squared correlation coefficient for P/IP was a negligible .08. Variance in pitch efficiency has little to do with variance in W%.

IP/GS: How to Measure a Confounder

There are two straightforward ways to determine whether the relationship between two variables is actually being skewed by a third factor, in this case ERA. The first is to stratify the sample by ERA and see if the relationship between IP/GS and W% still stands. If ERA is not a confounder, we would expect the correlation within each tier to remain relatively stable. As we can see in the chart below, the correlation follows no clear trend across tiers.

Figure 3

Interestingly, only the best tier of pitchers, those with an ERA less than 3.65, show any discernible relationship between W% and IP/GS, supporting the theory that those starters who have demonstrated a strong ability to prevent runs are given the chance to pitch more innings.  Among more middling pitchers, the relationship between pitching deep into games and W% is negligible.
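This stratified check is easy to sketch. The code below builds an invented sample in which ERA drives both IP/GS and W%, mimicking the hypothesized confounding (all coefficients, noise levels, and tier labels here are my own assumptions, not the article's data), then recomputes the IP/GS-W% correlation within each ERA tier:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 300

# Invented sample: ERA influences both IP/GS and W%, so ERA confounds them.
era = rng.normal(4.0, 0.7, n)
ip_per_gs = 9.0 - era + rng.normal(0, 0.8, n)
win_pct = 0.9 - 0.1 * era + rng.normal(0, 0.05, n)

df = pd.DataFrame({"ERA": era, "IP_GS": ip_per_gs, "W_PCT": win_pct})

# Stratify into ERA quartiles, then correlate IP/GS with W% inside each tier.
df["tier"] = pd.qcut(df["ERA"], 4, labels=["best", "good", "fair", "poor"])
within = {tier: g["IP_GS"].corr(g["W_PCT"])
          for tier, g in df.groupby("tier", observed=True)}

for tier, r in within.items():
    print(tier, round(r, 3))
```

Because ERA drives both variables in this toy setup, the within-tier correlations come out well below the pooled correlation, which is the signature of confounding the stratified analysis is looking for.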

The second way to measure confounding is to use a regression model. If you create a model examining how factor X predicts factor Y, introducing factor Z should not change the coefficient for X by more than 10% if Z does not have a strong pull on the relationship. For example, if we run a model showing that smoking doubles your chance of getting lung cancer, then introducing tea drinking into the equation should not change that smoking-lung cancer connection by more than 10%, unless we believe that drinking tea can also affect lung cancer and/or smoking.
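The coefficient-change check can be sketched with two plain least-squares fits; everything below is illustrative invented data (the effect sizes, seed, and the 10% threshold application are assumptions), with Z standing in for a confounder like ERA:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# Z confounds X and Y: it influences both.
z = rng.normal(0, 1, n)
x = 0.8 * z + rng.normal(0, 1, n)
y = 0.5 * z + 0.2 * x + rng.normal(0, 1, n)

def ols_coef(y, *cols):
    """Least-squares coefficients; intercept first, then one per column."""
    X = np.column_stack([np.ones(len(y)), *cols])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

b_x_alone = ols_coef(y, x)[1]        # Model 1: Y ~ X
b_x_adjusted = ols_coef(y, x, z)[1]  # Model 2: Y ~ X + Z

# Rule of thumb: a shift of more than 10% flags Z as a confounder.
change = abs(b_x_alone - b_x_adjusted) / abs(b_x_alone)
print(round(b_x_alone, 3), round(b_x_adjusted, 3), change > 0.10)
```

In this construction the X coefficient shrinks sharply once Z enters the model, the same pattern the article reports for IP/GS once ERA is controlled for.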

I’m with MGL that regression is often unnecessary in baseball research, as its results can be difficult to interpret and unnecessarily complicated. I might add that even simple linear regression rests on a series of assumptions that are not always met. With that caveat, the data in this sample are normally distributed and I kept the model as simple as possible. Model 1 examines the relationship between W% and IP/GS. Model 2 adds a third variable, ERA.


Coefficient (%)
Model 1 (IP/GS): .111
Model 2 (IP/GS, controlling for ERA): .057

All results are statistically significant. Model 1 indicates that for each one-inning increase in IP/GS, we would expect an 11-percentage-point increase in W%. Once we control for ERA, the relationship weakens considerably: each one-inning increase corresponds to only a 6-point increase in W%. The new coefficient, .057, differs from .111 by far more than 10%, so we can safely conclude that ERA is confounding this relationship, just as we found in the stratified analysis above.

Predicting Wins?

Here at FanGraphs we might mock the idea of pitcher wins, since they are mostly a byproduct of an era when pitchers did pitch deep into games and bullpens were not utilized as often or as effectively. However, when it comes to predicting wins, Will Larson has shown that projection systems like Steamer and CAIRO do a pretty good job, and are on average within 3.5-4 wins of the actual end-of-season results.

In fact, projection systems across the board are better at capturing player-to-player variation (ranking players) in counting statistics like W and strikeouts than in rate stats like ERA and WHIP.

Figure 4

While I have previously shown that QS correlate much better than W with pretty much every measure of pitcher skill we have, W% is still somewhat predictable. As long as we have yet to #killthewin, we might as well keep trying to forecast the future. 

