KATOH: Forecasting Major League Pitching with Minor League Stats

Julio Urias projects to have the highest WAR of any minor-league pitcher by age 28. (via Dustin Nosler)

Shortly before the new year, I wrote a piece here at The Hardball Times introducing KATOH — a methodology for forecasting major league performance using minor league stats. Using a series of probit regression analyses, I explored how a hitter’s age and offensive statistics are predictive, across all levels of the minor leagues, from Rookie ball to Triple-A, of his future big league performance.

The result was a set of projections for each minor league hitter, which included the probability that he’d play in the majors and that he would hit certain WAR thresholds through age 28. This analysis also provided some insight into which offensive statistics are predictive of future success for players at each level of the minor leagues. The most pronounced trend, and possibly the most surprising one, relates to hitters’ walk rates. In the lower levels of the minors, they have little to no bearing on a player’s future big league success.

Today, I’m going to deal with minor league pitchers using the same type of methodology. In my original piece, I noted that these projections shouldn’t be used to replace traditional, scouting-based methodologies. Instead, they are intended to complement them, and possibly uncover statistical factors that have been overlooked. This is especially true for pitchers, whose stats take longer to stabilize, and whose stuff often matters more than stats.
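
To make the method concrete, here is a minimal sketch of how a single probit model turns a stat line into a probability. The coefficients, the feature names and the `probit_probability` helper are all hypothetical, chosen only for illustration; they are not KATOH's fitted values.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, the link function used by a probit model."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probit_probability(features, coefficients, intercept):
    """P(outcome) = Phi(intercept + sum of coefficient * feature)."""
    z = intercept + sum(coefficients[name] * value
                        for name, value in features.items())
    return normal_cdf(z)

# Hypothetical coefficients (NOT KATOH's actual estimates).
coefs = {"age": -0.30, "gs_pct": 0.80, "k_pct_adj": 4.0, "bb_pct_adj": -3.5}
intercept = 5.0

# Hypothetical 20-year-old starter with above-average K% and below-average BB%.
pitcher = {"age": 20, "gs_pct": 1.0, "k_pct_adj": 0.05, "bb_pct_adj": -0.01}
p = probit_probability(pitcher, coefs, intercept)
print(f"P(clears threshold) = {p:.3f}")
```

In a setup like this, each outcome (reaching the majors, clearing 4 WAR, clearing 6 WAR and so on) would get its own set of coefficients, which is one way to read "a series of probit regression analyses."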

The table below summarizes which stats proved significant at each minor league level. This analysis includes minor league data going back to 1991, the earliest year for which Baseball-Reference has batters-faced totals for pitchers. R+ refers to the advanced rookie leagues (the Appalachian and Pioneer Leagues), while R- includes the Arizona and Gulf Coast Leagues.

The following factors wound up being significant for pitchers at one or more minor league levels: age, the percentage of a pitcher’s games that were starts, strikeout rate, walk rate, home run rate, and handedness. Additionally, the square of a pitcher’s strikeout rate (K%^2) was significant at the Triple-A level with a negative coefficient. Essentially, this says that a high strikeout rate bodes well for pitchers in Triple-A, but the added benefit starts to diminish around 25 percent. All the performance stats (K%, BB%, HR%) have been adjusted to league average, but were not park-adjusted.

Significant Statistics by Level
Level  Age  GS%  K%   BB%  HR%  Handedness  K%^2
AAA    Yes  Yes  Yes  Yes  Yes  -           Yes
AA     Yes  Yes  Yes  Yes  Yes  Yes         -
A+     Yes  Yes  Yes  Yes  -    -           -
A      Yes  Yes  Yes  Yes  Yes  -           -
A-     Yes  Yes  Yes  -    Yes  -           -
R+     Yes  Yes  Yes  -    -    -           -
R-     Yes  Yes  Yes  -    -    Yes         -
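
The league adjustment and the squared strikeout term can be sketched as follows. How KATOH actually adjusts to league average is not spelled out, so the ratio-to-league form below is an assumption, as are the coefficient values, which were picked only so that the marginal benefit visibly shrinks as strikeout rate climbs.

```python
def rate(events, batters_faced):
    """Per-batter rate, e.g. strikeouts / batters faced."""
    return events / batters_faced

def league_adjusted(player_rate, league_rate):
    """Assumed adjustment: player rate relative to league average (1.0 = average)."""
    return player_rate / league_rate

def strikeout_contribution(k, b1, b2):
    """Contribution of K% to the linear predictor when a squared term is included."""
    return b1 * k + b2 * k ** 2

def marginal_benefit(k, b1, b2):
    """Derivative of the contribution with respect to K%."""
    return b1 + 2 * b2 * k

# Hypothetical numbers, for illustration only.
k_pct = rate(120, 500)                # 24% strikeout rate
k_adj = league_adjusted(k_pct, 0.20)  # relative to a 20% league average
B1, B2 = 6.0, -1.5                    # a negative B2 gives diminishing returns

for k in (0.8, 1.0, 1.2, 1.4):
    print(k, round(strikeout_contribution(k, B1, B2), 3),
          round(marginal_benefit(k, B1, B2), 3))
```

With a negative coefficient on the squared term, each extra bit of strikeout rate still helps over this range, but by less and less, which is the pattern described for Triple-A.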

Unsurprisingly, both a pitcher’s age and his percentage of games started are predictive in the direction you would expect. Pitchers who are young for their level are more likely to be successful than older prospects, and starting pitchers are generally more successful than those who work in relief. Strikeout rate is also a very important predictor of future success, especially in short-season leagues, where other metrics don’t tell us anything about future success.

Just as we saw with hitters, a pitcher’s walk rate is not at all predictive for appearances in the lowest levels of the minor leagues. However, once a pitcher reaches full-season ball, a one percent change in walk rate immediately becomes almost as useful as a one percent change in strikeout rate in forecasting future performance. This differs from what I found for hitters, whose walk rates mean very little below Double-A, and don’t become as useful as strikeout rate until the Triple-A level.

Home run rate is another metric that’s pretty much meaningless in the lower levels of the minors. Although it starts to add some predictive value as early as A-ball, its effect is relatively small for pitchers below Double-A and Triple-A. Relative to strikeout percentage, a one percent increase in a pitcher’s home run rate matters about twice as much in Triple-A as it would for a pitcher in Low-A or Short-Season A.

Another interesting finding is the significance of a pitcher’s handedness. For pitchers in Double-A and in the lower rung of Rookie ball, a righty is more likely to blossom into a successful big leaguer than a lefty, all else being equal. Don’t read too much into this, however, as the effect isn’t large enough to make a noticeable difference in the projections. It’s hard to say exactly why a righty is more likely to succeed than a lefty with comparable stats, but my guess is that it has something to do with minor league hitters facing lefties with unusual deliveries, like throwing sidearm, for the first time.

One variable I wish I could include here is a pitcher’s height. It’s generally accepted that taller pitchers have an advantage over shorter ones: they can throw the ball on a more downhill plane and usually release it closer to home plate. That’s why you often hear evaluators refer to a pitching prospect as “projectable” if he’s over 6-foot-3, implying that there’s some extra potential left for him to unlock. I would therefore imagine that a height variable would be significant with a positive coefficient. In other words, if a 6-foot-4 pitcher and a 5-foot-11 pitcher both had the same stat line, I would guess that the taller guy would be more likely to succeed.

Unfortunately, I can’t say that for sure, since I couldn’t find height data for minor leaguers in a readable format. But I’m hoping to add height into the mix for future projections. So if you have any suggestions on how I could track down this type of data, please let me know.

Predicting how a minor league hitter will perform in the majors is no easy task, and doing it for pitchers is even harder. As the saying goes: “There’s no such thing as a pitching prospect.” Countless pitching prospects have put up crazy minor league numbers only to flame out without establishing themselves at the big league level. Often it’s due to injury: we’ll never know what could have been for promising pitchers like Ryan Anderson and Nick Neugebauer. But sometimes healthy pitching prospects, like Rick Ankiel and Salomon Torres, just turn into pumpkins once they reach the majors. Then there are guys like Johan Santana and Roy Oswalt, who turn into bona fide aces after scuffling for a few years in the minors. It’s a crapshoot.

Because of all of this uncertainty, the top KATOH projections for pitchers tend to run lower than they do for hitters — minor league pitching stats just aren’t as predictive as hitting stats. KATOH is pretty wishy-washy on most pitchers, especially when they’re in the low minors. For pitchers in short-season ball, nearly all of the KATOH projections are clumped relatively close together, even when simply estimating the probability that a player will make it to the majors. Aside from what he does in Triple-A, a pitcher’s minor league stats can tell us only so much, so unless a low-level pitcher is really tearing things up, KATOH is going to peg him somewhere close to the median.

While KATOH has a significantly harder time projecting pitchers than batters, it succeeds and fails in similar ways. Just as with hitters, it fares pretty well for players in the high minors, but has a really tough time with guys at the lower levels, especially when you look at the higher WAR thresholds. For low-level pitchers, the only thing it can tell us with any sort of certainty is whether or not they’ll crack the big leagues before their 29th birthdays.

Considering all pitchers with a KATOH projection since 1991, the table below shows the average residual (the difference between KATOH’s prediction and what actually happened, either zero or 100 percent) divided by the average prediction at each level. The greener the box, the better job KATOH did of guessing right.

[Table image: average residual divided by average prediction, by minor league level]
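
The accuracy measure can be sketched in a few lines. Whether the residuals are averaged as absolute values is not stated, so the absolute value used here is an assumption, and the predictions and outcomes are made up for illustration.

```python
def residual_ratio(predictions, outcomes):
    """Mean absolute residual divided by mean prediction.

    predictions: probabilities in [0, 1]
    outcomes: 1 if the pitcher cleared the threshold, else 0
    """
    if len(predictions) != len(outcomes) or not predictions:
        raise ValueError("need two non-empty lists of equal length")
    n = len(predictions)
    mean_residual = sum(abs(p - y) for p, y in zip(predictions, outcomes)) / n
    mean_prediction = sum(predictions) / n
    return mean_residual / mean_prediction

# Hypothetical projections and what actually happened.
preds = [0.9, 0.8, 0.2, 0.1]
actual = [1, 1, 0, 0]
ratio = residual_ratio(preds, actual)
print(round(ratio, 3))
```

Dividing by the average prediction puts levels and thresholds with very different base rates on a comparable scale; under this definition, a smaller ratio means the projections landed closer to what happened.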

Now for the fun part. But before you immerse yourself in the projections, keep a couple of things in mind:

1) Be wary of projections for pitchers in the lower levels of the minors. As I outlined above, the projections become less and less reliable with every step you take down the minor league ladder. Low minors projections are a best guess based on the available data, but are very much subject to error.

2) Pay attention to sample sizes. Many of the pitchers listed in my Google doc may have excellent projections over a small number of innings. Take these with as many grains of salt as you would a pitcher’s FIP over the same number of innings.

Unsurprisingly, the list of top KATOH projections (minimum 200 batters faced, or about 50 innings pitched) reads like a “who’s who” of top pitching prospects in baseball. But if you look further down the list, you’ll find a few lesser-known guys who may not have knockout stuff, but still managed to get batters out in 2014 despite being young for their level.
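
The sample-size cutoff is easy to apply when browsing the full list. The sketch below filters on batters faced and converts to approximate innings, assuming about four batters faced per inning, which is the ratio implied by 200 batters faced being roughly 50 innings. The records and field names are hypothetical.

```python
MIN_BATTERS_FACED = 200
BATTERS_PER_INNING = 4.0  # assumed; implied by 200 BF being about 50 IP

def approx_innings(batters_faced):
    """Rough innings estimate from batters faced."""
    return batters_faced / BATTERS_PER_INNING

def qualified(pitchers, min_bf=MIN_BATTERS_FACED):
    """Keep only pitchers who faced at least min_bf batters."""
    return [p for p in pitchers if p["bf"] >= min_bf]

# Hypothetical records, for illustration only.
sample = [
    {"name": "Pitcher A", "bf": 412},
    {"name": "Pitcher B", "bf": 95},
    {"name": "Pitcher C", "bf": 200},
]
for p in qualified(sample):
    print(p["name"], approx_innings(p["bf"]))
```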

Top KATOH Projections
Name               Age  Org  ’14 Level   MLB  WAR >4  WAR >6  WAR >8  WAR >10  WAR >12  WAR >16  WAR Thru Age 28
Julio Urias        17   LAD  A+          91%  49%     49%     49%     45%      45%      43%      12.2
Noah Syndergaard   21   NYM  AAA         99%  69%     63%     61%     50%      50%      23%      11.5
Luis Severino      20   NYY  A/A+/AA     79%  39%     36%     30%     24%      22%      18%      7.2
Jose Berrios       20   MIN  A+/AA/AAA   84%  35%     33%     31%     24%      23%      16%      7.0
Tyler Glasnow      20   PIT  A+          85%  32%     29%     28%     22%      20%      19%      6.9
Taijuan Walker     21   SEA  A+/AA/AAA   94%  42%     36%     36%     18%      13%      12%      6.7
Clayton Blackburn  21   SF   R-/AA       90%  44%     38%     27%     21%      13%      12%      6.5
Henry Owens        21   BOS  AA/AAA      93%  39%     33%     24%     17%      14%      11%      6.2
Archie Bradley     21   ARI  R-/AA/AAA   79%  26%     25%     24%     24%      23%      11%      5.7
Marcos Molina      19   NYM  A-          63%  24%     21%     21%     21%      21%      13%      5.3
Rafael Montero     23   NYM  R-/A+/AAA   91%  36%     29%     22%     20%      17%      4%       5.3
Matt Wisler        21   SD   AA/AAA      92%  36%     29%     25%     13%      8%       7%       5.2
Kyle Hendricks     24   CHC  AAA         94%  42%     32%     21%     17%      13%      1%       5.1
Lucas Giolito      19   WAS  A           76%  33%     27%     24%     18%      12%      7%       5.0
Hunter Harvey      19   BAL  A           77%  33%     28%     22%     18%      12%      7%       5.0
Daniel Norris      21   TOR  A+/AA/AAA   86%  26%     23%     18%     14%      12%      9%       5.0
Mark Binford       21   KC   A+/AA/AAA   76%  27%     25%     23%     16%      13%      8%       4.9
Victor Sanchez     19   SEA  AA          84%  24%     24%     22%     14%      9%       8%       4.8
Jake Thompson      20   TEX  AA          78%  23%     20%     17%     13%      11%      8%       4.5
Zach Davies        21   BAL  AA          88%  28%     24%     16%     11%      6%       5%       4.4

As I did with hitters, I put together a document that includes a projection for every player who threw a pitch in affiliated baseball last year. Again, sample size caveats apply. I also made another document that includes all of the minor league seasons that went into making these forecasts, which covers all prospects who were 28 or older in 2014. Have fun browsing these projections, and don’t hesitate to reach out to me if you have any questions or suggested improvements.


Chris works in economic development by day, but spends most of his nights thinking about baseball. He writes for Pinstripe Pundits, FanGraphs and The Hardball Times. He's also on the twitter machine: @_chris_mitchell. None of the views expressed in his articles reflect those of his daytime employer.
Joshua_C

Excellent stuff, as with the KATOH hitters piece. Would be interesting to brainstorm different factors that could improve the model’s predictive power. Fastball velocity? Height (you mention this)? Number of pitches thrown? Swinging strike %?

obsessivegiantscompulsive

I totally agree. Also, some analysts have been using K%-BB% as an indicator of how good a pitcher is, which they said was better than K/BB. You can calculate both of them with your existing data set and see which one does a better job for your analysis. That would be an interesting finding: whether or not these indicators work in the minors for indicating future performance.

Jeremy Losak

Chris, just as interesting a read as your hitter projections article. One question though: you mentioned that the stats you used were league adjusted but not park adjusted. HR% seems like a stat that is nearly useless without the context of home park (because even within some leagues, ballpark sizes and air density vary). Would you consider taking an xFIP-like approach and just using fly ball percentage (which I believe is available at the minor league level)? This may bias your results altogether because it is singling out fly ball pitchers, but maybe some other approach that adjusts for…

evo34

I agree that ignoring park factors is prob. the biggest problem with the methodology here. Even if exact numbers are not available for all seasons, I think using estimates based on recent park factors would be very helpful. E.g., http://minorleaguecentral.com/parkfactors

Mike Green

It seems to me that you need to have something based on IP in a season. Urias has started almost all of his games, but is averaging less than 4 innings per start. His projection would be the same if he was averaging 5.5 innings per start, and this does not make sense. He is essentially making starts that are more like long relief outings.

It is very interesting that Syndergaard rates as far and away the best of the high minors prospects. That is probably right.

tz

I agree with Syndergaard’s ranking. Putting together a 3.70 FIP in the best hitter’s park in the PCL is nothing to sneeze at. His minor-league stat profile is the pitchers’ equivalent of Mookie Betts. With his stuff and mechanics, his upside is very, very high.

tz

Chris, great job once again. Regardless of the details, I think you’ve developed a very strong framework for projecting future value of minor leaguers.

Tim

I know this isn’t productive to your research at all, but I find the projections for established major league pitchers highly amusing.

E.g., Clayton Kershaw is projected for 2.9 WAR by age 28.

Mike

I’m always hearing to take a prospect’s minor league splits into account. It seems like a pitcher with a large statistical difference between lefties and righties won’t end up being a starter, but rather a situational reliever. I’m guessing that this might only apply once the player reaches full-season ball or maybe even later. Do you think there is a way to evaluate this? OPS+ difference between righties and lefties?