Using High-A Stats to Predict Future Performance

Last week, I looked into how a player’s low-A stats — along with his age and prospect status at the time — can be used to predict whether he’ll ever play in the majors. I used a methodology that I named KATOH (after Yankees prospect Gosuke Katoh), which consists of running a probit regression analysis. In a nutshell, a probit regression tells us how a variety of inputs can predict the probability of an event that has two possible outcomes — such as whether or not a player will make it to the majors. While KATOH technically predicts the likelihood that a player will reach the majors, I’d argue it can also serve as a decent proxy for major league success. If something makes a player more likely to make the majors, there’s a good chance it also makes him more likely to succeed there.
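To make the probit idea concrete, here is a minimal sketch of how such a model turns a set of inputs into a probability: it takes a weighted sum of the inputs and passes it through the standard normal CDF. The coefficient values, the intercept, and the input names below are all hypothetical, chosen only for illustration. They are not the fitted KATOH values, and the signs are assumptions.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF, Phi(z), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probit_probability(intercept: float, coefficients: dict, inputs: dict) -> float:
    """Probit model: P(event) = Phi(intercept + sum of coefficient * input)."""
    z = intercept + sum(coefficients[name] * inputs[name] for name in coefficients)
    return normal_cdf(z)


# HYPOTHETICAL values for illustration only; not the fitted KATOH coefficients.
HYPOTHETICAL_INTERCEPT = 5.0
HYPOTHETICAL_COEFS = {
    "age": -0.30,    # assumed sign: older for the level lowers the probability
    "k_rate": -4.0,  # assumed sign: more strikeouts lowers the probability
    "iso": 5.0,
    "babip": 4.0,
    "top100": 1.0,   # 1 if a pre-season top 100 prospect, else 0
}

# A made-up player line, again purely illustrative.
player = {"age": 20, "k_rate": 0.20, "iso": 0.150, "babip": 0.320, "top100": 1}

p = probit_probability(HYPOTHETICAL_INTERCEPT, HYPOTHETICAL_COEFS, player)
print(f"Estimated probability of reaching the majors: {p:.1%}")
```

Because the normal CDF is bounded between 0 and 1, the output is always a valid probability, no matter how extreme the inputs are. That is the main reason to prefer a probit over an ordinary linear regression for a yes-or-no outcome.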

The factors that were predictive for players in low-A were age, strikeout rate, ISO, BABIP, and whether or not the player was deemed a top 100 prospect by Baseball America in the pre-season. Walk rate, however, was not statistically significant in predicting a player’s ascension to the majors. Today, I’ll analyze what KATOH has to say about players in class-A-advanced leagues. Here’s the R output based on all players with at least 400 plate appearances in a season in high-A from 1995-2009:

[Figure: High-A Output (R regression output; image not reproduced)]

This looks much like what I found for low-A players: walk rate isn’t significant, and everything else has a similar effect on the final probability. However, the coefficients from this model are all a tad larger than those from the low-A version, implying that high-A stats might be a bit more telling of a player’s future. Intuitively, this makes sense: the closer a player is to the big leagues, the more his stats reflect his future potential.

The full results (linked in the original post) show what KATOH spits out for all current prospects who had logged at least 250 PA in high-A as of July 7th. I also included a few notable players who fell short of the threshold, namely Joey Gallo (who checks in at a remarkable 99.8%, shown rounded to 100% in the table), Peter O’Brien, and Jesse Winker. Here’s an excerpt of the top-ranking players:

Player             Organization  Age  MLB Probability
Joey Gallo         TEX           20   100%
Corey Seager       LAD           20   99%
Carlos Correa      HOU           19   99%
Albert Almora      CHC           20   93%
Nick Williams      TEX           20   93%
D.J. Peterson      SEA           22   93%
Jesse Winker       CIN           20   91%
Orlando Arcia      MIL           19   88%
Jose Peraza        ATL           20   87%
Colin Moran        MIA           21   87%
Renato Nunez       OAK           20   86%
Tyrone Taylor      MIL           20   85%
Hunter Renfroe     SDP           22   84%
Josh Bell          PIT           21   84%
Raul Mondesi       KCR           18   83%
Daniel Robertson   OAK           20   83%
Jorge Polanco      MIN           20   81%
Dilson Herrera     NYM           20   77%
Breyvic Valera     STL           21   77%
Peter O’Brien      NYY           23   76%
Matt Olson         OAK           20   75%
Jorge Alfaro       TEX           21   75%
Patrick Leonard    TBR           21   75%
Dalton Pompey      TOR           21   73%
Billy McKinney     OAK           19   73%
Teoscar Hernandez  HOU           21   73%
Brandon Nimmo      NYM           21   72%
Jose Rondon        LAA           20   70%
Rio Ruiz           HOU           20   70%
Brandon Drury      ARI           21   70%

Next up will be double-A. Unlike A-ball, double-A tends to be a random mishmash of prospects and minor-league lifers, so it will be interesting to see how KATOH handles this wide array of players. And perhaps double-A is where a player’s walk rate finally starts to tell us something about his future success.

Statistics courtesy of FanGraphs, Baseball-Reference, and The Baseball Cube; pre-season prospect lists courtesy of Baseball America.





Chris works in economic development by day, but spends most of his nights either watching or thinking about baseball. He writes for Pinstripe Pundits, and is an occasional user of the twitter machine: @_chris_mitchell


7 Responses to “Using High-A Stats to Predict Future Performance”

  1. jim S. says:

    McKinney was traded to the Cubs in the Shark/Hammel deal.


  2. Miles says:

    It’d be interesting to see these numbers for some current major leaguers, especially if there are any great players who would’ve had really low percentages.


  3. CRPerry13 says:

    Do you use any sort of compensation for the bizarro offensive environments in high-A, especially if ISO correlates with MLB success? I would imagine that ISO and BABIP from the FSL and Carolina League are vastly more predictive of MLB success than those from the Cal League.


    • I do account for a player’s league. If a player had an ISO .100 higher than his league’s average, I adjust his ISO to be the 2014 average (an average of all A+ leagues) plus .100. So a player with league-average stats in the FSL would be treated exactly the same as a player with league-average stats in the Cal League. However, this does not account for ballpark effects.
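
The league adjustment described in the reply above amounts to keeping a player's gap to his own league's average and re-centering it on the average of all high-A leagues. A minimal sketch of that arithmetic follows. The league averages used here are hypothetical placeholders rather than real figures, and the function name is made up for illustration.

```python
def league_adjusted(stat: float, league_avg: float, overall_avg: float) -> float:
    """Keep the player's gap to his league average, re-centered on the overall average."""
    return overall_avg + (stat - league_avg)


# HYPOTHETICAL averages for illustration only; not actual league figures.
CAL_ISO_AVG = 0.150      # assumed hitter-friendly league
FSL_ISO_AVG = 0.110      # assumed pitcher-friendly league
OVERALL_ISO_AVG = 0.130  # assumed average across all high-A leagues

# Two made-up players, each with an ISO .100 above his own league's average.
cal_player = league_adjusted(0.250, CAL_ISO_AVG, OVERALL_ISO_AVG)
fsl_player = league_adjusted(0.210, FSL_ISO_AVG, OVERALL_ISO_AVG)

# Both come out at the overall average plus .100, so the model treats them alike.
print(f"{cal_player:.3f} {fsl_player:.3f}")  # prints "0.230 0.230"
```

As the reply notes, this corrects only for the league as a whole. Two players in the same league with very different home ballparks would still be treated identically.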

