Applying KATOH to Historical Prospects

Over the last few weeks, I have written a series of posts looking into how a player’s stats, age, and prospect status can be used to predict whether he’ll ever play in the majors. I analyzed hitters in Rookie leagues, Short-Season A, Low-A, High-A, Double-A, and Triple-A using a methodology that I named KATOH (after Yankees prospect Gosuke Katoh), which consists of running a probit regression analysis. In a nutshell, a probit regression tells us how a variety of inputs can predict the probability of an event that has two possible outcomes — such as whether or not a player will make it to the majors. While KATOH technically predicts the likelihood that a player will reach the majors, I’d argue it can also serve as a decent proxy for major league success. If something makes a player more likely to make the majors, there’s a good chance it also makes him more likely to succeed there.
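To make the probit idea concrete, here is a minimal sketch of how such a model turns inputs into a probability: it takes a weighted sum of the inputs and passes it through the standard normal CDF. The feature names, coefficients, and intercept below are made up purely for illustration; they are not KATOH's actual inputs or fitted values.

```python
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Standard normal CDF, which a probit model uses as its link function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def probit_probability(intercept: float, coefficients: dict, features: dict) -> float:
    """Return P(event) = Phi(intercept + sum of coefficient * feature value)."""
    z = intercept + sum(coefficients[name] * value for name, value in features.items())
    return normal_cdf(z)


# Hypothetical coefficients, NOT the fitted KATOH values.
INTERCEPT = 6.0
COEFS = {"age": -0.30, "iso": 2.0, "bb_pct": 1.5, "k_pct": -2.5, "top100": 0.9}

# Two hypothetical hitters with identical stats who differ only in age.
young = {"age": 19, "iso": 0.180, "bb_pct": 0.10, "k_pct": 0.18, "top100": 1}
old = dict(young, age=24)

p_young = probit_probability(INTERCEPT, COEFS, young)
p_old = probit_probability(INTERCEPT, COEFS, old)
print(f"19-year-old: {p_young:.1%}, 24-year-old: {p_old:.1%}")
```

Because the normal CDF is bounded by 0 and 1, the output can always be read as a probability, and with a negative age coefficient the younger hitter scores higher when everything else is equal.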

After receiving a few requests, I decided to apply the model to players of years past. In what follows, I dive into what KATOH would have said about recent top prospects, look at the highest KATOH scores of the last 20 years, and highlight some instances where KATOH missed the boat on a prospect. If you’re feeling really ambitious, here’s a giant Google Doc of KATOH scores for all 40,051 player seasons since 1995 (minimum 100 plate appearances in a short-season league or 200 in full-season ball).
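The playing-time cutoff above can be sketched as a simple filter. This is only an illustration: the level codes match the ones used in the tables in this post, and treating both Rookie ball and Short-Season A as "short-season" is an assumption on my part.

```python
# Assumption: Rookie ("R") and Short-Season A ("A-") both count as short-season.
SHORT_SEASON_LEVELS = {"R", "A-"}


def qualifies(level: str, plate_appearances: int) -> bool:
    """Apply the 100 PA (short-season) or 200 PA (full-season) minimum."""
    minimum = 100 if level in SHORT_SEASON_LEVELS else 200
    return plate_appearances >= minimum


# Hypothetical player seasons: (level, plate appearances).
samples = [("A-", 120), ("AA", 150), ("AAA", 450), ("R", 99)]
kept = [s for s in samples if qualifies(*s)]
print(kept)
```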

Before I delve into the parade of lists, I want to add one disclaimer. KATOH was derived from the performances of historical players, so applying the model to those same players might make it look a little better than it is. Take Jason Stokes, for example. Although he was a very well-regarded prospect in the early 2000s (#15 and #51 per Baseball America in 2003 and 2004), KATOH consistently gave him probabilities in the 70s and 80s. Part of that is likely because Stokes’ own data points were incorporated into the model. If I had created KATOH in 2005, Stokes’ MLB% may have been a few percentage points higher. Even so, a few data points generally aren’t enough to substantially change a model that incorporates thousands. In other words, it’s probably safe to assume that a player’s MLB% using today’s KATOH is roughly in line with what he would have received at the time.

Now, onto the results. Here’s what KATOH thought about some of the most recent top 100 prospects:

2013 Top 100 Prospects

Player Year Age Level MLB Probability
Xander Bogaerts 2013 20 AA 99.888%
Xander Bogaerts 2013 20 AAA 99.869%
George Springer 2013 23 AAA 99.816%
Gregory Polanco 2013 21 AA 99.614%
Nick Castellanos 2013 21 AAA 99.608%
Kolten Wong 2013 22 AAA 99.428%
Wil Myers 2013 22 AAA 99.418%
Miguel Sano 2013 20 A+ 99.335%
Tyler Austin 2013 21 AA 99.194%
Jackie Bradley 2013 23 AAA 99.079%
Kaleb Cowart 2013 21 AA 99%
Byron Buxton 2013 19 A+ 98%
Francisco Lindor 2013 19 A+ 98%
Christian Yelich 2013 21 AA 97%
Byron Buxton 2013 19 A 97%
Addison Russell 2013 19 A+ 97%
Billy Hamilton 2013 22 AAA 96%
Brian Goodwin 2013 22 AA 96%
Carlos Correa 2013 18 A 96%
Slade Heathcott 2013 22 AA 96%
Javier Baez 2013 20 A+ 95%
Jake Marisnick 2013 22 AA 95%
Albert Almora 2013 19 A 95%
Jonathan Singleton 2013 21 AAA 94%
Mike Zunino 2013 22 AAA 94%
Alen Hanson 2013 20 A+ 94%
Gregory Polanco 2013 21 A+ 92%
Javier Baez 2013 20 AA 91%
Jorge Soler 2013 21 A+ 90%
Gary Sanchez 2013 20 A+ 89%
Austin Hedges 2013 20 A+ 89%
Mike Olt 2013 24 AAA 87%
Miguel Sano 2013 20 AA 83%
George Springer 2013 23 AA 82%
Mason Williams 2013 21 A+ 78%
Trevor Story 2013 20 A+ 61%
Bubba Starling 2013 20 A 61%
Courtney Hawkins 2013 19 A+ 58%
Roman Quinn 2013 20 A 58%

2012 Top 100 Prospects

Player Year Age Level MLB Probability
Jurickson Profar 2012 19 AA 99.975%
Anthony Rizzo 2012 22 AAA 99.947%
Manny Machado 2012 19 AA 99.937%
Billy Hamilton 2012 21 AA 99.856%
Oscar Taveras 2012 20 AA 99.827%
Kolten Wong 2012 21 AA 99.824%
Nolan Arenado 2012 21 AA 99.759%
Leonys Martin 2012 24 AAA 99.737%
Nick Franklin 2012 21 AA 99.737%
Yasmani Grandal 2012 23 AAA 99.714%
Wil Myers 2012 21 AAA 99.659%
Andrelton Simmons 2012 22 AA 99.566%
Travis D’Arnaud 2012 23 AAA 99.512%
Jedd Gyorko 2012 23 AAA 99.493%
Hak-Ju Lee 2012 21 AA 99.492%
Jonathan Singleton 2012 20 AA 99.482%
Nick Castellanos 2012 20 AA 99.465%
Jonathan Schoop 2012 20 AA 99.443%
Jean Segura 2012 22 AA 99.423%
Nick Castellanos 2012 20 A+ 99.051%
Starling Marte 2012 23 AAA 99.015%
Anthony Gose 2012 21 AAA 99%
Rymer Liriano 2012 21 AA 99%
Jake Marisnick 2012 21 AA 99%
Xander Bogaerts 2012 19 A+ 98%
Michael Choice 2012 22 AA 98%
Gary Brown 2012 23 AA 98%
Christian Yelich 2012 20 A+ 98%
Nick Franklin 2012 21 AAA 97%
Javier Baez 2012 19 A 97%
Brett Jackson 2012 23 AAA 96%
Zack Cox 2012 23 AAA 92%
Mason Williams 2012 20 A 91%
Gary Sanchez 2012 19 A 89%
Jake Marisnick 2012 21 A+ 88%
Francisco Lindor 2012 18 A 88%
Cheslor Cuthbert 2012 19 A+ 87%
Miguel Sano 2012 19 A 86%
Billy Hamilton 2012 21 A+ 83%
George Springer 2012 22 A+ 80%
Christian Villanueva 2012 21 A+ 80%
Mike Olt 2012 23 AA 79%
Matt Szczur 2012 22 A+ 78%
Rymer Liriano 2012 21 A+ 76%
Blake Swihart 2012 20 A 66%
Cory Spangenberg 2012 21 A+ 64%
Bubba Starling 2012 19 R 17%

2011 Top 100 Prospects

Player Year Age Level MLB Probability
Mike Trout 2011 19 AA 99.973%
Brett Lawrie 2011 21 AAA 99.969%
Anthony Rizzo 2011 21 AAA 99.911%
Wil Myers 2011 20 AA 99.654%
Christian Colon 2011 22 AA 99.495%
Brandon Belt 2011 23 AAA 99.414%
Austin Romine 2011 22 AA 99.393%
Jesus Montero 2011 21 AAA 99.379%
Devin Mesoraco 2011 23 AAA 99.205%
Brett Jackson 2011 22 AAA 99.199%
Dustin Ackley 2011 23 AAA 99.196%
Yonder Alonso 2011 24 AAA 99%
Lonnie Chisenhall 2011 22 AAA 99%
Zack Cox 2011 22 AA 98%
Jason Kipnis 2011 24 AAA 98%
Mike Moustakas 2011 22 AAA 98%
Desmond Jennings 2011 24 AAA 98%
Jonathan Villar 2011 20 AA 98%
Matt Dominguez 2011 21 AAA 98%
Jurickson Profar 2011 18 A 97%
Bryce Harper 2011 18 A 97%
Tony Sanchez 2011 23 AA 97%
Dee Gordon 2011 23 AAA 97%
Grant Green 2011 23 AA 97%
Manny Machado 2011 18 A+ 97%
Nolan Arenado 2011 20 A+ 96%
Chris Carter 2011 24 AAA 96%
Travis D’Arnaud 2011 22 AA 96%
Wilmer Flores 2011 19 A+ 95%
Jose Iglesias 2011 21 AAA 95%
Hak-Ju Lee 2011 20 A+ 94%
Brett Jackson 2011 22 AA 93%
Jonathan Singleton 2011 19 A+ 92%
Joe Benson 2011 23 AA 91%
Gary Sanchez 2011 18 A 86%
Wilin Rosario 2011 22 AA 86%
Nick Castellanos 2011 19 A 85%
Nick Franklin 2011 20 A+ 83%
Jean Segura 2011 21 A+ 82%
Cesar Puello 2011 20 A+ 82%
Derek Norris 2011 22 AA 76%
Jonathan Villar 2011 20 A+ 73%
Aaron Hicks 2011 21 A+ 68%
Billy Hamilton 2011 20 A 61%
Miguel Sano 2011 18 R 44%
Josh Sale 2011 19 R 15%

Next, let’s take a look at some of the highest KATOH scores of all time, namely those of players who received a score of at least 99.9%. There aren’t any complete busts among these players, as virtually all of them went on to play in the majors.

All-Time Top KATOH Scores

Player Year Age Level MLB Probability
Sean Burroughs 2000 19 AA 99.998%
Luis Castillo 1996 20 AA 99.995%
Fernando Martinez 2007 18 AA 99.994%
Daric Barton 2005 19 AA 99.992%
Alex Rodriguez 1995 19 AAA 99.992%
Carl Crawford 2001 19 AA 99.992%
Elvis Andrus 2008 19 AA 99.992%
Adam Dunn 2001 21 AAA 99.990%
Joe Mauer 2003 20 AA 99.989%
Ryan Sweeney 2005 20 AA 99.984%
Nick Johnson 1999 20 AA 99.984%
Jose Tabata 2009 20 AA 99.983%
Jose Tabata 2008 19 AA 99.983%
Travis Snider 2009 21 AAA 99.981%
Joaquin Arias 2005 20 AA 99.980%
Matt Kemp 2006 21 AAA 99.979%
Jose Reyes 2002 19 AA 99.979%
Jurickson Profar 2012 19 AA 99.975%
Mike Trout 2011 19 AA 99.973%
Jay Bruce 2008 21 AAA 99.971%
Brett Lawrie 2011 21 AAA 99.969%
B.J. Upton 2004 19 AAA 99.959%
Howie Kendrick 2006 22 AAA 99.951%
Ryan Howard 2005 25 AAA 99.951%
Dioner Navarro 2004 20 AA 99.950%
Luis Rivas 1999 19 AA 99.949%
Lastings Milledge 2005 20 AA 99.948%
Anthony Rizzo 2012 22 AAA 99.947%
Billy Butler 2006 20 AA 99.946%
Fernando Martinez 2008 19 AA 99.944%
Alberto Callaspo 2004 21 AA 99.944%
Jose Lopez 2003 19 AA 99.939%
Freddie Freeman 2010 20 AAA 99.939%
Manny Machado 2012 19 AA 99.937%
Rickie Weeks 2005 22 AAA 99.935%
Casey Kotchman 2004 21 AAA 99.932%
Eric Chavez 1998 20 AAA 99.930%
Adrian Beltre 1998 19 AA 99.927%
Shannon Stewart 1995 21 AA 99.917%
Anthony Rizzo 2011 21 AAA 99.911%
Karim Garcia 1995 19 AAA 99.910%
Jay Bruce 2007 20 AAA 99.907%
Jeff Clement 2008 24 AAA 99.902%
Miguel Cabrera 2003 20 AA 99.900%

All of the players who registered a KATOH score of at least 99.9% did so while playing in either Double-A or Triple-A. This isn’t all that surprising, since these are the levels closest to the big leagues. But what about the lower levels? As we saw in Double-A and Triple-A, there weren’t any complete busts among the highest-ranking hitters from full-season A-ball. For both full-season leagues, each of the 20 top-ranked players has either made it to the majors or, in the case of Carlos Correa, is young enough to still have an excellent chance to do so. But on the bottom two rungs of the minor league ladder, we come across a few instances where KATOH whiffed, most notably on Garrett Guzman (74%), Richard Stuart (72%), and Pat Manning (72%).

Top KATOH Scores for Seasons in High-A

Player Year Age Level MLB Probability
Adrian Beltre 1997 18 A+ 99.863%
Andruw Jones 1996 19 A+ 99.568%
Giancarlo Stanton 2009 19 A+ 99.405%
Billy Butler 2005 19 A+ 99.348%
Miguel Sano 2013 20 A+ 99.335%
Chris Snelling 2001 19 A+ 99.241%
Jason Heyward 2009 19 A+ 99.097%
Andy LaRoche 2005 21 A+ 99.091%
Wilmer Flores 2010 18 A+ 99.075%
Nick Castellanos 2012 20 A+ 99.051%
Jose Reyes 2002 19 A+ 99%
Casey Kotchman 2003 20 A+ 99%
Vernon Wells 1999 20 A+ 99%
Travis Lee 1997 22 A+ 99%
Brandon Wood 2005 20 A+ 98%
Xander Bogaerts 2012 19 A+ 98%
Justin Huber 2003 20 A+ 98%
Aramis Ramirez 1997 19 A+ 98%
Jay Bruce 2007 20 A+ 98%
Byron Buxton 2013 19 A+ 98%

Top KATOH Scores for Seasons in Low-A

Player Year Age Level MLB Probability
Mike Trout 2010 18 A 99%
Adrian Beltre 1996 17 A 98%
Jurickson Profar 2011 18 A 97%
Bryce Harper 2011 18 A 97%
Sean Burroughs 1999 18 A 97%
Andruw Jones 1995 18 A 97%
Byron Buxton 2013 19 A 97%
Jason Heyward 2008 18 A 97%
Corey Patterson 1999 19 A 97%
Vladimir Guerrero 1995 20 A 97%
Javier Baez 2012 19 A 97%
Ian Stewart 2004 19 A 96%
Lastings Milledge 2004 19 A 96%
Carlos Correa 2013 18 A 96%
Prince Fielder 2003 19 A 96%
Delmon Young 2004 18 A 96%
Josh Vitters 2009 19 A 96%
Chad Hermansen 1996 18 A 95%
Wilmer Flores 2010 18 A 95%
B.J. Upton 2003 18 A 95%

Top KATOH Scores for Seasons in Short-Season A

Player Year Age Level MLB Probability Played in Majors
Chris Snelling 1999 17 A- 82% 1
Richard Stuart 1996 19 A- 72% 0
Aramis Ramirez 1996 18 A- 71% 1
Ryan Kalish 2007 19 A- 71% 1
Cory Spangenberg 2011 20 A- 66% 0
Hanley Ramirez 2002 18 A- 66% 1
Wilson Betemit 2000 18 A- 65% 1
Ismael Castro 2002 18 A- 65% 0
Vernon Wells 1997 18 A- 64% 1
Carlos Figueroa 2000 17 A- 61% 0
Carson Kelly 2013 18 A- 61% 0
Pablo Sandoval 2005 18 A- 60% 1
Dan Vogelbach 2012 19 A- 59% 0
Manny Ravelo 2000 18 A- 57% 0
Chip Ambres 1999 19 A- 57% 1
Maikel Franco 2011 18 A- 55% 0
Jurickson Profar 2010 17 A- 55% 1
Derek Norris 2008 19 A- 54% 1
Cesar Saba 1999 17 A- 54% 0
Edinson Rincon 2009 18 A- 52% 0

Top KATOH Scores for Seasons in Rookie ball

Player Year Age Level MLB Probability Played in Majors
Jeff Bianchi 2005 18 R 76% 1
Justin Morneau 2000 19 R 74% 1
Addison Russell 2012 18 R 74% 0
Garrett Guzman 2001 18 R 74% 0
James Loney 2002 18 R 74% 1
Prince Fielder 2002 18 R 73% 1
Pat Manning 1999 19 R 72% 0
Wilmer Flores 2008 16 R 70% 1
Alex Fernandez 1998 17 R 70% 0
Dorssys Paulino 2012 17 R 69% 0
Tony Blanco 2000 18 R 69% 1
Hank Blalock 1999 18 R 69% 1
Joe Mauer 2001 18 R 69% 1
Hanley Ramirez 2002 18 R 69% 1
Ramon Hernandez 1995 19 R 68% 1
Angel Salome 2005 19 R 68% 1
Marcos Vechionacci 2004 17 R 67% 0
Gary Sanchez 2010 17 R 66% 0
Scott Heard 2000 18 R 65% 0
Jose Tabata 2005 16 R 65% 1

Now for KATOH’s biggest whiffs. Looking at seasons prior to 2011, the following players had very high KATOH ratings but never made it to baseball’s highest level. The biggest miss was Cesar King, a defensive-minded catcher from the Rangers organization, though to KATOH’s credit, King did spend five days on the Kansas City Royals’ roster in 2001 without getting into a game. Following King are a couple of busted Yankees prospects in Jackson Melian and Eric Duncan. Not to make excuses for KATOH, but these guys’ high scores may have had something to do with the way the Yankees over-hyped their prospects back then. If those two hadn’t been on Baseball America’s top 100 list, KATOH would have pegged them in the 70s rather than in the high 90s.

KATOH’s Biggest Misses

Player Year Age Level MLB Probability
Cesar King 1998 20 AA 99.427%
Jackson Melian 2000 20 AA 99%
Eric Duncan 2005 20 AA 98%
Matt Moses 2006 21 AA 98%
Juan Williams 1995 21 AA 98%
Jeff Natale 2005 22 AA 97%
Eric Duncan 2006 21 AA 97%
Nick Weglarz 2010 22 AAA 96%
Nick Weglarz 2009 21 AA 96%
Tony Mota 1999 21 AA 95%
Micah Franklin 1998 26 AAA 94%
Billy Martin 2003 27 AAA 94%
Bill McCarthy 2004 24 AAA 94%
Jackson Melian 1999 19 A+ 94%
Tagg Bozied 2004 24 AAA 94%
Kevin Grijak 1995 23 AAA 93%
Angel Villalona 2008 17 A 93%
Danny Dorn 2010 25 AAA 93%
Nic Jackson 2003 23 AAA 92%
Pat Cline 1997 22 AA 92%
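A list like the one above can be produced with a simple filter and sort: keep seasons before 2011 by players who never reached the majors, then rank by MLB%. The sketch below is illustrative only. The `Season` record and `biggest_misses` helper are hypothetical names, and the handful of rows is copied from the tables in this post rather than from the full data set.

```python
from typing import NamedTuple


class Season(NamedTuple):
    player: str
    year: int
    level: str
    mlb_pct: float        # KATOH MLB probability, in percent
    reached_majors: bool  # outcome as recorded in the tables in this post


def biggest_misses(seasons, before_year=2011, top_n=20):
    """Highest-probability seasons by players who never reached the majors."""
    candidates = [s for s in seasons if s.year < before_year and not s.reached_majors]
    return sorted(candidates, key=lambda s: s.mlb_pct, reverse=True)[:top_n]


# A few rows taken from the tables in this post.
SEASONS = [
    Season("Miguel Cabrera", 2003, "AA", 99.900, True),
    Season("Jackson Melian", 2000, "AA", 99.0, False),
    Season("Cesar King", 1998, "AA", 99.427, False),
    Season("Carson Kelly", 2013, "A-", 61.0, False),  # too recent to count as a miss
]

misses = biggest_misses(SEASONS)
for s in misses:
    print(f"{s.player} {s.year} {s.level} {s.mlb_pct}%")
```

The year cutoff matters: without it, recent prospects who simply haven’t had time to debut would be counted as misses.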

And here are the major leaguers whom KATOH deemed least likely to make it when they were in the minors. It’s worth noting that a couple of them, Jorge Sosa and Jason Roach, made it as pitchers.

Worst KATOH Scores Who Made it to the Majors

Player Year Age Level MLB Probability
Justin Christian 2004 24 A- 0.017%
Jorge Sosa 1999 21 A- 0.027%
Tyler Graham 2006 22 A- 0.087%
Gary Johnson 1999 23 A- 0.136%
Bo Hart 1999 22 A- 0.155%
Tommy Manzella 2005 22 A- 0.181%
Michael Martinez 2006 23 A- 0.185%
Eddy Rodriguez 2012 26 A+ 0.194%
Kevin Mahar 2004 23 A- 0.215%
Will Venable 2005 22 A- 0.232%
Brent Dlugach 2004 21 A- 0.268%
Sean Barker 2002 22 A- 0.270%
Steve Holm 2002 22 A- 0.301%
Edgar V. Gonzalez 2000 22 A- 0.315%
Peter Zoccolillo 1999 22 A- 0.328%
Konrad Schmidt 2007 22 A- 0.337%
Tommy Medica 2010 22 A- 0.365%
Brian Esposito 2008 29 AA 0.392%
Jason Roach 1997 21 A- 0.396%
Jorge Sosa 2000 22 A- 0.439%

KATOH’s far from perfect, but overall, I think it does a pretty decent job of forecasting which players will make it to the majors. That being said, it’s still a work in progress, and I have a few ideas rolling around in my head to improve on the model. Furthermore, I’m working to develop something that will forecast how a minor leaguer will perform upon reaching the majors, to complement his MLB%. I’ll be dropping these new and improved KATOH projections (for both hitters and pitchers) after this year’s World Series, when we’ll all be desperate for something baseball-related to get us through the winter.

Statistics courtesy of FanGraphs, Baseball-Reference, and The Baseball Cube; Pre-season prospect lists courtesy of Baseball America.

Chris works in economic development by day, but spends most of his nights thinking about baseball. He writes for Pinstripe Pundits, and is an occasional user of the twitter machine: @_chris_mitchell


2 Responses to “Applying KATOH to Historical Prospects”

  1. tz says:

    Great stuff once again Chris.

    I remember Jackson Melian as a 16-year-old whom the Yankees signed to a huge bonus. Unfortunately, at age 18 his parents were killed in a car crash while traveling in NC on a visit to watch him play in A ball. This may have something to do with his failure to make the majors; it’s hard to say.

    http://en.wikipedia.org/wiki/Jackson_Meli%C3%A1n


    • I’m a little too young to remember Jackson Melian, but have read about how highly touted he was when he signed. An experience like that definitely seems like something that could derail a kid’s career.

