## Heyward, Stanton, and 20-year-old studs

Eno Sarris’s recent article on Jason Heyward comps got me thinking about comps. It also happened to coincide with the day I got my Baseball-Reference subscription. That I would start looking at seasons from 20-year-olds was inevitable.

It was maybe the third or fourth thing I noticed: 2010 featured another remarkable season from a 20-year-old hitter, Mike Stanton. Here’s a fun fact about Heyward: among 20-year-olds, only two guys walked in a higher share of their plate appearances than the Braves’ young stud (Ted Williams and Mel Ott). Here’s a fun fact about Stanton: the 20-year-old closest to him in home runs per batted ball is Mel Ott, but Mike Stanton sent a greater percentage of batted balls over the fence than any age-20 hitter in the Retrosheet era. (Perhaps less fun: he has the highest K% among 20-year-olds, too.)

But who are the players most comparable to Stanton and Heyward? To answer this question, I focused on the three true-outcome rate stats (walks, strikeouts, and home runs, since those are more stable in small samples than ball-in-play stats) in seasons from 20-year-old hitters, regardless of experience. While it’s tempting to focus on rookies, there are just 102 seasons of 200+ PA from a 20-year-old since 1920, so restricting the list to similarly young rookies would shrink an already small group. To expand the group a little, I added 21-year-olds in their first season (also with a 200 PA minimum).
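
The selection rule above can be sketched as a simple filter. This is a minimal Python sketch, not the author’s code: the `Season` record and its field names are hypothetical, and applying the 1920 cutoff to the whole pool is an assumption based on the season count quoted above.

```python
from dataclasses import dataclass


@dataclass
class Season:
    """One player-season. The field names here are hypothetical."""
    player: str
    year: int
    age: int             # season age, as reported by the data source
    pa: int              # plate appearances
    first_season: bool   # True if this is the player's first MLB season


def in_comp_pool(s: Season, min_pa: int = 200, first_year: int = 1920) -> bool:
    """Age-20 seasons, plus age-21 first seasons, each with at least min_pa PA."""
    if s.pa < min_pa or s.year < first_year:
        return False
    if s.age == 20:
        return True  # any age-20 season counts, regardless of experience
    return s.age == 21 and s.first_season


# Made-up seasons, purely for illustration.
seasons = [
    Season("Player A", 2010, 20, 600, True),
    Season("Player B", 1985, 21, 450, True),
    Season("Player C", 1985, 21, 450, False),  # 21, but not a first season
    Season("Player D", 1999, 20, 150, True),   # too few PA
    Season("Player E", 1915, 20, 300, True),   # before the 1920 cutoff
]
pool = [s for s in seasons if in_comp_pool(s)]
print([s.player for s in pool])  # ['Player A', 'Player B']
```

Keeping the thresholds as parameters makes it easy to see how much the pool grows or shrinks if the 200 PA cutoff is relaxed.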

To compare these players, I developed z-scores for each player’s BB/PA, K/AB, and HR per batted ball (AB-K). (See the technical section below on these scores.) Then, treating each player’s three z-scores as a vector, I found the distance of their vector from Heyward’s and Stanton’s vectors. The *smaller* this distance, the more comparable the players are.
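
For concreteness, here is how the three rates might be computed from a season’s counting stats. This is a minimal sketch; the function name and the sample stat line are made up.

```python
def rate_stats(bb: int, k: int, hr: int, pa: int, ab: int) -> tuple[float, float, float]:
    """Return (BB/PA, K/AB, HR/(AB-K)) for one season's counting stats."""
    batted_balls = ab - k  # at-bats that didn't end in a strikeout
    if pa <= 0 or ab <= 0 or batted_balls <= 0:
        raise ValueError("need positive PA, AB, and AB - K")
    return bb / pa, k / ab, hr / batted_balls


# Hypothetical line: 500 PA, 425 AB, 60 BB, 100 K, 20 HR.
bb_rate, k_rate, hr_rate = rate_stats(bb=60, k=100, hr=20, pa=500, ab=425)
print(round(bb_rate, 3), round(k_rate, 3), round(hr_rate, 3))  # 0.12 0.235 0.062
```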

| Heyward comp | Similarity score (lower = closer) |
|---|---|
| Ed Stevens | 0.69 |
| Jimmie Foxx | 0.69 |
| Harlond Clift | 0.79 |
| Jason Thompson | 0.80 |
| Barry Bonds | 0.81 |
| Pee Wee Reese | 0.89 |
| Frankie Crosetti | 1.05 |
| Frank Brazill | 1.13 |
| Clint Hurdle | 1.15 |
| Curt Blefary | 1.18 |

| Stanton comp | Similarity score (lower = closer) |
|---|---|
| Darryl Strawberry | 0.57 |
| Eddie Mathews | 1.06 |
| Ron Swoboda | 1.30 |
| Tony Conigliaro | 1.42 |
| Jay Bruce | 1.51 |
| Adam Dunn | 1.66 |
| Willie McCovey | 1.67 |
| Mickey Mantle | 1.67 |
| Frank Robinson | 1.70 |
| Rene Lachemann | 1.76 |

(Someone will look at this and say “so-and-so isn’t comparable to Stanton/Heyward!” just by looking at the raw stats. Before you do, please note that these scores include a rough adjustment for era, and these rates have varied a lot since 1900: the league-average strikeout rate was about 8% in 1930. I’ve tried to control for these variations in the scores; see the technical discussion below for how.)

Both players comp to some of the luminaries of the game. Heyward is, overall, much more comparable to past young studs than Stanton is. With the exception of Strawberry and Mathews, no player on Stanton’s list is as comparable to him as any player on Heyward’s list is to Heyward. You can’t see it here, but Heyward is actually more comparable to Ted Williams (1.38) and Mickey Mantle (1.45) than Stanton is to Jay Bruce (1.51). Willie Mays scores a 1.67 against Heyward, so, yeah, this Heyward kid is pretty good.

**Technical Discussion**

For those interested in the technical details of how these scores are generated, read on. If you’re happy with everything so far, you can stop here; the rest of this section is just math.

The scores are generated by looking at each player’s BB/PA, K/AB, and HR/(AB-K). I chose these because they’re relatively stable in small samples and independent of one another. Above, I said I generated a z-score, but that isn’t quite right. From each player’s stats I subtracted the league-average stat in the year of that season. This is a quick-and-dirty way to make cross-era comparisons, and it is critical: when Mel Ott put up his gaudy 1929 season, for example, the league-average K rate was around 8%, and Ott was actually a tick below average in Ks. I then divided this difference by the standard deviation among all seasons from 2006 to 2010. This, by the way, is a serious limitation, and I’m not sure it works: it assumes that the dispersion of major league talent in these three stats stays fairly constant from era to era, even if the averages change a lot.
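
The adjustment described above can be sketched as follows. This is a minimal sketch under assumptions: the dictionary keys are hypothetical, and every number (including the standard deviations) is made up for illustration rather than taken from the actual 2006-2010 data.

```python
STATS = ("bb_pa", "k_ab", "hr_per_bip")  # hypothetical keys for the three rates


def era_adjusted_scores(player: dict, league: dict, sd: dict) -> tuple:
    """(player rate - league rate that season) / spread, for each of the three stats.

    `league` holds the league averages for the player's own season;
    `sd` holds one fixed standard deviation per stat (2006-2010 in the post).
    """
    return tuple((player[s] - league[s]) / sd[s] for s in STATS)


# Made-up numbers, loosely echoing a low-strikeout 1920s context.
player = {"bb_pa": 0.14, "k_ab": 0.07, "hr_per_bip": 0.07}
league_that_year = {"bb_pa": 0.09, "k_ab": 0.08, "hr_per_bip": 0.03}
sd_2006_2010 = {"bb_pa": 0.025, "k_ab": 0.05, "hr_per_bip": 0.02}

print(era_adjusted_scores(player, league_that_year, sd_2006_2010))
# roughly (2.0, -0.2, 2.0), up to float rounding
```

Note how a strikeout rate that looks tiny by modern standards still scores slightly negative (below average) once it is compared to its own league, which is the Ott point above.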

These three z-scores form a vector. Similarity is determined by the distance between the vectors: for vectors $\langle a_1, b_1, c_1 \rangle$ and $\langle a_2, b_2, c_2 \rangle$, the squared distance is

$$d^2 = (a_1 - a_2)^2 + (b_1 - b_2)^2 + (c_1 - c_2)^2.$$
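
A minimal Python sketch of the distance and of the ranking that produces comp tables like the ones above. `math.dist` returns $d$, the square root of the sum in the formula; since the scores are described as distances, this sketch reports $d$ itself. The candidate names and vectors are made up.

```python
import math


def similarity_score(u, v) -> float:
    """Euclidean distance d between two score vectors (smaller = more similar)."""
    return math.dist(u, v)


def top_comps(target, candidates: dict, n: int = 10):
    """Rank candidate players by distance from the target's vector, closest first."""
    scored = [(name, similarity_score(target, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1])[:n]


# Hypothetical score vectors <BB, K, HR>, not real players' values.
target = (1.0, 0.5, 0.5)
candidates = {
    "Comp C": (1.0, 3.5, 4.5),
    "Comp A": (1.0, 0.5, 0.5),
    "Comp B": (0.0, 0.5, 0.5),
}
for name, d in top_comps(target, candidates):
    print(f"{name}: {d:.2f}")
# Comp A: 0.00
# Comp B: 1.00
# Comp C: 5.00
```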

There are obviously some limitations to the system (besides the one already mentioned with the standard deviations). One limitation is that it weights all three stats equally, but these stats will not evolve equally. Moreover, the whole point of comps is that they are supposed to inform our judgment about a player’s development: ideally, we would weight these scores according to how much they tell us about a player’s future value.
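
One possible way to act on that last point, sketched under assumptions: scale each stat’s squared difference by a weight. The post does not implement this; the weights below are placeholders, and picking real ones would mean estimating how much each age-20 rate says about future value.

```python
import math


def weighted_distance(u, v, weights) -> float:
    """Distance where each stat's squared difference is scaled by a weight.

    Equal weights of 1 reduce to the plain Euclidean distance.
    """
    if not (len(u) == len(v) == len(weights)):
        raise ValueError("vectors and weights must have the same length")
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(u, v, weights)))


# Hypothetical weights for <BB, K, HR>; not estimated from any data.
equal = (1.0, 1.0, 1.0)
hr_heavy = (1.0, 0.5, 2.0)

u, v = (0.0, 0.0, 0.0), (3.0, 4.0, 0.0)
print(weighted_distance(u, v, equal))  # 5.0
print(weighted_distance(u, v, hr_heavy))
```

With equal weights this reproduces the unweighted score, so the current system is just the special case where every stat counts the same.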

I’d be happy to hear thoughts on how to improve the method of these similarity scores, so fire away if you have them.


Very sexy headline

Well done dude! See Barry up there in the Heyward mix, and Blefary – surprised by the other names though.

Well done, especially with the methodological caveats. With the average K rate at about 18% now versus 7% in 1929, I would be surprised if the SD were not considerably higher today. The HR rate has also changed a lot historically, the walk rate less so. I think you made a good start and you seem to see your way forward from here. Go to it!

Isn’t Blefary the name of a pokemon? O.o

Very nice article.

Great article. I can only hope that Heyward’s career numbers come close to a Foxx or Bonds.

Nicely done, really enjoyed the read.

All in all, I still like the Mays comp for JH. Surprised he and Kaline didn’t show up for Heyward.

Some might think he’s in good company with that comp list. Personally, I think that comp list is lucky to be compared with him. Just my present-day bias though…

Just a heads up–there are quite a few references to some guy named Staton instead of Stanton.

I guess this would all serve as a reason to believe Heyward is more accurately projectable than Stanton.

I’m surprised Justin Upton is nowhere to be found on these lists — maybe not enough walks?

“I then divided this sum by the standard deviation among all seasons from 2006 to 2010.”

Can you describe this a little more. Standard deviation of what?

Thanks

@Matt

I take every player with 300 or more PA in 2006 and determine his BB/PA, K/AB, and HR/(AB-K). This process is repeated for every player with 300+ PA in 2007, 2008, 2009, and 2010 as well. All of these are lumped into one big group, and I find the standard deviation in BB/PA, K/AB, and HR/(AB-K). A player’s “z-score” for BB/PA is his BB/PA minus the league BB/PA (lgBB/PA) in the season in question, divided by the standard deviation of BB/PA derived from this process. Likewise for the other two rates.
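
The pooling step in that reply can be sketched like this. It is a minimal sketch, not the author’s code: the record keys and seasons are made up, and using the population standard deviation (`statistics.pstdev`) is an assumption, since the reply doesn’t say which version was used.

```python
import statistics

STATS = ("bb_pa", "k_ab", "hr_per_bip")  # hypothetical keys for the three rates


def pooled_sds(seasons, min_pa=300, years=range(2006, 2011)):
    """Pool all qualifying player-seasons across `years` and take one SD per stat."""
    pool = [s for s in seasons if s["pa"] >= min_pa and s["year"] in years]
    if not pool:
        raise ValueError("no qualifying seasons")
    # Population SD is an assumption; the reply doesn't say which SD was used.
    return {stat: statistics.pstdev([s[stat] for s in pool]) for stat in STATS}


# Made-up seasons; the last two should be excluded by the PA and year filters.
seasons = [
    {"year": 2006, "pa": 550, "bb_pa": 0.08, "k_ab": 0.15, "hr_per_bip": 0.03},
    {"year": 2008, "pa": 420, "bb_pa": 0.10, "k_ab": 0.20, "hr_per_bip": 0.04},
    {"year": 2010, "pa": 610, "bb_pa": 0.12, "k_ab": 0.25, "hr_per_bip": 0.05},
    {"year": 2009, "pa": 120, "bb_pa": 0.30, "k_ab": 0.50, "hr_per_bip": 0.20},
    {"year": 2004, "pa": 600, "bb_pa": 0.30, "k_ab": 0.50, "hr_per_bip": 0.20},
]
print(pooled_sds(seasons))
```

Because one set of 2006-2010 SDs is applied to every era, this is exactly where the constant-dispersion assumption discussed in the post enters.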