Ready when you are?

As part of my job with Michigan’s baseball team, I’ve spent the past couple of weeks putting together a database of every player on a 40-man roster at the end of last season, along with some biographical, draft, and performance data. If I’m lucky enough to be employed by a team someday, a task like this will likely take a few keystrokes and a streamlined SQL query. Sadly, I don’t have that data at my fingertips, and that means long hours of the joy that is data entry. Luckily, I DVR’d all 10 parts of Ken Burns’ documentary, Baseball, which helped me get through those long and tedious hours.

Part of this data was each player’s state of origin, defined as the state in which he played his high school ball. From Baseball Reference, I also took the number of years between the date each player was drafted (for the last time, if he had been drafted and then returned to school) and his major league debut. It struck me that there might be a connection between the two worth investigating.
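For anyone who wants to reproduce that measure, here is a minimal Python sketch of the definition: years from a player's last draft date to his debut. It is an illustration only, not necessarily how the database was built; the function name, the input shape, and the example dates are all made up.

```python
from datetime import date

DAYS_PER_YEAR = 365.25  # average year length, leap years included


def years_to_debut(draft_dates, debut_date):
    """Years between a player's LAST draft date and his MLB debut.

    draft_dates: every date the player was drafted (hypothetical input shape).
    """
    last_draft = max(draft_dates)  # ignore earlier drafts he did not sign from
    return (debut_date - last_draft).days / DAYS_PER_YEAR


# Made-up player: drafted out of high school, went to college, drafted again.
example = years_to_debut(
    [date(2005, 6, 7), date(2008, 6, 5)],
    date(2011, 6, 5),
)
print(round(example, 2))
```

Taking the maximum of the draft dates is what implements the "for the last time" rule for players who were drafted, went back to school, and were drafted again.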

The states of origin, as you might have guessed, are utterly dominated by three baseball hotbeds: California, Texas and Florida. Together, they produce about 40 percent of drafted current big-leaguers (for the international market, it’s even more lopsided, with 75 percent of major league players hailing from the Dominican Republic or Venezuela).

That 40 percent mark holds up in examining the sub-samples of high school and college players. While part of this comes from the fact that these are three of the four most populous states in the union, I don’t think it’s a stretch to say that these states have a higher level of competition in high school than you’d find anywhere else in the country.

I wondered, then, whether draftees from these states were more prepared for professional baseball than their counterparts from other states that produce less major league talent. If these players have been subjected to higher-level competition from an early age, it’s possible that they would experience less of an adjustment period in the minors, leading to strong performance earlier in their career. Those results might cause teams to fast-track them to the big leagues, resulting in an earlier debut than their peers.

I’ll cut to the chase here: they didn’t debut earlier, or at least not by much. However, I’ve read a few articles lately, chiefly Bill Petti’s FanGraphs article on predicting decline, that address the idea of publishing negative results. Maybe this investigation didn’t produce a significant finding, but I think it’s important to put it out there, especially considering that it might get some analyst’s creative juices flowing about other ways to address this subject.

Average years to major league debut by origin

Origin         HS draftees    College draftees
All States     4.37           3.14
California     4.41           2.98
Florida        4.26           3.11
Texas          4.11           2.80
Hotbeds        4.28           3.01
Non-Hotbeds    4.43           3.23

As you can see from this table, players from traditional baseball hotbeds outpaced draftees from other states to the big leagues by something like two tenths of a year on average. I would argue that this is likely not a significant result, at least in terms of its potential as actionable intelligence for a major league team’s draft strategy. This is especially true given two other things the dataset shows. First, it produces the entirely unsurprising result that players drafted in earlier rounds are faster to the majors. Second, among both high school and college draftees, players from the hotbeds were drafted an average of roughly three-quarters of a round before players from other states.
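To put exact numbers on "something like two tenths of a year," the gaps can be worked out directly from the Hotbeds and Non-Hotbeds rows of the table. The short Python sketch below does only that arithmetic; the variable and function names are just for illustration.

```python
# Average years to debut, copied from the Hotbeds and Non-Hotbeds rows above.
years_to_debut = {
    "HS": {"Hotbeds": 4.28, "Non-Hotbeds": 4.43},
    "College": {"Hotbeds": 3.01, "Non-Hotbeds": 3.23},
}


def hotbed_gap(group):
    """Non-hotbed average minus hotbed average, in years."""
    row = years_to_debut[group]
    return round(row["Non-Hotbeds"] - row["Hotbeds"], 2)


for group in years_to_debut:
    gap = hotbed_gap(group)
    print(f"{group}: {gap:.2f} years, about {gap * 12:.1f} months")
```

That works out to 0.15 years for high school draftees and 0.22 years for college draftees, or roughly two to three months. Averages alone can't say whether a gap that size is statistically meaningful; that would take the player-level data, including how much debut times vary within each group.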

This dataset also has several limitations that make it even more difficult to say whether there’s something of interest in the idea that players from states with better high school competition are more prepared for the game at the professional level.

I don’t have the players’ ages at signing, so we don’t know whether players who took less time to reach the majors did so in part because they were drafted at a more advanced age. We also don’t know from these data whether a player stuck in the majors; that is, for some of these players the debut might represent only a September cup o’ coffee, and they might have returned to the minors for the beginning or even all of the next season. Finally, the sample of 40-man roster players doesn’t include the players who didn’t make it, so it’s obviously quite selective, and that may introduce bias into the data.

With more resources, better data, or simply all the time in the world, there are several other approaches to this problem that might bear fruit even if this particular angle did not. It would be interesting to see whether players from the hotbeds are more prepared for their stays at the professional level by considering whether their performance is better, either at the lower levels of the minors or as a pre-arbitration player in the majors.

I’d also like to address some of the selectivity problem by determining how the big-league sample compares to the population of drafted players. If 40 percent of major league players who were subject to the draft are from these three states, does the same percentage of draftees hail from these states? Or are players from hotbeds more likely to make it because they’ve faced stiffer competition before reaching the professional game, and are potentially more prepared to handle the temporary failure that faces most prospects at some level of the minors as they adjust to more and more difficult competition?
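One simple way to frame that comparison is as a ratio: the hotbed share of big leaguers divided by the hotbed share of all draftees. The Python sketch below shows the idea under stated assumptions. The function name is hypothetical and the counts are placeholders invented purely for illustration, since the full draft population isn't in this dataset.

```python
def representation_ratio(mlb_from_hotbeds, mlb_total,
                         drafted_from_hotbeds, drafted_total):
    """Hotbed share of big leaguers divided by hotbed share of all draftees.

    A ratio above 1 would mean hotbed draftees reach the majors more often
    than their share of the draft pool predicts; below 1, less often.
    """
    mlb_share = mlb_from_hotbeds / mlb_total
    draft_share = drafted_from_hotbeds / drafted_total
    return mlb_share / draft_share


# PLACEHOLDER counts, invented for illustration only (not real draft data).
ratio = representation_ratio(
    mlb_from_hotbeds=40, mlb_total=100,
    drafted_from_hotbeds=350, drafted_total=1000,
)
print(f"{ratio:.2f}")
```

A ratio near 1 would suggest the 40 percent figure simply mirrors who gets drafted, while a ratio well above 1 would point toward hotbed players surviving the minors at a higher rate.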

There’s also the fact that I considered all players simply based on where they played their high school ball. While there’s some chance these players can stay ahead of the curve through college, another approach to the competition question would be to consider college opponents. While the data would obviously need to be adjusted for how early the players were drafted, it would be interesting to see whether draftees who play college baseball in conferences with stiffer competition have a faster route to the majors than players taken from smaller schools or those who face lower-level competition.
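As a rough illustration of what "adjusted for how early the players were drafted" could look like, the Python sketch below compares the two groups only within the same draft-round bucket. This is one simple stratified approach, not the only option (a regression on draft round would be another). The records, group labels, and bucket boundaries are all placeholders invented for the example.

```python
from collections import defaultdict
from statistics import mean

# PLACEHOLDER records invented for illustration:
# (draft-round bucket, competition group, years to debut)
players = [
    ("rounds 1-5", "strong conference", 2.5),
    ("rounds 1-5", "strong conference", 3.0),
    ("rounds 1-5", "other", 3.0),
    ("rounds 6+", "strong conference", 3.5),
    ("rounds 6+", "other", 4.0),
    ("rounds 6+", "other", 4.5),
]


def gaps_by_round(records):
    """Per round bucket: mean years for 'other' minus 'strong conference'."""
    buckets = defaultdict(lambda: defaultdict(list))
    for round_bucket, group, years in records:
        buckets[round_bucket][group].append(years)
    # Only report buckets that contain players from both groups.
    return {
        rb: mean(groups["other"]) - mean(groups["strong conference"])
        for rb, groups in buckets.items()
        if "other" in groups and "strong conference" in groups
    }


print(gaps_by_round(players))
```

Comparing within buckets keeps an early-round advantage for one group from masquerading as a competition effect, which is the same concern raised by the three-quarters-of-a-round head start the hotbed players had in this dataset.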

If there’s any takeaway, it’s this: the draft is a very complex animal, and with the new rules in place there is work to be done understanding how teams can optimally approach it. While I wasn’t able to do that in this investigation, I hope this can be a step toward isolating the next big draft inefficiency and understanding how teams can more consistently use picks to create valuable, productive, and cost-controlled big-league talent.

David P Stokes

There may be some selection bias at work here.  It’s possible that a player who isn’t from one of the 3 “hotbed” states might not be as well scouted as an equally talented and “ready” player who is from one of them, and therefore might not be drafted as soon (or at all) and might not be considered ready as quickly, even though he is.


Hotbed v. non-hotbed sounds artificial, especially if you don’t include Georgia among hotbeds. I think the key is hot state v. cold state.

Let me chime in on Colorado baseball players.  First the only major college baseball program in the state is at the Air Force Academy.  Second, when looking at high school players from Colorado, scouts say the only players worth watching are pitchers.  It is too cold too late in the season for position players to get in their work, but pitchers can play in any weather because by the nature of their position, they don’t stand around and get cold.  History bears this out.  The best players from Colorado in recent history have been Stan Williams, Goose Gossage, Roy Halladay,…

Hey Doug, this was a nice piece of research. Even without a dramatic result, thanks for sharing it.