Alphabetism in Baseball

You may already be aware of this, dear reader, but alphabet discrimination exists.  People with surnames near the beginning of the alphabet own a slight but noticeable advantage over their late-alphabet colleagues.  They appear earlier in directories, leading to more phone calls.  They receive more applause at awards ceremonies and graduations, because people tend to get tired of clapping by the time the T’s roll around.  They are even more likely to receive tenure and Nobel Prizes, according to a study by Liran Einav and Leeat Yariv, because authors of collaborative work in certain fields tend to be credited in alphabetical order.

The alphabet is important in baseball, too.  David Aardsma, despite the success he’s found in an eight-year career, is still best known for supplanting Hank Aaron as the first name in the alphabetical list of players.  This fact is the second sentence in his Wikipedia article.  People are still upset by this.

But is there alphabet discrimination in baseball?  I collected the performances of every hitter in baseball history (this is an activity which sounds far more impressive than it actually is), organized them by surname, and averaged their hitting ability, as represented by FanGraphs’ own wRC+.  The stunning and aesthetically pleasing result:

(Note: Each player’s career wRC+ is counted once, no matter how many seasons they played.  Since a superior player is more likely to last multiple seasons than an inferior player, the graph doesn’t average out at 100 even if the average player does.)
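For the curious, the effect described in that note can be sketched in a few lines of Python. The careers below are invented for illustration (they are not real players), and the function names are assumptions of this sketch, not anything FanGraphs provides. It compares counting each career once against weighting each career by plate appearances, on the rough assumption that a league-wide 100 behaves like a playing-time-weighted figure.

```python
# Hypothetical careers as (career wRC+, career plate appearances).
# Every number here is made up purely to illustrate the averaging effect.
careers = [
    (150, 9000),  # a long-lived star
    (110, 5000),  # a solid regular
    (90, 1500),   # a bench bat
    (70, 400),    # a brief call-up
    (60, 200),    # a briefer call-up
    (40, 50),     # a cup of coffee
]


def unweighted_mean(careers):
    """Count every career once, regardless of playing time."""
    return sum(wrc for wrc, _ in careers) / len(careers)


def pa_weighted_mean(careers):
    """Weight each career's wRC+ by its plate appearances."""
    total_pa = sum(pa for _, pa in careers)
    return sum(wrc * pa for wrc, pa in careers) / total_pa


if __name__ == "__main__":
    print("one vote per player:", round(unweighted_mean(careers), 1))
    print("weighted by PA:     ", round(pa_weighted_mean(careers), 1))
```

Because the short careers in this toy list are also the weak ones, giving every player one vote drags the average down relative to the playing-time-weighted figure.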

From this beautiful and concise graph we can draw several conclusions:

  • The next player whose last name starts with X will be the greatest player whose last name starts with X… of all time.
  • Having a last name beginning with a Q is the kiss of death.  In fact, the letter Q owes its recent success to the performance of Carlos Quentin; without him, the average wRC+ would be 76.
  • Other than that, not much.

But why stop there?  Why not examine hitting ability based on something even more arbitrary, such as the length of a player’s last name?

Bringing up the rear there is America’s favorite Saltalamacchia, proud owner of a career .699 OPS.  But what’s surprising is the strength of the trend.  For you kids at home with the graphing calculators, the data sports an r-squared of .69, and it jumps to .78 if we boot out a certain busted catching prospect.
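For those without a graphing calculator, here is a minimal sketch of how an r-squared for a straight-line fit can be computed, and how to recompute it after dropping one point. The group averages below are invented placeholders, not the figures behind the graph, and `r_squared` is a helper written for this sketch. For a simple one-variable linear fit, r-squared equals the squared Pearson correlation, which is what the function computes.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between xs and ys.

    For a simple least-squares line with one predictor, this equals
    the usual coefficient of determination.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)


# Hypothetical data: surname length vs. average wRC+ for that length.
# These values are invented for illustration only.
name_lengths = [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
avg_wrc_plus = [99, 96, 95, 94, 93, 92, 93, 90, 91, 88, 89, 70]

if __name__ == "__main__":
    print("all groups:          ", round(r_squared(name_lengths, avg_wrc_plus), 2))
    # Drop the last group (the lone 14-letter name) and fit again.
    print("without the 14-letter:", round(r_squared(name_lengths[:-1], avg_wrc_plus[:-1]), 2))
```

Note that an r-squared describes how tightly the points hug a line; it says nothing by itself about how many players stand behind each point.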

The causes of this, if any, lie in obscurity.  Perhaps players lose confidence when the PA announcer botches their name at home games; perhaps scouts are more likely to remember short names when scanning for talent.  Who can say?  The world is full of biases, swirling and eddying around us all.

Patrick Dubuque writes for NotGraphs and The Hardball Times, and he formerly served as the Bill Spaceman Lee Visiting Professor for Baseball Exploration at Pitchers & Poets. Follow him on Twitter @euqubud.

15 Responses to “Alphabetism in Baseball”

  1. Theo says:

    As you can see from the first graph, children, the average player is, statistically, below average.

    • Thanks for pointing that out. I added a note above to clarify the numbers.

    • Friedman says:

      I definitely thought the same thing.

      The only explanation that I can think of is that he didn’t weight wRC+ by PA. Players with high wRC+ will stay around much longer and for every Albert Pujols, there will be multiple replacement level players with subpar wRC+s. If they’re given equal weight, this might result in the below-average error.

  2. SquirrelBoy says:

    Damn! Going by this I would be an effing terrible ML Player… 8 Letters beginning with R…

    • AustinRHL says:

      Not really. 8 letters and R are both slightly below average, but not too far. By contrast, I have a hyphenated last name that comes out to twelve letters (thirteen total characters with the hyphen), which is decidedly below average. It does at least start with H, which is an average-ish letter.

    • Andrew says:

      4 letters beginning with J, second best possible outcome! (Or maybe Y is higher? Can’t tell)

  3. TheGrandslamwich says:

    Whoa whoa whoa! Graphs in Notgraphs? Not cool.

    Also, interesting stuff.

  4. AustinRHL says:

    The general trend for wRC+ versus length of name shouldn’t be surprising. There are more people with shorter last names, which means that they comprise a larger pool of people from whom the best MLB players end up being selected. It’s sort of like how it’s harder to find left-handed pitchers who throw hard. I’m too lazy to plot the frequency of letters beginning a last name versus wRC+, but there should be a positive correlation there, too.

    • Rob says:

      I’d imagine it’s more of a bell curve, with 5-6 letter last names being the most common. Your theory doesn’t explain why 3 letter names are the most successful with fewer players.

      And given that wRC+ is a weighted average, a higher number of players to comprise a “pool” would not imply a higher average… It would only increase the likelihood that that category’s average is closer to the league-wide average. Smaller groups are more likely to produce outlier results, such as Salty (the only player in history with 14 letters) being the one and only player to yield a much lower wRC+ than the league average. He could have just as easily been that much higher.

      It appears that a 93 is the average player’s wRC+ (without correcting for PA). If you look at the two graphs, the more common groups hover around 93, whereas those that are significantly higher or lower tend to be less common (last names that begin with S, T, and M are at least very common last names in my cell phone’s address book, whereas I know no one with a last name beginning with a Y and very few with an E).

      • kylemcg says:

        What we really need here are error bars!

        Then again this is /not/ graphs.

      • Bryz says:

        “Your theory doesn’t explain why 3 letter names are the most successful with fewer players.”

        Mel Ott may have something to do with it.

        • John R. says:

          I had the same thought. Just how many players with three-letter names have there been? Is it few enough that Mel Ott could single-handedly pull up an otherwise-average group?

          • Well, since you asked, good gentlemen:

            There are forty-two “threes” in baseball history, which includes eight Lees, seven Mays and a pair of Otts. Mel tops the list, but he’s helped by Dave Orr, Jason Bay (for the moment), and Derrek Lee. It’s a decent list, but nothing compared to Ruth/Cobb/Foxx/Mays.

            Instead, it’s the lack of failure that helps the threes out so much: only four have a wRC+ of 70 or below (9.5%), compared to 16.4% of the fours and 12.3% of the fives.

  5. kylemcg says:

    The Q’s must pitch better than they hit. Quisenberry, Qualls, Quantrill?

    • Chris says:

      I dunno, last year Qualls would probably have been a better hitter than pitcher had he been given the opportunity.
