College Statistics

With this summer’s crop of prospects for the 2009 MLB draft being college heavy, a lot of fans of teams with high picks are closely following the performances of guys like Stephen Strasburg, Dustin Ackley, and Grant Green. Right now, D.J. LeMahieu is getting a lot of buzz for his blistering start for LSU. Because baseball is such a statistical game, this is only natural. I just want to issue a word of caution – college statistics are just not that valuable as a predictive tool.

Between the use of metal bats, the huge variance in quality of opponents, parks that heavily distort run environments, and the smaller sample of games played, all kinds of adjustments need to be made to translate NCAA statistics into something resembling a context-neutral line. And once you’ve done all that work, the numbers still have limited value.

For instance, let’s take Dustin Ackley – scouts rave about his advanced approach at the plate, and he’s universally acknowledged as the best hitting prospect in this draft. He has a compact, line drive swing and makes excellent contact. The only real questions surrounding him are how much power he’ll develop and what position he’ll play in the majors.

Since I have a database that contains a significant amount of college statistics dating back to the 1980s, I ran a query to try to find some comparable players to Ackley statistically. I wanted to see how many of these high BB/low K/gap power hitters there were, and how they did in the majors. Some of the names on the list may surprise you.
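For readers who want to try something similar, here is a rough sketch of what a filter for that profile could look like. It is not the query I actually ran: the `Season` fields, the three threshold values, and the three hitters are all invented for illustration, and a real version would need park and competition adjustments first.

```python
from dataclasses import dataclass


@dataclass
class Season:
    name: str
    ab: int    # at-bats
    bb: int    # walks
    so: int    # strikeouts
    h: int     # hits
    xbh: int   # extra-base hits (doubles + triples + home runs)
    hr: int    # home runs


def is_comparable(s, min_bb_per_so=1.5, min_xbh_share=0.30, max_hr_share=0.50):
    """High-walk, low-strikeout, gap-power profile.

    All three thresholds are illustrative guesses, not values from the article.
    """
    bb_per_so = s.bb / s.so if s.so else float("inf")
    xbh_share = s.xbh / s.h if s.h else 0.0          # how many hits went for extra bases
    hr_share = s.hr / s.xbh if s.xbh else 0.0        # how much of that power was home runs
    return (
        bb_per_so >= min_bb_per_so
        and xbh_share >= min_xbh_share
        and hr_share <= max_hr_share
    )


# Invented sample rows, not real players.
seasons = [
    Season("Hitter A", ab=250, bb=50, so=25, h=100, xbh=35, hr=8),
    Season("Hitter B", ab=240, bb=20, so=60, h=80, xbh=30, hr=15),
    Season("Hitter C", ab=230, bb=45, so=28, h=75, xbh=40, hr=25),
]

comparables = [s.name for s in seasons if is_comparable(s)]
print(comparables)
```

Hitter B fails on plate discipline and Hitter C gets too much of his power from home runs, so only Hitter A fits the profile under these made-up cutoffs.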

For instance, Brad Wilkerson was an absolute monster in college. His junior year at Florida, he hit .347/.538/.743, drawing 85 walks and striking out just 29 times in 222 at-bats. He also launched 23 home runs as 49 of his 77 hits went for extra bases. Oh, and he pitched, too. From a pure numbers standpoint, Wilkerson was as good offensively as anyone in recent college history. Obviously, that didn’t translate to the major leagues, as he’s been just a decent hitter, posting a career .341 wOBA.

It’s not just Wilkerson, either. Khalil Greene (.470/.552/.877) and Chris Burke (.435/.537/.815) had two of the best offensive seasons for a middle infielder in college history, and neither of them has been able to duplicate that success with wood bats. Mark Teixeira was a monster in college, but his numbers were surpassed by Dan Johnson. Alex Gordon and Michael Aubrey have virtually indistinguishable seasonal marks. Ryan Braun hit the snot out of the ball at Miami, but so did Jamie D’Antona at Wake Forest. If I showed you Chase Utley’s 2000 season next to Greg Dobbs’ 2001 season, you couldn’t tell them apart.

Good hitting prospects hit well in NCAA ball, but so do lesser hitting prospects, and using numbers alone, it’s basically impossible to tell them apart. We’re big fans of statistical analysis here, obviously, but we also need to know the limits of what numbers can tell us. When it comes to college performances, scouting reports are what you want – the guys hitting the fields every day and looking at swings and athleticism do a better job of predicting which college players will hit in the majors and which ones won’t.

Dustin Ackley is probably going to hit in the majors. I’m saying that because scouts think so, not because he’s hitting in college.

Dave is a co-founder of and contributes to the Wall Street Journal.

24 Responses to “College Statistics”

  1. Ryan B says:

    Does this hold true for pitchers as well? Since the use of metal bats would be a hindrance to pitchers, wouldn’t a pitcher who thrives against them be even better once the bats are wooden? I realize they will face better batters in the minors, and I don’t know how minor league parks compare to college parks (whether they’re an advantage for hitters or pitchers), but I wonder if pitchers have the same uncertainty as batters do….

  2. MS says:

    Hopefully my comment doesn’t get published, but Ryan Braun went to Miami.

  3. JH says:

    Do you think it’s fair to say that a highly touted prospect’s failure to hit at the collegiate level is more instructive than success?

  4. Rafa says:

    As a Cal alum I wish Ryan Braun went to Cal, but he did in fact go to Miami.

  5. Drew says:

    Jamie D’Antona has hit a bit. 303/361/495 as a minor leaguer… No real room for him in the bigs with Reynolds and Jackson (there’s a Cal guy!) blocking his way.

  6. Brian Cartwright says:

    Metal bats, park factors, level of competition, sample size…all of these are valid concerns and must be accounted for.

    However, in attempting to make your point, you only quote one year at a time for the players you cite, which does nothing to help solve the sample size issues. I could cherry pick outlier seasons of players already in the majors. Would that mean we can’t predict future mlb performance? Use all available seasons, giving more weight to the more recent, and then view season by season to look for outlier performances that are not likely to repeat themselves.

    I have Khalil Greene’s translated college wOBA, 1999-2002, as 332, 342, 300, 461. Two above average (317) for ss, one below, one monster. As a pro, 341, 304, 345, 323, 334, 328, 268. He hit for almost as much power in MLB in 2007 as he did in college in 2002, but with a poor BABIP which drove down his BA.

    My projections show two years of Braun at Miami translated to a MLE of 274/332/480 346 wOBA, above average for 3b. Since then he’s cut his K’s and added some HRs. Despite three seasons with 20+ HRs, I translate D’Antona’s college numbers to 258/323/472 341 wOBA, below average for 1b, and not too far off what he did in the minors before a BA spike in 2008.

    Alex Gordon, 258/334/459 343 wOBA in college, had a career year in 2005 at AA, but his MLB numbers have been consistent with what he did in college. Michael Aubrey, 280/334/453 341 wOBA in college. Thru 2007, 264/317/452 332 wOBA.

    Positions are an issue as well. A 340 wOBA is above average at 3b, but it will almost never get anyone to the majors at 1b. That’s what’s holding back guys like D’Antona and Aubrey.

    Dustin Ackley? 310/365/454. Good contact, singles and doubles, average walks, below average HRs. A 357 wOBA is pretty good in cf, especially if he has a glove to go with it. It’s only MLB average at 1b.
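To make the "use all available seasons, weight the recent ones more, then look for outliers" suggestion concrete, here is a minimal sketch using the four translated Khalil Greene wOBA figures quoted in this comment. The 1-2-3-4 weights and the 100-point outlier threshold are assumptions picked for illustration, not the weights any real projection system uses, and a fuller version would also weight each season by plate appearances.

```python
def recency_weighted_mean(values, weights):
    """Weighted average of season values; weights are listed oldest to newest."""
    if len(values) != len(weights):
        raise ValueError("need one weight per season")
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def flag_outliers(values_by_year, threshold):
    """Return the years whose value sits more than `threshold` points
    away from the plain average of the player's other seasons."""
    flagged = []
    for year, value in values_by_year.items():
        others = [v for y, v in values_by_year.items() if y != year]
        if abs(value - sum(others) / len(others)) > threshold:
            flagged.append(year)
    return flagged


# Translated college wOBA (in points) for Khalil Greene, 1999-2002, as quoted above.
greene = {1999: 332, 2000: 342, 2001: 300, 2002: 461}

# Assumed weights: each season counts one step more than the one before it.
weights = [1, 2, 3, 4]

weighted = recency_weighted_mean(list(greene.values()), weights)
outliers = flag_outliers(greene, threshold=100)  # threshold is an arbitrary choice

print(weighted, outliers)
```

With these assumed weights the average lands well above the three ordinary seasons because 2002 counts the most, and 2002 is also the only season the outlier check flags, which is the kind of year worth a second look before trusting the weighted figure.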

    • Dave Cameron says:

      Your translated numbers show Braun, D’Antona, Aubrey and Gordon as offensive equals. That’s a problem, not a success.

      Translating statistics is fine, but you have to be able to show that they have tangible predictive value. The studies that I’ve seen on college statistics show very little predictive value. If you’ve got a study that shows differently, I’d love to see it.

      I’d love to see how well Oliver matches up with future performance too, while you’re at it. I know I’m not the only one who sees Bryan Myrow with a higher projected wOBA than Adrian Gonzalez and says “hmm, he’s doing something wrong”.

      • Johnny Dickshot says:

        It’s funny how you demand proof of others yet refuse to hold yourself accountable for such claims of your own—I think you know which thread I’m talking about. You say go to Inside the Book and search ‘projections’, but here’s the problem with that: it yields seventeen hundred results. ‘Projection’ yields 93 more. If the article existed, I’d have to imagine you’d have linked to it by now.

        We’re still waiting, Dave.

      • Jay in BMore says:

        I hope anyone who is still waiting is not holding their breath.

      • Brian Cartwright says:

        I’m not going to bring up any outside sources to criticize Dave, and I have to admit I don’t know his body of work that well. I do find it interesting that Dave defends CHONE’s ability to project minor leaguers. I will soon publish accuracy results for the Oliver projections, and hopefully I can do some comparisons to others like CHONE or ZiPS.

        As far as this article goes, I am saying that he has brought up the relevant questions, but has failed to prove his point. Personally I do think we can do college projections; my results so far have not been any worse than projecting minor leaguers, but there’s still work to be done to improve what I have.

      • ebc says:

        Wow, that’s an ugly thread.

        Cameron’s almost certainly right about that particular case. I don’t know about CHONE, but PECOTA has considerably more confidence in its projection for Wood than for Abreu.

        Why? Because there’s a big difference between a generic minor leaguer and a 24-yr-old with 1500 pa’s in the high minors. And there’s a big difference between a generic major leaguer and a 35-yr-old who seems to be losing his skills in a hurry.

      • Johnny Dickshot says:

        I’d debate that point, ebc, but that’s really not the issue. I will say, however, that CHONE already takes into account everything you’ve said.

      • ebc says:

        It takes all that into account when generating its projection; but that’s not what we’re talking about. The issue at hand is how accurate that projection is, how much faith we should have in it.

        CHONE, as far as I can tell, doesn’t sum up its projection’s volatility with a beta number as PECOTA does, but you can see from looking at the 10% and 90% projections that Abreu has a wider range of outcomes than Wood does. Which means he’s less predictable. If you’re comparing, say, Albert Pujols to Madison Bumgarner, then yes, the proven major leaguer is far more reliable; but not in this case.

        I mean, feel free to obsess about whether Dave Cameron is a dick or not if that’s what interests you, but as far as the issue at hand, it seems clear that SOME minor leaguers are more predictable than SOME major leaguers.

      • Johnny Dickshot says:

        “The issue at hand is how accurate that projection is, how much faith we should have in it.”

        That’s an issue, but not my issue.

        “You can see from looking at the 10% and 90% projections [for CHONE] that Abreu has a wider range of outcomes than Wood does.”

        No, you can’t, because that’s what I’m doing right now. There’s a 77 point band around Wood’s top- and bottom-shelf OBP projections and a 149 point band around his SLG, while there are only 65 and 122 point bands around Abreu’s OBP and SLG. I’m not sure how you came to the conclusion you did, but if you were looking at LW Runs you were in error to do so because it is highly contingent on playing time and vice versa.

        That said, I’m not sure if using a projection to prove the accuracy of a projection is the best way to go about it anyway. That’s like saying “The Bible is God’s word because it says it is!”

        “I mean, feel free to obsess about whether Dave Cameron is a dick or not if that’s what interests you…”

        It is, and I appreciate your blessing.

        “…but as far as the issue at hand, it seems clear that SOME minor leaguers are more predictable than SOME major leaguers.”

        That might be true, but I’m not sure if you’d have any way to know until after the fact. Again, still not my issue. I don’t care that Dave is a caustic dick because I am too. I don’t care that he makes mistakes because we’re all human. What I care about is that we have an otherwise outstanding website going here and having a prominent analyst who makes bold proclamations without regard to facts and employs the “I’m right, you’re wrong, end of conversation” tactic when cornered undermines the credibility of the whole operation.

      • djw says:

        Obviously he wasn’t working with your ultra-modern super fancy projection systems, but I seem to recall Bill James being pretty persuasive about the comparable projectability of major and minor league stats in one of the old abstracts. I probably reveal myself as a dinosaur by citing James, but I’d thought this particular question had been more or less settled. If there’s new data, or if we’ve developed a more successful projection system for the majors that doesn’t apply to the minors, I’d certainly be interested in seeing the data.

      • Brian Cartwright says:

        The projection system isn’t ultra-modern; the guts of it I worked out over 20 years ago from reading the Bill James Baseball Abstract. What I didn’t have then was a relational database on a fast computer and the internet, as opposed to a stack of index cards and a Who’s Who in Baseball.

      • Johnny Dickshot says:

        Old abstracts being the operative phrase.

  7. Brian Cartwright says:

    Yes, those four do show as being pretty equal, but remember it’s at an early age, at most 22. Braun projected as a better than average bat at 3b, and he has improved. Gordon had one very good season at AA, but otherwise has stayed the same. D’Antona and Aubrey have stayed the same. That level is not good enough for a MLB 1b, but it can be at 3b or lf.

    Myrow has only had two seasons with a MLE hr% above MLB avg. He’s a high BABIP high walk guy. Offensively he’s a MLB avg 1b. TotalZone shows his minor league defense primarily at 1b with some 3b and lf as slightly below average. I would probably assume he’s slow, so once you add it all up, he’s no more productive than a bunch of other guys who can play 1b.

    Adrian Gonzalez has had three good, consistent years with SD, but not outstanding. Single season, park adjusted, unregressed wOBAs of 379, 369, 375. Projections have been, after those seasons, 350, 358, 363. Because the projections weigh in past seasons, there is a lag when talent is on the way up or down. There is an attempt to adjust for that, and there’s regression, but it does look like Gonzalez’s projection should be 370-375 instead of 363. Not a large difference, but noticeable. I have decreased the weight of past seasons relative to the most recent, but that also effectively drops the sample size, and thus increases the effect of regression. I may then also need to back off on the amount of regression.
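The trade-off described here (lighter weights on older seasons shrink the effective sample, so regression pulls harder) can be sketched with a generic regress-to-the-mean formula. This is not the Oliver code. The three wOBA values are the ones quoted above, but the 600 plate appearances per season, the 330 league mean, the 400 PA regression constant, and both weight sets are assumptions chosen only to show the effect.

```python
def regressed_projection(wobas, pas, weights, league_mean, regression_pa):
    """Weighted multi-season average, regressed toward the league mean.

    `regression_pa` acts like that many phantom plate appearances of
    league-average hitting added to the player's weighted sample.
    Returns (projection, effective weighted PA).
    """
    effective_pa = sum(w * pa for w, pa in zip(weights, pas))
    weighted_total = sum(w * pa * x for w, pa, x in zip(weights, pas, wobas))
    projection = (weighted_total + regression_pa * league_mean) / (
        effective_pa + regression_pa
    )
    return projection, effective_pa


wobas = [379, 369, 375]   # unregressed wOBA (points), oldest to newest, from the comment
pas = [600, 600, 600]     # ASSUMED plate appearances per season
LEAGUE_MEAN = 330         # ASSUMED league-average wOBA
REGRESSION_PA = 400       # ASSUMED regression constant

# Two hypothetical weighting schemes, oldest season first.
gentle = [0.6, 0.8, 1.0]  # past seasons still count for a lot
steep = [0.2, 0.5, 1.0]   # past seasons heavily discounted

proj_gentle, pa_gentle = regressed_projection(wobas, pas, gentle, LEAGUE_MEAN, REGRESSION_PA)
proj_steep, pa_steep = regressed_projection(wobas, pas, steep, LEAGUE_MEAN, REGRESSION_PA)

print(round(proj_gentle, 1), pa_gentle)
print(round(proj_steep, 1), pa_steep)
```

Under these assumptions the steep weights leave a smaller effective sample, so the same regression constant drags the projection further toward the league mean even though the underlying weighted average barely moves. That is why backing off the regression amount goes hand in hand with discounting older seasons.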

    I am studying these things and looking at ways to improve my system. I’m working on an article which will talk about accuracy, aging, biases, etc. I’m also currently studying college batting stats looking for ways to improve translations.

  8. Brian Cartwright says:

    Check out this article
    For 18 of 33 1st rounder hitters 2006-2008 the college wOBA was within 15 pts of their pro MLE wOBA.

    Since then I have added more college players to my database. I will soon be using GameDay play by play 2005-2008 for minor league stats, which will allow me to get better park factors for minor league teams, and I am testing some different techniques to get better results.

  9. Scappy says:

    A change to wooden bats would greatly improve the quality of the data. The slower bat speed and the nature of wood would make maintaining a high slash line much more difficult and allow the cream to rise to the top a little more.

    Hell, it’s also safer for the pitchers.

  10. lookatthosetwins says:

    I’m paraphrasing, but I remember someone asked Bill James what percentage someone should use statistics vs. scouting reports for college players.

    He said something like 95% scouting, 5% statistics.

    The problem with college statistics is not just that it’s nearly impossible to make them context neutral, it’s also that the players are so far from their true talent level. Even in low minor leagues, making the statistics context neutral doesn’t solve the problem, because these players are still growing and developing.

  11. Rick in WNY says:

    Wood?…metal?….you guys forget that many of these NCAA guys the past few years are using 100% composite bats! Screw funky aluminum barrels, go hit with one of these hot composite barreled babies. “Cluck…”…that one’s 375 ft!
    These composite rockets get better (more barrel flex = more bounce) after they are broken in. I read somewhere that the NCAA tested all the bats this year at the finals, and many of the 100% composite ones failed, i.e. were too hot, though they passed specs when new.

    The NCAA looks like they are going to do something soon. They already have new rules for 2012 (BBCOR?), but rumor has it something is going to happen sooner.
