Projections Differences Part I: Hitters

This is the time of year I begin my annual ritual of collecting and merging together various player projection systems in preparation for my Scoresheet draft. This is often a long and arduous process that involves lots of merging on string variables — thanks to no common id — and lots of data cleaning. Thankfully, FanGraphs provides a number of projection systems free of charge on the website that are both exportable and available with a common id number!
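
For readers who want to do this merge in code rather than by hand, here is a minimal sketch of joining two exports on a shared id. The `playerid` column name and the id values are placeholders I made up for illustration, not a documented export format; the OPS figures are taken from the hitter table later in this post.

```python
import csv
import io

# Stand-ins for two exported CSV files. The "playerid" column name and the
# id values (p001, ...) are hypothetical placeholders; the OPS numbers are
# copied from the hitter table in this post.
MARCEL_CSV = """playerid,Name,OPS
p001,Josh Reddick,0.749
p002,Jemile Weeks,0.770
p003,Travis Snider,0.729
"""

STEAMER_CSV = """playerid,Name,OPS
p001,Josh Reddick,0.764
p002,Jemile Weeks,0.695
p004,Wilson Valdez,0.584
"""


def load(text, key="playerid"):
    """Index the rows of one CSV export by the shared id column."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(text))}


def merge_on_id(**systems):
    """Inner join: keep only the ids that appear in every system."""
    tables = {name: load(text) for name, text in systems.items()}
    common = set.intersection(*(set(table) for table in tables.values()))
    first_table = next(iter(tables.values()))
    merged = {}
    for pid in sorted(common):
        row = {"Name": first_table[pid]["Name"]}
        for name, table in tables.items():
            row[name + "_OPS"] = float(table[pid]["OPS"])
        merged[pid] = row
    return merged


merged = merge_on_id(Marcel=MARCEL_CSV, Steamer=STEAMER_CSV)
for pid, row in merged.items():
    print(pid, row)
```

Only the two players present in both files survive the join, which is the same idea as restricting the analysis to a common sample of hitters.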

In this post I’ll focus on three projection systems (Steamer, Marcel, and ZiPS) to see if there are significant differences in how they project various players to perform in 2012. Read this post by Matt Schwartz to see how these systems performed in the aggregate for 2011, and be sure to visit the websites of the authors of these systems to learn more about the details of how they are estimated.

The first thing to note about these systems is that they vary widely in playing time assumptions. In the common sample of 458 hitters, ZiPS projects a mean of 439 at-bats, Marcel comes in at 360, and Steamer comes in at 339. As the table below shows, the correlations among the systems are not particularly high either. These differences in playing time will produce vast differences in counting stat projections, so we are going to ignore those and focus instead on rate stats.

At-bats
          Marcel  Steamer  ZiPS
Marcel    1
Steamer   0.76    1
ZiPS      0.41    0.50     1
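
For anyone who wants to build a correlation table like this from their own merged file, here is a rough sketch of a pairwise Pearson correlation. I am assuming the tables in this post are ordinary Pearson correlations, which the post does not state. The sample values are the projected OPS for five hitters from the table later in this post, so the output will not match the full-sample numbers reported here.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Projected OPS for Reddick, Weeks, Snider, Valdez, and Mayberry,
# copied from the hitter table in this post (illustration only).
SYSTEMS = {
    "Marcel": [0.749, 0.770, 0.729, 0.659, 0.797],
    "Steamer": [0.764, 0.695, 0.788, 0.584, 0.746],
    "ZiPS": [0.680, 0.692, 0.711, 0.630, 0.727],
}

# Print one row of the correlation matrix per system.
names = list(SYSTEMS)
for a in names:
    row = [round(pearson(SYSTEMS[a], SYSTEMS[b]), 2) for b in names]
    print(f"{a:<8}", row)
```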

Turning to rate stats in the aggregate, we see a lot more similarity. The tables below report correlations for OPS and wOBA for all three projection systems. All three systems have very high correlations with the other systems. Steamer and ZiPS correlate higher with each other than either of them does with Marcel, which is likely due to the intended naiveté of the Marcel system.

OPS
          Marcel  Steamer  ZiPS
Marcel    1
Steamer   0.87    1
ZiPS      0.86    0.94     1

wOBA
          Marcel  Steamer  ZiPS
Marcel    1
Steamer   0.86    1
ZiPS      0.85    0.94     1

The differences among the systems get more interesting if we focus on individual players. The table below lists 2012 OPS predictions for the 15 position players with the highest variance across the three systems’ OPS projections. To restrict this post to mostly everyday players, I set a minimum of 300 projected at-bats. In addition to the projections, the table presents 2011 OPS and career OPS for each of these 15 players.

This is quite a diverse group. The list is heavy on young players with either limited MLB playing time (Weeks, Mayberry, Guzman) or large variance in past performance (Heyward, Alvarez, Snider). There are two run-of-the-mill veteran catchers (Olivo, Barajas), several players who are changing teams (Cuddyer, Reddick, Smith, Pujols), and two of the best players in baseball (Longoria, Pujols).

Name               Marcel OPS  Steamer OPS  ZiPS OPS  2011 OPS  Career OPS
Josh Reddick       0.749       0.764        0.680     0.784     0.706
Jemile Weeks       0.770       0.695        0.692     0.761     0.761
Travis Snider      0.729       0.788        0.711     0.617     0.730
Wilson Valdez      0.659       0.584        0.630     0.630     0.620
John Mayberry      0.797       0.746        0.727     0.854     0.846
Michael Cuddyer    0.769       0.815        0.837     0.805     0.794
Jesus Guzman       0.790       0.726        0.723     0.847     0.822
Jason Heyward      0.802       0.853        0.787     0.708     0.789
Rod Barajas        0.684       0.665        0.731     0.717     0.698
Albert Pujols      0.933       0.999        0.952     0.907     1.037
Seth Smith         0.795       0.762        0.730     0.830     0.833
Evan Longoria      0.846       0.910        0.881     0.850     0.875
Miguel Olivo       0.701       0.650        0.639     0.641     0.700
Giancarlo Stanton  0.873       0.935        0.910     0.893     0.869
Pedro Alvarez      0.712       0.773        0.770     0.561     0.696
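
As a rough illustration of how a per-player disagreement measure can be computed, the sketch below takes three rows from the table above and calculates the sample variance and the max-minus-min range of the three projected OPS values. The post does not say exactly which formula was used to rank players, so treat the choice of sample variance here as an assumption; it will not necessarily reproduce the ordering of the full table.

```python
from statistics import variance

# (Marcel, Steamer, ZiPS) projected OPS, copied from the table above.
PROJECTED_OPS = {
    "Josh Reddick": (0.749, 0.764, 0.680),
    "Travis Snider": (0.729, 0.788, 0.711),
    "Albert Pujols": (0.933, 0.999, 0.952),
}


def spread(ops_by_player):
    """Return (name, sample variance, max - min) tuples, largest variance first.

    Sample variance (n - 1 denominator) is an assumption; the post does not
    state which variance formula was used.
    """
    rows = [
        (name, variance(values), max(values) - min(values))
        for name, values in ops_by_player.items()
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


ranked = spread(PROJECTED_OPS)
for name, var, rng in ranked:
    print(f"{name:<15} variance={var:.6f} range={rng:.3f}")
```

The range column is often the easier number to read: for Reddick, the most and least optimistic systems are 84 points of OPS apart.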

Are there common factors that appear to lead to divergent projections? The short answer is yes. Young players with either limited experience or divergent previous results and players changing teams make up the bulk of this list. The perfect storm for projection systems is Josh Reddick. Reddick made the best of his first shot at significant playing time last year, posting a .784 OPS in 278 plate appearances. Marcel and Steamer buy into that performance and see his 2011 line as a good model for 2012, while ZiPS is quite negative, projecting a .680 OPS and using Reddick’s poor 2009 and 2010 lines in limited playing time to hold down his forecast. Moving to Oakland will not help Reddick’s power numbers, but the systems disagree most on his OBP, with ZiPS projecting the lowest walk rate and BABIP for Reddick.

With the exception of Jemile Weeks, Steamer is the most optimistic about the younger players on this list. Steamer sees Travis Snider producing at a level that most have projected for years. Steamer also projects that Jason Heyward will revert to 2010 form, that Pedro Alvarez will bounce back, and that Giancarlo Stanton will increase his production. Steamer’s optimism about young players is consistent with Matt Schwartz’s work referenced above, which suggested that Steamer projections tended to be riskier in nature.

Two of the surprising names on this list are Albert Pujols and Evan Longoria. Both are established stars with consistent past production. Steamer sees a career-best year for Longoria in his age-26 season, with no lingering effects of his 2011 injuries, which is not a bad guess at all considering that Longoria should be entering the peak production phase of his career. This is where Marcel’s simplicity may lead it to be too conservative: it projects a career-worst year for Longoria, which is not the typical production pattern for a star-level player entering his prime.

Steamer is the most optimistic system for Albert Pujols as well. All three of these systems see him rebounding from a 2011 in which he posted his first-ever season with a wOBA less than .400, but only Steamer sees him back above .400 in 2012. Steamer essentially sees vintage 2010 Pujols showing up this year, while Marcel and ZiPS see his declining OBP and power continuing unabated. In general I’d be inclined to side with Marcel and ZiPS in this case, as players on the wrong side of 30 are not often able to avert the decline phase of their careers, but Pujols is not your normal 32-year-old hitter. Given his new contract, it is clear that Angels fans hope that Steamer is correct.

The bottom line on projections for hitters is that there is not a lot of variance to discuss. The median variance in OPS for this sample is only .025, which is not enough to get overly excited about. The 15 guys in the table above stand out, but even for these guys the prediction variance is not enough to fundamentally change how we view a player. Pujols will be a lineup force whether his OPS is closer to .930 or closer to 1.000, while Phillies fans will continue to hope that Wilson Valdez does not get a lot of playing time whether he has a .580 or a .650 OPS. Where the systems do differ quite a bit is in their pitching projections, which will be covered next week.




I am a political science professor at the University of North Carolina. I grew up watching the Braves on TBS and acquired Red Sox fandom during the 1986 World Series. My other hobbies include cooking, good red wine, curing meats, and obsessing over Alabama football---Roll Tide! Follow me on Twitter @ProfJRoberts.


20 Responses to “Projections Differences Part I: Hitters”

  1. Mario Mendoza of commenters says:

    Josh Hamilton should be on your list, too. Can anyone explain why ZiPS is so down on him? You won’t find him until page 2 of hitters sorted by wOBA, behind the likes of Cuddyer, Beltran, Lawrie, and Utley.

  2. Mike says:

    Correction: Wilson Valdez was traded this offseason to the Reds.

  3. Richie says:

    I’d agree that Pujols is not your normal 32-year-old hitter. But then people scream insults at me whenever I do that.

    • bobsmith says:

      Because he’s your abnormal 34-year-old hitter, amirite?

    • phoenix2042 says:

      pujols has a career OPS above 1.000… good god that is impressive.

    • themiddle54 says:

      I wouldn’t insult anyone. But I’d say that you can’t assume because Pujols was an extreme performance outlier as an under-30 that he will be an aging-curve outlier as an over-30.

      So far, he has aged quickly post-29. There’s enough there that the folks crying that he’s Pujols and he’ll always be awesome sound crazier to me than the ones who say he’s lost 100 points of OPS each season for two years and he’s likely to continue that downhill trend.

  4. Snowblind says:

    Interesting take, been wondering how some projections line up side by side for a bit. But why is Marcel included?

    I thought the point of the Marcel system was to demonstrate the limitations and uses of projection systems overall, and not as anything close to accurate.

    (That sounds overly harsh, sorry… more like, “in as much as any projection system is reliable, Marcel is the floor… the writeup of the system itself says, paraphrasing, ‘any projection system better be able to beat this one…’”. )

    • Tangotiger says:

      Except there has been NO system that has been proven to be consistently better than Marcel.

      That said, anyone with a reliability of under .50 should not really be used for these purposes.

  5. guesswork says:

    One of the problems here is that sluggers will naturally have a higher variance, thus several of those guys simply are on the list because they hit a lot of home runs.

    For example, consider home run rate. A player who has a home run rate of 5% of their at bats will have a standard deviation over 500 PA of 1%. That means over 500 PA, we expect anywhere from 15 to 35 home runs, which will lead to a massive swing in SLG and thus OPS.

    Conversely, a guy with a home run rate of 0.5% will have a standard deviation of about 0.3%, or a range of about 0 to 6 home runs over 500 PA. His OPS barely changes relative to the slugger.

    Of course, the reason I consider home runs is because they are weighted so heavily in SLG, and this weight is squared when considering variance! A home run will account for 16 times as much variance in SLG as a single, and a little bit less for OPS.
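
The arithmetic in this comment can be checked with a short sketch. It assumes each plate appearance is an independent trial with a fixed home run probability (a binomial model) and reads the quoted ranges as roughly two standard deviations either side of the expected count; neither assumption is spelled out in the comment, and the function names are mine.

```python
import math


def hr_rate_sd(rate, pa):
    """Standard deviation of the observed HR rate over `pa` trials (binomial model)."""
    return math.sqrt(rate * (1.0 - rate) / pa)


def hr_count_range(rate, pa, k=2.0):
    """Expected HR count plus or minus k standard deviations, floored at zero."""
    mean = rate * pa
    sd_count = hr_rate_sd(rate, pa) * pa
    return max(0.0, mean - k * sd_count), mean + k * sd_count


for rate in (0.05, 0.005):
    low, high = hr_count_range(rate, 500)
    print(f"rate={rate:.3f} sd={hr_rate_sd(rate, 500):.4f} range={low:.1f} to {high:.1f} HR")

# Variance scales with the square of the total-base weight in SLG:
# a home run (4 bases) versus a single (1 base).
print("HR vs single variance weight:", 4 ** 2 / 1 ** 2)
```

Under these assumptions the 5% hitter lands at a standard deviation of about 1% and a range of roughly 15 to 35 home runs, while the 0.5% hitter comes out at about 0.3% and roughly 0 to 6, in line with the figures quoted above.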

  6. jim says:

    why hasn’t stanton’s player page been updated yet?

  7. J. Cross says:

    Good stuff, Jason. I think this gives a good sense of the systems and what they do differently… I’ll have to take another look at Travis Snider and Jason Heyward and try to figure out why Steamer likes them more than other systems (and Wilson Valdez so much less). I feel good about the Heyward projection.

  8. Sean says:

    Could someone tell me how I can easily merge multiple excel pages with common id numbers?

    • J. Cross says:

      Use the vlookup function. =vlookup(cell_with_id#, array in which to search for the id number (the id must be in the leftmost column of the array), number of the column with data to retrieve, false)

  9. Herbstr8t says:

    We sorely need common unique IDs for baseball statistics on the web (paging Dave Cameron)!

    That said, Jason you could save time by using the Zeile Consensus Projections (merges 11 projections) put together by the folks at Fantasy Pros.

    There have been many posts on FanGraphs and elsewhere touting the virtue of merging lots of projections together to look at the relative performance of players. Long story short, I’m using the Zeile projections to calculate dollar values for my auction. Of note, in my 6×6 OPS league Braun is ahead of Kemp, Votto ahead of Joey Bats and HanRam ahead of Reyes.

  10. Michael Scott says:

    Zips has 13 guys with a wOBA over .375 this year, last year there were 26. It also has 2 guys with a wOBA over .400 while last year there were 10. Why is ZIPS so pessimistic?

    • CircleChange11 says:

      Because projections are made off past performance.

      The “extra” guys that wOBA .375+ or .400 + and guys that hit 40+ HR and pitchers with 2.25 ERAs had seasons where things like BABIP, HR/FB, LOB%, etc were well above/below their career norms or recent past.

      In other words, you could try to predict which guys will have “lucky” seasons, but you wouldn’t project such seasons.

      So, last year there were 10 guys that wOBA’d .400+. Which 10 guys will do it this year? Of the 10 that did it last year, how many experienced things that they didn’t experience before and won’t project to do again (BABIP, HR/FB, BB%, etc)?

      • CircleChange11 says:

        2011 .400+ wOBA Players (’11 BABIP/Career; ’11 HR/FB/Career)
        ——————————
        J.BAUTISTA .441 (.309/.277; 22.5/15.5)
        M.CABRERA .436 (.365/.347; 18.2/18.3)
        R.BRAUN .433 (.350/.339; 18.8/17.9)
        M.KEMP .419 (.380/.352; 21.5/15.9)
        P.FIELDER .408 (.306/.300; 21.8/20.3)
        A.GONZALEZ .406 (.380/.322; 16.4/17.0)
        D.ORTIZ .405 (.321/.303; 17.5/18.6)
        J.VOTTO .403 (.349/.352; 18.2/19.4)
        L.BERKMAN .402 (.315/.317; 19.9/18.9)
        J.ELLSBURY .402 (.336/.325; 16.7/9.5)

        Most guys had + years in BABIP and HR/FB. The guys that didn’t have been excellent players in recent history or guys that experienced “throwback years” to past glory.

        Sabermetrics and/or projection systems can likely project the number of players that will do something like a .400 wOBA based on past seasons, but what they don’t project is which players are going to have an unusual year where they perform much better than previous seasons (Ellsbury, Kemp, etc) or turn back the clock (Ortiz, Berkman, etc) … or even which very good guys are going to do even a little better in BABIP or HR/FB (most of the remaining 10).

        Note: I didn’t look at BB% or non-HR extra-base hits just to keep it simple.
