Evaluating 2012 Projections

Hello, loyal readers.  It's time for the annual evaluation of last year's player projections.  Last year, Gore, Snapp, and Highly's Aggpro forecasts won among hitter projections (http://www.fangraphs.com/community/comparing-2011-hitter-forecasts/) and Baseball Dope won among pitchers (http://www.fangraphs.com/community/comparing-2011-pitcher-forecasts/).  In general, projections computed using averages or weighted averages tended to perform best among hitters, while for pitchers, structural models built from "deep" statistics (K/9, HR/FB%, etc.) did better.

2012 Summary

In 2012, there were 12 projection systems submitted for hitters and 12 for pitchers (11 submitted projections for both).  The evaluation considers only players for whom every system submitted a projection.

As Table 1 shows, Dan Rosenheck blew away the competition as the best forecaster, taking 1st among pitchers and 3rd among hitters.  My personal projections (Larson) took 2nd, with the Steamer projections taking 3rd overall.  Bringing up the rear were the Marcel, Guru, and CAIRO projections.

| System | Hitters | Pitchers | Average |
|---|---|---|---|
| Rosenheck | 4.40 | 1.25 | 2.83 |
| Larson | 4.40 | 3.75 | 4.08 |
| Steamer | 5.20 | 3.50 | 4.35 |
| CBS Sportsline | 3.60 | 5.50 | 4.55 |
| Fangraphs Fans | 4.60 | 6.25 | 5.43 |
| ESPN | 6.00 | 6.25 | 6.13 |
| ZIPS | 6.20 | 7.75 | 6.98 |
| Rotochamp | 8.20 | 7.00 | 7.60 |
| Marcel | 9.80 | 9.75 | 9.78 |
| Guru | 8.80 | 10.75 | 9.78 |
| CAIRO | 9.60 | 10.75 | 10.18 |
| Smith | na | 5.50 | na |
| GQE | 7.20 | na | na |

Table notes: Table 1 shows each system's average projection rank across the categories.  For example, Fangraphs Fans took 7th in Runs, 2nd in HRs, 9th in RBIs, 3rd in AVG, and 2nd in SBs, for an average rank of 4.6 in the hitter categories.
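As a sketch of how the averages in Table 1 are built, each score is simply the mean of a system's per-category ranks (the example ranks below are the Fangraphs Fans figures from the note above):

```python
def average_rank(ranks):
    """Mean of a system's per-category ranks (lower is better)."""
    return sum(ranks) / len(ranks)

# Fangraphs Fans' hitter-category ranks: R, HR, RBI, AVG, SB
print(average_rank([7, 2, 9, 3, 2]))  # 4.6
```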

Detailed Forecast Analysis: Hitters

I look at two main bases of comparison: the first is root mean squared error (RMSE), computed both with and without bias; the second is overall fit.  Bias is important to consider because it is easily removed from a forecast and can mask an otherwise good forecasting approach.  For example, the Marcel projections show very little bias, giving them a low RMSE, but they are poor at predicting variation among players, so they are not a terribly good forecast if you are trying to rank players by expected future performance.
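To make the bias point concrete, here is a minimal sketch (not the evaluation code itself, and the HR totals are hypothetical): a forecast that is uniformly 5 HR too high still ranks players perfectly, so its bias-removed RMSE is zero even though its raw RMSE is 5.

```python
import math

def rmse(actual, forecast):
    """Root mean squared forecast error, in the units of the statistic."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

def rmse_without_bias(actual, forecast):
    """RMSE after subtracting the mean error (the bias) from the forecast."""
    bias = sum(f - a for a, f in zip(actual, forecast)) / len(actual)
    return rmse(actual, [f - bias for f in forecast])

actual = [10, 20, 30]     # hypothetical HR totals
forecast = [15, 25, 35]   # uniformly 5 HR too high, but perfectly ordered
print(rmse(actual, forecast))               # 5.0
print(rmse_without_bias(actual, forecast))  # 0.0
```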

Hitter RMSE Results:

| system | R | rank | HR | rank | RBI | rank | AVG | rank | SB | rank | AVG rank |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Guru | 23.523 | 5 | 8.525 | 7 | 24.807 | 2 | 0.032 | 1 | 6.630 | 1 | 3.2 |
| CAIRO | 24.577 | 7 | 8.108 | 2 | 24.186 | 1 | 0.039 | 4 | 7.054 | 7 | 4.2 |
| GQE | 20.954 | 1 | 8.377 | 4 | 26.829 | 7 | 0.040 | 5 | 6.889 | 5 | 4.4 |
| ESPN | 23.950 | 6 | 8.066 | 1 | 25.113 | 3 | 0.042 | 12 | 6.729 | 3 | 5 |
| Larson | 21.284 | 2 | 8.153 | 3 | 26.316 | 5 | 0.041 | 11 | 6.779 | 4 | 5 |
| CBS Sportsline | 25.414 | 10 | 8.506 | 6 | 26.788 | 6 | 0.036 | 2 | 6.720 | 2 | 5.2 |
| Fangraphs Fans | 26.797 | 12 | 8.488 | 5 | 25.755 | 4 | 0.037 | 3 | 6.929 | 6 | 6 |
| Marcel | 23.299 | 4 | 9.032 | 10 | 27.048 | 8 | 0.040 | 6 | 7.746 | 10 | 7.6 |
| Rosenheck | 21.612 | 3 | 9.455 | 12 | 28.568 | 10 | 0.040 | 7 | 7.257 | 8 | 8 |
| ZIPS | 25.408 | 9 | 8.918 | 8 | 32.646 | 12 | 0.041 | 10 | 7.532 | 9 | 9.6 |
| Rotochamp | 26.103 | 11 | 8.929 | 9 | 28.334 | 9 | 0.041 | 8 | 7.973 | 12 | 9.8 |
| Steamer | 24.721 | 8 | 9.331 | 11 | 28.791 | 11 | 0.041 | 9 | 7.961 | 11 | 10 |

This table presents the RMSE of the forecasts.  The RMSE is essentially the average forecast error, in absolute value, and is expressed in the units of the statistic.  So, for HRs, each system is between 8 and 10 HR off per player, on average.  Here, we see that the Guru forecasts have the lowest RMSE overall, doing best at projecting AVG and SBs.  John Grenci's GQE forecast does best for Runs, ESPN for HRs, and CAIRO for RBIs.

But what if a projection is great at ranking players but is terrible at projecting their actual output? Such a projection would still hold value because of the information contained in the player-to-player variation. In fact, this information is probably more valuable than the actual level of output for the player.

Hitter Fit Results:

| system | R | rank | HR | rank | RBI | rank | AVG | rank | SB | rank | AVG rank |
|---|---|---|---|---|---|---|---|---|---|---|---|
| CBS Sportsline | 0.336 | 5 | 0.430 | 5 | 0.388 | 1 | 0.303 | 6 | 0.602 | 1 | 3.6 |
| Larson | 0.375 | 3 | 0.438 | 1 | 0.381 | 2 | 0.291 | 9 | 0.560 | 7 | 4.4 |
| Rosenheck | 0.377 | 2 | 0.432 | 3 | 0.376 | 4 | 0.309 | 5 | 0.555 | 8 | 4.4 |
| Fangraphs Fans | 0.322 | 7 | 0.432 | 2 | 0.298 | 9 | 0.312 | 3 | 0.602 | 2 | 4.6 |
| Steamer | 0.326 | 6 | 0.431 | 4 | 0.356 | 5 | 0.339 | 1 | 0.545 | 10 | 5.2 |
| ESPN | 0.345 | 4 | 0.424 | 6 | 0.380 | 3 | 0.225 | 11 | 0.567 | 6 | 6 |
| ZIPS | 0.240 | 11 | 0.398 | 7 | 0.310 | 7 | 0.338 | 2 | 0.576 | 4 | 6.2 |
| GQE | 0.390 | 1 | 0.340 | 12 | 0.341 | 6 | 0.146 | 12 | 0.568 | 5 | 7.2 |
| Rotochamp | 0.270 | 8 | 0.360 | 10 | 0.291 | 10 | 0.277 | 10 | 0.586 | 3 | 8.2 |
| Guru | 0.268 | 9 | 0.343 | 11 | 0.300 | 8 | 0.298 | 7 | 0.550 | 9 | 8.8 |
| CAIRO | 0.204 | 12 | 0.369 | 8 | 0.261 | 12 | 0.309 | 4 | 0.500 | 12 | 9.6 |
| Marcel | 0.247 | 10 | 0.364 | 9 | 0.280 | 11 | 0.291 | 8 | 0.522 | 11 | 9.8 |

This table reports the r^2 of the simple regression actual = b(1) + b(2)*forecast + e.  The b(1) term captures ex-post bias, allowing b(2) to better capture the information content of the forecast.  Here, we see that while CBS Sportsline is in the middle of the pack on accuracy, it's far and away the best at capturing player-to-player variation, with the best SB and RBI projections.  The Larson projections were best for HRs, GQE for Runs, and Steamer for AVG.
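The fit measure can be sketched as follows (a minimal illustration, not the evaluation code; the run totals are hypothetical).  Because the regression includes an intercept, its r^2 equals the squared correlation between forecast and actual, so a uniformly biased but perfectly informative forecast still scores 1.0:

```python
def fit_r2(actual, forecast):
    """r^2 of the regression actual = b(1) + b(2)*forecast + e.
    With an intercept, this equals the squared correlation between
    forecast and actual, so any constant bias is ignored."""
    n = len(actual)
    ma, mf = sum(actual) / n, sum(forecast) / n
    cov = sum((a - ma) * (f - mf) for a, f in zip(actual, forecast))
    var_a = sum((a - ma) ** 2 for a in actual)
    var_f = sum((f - mf) ** 2 for f in forecast)
    return cov * cov / (var_a * var_f)

# A forecast that is 10 runs high for everyone but ranks players perfectly:
print(fit_r2([95, 80, 72, 60], [105, 90, 82, 70]))  # 1.0
```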

In general, the Fans, Rosenheck, and Larson projections are all essentially averages of other projections.  For hitters, this strategy seems to do best, and that has held for each of the last three years.

Detailed Forecast Analysis: Pitchers

We can perform the same analysis for pitchers.

Pitcher RMSE Results:

| system | W | rank | ERA | rank | WHIP | rank | SO | rank | AVG rank |
|---|---|---|---|---|---|---|---|---|---|
| Rosenheck | 3.439 | 1 | 1.402 | 2 | 0.280 | 5 | 38.343 | 1 | 2.25 |
| Steamer | 3.777 | 4 | 1.368 | 1 | 0.271 | 1 | 41.520 | 4 | 2.5 |
| Larson | 3.682 | 3 | 1.403 | 3 | 0.273 | 2 | 40.938 | 3 | 2.75 |
| Smith | 3.606 | 2 | 1.428 | 5 | 0.290 | 7 | 38.750 | 2 | 4 |
| ZIPS | 4.376 | 12 | 1.413 | 4 | 0.275 | 3 | 44.135 | 8 | 6.75 |
| Marcel | 3.932 | 6 | 1.458 | 8 | 0.291 | 8 | 42.249 | 5 | 6.75 |
| Guru | 4.000 | 8 | 1.438 | 6 | 0.352 | 11 | 42.681 | 6 | 7.75 |
| CAIRO | 3.901 | 5 | 1.502 | 11 | 0.426 | 12 | 43.908 | 7 | 8.75 |
| ESPN | 4.158 | 10 | 1.502 | 10 | 0.279 | 4 | 46.271 | 11 | 8.75 |
| Rotochamp | 4.103 | 9 | 1.456 | 7 | 0.295 | 9 | 45.431 | 10 | 8.75 |
| Fangraphs Fans | 4.193 | 11 | 1.476 | 9 | 0.283 | 6 | 46.558 | 12 | 9.5 |
| CBS Sportsline | 3.968 | 7 | 1.523 | 12 | 0.301 | 10 | 44.182 | 9 | 9.5 |

Here, we see that Dan Rosenheck’s projections are best in terms of RMSE, leading in Wins and Strikeouts.  Steamer takes the other two categories—ERA and WHIP.  As before, overall fit is probably more interesting, and in the table below, we see the results.

Pitcher Fit Results:

| system | W | rank | ERA | rank | WHIP | rank | SO | rank | AVG rank |
|---|---|---|---|---|---|---|---|---|---|
| Rosenheck | 0.559 | 1 | 0.118 | 1 | 0.183 | 2 | 0.559 | 1 | 1.25 |
| Steamer | 0.495 | 5 | 0.113 | 2 | 0.187 | 1 | 0.503 | 6 | 3.5 |
| Larson | 0.502 | 4 | 0.102 | 3 | 0.165 | 3 | 0.506 | 5 | 3.75 |
| CBS Sportsline | 0.545 | 2 | 0.071 | 8 | 0.066 | 10 | 0.558 | 2 | 5.5 |
| Smith | 0.516 | 3 | 0.075 | 7 | 0.078 | 9 | 0.544 | 3 | 5.5 |
| ESPN | 0.468 | 7 | 0.060 | 9 | 0.153 | 5 | 0.517 | 4 | 6.25 |
| Fangraphs Fans | 0.479 | 6 | 0.081 | 6 | 0.136 | 6 | 0.500 | 7 | 6.25 |
| Rotochamp | 0.460 | 8 | 0.093 | 5 | 0.098 | 7 | 0.475 | 8 | 7 |
| ZIPS | 0.373 | 12 | 0.095 | 4 | 0.155 | 4 | 0.442 | 11 | 7.75 |
| Marcel | 0.430 | 10 | 0.045 | 12 | 0.081 | 8 | 0.459 | 9 | 9.75 |
| Guru | 0.426 | 11 | 0.045 | 11 | 0.059 | 11 | 0.455 | 10 | 10.8 |
| CAIRO | 0.434 | 9 | 0.058 | 10 | 0.035 | 12 | 0.428 | 12 | 10.8 |

The Rosenheck projections turn out not only to be the most accurate but also to capture the most player-to-player variation.  For one system to do both best is remarkable: Dan's projections lead in Wins, ERA, and SO, and took 2nd in WHIP, losing only to Steamer.  Clearly, Dan knows something that the rest of us don't.

I asked Dan the secret to his success, and he suggested that he uses some combination of structural modeling and forecast averaging to arrive at his projections.  This is in contrast to the Larson projections, which are weighted averages of forecasts but contain no structural models, and the Steamer projections, which are entirely structural.

Key Points

This is the third year that I have evaluated a set of forecasts.  For hitters, no single structural forecast has ever done very well.  However, these structural forecasts can be used in a weighted average that seems to do better than any individual forecast.
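A weighted average of forecasts is straightforward to sketch.  The systems and weights below are purely hypothetical illustrations, not the weights any of these projections actually use:

```python
def weighted_average(projections, weights):
    """Combine several systems' projections of one stat for one player.
    projections/weights: dicts keyed by system name; weights are
    normalized by their total, so only relative size matters."""
    total = sum(weights[s] for s in projections)
    return sum(weights[s] * projections[s] for s in projections) / total

# Hypothetical HR projections for a single hitter:
hr = {"ZIPS": 24, "Steamer": 28, "Marcel": 20}
w = {"ZIPS": 2.0, "Steamer": 2.0, "Marcel": 1.0}
print(weighted_average(hr, w))  # 24.8
```

In practice the weights would be chosen to minimize historical forecast error rather than set by hand.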

On the other hand, pitchers appear to be much more forecastable by an individual system.  Averaging still works, but individual systems have forecast pitchers well without it.

Congratulations to Dan on his forecasting dominance in 2012, and best of luck to everyone in 2013.

All of the non-proprietary numbers in this analysis can be found at my little data repository website, http://www.bbprojectionproject.com.  I encourage everyone to submit projections for consideration in 2013!

19 Responses to “Evaluating 2012 Projections”

  1. Hi Will,

    Thanks for doing this study again, and for your kind words. I would start out by saying that I don’t think this is really a fair fight. As you note, to compile my projections, I rely heavily on the other systems you list for different components, and then add in additional sources of information (e.g. external team defense forecasts or PitchFX data) that I don’t think they include. So I get to stand on the shoulders of what Steamer or ZiPS or whoever is doing, but they don’t get to use my extra special sauce. I am sure that if I were asked to produce forecasts before the other systems are released, or without looking at them, my results would lag far behind theirs.

    That said, it’s good to see that the effort I put in to improve on a simple average of others’ projections seems to have squeezed out some extra accuracy, particularly on the pitching side. We’ll get a better idea after 2013 of whether I’ve actually got something here or just got lucky.

    Dan Rosenheck
    Sports editor, The Economist

  2. J. Cross says:

    Nicely done.

    btw, we’re now officially claiming Dan (once a Saint Ann’s Steamer himself) as one of our own and deeming his projections to be Steamer projections too.

  3. John Grenci says:

    Will, thanks very much for doing this. And Dan, you make a very good point regarding using bits and pieces of other projection systems, and it's always great to see Newton quoted.

    I wonder if there might be a way to work around this. If a projection system does not use some sort of average of other systems, then it should have some predictions that are either the max or the min for some players in some categories; i.e., any system will have somebody scoring the fewest runs (compared to the other systems).

    Now, it is true that, inherent to some systems, there may be a dampening effect that guards against extreme values. However, it seems to me that a system that is not dependent on any other systems will have its share of extreme values. So, in addition to assessing by the few ways mentioned above, is there a reasonable way to assess by looking at the extreme picks?

    There are various ways this could be done. Is it fair? Of course: participants graded on only their extreme picks are up against the more 'typical' picks, and likely not to do as well (I would guess). I would say it is akin to having to pick at least two teams to win 100 games in Major League Baseball. You pick the Angels. Somebody else picks them to win 90. Well, that somebody else is the favorite to be closer if the Angels are only projected to win 90.

    Now, I am not saying you have to intentionally make extreme picks. I am saying that any system that does not use any other system will inherently have some extreme picks. If a system does not have enough extreme picks (that number to be decided upon), then it would not qualify to be assessed by the 'extreme' method.

    Okay, I feel like I rambled. But it would be yet another way to assess, or could replace the two methods above, and I think it would be a fairer way to assess in light of the fact that it is FAR too easy for anybody to use bits and pieces of other systems.

  4. #SSAC13 says:

    Dan, any chance you’d make your 2013 projections publicly available this year? I have some smaht kids to beat in the MIT Sloan Sports Conference Organizer’s league

  5. Dan Rosenheck says:

    Tru dat, Jared. Who knew such an artsy school would produce such quantitative brilliance. :)

    My personal recommendation would be that in the future, this comparison be broken down into two categories: aggregators, and projections that are 100% independently generated. I actually do produce one set of forecasts that doesn’t rely on anybody else, but then I weight it just like I weight everyone else’s. Perhaps next year I’ll submit both sets of numbers. I would expect my “solo” sheet to finish very close to the bottom.

    #SSAC13, no, they’re a trade secret until the end of the season. But if you promise not to distribute them, I’m happy to email you a copy once I compile them. That said, I will be presenting some of the research on which the forecasts are based during my EOS talk at the Sloan conference, so some of the secret sauce will be spilling out very soon.

    By the way, there are a few spots left in my own hyper-competitive high-stakes fantasy league. Let me know if you’re interested! :)

    Dan

  6. Will says:

    Maybe I’ll have two categories next year: Structural forecasts and “do whatever it takes” forecasts that can incorporate forecast averaging or whatever. I’d invite people to submit both, if they’d like. I’ll have to figure out how to handle things like CBS or ESPN who are opaque about how they create their forecasts, and almost certainly do so based on what other folks have done.

    It also might be interesting to look at who does best at what types of players (early vs mid vs late career players, for example).

  7. #SSAC13 says:

    Dan, promise not to distribute. I’m organizing the First Pitch Case competition this year, feel free to join us Friday and come up to say hi. If not I’ll catch your EOS talk Saturday!

  8. RotoChamp says:

    I appreciate your hard work, Will, and don’t want to be critical. However, where did you get our projection set in 2012? I fear that you grabbed it from FanGraphs and not our website (which is constantly updated). We submitted to FG in February last year before anybody else and didn’t get updated on the site. For example, I noticed in the data on your site that you had us projecting Ryan Braun for 380 ABs, which was our assessment in February before he was cleared of the possible PED suspension.

    I hate to see our February projections being compared to late March projections by everybody else.

    FantasyPros just released their evaluations of 50 experts in 2012, which were based on the latest rankings from our website, and we ranked 3rd overall. Granted, they are evaluating rankings and not the raw projections, but our rankings are based on our projections, so there should be some correlation.

    http://www.fantasypros.com/2013/02/most-accurate-fantasy-baseball-experts/

    I’d love for you to include our 2013 projections, but grab them from our website at the same time you grab CBS, FanGraphs, etc.

    Also, it might be interesting to see how the Composite projections, which include 5 reputable systems, perform against the individual sets.

  9. tbonemacd says:

    I’m wondering what the verdict is regarding the value of these projection systems compared to the typical hardcore fan who looks at the numbers, reads the scouting reports, and makes a prediction (guess) about what their favorite players/team will do that year. Would that be Fangraphs Fans, or are those piggybacking off some of the other systems? Honestly, just asking.

  10. Will Larson says:

    @roto champ: I got them from your website. It sounds like I used a non-final vintage of your forecasts. For that, I apologize, but it’s difficult to keep track of the release schedules of everyone’s different projections if they aren’t submitting them to me themselves. Care to send along your final 2012 projections so I can run those instead of your early pre-season projections? I’m sure it will improve your standing.

    I’m also sure that your composite projection will do well. I’d encourage you to send that as well!

    @tbonemacd: My hunch is that individually, fans won’t do a very good job, but that as a whole, fans actually come up with a good (albeit optimistic when it comes to absolute production) ranking of players. Also, you’re right–it’s likely that some fans are using information from others’ forecasts when making their forecasts.

  11. Jaker says:

    Awesome work!

    What I really find interesting is just how well the FG Fans projections performed. For all the recent articles on FG criticizing how fans tend to overestimate in comparison to ZiPS, it’s interesting to see FG Fans fared better than ZiPS in 2012.

  12. ncklm says:

    Will, are you publishing your 2013 projections anywhere?

  13. Sam Swindell says:

    Hey Dan,

    Any chance you could email me a copy of this year’s projections? I would never distribute them! Just looking for an edge in my league this year.

    Thanks a ton,
    Sam Swindell

  14. Dan Rosenheck says:

    I won’t have them ready until 3/28. Email me then and I’m happy to share.

  15. Sam Swindell says:

    Thanks! What is your email?

  16. Shawn says:

    Will,

    Do you provide your projections to the public?

  17. kolatch says:

    Will,

    Along the same lines as Shawn’s question, in 2010 you provided some of the weights you planned to use to form your own projections.
    Are you using those weights or do you adjust them every year?
    Does CBS still provide no additional information?
    When forming your own projections do you look at weights for all of the sites that provide you with their numbers or just the sites you can also get free projections from for the upcoming year?

    Do you adjust for additional factors after you aggregate as Dan does or is it just a weighted average?

    Would you mind sharing which sites projections you actually used in your numbers for last year and if possible which sites you will be using for your 2013 projections?

    Would you mind sharing the weights you will use for the upcoming season? Or sharing your projections?

    Thanks for all your research.

  18. evo34 says:

    New rule: a projection “system” shouldn’t merit a name if it is a weighted average of other, original projection systems.

  19. Rico says:

    If you were using Roto Champ software to calculate custom auction values for your league, which projection system would you use given their options: Composite (not sure what it’s based on, it’s not an average of the following), ZIPS, RazzSteamer, RotoChamp, Cairo, FanGraphs fans. If you’d use combinations such as RazzSteamer for pitchers and composite for hitters (my theory), please state.
