For a few years now, I’ve been running a set of baseball player forecasts that are affectionately called “The Marcels”, or unaffectionately called the “Marcel The Monkey Forecasting System.” The idea behind The Marcels is that anyone can look at the back of baseball cards and come up with a decent estimate for the upcoming season.

For every player, Marcel does a three-step process:
1. Look at the performance of the player over the last three years, giving more weight to the most recent seasons.
2. Regress the player’s performance toward the league mean, based on the number of plate appearances or innings pitched. The more data, the less you regress.
3. Apply an age adjustment.
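
The three steps can be sketched in a few lines of Python. This is only an illustration, not Marcel's actual code: the 5/4/3 season weights, the 1200 plate appearances of league-average ballast, the peak age of 29, and the per-year age factors are all assumptions chosen for the example, and the function and variable names are hypothetical.

```python
def marcel_rate(seasons, league_rate, age,
                weights=(5, 4, 3), ballast_pa=1200,
                peak_age=29, young_step=0.006, old_step=0.003):
    """Forecast a per-PA rate from up to three seasons, most recent first.

    seasons: list of (events, plate_appearances) tuples, most recent first.
    Every default parameter value here is an illustrative assumption.
    """
    # Step 1: weight the most recent seasons more heavily.
    w_events = sum(w * ev for w, (ev, pa) in zip(weights, seasons))
    w_pa = sum(w * pa for w, (ev, pa) in zip(weights, seasons))

    # Step 2: regress toward the league mean by mixing in league-average PA.
    # The more real PA a player has, the less the ballast matters.
    regressed = (w_events + league_rate * ballast_pa) / (w_pa + ballast_pa)

    # Step 3: age adjustment, upward before the assumed peak, downward after.
    step = young_step if age < peak_age else old_step
    return regressed * (1 + (peak_age - age) * step)


# Hypothetical hitter: (HR, PA) for the last three seasons, most recent first.
history = [(30, 600), (25, 550), (20, 500)]
rate = marcel_rate(history, league_rate=0.03, age=29)
print(round(rate * 600, 1))  # forecast HR over 600 PA
```

A player with no history at all comes out at exactly the league rate, which is the "more data, less regression" idea taken to its limit.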

There are many, many problems with this. That’s why I call it a monkey system. I developed the worst forecasting system that you could possibly accept. And it takes minutes to run. Any other forecasting system out there that takes countless hours to design, develop and execute had better be able to beat this system.

Interlude – One Year Ago

Tangotiger – Thursday, January 27 2005 @ 10:43 PM EST

Look at all the guys forecasted for 28 to 30 home runs: Adrian Beltre, Gary Sheffield, Carlos Delgado, Mark Teixeira, Andruw Jones, Alfonso Soriano, Miguel Tejada, Todd Helton, Lance Berkman, Paul Konerko, Rafael Palmeiro, Jeromy Burnitz, Carlos Beltran. Half of those guys will hit more than 29 home runs, and half will hit less.

But you, me, and everyone else have no idea who will hit 30 or 35 home runs. Bad luck, good luck, injuries, whatever … everything plays a role in this. Marcel’s best guess is that those 13 hitters will average 29 home runs. If you wanted Marcel to forecast the number of home runs without attaching names to it, that’d be a lot easier, and the range would be wider. Think of these forecasts as over/unders.

Interlude – One Month Ago

Tangotiger – Wednesday, January 18 2006 @ 04:49 PM EST

It’s one year later. OK, so I decided to find out what those 13 guys I listed did. Here they are:

Player           HR
Andruw Jones     51 
Mark Teixeira    43 
Paul Konerko     40 
Alfonso Soriano  36 
Gary Sheffield   34 
Carlos Delgado   33 
Miguel Tejada    26 
Lance Berkman    24 
Jeromy Burnitz   24 
Todd Helton      20 
Adrian Beltre    19 
Rafael Palmeiro  18 
Carlos Beltran   16 

6 guys with more than 30, and 7 less than 28. Average? 29.5.


I think this illustrates the power and foolishness of forecasting. At the group level, forecasting systems are very powerful. Here we have results of 13 hitters that are as far apart as possible. And yet the mean of their actual results was virtually a match for the mean of the expectation. Did I just get lucky here? That first post last year was done because someone brought up Teixeira, who was forecast with 29 home runs.
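
To put a number on the gap between the group and the individual, here is a small Python check on the home run list. It assumes every hitter was forecast for exactly 29, the group average quoted a year earlier; the real individual forecasts ran from 28 to 30, so treat the result as approximate.

```python
from statistics import mean

# Actual 2005 home run totals for the 13 hitters listed above.
actual_hr = [51, 43, 40, 36, 34, 33, 26, 24, 24, 20, 19, 18, 16]

# Assumption: each hitter is treated as forecast for exactly 29,
# even though the individual forecasts ranged from 28 to 30.
forecast = 29

# How far the group average landed from the forecast.
group_error = mean(actual_hr) - forecast

# How far a typical individual landed from the forecast.
individual_error = mean(abs(hr - forecast) for hr in actual_hr)

print(f"group mean missed by {group_error:.1f} HR")
print(f"typical individual miss: {individual_error:.1f} HR")
```

Under that assumption the group average misses by about half a home run, while the typical hitter misses by about nine.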

So, let’s try a few more, shall we? Let me throw a number out there: 90 RBIs. I swear I’m doing this as I’m writing this. There were 13 guys forecast between 88 and 92 RBIs, and this is how they did in 2005:

Player           RBI    
Mark Teixeira    144	
Andruw Jones     128	
Hideki Matsui    116	
Carlos Lee       114	
Jeff Kent        105	
Paul Konerko     100	
Aramis Ramirez    92	
Hank Blalock      92	
Jim Edmonds       89	
Adrian Beltre     87	
Todd Helton       79	
Carlos Beltran    78	
Barry Bonds       10	

Average? 95. Not bad. Of course, some of you are thinking that the overall average was brought down by Bonds, but others could say the overall average was brought up by a few guys. The median was 92.
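
If you want to see how much the extremes at either end matter, a quick Python sketch compares the plain average, the median, and an average with the single lowest and single highest totals dropped. The trimming rule is just one arbitrary choice made for illustration.

```python
from statistics import mean, median

# Actual 2005 RBI totals for the 13 hitters in the table above.
actual_rbi = [144, 128, 116, 114, 105, 100, 92, 92, 89, 87, 79, 78, 10]

ranked = sorted(actual_rbi)
trimmed = ranked[1:-1]  # drop the single lowest and single highest totals

print(round(mean(actual_rbi), 1))  # all 13 hitters
print(median(actual_rbi))          # middle value of the 13
print(round(mean(trimmed), 1))     # 11 hitters, one removed from each end
```

Trimming one hitter from each end moves the average to about 98, while the median of the full group is 92.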

The highest forecasted RBIs were 112 (Tejada), 110 (Pujols), and 108 (Ortiz). What is this, the 1980s? If you had wanted me to only forecast RBIs, and not tell you who would do it, I would have said 150. Why would I give a number like that? Because from 2001 to 2004, the four highest RBI totals were 160, 150, 146, 145. It would therefore be reasonable to think that the league leader will be around 150. The league leader in 2005 had 148 RBI. So, I would have been pretty close, as an over/under.

But, how sure could I have been that it would be Ortiz? You could come up with a reasonable list of 15 or 20 players that would lead the league in RBI. But, that’s not what we are trying to figure out. We are trying to come up with reasonable over/unders, numbers for which you could find equal reasons why the player will over-perform or under-perform. Injuries, as we know with Bonds, can devastate any forecast.


Let’s look at pitchers. Marcel had 8 pitchers with 14 wins. Now, before I run this while you watch me, I have to say that one of the weakest parts of the forecast would be wins. It’s heavily team-dependent, and Marcel doesn’t even look at a pitcher’s ERA to make its forecast. Marcel only looks at the win totals of the prior years. Anyway, let’s see what happens.

Player           Wins
Bartolo Colon     21	
Mark Buehrle      16	
Johan Santana     16	
Pedro Martinez    15	
Roger Clemens     13	
Jason Schmidt     12	
Curt Schilling     8	
Russ Ortiz         5	

Average? 13. Median? 14. You just never know when injuries hit. OK, how about strikeouts? How about 150? There were 14 pitchers forecast with between 140 and 160 strikeouts. Let’s go to the tape.

Player           Strikeouts
Jake Peavy          216	
Carlos Zambrano     202	
Javier Vazquez      192	
Mark Prior          188	
Brandon Webb        172	
Barry Zito          171	
Josh Beckett        166	
Livan Hernandez     147	
Matt Clement        146
Freddy Garcia       146	
Rich Harden         121	
Ted Lilly            96	
Kerry Wood           77	
Kelvim Escobar       63	

Average forecast? 151. Actual? 150. Look, I’m getting as bored as you are. Let me try one last one, probably the hardest of them all: saves. It’s team-dependent and manager-dependent. Eleven pitchers were forecast for 20 to 29 saves.

Player            Saves
Trevor Hoffman      43	
Jason Isringhausen  39	
Billy Wagner        38	
Francisco Cordero   37	
Eddie Guardado      36	
Jose Mesa           27	
Armando Benitez     19	
Keith Foulke        15	
Danny Kolb          11	
Troy Percival        8	
Jorge Julio          0

Average forecast? 23. Actual saves? 25.
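
For anyone who wants to rerun all five checks at once, the following Python sketch puts the tables above side by side. The forecast averages for home runs, wins, strikeouts and saves are the figures quoted in the text; the 90 used for RBI is an assumption (the midpoint of the 88 to 92 range), since no group average was stated for it.

```python
from statistics import mean, median

# Group forecast averages and actual 2005 totals from the tables above.
# Assumption: the RBI group is pegged at 90, the midpoint of its 88-92 range.
groups = {
    "HR": (29, [51, 43, 40, 36, 34, 33, 26, 24, 24, 20, 19, 18, 16]),
    "RBI": (90, [144, 128, 116, 114, 105, 100, 92, 92, 89, 87, 79, 78, 10]),
    "Wins": (14, [21, 16, 16, 15, 13, 12, 8, 5]),
    "Strikeouts": (151, [216, 202, 192, 188, 172, 171, 166,
                         147, 146, 146, 121, 96, 77, 63]),
    "Saves": (23, [43, 39, 38, 37, 36, 27, 19, 15, 11, 8, 0]),
}

# For each stat: forecast, actual mean, actual median, lowest, highest.
summary = {}
for stat, (forecast, actual) in groups.items():
    summary[stat] = (forecast, mean(actual), median(actual),
                     min(actual), max(actual))

for stat, (forecast, avg, mid, low, high) in summary.items():
    print(f"{stat:<11}forecast {forecast:>3}  actual mean {avg:6.1f}  "
          f"median {mid:5.1f}  range {low}-{high}")
```

In every group the actual mean sits within a few units of the forecast, while the range column shows how far individual players strayed.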


So, what to do? Trust the forecast for a group of players, but don’t go betting on any one forecast. There’s not a single person in the whole world who can help you there. There’s no book, there’s no program, there’s nothing to help you with any single forecast. That’s why we play the darn game, and that’s why we love its drama.

While you keep your nose out of your spreadsheet, here’s the Marcel 2006 forecast for you to download.

