by David Appelman - February 3, 2011
Tangotiger just released the 2011 Marcel Projections. They’re also up and running on the FanGraphs player pages and in our sortable projection section too!
Geeze, if Marcel is right, it’s going to be one brutal season for the Blue Jays.
The projections, for example, show Albert Pujols as the HR leader with 34 – so I’m hoping they’re a bit on the conservative side.
LOL, if you RTFA there’s an explanation linked from the first page stating that the projections are definitively *not* projecting Pujols to lead in HR with 34.
That’s still what the projection says, whether or not the creator stands by them.
No, it isn’t what the projection says. It says that Albert Pujols, on average, will hit 34 home runs this season, which is more than any other hitter’s median/mean outcome. It is almost guaranteed that someone in the Major Leagues will hit more than 34 home runs, because approximately half of the players will outperform their projections. It’s just like projecting teams’ finishes in a division: the teams’ median outcomes (which are what you are projecting) will probably be quite a bit closer than the actual results of the division at the end of the season, in which some teams will have underperformed and some will have overperformed. For instance, team A might be projected to win the most games on average, but that average would be only, say, 89 games. However, if you simulate the season, the winner of the division might win 92 games on average, but the division winner will not be the same team every time. It’s a crucial point that you absolutely have to understand when looking at a list of projections.
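The gap between "highest projected mean" and "actual league leader" can be checked with a small simulation. This is a minimal sketch, not anything Marcel itself does: the list of projected means, the spread `sd`, and the assumption of independent normal outcomes are all hypothetical and chosen only for illustration.

```python
import random

def simulate_leader(means, sd, n_seasons=10_000, seed=0):
    """Simulate seasons and return
    (average league-leading total, share of seasons led by the top-projected player)."""
    rng = random.Random(seed)
    top_idx = means.index(max(means))
    leader_total = 0.0
    top_led = 0
    for _ in range(n_seasons):
        # Each player's season is a noisy draw around his projected mean (assumed normal).
        season = [rng.gauss(m, sd) for m in means]
        best = max(season)
        leader_total += best
        if season[top_idx] == best:
            top_led += 1
    return leader_total / n_seasons, top_led / n_seasons

# Hypothetical projected HR means for 20 hitters (not real Marcel output).
projected = [34, 33, 32, 31, 31, 30, 29, 29, 28, 28,
             27, 27, 27, 26, 26, 26, 25, 25, 25, 25]

avg_leader, top_share = simulate_leader(projected, sd=7.0)
print(f"highest projected mean: {max(projected)}")
print(f"average simulated league-leading total: {avg_leader:.1f}")
print(f"share of seasons the top-projected hitter leads: {top_share:.0%}")
```

Under these assumptions the average league-leading total comes out above the best individual projection, and the top-projected hitter leads in only a minority of simulated seasons, which is the same point as the division example.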
Austin: wonderful post, thank you for being so clear and concise.
Publish the distributions!
Let’s see means + standard errors then. If you want people to understand, make it easier. If you wanted to get creative I’m sure you could use R to gin up visualizations that could be attached to each projection value.
Say I click on Pujols .414 wOBA Marcel projection – I could see a distribution of the simulated results rather than just the simple mean. I’m sure this would be useful for fantasy as well. Rather than reading an article praising a player’s upside I could just note it in the distribution.
Just my two cents. Keep up the great work!
I actually thought the Jays’ projections were pretty decent, considering Marcel hates everyone. The pitchers’ projections looked solid, especially Drabek’s.
Awesome, I literally debated downloading the entire file earlier. But decided against it, because I figured they’d be up here sometime.
I’m not saying that the system is somehow biased or something, but it has every single one of the Red Sox’ 9 starters finishing below both their Bill James and fan projections in terms of wOBA, which seems crazy to me.
Marcel projections are uber-conservative every season.
I’d bet you ten dollars that Marcel is the most accurate of the three systems you just mentioned. Fans and Bill James are both way too generous.
And I’ll bet you ten dollars that Adrian Gonzalez has a higher SLG% than .495.
And I’ll bet anybody 10 dollars that more than 2 players hit over 100RBIs and the league leader hits more than 104.
Matt C, you are really missing the point. Everyone agrees that the MLB leader will likely have more than 104 RBIs. However, predicting which specific player should be expected to have more than 104 RBIs is the trick.
I guess I am missing the point, because I don’t see what is so hard about predicting guys like Pujols and Cabrera to have more than that when they have literally done it every full year of their careers. Pujols has had fewer than 117 RBIs only once in his career, and Cabrera’s low mark was 103; other than that, he has had 112 or more in every other year. Cabrera is now entering the prime of his career and coming off a great season (which wasn’t inflated by a fluke BABIP or HR/FB%), and he’s suddenly going to produce the worst numbers of his life? I’m sorry, but that just doesn’t make much sense to me.
Oh and for the record I know RBIs are basically worthless when evaluating a player I just brought them up because out of all the offensive categories that one seemed like it was the most conservative to me.
Just take 100 ABs, 10 home runs, and .030 BA/OBP points off the Bill James projections and you’ve got this year’s MARCEL.
I know the players with low Rel aren’t expected to get their projections, but the 3rd, 4th, and 5th ranked SS’s in wOBA make me laugh.
There has to be an error going on here with the projections of minor leaguers. Marcel is projecting Ciriaco for a .340+ OBP when he was at .281 and .319 the last two seasons in AA/AAA.
Look at the reliability ratings – they are basically zero. Marcel doesn’t include minor league numbers, so it is only using age and major league performance as the variables. For anyone with a reliability score under .3 or so, you can probably ignore Marcel – it just doesn’t have enough information to make a good projection at that point.
Oh, I thought it was using minor league equivalents like CAIRO but it’s just using the league mean. That makes sense.
Wow… after looking at a few projections I’ve come to the conclusion that Marcel hates everyone.
I’m not sure it’s correct to say Marcel “hates” anyone. Marcel is, as Tango says, the most simplistic projection system that could be acceptable. All it does is weight performance for the past three years, regress to the mean, and toss in an age factor.
More intelligent systems probably regress less, and thus appear to “hate” good players less.
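The three steps named above (weight the past three years, regress to the mean, apply an age factor) can be sketched in a few lines. This is a rough illustration only: the 5/4/3 season weights, the 1,200 PA of league-average ballast, the peak age of 29, and the per-year age steps are assumptions picked for the example and are not stated anywhere in this thread.

```python
def marcel_rate(rates, pas, league_rate, age,
                weights=(5, 4, 3), regress_pa=1200,
                peak_age=29, young_step=0.006, old_step=0.003):
    """Project a per-PA rate from the last three seasons (most recent first).

    All default constants are illustrative assumptions, not taken from the thread.
    """
    # 1. Weight performance for the past three years, recent seasons counting most.
    weighted_pa = sum(w * pa for w, pa in zip(weights, pas))
    weighted_events = sum(w * pa * r for w, pa, r in zip(weights, pas, rates))

    # 2. Regress to the mean by mixing in regress_pa worth of league-average play.
    regressed = (weighted_events + regress_pa * league_rate) / (weighted_pa + regress_pa)

    # 3. Toss in an age factor: a small boost before the peak age, a small penalty after.
    step = young_step if age < peak_age else old_step
    return regressed * (1 + (peak_age - age) * step)

# Hypothetical 31-year-old slugger: HR per PA and PA for the last three seasons.
projected_hr_rate = marcel_rate(rates=[0.060, 0.055, 0.065],
                                pas=[650, 600, 680],
                                league_rate=0.027, age=31)
print(f"projected HR per PA: {projected_hr_rate:.4f}")
```

Because step 2 pulls every hitter toward the league rate, an above-average hitter always projects below his own weighted history, which is why the system can look like it "hates" good players.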
You think that the fans are being optimistic about Heyward, and then Marcel goes and predicts him to be the 10th best hitter in the league.
but he’s still more conservative than James or the fans
It’s not a ‘he’, it’s a machine.
Tango didn’t sit down and actually make these projections, he just ran a program that comes up with the numbers.
Seeing the posts here, it seems that Marcel is “new” to a lot of people.
Do me a favor, and before criticizing, read this:
And then read this:
And if you STILL have a problem, then you can criticize. I’d rather hear from an informed critic than a reactionary one.
If I were going to critique it, it would be this: while Marcel does a good job of capturing the overall ability of the league, it is completely unable (by definition) to predict breakout or bust performances or new contexts. Considering that there are dozens upon dozens of these each year, the projections seem inaccurate for a number of players.
Some of them just flat don’t make sense. For example, why are Mark Teixeira and Adrian Gonzalez losing so much of their power despite being in their primes and, in Gonzalez’s case, moving to a new ballpark that caters to power much more?
It removes the outliers as much as possible. So no, you’ll never see breakouts or busts predicted by MARCEL. MARCEL is not what you want to use for fantasy.
And yet, if you test it historically, it DOES make sense.
Parks are not included. Marcel does what it says it does. If it’s not enough, just use something else.
I will say that Marcel does just as well as everything else, so you may get more flash, but that doesn’t mean you will be better off.
Was anyone actually even criticizing Marcel in this thread…?
Okay, an example for my own edification, if someone will indulge:
Pedro Ciriaco had 6 PA in 2010 with a 1.500 OPS, his only ML stats
Brent Dlugach had 3 PA in 2009 with a 0.000 OPS, his only ML stats
Dlugach is ~2 years older (born 83 vs 85).
Ciriaco is projected for a .345 wOBA, Dlugach a .323 wOBA
So, since there are no other factors included, the small age gap and 6 PA of more recent 1.500 OPS ball result in a .022 jump in projected wOBA?
Seems intuitively like the values should be nearly identical since there’s virtually no major league performance for either player, as represented by the low confidence number. What am I missing?
Everything gets rounded. So even if two guys are close, when you are working off such a small base of PA, an extra HR that one guy gets and the other doesn’t (say one is at 0.6 HR and the other is at 0.4 HR) can produce a 20 point difference. Well, a 1 HR difference given 200 PA is a 10 point difference in wOBA. Like I said, you’ve got rounding in all the stats.
In any case, don’t bother looking at anything with an r under .50.
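The rounding arithmetic above can be worked through directly. In this sketch the HR weight of 2.0 is an assumption, back-solved from the statement that 1 HR over 200 PA is worth 10 points of wOBA; the function name is hypothetical.

```python
HR_WEIGHT = 2.0  # assumed: implied by "1 HR given 200 PA is a 10 point difference in wOBA"

def woba_gap_from_hr(raw_hr_a, raw_hr_b, pa, hr_weight=HR_WEIGHT):
    """Return (wOBA gap after rounding HR to whole numbers, wOBA gap before rounding)."""
    rounded_gap = round(raw_hr_a) - round(raw_hr_b)
    raw_gap = raw_hr_a - raw_hr_b
    return rounded_gap * hr_weight / pa, raw_gap * hr_weight / pa

# One hitter sits at 0.6 projected HR, the other at 0.4, both over 200 PA.
after_rounding, before_rounding = woba_gap_from_hr(0.6, 0.4, pa=200)
print(f"gap after rounding:  {after_rounding:.3f}")   # 0.010
print(f"gap before rounding: {before_rounding:.3f}")  # 0.002
```

A 2 point underlying difference turns into a 10 point difference from the home run column alone, and the same thing can happen in every other rounded stat.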
Also, since the low reliability projections are so tenuous maybe you could include a reliability filter on the MARCEL projections page, similar to the min pa/ip on the leaderboards.
Sort by reliability, delete as many rows as you’d like, re-sort by whatever statistic you’d like. Voilà.
Ah, didn’t see you were talking about the projections page, not the CSV file. Disregard.
Just a question about wOBA differences between the downloadable file and the projections shown in FG. Leftfielders, for example:
Hamilton: .382 in FG, .370 in the file
Holliday: .382 in FG, .370 in the file
Braun: .378 in FG, .370 in the file
Am I missing something?
Hanley Ramirez is 5th in wOBA on the 2011 MARCEL sortable projections page and 13th in wOBA in the downloadable file from Tango’s site. There are other noticeable discrepancies as well.
David calculates wOBA slightly differently. And in my csv file, I rounded to two decimal places.
The individual numbers should show NO discrepancies.
Most of the statistics are easy enough to figure out, but there are a couple that I am unfamiliar with. What is bsrER (pitching) and mSH (batting)?
bsr = BaseRuns …. if you are not familiar with it, ignore it.
SH = sacrifice bunts… should appear on all websites, no?
To Intricatenick, and anyone else who might be able to assist: wouldn’t any distribution graph just be a bell curve, and have very little usefulness?
To anyone: if the projections use the last three years, how is it that Pujols’ RBI total, for example, is lower than in any of those three years? Is the age factor that big? If it is, shouldn’t younger guys have higher projections as they enter their peak?
Regression toward the mean.
I haven’t looked at the projections for underwhelming players. Is there a calculated progression to the mean for those who fall under that mean?
The distribution would be Gaussian sure, but the dispersion should be different for different players. Consistency in previous seasons should lead to lower variance although regression will play a role too.
Someone with exactly league average performance the last three years (age 26-28) would have a pretty tight bell curve around his projection.
Let me add, or amend, after reading tangotiger’s material (albeit quickly): if Marcel projects a cohort, what’s its real use? That is, what’s to gain other than saying “measure many and you’ll get an average that is, well, average”?
I don’t understand.
Are you asking what’s the point of forecasting 100 September callup rookies from last year?
That’s why I put in a reliability column. If you don’t find any use for those (and I agree that most people won’t), then just select above whatever threshold you want. I suggest r=.50 or better.
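For anyone working from a downloaded file, the threshold can be applied in a few lines. This is only a sketch: the column names `name`, `rel`, and `wOBA` and the four sample rows are hypothetical and may not match the layout of the actual CSV.

```python
import csv
import io

# Hypothetical rows; the real file's columns and values may differ.
SAMPLE = """name,rel,wOBA
Player A,0.88,0.390
Player B,0.05,0.345
Player C,0.62,0.350
Player D,0.31,0.360
"""

def reliable_rows(csv_text, min_rel=0.50, sort_key="wOBA"):
    """Keep rows at or above the reliability threshold, sorted by sort_key, best first."""
    rows = [row for row in csv.DictReader(io.StringIO(csv_text))
            if float(row["rel"]) >= min_rel]
    return sorted(rows, key=lambda row: float(row[sort_key]), reverse=True)

kept = reliable_rows(SAMPLE)
for row in kept:
    print(row["name"], row["rel"], row[sort_key] if (sort_key := "wOBA") else "")
```

With the sample data only Player A and Player C survive the r=.50 cut, even though Player D has a higher projected wOBA than Player C.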
No, I’m asking what’s the point of saying “the average good power-hitting ballplayer will hit 28.6 home runs” (or whatever number it actually is).
I was impressed by the material that gave the 2006 comparisons, in which MARCEL came so close to accurate for a ranged group. I’m just struggling to find a purpose for a system that makes those projections: not necessarily a purpose for fantasy, but in general, as a baseball fan.
I’m not trying to denigrate MARCEL, I’m trying to comprehend it in a more meaningful way.
The purpose of the Marcels is to give the absolute minimum standard that any forecasting system should aspire to beat.
It purposely does as little as possible (ignores Minor leagues, ignores changing parks, ignores potential breakthrough performance, ignores draft information).
And then it says: “Alright PECOTA, alright CHONE, alright MGL and Voros, and Oliver and Shandler, and Bill James and ZiPS…. this is what you have to beat.”
It provides a benchmark for comparison. Interestingly, other than the expected (minor leaguers), Marcel does just as well as all of them.
So, it’s all well and fine to say that you are going to have a better forecasting system, but Marcel is here to give us a comparison point. If Marcel is a .500 forecasting system, all the others are between .480 and .530.
I WANT Marcel to be the worst. But, there’s only so much you can slice and dice the data.
That’s the purpose of Marcel.
Regarding other projection systems, it was stated on a blog that JEH had provided projections to FanGraphs. Given his excellent performance in the last two Forecaster’s Challenges, is it possible that his projections will be posted freely on this site in the near future?
Weird that Marcel predicted only 6 players to hit over 30 HRs, with the top being 34. Last year 18 players hit 30+ HRs. I don’t really like these projections because they don’t take into account a player over-achieving; they’re just assuming that players will go with their career averages, I guess? Perhaps I’m missing something about these projections from Marcel.
You’re missing everything.
Of those 18 that hit 30+, the vast majority probably beat a reasonable 50th percentile projection going into the season. Of course out of the hundreds of players in the league, some will have better than expected years, some will have worse (the Gaussian curve idea). If those who hit 30+ HRs are mostly players who were in the 75th-99th percentile for expected performance, of course in retrospect, a system giving a 50th percentile projection will look low.
Marcel doesn’t think that, at the end of the year, only 6 players will actually end up with 30 HR. It just thinks there are only 6 players whose 50th percentile projection is 30+ HRs. Just think of these as “over/unders” (without park effects, etc.).
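The difference between "players projected for 30+" and "players expected to finish with 30+" can be put in numbers. This is a back-of-the-envelope sketch: the projected means, the spread `sd`, and the normal-distribution assumption are all hypothetical, and home runs are treated as continuous for simplicity.

```python
from statistics import NormalDist

def expected_count_over(means, sd, threshold=30):
    """Expected number of players finishing above threshold, given each player's projected mean."""
    return sum(1 - NormalDist(mu=m, sigma=sd).cdf(threshold) for m in means)

# Hypothetical league: 6 hitters projected at 30+, plus a long tail just below.
means = [34, 33, 32, 31, 30, 30] + [28] * 10 + [25] * 15 + [22] * 20

projected_30_plus = sum(m >= 30 for m in means)
expected_30_plus = expected_count_over(means, sd=7.0)
print(f"players projected for 30+: {projected_30_plus}")
print(f"expected number finishing with 30+: {expected_30_plus:.1f}")
```

Most of the expected 30-HR seasons in this toy league come from hitters whose own over/under is below 30, each with a modest chance of beating it.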
One question I have, though: does Marcel regress to the mean less for players with longer track records of performance? It seems somewhat intuitive that players with 10 years of consistent play should be regressed less than someone with just 1 or 2. Does MARCEL “play dumb” in this aspect as well?
Right, the more playing time, the less regression. Indeed, the amount of regression is described in the “reliability” column.
A rel=.88 means 12% of past performance was regressed to the league mean.
A rel=.05 means 95% of past performance was regressed to the league mean.
Hence, the reason that I say to ignore any forecast at rel=.50 or less.
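Read as a formula, the reliability column is just the blend weight between a player's own record and the league mean. A minimal sketch, assuming a hypothetical past wOBA of .400 and a hypothetical league mean of .330:

```python
def regress_to_mean(past, league_mean, rel):
    """Blend past performance with the league mean; (1 - rel) is the share regressed."""
    return rel * past + (1 - rel) * league_mean

# Same hypothetical past performance, two very different reliabilities.
high = regress_to_mean(past=0.400, league_mean=0.330, rel=0.88)
low = regress_to_mean(past=0.400, league_mean=0.330, rel=0.05)
print(f"rel=.88 -> {high:.4f}")  # 0.3916
print(f"rel=.05 -> {low:.4f}")   # 0.3335
```

At rel=.05 the forecast is almost entirely the league mean, so it says next to nothing about the individual player.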
I just find these projections pretty useless. To each their own though.
If you find that, then you will find that with ALL forecasts, bar none.
The only forecasts that are useless are those with a reliability of .50 or less.