## The Projections

As David announced this morning, the CHONE projections for 2009 are now available on the site. Sean Smith does great work with these, and they’ve been proven to be among the most accurate forecasting systems out there. Unlike the very bare-bones Marcels (which are still quite good as a baseline), the CHONE projections include park effects and minor league data, which improve their accuracy a bit, especially for players for whom we have minimal major league data.

With that announcement, we now have three projections on each player’s page here on the site – Marcel, CHONE, and the Bill James projections from Baseball Info Solutions. One of the things that I’ve seen people do quite a bit is to take an average of the different projection systems available – especially Marcel and the James projections, since some feel Marcel is too pessimistic and James is too optimistic. However, the key is to understand the context of what each system is projecting. I figured today’s announcement would be a good starting point for a look at what I mean.

As an example, let’s take a look at Wladimir Balentien. He’s 24, has had success in the minors, but failed miserably in his first few hundred major league plate appearances last year. Here are the three forecasts for him for 2009 from the projections we have on the site:

Marcel: .239/.298/.402

CHONE: .231/.302/.409

James: .239/.312/.444

All three systems see a guy who isn’t going to hit for a high average due to his contact problems, but they vary a bit on how often he’ll draw a walk and how much power he’ll show. If you just look at the raw projections, you’d say that Marcel and CHONE aren’t big fans, but that the Bill James projections think he’s got some value. After all, it’s roughly a 45 point OPS gap between James and CHONE.

However, take a look at this:

Marcel: 301 AB, -8.3 wRAA, -13.78 wRAA per 500 AB

CHONE: 472 AB, -6.3 wRAA, -6.67 wRAA per 500 AB

James: 426 AB, -6.6 wRAA, -7.74 wRAA per 500 AB

wRAA, of course, is the linear weights runs above (or in this case, below) a league average hitter. CHONE is actually more optimistic about Balentien than the James projections are. The entirety of the OPS gap between the James and CHONE lines comes from the projected level of offense in baseball next year. It has nothing to do with Balentien, and everything to do with what the various systems forecast as league average offense in 2009. Marcel is the one that really hates Balentien, but that’s to be expected, considering that the only data going in is his 2008 major league performance, while the other two factor in his minor league success.
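To make the scaling concrete, here is a minimal Python sketch of the per-500-AB arithmetic, using only the Balentien numbers quoted above. The function and variable names are mine, not anything from the site. Because the wRAA totals shown are rounded, the last digit can differ slightly from the figures in the post.

```python
# Projected 2009 lines for Wladimir Balentien, copied from the post.
projections = {
    "Marcel": {"obp": 0.298, "slg": 0.402, "ab": 301, "wraa": -8.3},
    "CHONE": {"obp": 0.302, "slg": 0.409, "ab": 472, "wraa": -6.3},
    "James": {"obp": 0.312, "slg": 0.444, "ab": 426, "wraa": -6.6},
}


def wraa_per_500_ab(wraa, ab):
    """Scale a wRAA total to a common 500 at-bat basis."""
    return wraa / ab * 500


def ops(obp, slg):
    """On-base plus slugging."""
    return obp + slg


# Per-500-AB rate for each system, so playing-time guesses drop out.
rates = {
    system: wraa_per_500_ab(line["wraa"], line["ab"])
    for system, line in projections.items()
}

for system, line in projections.items():
    print(
        f"{system}: OPS {ops(line['obp'], line['slg']):.3f}, "
        f"{rates[system]:.2f} wRAA per 500 AB"
    )
```

The system with the highest raw OPS (James) does not have the best rate relative to its own league average, which is the whole point.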

You can’t look at just the BA/OBP/SLG numbers without adjusting for the projected context of that particular system. For whatever reason (my guess is that they’re not handling minor league translations very well and are overenthusiastic about young players, but that would take some more research to confirm), the Bill James projections always come out very high in forecast offense. The CHONE and Marcel projections usually match the upcoming year’s offensive level a bit better.

So, when looking at the various projection systems and the rate stats of the players, keep the environments that are being projected in mind. The value of a player isn’t in his rate stats, but in the value he provides above the baseline.


Can you guys include a league average wOBA somewhere for these then?

Dave,

Can you post the league average you used for each of the three systems? Reverse-engineering your numbers, I get around this (wOBA):

.320 Chone

.330 Marcel

.340 James

Also, how did you calculate the league average? Did you use the same players, same depth chart, for each of the three systems? Or did you use whatever the projected PA each system had for its players?

I think the latter is what would explain the Chone numbers.

For the league average I just calculated whatever it was based on the entire projection set, except for CHONE where I used only players who had made a major league appearance.

I’m just running this against the database pretty much: Sum([BB]-nz([IBB])+nz([HBP])+[H])/Sum([AB]+[BB]-nz([IBB])+nz([HBP])+nz([SF]))
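For anyone who wants to reproduce that kind of pooled aggregate outside the database, here is a rough Python sketch of the same expression. The field names come from the expression above; the sample rows are invented, and reading nz() as "treat a missing value as zero" is my assumption about what the query does.

```python
def nz(value):
    """Mimic nz() from the expression: treat a missing value as zero."""
    return 0 if value is None else value


def pooled_rate(rows):
    """Sum the numerator and denominator over all players, then divide.

    Pooling the sums first means each player counts in proportion to
    his projected playing time.
    """
    numerator = sum(
        r["BB"] - nz(r.get("IBB")) + nz(r.get("HBP")) + r["H"] for r in rows
    )
    denominator = sum(
        r["AB"] + r["BB"] - nz(r.get("IBB")) + nz(r.get("HBP")) + nz(r.get("SF"))
        for r in rows
    )
    return numerator / denominator


# Invented rows for illustration only; the second has missing fields.
sample_rows = [
    {"AB": 500, "H": 150, "BB": 50, "IBB": 5, "HBP": 5, "SF": 5},
    {"AB": 100, "H": 20, "BB": 5, "IBB": None, "HBP": None, "SF": None},
]

print(f"pooled rate: {pooled_rate(sample_rows):.3f}")
```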

Great point Dave.

The baseline wOBAs for each system are:

CHONE (for players who have appeared in the majors) – .327

Bill James – .346

Marcel – .332

So especially when looking at wRC and wRAA, it’s not quite a fair comparison. wOBA should be a better comparison between the projections, just like AVG/OBP/SLG or any of the rate stats.

If wRAA is being calculated in the projections the same way it’s calculated on the site, it should be a fair comparison, since it’s based on wOBA, right? wRC, sure, that still depends on the environment, but wRAA should adjust for that, no?

Well, I guess it depends how optimistic you think the Bill James projections are, because I’m not changing the actual weights for wOBA based on the projection system, just the baseline. So in CHONE/Bill James, a single is still worth .9-ish runs.

I personally don’t think Bill James is giving all his players a .015 bonus to wOBA; I think the system is really just leaving off some bottom percentage of all players.

So I think, if you think Bill James’ projections are made for a “real world” setting (.330-ish baseline), then they’re losing out since the baseline is so high. If you think they’re made for a mythical world where the baseline is .015 higher, then wRAA is ok to compare.

This sounds like something worthy of further study…

Yeah, it’d definitely be worth a look, maybe they really are giving all players a .015-ish wOBA bonus.

If that’s the case, then it doesn’t make a whole lot of sense and you can’t really compare rate stats with the Bill James projection system.

The Bill James projections only include about 450 players, so they are leaving a ton of guys out.

what is the CHONE avg for players who haven’t appeared in the majors? why the distinction?

If you include players that haven’t yet played in the majors (and there are about 1000 of them in the CHONE projections), you get a baseline of .306 wOBA.

I just didn’t think it was fair to include the 1000 or so major league projected A+, AA, and AAA players in the baseline.

Right, you can’t include it the way that David is summing it, because then you’d end up with 300,000 PA instead of the 180,000 PA that it should be.

The ONLY fair way to set the league baselines is to give each player the PA that the forecasting system actually expects, such that the total is around 180,000 PA. Or only select as many players such that you get to 180,000 PA.

In Chone’s case, you’d have to simply strike anyone who never played in MLB because his PA forecasts for those players are not real.
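A minimal sketch of the second option (only count players until you reach the league PA total) might look like this in Python. Ranking players by projected PA before trimming is my assumption; the comment does not say how to choose who makes the cut. The sample players are made up.

```python
def league_baseline(players, target_pa=180_000):
    """PA-weighted league wOBA, capped at a total of target_pa.

    Players are taken in descending order of projected PA (an assumed
    selection rule); the last player is truncated so the total lands
    exactly on target_pa.
    """
    ranked = sorted(players, key=lambda p: p["pa"], reverse=True)
    total_pa = 0
    weighted_sum = 0.0
    for player in ranked:
        if total_pa >= target_pa:
            break
        pa = min(player["pa"], target_pa - total_pa)
        total_pa += pa
        weighted_sum += player["woba"] * pa
    if total_pa == 0:
        raise ValueError("no plate appearances to average over")
    return weighted_sum / total_pa


# Made-up players; a tiny target stands in for the 180,000 PA league total.
demo_players = [
    {"pa": 600, "woba": 0.350},
    {"pa": 400, "woba": 0.320},
    {"pa": 300, "woba": 0.300},
]

print(f"{league_baseline(demo_players, target_pa=1000):.3f}")
```

With the cap at 1,000 PA, the third player never enters the baseline, which is the same effect as striking the players whose PA forecasts are not real.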

I think that Chone and I probably use an almost identical baseline.

***

I wrote this to Bill James:

Bill,

The forecasts that BIS has at Fangraphs, if you add up the stats in a weighted fashion, presume a run environment of about 0.50 runs per game higher than last year, which is quite an enormous difference.

Are you aware of this?

As an example, if we take some semi-random great hitters for 2008 (Pujols, Chipper, Manny, Berkman, Teixeira, ARod, Hanley, Youk, Quentin), three forecasting systems give them this OBP/SLG average:

0.401 0.548 Bill James

0.397 0.529 Chone / Rally

0.392 0.527 Marcel / Tango

This repeats itself across the whole group of players as well. If you work out all your players, if you give them whatever depth chart you want, you will find that your hitters are forecasted to produce some 0.40 to 0.50 more runs per game than last year.

Tom

I wonder what the league wOBA is for James’ pitcher projections. My guess is that he’s optimistic on them, too, for something like a .325 environment.

Since we have three different projections with three different methods, why not use this season (and/or past seasons) to put them head-to-head-to-head? This seems like a really worthwhile (and competitively interesting) undertaking.

Which subsets of players are most accurately predicted with which models? Are certain models doing a better job predicting high/low end players? By positions? Should we be more or less confident in the estimates of certain players? Even simple stuff like residual-vs-fitted plots would be interesting to look at heteroskedasticity type problems.

It would be best if we knew the exact models, but we really don’t need them since we have the two most important things for model evaluation, y-observed and y-predicted. Seems interesting to be thinking about the uncertainty associated with each of the three projections, so that we can start thinking about classical or Bayesian testing.
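As a starting point, here is a small Python sketch of that kind of head-to-head check using only observed and predicted values. The wOBA figures are invented purely for illustration, and RMSE and mean absolute error are just two common choices of error measure, not something any of the systems publishes.

```python
import math


def residuals(observed, predicted):
    """Observed minus predicted, player by player."""
    return [o - p for o, p in zip(observed, predicted)]


def rmse(observed, predicted):
    """Root mean squared error; penalizes big misses more heavily."""
    res = residuals(observed, predicted)
    return math.sqrt(sum(r * r for r in res) / len(res))


def mae(observed, predicted):
    """Mean absolute error; the typical size of a miss."""
    res = residuals(observed, predicted)
    return sum(abs(r) for r in res) / len(res)


# Invented wOBA values for three players, for illustration only.
observed = [0.350, 0.320, 0.300]
system_a = [0.340, 0.330, 0.300]
system_b = [0.360, 0.350, 0.280]

for name, predicted in (("A", system_a), ("B", system_b)):
    print(f"System {name}: RMSE {rmse(observed, predicted):.4f}, "
          f"MAE {mae(observed, predicted):.4f}")
```

Plotting the residuals against the predicted values, split by position or playing time, would be the natural next step for the heteroskedasticity question.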

On the website and in other more stat-y fora, we talk about upsides/downsides/potential… but really we’re thinking about risk, and the variance associated with certain estimates. And given that we’re thinking about players from a GM’s/business perspective, we want to start getting a handle on thinking about our “portfolio” of players.

I’m new-ish to the website, so maybe this has been looked at before. If so, apologies, and thanks for the interesting reading!

Stephen

“they’ve been proven to be among the most accurate forecasting systems out there”

Not that I doubt that it is, but to avoid arguments about forecasts when discussing 2009, I use the freely available public projections, and I would love to use CHONE and then be able to point any doubters to a website which proves CHONE to be among the most accurate. I would also be curious to see who else it was compared with, to see if there are other sources I’m not aware of. Thanks.

http://www.insidethebook.com/ee/index.php/site/comments/evaluating_the_2008_forecasting_systems/#15

I’d like to thank everyone who publicly (and for free) releases their projections. I can only begin to imagine the work that goes into them. Thanks so much.

I was wondering why Marcel was predicting Taylor Teagarden to be the best-hitting catcher in baseball last year, but a closer inspection of his projection revealed that Marcel was predicting a 22.5 K%, significantly less than his 40.4 in the majors last year and his career-best 30.5 in the minors a couple of years ago. I didn’t realize until this article that Marcel doesn’t account for minor league performance, which explains the extreme regression toward the mean on Marcel’s part.

Of course, I meant to say that Marcel has him projected to be the best-hitting catcher NEXT YEAR, not last year.

http://www.tangotiger.net/marcel/

Any chance u guys could post the projected league averages and run environment? Thanks!

I asked David to calculate the wOBA for players that are common in Marcel, James, Chone, and that Marcel forecasts for at least 300 PA. David said:

==========

So, out of the 368 players that match:

| System | wOBA | avg(wOBA) |
| --- | --- | --- |
| Bill James | .349 | .342 |
| CHONE | .340 | .337 |
| Marcel | .337 | .334 |

===========

The first column is the weighted wOBA, based on however many PA each forecaster thinks the 368 players are going to get. The second column is just the simple average of those same players.
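The gap between the two columns is just the difference between a PA-weighted mean and a plain mean. A quick Python sketch, with made-up players rather than the actual 368:

```python
def weighted_woba(players):
    """Mean wOBA weighted by each player's projected PA."""
    total_pa = sum(p["pa"] for p in players)
    return sum(p["woba"] * p["pa"] for p in players) / total_pa


def simple_woba(players):
    """Unweighted mean wOBA; every player counts the same."""
    return sum(p["woba"] for p in players) / len(players)


# Made-up players: the better hitters are projected for more playing time.
example_players = [
    {"woba": 0.380, "pa": 650},
    {"woba": 0.330, "pa": 500},
    {"woba": 0.300, "pa": 300},
]

print(f"weighted: {weighted_woba(example_players):.3f}")
print(f"simple:   {simple_woba(example_players):.3f}")
```

When better hitters are given more playing time, the weighted figure comes out above the simple one, which matches the pattern in all three systems.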

Chone is forecasting 3 wOBA points more than me, among these players.

Marcel is set for runs per 27 outs of 4.7. So, that would put Chone at 4.8. I’m fine with that.

James is 8 to 12 wOBA points ahead of Marcel, depending on the column, which would make his run environment around 5.0 runs per game.

What about the pitchers ?

Is there a reason CHONE hates Kelly Shoppach so much compared to MARCEL and Bill James?

James – .352 wOBA, 2.1 wRAA

Marcel – .348 wOBA, 5.8 wRAA

CHONE – .329 wOBA, 0.6 wRAA

It might make a very interesting post to talk about the differences between the systems. Find the players in each system with the largest difference between their projections and Marcel’s projections and then try to deduce what the projection system knows that Marcel doesn’t. If not helpful, it would at least be fun and interesting.

I agree with Bryan. That post might even be able to include some rule-of-thumb pointers that generally explain why the systems might differ in their projections of players.

Here is something else that I do not understand. For Randy Winn the three systems project the following:

Bill James – PA 640, wOBA .340, wRC 73.8

Chone – PA 615, wOBA .336, wRC 78.2

Marcel – PA 599, wOBA .329, wRC 70.6

I can understand how Marcel’s wRC projection is lower than Chone’s, since Marcel projects both PA and wOBA to be lower than Chone does. But I cannot understand how Bill James’ wRC is also lower than Chone’s, since Bill James projects both PA and wOBA to be higher than Chone does. Can someone explain this to me?

I think I have found the answer to my question.

Per wRC and wRAA by David Appelman – December 12, 2008

http://www.fangraphs.com/blogs/index.php/wrc-and-wraa

wRC – This is total runs created based on wOBA. It is calculated as (((wOBA – lgwOBA) / wOBAScale) + (lgR/PA)) * PA

Evidently Bill James projects a lower lgR/PA than Chone even though he projects a higher lgwOBA than Chone. Thus in Bill James’ projections Randy Winn will produce a lower wRC even though he has a higher wOBA and PA.
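The formula can be checked with a short Python sketch. The league baselines are the ones reported earlier in the thread (.346 James, .327 CHONE, .332 Marcel). The wOBAScale of 1.15 and lgR/PA of 0.12 are my assumptions, not values from the site, and they are deliberately held equal across the three systems. Under those assumptions the results land within about half a run of the wRC figures quoted for Winn, which suggests (but does not prove) that the higher league baseline by itself could account for the lower James wRC.

```python
# Assumed constants (NOT from the post); held equal across systems on purpose.
WOBA_SCALE = 1.15
LG_R_PER_PA = 0.12


def wrc(woba, lg_woba, pa, woba_scale=WOBA_SCALE, lg_r_per_pa=LG_R_PER_PA):
    """wRC = (((wOBA - lgwOBA) / wOBAScale) + (lgR/PA)) * PA"""
    return ((woba - lg_woba) / woba_scale + lg_r_per_pa) * pa


# Randy Winn's projected lines, paired with each system's baseline wOBA
# as reported earlier in the thread.
winn = {
    "Bill James": {"pa": 640, "woba": 0.340, "lg_woba": 0.346},
    "CHONE": {"pa": 615, "woba": 0.336, "lg_woba": 0.327},
    "Marcel": {"pa": 599, "woba": 0.329, "lg_woba": 0.332},
}

winn_wrc = {
    system: wrc(line["woba"], line["lg_woba"], line["pa"])
    for system, line in winn.items()
}

for system, value in winn_wrc.items():
    print(f"{system}: {value:.1f} wRC")
```

A hitter who sits below his system's league average wOBA loses runs on every PA, so more playing time does not rescue the total.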

This does bring another question to my mind though. Are pitchers included in NL lgwOBA? If they are, this would seem to me to be a mistake. What would be the point of knowing how a position player’s hitting stacked up to a league average that included pitchers?

Once again I think I have found the answer to my question. It looks like pitchers have to be included for the purpose of calculating wRC. This does mean, though, that an NL position player with a league average wOBA is a below average hitter among NL position players.

Does anyone know roughly what the average wOBA/wRC for NL Pitchers (who hit) is?

Does this http://www.lookoutlanding.com/2009/2/2/743444/pitcher-s-hitting-continue answer your question?
