FanGraphs Baseball

Comments

  1. Tango – The projections which you have clearly toiled over, putting in hours of personal attention until they were the perfect representation of your expectations for the coming season, do not tell me the players on my team/fantasy team will have as good a year as last year, which was not a fluke, or a much better year than last year, which was totally a fluke. Justify your existence.

    Comment by byron — March 5, 2012 @ 11:59 pm

  2. Hey. I’ve never really paid any attention to forecasting systems before, but I am trying to expand my knowledge and understanding of the analysis here. As I was checking out the Marcels, I found myself questioning why I should even care about them at all. I’m oversimplifying a bit for the sake of the post, but it seems this particular projection system doesn’t really attempt to be accurate with its individual projections. There seems to be a pattern of predicting a number that the player is equally likely to beat or fall short of, so that the projections as a whole land pretty much on the mean while the individual outcomes vary wildly one way or the other. Now I understand an important factor here is regression to the mean, but it seems you may as well just predict everyone to be league average (again oversimplifying) and come up with the same general performance. What am I missing here? Or is this just meant to show a general futility in trying to predict baseball? Or is this stuff just too far over my head and I should just accept being a bonehead?
    Thanks
    P.S.: I did read through the links you posted on the other post.

    Comment by ?Confused? — March 6, 2012 @ 12:08 am

  3. You can’t honestly think that this is anywhere near the standard complaint…I understand what this projection system is all about and also recognize that your caricature of the objectors is just silly.

    Comment by RationalSportsFan — March 6, 2012 @ 12:12 am

  4. Tango,

    Inform Marcel that Ryan Braun is going to be tested on a weekly basis this year and see if he still predicts double digit HRs.

    Comment by cpebbles — March 6, 2012 @ 12:13 am

  5. Does anyone actually look backward and see how accurate the annual forecasts are, especially for regular position players and starting pitchers who have a history? For example, players with at least 3 full seasons under their belt and more than 500 PA or 25 starts. Bench players, relievers and young players are probably a roll of the dice.

    Anyways, it would be nice if someone has looked at this for the major forecasting systems and published it somewhere.

    I would like to see forecasts for rate stats presented with their uncertainty, for example SLG .500 +/- .030. Forecasting counting stats is problematic because playing time cannot be accurately forecast, due to injuries and managers’ decisions.

    Comment by pft — March 6, 2012 @ 12:55 am

  6. What’s an approximate correlation for Marcel projections vs. end-of-season performance on rate stats (wOBA and FIP especially)?

    Comment by Jeff — March 6, 2012 @ 1:00 am

  7. I have a question about forecasting and the statistical relevance of certain stats. What I mean is that certain stats become indicative of a player’s true talent at different times, and it seems like this could be helpful to build into a forecasting model. For example, contact rate (which might be an underlying component for forecasting Avg.) becomes statistically relevant as an indicator of a player’s true talent at 100 PAs. Essentially, I’m asking why we use larger samples of data to forecast when smaller samples have proven to be statistically indicative, in addition to being a more recent representation of a player’s true talent. Thanks in advance.

    Comment by Milby — March 6, 2012 @ 1:02 am

  8. http://www.insidethebook.com/ee/index.php/site/article/testing_the_2007_2010_forecasting_systems_official_results/

    http://www.fangraphs.com/community/index.php/comparing-2011-hitter-forecasts/

    Would love to see ranges of probabilities. ZiPS does this a little.

    Comment by byron — March 6, 2012 @ 1:05 am

  9. Seems that, for most players, the range in projections of rate stats is substantially narrower than the range in forecasts of playing time. And of course this seems true especially for the young, the old, and those who face challenges for playing time, i.e., most players. So, for truly useful (for fantasy players, anyway) projections, the PA forecast seems more important than the level of performance projection. Long way to get to this: are you doing Community Forecasts of playing time this year? If not, who else has a proven track record in forecasting playing time? And thanks for doing what you do.

    Comment by dprat — March 6, 2012 @ 1:06 am

  10. Because more data is almost always better than less data. You can weigh older data less. The exceptions (partial-year injuries, revamped swings, extra pitches) are exceptional enough to not provide much benefit to forecasting systems as a whole.

    Comment by byron — March 6, 2012 @ 1:08 am

  11. Why are projections usually measured by RMSE and not MAE? I think this leads people to regress more when, really, you want the closest forecast, not the forecast that is pretty close and doesn’t have big errors. If MAE were the big test, projections would be less regressed.

    Comment by BoSoxFan — March 6, 2012 @ 1:10 am

  12. Byron-
    (a) opposite
    (b). you may be using the wrong website
    (c). marcel is, IMO, quite possibly the most reliable/useful (if there is such a thing regarding projections)

    Comment by Bill but not Ted — March 6, 2012 @ 2:01 am

  13. I’ve been running my own fantasy baseball projections (in conjunction with other projections) for a 3rd straight season. I’m wondering if there’s any validity to what I’m doing above just using others’ projections or if I’m wasting a considerable amount of my time.

    Basically, I’ve created a calculator. For pitchers, I input K%, BB%, FB%, HR/FB%, BABIP, LOB% (though I make my own for earned runs only), plus bunts, GIDP, pickoffs, etc. I look at projections, career rates, ages, last few years (45%, 35%, 20% for 3 prior years) to estimate these. I do similar for hitters. The calculator outputs the fantasy stats for me.

    I think it has helped me get a better understanding of the driving forces of stats like HRs, but outside of that am I just wasting my time?

    Thanks for doing this!

    Comment by jsp2014 — March 6, 2012 @ 2:48 am

  14. I somehow forgot to mention that any advice for improving my process and accuracy would be greatly appreciated. Thanks again.

    Comment by jsp2014 — March 6, 2012 @ 3:06 am

  15. Do the current forecasting systems exist to forecast overall performance (e.g., to track real baseball) or to forecast performance for fantasy baseball? It strikes me that there is some room for systems to differentiate themselves by specifically focusing on the specific variables desired for different tasks.

    Comment by Blue — March 6, 2012 @ 8:14 am

  16. It would be helpful (to me) if people didn’t use this thread as a discussion thread. I was envisioning people posting their questions, and then I would reply to each question, so we’d see:
    Q
    –A
    Q
    –A

    And so on. The Marcel 2012 thread looks more like a discussion thread, and we don’t need two of those at the same time.

    I’ll reply in the afternoon, so feel free to keep asking more questions. And even if you think the questions are too “simple”, post them anyway, because I promise you that there’s someone ELSE who is thinking the exact same thing, so you’d actually be doing someone else a favor.

    Another good reason not to turn this into a discussion thread is that I need everyone’s voice to be heard, not just those that are louder and snarkier. This thread is to benefit everyone.

    Thanks…

    Comment by Tangotiger — March 6, 2012 @ 9:13 am

  17. This is a really good question. Thanks.

    Comment by philosofool — March 6, 2012 @ 9:17 am

  18. Sounds like you need a regression factor, which just means adding in a small amount of league average performance.
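
A minimal sketch of what such a regression factor could look like, assuming the regression is done by blending in a fixed number of league-average plate appearances. The function name, the `regression_pa` amount, and the sample numbers are illustrative assumptions, not values taken from any particular system.

```python
def regress_to_mean(player_rate, player_pa, league_rate, regression_pa):
    """Blend a player's observed rate with league average.

    regression_pa is a hypothetical tuning value: the number of
    league-average plate appearances added to the player's own sample.
    """
    total_pa = player_pa + regression_pa
    return (player_rate * player_pa + league_rate * regression_pa) / total_pa


# Hypothetical hitter: .400 OBP over 300 PA in a .320 OBP league.
small_sample = regress_to_mean(0.400, 300, 0.320, regression_pa=300)

# Same observed rate over 900 PA: the larger sample is pulled in less.
large_sample = regress_to_mean(0.400, 900, 0.320, regression_pa=300)
```

The more playing time a player has, the less the league average moves his number, which is the behavior a regression factor is meant to provide.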

    Comment by philosofool — March 6, 2012 @ 9:19 am

  19. Which publicly available projection system do you think is the best and most accurate?

    Comment by Kevin Ebert — March 6, 2012 @ 9:42 am

  20. Love this Q and A…

    Do you have any links to analysis of previous performances of popular forecasting systems (i.e. which one perform consistently the best)?

    Do you have any links to descriptions of what popular forecasting systems use to forecast?

    Do any forecast systems use advanced metrics like jsp2014 (see above) uses? I would think a combination of that with some adjustment for injuries, team changes and playing time changes would make the best forecast. After all, we talk all season long about how those advanced metrics are predicting a regression (or progression), so why wouldn’t we use them in forecasting before the season begins? Am I missing something?

    Comment by mymaus — March 6, 2012 @ 10:04 am

  21. There seem to be players that some projection systems “seem to hate.” Ichiro and PECOTA always seemed to be an example.

    Is there any player, or player type, that you think your projections may under or over value?

    Thanks for all of the great information.

    Comment by Joe — March 6, 2012 @ 10:08 am

  22. Using Marcel, or any other system, do you know of one (or more) players that have consistently matched their projections year after year? I don’t know how many seasons would be significant, but do you think that anything could be learned from said players being so projectable? And, yes I realize that by projecting so many players, some are bound to be dead on for several seasons.

    Thanks!

    Comment by Steve — March 6, 2012 @ 10:09 am

  23. Hey Tango,

    I have always been curious how “luck” factors into each projection system. How heavily do Bill James, Marcel, PECOTA, and others factor in FIP, xFIP, BABIP, even xBABIP, or any of the other advanced metrics that are more predictive than the basic ones? Thanks for your time!

    - Moe

    Comment by Moe Koltun — March 6, 2012 @ 10:13 am

  24. 1) Are pitchers’ batting statistics included in the league average?

    2) Do any systems use the prior estimate as the complement rather than the league average as the complement?

    Comment by glassSheets — March 6, 2012 @ 10:23 am

  25. robinson cano PAs:

    07 – 669
    08 – 634
    09 – 674
    10 – 696
    11 – 681

    12 zips – 672
    12 bill james – 671
    12 marcel – 610

    i assume durability regression is the reason, but how/why is marcel so far away from the others? cano’s entering his age 29 season. 60 PAs seems like a lot to be off from both other projection systems and cano’s 5 year average.

    Comment by Woodrum's UZR Article — March 6, 2012 @ 10:43 am

  26. I’ve read some of the analysis on the relative accuracy of these systems, and while for real baseball I understand the purpose of being accurate when it comes to the whole league, for fantasy purposes most people only care about the 200 or 300 most relevant players. Which system tends to be the most accurate when it comes to them? Which is the most accurate for the top 100?

    Comment by zack — March 6, 2012 @ 10:44 am

  27. and to be clear: there were a LOT of players that this applies to, i just picked cano for this example.

    Comment by Woodrum's UZR Article — March 6, 2012 @ 10:45 am

  28. yes, extend our complaints to ridiculous and hyperbolic proportions and then criticize the results… quite productive.

    Comment by Woodrum's UZR Article — March 6, 2012 @ 10:53 am

  29. I would rather have a forecast that is pretty close and doesn’t have big errors. No forecast is going to hit the actual results on the head anyway, except maybe a few by chance.

    Comment by Baltar — March 6, 2012 @ 11:19 am

  30. I think you hit a nerve, Byron.

    Comment by Richie — March 6, 2012 @ 11:23 am

  31. sarcastic comments are only appreciated when theyre funny. that doesnt apply in the case of byrons comment.

    Comment by Woodrum's UZR Article — March 6, 2012 @ 11:27 am

  32. When projecting players with minimal playing time (say, 250-1,000 MLB PAs), which of the systems incorporate minor league stats? Are they weighted (aka, does the weight on minor league stats decrease as the sample size of MLB data increases)?

    Comment by Zack — March 6, 2012 @ 11:32 am

  33. I think it depends on what you’re using the projections for. If you’re trying to rank or compare players, an across-the-board error doesn’t really matter, and you should use RMSE. But if you’re trying to pinpoint a player’s wOBA, obviously MAE would be better.

    I’m guessing that for most purposes (which player should I draft on my fantasy team? who will be the best catcher in baseball? who should the Nationals start at 1B? who will hit the most home runs? will Prince Fielder outhit Adrian Gonzalez?), it doesn’t matter whether the projection system is overly optimistic or pessimistic, so long as it’s internally accurate.

    Comment by Yirmiyahu — March 6, 2012 @ 11:41 am

  34. That’s an interesting question. Obviously, any good projection system is going to weigh older data less heavily than newer data.

    But do the more advanced projection systems use different weights and different sample sizes for different components? For instance, for K%, you might want to use only 1 or 2 years of data and put greater weight on the most recent 100 PA’s. But for BABIP, you might want to use 4 years of data and not put much more weight on the most recent season. Do any projection systems work like this?

    Comment by Yirmiyahu — March 6, 2012 @ 11:49 am

  35. Do all projection systems use a Monte Carlo simulation?

    Comment by swieker — March 6, 2012 @ 11:50 am

  36. Didn’t see this comment until after I’d posted a couple of discussion replies. If you don’t want the thread to get cluttered up with discussion, you might want to add this request to the original post.

    Comment by Yirmiyahu — March 6, 2012 @ 11:54 am

  37. Making a stupid comment and getting criticized for it is not the same as hitting a nerve.

    Comment by RationalSportsFan — March 6, 2012 @ 12:07 pm

  38. I’m wondering what, if any, external factors may exist that would have you follow a trend rather than forecast a reversion back to previous levels? Basically, what I’m struggling with is projecting breakouts from young players and further regression from old players. Here are 2 examples to explain my question better:

    Example A:
    A RF around the sweet spot of the aging curve has increased his HR/FB in 3 straight years, from 15.5% to 16% to 16.5% to 17%. Would you ever project further growth? What if it was a K-rate that dropped 0.5% per year?

    Example B:
    A 1B is 37 years old with 15 years of MLB experience. His K-rate in years 1-10 was steady at 15% every year. From years 11-15 the K-rate has gone 15% to 15.5 to 16 to 16.5 to 17% in 2011. Would you project the K-rate to increase in 2012?

    I essentially always revert back to previous levels, in these cases using some 3-year weighted average and ending somewhere between 2010 and 2011 values.

    Comment by jsp2014 — March 6, 2012 @ 12:18 pm

  39. Why does it appear that averaging multiple projection systems seems to generate better performance than any single projection system? Also, at what point does aggregating start giving diminishing returns (i.e. if I aggregate all batter projection systems that did better than Marcel last year, am I likely to get better results than if I just aggregate the two systems that performed best)?

    Comment by dzigga — March 6, 2012 @ 12:21 pm

  40. If I wanted to aggregate multiple systems and use Marcels as part of that equation, should I use the raw Marcels numbers or scale the Marcels numbers based on the expected plate appearances of the Fangraphs’ fan projections?

    Comment by Go Rockies — March 6, 2012 @ 12:58 pm

  41. How suspicious should I be of projections where the primary basis of the projections is minor league statistics (i.e. Matt Moore, Mike Trout, Brett Lawrie, etc)? I would expect the average error for such projections is much higher than for a normal big league player. Has anyone quantified how much larger the grain of salt should be, and is there any difference between the margin of error for pitchers vs hitters?

    Comment by dzigga — March 6, 2012 @ 1:14 pm

  42. Would a projection system be better served if it were to attempt to set the Over/Under on all stats? It seems like many of the projections are regressed so heavily that one could easily (> 66% correct) pick the over or under on the stats.
    vr, Xei

    Comment by Xeifrank — March 6, 2012 @ 3:54 pm

  43. Would a weighting of projection systems be a possible way to draw any sort of conclusions? (say 25%-Marcel, 50%-ZiPS, 40%-James) Would there be one that you suggest?

    Do you know how actual MLB teams use the projection systems we have available (if at all)?

    Comment by Jim Lahey — March 6, 2012 @ 4:02 pm

  44. Why do projection *systems* even bother with playtime projection? It seems like the results should be represented in “per 150 games” or “per 650 PA” terms.

    (I stress *systems* because obviously there are products like PECOTA and OLIVER that offer PTA projections based on human inputs that represent best guesses about play time. But these guesses aren’t really systematic and deserve to be considered separately from the mathematical procedure for many, perhaps most, purposes.

    I guess, on this view, Fans projections aren’t part of a *system*, they’re more like opinion aggregation.)

    Comment by philosofool — March 6, 2012 @ 4:36 pm

  45. When considering games played, is injury history predictive? Basically, are some bodies more randomly fragile than others? If not, why do players have differing playing time projections?

    Comment by Sean — March 6, 2012 @ 8:17 pm

  46. Trying to strip out any sarcasm, and the answer is:

    Marcel provides the minimum level of competence of any forecasting system. That’s its intentional design.

    And yet, and yet, a good share of forecasting systems either don’t beat it, or barely beat it.

    Anyone who dismisses Marcel has no idea what they are doing. Marcel should be the core of EVERY forecasting system.

    Comment by Tangotiger — March 7, 2012 @ 11:23 am

  47. What you said in the first half is pretty good.

    No, you can’t forecast everyone to be exactly the same (nor can you forecast everyone to repeat 2011).

    Marcel provides the basis as to what you should expect… on average for each player. Even if Marcel knows that there should be 10 guys that are going to hit more than 30 HR, it doesn’t know which of the 40 candidates to hit 30+ HR will actually hit more than 30 HR.

    Those 40 guys who have 30+ HR POTENTIAL will have a mean forecast of say 25 HR. And that’s what Marcel will show: 25 HR for those guys.

    Some systems, knowing that we should observe 10 guys with 30+ HR will actually go ahead and forecast 10 guys with 30+ HR. And, if you look at it at the end of the year, they’ll only be right on 2 or 3 of them.

    So, what is it that you want out of a forecasting system? For them to hang their b-lls out, only to get them invariably cut off? Or, to simply acknowledge that it’s darn hard to forecast, and so, temper the enthusiasm so that you can get the group of 10 or 20 hitters right, as a group, even though you’ll be off individually?
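
One way to see the cost of stretching a forecast, as a rough sketch: for an outcome with a given mean and standard deviation, the expected squared error of a forecast is the variance plus the squared distance between the forecast and the mean. The 25 HR mean comes from the example above; the standard deviation of 7 HR and the 32 HR "stretched" forecast are assumed numbers chosen only for illustration.

```python
def expected_squared_error(true_mean, sd, forecast):
    """E[(X - forecast)^2] for an outcome X with the given mean and SD."""
    return sd ** 2 + (true_mean - forecast) ** 2


# Assumed: each 30+ HR candidate averages 25 HR with an SD of 7 HR.
conservative = expected_squared_error(25, 7, forecast=25)  # forecast the mean
stretched = expected_squared_error(25, 7, forecast=32)     # forecast a 30+ season
```

Under these assumptions the stretched forecast doubles the expected squared error, because nothing identifies ahead of time which candidates will be the ones to land above 30.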

    Comment by Tangotiger — March 7, 2012 @ 11:28 am

  48. I’ve published several tests on my blog over the years, with one of them listed in the above comment.

    If there is a single forecasting system out there that will say “I will definitely beat Marcel this year”, feel free to alert me. The best someone can do is have the true talent to win 52% of the time (meaning that said forecasting system will beat Marcel say 47%-57% of the time).

    Comment by Tangotiger — March 7, 2012 @ 11:32 am

  49. All forecasting systems have an r=.65 to .70, depending on the year, for hitters, and somewhat lower for pitchers (I don’t remember what it is for FIP, but probably close to .40).

    Comment by Tangotiger — March 7, 2012 @ 11:34 am

  50. What you are talking about is forecasting COMPONENTS, say K/PA, or 3B/(2B+3B), or BABIP, etc. This is much more important to do with pitchers.

    And that is absolutely the way to go. Marcel chooses not to do so, for simplicity reasons.

    Comment by Tangotiger — March 7, 2012 @ 11:37 am

  51. Absolutely: forecasting playing time is more important than forecasting rate stats. I mean, every system (and fan) has the same guys at the top of the wOBA scale. Cabrera, Fielder, Braun, Pujols, etc.

    But, the true value comes in getting the playing time right. And there, I’m sure there’s huge differentiation.

    And my money is on Community Playing Time forecasts, because I believe a team’s fans know more than some algorithm, or self-appointed single team expert.

    Yes, I will run it again this year, as it’s deathly simple on a programming side (all the work is you guys entering data). But, Fangraphs ALSO does playing time forecasts, and my quick look at it in the past is that it matches pretty well with the ones my readers fill in. (Or maybe that was Dave Allen. Read his articles from a few years ago. Good stuff.)

    Comment by Tangotiger — March 7, 2012 @ 11:40 am

  52. I measure using both. I don’t see why we have to anoint RMSE over MAE or vice versa.

    The only thing you should not use is correlation (r), because that adjusts the slope, and that makes no sense.
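
A small sketch of the point about correlation, using made-up wOBA numbers. The helper functions are plain implementations written for this example. The second forecast is the first one with its spread doubled and shifted, so it is clearly worse in absolute terms, yet its correlation with the actual results is unchanged.

```python
import math


def rmse(forecast, actual):
    """Root mean squared error."""
    return math.sqrt(sum((f - a) ** 2 for f, a in zip(forecast, actual)) / len(actual))


def mae(forecast, actual):
    """Mean absolute error."""
    return sum(abs(f - a) for f, a in zip(forecast, actual)) / len(actual)


def pearson_r(x, y):
    """Pearson correlation coefficient."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical actual results and a reasonable forecast of them.
actual = [0.300, 0.320, 0.340, 0.360, 0.380]
good = [0.310, 0.320, 0.340, 0.350, 0.375]

# The same forecast with its spread doubled and then shifted down.
stretched = [2 * f - 0.300 for f in good]

r_good, r_stretched = pearson_r(good, actual), pearson_r(stretched, actual)
```

RMSE and MAE both penalize the stretched forecast, while `r` cannot tell the two apart.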

    Comment by Tangotiger — March 7, 2012 @ 11:46 am

  53. Sorry for my unexpectedly disruptive sarcasm. The point I was trying to make is that if you, Tangotiger, the creator of Marcel, spent a year creating the most accurate forecasting system you could, Marcel wouldn’t be the result, it’d be the starting point. Yet other projection systems are lucky if they get a half step further than the starting point.

    Comment by byron — March 7, 2012 @ 12:08 pm

  54. On the surface, it looks like you are doing the right thing.

    You definitely need to do regression, as the comment above noted.

    And you also need to account for the age of the player.

    Comment by Tangotiger — March 7, 2012 @ 12:25 pm

  55. Real baseball.

    Comment by Tangotiger — March 7, 2012 @ 12:26 pm

  56. If anyone would suggest just one, then that person is full of sh!t.

    They are so close at the top, that to choose one over the others would be like saying an 85-win team is necessarily more talented than an 84-win team.

    Go with a mix of 3 or 4, and make sure Marcel is one of them.

    Comment by Tangotiger — March 7, 2012 @ 12:28 pm

  57. I should note that Marcel provides a “reliability” value from 0 to 1.00 (maximum is actually close to 0.90).

    So, for those who have a low reliability, then rely more on the others.

    If you wanted a formula, then I’ll just throw it off the top of my head: take the reliability figure, and square it. That’s how much weight to give Marcel. So, if you have someone with a reliability of 0.90, then count his Marcel as 0.81 shares. If you have someone with a reliability of 0.50, then count his Marcel as 0.25 shares.

    If you want to use ZiPS, Fangraphs Fans, Oliver, PECOTA, etc, you can give them each 1 share.

    Then, just do a weighted average for each player.
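
The weighting above could be sketched as follows. The function name and the wOBA figures are hypothetical; only the rule (Marcel gets reliability squared shares, every other system gets 1 share) comes from the comment.

```python
def blend_forecasts(marcel_value, marcel_reliability, other_values):
    """Weighted average: Marcel gets reliability**2 shares, others 1 share each."""
    marcel_weight = marcel_reliability ** 2
    total_weight = marcel_weight + len(other_values)
    weighted_sum = marcel_weight * marcel_value + sum(other_values)
    return weighted_sum / total_weight


# Hypothetical wOBA forecasts for one player with a Marcel reliability of 0.50,
# so Marcel counts for 0.25 shares against three other systems at 1 share each.
blended = blend_forecasts(0.340, 0.50, [0.360, 0.350, 0.355])
```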

    Comment by Tangotiger — March 7, 2012 @ 12:31 pm

  58. 1. A few have been posted already. I have tons at my blog, so just do a google for
    Tangotiger Forecasting Systems Evaluate site:insidethebook.com
    and see what you get.

    2. No.

    3. I’m sure they do. But it’s not clear that it’s being used well, or for a net benefit. Marcel doesn’t do anything, and it goes toe-to-toe with these systems.

    Comment by Tangotiger — March 7, 2012 @ 12:35 pm

  59. Since Marcel intentionally ignores (a) minor league data and (b) parks, then any player with less than 2 years, or who underwent a park-shift (Petco to Fenway let’s say) would be a player that Marcel is biased against.

    Comment by Tangotiger — March 7, 2012 @ 12:38 pm

  60. We’re not going to learn anything from this. I think you should try to focus on other aspects.

    You have to understand that there’s HUGE variation, based on nothing but luck, when it comes to something that has 600 trials, and that the true rate of everything is so darn close. Consider that the true OBP rate of most players is .300 to .400, or 30% to 40%.

    If you have 625 trials, one standard deviation is (roughly) 0.5/sqrt(625) = .020, or 2%.

    So, a guy who is a true 35% in OBP, will perform, by luck alone, at 31% to 39%, 95% of the time.
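
The arithmetic above, written out as a quick check. The `0.5/sqrt(n)` shortcut treats the rate as if it were 50%; the exact binomial standard deviation for a 35% rate is slightly smaller. The function name is just for this sketch.

```python
import math


def binomial_sd(rate, trials):
    """Standard deviation of an observed rate over a number of trials."""
    return math.sqrt(rate * (1 - rate) / trials)


rough_sd = 0.5 / math.sqrt(625)    # the shortcut used above
exact_sd = binomial_sd(0.35, 625)  # a little under the shortcut value

# Roughly 95% of outcomes fall within two standard deviations of the true rate.
low = 0.35 - 2 * rough_sd
high = 0.35 + 2 * rough_sd
```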

    Basically, there is very little variation in true talent, in absolute terms, when we’re looking at the creme-de-la-creme.

    If you had a bunch of weighted coins, weighted to give you heads 47% to 53% of the time, would you really be able to tell which is the weighted coin of 53% after a few hundred trials?

    Appreciate that random variation will kill any semblance of an “accurate” forecasting system.

    Comment by Tangotiger — March 7, 2012 @ 12:44 pm

  61. Any forecasting system that intentionally tries to “stretch out” their forecasts, by, basically, trying to forecast luck is doing something very foolish. It may look good, but, it’s not a good thing to do.

    Comment by Tangotiger — March 7, 2012 @ 12:45 pm

  62. 1. No system should use pitchers’ batting stats.

    2. You don’t need the prior estimate, since that’s built in by using the prior years.

    Comment by Tangotiger — March 7, 2012 @ 12:55 pm

  63. Do me a favor. Take the top 20 in PA in 2008-2010, and tell me two things:

    1. what was their average PA per season in 2008-2010
    2. how many PA did they have in 2011 on average

    I have NOT done the work. I do know someone that posted a similar kind of research.

    The reality is that Marcel is based on… historical reality.

    Comment by Tangotiger — March 7, 2012 @ 1:06 pm

  64. EXCELLENT question.

    If we limit ourselves to the higher end, we’ll find even less differentiation.

    Comment by Tangotiger — March 7, 2012 @ 1:07 pm

  65. Other than Marcel, which uses ZERO minor league data, all other systems use some minor league data.

    Naturally, one would hope that the more MLB data you have, the less you need to rely on minor leagues (and scouting).

    I can’t confirm what they actually do.

    Comment by Tangotiger — March 7, 2012 @ 1:09 pm

  66. Other than to get R and RBI, why would we need Monte Carlo?

    Comment by Tangotiger — March 7, 2012 @ 1:09 pm

  67. Only in the very extreme circumstances can you possibly hope to find a trend. And even then, you still have to hedge your bets.

    Does Clayton Kershaw’s 2010-11 really mean we can leave 2008-09 behind? We have some good idea that we should, namely that he’s so darn young, that 2008-09 is “growing pains”. And, his toolset is indicative of a high quality pitcher. But, we could have said the same for many many young pitchers.

    Again, the reality is that we use historical data to establish the algorithm. We can’t just say “yeah, except for this guy”, and then proceed to use that exception to the rule so many times, that you obviate the rule to begin with.

    Comment by Tangotiger — March 7, 2012 @ 1:13 pm

  68. You average out so that you knock out the extreme forecasts.

    And the idea that system A and B beat Marcel in 2010 has meaning in 2011 is not supported by results. Don’t expect that just because Ethier was better than Kemp in one year that it means that he’d have been better in 2011 as well.

    We don’t know which system really is the best, so just average them out and don’t waste your time.

    Comment by Tangotiger — March 7, 2012 @ 1:15 pm

  69. If you have Community Playing Time forecasts, use those and scale Marcel’s numbers to that.
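
A minimal sketch of that scaling, assuming it means multiplying each counting stat by the ratio of the two playing time forecasts and leaving rate stats alone. The function name and the stat line are made up for illustration.

```python
def scale_to_playing_time(counting_stats, marcel_pa, community_pa):
    """Rescale Marcel counting stats to a different plate appearance forecast."""
    factor = community_pa / marcel_pa
    return {stat: value * factor for stat, value in counting_stats.items()}


# Hypothetical Marcel line of 20 HR and 60 BB over 500 PA, rescaled to 600 PA.
scaled = scale_to_playing_time({"HR": 20, "BB": 60}, marcel_pa=500, community_pa=600)
```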

    Comment by Tangotiger — March 7, 2012 @ 1:16 pm

  70. From what I remember (and it’s posted in one of my evaluations), if we use wOBA (or OBP), then the average error for veteran players is around .025, but for pure-rookies, it’s close to .040.

    Indeed in some respects, you get the same average error if you just forecast every rookie to be identical as you get from the forecasting systems.

    Comment by Tangotiger — March 7, 2012 @ 1:18 pm

  71. I’m not sure I understand the question. Presuming that the mean Marcel forecasts can double as the over/under forecasts, are you suggesting you can “easily” figure which stats are too low and too high?

    Comment by Tangotiger — March 7, 2012 @ 1:20 pm

  72. Just equally weight them all, but weight Marcel by the square of the “reliability” value. See above as I replied more elaborately.

    Comment by Tangotiger — March 7, 2012 @ 1:21 pm

  73. Right, we could just do per PA and per IP or per 650PA and per 200IP or whathaveyou.

    I think it’s just habit, and to give a semblance of playing time forecasts, even if you wouldn’t really stand behind those playing time forecasts relative to what a fan would suggest.

    Comment by Tangotiger — March 7, 2012 @ 1:23 pm

  74. I’m not sure that it is, but it’s implicitly part of the equation based on prior playing time.

    I do find that we don’t need to look at anything prior to two years ago, and indeed, just last year is the huge weighting. So, a guy injured in 2010 should have only a little bearing as to whether he might not play in 2012.

    Comment by Tangotiger — March 7, 2012 @ 1:27 pm

  75. Yes, Marcel is extremely straightforward, and did not require much effort. Basically, you get 90% of the way there (“there” being the best you can possibly attain) by spending about 5-10 hours creating a forecasting system like Marcel.

    Then, you spend, as I have, another 40-50 hours to get it 91% of the way there (it’s not part of Marcel). And, you spend, as others have, another 200-300 hours to get it 92% of the way there. And you spend, as I’m sure someone has, another 1000-2000 hours to get it 93% of the way there.

    It becomes a question of diminishing returns, and a question of pure pursuit of knowledge.

    I know I’ve spent/wasted quite a bit of time in this pursuit. It’s fun, you get to learn stuff. But, there’s no “oracle” moment, and no one should pretend there is.

    I like to think of myself as the Amazing Randi to the Uri Gellers of the world.

    Comment by Tangotiger — March 7, 2012 @ 1:36 pm

  76. i lol’d byron. funny post.

    Comment by kendynamo — March 7, 2012 @ 2:02 pm

  77. You wrote above, “Marcel should be the core of EVERY forecasting system.”

    If you’re still around, would you mind expanding on this a bit? Can I use Marcel as my regression factor?

    I feel I should know the answer to this but I’m a novice forecaster. I’ve already learned a TON from this, and it’s inspired me to dig through your archives a bit more. Thanks again for your time.

    Comment by jsp2014 — March 7, 2012 @ 2:26 pm

  78. How are Runs and RBI calculated in Marcel’s projections and how reliable would you say they are compared to other stats (wOBA, HR, BB)? Is position in the lineup or the quality of the team’s projected lineup considered at all?

    Comment by GTW — March 7, 2012 @ 2:28 pm

  79. What I mean is that a forecasting system should use the principles of Marcel as its basis, and then expand beyond that, using minor league data, park adjustments, maybe going to 4 and 5 years out, using components, etc.

    Comment by Tangotiger — March 7, 2012 @ 3:39 pm

  80. I only use exactly what I said I used, which is the seasonal data for the last 3 years. So, if you are asking for Fantasy purposes, I would not rely on it too much.

    Comment by Tangotiger — March 7, 2012 @ 3:42 pm

  81. Top 20 for PA in 08-10.
    Player; Ave PA for 08-10; PA for 11
    Ichiro 719.2 721
    Fielder 709 692
    Jeter 707.2 607
    Markakis 705.2 716
    Teixeira 701.1 684
    Gonzalez 691.1 715
    Braun 685.1 629
    Pujols 680.1 651
    Wright 674.2 447
    Howard 674.1 644
    Tejada 673.2 343
    Young 673 689
    Abreu 672.1 585
    CabreraM 672.1 688
    Cano 668 681
    Kemp 664 689
    Theriot 659.1 483
    CabreraO 658.1 477
    Victorino 656.1 586
    Holliday 656 516

    Comment by TFINY — March 7, 2012 @ 4:41 pm

  82. And the formatting didn’t transfer. Sorry all.

    Comment by TFINY — March 7, 2012 @ 4:41 pm

  83. I would like to make the bet that I could choose the over or under on Marcel projections (Pitchers wins, HRs, SLG etc…) and be correct two thirds of the time (or some other high percentage of the time).

    In other words, would Vegas get killed putting out Marcel or some other projection numbers as over/under bets?

    I think they would.

    Comment by Xeifrank — March 7, 2012 @ 5:13 pm

  84. I appreciate the work. Three more if you may:

    1. Check your averages, because you can’t have “.1” and “.2”, if you are dividing integers by 3. These are not innings, but PA!

    2. Show the Marcel forecast for each player.
    http://tangotiger.net/marcel/

    You can get the 2011 data there.

    3. Show the group average… then be impressed.

    Comment by tangotiger — March 7, 2012 @ 5:58 pm

  85. To get good formatting, put the numbers to the LEFT and the text to the right. It’ll line up nice.

    Comment by tangotiger — March 7, 2012 @ 5:59 pm

  86. Yeah, sure, no problem, if you get to pick and choose the stats. I’ll choose the over on all Daniel Bard forecasted numbers, that’s for sure (except Saves where I’ll choose the under).

    But, if you had to bet the over/under on EVERY forecast?

    Comment by tangotiger — March 7, 2012 @ 6:00 pm

  87. I think the litmus test for a projection/forecasting system would be how well their numbers would hold up to being a Vegas over/under on every stat for every player while giving them 10 cent juice. I probably wouldn’t put Marcel to this test due to its simplicity, but I wonder how well the other systems would do. Each projection (non Marcel) system owner should ask him/herself if they’d be willing to take a bet on any of their projection numbers (10c juice). Maybe this is hard to do in practice, but it is the best test in my opinion.

    Comment by Xeifrank — March 7, 2012 @ 6:49 pm

  88. It could be done with fake money. Give everyone 100 units to place over/under bets on any stat from any player. Or to make it tougher, only take the fake bets on pitchers/hitters with projected playing time above some arbitrary threshold.

    Comment by Xeifrank — March 7, 2012 @ 6:51 pm

  89. 2008-2010 average: 680 PA
    2011 average: 612 PA

    Unsurprising.

    Comment by Matthew Bultitude — March 7, 2012 @ 7:08 pm

  90. Marcel’s forecast is 50% of year T-1, 10% of year T-2, plus 200.

    If we presume they got 680 PA in those two years, that’s 340 + 68 + 200 = 608 PA. Someone can go through the actual Marcel forecasts for those 20 hitters, but it’ll be close to 608.

    Therefore, a pretty solid match to the actual 612.

    Does this satisfy the non-believers?

    Comment by tangotiger — March 7, 2012 @ 7:39 pm
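
The playing-time arithmetic in the comment above can be sketched in a few lines. This is only an illustration of the weights as stated there (50% of year T-1, 10% of year T-2, plus 200); the function name `marcel_pa` is made up for this example and is not taken from any published Marcel code.

```python
def marcel_pa(pa_t1: float, pa_t2: float) -> float:
    """Projected PA using the weights quoted in the comment:
    50% of year T-1, 10% of year T-2, plus a flat 200."""
    return 0.5 * pa_t1 + 0.1 * pa_t2 + 200


# The example from the comment: assume 680 PA in each of the two prior years.
projected = marcel_pa(680, 680)
print(round(projected))  # 340 + 68 + 200 = 608
```

Because the weights on past playing time sum to only 0.6, a full-time hitter is always pulled down toward roughly 600 PA, which is the regression the thread is discussing.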

  91. For the averages, .1 is .3333 and .2 is .6666. I was using .1 and .2 for my convenience and forgot to switch it back.
    Marcel 11 Prediction, 11 Actual PA, 08-10 PA Average, Player Name

    634 721 719.6 Ichiro
    629 692 709 Fielder
    641 607 707.6 Jeter
    626 716 705.6 Markakis
    627 684 701.3 Teixeira
    614 715 691.3 Gonzalez, A.
    613 629 685.3 Braun
    620 651 680.3 Pujols
    597 447 674.6 Wright
    580 644 674.3 Howard
    608 343 673.6 Tejada, M.
    618 689 673 Young, M.
    600 585 672.3 Abreu, B.
    592 688 672.3 Cabrera, M.
    615 681 668 Cano
    601 689 664 Kemp
    588 483 659.3 Theriot
    539 477 658.3 Cabrera, O.
    593 586 656.3 Victorino
    604 516 656 Holliday

    Marcel Ave: 605.1 PA
    Group Ave: 613 PA
    13 of the top 20 were above the Predicted Average.

    Comment by TFINY — March 7, 2012 @ 7:56 pm
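
The group comparison in the table above can be rechecked with a short script. This is a sketch that hard-codes the rows exactly as typed in the comment (Marcel 2011 projection, 2011 actual PA, player); the variable names are illustrative only. Summed from the rows as typed, the averages come out near 607 and 612 rather than the quoted 605.1 and 613, so a row or two may contain a transcription slip. The conclusion is unchanged either way.

```python
# (Marcel 2011 projection, 2011 actual PA, player), copied from the table above.
rows = [
    (634, 721, "Ichiro"), (629, 692, "Fielder"), (641, 607, "Jeter"),
    (626, 716, "Markakis"), (627, 684, "Teixeira"), (614, 715, "Gonzalez, A."),
    (613, 629, "Braun"), (620, 651, "Pujols"), (597, 447, "Wright"),
    (580, 644, "Howard"), (608, 343, "Tejada, M."), (618, 689, "Young, M."),
    (600, 585, "Abreu, B."), (592, 688, "Cabrera, M."), (615, 681, "Cano"),
    (601, 689, "Kemp"), (588, 483, "Theriot"), (539, 477, "Cabrera, O."),
    (593, 586, "Victorino"), (604, 516, "Holliday"),
]

n = len(rows)
marcel_avg = sum(m for m, _, _ in rows) / n   # group average of the projections
actual_avg = sum(a for _, a, _ in rows) / n   # group average of what happened

# Players whose actual PA beat the group's average projection.
above_avg = sum(1 for _, a, _ in rows if a > marcel_avg)
# Players whose actual PA beat their own individual projection.
beat_own = sum(1 for m, a, _ in rows if a > m)

print(f"Marcel average: {marcel_avg:.2f} PA")
print(f"Actual average: {actual_avg:.2f} PA")
print(f"Above the average projection: {above_avg} of {n}")
print(f"Above their own projection:   {beat_own} of {n}")
```

The individual misses are large (Tejada and Wright fell far short, Ichiro and Markakis ran well over), yet the two group averages land within a few PA of each other, which is the point being made about judging the forecasts as a group.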

  92. So, as a follow-up, how much do these projections really mean? It seems that the projections for a single player have so much variation that they, while not totally useless, may not provide that much information at all. The majority of the time a player’s numbers will fall somewhere in a spectrum with a fairly wide range like in your example with OBP.

    It’s not as if we can look to the long run and wait for the mean to occur. Flip a fair coin 10,000 times and we’ll see pretty close to a 50-50 split, but baseball players only stay at a certain talent level for so long. Players will generally improve to their peak, sit there for a few seasons and decline. With so few trials (using a season as a trial here) there seems to be so much uncertainty.

    For a group of players, or even the whole league, there is more value. Assuming the population is normal, you’ll have your overperformers, and underperformers and they’ll cancel each other out roughly. And surely, many players will perform right at their mean projections. I’m sure there is value in that, but that holds less interest for me.

    Are the projections for a single player distributed normally? Could it be different for different statistics? Is it possible that there are other distributions that more accurately capture these numbers?

    I suppose there’s too much variation just due to the human element that we can only do so much. But, man, if only there was some way to account for seasons that are huge statistical outliers. Think Brady Anderson, or even Jacoby Ellsbury last year…..

    Comment by Steve — March 8, 2012 @ 4:11 pm

  93. You’ve captured the spirit pretty well.

    But I’ll disagree with the “while not totally useless, may not provide that much information at all”. It provides a good deal of information, if you don’t use the 2009-2011 data. But, if you use the 2009-2011 data, then, right, it won’t provide that much information for a single player.

    Comment by Tangotiger — March 8, 2012 @ 5:01 pm
