FanGraphs Baseball



  1. Good article. And I think this is a great explanation of your often stated point that teams should continue to add talent even if they are low or in the middle of the win curve. Frequently people are thinking that if you aren’t projected to win 90+ you should blow it all up and start over, which is ridiculous. Very relevant to your arguments for the Mets to sign both Dickey and Wright to extensions as well.

    Comment by Caveman Jones — February 20, 2013 @ 4:41 pm

  2. Speaking of the RLYW guys:
    Dave…if you have a connection there, please inform them that they have double counted Freddy Garcia in the most recent CAIRO projections. (He’s listed twice, which may be screwing up their projected standings.) I can’t find contact information.

    Comment by Marver — February 20, 2013 @ 5:01 pm

  3. I fully agree that we need to do better about showing the uncertainty inherent in projections. However, I personally don’t differentiate between “prediction” and “projection” in the same way you do. I would use the words interchangeably, but the good ones will have the appropriate levels of uncertainty/confidence. For instance, I would in fact predict that the #1 seed would win the #1 pick, and my prediction would have 25% confidence. I’d also mention the chances for the other teams. In my mind, a prediction/projection *is* a probability distribution, and it’s harmful not to treat it as such.

    If you want to make that distinction in language, that’s fine too, but the important part is for us to think probabilistically.

    Comment by mickeyg13 — February 20, 2013 @ 5:03 pm
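mickeyg13’s point that a prediction is a probability distribution can be made concrete with a short Python sketch. It keeps the whole distribution for the lottery example, then reports the single most likely outcome together with its probability. The per-seed weights are an assumption (the pre-2019 NBA lottery table, in chances out of 1,000). Only the 25% for the top seed comes from the discussion.

```python
from fractions import Fraction

# Assumed lottery weights: chances out of 1000 of landing the #1 pick, by seed.
# Taken to follow the pre-2019 NBA lottery table; illustrative only.
LOTTERY_WEIGHTS = [250, 199, 156, 119, 88, 63, 43, 28, 17, 11, 8, 7, 6, 5]

def forecast(weights):
    """Return the full distribution, the most likely outcome, and that outcome's probability."""
    total = sum(weights)
    dist = {seed: Fraction(w, total) for seed, w in enumerate(weights, start=1)}
    pick = max(dist, key=dist.get)  # the single "pick" a pundit would announce
    return dist, pick, dist[pick]

dist, pick, p = forecast(LOTTERY_WEIGHTS)
print(f"Most likely: seed {pick} ({float(p):.0%}); someone else: {float(1 - p):.0%}")
```

Reporting the pick alongside its probability (and the remaining 75%) is the whole difference between "the #1 seed will win" and a usable forecast.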

  4. Give us the confidence level of each number

    This is great advice for projections of individual players as well.

    Comment by Anon — February 20, 2013 @ 5:09 pm

  5. I hope people pay attention to the last paragraph

    Comment by Eminor3rd — February 20, 2013 @ 5:24 pm

  6. Garcia accidentally got listed twice in the projection spreadsheet, but that is not an issue with the projected standings which are done using a simulator.

    Comment by SG — February 20, 2013 @ 5:28 pm

  7. Mickeyg13 hits the nail on the head. The main point is that we need to quantify our uncertainty. Good predictions (or projections) should have probability distributions around them, reflecting the confidence of our claim.

    Comment by Beef — February 20, 2013 @ 5:39 pm

  8. If the 68% confidence interval is ±8 wins, the 95% interval is ±24 wins. Just sayin’. Even more illustrative of your point. Great post, though.

    Comment by EJ Johann — February 20, 2013 @ 6:11 pm

  9. I’m not a woman, I am a nation.

    Comment by Mother Russia — February 20, 2013 @ 6:19 pm

  10. I also concur with Mickeyg13 and Beef. Perhaps it is because I work in statistics, but I don’t think the word *prediction* implies any greater certainty than does the word *projection*. A projection is a prediction. I suppose it is called a projection because it is a prediction of a future value, but it is still a prediction. All predictions have uncertainty and that uncertainty should be presented along with the predicted value.

    Comment by Keith Karcher — February 20, 2013 @ 6:19 pm

  11. I think you made a mistake there. To go from 68% to 95% you go from one deviation in either direction to two, so it doubles the spread, not triples it.

    Comment by byron — February 20, 2013 @ 6:20 pm

  12. I suspect that the Dodgers will be an excellent example of this point: lots of talent may not equal success.

    Comment by Hurtlockertwo — February 20, 2013 @ 6:21 pm

  13. 95% is 3 standard deviations. 1=68%, 2=90%, 3=95%, 4=99%

    Comment by EJ Johann — February 20, 2013 @ 6:24 pm

  14. Nope:

    Comment by byron — February 20, 2013 @ 6:34 pm
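byron’s reply seems to have lost whatever it linked to. For reference, under a normal distribution about 68.3% of outcomes fall within one standard deviation of the mean, about 95.4% within two (1.96 SD gives exactly 95%), and about 99.7% within three. Here is a minimal Python sketch. It assumes win totals are roughly normal, as the article’s SD framing does, and takes the 8-win SD from comment 8.

```python
import math

def coverage(k):
    """Probability that a normal variable lands within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within ±{k} SD: {coverage(k):.1%}")

# A 95% interval spans about ±1.96 SD, so an 8-win SD gives roughly ±16 wins, not ±24.
sd_wins = 8
half_width_95 = 1.96 * sd_wins
print(f"95% interval: ±{half_width_95:.1f} wins")
```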

  15. I have my own projection system (a 7.6-win standard deviation over 3 years) and do exactly what you recommend: I use my past error as a probability distribution to find each team’s odds of winning their division.

    Comment by Justin — February 20, 2013 @ 6:44 pm
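Justin’s approach treats a system’s historical error as a probability distribution and reads off division odds. It can be sketched as a Monte Carlo simulation. This is a minimal sketch under stated assumptions:

- The five team names and projected win totals are hypothetical.
- The 7.6-win standard deviation is borrowed from the comment.
- Each team’s error is treated as independent normal noise. This ignores that division rivals play each other and that league-wide wins must balance.

```python
import random

def division_odds(projections, sd, n_sims=50_000, seed=2013):
    """Estimate each team's chance of finishing first by adding independent
    normal error (standard deviation `sd`) to every projected win total."""
    rng = random.Random(seed)
    firsts = {team: 0 for team in projections}
    for _ in range(n_sims):
        simulated = {team: rng.gauss(mean, sd) for team, mean in projections.items()}
        firsts[max(simulated, key=simulated.get)] += 1
    return {team: count / n_sims for team, count in firsts.items()}

# Hypothetical projected win totals for a five-team division.
projections = {"A": 90, "B": 86, "C": 84, "D": 80, "E": 72}
odds = division_odds(projections, sd=7.6)
for team, p in sorted(odds.items(), key=lambda kv: -kv[1]):
    print(f"{team}: {p:.1%}")
```

Even the 90-win favorite comes out well short of a lock, which is the article’s point in numbers.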

  16. I think you may be right. I know we talk about tangible stuff around here but the intangibles on that team scare the poop out of me.

    Comment by Spit Ball — February 20, 2013 @ 6:58 pm

  17. My mistake… Nice catch

    Comment by EJ Johann — February 20, 2013 @ 7:15 pm

  18. Frankly, the only thing that’s missing (or at least not readily apparent) from most sites’ and people’s projection estimates is the uncertainty. It would be more informative if everyone just stated a win projection with a ± estimate (it could be a standard deviation or a standard error; ideally it’d be a 95% confidence interval).

    Unless you spend a fair amount of time using statistics, people seem to prefer a set number. A range is also less visually attractive.

    Comment by Klatz — February 20, 2013 @ 7:55 pm

  19. Great explanation! This might be Dave’s best post in quite a while.

    Comment by gnomez — February 20, 2013 @ 7:55 pm

  20. It seems odd to criticize others for not publishing standard deviations when this website also does not publish standard deviations.

    Comment by Crap Shoot — February 20, 2013 @ 8:13 pm

  21. I appreciate the general thrust of this article, but I don’t think it goes far enough. What’s the utility of distinguishing between “projections” and “predictions”? In other words, why say something has “high” certainty when we can actually quantify the uncertainty? Whether we think certainty is “high” or “low” should emerge from the interpretation of the data. All estimates are inherently uncertain. It’s simply a question of how much exists. “High” is a very imprecise and arbitrary description of how much uncertainty exists for an estimate, and it’s unnecessary to be so imprecise.

    Comment by Jonathan — February 20, 2013 @ 8:58 pm

  22. Well, the issue is that the words mean different things within different communities of practice. Sometimes a projection is a subset of prediction, sometimes a prediction is a subset of projection! It literally depends on the group of people you’re talking with.

    So, to amend Dave’s mother/woman concept, imagine that the world became cohabited by aliens where all women were mothers but not all mothers were women. Now try to disentangle the terms. Basically, Dave is stating that within this community of discourse, he’s treating predictions as a subset of projections. Which is fine, since it beats having the terms be so ambiguously related that they’re both almost useless.

    Comment by B N — February 20, 2013 @ 9:07 pm

  23. FanGraphs doesn’t do their own projections, they simply display the projections of outside sources, so everything he said in his piece would apply equally to the projection systems they display here.

    Comment by KyleL — February 20, 2013 @ 9:10 pm

  24. It’s a shame prospect evaluation is so far behind the rest of baseball analysis when it comes to this.

    Comment by jda — February 20, 2013 @ 9:18 pm

  25. And, for the record, I mentally define prediction and projection somewhat differently. Coming from a systems and modeling standpoint, I consider a projection something that extrapolates from the known data (e.g., given prior events, project paths of future events). A projection can’t be “wrong”, as it’s just the mechanistic result of applying a model. In other words, projections are data-driven.

    Prediction, on the other hand, is meant to be tested. A prediction can be right or wrong. In other words, predictions are results-driven. I’d say that’s qualitatively different from being data-driven. For example, let’s say you have a savant who can actually predict the NBA draft lottery better than chance, just by looking at the ball machine. If you’re doing predictions, you’d be a fool not to consult him. How would you build that into a projection? “Well, we calculated the exact odds based on the probabilities, and then added a vector for Rainman’s predictions that improves the correlation by 10%…”

    Heck, some of the best predictions are self-fulfilling prophecies. By comparison, calculating a projection and then doing everything in your power to increase the fit? Generally frowned upon. Probably for good reason (“I’m sorry Aubrey Huff, but my simulation projected that you would spend 90 days on the DL this year. Hold very still…”).

    So I generally lean toward a view that a projection is a systematic approach for modeling data, which produces some estimates related to future events. On the other hand, I’d say that predictions are characterized by their results (right/wrong, close/no cigar). Whether these results are arrived at by advanced statistics or astrology, they’re still predictions. However, if your advanced stats can’t outperform astrology, it may be time to find a new line of work (palm reading maybe?).

    Comment by B N — February 20, 2013 @ 9:37 pm

  26. “All estimates are inherently uncertain.”

    Funny, that’s what my mechanic said too.

    Comment by B N — February 20, 2013 @ 9:42 pm

  27. I heartily agree with your central point: That there are so many uncertainties in predicting win totals that forecasters would do better to show a range and provide confidence levels at different intervals.

    But I also believe that you obscure that point by your use of language, specifically, the supposed distinction between a projection and a prediction. Those two terms are used in a variety of fields and carry different meanings within those different fields. If you limit the definition to the field of statistics, your definition fails to capture precisely the difference; among many statisticians, a prediction is based solely on a past data set that allows one to calculate a future data point without the need to make any additional assumptions; a projection requires additional assumptions. For purposes of sabermetrics, we are almost always dealing with additional assumptions and therefore projections.

    Comment by Jonathan Sher — February 20, 2013 @ 10:03 pm

  28. I accidentally downvoted your comment, but I want you to know that it gets a +1 from me.

    Comment by swainzy — February 20, 2013 @ 10:31 pm

  29. “This is why making preseason predictions is kind of silly.”

    Even sillier is when someone makes a prediction that happens to happen, and then they crow about their baseball acumen.

    Comment by PackBob — February 20, 2013 @ 11:02 pm

  30. I’m glad someone finally mentioned how bad FanGraphs WAR is at predicting wins. The thing is, though, that ±16 is not how bad it is at predicting wins in the preseason, but how bad it is when you know the records of the teams and have a full season of data. This raises the question of why Baseball-Reference doesn’t seem to have this problem.

    Comment by Stathead — February 21, 2013 @ 12:33 am

  31. Always love when someone gets a prediction “right” and becomes viewed as an expert

    Comment by sprained left fat — February 21, 2013 @ 12:51 am

  32. Great article Dave.

    Comment by Lenard — February 21, 2013 @ 12:54 am

  33. This is exactly the reason why I said “We are going to compete for a championship.” Since the standard deviation is around 8 wins and we are projected to have around 81 wins, we could end up with 89 wins, which potentially maybe could win the division. There’s about a 17% chance of that. Also, we have to factor in the chances of the other NL Central teams not reaching 89 wins. I’ll estimate that at around 15%. Then, I’ll just estimate the potential of winning the World Series after making the playoffs at around 12.5%. So the Pirates do have a chance at a World Series this year. It’s the almost not insanely small .32%… Good thing I know my rhetoric to excite Pittsburgh fans.

    Comment by Bob Nutting — February 21, 2013 @ 12:57 am
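The arithmetic in the comment above roughly holds up. Here is a short Python sketch that takes the comment’s numbers as given: 81 projected wins, an 8-win SD, 15% for the rivals, and 12.5% for the World Series. As the comment does, it multiplies the steps as if they were independent. It also applies a continuity correction for whole-number win totals, treating 89+ wins as anything above 88.5 on a normal curve. With that correction the first step lands near the quoted 17%.

```python
from statistics import NormalDist

# The comment's own assumptions.
wins = NormalDist(mu=81, sigma=8)
p_89_plus = 1 - wins.cdf(88.5)   # continuity correction: 89+ whole wins ~ anything above 88.5
p_rivals_short = 0.15            # no NL Central rival reaches 89 wins
p_title_given_playoffs = 0.125   # World Series odds once in the playoffs

# Multiplying treats the three steps as independent, as the comment does.
p_title = p_89_plus * p_rivals_short * p_title_given_playoffs
print(f"P(89+ wins) = {p_89_plus:.1%}, P(title) = {p_title:.2%}")
```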

  34. Words are merely projections of ideas and thus have an inherent margin of error… All joking aside, a prediction is most certainly not “a projection where there is a high degree of confidence in a specific outcome”; people make predictions all the time with little to no confidence. In your own words later in the article, you refer to a prediction as a “random number pulled out of thin air by a television talking head”, so a prediction can’t be both, right?

    I think the main idea of the article is that there exists a type of forecast that was arrived at with math and has a margin of error, and that there also exist forecasts that might be more specific in nature and probably have less to do with math and more to do with something like “gut”. Call these things whatever you want, but don’t confuse them as the same thing.

    Comment by Max H — February 21, 2013 @ 1:44 am

  35. When do the Pecota standings thingymajigs come out?

    Comment by DodgersKingsoftheGalaxy — February 21, 2013 @ 2:12 am

  36. They’re double counting a ton of pitchers from what I’ve seen. Probably over 100. Most of the double counts have different RAR/WAR values but other than that the entries are identical. I was curious about this, myself.

    Comment by Sandy Kazmir — February 21, 2013 @ 5:50 am

  37. Version 1.0102a is out now!

    Comment by philosofool — February 21, 2013 @ 8:37 am

  38. Looks like a problem with how I generate the depth charts. Some guys are getting pulled in multiple times. I’ll fix it in the next release but the projected standings aren’t impacted by it.

    Comment by SG — February 21, 2013 @ 8:37 am

  39. Can’t wait for the idiots on the sports talk shows to take up this advice.

    “I’m not saying who’s going to win the NL East, I am telling you, right now, that the Nationals have a 43% chance of doing it!”

    “Forty-three! Forty-three?!? You are sadly mistaken by at least seven percent, my friend. I would say sixty-three if Dan Haren is healthy.”

    “I didn’t say anything about Dan Haren yet, but if he’s healthy and Bryce Harper can do what Mike Trout did, nothing can stop them ninety-four and a half percent of the time.”

    Comment by philosofool — February 21, 2013 @ 8:43 am

  40. I don’t see a philosophical difference between the 75% probability of being wrong on the top pick and the 1.8% probability on pick 14. They are both just predictions with different degrees of uncertainty.

    If Dave wants to nitpick on that, then he should tell us where the cutoff between a projection and a prediction would be. 50%? 75%? 98%?

    And if the only way of calling it a prediction would be to be close to 100% sure of it, you could just call it a fact.

    Comment by Pepe — February 21, 2013 @ 8:59 am

  41. I think BP used to provide the 10th, 25th, 50th, 75th and 90th percentiles for their player projections …

    Comment by Eric R — February 21, 2013 @ 10:21 am
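Percentile bands like the ones Eric R recalls can be read straight off a set of simulated seasons. This Python sketch makes two assumptions:

- The team is hypothetical, with a true chance of winning any single game of .500.
- The 162 games are independent coin flips.

Even with talent known exactly, luck alone spreads the win total with a standard deviation of about √(162 × 0.5 × 0.5) ≈ 6.4 wins.

```python
import random
import statistics

def simulate_seasons(p_win, n_seasons=5_000, games=162, seed=7):
    """Simulate season win totals for a team that wins each game with probability
    p_win, treating games as independent (a simplifying assumption)."""
    rng = random.Random(seed)
    return [sum(rng.random() < p_win for _ in range(games)) for _ in range(n_seasons)]

seasons = simulate_seasons(p_win=0.5)
cuts = statistics.quantiles(seasons, n=100)   # cuts[k - 1] is the k-th percentile
bands = {pct: cuts[pct - 1] for pct in (10, 25, 50, 75, 90)}
for pct, wins in bands.items():
    print(f"{pct}th percentile: {wins:.0f} wins")
```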

  42. Let me take a shot at applying Dave’s set-subset example.

    Carson Cistulli is a woman
    Carson Cistulli is not a mother
    I project that Carson’s next prospect review will feature an under-talented and under-hyped prospect*
    I predict that Carson’s next prospect review will feature an under-talented and under-hyped prospect

    *the standard deviation in this prediction is 0, it will always be true.

    I am not sure if I am heading in the right direction here.

    Comment by Jaybo Shaw — February 21, 2013 @ 10:29 am

  43. Of course, that’s not what fWAR is meant to be used for, so (as was said on another discussion thread) if you use the wrong tool for a particular job, don’t blame the tool. Don’t blame that flathead screwdriver because it’s not hammering very well.

    Comment by Jason B — February 21, 2013 @ 10:44 am

  44. This is so much worse than how I interpreted his original article. You’re basing the difference on an arbitrary distinction of “high confidence” (whatever that means).

    I’d have to support Taleb to find such pretentious absurdity.

    At least the end of the article points out some aspects that aren’t arbitrary semantic distinctions. It should be pointed out that using SD is pretty laughable, as is the idea of a bell curve. A team with a median of 93 wins IS MUCH MUCH MUCH more likely to hit 73 wins than 113 wins, but why actually write useful stuff? Though there are teams with lower medians that certainly have right-hand kurtosis. (Wow. We can discuss other central moments!!!)

    Comment by dafuq — February 21, 2013 @ 11:00 am
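dafuq describes a lopsided distribution: a 93-win team is far more likely to drop to 73 than to climb to 113. That is a statement about asymmetry, which a standard deviation alone cannot express. Asymmetry is usually measured by skewness, the third standardized moment. Kurtosis, the fourth, describes how heavy the tails are. Below is a minimal Python sketch of sample skewness; the two lists of win totals are purely illustrative.

```python
def skewness(xs):
    """Sample skewness (population form of the third standardized moment).
    Negative means a longer left tail; positive means a longer right tail."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    if var == 0:
        return 0.0  # every value identical: no spread, so no asymmetry
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / var ** 1.5

symmetric = [85, 89, 93, 97, 101]
left_tailed = [73, 90, 93, 95, 96]  # hypothetical: one disaster season, no matching boom
print(skewness(symmetric), skewness(left_tailed))
```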

  45. Because SD is dumb?

    Comment by dafuq — February 21, 2013 @ 11:21 am

  46. Dave, this sounds a bit like a very mini version of parts of Nate Silver’s recent book “The Signal and the Noise”. I know you’re familiar with some of his work and have posted about him previously; I suspect you’ve read the book? It’s brilliant in explaining some of this stuff, and other aspects of statistical analysis.

    Comment by grant — February 21, 2013 @ 11:28 am

  47. Like the article, but I’m in the camp that the projection/prediction distinction is too contrary to common usage for my liking. It seems more instructive to say that predictions have some uncertainty attached with them, and the good predictors are the ones who can most accurately quantify this uncertainty.

    This notion really got lost in the whole Nate Silver election predictions. People judged his results on how many states he “called”, but really we should be judging on how closely the real percentages matched his given percentages. Dave hit the nail on the head when he says it’s all about “making a pick”. If you try to bring uncertainty into the discussion it’s considered a cop out. It’s an annoying phenomenon.

    Comment by DJG — February 21, 2013 @ 11:31 am
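DJG argues that forecasters should be judged on how closely their percentages match reality, rather than on how many states they “called”. That is usually scored two ways:

- A Brier score: the mean squared error between the forecast probability and the 0/1 outcome.
- A calibration table: forecasts grouped by stated probability and compared with how often the event actually happened.

Here is a minimal Python sketch. The functions are written for illustration rather than taken from any particular library, and the bin count is an arbitrary choice.

```python
from collections import defaultdict

def brier_score(forecasts, outcomes):
    """Mean squared gap between forecast probabilities and outcomes (1 = happened, 0 = didn't).
    Lower is better; always forecasting 50% scores exactly 0.25."""
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("need equal-length, non-empty lists")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

def calibration_table(forecasts, outcomes, bins=10):
    """Group forecasts into equal-width probability bins. Each row is
    (average forecast, observed frequency, count) for one non-empty bin."""
    groups = defaultdict(list)
    for p, o in zip(forecasts, outcomes):
        groups[min(int(p * bins), bins - 1)].append((p, o))
    rows = []
    for b in sorted(groups):
        ps = [p for p, _ in groups[b]]
        hits = [o for _, o in groups[b]]
        rows.append((sum(ps) / len(ps), sum(hits) / len(hits), len(ps)))
    return rows
```

A well-calibrated forecaster’s 70% calls should come true about 70% of the time, so the first two numbers in each row should be close.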

  48. You’re still misunderstanding slightly. We need to stop thinking in terms of “right” and “wrong” or even “wrong 75% of the time” but in terms of a probability distribution: “The value of x will fall in range R p% of the time.” Dave wants to say that predictions are the subset of projections where R is small and p is large.

    Comment by philosofool — February 21, 2013 @ 12:09 pm

  49. I’d love to see probability charts on what % chance team A will win 81+ games, etc. For that matter, it would be great to see the % chance that player X will exceed a WAR of 2, or 30 HR, etc., rather than just a prediction of the expected mean.

    Comment by Delmon Youngs sprained left fat — February 21, 2013 @ 12:09 pm
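The charts requested above (the chance a team wins 81+, or a player tops 2 WAR) come down to counting the share of simulated outcomes past a threshold. This Python sketch uses hypothetical projections: a team at 84 wins with an 8-win SD, and a player at 2.5 WAR with a 1.5 WAR SD, both modeled as normal. For a pure normal model, `NormalDist.cdf` would give the answer directly. Counting samples, however, works for any simulator’s output.

```python
from statistics import NormalDist

def prob_at_least(samples, threshold):
    """Share of simulated outcomes at or above the threshold."""
    return sum(s >= threshold for s in samples) / len(samples)

# Hypothetical projections, each modeled as a normal distribution.
team_wins = NormalDist(84, 8).samples(50_000, seed=1)
player_war = NormalDist(2.5, 1.5).samples(50_000, seed=2)

p_team = prob_at_least(team_wins, 80.5)   # 80.5 stands in for "81 or more" whole wins
p_war = prob_at_least(player_war, 2)
print(f"P(81+ wins): {p_team:.0%}   P(WAR ≥ 2): {p_war:.0%}")
```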

  50. Poseur alert!

    Comment by chuckb — February 21, 2013 @ 12:56 pm

  51. I hope your projection is way off.

    Comment by Baltar — February 21, 2013 @ 1:25 pm

  52. Now two kind people who didn’t like the comment need to give it a + anyway to make up for this. Please do.

    Comment by Baltar — February 21, 2013 @ 1:28 pm

  53. Technically, don’t all teams compete for a championship?

    Comment by Baltar — February 21, 2013 @ 1:32 pm

  54. So predictions/projections are like the positions of electrons–a probability cloud. I can dig that.

    Comment by Baltar — February 21, 2013 @ 1:36 pm

  55. This is exactly what I was going to say, but you beat me to it.

    Comment by Baltar — February 21, 2013 @ 1:38 pm

  56. The intent of this article is very important and articles on it ought to be written on this site from time to time. I especially liked the NBA lottery example. People should realize that all statements or statistics about the future are probabilistic at best, but they don’t.
    The prediction/projection definition was unfortunate and the suggestion that all projections be shown as probability charts is, of course, totally unworkable.
    Nevertheless, keep sounding the trumpet.

    Comment by Baltar — February 21, 2013 @ 1:46 pm

  57. Great article, enjoyed reading it very much.

    Comment by obsessivegiantscompulsive — February 21, 2013 @ 7:16 pm

  58. I agree with the sentiment, I just find the use of the distinction between projection and prediction absurd.

    If we use that type of definition, then we should be clear on what the cutoffs for R and p% are for one or the other. Otherwise it is just a semantic discussion that adds nothing.

    Comment by Pepe — February 22, 2013 @ 4:01 am

  59. In 2013, the Astros, Marlins, and Cubs seem to have no interest in doing so.

    Comment by Greg Simons — February 22, 2013 @ 4:02 am

  60. wrr

    Comment by af — April 26, 2015 @ 3:23 am
