Fantasy semantics

If you’ve never taken a course in econometrics, I encourage you to do so if you have the chance. Actually, any course that teaches statistical methodology will do. Even if you never want to “crunch numbers,” it will teach you how to think and read “probabilistically.” Since nature, life and fantasy baseball are inherently random, understanding the semantics of probability is essential, doubly so if you’re purporting to offer advice.

Things basic econometrics has taught me:

1. We can still say something about coin flips even though the outcome can be either heads or tails.

This may seem obvious, but I’ve seen educated people argue that just because we can’t be sure about the outcome (or because “the outcome could be anything”), it is useless to talk about forecasts or numbers or statistics. Obviously not. We can still talk about which outcomes are more likely than others. We can still talk about the probability of outcomes. We can still say that the probability of heads is 50 percent and that the outcome of a fair die roll is more likely to be between two and five (inclusive) than it is to be either one or six.
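
The die-roll claim is easy to check by counting. Here is a minimal sketch, assuming a fair six-sided die; the helper name `prob` is made up for this illustration.

```python
from fractions import Fraction

# Assumption: a fair six-sided die, so every face has probability 1/6.
faces = range(1, 7)
p_face = Fraction(1, 6)

def prob(event):
    """Probability that a single fair roll satisfies `event`."""
    return sum(p_face for face in faces if event(face))

p_middle = prob(lambda f: 2 <= f <= 5)  # two through five, inclusive
p_ends = prob(lambda f: f in (1, 6))    # either one or six

print(p_middle, p_ends)  # 2/3 1/3
```

We can't say what the next roll will be, but we can say that the middle four faces are twice as likely as the two ends.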

2. In an ideal world, tell me everything—give me the probability of everything.

Let’s say you meet God. God turns out “to play dice with the universe.” He doesn’t know whether Chipper Jones is going to retire midyear, but he does know the probability that it might happen. He knows his own dice.

Why ask God only how many home runs he “expects” (that is, the average number) Chipper to hit? Why not get more information from him? What’s the probability that he hits fewer than five (tantamount to asking the probability that Jones gets injured)? What’s the probability that he hits 10, 15, etc.? With this information, you’d have a better idea about how risky Jones is.
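
To see why the average alone hides the risk, here is a rough sketch with two invented home run distributions. The numbers are hypothetical, chosen only so that both share the same average; neither is a real projection for Jones or anyone else.

```python
# Hypothetical distributions, written as {home run total: probability}.
# Both are invented for illustration and are not real projections.
steady = {20: 0.25, 25: 0.50, 30: 0.25}
risky = {3: 0.25, 25: 0.25, 36: 0.50}

def mean(dist):
    """Expected home run total (the single number we usually ask for)."""
    return sum(hr * p for hr, p in dist.items())

def prob_below(dist, threshold):
    """Probability of finishing with fewer than `threshold` home runs."""
    return sum(p for hr, p in dist.items() if hr < threshold)

print(mean(steady), mean(risky))                    # 25.0 25.0
print(prob_below(steady, 5), prob_below(risky, 5))  # 0 0.25
```

Both players are "expected" to hit 25, but only the second carries a one-in-four chance of a lost season. That is the information the average throws away.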

3. It can be very tempting to take shortcuts when writing.

Actually, I learned this writing about baseball. I like to keep my columns as simple as possible while still making my point. I try to avoid adverbs when possible (though I am rarely successful). Writing “probabilistically” without adverbs is difficult—words like “usually, probably, likely” are useful. I have the same problem with numerical information. Yes, in a perfect world, I would just give you my probability of everything when I talk about my forecasts for Chipper, but parsimony and limited attention spans demand that I give you only as much as I deem relevant and interesting.

On the relationship between “experts” and readers:

The key is trust and establishing consistency. It is possible for one expert (say, Ron Shandler) to use mostly intervals and another (say, Derek Carty) to provide mostly point estimates. Intervals are kind of nice, but they require more disclosure. It is OK for Shandler to prefer to say (paraphrasing) “Miguel Cabrera is likely to have a home run total in the 30s” as long as we know what he means by “likely”—40 percent? 90 percent? Similarly, it is OK for Carty to say “Cabrera is projected to hit 37 home runs.” If Carty gives us some interval around it, too (“standard error bands” in statistics speak), then Carty’s statement is very similar to Shandler’s even though they’ve used different words. (In fact, I am just sort of rephrasing what Carty wrote about on Tuesday. My problem with Shandler’s recent writing is that he forgot a version of Lesson One above.)
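
As a rough sketch of how the two styles line up, suppose the 37-home-run point estimate comes with a standard error of five home runs and a bell-curve shape. Both the standard error and the normal shape are assumptions made here for illustration; they are not figures published by Carty or Shandler.

```python
from statistics import NormalDist

point_estimate = 37  # the "projected to hit 37 home runs" example above
assumed_sd = 5       # hypothetical standard error, not a published figure

# Assumption: forecast errors are roughly normal around the point estimate.
forecast = NormalDist(mu=point_estimate, sigma=assumed_sd)

# Probability of a total "in the 30s" (30 through 39). Half-unit bounds
# are used because home run totals are whole numbers.
p_in_30s = forecast.cdf(39.5) - forecast.cdf(29.5)

print(round(p_in_30s, 2))  # roughly 0.62
```

Under these assumptions, "likely to have a home run total in the 30s" would mean a bit better than 60 percent. A different standard error gives a different answer, which is why the reader needs to know what the writer means by "likely."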

Readers and writers need to come to a sort of tacit understanding about language. More often than not, writers are not going to give numbers for everything. If a Shandler-esque writer wants to say “Cabrera is likely to hit around 35 home runs” instead of giving lots more numbers, then he should be consistent about what his words mean. Approximately what does “likely” mean?

On arguments within the expert community:

What goes for communication between adviser and advisee goes doubly for these blogged exchanges between experts. It is very hard to champion your cause against another “expert” in a venue designed to still be accessible to the layman reader. Actually, it is very hard to do it in any venue.

I’ll have more on the quants-versus-quaints debate (in case you can’t tell which side I’m on) in my next article. The tenor of that debate has actually been very good, I think. Many expert exchanges are not nearly as interesting, in part because one expert will say something semi-informative but mostly substantive like “It is good to use statistics to forecast how many home runs Cabrera will hit.” And then the opposing expert will say something like “Statistical forecasts are always wrong. I prefer to go with my gut.”

My problem with the second statement is that it is absolutely true but totally practically false. Forecasts are always wrong, but they are still incredibly useful. Most experts, even those who haven’t taken econometrics, know this to be true. The more literally accurate statement, “I project that Cabrera will hit between 35 and 45 home runs with 95 percent certainty” would be more bulletproof to these kinds of flatulent responses, but all of those numbers are superfluous to the argument. At some point it would be better if some details could be taken for granted.
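
"Always wrong but still useful" can be made concrete. The sketch below takes the 35-to-45 interval with 95 percent certainty from the statement above and assumes a normal forecast centered at 40; the normal shape and the centering are assumptions for illustration only.

```python
from statistics import NormalDist

low, high = 35, 45         # the 95 percent interval quoted above
center = (low + high) / 2  # assumed point forecast: 40 home runs

# Back out the spread that makes (low, high) a 95 percent interval
# under a normal curve (an assumed shape, not a fitted model).
z95 = NormalDist().inv_cdf(0.975)  # about 1.96
sigma = (high - center) / z95
forecast = NormalDist(center, sigma)

# Chance the total lands somewhere inside the interval.
p_interval = forecast.cdf(high) - forecast.cdf(low)

# Chance the total rounds to exactly the point forecast.
p_exact = forecast.cdf(center + 0.5) - forecast.cdf(center - 0.5)

print(round(p_interval, 2), round(p_exact, 2))
```

Under these assumptions the point forecast is "wrong" about five times out of six, while the interval holds 95 percent of the time. Both statements describe the same forecast.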


Derek Ambrosino
I was being totally sincere when I titled my article from two weeks ago. Many, many debates really boil down to nuanced differences between the way two “sides” use and interpret the use of a few key terms. Of course, language, as wonderful and useful as it is, is woefully inadequate to capture the infinite complexity of human thought. And the chasm between the essence of my thought and my ability to communicate it is the source of many a disagreement. When I say “likely” I perceive it to mean something very specific within the context of how I am…
The A Team

I agree 100%. Econometrics is how I found myself falling down the rabbit hole known as sabermetrics. I wanted to do my final paper in the class on baseball performance. My metrics paper was a fairly flawed construct that looked just at free agents over a 3-year period using stats like OPS, but that led to my Labor Economics capstone that used fangraphs WAR from ’02-’08 and all position players. Since then I’ve fallen further and further down the hole…

Derek Ambrosino
A-Team, I hear ya. I know a number of people I would consider brilliant who have absolutely no interest in baseball. I’ve tried to tell them that they would absolutely fall in love with it if they developed an interest in it by thinking about studying the game as trying to understand and make sense of a complex system. I tell them that their ignorance of many of the game’s basics might actually work in their favor. Most of the game’s fans first have to unlearn the false knowledge they’ve been indoctrinated with by charlatans of the industry who are painted…
RMR
It would be wonderful if the quants could get together and agree upon a standard notation for expressing confidence intervals around projections in text. Even something as simple as is done with polling data (X +/- Y) would be a big step up. Personally, I see “likely” as 1 SD, “very likely” as 2 SD, and “extremely likely” as 3 SD. Of course, I think part of the reason this isn’t done is because we have a standard sample size and mechanic of variability, so we can develop a reasonable intuition about error bands. Give me an AVG or OBP…
The A Team
I don’t know, RMR, a lot of people I talk to are willing to dismiss a projection system because one player got hurt and hit 15 home runs instead of 25. Or a healthy Juan Pierre hit .350 over a month when he was predicted to hit more like .280, etc. Most of the people I interact with are either completely ignorant of Statistics or Econometrics or very nearly so. And Derek, as long as a standard 5×5 uses R, HR, RBI, SB, AVG, W, SV, K, ERA, and WHIP as its stats, people aren’t going to develop a truly…
Jonathan

For me, the first-order improvements in performance come from understanding the basics, such as the fact that regression to the mean does not imply substandard future performance, just sub-current performance.
One can learn the basics just talking about stuff – no math necessary.  I just find that many don’t see this stuff talked about unless they take a stats or metrics course.  If at the end of such a course you walk out not being able to compute a standard deviation but still remember the basic principles, I think you’ve done yourself a great service anyway.

judas
All I know for sure is that I know less about baseball by learning more about metrics. Sure, xFIP, BABIP, HR/FB, Z-Swing% and the like can help one’s understanding of what can be expected and what is exceptional. But as the Sabermetric movement gathers more steam it becomes more full of hot air. I am a relative newb, only coming across fangraphs last summer, and fell in love with baseball by virtue of the numbers, not the game. But along with me have come many, many more that have begun to adopt the SABR POV. This has muddied the landscape. …
Derek Carty
Thanks for chiming in, Judas. I think you’re right that numbers can be misused (and are by some), although I do think they are absolutely a necessity when used correctly. That being said, scouting is also a very important thing (check out my bio line). Scouting is very different, though, than “faith without numbers”. Scouting is a very difficult thing to do and is very different from simply watching a player on TV and forming a vague impression about him (not to say that’s what you mean, just that scouting is a term that often gets tossed around to mean…
Derek Ambrosino
A-Team, It doesn’t really matter whether you continue to use triple crown stats in your leagues; that doesn’t preclude understanding the importance of peripherals and gaining insight into the basics of the discipline of statistics. If I’m betting on a player’s AVG, I want to know his BABIP, his trajectory splits, etc. I want to know whether his past, or current, output is sustainable. Sabermetrics is not a specific tool kit of numbers and stats, it’s a way of approaching the process of deriving understanding from what you are seeing on the field and in the box scores. Judas, I…
judas

Thanks for the thoughtful responses.

I suppose a way of guarding against the misuse of statistics would be to draw a line in the sand, so to speak, of where/when we have enough data to make a solid assessment of what is going on.  Where does SSS end and a useful amount of data begin?  I know it varies from stat to stat. 

Any chance you guys could put something like this together?

Derek Carty

Hey Judas,
Pizza Cutter did some work on this a couple years ago, and I reposted the results here: http://www.hardballtimes.com/main/fantasy/article/when-do-stats-become-meaningful/

I may post an update of this sometime this week, though, since it is an important thing.

judas

thanks!

Peter Kreutzer
I enjoyed your article, Jonathan. THTfantasy is all over this discussion and that’s good for all of us involved in it and all of us playing the game, but I keep feeling that something is being missed. Chris’s polemic about his instinctive approach to drafting has turned what should be a discussion about what information is useful and what information is not (or less so) in winning fantasy baseball leagues, into an absurd caricature of quants versus the instinctives, in which the quants claim that anything that isn’t spreadsheetable is lazy and sloppy. It isn’t nearly so easy. My goal,…