Are there two Jonah Lehrers?

I have been acquainted for some time with the work of one Jonah Lehrer, who has written books and articles on various aspects of psychology and neuroscience. I have not always agreed with Lehrer’s interpretations and conclusions, but I have appreciated his efforts to relate recent research to literature (e.g., in his book Proust Was a Neuroscientist) and to understanding modern life (e.g., in his book How We Decide).

On Monday, I became acquainted with another Jonah Lehrer, a seemingly different being who evidently inhabits the same body. Jonah Lehrer 2.0 finds fault with sabermetrics and the kinds of analyses presented on sites like The Hardball Times. His complaint is that sports teams…

…are seeking out the safety of math, trying to make extremely complicated personnel decisions by fixating on statistics … they are pretending that the numbers explain everything.

…This is largely the fault of sabermetrics. … The underlying assumption is that a team is just the sum of its players, and that the real world works a lot like a fantasy league.

…But sabermetrics comes with an important drawback. Because it translates sports into a list of statistics, the tool can also lead coaches and executives to neglect those variables that can’t be quantified.

Lehrer 2.0’s concern over whether variables can or cannot be quantified seems odd, especially in light of 1.0’s obvious familiarity with psychology. Lehrer’s subject Proust famously wrote about memory, and Proust’s work remains well known to this day because he captured so elegantly many aspects of memory that are not easily articulated, much less quantified.

But today’s psychologists who are interested in memory quantify everything they study, and other psychologists invent ways to quantify depression or extraversion or risk-taking or anything else they seek to study scientifically. One of the first things that psychology majors learn is that concepts and variables relating to unobservable inner workings of the mind must be “operationally defined,” that is, described in such a way that they can be measured.

So “memory” might be defined in one study as how many words a person can recall from an earlier-studied list, “extraversion” might be a score derived from a personality test, and so on. No one who studies memory believes that any one operationalization captures everything there is to be known about memory, or even that the sum total of all the operationalizations ever used gives us a complete picture. Some aspects of memory no doubt remain unquantified. But modern psychology depends heavily on quantification. Lehrer 1.0 must know that.
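As a concrete illustration of what an operational definition looks like in practice (the word lists below are invented, not drawn from any actual study), a free-recall measure of “memory” might be scored like this:

```python
def recall_score(studied, recalled):
    """One operational definition of 'memory': the proportion of
    studied words that a participant later recalls."""
    studied_set = set(studied)
    hits = len(studied_set & set(recalled))
    return hits / len(studied_set)

studied = ["apple", "river", "chair", "cloud"]
recalled = ["river", "chair", "piano"]  # "piano" is an intrusion, not counted
print(recall_score(studied, recalled))  # 0.5
```

No one would claim this single number exhausts what memory is; it is simply one measurable stand-in, which is the whole point of operationalization.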

So he should tell Lehrer 2.0 that sabermetrics is nothing more than a way to operationalize variables, which is appropriate, since athletics is a form of human behavior like any other that psychologists might quantify and study. Sabermetrics is founded on basically the same principle of operationalization that so much of Lehrer 1.0’s work has depended on, and sabermetric measures are used in similar ways.

For instance, psychologists sometimes disagree about which operational definition is most appropriate to a given research question. In the same way, when sabermetricians introduce a variable like OPS, they are more or less just objecting to the idea that batting average is an adequate operationalization of hitting ability. I have not seen anyone write that OPS is a perfect measure of hitting ability, just that it is better. And just as the results of a psychological study might be misinterpreted by one psychologist or another, the results of any sabermetric analysis might be misapplied or misunderstood. That fact does not support the concerns that Lehrer 2.0 has raised.
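The arithmetic behind that better operationalization is simple enough to write down. OPS is just on-base percentage plus slugging percentage; the stat line below is hypothetical, chosen only to show the computation:

```python
def obp(h, bb, hbp, ab, sf):
    """On-base percentage: times reaching base per plate appearance counted."""
    return (h + bb + hbp) / (ab + bb + hbp + sf)

def slg(singles, doubles, triples, hr, ab):
    """Slugging percentage: total bases per at-bat."""
    return (singles + 2 * doubles + 3 * triples + 4 * hr) / ab

def ops(h, bb, hbp, ab, sf, doubles, triples, hr):
    """OPS operationalizes hitting ability as OBP + SLG."""
    singles = h - doubles - triples - hr
    return obp(h, bb, hbp, ab, sf) + slg(singles, doubles, triples, hr, ab)

# A hypothetical season: 500 AB, 150 H (30 2B, 3 3B, 25 HR), 60 BB, 5 HBP, 5 SF
print(round(ops(150, 60, 5, 500, 5, 30, 3, 25), 3))  # 0.899
```

Batting average would reduce that same season to 150/500 = .300, discarding walks and power entirely, which is exactly the inadequacy OPS was introduced to address.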

Lehrer 2.0 similarly misunderstands sabermetric results produced by statistical methods that Lehrer 1.0 would find familiar, since modern psychology relies heavily on various kinds of statistical modeling, for example in brain-imaging studies. In sabermetrics, such models explicitly recognize the impossibility of making perfect predictions and thus are not “pretending that the numbers explain everything,” as Lehrer 2.0 says.

One example, Baseball Prospectus’ PECOTA forecasting system, takes uncertainty into account by giving a percentile forecast: an estimate of the probability that a player will achieve some statistical standard (for instance, a 50 percent chance that a player will have an OPS of .800 or higher), rather than yielding a single number in an attempt to “explain everything.”
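PECOTA’s actual model is proprietary and far more sophisticated, but the flavor of a percentile forecast can be sketched with a toy assumption: suppose, purely for illustration, that a player’s next-season OPS is projected as roughly normal around .800 with a spread of 50 points (both numbers invented):

```python
from statistics import NormalDist

# Toy projection: mean OPS .800, standard deviation .050 (invented numbers).
projection = NormalDist(mu=0.800, sigma=0.050)

# A probability of reaching a standard, not a single point estimate:
p_800_plus = 1 - projection.cdf(0.800)
print(p_800_plus)  # 0.5 -- a 50 percent chance of an .800-or-better OPS

# Percentile bands make the uncertainty explicit:
low, high = projection.inv_cdf(0.10), projection.inv_cdf(0.90)
print(round(low, 3), round(high, 3))  # the middle 80 percent of outcomes
```

The output is a range of outcomes with probabilities attached, which is the opposite of claiming the numbers explain everything.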

Users of PECOTA know that it is not true, as Lehrer 2.0 says, that the “underlying assumption is that a team is just the sum of its players, and that the real world works a lot like a fantasy league.” The underlying assumption is that there is a range of possible outcomes and that the real world works according to a very large number of quantified and unquantified factors.

It is also unclear how Lehrer 2.0 would have sports executives deal with those variables that he thinks cannot be quantified. PECOTA deals with that by considering the range of possible outcomes and estimating how likely they are. Lehrer 2.0’s prescription for handling the unquantifiable is confusing:

If we were smarter creatures, of course, we wouldn’t get seduced by the numbers. We’d remember that not everything that matters can be measured, and that success in sports … is shaped by a long list of intangibles. In fact, we’d use the successes of sabermetrics to focus even more on what can’t be quantified, since our new statistical tools take care of the stats for us.

So what does Lehrer 2.0 mean by “focus even more on what can’t be quantified”? If we are not to measure those variables, what do we do instead? The most commonly offered alternative to using statistics to make roster decisions is to use the judgments of scouts.


But that notion, at least in the sense that it offers an alternative to quantifying things, is based on a misunderstanding: The scout’s judgments are themselves measurements of a sort. They rank player A ahead of player B, ahead of player C, and so on. By assigning ranks, they have put numbers on the players just as surely as if we calculated their WARP. And what do we do if the scouts disagree, as they so often do? Average their rankings? More statistics!
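To see that aggregating scout opinion is itself a statistic, here is a minimal sketch (the scouts, players, and rankings are all hypothetical):

```python
# Hypothetical 1-to-4 rankings (1 = best) from three scouts.
scout_ranks = {
    "Player A": [1, 2, 1],
    "Player B": [2, 1, 3],
    "Player C": [3, 4, 2],
    "Player D": [4, 3, 4],
}

# "Just listen to the scouts" still ends in arithmetic: a mean rank
# per player, sorted into a consensus order.
mean_rank = {p: sum(r) / len(r) for p, r in scout_ranks.items()}
consensus = sorted(mean_rank, key=mean_rank.get)
print(consensus)  # ['Player A', 'Player B', 'Player C', 'Player D']
```

The moment the scouts disagree and we combine their views, we are doing exactly the kind of quantification that Lehrer 2.0 says we should set aside.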

Lehrer 1.0 knows from writing his book How We Decide that human decision processes are fraught with systematic biases and errors that the scouts must be as subject to as the rest of us. Lehrer 2.0’s recommendation that it would be better to rely more on those processes and less on sabermetrics seems especially odd in that light. (No, I am not saying we should simply ignore the scouts, but we should not pretend they are not human, either.) I think Lehrer 1.0 is closer to the right way to think about this—I guess I’m not yet ready to upgrade to Lehrer version 2.0.


Marc Schneider
It seems to me that, to the extent Lehrer’s complaint has validity, it relates to sports blogs and commenters rather than to baseball executives. Despite your point that sabermetrics deals in probabilities, not absolutes, you often do not get that impression from reading stat-oriented blogs and comments. There is a sort of absolutism in many cases, IMO. For example, sabermetrics shows that bunting is generally not a good idea, but it doesn’t say you should never bunt. Yet if you go on stat-oriented blogs, many people will grow furious at the idea that their team ever bunts. And, unfortunately, I…
Salo
Offensively, baseball’s elite (even to a not-geeky fan who studied physics, like myself) have always been those with the best numbers. The only thing sabermetrics is doing now is focusing on value added rather than on whatever ERA measures. I almost completely agree with the saber community in this counterattack on Lehrer’s writing. But there is one thing I’ve learned from reading Lehrer’s own How We Decide. There are things, like a pitcher taking too long to deliver and thereby creating a stealing opportunity, that our brain can analyze. The only thing is, our brain tells us through our feelings, not our reason. So if…
Brad Johnson
Salo, I can see what you’re getting at. If I remember that chapter correctly, the more complex a decision, the less we should think about it, so long as we have experience with that type of decision. For example, if the decision we’re given is “should I bring a banana or an orange to work?” then we should take a moment and think about it. The variables are simple. Both are healthy foods. One is tangy, sweet, and juicy, and one is sweet, dense, and kind of mealy. You’ll be happiest if you pause to consider which flavor profile you…
Salo
@brad I agree: if you want a bunt to improve your situation, you have to have a good bunter, and that is measurable, as are most hitting abilities. I was talking about high-pressure scenarios: you don’t want the better hitter, you want the one who gets a hit now. The result of that at-bat cannot be predicted by stats. Luck is a gap due to our limited analysis capacity, and sometimes our brain can partially fill that gap and “tell us” in an instinctive way, not an intelligent one. Of course analysis should be included in team…
Brad Johnson

I agree with that. Sorry if I didn’t make it clear. A manager’s job is to know his players. If he would rather see the struggling Geoff Jenkins in there against Grant Balfour than recent hero Matt Stairs (a seemingly zany decision that Charlie Manuel made against the Rays in 2008), then he should make that move. That’s what he’s there for as a game manager: that insight.

Let us internet pundits argue later whether using the slumping Jenkins was the right move or just dumb luck.
