We took sabermetrics to the streets this week, and tried it out with concerts. But the ‘readily available’ or ‘replacement-level’ concert is woefully hard to define. Beer? Not so much. Go to your local bodega and look at the beer aisle and you have easy candidates for replacement-level beer. And so baseball’s WAR framework can easily be applied to suds.
Beers. Above. Replacement.
Some might argue for wins above replacement (it’s not ‘baseball players’ above replacement, as a chatter said on Thursday), but we have to differentiate, and what better way to differentiate than to create a metric named BAR? No confusion here as to what is being measured. Call ’em “beer wins” too, if you have to.
Some might argue that this is a subjective thing, and they’re right, but subjective things get rated all the time. Look at Yelp, Zagat or Rotten Tomatoes for your easiest cross-subject comps. And what we are suggesting here would sit on top of an existing Yelp-like platform: Beer Advocate. At Beer Advocate, users rate each beer out of five stars, and those ratings produce an aggregate rating out of 100.
As we established with concerts, context is king. What you ate or drank before the beer counts. In which order you drank your beers. How dehydrated you were to begin with. The temperature of your beer is inversely correlated with the fullness of the taste, and yet coldness is a desired trait. If you’re having it at a bar, the length of time the keg has been sitting there. Most importantly, perhaps, the last time that tap was cleaned. (Don’t think about that one too hard.)
In any case, the wisdom of the crowd mostly removes these issues. One or two people can have a Green Flash West Coast IPA off a dirty tap, but 100 won’t. But there’s one issue that remains: selection bias.
Mostly, selection bias is a sample issue. If you’re testing aging curves, and only good players survive, then you have a selection bias. In this case, selection bias is about the people who go to Beer Advocate to rate beers. You won’t get your typical Budweiser drinker to log on to Beer Advocate and write up a review of the appearance, smell, taste, and mouthfeel to produce an overall rating of the beer, unless it’s in jest. And that’s how Budweiser gets a rating of 56.
Can we just set the replacement level at 56, then, and call it a day? Maybe not. There are players that are worse than replacement. The Delmon Youngs and Chone Figginses of the world, you might say. And if you’ve had a Budweiser and are a craft beer drinker, you might agree. A better candidate for a replacement beer might be a Stella Artois: readily available and without the piss part. Maybe a Mexican lager would also make sense. So, maybe something like Stella’s 71 and Dos Equis’ 72 ratings should be our replacement level.
But you might notice something here. We’re talking lagers. As with baseball, we need to make a positional adjustment. The worst pale ale might be better than one of the best lagers, but BAR needs to be a level playing field. The replacement level is higher for some types. So we have to index the Beer Advocate ratings to the style. Take a look at some sample styles:
Top American Adjunct Lager: Schlitz Gusto, Schlitz (3.59/82)
Top American Malt Liquor: Big Daddy J’s, Full Sail (3.37/83)
Top Czech Pilsner: Reality Czech, Moonlight Brewing (4.31/96)
Top Bock: La Trappe Bockbier, Bierbrouwerij De Koningshoeven B.V. (4/90)
Top Oktoberfest: Augustiner Bräu Märzen Bier, Augustiner (4.18/93)
Top Dubbel: Trappist Westvleteren 8 (VIII), Brouwerij Westvleteren (4.52/100)
Top Saison: Ann, Hill Farmstead (4.53/100)
Top American Double IPA: Heady Topper, The Alchemist (4.71/100)
Second-best American Imperial Stout: Founders CBS Imperial Stout, Founders (4.66/100)
You see what’s going on here. And there’s even a story behind this story, as the top IPA (Susan, Hill Farmstead, 4.5/98) and stout (Rise Up Stout, Evolution, 4.31/95) came up short when compared to the top double IPA and top imperial stout. The darker, the better, at least when it comes to the Beer Advocate crowd.
And yet, I personally like a little effervescence and what I call ‘drinkability’ from my beers. I can get bogged down in ABV and sweetness. I don’t always love the thickest, craziest, fullest beers out there.
So: we index the beers’ ratings within their ‘position.’ That makes one of my favorite double IPAs, Firestone Walker’s Double Jack (4.34/96 and somehow the 37th-highest DIPA) roughly equivalent to The Crisp (3.92/88), Sixpoint’s German Pils. Because the best German Pils is Victory’s Prima Pils… from Pennsylvania (4.1/94). Eh, it’s just a style.
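To make the indexing concrete, here is a minimal Python sketch. It assumes the index is simply a beer's Beer Advocate score as a percentage of the top score in its style. That formula, and the names STYLE_TOP and style_index, are our own illustration and not anything Beer Advocate publishes. The scores are the ones quoted above.

```python
# Top Beer Advocate score in each style, from the ratings quoted above:
# Heady Topper (100) for American Double IPA, Prima Pils (94) for German Pils.
STYLE_TOP = {
    "American Double IPA": 100,
    "German Pils": 94,
}


def style_index(score, style):
    """Return a beer's score as a percentage of the best score in its style.

    This is an assumed indexing scheme for illustration only.
    """
    return round(100 * score / STYLE_TOP[style], 1)


double_jack = style_index(96, "American Double IPA")  # Firestone Walker Double Jack
the_crisp = style_index(88, "German Pils")            # Sixpoint The Crisp

print(double_jack, the_crisp)
```

Under that assumption Double Jack lands at 96.0 and The Crisp at 93.6, which is close enough to call them roughly equivalent within their positions.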
After we produce some sort of spreadsheet with indexed Beer Advocate ratings, we will need to set a replacement level for each type of beer. That requires some sort of math. (Should it be based on the number of beers rated within each style, some sort of median? Or should it be a mean? How do we use the ratings to find a replacement, and can it be obscure or should it be a big beer?)
And there are still more questions after that. Should we average these with Untappd ratings? Should we then weight the beer differently if it came off the tap or cask? What other context can we put into numbers? How much should actual ubiquity factor in? What about how advanced the beer is? There are such things as beginner beers, after all. Would a note for can/bottle make sense, or is that subjectivity creeping in? A clear bottle lets beer-skunking sunlight hit the beer; perhaps that could be an asterisk? If you cellared the bottle, should it get a half-win of aging, or lose it? Is this all a waste of time?
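One way to poke at the median-or-mean question is to try both on a single style and see how the resulting BAR shifts. The sketch below is only that: the scores are invented placeholders rather than real Beer Advocate data, and the function names and the points_per_bar scale (how many rating points make one "beer win") are assumptions of ours, not a settled method.

```python
from statistics import mean, median

# Invented scores for one style. Placeholders only, not real ratings.
style_scores = [55, 60, 66, 70, 74, 78, 80]


def replacement_level(scores, method="median"):
    """Return a candidate replacement level for one style."""
    if method == "median":
        return median(scores)
    if method == "mean":
        return mean(scores)
    raise ValueError(f"unknown method: {method}")


def beers_above_replacement(score, replacement, points_per_bar=5):
    """Convert rating points above replacement into BAR.

    points_per_bar is a made-up scale chosen for illustration.
    """
    return (score - replacement) / points_per_bar


by_median = replacement_level(style_scores, "median")
by_mean = replacement_level(style_scores, "mean")

for label, level in (("median", by_median), ("mean", by_mean)):
    best = beers_above_replacement(max(style_scores), level)
    worst = beers_above_replacement(min(style_scores), level)
    print(f"{label}: replacement={level}, best={best:+.1f} BAR, worst={worst:+.1f} BAR")
```

With these placeholder numbers the two choices sit only a point apart, but a style with a long tail of badly rated beers would pull the mean down and make every beer in it look better. Note that the worst beer comes out negative either way, which fits the idea that some beers are below replacement.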
And then finally, we’ll have to round the numbers anyway. Because there’s no such thing as a 3.2-BAR beer.
(There are some ten-win Mike Trout beers. My candidates that come to mind easily: Cask Racer Five IPA from Bear Republic, Cask Speedway Stout with Vietnamese Coffee from AleSmith, Supplication from Russian River, Sculpin IPA on Tap from Ballast Point, Pliny the Younger IPA from Russian River.)