The Sabermetric Library
Over the weekend, in a thread over at Tango’s blog, the idea of a “Sabermetric Library” was raised. As noted over there, one of the positives of the academic journal process is that it catalogs the work that has been done, making it easily searchable for future readers who are not following the discussion in real time. The statistical analysis crowd doesn’t have that kind of formal structure, which makes it difficult for those who come later to catch up on what has already been done.
Rather than employing a full-time “librarian” to keep up with the most recent work, I thought perhaps we could just attempt to crowd-source this idea. So, that’s what we’ll try to do in this thread.
In the comments below, I’d like to encourage you to think back to influential articles that you’ve read about the game and, if you can, link to them. If a piece appeared in a book, link to the book at a bookseller of your choice that carries it. If you can quickly summarize the conclusion, even better.
It doesn’t have to be an epic research piece that changed the face of analysis (such as Voros’ piece on DIPS), though those obviously fit in here, too. But if there is a blog post somewhere that explained something in a way that allowed you to understand it for the first time, link to that. If there was an interesting discussion on a popular topic (Blyleven for the HOF, maybe), then link to that.
The goal would be to populate the comments with enough resources to allow someone to go through and read a Best Of The Sabermetric Community collection of writings. There are a lot of good writers out there doing good work, but given the size of the internet, some of it can get lost in the shuffle. Let’s preserve the pieces that deserve to be kept alive, and at the same time, create a resource for those who come along in the future to find out about the work that has already been done.
In order to keep the layout easy to read, I would ask that you refrain from commentary about this post. Please limit comments to links to important pieces, with any necessary comment about each piece serving as an abstract of sorts. If this takes off as I hope it does, we’ll do a discussion thread on another day about potentially culling the list, giving space for people to argue for or against any of the linked pieces below.

Happy to contribute what little I really know:
THT’s excellent xBABIP calculator gives you a player’s expected batting average on balls in play. From that value, you can calculate the player’s projected line (xAvg/xOBP/xSLG/xOPS). It is a strong predictor of a player’s future performance.
http://www.hardballtimes.com/main/fantasy/article/simple-xbabip-calculator/
For those without Excel (such as myself), the blog Cubs Stats made Chris Dutton’s xBABIP Quick Calculator available on Google Docs:
http://cubsstats.blogspot.com/2010/01/chris-duttons-xbabip-quick-calculator.html
http://www.hardballtimes.com/main/article/how-to-evaluate-hitters/
Even though it’s not the most thought-out version, I think Boswell’s “Total Average” belongs here. You can always buy the book if you want to read some early “common sense” thinking: http://www.amazon.com/Imitates-World-Penguin-sports-library/dp/0140064699
Sad that Boswell ended up writing stuff like this, but it shouldn’t diminish his contributions in the ’70s and ’80s.
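Total Average is usually described as bases gained per out made. A minimal Python sketch, assuming the commonly cited form (TB + BB + HBP + SB - CS) / (AB - H + CS + GIDP); other versions of the formula circulate, and the season line below is invented purely for illustration:

```python
def total_average(tb, bb, hbp, sb, cs, ab, h, gidp):
    """Boswell-style Total Average: net bases gained per out made.

    Assumes the commonly cited form
    (TB + BB + HBP + SB - CS) / (AB - H + CS + GIDP).
    """
    outs = ab - h + cs + gidp          # outs charged to the batter
    if outs <= 0:
        raise ValueError("denominator must include at least one out")
    bases = tb + bb + hbp + sb - cs    # net bases gained
    return bases / outs


# Hypothetical season line, made up for this example.
print(round(total_average(tb=300, bb=70, hbp=5, sb=20, cs=5,
                          ab=550, h=160, gidp=10), 3))  # prints 0.963
```

Unlike batting average, both the numerator and the denominator reward walks and penalize caught stealing and double plays, which is the “common sense” idea the book was pushing.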
Not to be a sycophant, but I think Dave’s USS Mariner post about evaluating pitcher talent is a solid resource, especially for folks looking for something at a more introductory level.
http://ussmariner.com/2006/08/29/evaluating-pitcher-talent/
Nate Silver’s “Is Alex Rodriguez Overpaid?” from Baseball Between the Numbers is an excellent introduction to revenue curves and the whole “teams pay for wins” concept. Without this kind of work, the WAR framework loses a bit of its oomph.
http://books.google.com/books?id=uxdvwQdXbboC&pg=PA174&lpg=PA174&dq=Nate+Silver+A-Rod+overpaid&source=bl&ots=JAx3Ze6J9b&sig=IYKdB4cEn8gV3CvtU9xtp2LFsl8&hl=en&ei=S85dS9z5J43INc3t0foO&sa=X&oi=book_result&ct=result&resnum=2&ved=0CAoQ6AEwAQ#v=onepage&q=&f=false
I forgot to mention that the framework has been updated considerably; this is just a nice, well-presented jumping-off point.
Hell, the entire book is pretty good.
Of course, don’t forget PECOTA:
http://www.baseballprospectus.com/article.php?articleid=2659
The whole Baseball Between the Numbers book should be a starting point IMO.
While not in itself a sophisticated piece of sabermetric research, I’ve often forwarded this Joe Posnanski post to friends who are stuck in AVG/HR/RBI lockdown. It’s a great explanation of why those big three numbers are flawed, and it makes people more receptive to advanced metrics.
http://joeposnanski.com/JoeBlog/2008/11/20/batting-average-home-runs-rbis/
My first intro to sabermetric thought, and I’m sure that’s true for thousands of others.
The Hidden Game of Baseball, by Thorn and Palmer
http://www.amazon.com/Hidden-Game-Baseball-John-Thorn/dp/0385182848
Pretty obvious inclusion:
http://members.cox.net/sroneysabr/JamesIndex/
Excellent piece on ballpark effects:
http://www.baseballthinkfactory.org/files/primate_studies/discussion/home_runs_and_ballparks/
I’m always looking back at these two articles when I’m thinking about UZR. They go through most of the details of how basic UZR is calculated and how the various corrections are then applied.
http://www.baseballthinkfactory.org/files/primate_studies/discussion/lichtman_2003-03-14_0/
http://www.baseballthinkfactory.org/files/primate_studies/discussion/lichtman_2003-03-21_0/
Not a monumental piece, but I was always impressed with Josh Kalk’s work with Pitch F/X on Hardball Times. Absolutely clear analysis, with plots that can teach you a ton about baseball in a statistics-based way, which is the whole point of sabermetrics.
I couldn’t find a great overview of everything, but I liked his pieces on Greinke and Hughes from ’08.
http://www.hardballtimes.com/main/article/anatomy-of-a-player-zach-greinke/
http://www.hardballtimes.com/main/article/anatomy-of-a-player-phil-hughes/
Pretty much everything that Tango has linked under “Research” on his main page is great. I especially like the BaseRuns material. It’s mathematically weighty, but it’s very good.
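The core structure of BaseRuns is runs = A * B / (B + C) + D, where A is baserunners, B is an advancement factor, C is outs, and D is home runs (which always score). The sketch below assumes one commonly cited basic version of the B term; published variants use different coefficients and more events, and the team line in the example is made up:

```python
def base_runs(h, bb, hr, tb, ab):
    """Basic BaseRuns run estimator: A * B / (B + C) + D.

    The advancement (B) coefficients are one commonly cited basic
    version, not the only published one.
    """
    a = h + bb - hr                                        # baserunners, excluding HR
    b = (1.4 * tb - 0.6 * h - 3 * hr + 0.1 * bb) * 1.02    # advancement factor
    c = ab - h                                             # batting outs
    d = hr                                                 # runs that score automatically
    if b + c == 0:
        return float(d)
    # B / (B + C) acts as the share of baserunners who come around to score.
    return a * b / (b + c) + d


# Hypothetical team season, invented for illustration.
print(round(base_runs(h=1400, bb=500, hr=150, tb=2200, ab=5500), 1))  # prints 699.5
```

The appeal of this structure is that it behaves sensibly at the extremes: a lineup that only hits home runs scores exactly one run per homer, which linear methods cannot guarantee.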
Beyond the Boxscore’s Sabermetric Writing Awards are a good compilation of recent articles and a jump-start to the library.
http://www.beyondtheboxscore.com/2010/1/18/1253835/btb-sabermetric-writing-awards
Out of curiosity, why not incorporate a FanGraphs wiki into the website? It should be easy enough to accomplish and would give you, as well as us fans, the ability to update content as needed. MediaWiki is a fairly good one, and there are others depending on your needs.
Just a thought.
Oops. http://www.mediawiki.org/wiki/MediaWiki
Written by “Kincaid” of 3-D Baseball, this two-part piece takes a look at how to evaluate pitchers using FIP, and how the stat actually regresses balls in play:
Part I: http://bit.ly/1hVsPB
Part II: http://bit.ly/56d6Hd
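FIP itself is simple to compute. This sketch assumes the familiar (13*HR + 3*(BB + HBP) - 2*K) / IP form plus a league constant; the 3.10 default is only a placeholder, since the real constant is set each season so that league FIP lines up with league ERA. The innings_from_outs helper is a hypothetical convenience, and nothing here attempts the ball-in-play regression Kincaid discusses:

```python
def innings_from_outs(outs):
    """Hypothetical helper: convert outs recorded to true innings.

    Box scores write 180 1/3 innings as "180.1", which is not 180.1 innings.
    """
    return outs / 3


def fip(hr, bb, hbp, k, ip, constant=3.10):
    """Fielding Independent Pitching.

    constant is league- and season-specific; 3.10 is an assumed placeholder.
    ip must be true innings (e.g. 180.333...), not box-score notation.
    """
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# Hypothetical pitcher line: 600 outs recorded = 200 innings.
print(round(fip(hr=20, bb=50, hbp=5, k=200, ip=innings_from_outs(600)), 3))
```

Only home runs, walks, hit batters, and strikeouts enter the formula, which is the DIPS idea in a single line: balls in play are left out entirely.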
A somewhat obscure piece that, though not widely publicized, still might have had an impact on a modest slice of the sabermetric community:
http://en.wikipedia.org/wiki/Moneyball
What’s funny is that Moneyball is often held up as the “Sabermetrician’s Bible” by detractors, but it doesn’t explore advanced statistics all that much. It’s much more about the behind-the-scenes workings of a front office and its exploitation of market inefficiencies (which, at the time, was as simple as “patient hitters are undervalued”; you need no stat more advanced than BB%) than anything else.
Here’s an in-depth attempt at disproving the “pitching to the score” argument that Jon Heyman types use to push Jack Morris for the HOF:
http://www.baseballprospectus.com/article.php?articleid=1815
sorry ’bout the missing link.
No one should utter a syllable about the historical archives of sabermetrics without naming Earnshaw Cook’s Percentage Baseball, then pausing for a moment of silence (if only to sigh quietly about the awkward first step taken by this baby of which we’re so proud now).
I had been poking uninformedly at ideas about Markov chains when I first read Mark Pankin’s article in the Great American Stat book.
Hmm, the link didn’t show up:
http://www.pankin.com/markov/intro.htm
I have one for people who are relatively new to sabermetrics. Alex Remington writes for Yahoo Sports, and over the past few months he has been writing articles explaining how certain stats work, such as BABIP, OPS+, FIP, wOBA, WPA, WAR, UZR, J-HOFFA, and Win Shares, with more on the way. He explains what each stat means, how it’s calculated, what it’s good for, what it’s bad for, and why we should care about it. Check it out, and spread it around.
http://sports.yahoo.com/mlb/blog/big_league_stew?author=Alex+Remington
DIPS
http://www.baseballprospectus.com/article.php?articleid=878
After we were just told, “It doesn’t have to be an epic research piece that changed the face of analysis (such as Voros’ piece on DIPS)…”
I learned everything I know about stats from FanGraphs, but this is a nice intro to linear weights from an ongoing series by Shawn Goldman over at Bleed Cubbie Blue:
http://www.bleedcubbieblue.com/2010/1/17/1255925/uzr-error-fail-or-win-a-lesson-on
As a longtime fan of the game, I am only now beginning to understand the importance of this stuff. This particular post by Rory Paap of Paapfly.com helped me understand how luck and BABIP affect ERA, and thus can create a huge difference between ERA and FIP.
http://www.paapfly.com/2009/12/affeldt-stars-aligned-in-2009.html
http://www.PaapFly.com is a great blog for everyone, from statheads to average baseball fans.
Great, straightforward article on batting stats:
http://www.hardballtimes.com/main/article/how-to-evaluate-hitters/
I’ve always thought that FJM’s glossary is an interesting place to start someone on key principles (like the worthlessness of pitcher wins). The humor makes the concepts very accessible.
http://www.firejoemorgan.com/2005/04/glossary-of-terms.html
My brother has been teaching me a bunch about the stat analysis trend in baseball. He uses some of the new math to break down the hometown SF Giants. Here are a couple of examples:
http://www.paapfly.com/2009/12/can-buster-fill-bengies-shoes.html
http://www.paapfly.com/2010/01/moneyball-and-beane-are-evolving.html
http://www.paapfly.com/2009/12/affeldt-stars-aligned-in-2009.html
When I first discovered the wonderful world of intelligent baseball analysis last year, I felt a bit out of the loop, and one of the first places I went was “The Book.” In my opinion, “The Book” has been the most helpful to me. However, a sabermetrics wiki may prove to be a very useful educational tool.
Okay, believe it or not, this just came out today, but I am sure these will become fundamental metrics. Their descriptions are short and clear, and the justification for their development helps further define the metrics they are built from.
Basically, they refine our understanding of pitching and luck by creating expected (“x”) versions of stats (e.g., xBABIP) that account for what is really in a pitcher’s control.
http://www.hardballtimes.com/main/fantasy/article/introducing-xw-xbabip-xlob-xhr-fb-and-more/
I really liked Jim Albert’s and Jay Bennett’s book “Curveball”:
http://books.google.com/books?id=jqCujQ_Ww54C&printsec=frontcover&dq=curveball+jim+albert&source=bl&ots=6Yx1Yf2asL&sig=lSNLFU5nRkrCMgdbRRIHAbPeonQ&hl=en&ei=WVReS9GpM4O0tgec1qi8CQ&sa=X&oi=book_result&ct=result&resnum=8&ved=0CCQQ6AEwBw#v=onepage&q=&f=false
Alan Schwarz’s The Numbers Game is a good history of baseball statistical analysis for those who think it began with Bill James musing in a boiler room in Lawrence, Kansas.
http://www.amazon.com/Numbers-Game-Baseballs-Fascination-Statistics/dp/0312322232/ref=sr_1_14?ie=UTF8&s=books&qid=1264480911&sr=8-14
John Walsh’s building blocks of Pitch f/x work:
http://www.hardballtimes.com/main/article/fastball-slider-changeup-curveball-an-analysis/
http://www.hardballtimes.com/main/article/pitch-identification-tutorial/
http://www.hardballtimes.com/main/article/the-eye-of-the-umpire/
http://www.hardballtimes.com/main/article/how-fast-should-a-fastball-be/
http://www.hardballtimes.com/main/article/searching-for-the-games-best-pitch/
I’m still new to the sabermetric community (this is my virgin post). I still have a lot to learn, but I really enjoyed learning about tRA and wOBA, which, to my naive mind, sure seem like the best pitching and hitting stats around (of course, I don’t know about tRA* and tRA# and tRA~ and all the other offspring of tRA). I hope it’s okay to post a couple.
Graham MacAree’s post explaining tRA:
http://www.lookoutlanding.com/2008/6/23/557089/the-big-tra-post
wOBA, explained briefly:
http://www.insidethebook.com/woba.shtml
wOBA, explained more thoroughly:
http://www.insidethebook.com/ee/index.php/site/comments/the_history_of_the_woba_part_1/
Infield defense:
http://www.hardballtimes.com/main/article/infield-defense-mdash-back-to-basics/
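As a companion to the wOBA links above, here is a minimal sketch of the calculation. The weights are the approximate values commonly cited from The Book and are illustrative only (real weights are recomputed each season); reached-on-error is left out for simplicity, and the denominator uses the common AB + BB - IBB + SF + HBP convention:

```python
# Approximate linear weights as commonly cited from The Book.
# Illustrative only: season-specific weights differ, and ROE is omitted.
BOOK_WEIGHTS = {"ubb": 0.72, "hbp": 0.75, "1b": 0.90,
                "2b": 1.24, "3b": 1.56, "hr": 1.95}


def woba(ab, bb, ibb, hbp, sf, singles, doubles, triples, hr, weights=None):
    """Weighted on-base average: run-value-weighted events per plate appearance."""
    w = BOOK_WEIGHTS if weights is None else weights
    ubb = bb - ibb  # intentional walks are excluded from the numerator
    numerator = (w["ubb"] * ubb + w["hbp"] * hbp
                 + w["1b"] * singles + w["2b"] * doubles
                 + w["3b"] * triples + w["hr"] * hr)
    denominator = ab + bb - ibb + sf + hbp
    if denominator <= 0:
        raise ValueError("need at least one plate appearance")
    return numerator / denominator


# Hypothetical batting line, invented for illustration.
print(round(woba(ab=500, bb=60, ibb=5, hbp=5, sf=5,
                 singles=100, doubles=30, triples=5, hr=25), 3))
```

Passing a different weights dictionary lets you plug in a given season’s published weights without touching the formula.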
Great idea for a post. This gives me lots of homework to read up on!
http://www.askrotoman.com/fbguide/samediff.pdf
My first exposure to a more sabermetric approach, which I felt gave me an edge in ranking pitchers for my first-ever fantasy baseball season.
http://www.askrotoman.com/wordpress/?p=1033
The following year’s post, written at my request.
The old “Baseball Prospectus Basics” series, which ran in 2004, still holds up pretty well today. Great discussion of a wide range of topics, by several noted analysts, including Woolner, Silver, Click and others:
http://www.baseballprospectus.com/news/index.php?column=31
That series was, to some extent, a take-off on Woolner’s “Baseball’s Hilbert Problems” work, which first came out a full decade ago in BP2000, then was updated for the site in 2004:
http://www.baseballprospectus.com/article.php?articleid=2551
Also, two great (older) pieces on how to fix revenue sharing:
The Zumsteg Plan:
http://www.baseballprospectus.com/article.php?articleid=1599
Keith Woolner’s take:
http://www.baseballprospectus.com/news/20020418woolner.shtml
http://www.startwedman.com/2010/01/what-youve-all-been-waiting-for-aint-it.html
I highly recommend these two pieces by FanGraphs author Mitchel Lichtman, in which he tries to apply game theory to decision-making in baseball. Obviously this is not straight-up sabermetric work, but I think he’s on to something.
http://www.fangraphs.com/blogs/index.php/were-the-yankee-sac-bunts-in-the-8th-inning-correct
http://www.fangraphs.com/blogs/index.php/should-lidge-have-thrown-more-sliders
Posted this at Lookout Landing:
http://www.lookoutlanding.com/2010/1/26/1271512/on-the-shoulders-of-giants-a
This is, in a sense, something of what I was trying to compile here:
http://www.beyondtheboxscore.com/2009/12/17/1200459/want-to-help-me-plan-my-baseball
The focus might be slightly different, but there are a lot of good links there, as well as good stuff submitted in the comments.
-j