Be Jolly

Bill James is back with a vengeance. He’s not only published his first baseball annual in a while, The Bill James Gold Mine 2008, but he’s also rolled out his very own website, Bill James Online. It’s a subscription site, but the price is only $3 a month, and you can sign up for three months at a time instead of having to sign up for an entire year. Good deal.

Of course, Bill James is the reason many of us waste our time thinking and writing about baseball today. His Abstracts opened up my baseball mind in the 1980s and taught me to follow the sport from an entirely different angle. His innate curiosity, coupled with his very accessible writing style and analysis, opened doors that I didn’t even know were closed. It is good to see him return to a regular writing gig.

The Gold Mine and the website are filled with James’ typically trippy writing (regarding Mark Teixeira: “He’ll go in the Hall of Fame before I learn to spell his name.”) and keen insights (“MVP voting is very friendly to slugging first basemen and RBI men; Hall of Fame voting is not.”). There’s also a great big added bonus: stats. Not really just stats, but stat tables presented in some new and penetrating ways.

James and the Baseball Info Solutions team have put together stats tables that are hard to find elsewhere. Some of the most interesting are…

  • Baserunning plus/minus (also published in the Bill James Handbook)
  • Fielding plus/minus statistics (published in John Dewan’s Fielding Bible, except these tables are updated through 2007)
  • Pitch Type analysis (how often each pitcher throws each type of pitch, with splits for left-handed and right-handed batters)
  • Pitch analysis (for both pitchers and batters), based on the pitch’s location
  • Hitting analysis (breakouts for batted ball types to left, center and right—and what happened to each one)
  • Career pitching record against the quality of the competition
  • Batter’s impact according to batting position in the inning
  • Batting by quality of the opposing pitcher

Imagine the book you could write if you had access to these sorts of tables for every major league player, and now imagine what a writer like James could write. That’s the Gold Mine.

Here are just a few of the nuggets that James has pulled out of his mine:

  • The Reds won 24 games started by Aaron Harang in 2007—the most of any major league pitcher.
  • Darin Erstad’s 2000 was the fourth-flukiest batting season of all time.
  • Craig Counsell was the most extreme pull hitter on the Brewers last year.
  • Lastings Milledge was thrown 50 inside pitches and chased only one of them. He was thrown 132 outside pitches and chased more than half of them.
  • The Mariners had only 71 “good games” (according to James’ Game Score) but won 88 games, the biggest advantage in the majors.

Another table I enjoyed was the Team Lead By Inning, which displays whether a team was in the lead, tied or behind (on average) at the end of each inning. For instance, the Dodgers were ahead in 54 games and behind in 73 games (with 35 ties) after four innings. After five innings, they had moved up to 64-69 (29 ties). They owned the fifth! I love stuff like this; it may mean nothing—or it may mean that the Dodgers had a poorly constructed lineup, their starters improved during games, or they just didn’t show up on time. Who knows? With stats laid out like this, you can find the flukes that make baseball worth pondering.
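If you wanted to build a table like this yourself, here is a minimal sketch of the tally, assuming you have inning-by-inning line scores for each game. The function name, the data layout and the three sample games are all made up for illustration; this is not how James or BIS actually compute their table.

```python
from collections import Counter


def lead_by_inning(games, innings=9):
    """Tally whether a team was ahead, tied or behind after each inning.

    `games` is a list of (team_runs, opp_runs) pairs; each element of a pair
    is a list of runs scored in each inning (hypothetical layout).
    """
    tallies = [Counter() for _ in range(innings)]
    for team_runs, opp_runs in games:
        team_total = 0
        opp_total = 0
        for i in range(innings):
            # A half-inning that was never played counts as zero runs.
            team_total += team_runs[i] if i < len(team_runs) else 0
            opp_total += opp_runs[i] if i < len(opp_runs) else 0
            if team_total > opp_total:
                tallies[i]["ahead"] += 1
            elif team_total < opp_total:
                tallies[i]["behind"] += 1
            else:
                tallies[i]["tied"] += 1
    return tallies


# Three invented line scores, for illustration only.
sample_games = [
    ([0, 1, 0, 0, 2, 0, 0, 0, 0], [1, 0, 0, 0, 0, 0, 0, 0, 0]),
    ([0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 2, 0, 0, 0, 0, 0]),
    ([3, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 1, 0, 0, 0, 0]),
]

tallies = lead_by_inning(sample_games)
for inning in (4, 5):
    t = tallies[inning - 1]
    print(f"After {inning}: {t['ahead']}-{t['behind']} ({t['tied']} ties)")
```

Comparing the tallies for consecutive innings is what lets you say a team “owned” a particular inning, the way the Dodgers owned the fifth.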

Very few tables are presented for all teams (and the ones that are, are pretty “vanilla”), so you can’t go through the book and compare teams and players yourself. The point of the book is that Bill James has done that for you. That’s wonderful unless, like me, you like to dig deeper. For instance, James notes that Juan Pierre was relatively effective as a leadoff hitter in 2006: His team scored .61 runs when he led off, compared to .50 when he batted second and .29 when he batted third.

You can probably spot the flaw in that logic: If Pierre is batting third in an inning, that means that the 8th and 9th hitters batted before him and most likely made outs. James acknowledges the fact, but then says, “Still… try to find another leadoff hitter who has the same data.”

Well, you can’t do that in the Gold Mine. The good news, however, is that you can find this information in Bill James Online. And it is a lot of fun to look through these tables yourself. For instance, I found that the Padres scored .74 runs per inning when Marcus Giles led off last year, but only .34 when he batted second and .31 when he batted third. I’m not sure what it says about Giles or Pierre (probably nothing), but isn’t that weird?

By the way, did you know that a player named Giles led off in 147 of the Padres’ games last year? That’s 78 by Brian and 71 by Marcus. That’s courtesy of Baseball Reference.

Anyway, James has stated that he wrote the book for the 90% of his readers who don’t need or want the detail. The other 10%, like me (and maybe you), can subscribe to the website to get as dirty in the details as we’d like. The plan works well.

My only complaint about the tables is that there is rarely any context provided in them. The Pierre example is a good one: how can we make an inference about Pierre’s stats unless we know something about his situation, or at least what the league averages were for each situation? We have to guess and infer.

Another example is the batted ball table. James and BIS have broken out, for each batter, how often he hit a ground ball, line drive or fly ball; which field he hit it to; and the outcome of each. This is great stuff, but we aren’t given any context. For instance, all batters have high batting averages on line drives. How can we know if a particular player has a relatively high or low batting average on line drives (or ground balls, or whatever)? We can’t, unless James tells us in the comments.

There are no contextual stats on the website, either. No league or major league averages. This makes it hard to truly understand and appreciate many of the tables.
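To show the kind of context I have in mind, here is a rough sketch that expresses a player’s rate as a percentage of the league rate, so that 100 means league average. The function name and both batting averages are invented for illustration; neither the Gold Mine nor the website publishes these figures.

```python
def relative_index(player_rate, league_rate):
    """Return the player's rate as a percentage of the league rate (100 = average)."""
    if league_rate <= 0:
        raise ValueError("league_rate must be positive")
    return 100.0 * player_rate / league_rate


# Hypothetical numbers: a batter's average on line drives and the league's.
player_ld_avg = 0.680
league_ld_avg = 0.720

index = relative_index(player_ld_avg, league_ld_avg)
print(f"Line-drive average index: {index:.0f}")
```

A .680 average on line drives looks terrific in isolation, but against this made-up league figure it would actually be a bit below average. That is the sort of thing the raw tables can’t tell you.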

Compare this approach to the one adopted in the Hardball Times Annual (full disclosure: The Hardball Times produces the Hardball Times Annual!). In our tables, we don’t present all the batted ball detail available in the Gold Mine. Instead, we summarize each type of batted ball into a run impact (which provides context by attributing run values to singles, home runs, etc.) and then we present the major league averages for each batted ball on the same table. You can find an example in this article.

In other words, we translate the detailed stats into an “impact” measure, and we also provide context for interpreting those measures. The Gold Mine takes the opposite approach, emphasizing the details instead of the context. An interesting choice, but it would be nice to have the context somewhere.
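A bare-bones sketch of that translation might look like the following. The run values, the outcome counts and the league rate are all placeholders I made up to show the arithmetic; they are not the weights or averages we actually use in the Annual.

```python
# Illustrative run values per event (placeholders for this sketch only).
RUN_VALUES = {
    "out": -0.28,
    "single": 0.47,
    "double": 0.77,
    "triple": 1.04,
    "home_run": 1.40,
}


def run_impact(outcomes):
    """Sum the run values for one batted-ball type.

    `outcomes` maps an event name to the number of times it happened.
    """
    return sum(RUN_VALUES[event] * count for event, count in outcomes.items())


def runs_vs_average(outcomes, league_runs_per_ball):
    """Run impact minus what a league-average batter would produce
    on the same number of batted balls of this type."""
    balls = sum(outcomes.values())
    return run_impact(outcomes) - league_runs_per_ball * balls


# Hypothetical results on 100 line drives for one batter.
line_drives = {"out": 30, "single": 50, "double": 15, "triple": 2, "home_run": 3}

impact = run_impact(line_drives)
versus_league = runs_vs_average(line_drives, league_runs_per_ball=0.35)
print(f"Run impact: {impact:.2f}, versus league: {versus_league:+.2f}")
```

The first number collapses the detail into a single impact figure; the second supplies the context by setting it against a league baseline.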

It would also be nice if there were a few definitions. James refers to Win Shares, Game Scores, Runs Created and Clutch Hitting in the Gold Mine, but he doesn’t define those stats anywhere in the book or on the site, at least not that I can find. Well, he does refer to the purpose of Game Scores in a box on page 303, but that takes some digging to find.

The lack of a Clutch Hitting definition is a particular oversight. In the Gold Mine, James announces that Brad Hawpe was the Clutch Hitter of the Year (in the Annual, we anointed his teammate Troy Tulowitzki for the title, except we called him the “clutchiest”) yet nothing in the book or on the site (that I can find) indicates how James analyzed clutch hitting.

My hope is that the website will add the statistical context and definitions eventually. I also hope they add the ability to sort stats, instead of just displaying individual player tables. It will help complete what the Gold Mine has started.

In addition to the nuggets and tables, organized by team, there are the typical James essays. Some of my favorites were:

  • “Hall of Famers Among Us,” in which James gives his opinion about the Hall of Fame chances of each current player. The percentages are okay, but I particularly enjoyed his comments about individual players.
  • “Measuring Consistency,” which includes a typical Jamesian formula to determine which players had the most consistent (and inconsistent) careers. Most consistent? Hank Aaron.
  • “Strength Up the Middle,” an investigation of whether pennant-winning teams really are stronger up the middle.
  • “The Targeting Phenomenon,” in which he shows that players really do target certain numbers, such as 20 wins and 100 RBIs.

What’s more, there are several shorter (single page) essays scattered throughout the team comments. These were all enjoyable—a good example is “Howard’s Mark,” about the history of batting strikeout records. When it comes to capturing and writing about the statistical nuances of the game, no one is better than Bill James.

The Bill James Gold Mine is a very enjoyable read, and I have a feeling I’ll be spending many hours playing with the stats in Bill James Online.

Unfortunately for you, the Gold Mine has inspired me to go off on a tangent about context in sabermetrics. Here’s what got me thinking about it: James offers two mirror tables in the book:

  • Batting Performance by Quality of Opposing Pitcher, which shows a batter’s batting average and OPS broken out by the opposing pitcher’s ERA, and
  • Pitching Performance Against Quality of Opposition, which shows a pitcher’s record broken out by the opposing team’s winning percentage.

Not pitching performance by quality of opposing batter, but opposing team. Teams can be good by having good batters, but they can also be good by having good pitchers. By constructing the tables this way, there’s an implication that pitchers take on entire teams, not just batters. Personally, I’d be more interested in the quality of batter breakout.

It’s kind of ironic that the person who perhaps best helped baseball fans understand that context can overrun statistics (in fact, I think that contextual adjustment is the most important aspect of sabermetrics) has subtly reintroduced context into his writing and thinking. I get the feeling this is something James has actively thought about, and it’s something that his readers should keep in mind too.

The book contains an essay called “Bullpens and Crunches,” in which James attempts to answer the question of whether teams with outstanding bullpens tend to do well in close games. To study this question, he ranks bullpens by a number of factors. Three of the ten factors he considers are wins and saves.

Think about this for a second. When a team wins a close game, its bullpen is almost always given a win or a save. Virtually 100% of the time. Does that mean that the bullpen contributed meaningfully to the win? I think the correct answer is “it depends.” Yet this study relies on saves and wins to rank bullpens. It’s like measuring whether dogs that have a higher flea count are more likely to have fleas. The answer is yes, but it’s sort of a meaningless question. I’m sure he disagrees, but I don’t think Bill James filtered the context enough for this analysis to be valid.

Here’s an even more subtle contextual issue to think about. In the Toronto section, James talks about the great fielding stats posted by second baseman Aaron Hill and takes some time to investigate them a little more deeply. He finds that Toronto faced lefty batters more than any other team last year (due to their primarily right-handed pitching staff), which no doubt influenced the stats because batters tend to pull ground balls.

This is a great insight, presented just right. It’s typical James, searching for the deeper “truth” and investigating the context. However, there is a stat that already adjusts for this context. MGL’s Ultimate Zone Rating, which has been around for at least four years, includes a correction for the handedness of the batter. UZR isn’t mentioned in the Gold Mine, although it’s freely available on the web.

Of course, that’s 100% okay. Bill James likes to research baseball issues himself instead of relying on other research. Perhaps he also feels that his readership needs to be brought along a little more slowly, so he hesitates to adjust his statistics for too many contexts. But I think it’s safe to say that James’ “context edges” are now at a different place than mainstream sabermetrics’ (as reflected in sites like the Baseball Think Factory, Baseball Prospectus and The Book Blog).

How a writer thinks about context impacts just about everything the writer does. It impacts the way you present stats, the way you conduct your studies and even the extent to which your analysis is up to speed with the most recent sabermetric insights. Keep that in mind when you’re reading any baseball article, whether it’s by Bill James, Dave Studeman or Alyssa Milano. It makes a difference.

References & Resources
Throughout this article, I’ve referred to “sabermetrics,” which I take to mean “baseball analysis in search of the truth.” I think that is what James intended it to mean.

Be Jolly Inc. is the name of James’ new company that owns the site and the book. It’s a play off the initials “bjol,” or “Bill James Online.”

Dave Studeman was called a "national treasure" by Rob Neyer. Seriously. Follow his sporadic tweets @dastudes.