Fun With BaseRuns
With a brand new (to me) historical database of all players, my project for the day was to calculate BaseRuns for all batters. BaseRuns models run creation, much like Bill James’ Runs Created, but BaseRuns is a more accurate model. As for the calculations, I decided to stick to David Smyth’s BaseRuns Primer. I used the “simple” version for seasons prior to 1955 and the more “complex” version for anything 1955 to the present. Here’s the more complex version, where BaseRuns = A*B/(B+C)+D
A = H + BB + HBP - HR - .5*IBB
B = [1.4*TB -.6*H -3*HR +.1*(BB+HBP-IBB) +.9*(SB-CS-GDP)] * X
C = AB - H + CS + GDP
D = HR
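As a rough sketch (not the actual code behind the numbers in this post), the four factors above might be written in Python like this. The dictionary keys, the function names, and the sample team totals are all made up for illustration; `x` is the B multiplier, left at 1.0 here.

```python
def components(s, x=1.0):
    """Return the A, B, C, D factors for a dict of counting stats; x scales B."""
    a = s["H"] + s["BB"] + s["HBP"] - s["HR"] - 0.5 * s["IBB"]
    b = (1.4 * s["TB"] - 0.6 * s["H"] - 3 * s["HR"]
         + 0.1 * (s["BB"] + s["HBP"] - s["IBB"])
         + 0.9 * (s["SB"] - s["CS"] - s["GDP"])) * x
    c = s["AB"] - s["H"] + s["CS"] + s["GDP"]
    d = s["HR"]
    return a, b, c, d


def base_runs(s, x=1.0):
    """BaseRuns = A*B/(B+C) + D."""
    a, b, c, d = components(s, x)
    return a * b / (b + c) + d


# Hypothetical team-season totals, not taken from any real team.
team = {"AB": 5500, "H": 1450, "TB": 2300, "HR": 180, "BB": 550,
        "HBP": 50, "IBB": 40, "SB": 100, "CS": 40, "GDP": 120}

print(round(base_runs(team), 1))
```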
The quick and dirty version of what I did: determine the B multiplier (X) for each major league team by season, use BaseRuns to calculate the number of runs the team would have scored without a particular player, and then subtract that from the team’s actual runs to get that player’s BaseRuns.
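To make the "team without the player" idea concrete, here is a minimal Python sketch. It assumes the team's B multiplier `x` is already known (solving for it is covered next), and every name and number in it is hypothetical. For the example only, the team's "actual" runs are simply set to its BaseRuns at `x = 1.0`.

```python
def base_runs(s, x=1.0):
    """BaseRuns from a dict of counting stats, with the B factor scaled by x."""
    a = s["H"] + s["BB"] + s["HBP"] - s["HR"] - 0.5 * s["IBB"]
    b = (1.4 * s["TB"] - 0.6 * s["H"] - 3 * s["HR"]
         + 0.1 * (s["BB"] + s["HBP"] - s["IBB"])
         + 0.9 * (s["SB"] - s["CS"] - s["GDP"])) * x
    c = s["AB"] - s["H"] + s["CS"] + s["GDP"]
    d = s["HR"]
    return a * b / (b + c) + d


def player_base_runs(team, player, team_runs, x):
    """Actual team runs minus the BaseRuns of the team with the player removed."""
    without = {k: team[k] - player[k] for k in team}
    return team_runs - base_runs(without, x)


# Made-up totals for one team-season and one of its hitters.
team = {"AB": 5500, "H": 1450, "TB": 2300, "HR": 180, "BB": 550,
        "HBP": 50, "IBB": 40, "SB": 100, "CS": 40, "GDP": 120}
player = {"AB": 600, "H": 190, "TB": 350, "HR": 40, "BB": 90,
          "HBP": 5, "IBB": 15, "SB": 10, "CS": 3, "GDP": 15}

x = 1.0                          # assumed multiplier, for illustration only
team_runs = base_runs(team, x)   # stand-in for the team's actual runs
print(round(player_base_runs(team, player, team_runs, x), 1))
```

Because the player's stats are pulled out of the team totals before re-running the formula, his credit depends on the lineup around him, not just on his own line.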
To determine the B multiplier, I dug up my 8th grade algebra skills to solve the following equation for X: Runs = A * (B * X)/((B * X) + C) + D
X = ((Runs - D) * C) / B / (A - (Runs - D))
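One way to double-check the algebra is to plug the solved X back into the BaseRuns equation and confirm it reproduces the run total. The Python sketch below does that; the function names and the A, B, C, D and Runs values are invented for the check and are not from any real team.

```python
def solve_b_multiplier(a, b, c, d, runs):
    """Return X such that a * (b*X) / (b*X + c) + d equals runs.

    Derivation: with S = runs - d, S * (b*X + c) = a * b * X,
    so S * c = b * X * (a - S), giving X = S * c / b / (a - S).
    Assumes a > S, i.e. more baserunners than non-home-run runs.
    """
    scored = runs - d
    return (scored * c) / b / (a - scored)


def base_runs_from_factors(a, b, c, d, x=1.0):
    """BaseRuns with the B factor multiplied by x."""
    return a * (b * x) / (b * x + c) + d


# Hypothetical factors and run total, for illustration only.
a, b, c, d = 1850.0, 1812.0, 4210.0, 180.0
runs = 800

x = solve_b_multiplier(a, b, c, d, runs)
rebuilt = base_runs_from_factors(a, b, c, d, x)
print(round(x, 4), round(rebuilt, 6))
```

If the two sides did not match, the rebuilt total would drift away from the run total that was fed in.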
Hopefully, even with my rusty algebra skills, this was (and still is) correct. Now that I had my B multipliers (X), I could go ahead and calculate what teams would have done without a particular player and then finally get a player’s BaseRuns. So just for kicks, let’s look at a few lists:
Top 20 All Time:
Name                BSR    RC
Babe Ruth           2638   2757
Ty Cobb             2534   2524
Cap Anson           2514   1794
Barry Bonds         2451   2791
Hank Aaron          2400   2553
Stan Musial         2382   2569
Willie Mays         2238   2369
Ted Williams        2231   2384
Tris Speaker        2208   2176
Lou Gehrig          2199   2264
Rickey Henderson    2166   2167
Pete Rose           2116   2220
Mel Ott             2104   2085
Jimmie Foxx         2072   2146
Honus Wagner        2064   1888
Carl Yastrzemski    2050   2147
Frank Robinson      2012   2127
Eddie Collins       1997   1799
Roger Connor        1949   1498
Rafael Palmeiro     1922   2040
Top 20 Seasons: All Time
Name                Season   BSR   RC
Babe Ruth           1921     212   233
Hugh Duffy          1894     204   187
Tip O'Neill         1887     202   173
Babe Ruth           1923     199   216
Jimmie Foxx         1932     191   206
Babe Ruth           1920     190   206
Billy Hamilton      1894     188   148
Joe Kelley          1894     186   152
Lou Gehrig          1927     186   211
Lou Gehrig          1930     185   197
Lou Gehrig          1936     184   190
Babe Ruth           1927     183   203
Babe Ruth           1924     183   199
Lou Gehrig          1931     183   183
Babe Ruth           1931     182   185
Babe Ruth           1930     181   187
Rogers Hornsby      1922     180   206
Rogers Hornsby      1929     178   188
Ted Williams        1949     177   180
Jimmie Foxx         1938     175   184
Interesting how only 9 players account for the top 20 seasons of all time. Of the modern day players, Barry Bonds’ 2001 season and Todd Helton’s 2000 season make the top 30. Ryan Howard’s 2006 MVP season ranks as the 157th best of all time and Justin Morneau’s 2006 is the 967th best.
But since we’re looking at a player’s production in the context of his own team, it might be interesting to see who is responsible for the highest percentage of BaseRuns by a single player.
Top 25 All Time (> 500 BSR):
Name                BSR    BSR%
Ralph Kiner         1100   16.68%
Albert Pujols       817    16.62%
Barry Bonds         2451   15.72%
Roger Connor        1949   15.48%
Jesse Burkett       1867   15.42%
Babe Ruth           2638   15.20%
Stan Musial         2382   15.12%
Hank Aaron          2400   15.02%
Ted Williams        2231   14.98%
Bob Johnson         1369   14.94%
Ty Cobb             2534   14.82%
Honus Wagner        2064   14.70%
Jeff Bagwell        1658   14.62%
Willie Mays         2238   14.61%
Mickey Mantle       1901   14.59%
Tris Speaker        2208   14.43%
Harry Stovey        1447   14.16%
Lou Gehrig          2199   14.06%
Paul Hines          1401   13.98%
Todd Helton         1192   13.92%
Billy Hamilton      1669   13.89%
Ichiro Suzuki       650    13.86%
Cap Anson           2514   13.79%
Ed Delahanty        1786   13.77%
Eddie Mathews       1656   13.73%
What I like about expressing BaseRuns as a percentage of a team’s total runs is that you can see just how big a part of the offense a particular player is.
Top 10 in 2006:
Name                BSR   BSR%
Albert Pujols       133   17.06%
Lance Berkman       125   17.01%
Jason Bay           117   16.90%
Ryan Howard         144   16.69%
Alfonso Soriano     124   16.63%
David Ortiz         134   16.29%
Miguel Cabrera      120   15.81%
Garrett Atkins      126   15.52%
Matt Holliday       124   15.20%
Grady Sizemore      132   15.18%
Jason Bay was an extremely large part of the not so wonderful Pirate offense in 2006. Other notables include Justin Morneau falling in at 12th with 14.65% of the offense.
Anyway, at some point in the future, I’d like to include BaseRuns in the FanGraphs player pages and leaderboards. Since this is my first shot at calculating BaseRuns, I want to make sure I’m calculating them in a way that makes sense. If you see any problems with my methodology, please let me know, as I’d hate to have blatantly wrong data on the player pages.
For more information on BaseRuns, Tangotiger had an excellent series on BaseRuns and Linear Weights.

studes said,
January 6, 2007 @ 2:06 pm
Great, David. On what level did you set your “B” multiplier? Year, league, team or something else?
David Appelman said,
January 7, 2007 @ 9:13 pm
Thanks Dave. I just set the B multiplier to the individual team. I’ve started messing around with custom linear weights, but I’m still trying to wrap my head around the whole thing.
studes said,
January 8, 2007 @ 11:58 am
Personally, I don’t think setting the multiplier to the team level is correct. B is partially a measure of “clutch hitting,” and teams can vary quite a bit in their clutchiness. If you define B on the team level, and they have an outstanding year in the clutch, you’re probably giving the batter too much credit (particularly if he personally didn’t hit well in the clutch). I prefer to set the B multiplier on the year/league level.
Also, did you use James’s multiple versions of Runs Created (which he developed for separate eras)?
Thanks.
tangotiger said,
January 8, 2007 @ 4:07 pm
If a team was expected to score 700 runs and they scored 800, then that’s 100 runs unaccounted for (typically clutch, but also baserunning). The question then is how to best estimate those 100 missing runs.
I think distributing it to the players based on “production” would be the right thing to do. However, using the “B” factor to do that would be wrong, since you are dismissing most of the HR in the calculation. (This is the way I’ve been doing it, and it’s wrong.)
I’d set the “B” factor at the league level, and then multiply the player’s BaseRuns by 800/700 (or whatever it is for that team). You are assuming that the clutch portion is distributed this way, and that may be wrong too.
***
Another way to try to figure it out is using Run Expectancy. The change in RE (as opposed to the change in WE, which is what David does) will give you the exact runs created above average. Add in the average runs created per PA, and you have runs created. This will give you the exact runs scored for the team.
Compare this to the custom linear weights for that season, and look at the difference. Is the difference randomly distributed, or is there a relationship based on the production of the player?
David said,
January 9, 2007 @ 3:04 am
Thanks Dave & Tom. I’ll try rerunning the numbers by setting the B “factor” to the league average for a particular year, and then scale the individual players to the actual number of runs by a particular team.
I’m thinking it will make the biggest difference for seasons prior to 1900 where the B “factor” can get quite large.
Calculating runs created above average using RE sounds interesting and I can probably throw this into the WE code while I’m cleaning it up for 2007. Is there any particular RE table I should be using?
Dave: For Runs Created on FanGraphs, I use the “basic” version for seasons prior to 1955 and the “technical” version for seasons 1955 onward.
tangotiger said,
January 9, 2007 @ 2:13 pm
Well, you can probably generate your own RE table, like I did here:
http://www.tangotiger.net/RE9902.html
I did it for a 4yr time period (ends up with 5.0 RPG). You could do it on a year-by-year, or league-by-league basis. Or even by park really.
Or, for simplicity’s sake, and to match up to my WE tables, use the RE chart above, or the Markov-generated one in The Book (Table 8).
tangotiger said,
January 9, 2007 @ 2:14 pm
That should be Table eight, not the automatic smiley that eight and a closing bracket gives you.