Fun With BaseRuns

With a brand new (to me) historical database of all players, my project for the day was to calculate BaseRuns for all batters. BaseRuns models run creation, much like Bill James’ Runs Created, but BaseRuns is a more accurate model. As for the calculations, I decided to stick to David Smyth’s BaseRuns Primer. I used the “simple” version for seasons prior to 1955 and the more “complex” version for anything from 1955 to the present. Here’s the more complex version, where BaseRuns = A*B/(B+C) + D:

A = H + BB + HBP - HR - 0.5*IBB
B = [1.4*TB - 0.6*H - 3*HR + 0.1*(BB + HBP - IBB) + 0.9*(SB - CS - GDP)] * X
C = AB - H + CS + GDP
D = HR

The quick and dirty version of what I did: determine the B multiplier (X) for each major league team by season, use BaseRuns to calculate the number of runs that team would have scored without a particular player, and then subtract that from the team’s actual runs to get that player’s BaseRuns.
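As a rough sketch of that with-and-without step, the Python below applies the complex formula to a team stat line, removes one player's line, and credits the player with the difference. The function names, the dictionary layout, and both stat lines are made up for illustration, and the multiplier x is simply passed in as an input rather than derived here.

```python
STAT_KEYS = ("AB", "H", "TB", "HR", "BB", "IBB", "HBP", "SB", "CS", "GDP")


def components(s):
    """Return A, B (before the X multiplier), C and D for a stat line."""
    a = s["H"] + s["BB"] + s["HBP"] - s["HR"] - 0.5 * s["IBB"]
    b = (1.4 * s["TB"] - 0.6 * s["H"] - 3 * s["HR"]
         + 0.1 * (s["BB"] + s["HBP"] - s["IBB"])
         + 0.9 * (s["SB"] - s["CS"] - s["GDP"]))
    c = s["AB"] - s["H"] + s["CS"] + s["GDP"]
    d = s["HR"]
    return a, b, c, d


def base_runs(s, x=1.0):
    """BaseRuns = A*B/(B+C) + D, with B scaled by the multiplier x."""
    a, b, c, d = components(s)
    b *= x
    return a * b / (b + c) + d


def player_base_runs(team, player, team_runs, x):
    """Team's actual runs minus the BaseRuns of the team without the player."""
    without = {k: team[k] - player[k] for k in STAT_KEYS}
    return team_runs - base_runs(without, x)


# Made-up stat lines, for illustration only (not real teams or players).
team = {"AB": 5500, "H": 1450, "TB": 2300, "HR": 170, "BB": 550,
        "IBB": 40, "HBP": 50, "SB": 90, "CS": 40, "GDP": 120}
player = {"AB": 600, "H": 190, "TB": 340, "HR": 35, "BB": 80,
          "IBB": 10, "HBP": 5, "SB": 10, "CS": 3, "GDP": 15}

x = 1.0                         # placeholder multiplier
team_runs = base_runs(team, x)  # pretend actual runs equal the estimate at this x
print(round(player_base_runs(team, player, team_runs, x), 1))
```

Because the team multiplier is chosen so that the team's BaseRuns equal its actual runs, a player with an empty stat line gets zero credit under this scheme.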

To determine the B multiplier, I dug up my 8th grade algebra skills to solve the following equation for X: Runs = A * (B * X)/((B * X) + C) + D

X = ((Runs - D) * C) / (B * (A - (Runs - D)))
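One quick way to sanity-check the algebra is to plug the solved X back into the BaseRuns equation and confirm that it reproduces the run total. The sketch below does that; the function names and the A, B, C, D and Runs values are hypothetical, chosen only to exercise the formula.

```python
def solve_b_multiplier(runs, a, b, c, d):
    """Solve Runs = A*(B*X)/((B*X) + C) + D for X."""
    non_hr_runs = runs - d  # the part of Runs that the A*B/(B+C) term must explain
    if b == 0 or a <= non_hr_runs:
        raise ValueError("no positive multiplier reproduces this run total")
    return (non_hr_runs * c) / b / (a - non_hr_runs)


def base_runs(a, b, c, d, x=1.0):
    """BaseRuns with the B term scaled by x."""
    return a * (b * x) / ((b * x) + c) + d


# Hypothetical team totals, for illustration only.
a, b, c, d = 1860.0, 1833.0, 4210.0, 170.0
runs = 780.0

x = solve_b_multiplier(runs, a, b, c, d)
print(x, base_runs(a, b, c, d, x))  # the second value should match runs
```

The guard reflects a limit of the formula: the A*B/(B+C) term can never exceed A, so a multiplier only exists when Runs - D is smaller than A.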

Hopefully, even with my rusty algebra skills, this was (and still is) correct. Now that I had my B multipliers (X), I could go ahead and calculate what teams would have done without a particular player and then finally get a player’s BaseRuns. So just for kicks, let’s look at a few lists:

Top 20 All Time:

Name                  BSR      RC
Babe Ruth             2638     2757
Ty Cobb               2534     2524
Cap Anson             2514     1794
Barry Bonds           2451     2791
Hank Aaron            2400     2553
Stan Musial           2382     2569
Willie Mays           2238     2369
Ted Williams          2231     2384
Tris Speaker          2208     2176
Lou Gehrig            2199     2264
Rickey Henderson      2166     2167
Pete Rose             2116     2220
Mel Ott               2104     2085
Jimmie Foxx           2072     2146
Honus Wagner          2064     1888
Carl Yastrzemski      2050     2147
Frank Robinson        2012     2127
Eddie Collins         1997     1799
Roger Connor          1949     1498
Rafael Palmeiro       1922     2040

Top 20 Seasons: All Time

Name                 Season    BSR     RC
Babe Ruth            1921      212    233
Hugh Duffy           1894      204    187
Tip O'Neill          1887      202    173
Babe Ruth            1923      199    216
Jimmie Foxx          1932      191    206
Babe Ruth            1920      190    206
Billy Hamilton       1894      188    148
Joe Kelley           1894      186    152
Lou Gehrig           1927      186    211
Lou Gehrig           1930      185    197
Lou Gehrig           1936      184    190
Babe Ruth            1927      183    203
Babe Ruth            1924      183    199
Lou Gehrig           1931      183    183
Babe Ruth            1931      182    185
Babe Ruth            1930      181    187
Rogers Hornsby       1922      180    206
Rogers Hornsby       1929      178    188
Ted Williams         1949      177    180
Jimmie Foxx          1938      175    184

Interesting how only 9 players are in the top 20 seasons of all time. Of the modern day players, Barry Bonds’ 2001 season and Todd Helton’s 2000 season make the top 30. Ryan Howard’s 2006 MVP season amounts to the 157th best of all time and Justin Morneau’s 2006 is 967th best.

But since we’re looking at a player’s production in the context of his own team, it might be interesting to see which players account for the highest percentage of their teams’ total runs.

Top 25 All Time (> 500 BSR):

Name                 BSR       BSR%
Ralph Kiner          1100    16.68%
Albert Pujols         817    16.62%
Barry Bonds          2451    15.72%
Roger Connor         1949    15.48%
Jesse Burkett        1867    15.42%
Babe Ruth            2638    15.20%
Stan Musial          2382    15.12%
Hank Aaron           2400    15.02%
Ted Williams         2231    14.98%
Bob Johnson          1369    14.94%
Ty Cobb              2534    14.82%
Honus Wagner         2064    14.70%
Jeff Bagwell         1658    14.62%
Willie Mays          2238    14.61%
Mickey Mantle        1901    14.59%
Tris Speaker         2208    14.43%
Harry Stovey         1447    14.16%
Lou Gehrig           2199    14.06%
Paul Hines           1401    13.98%
Todd Helton          1192    13.92%
Billy Hamilton       1669    13.89%
Ichiro Suzuki         650    13.86%
Cap Anson            2514    13.79%
Ed Delahanty         1786    13.77%
Eddie Mathews        1656    13.73%

What I like about expressing BaseRuns as a percentage of a team’s total runs is that you can see just how big a part of the offense that particular player is.

Top 10 in 2006:

Name                 BSR       BSR%
Albert Pujols        133     17.06%
Lance Berkman        125     17.01%
Jason Bay            117     16.90%
Ryan Howard          144     16.69%
Alfonso Soriano      124     16.63%
David Ortiz          134     16.29%
Miguel Cabrera       120     15.81%
Garrett Atkins       126     15.52%
Matt Holliday        124     15.20%
Grady Sizemore       132     15.18%

Jason Bay was an extremely large part of the not-so-wonderful Pirates offense in 2006. Other notables include Justin Morneau, who comes in at 12th with 14.65% of his team’s offense.

Anyway, at some point in the future, I’d like to include BaseRuns in the FanGraphs player pages and leaderboards. Since this is my first shot at calculating BaseRuns, I want to make sure I’m calculating them in a way that makes sense. If you see any problems with my methodology, please let me know, as I’d hate to have blatantly wrong data on the player pages.

For more information on BaseRuns, Tangotiger had an excellent series on BaseRuns and Linear Weights.






David Appelman is the creator of FanGraphs.


studes
9 years 7 months ago

Great, David. On what level did you set your “B” multiplier? Year, league, team or something else?

studes
9 years 7 months ago

Personally, I don’t think setting the multiplier to the team level is correct. B is partially a measure of “clutch hitting,” and teams can vary quite a bit in their clutchiness. If you define B on the team level, and they have an outstanding year in the clutch, you’re probably giving the batter too much credit (particularly if he personally didn’t hit well in the clutch). I prefer to set the B multiplier on the year/league level.

Also, did you use James’s multiple versions of Runs Created (which he developed for separate eras)?

Thanks.

tangotiger
9 years 7 months ago

If a team was expected to score 700 runs and they scored 800, then that’s 100 runs unaccounted for (typically clutch, but also baserunning). The question then is how to best estimate those 100 missing runs.

I think distributing it to the players based on “production” would be the right thing to do. However, using the “B” factor to do that would be wrong, since you are dismissing most of the HR in the calculation. (This is the way I’ve been doing it, and it’s wrong.)

I’d set the “B” factor at the league level, and then multiply the player’s BaseRuns by 800/700 (or whatever it is for that team). You are assuming that the clutch portion is distributed this way, and that may be wrong too.
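To make the 800/700 scaling concrete, here is a minimal Python sketch. It assumes each player's BaseRuns have already been computed with a league-level B factor; the function name and the player figures are hypothetical.

```python
def scale_to_team_runs(player_bsr, team_estimated_runs, team_actual_runs):
    """Scale every player's BaseRuns by actual/estimated team runs."""
    factor = team_actual_runs / team_estimated_runs
    return {name: bsr * factor for name, bsr in player_bsr.items()}


# Hypothetical league-level estimates that add up to 700 team runs.
estimates = {"Player A": 120.0, "Player B": 90.0, "Rest of team": 490.0}

scaled = scale_to_team_runs(estimates, team_estimated_runs=700.0,
                            team_actual_runs=800.0)
print(scaled)
print(sum(scaled.values()))  # totals add back up to the team's actual runs
```

Every player is scaled by the same factor, so each one's share of the team total is unchanged; only the total moves to match actual runs.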

***

Another way to try to figure it out is using Run Expectancy. The change in RE (as opposed to the change in WE, which is what David does) will give you the exact runs created above average. Add in the average runs created per PA, and you have runs created. This will give you the exact runs scored for the team.

Compare this to the custom linear weights for that season, and look at the difference. Is the difference randomly distributed, or is there a relationship based on the production of the player?
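A rough Python sketch of the Run Expectancy approach follows. The RE values are illustrative placeholders, not taken from any published table, and the state encoding, the function names, and the simplification that every event is one plate appearance are all assumptions made for this example.

```python
# Illustrative placeholder run expectancies keyed by (bases, outs).
# These are NOT values from a published RE table.
RUN_EXPECTANCY = {
    ("___", 0): 0.50,
    ("1__", 0): 0.90,
    ("___", 1): 0.27,
    ("___", 2): 0.10,
}
END_OF_INNING = None  # after the third out, no more runs are expected


def expected_runs(state):
    """Run expectancy of a base-out state (0 once the inning is over)."""
    return 0.0 if state is END_OF_INNING else RUN_EXPECTANCY[state]


def event_run_value(before, after, runs_scored):
    """Change in RE plus runs scored on the play (runs above average)."""
    return expected_runs(after) - expected_runs(before) + runs_scored


def runs_created(events, league_runs_per_pa):
    """Runs above average plus the league-average runs for those PAs."""
    above_average = sum(event_run_value(*event) for event in events)
    return above_average + league_runs_per_pa * len(events)


# Three hypothetical plate appearances for one batter.
events = [
    (("___", 0), ("1__", 0), 0),  # single with the bases empty, nobody out
    (("___", 1), ("___", 1), 1),  # solo home run with one out
    (("___", 0), ("___", 1), 0),  # strikeout leading off
]
print(runs_created(events, league_runs_per_pa=0.12))
```

Summing these values over every event for a team accounts for the runs the team actually scored, which is the property described above.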

David
9 years 7 months ago

Thanks Dave & Tom. I’ll try rerunning the numbers by setting the B “factor” to the league average for a particular year, and then scale the individual players to the actual number of runs by a particular team.

I’m thinking it will make the biggest difference for seasons prior to 1900 where the B “factor” can get quite large.

Calculating runs created above average using RE sounds interesting and I can probably throw this into the WE code while I’m cleaning it up for 2007. Is there any particular RE table I should be using?

Dave: For Runs Created on FanGraphs, I use the “basic” version for seasons prior to 1955 and the “technical” version for seasons 1955 onward.

tangotiger
9 years 7 months ago

Well, you can probably generate your own RE table, like I did here:
http://www.tangotiger.net/RE9902.html

I did it for a 4yr time period (ends up with 5.0 RPG). You could do it on a year-by-year, or league-by-league basis. Or even by park really.

Or, for simplicity’s sake, and to match up to my WE tables, use the RE chart above, or the Markov-generated one in The Book (Table 8).

tangotiger
9 years 7 months ago

That should be Table eight, not the automatic smiley that eight and a closing bracket gives you.
