Linear Weights + BaseRuns = Good

In my last article, I explained how wOBA’s current implementation changes the value of walks, singles, home runs, etc., annually due to changing league characteristics.  Does this mean that the value of an event is the same for every team in the league each season?  Or in every park in the league?  No way.  If you’re talking about a weak offense in a high-offense era, then the overall constants for a weak offensive era are probably more applicable to that team.  However, it’s not really the point of standard wOBA to guess the run-producing contribution of a particular player to a particular team; I think it’s probably more accurate to say it’s about his probable productiveness in a typical team (although park effects aren’t taken into account, so not exactly… that would be more true of wRC+).

Anyway, Tom Tango realized this limitation, and produced a table that shows how the values change depending on a team’s runs scored.  He accomplished this system of “Custom Linear Weights” (“a necessary offshoot” of linear weights, he says) by making use of David Smyth’s BaseRuns formula, which is, in simplest terms, Runs Scored = base runners * (% of base runners that score) + home runs.  Home run hitters are not considered base runners, in this equation, by the way.  Makes perfect sense, right?

Tango realized that BaseRuns had a better handle on the team run-scoring process than his basic linear weights system (and all the other run estimators), so he translated the results of BaseRuns in various run environments into linear weights.  Specifically, the BaseRuns formula told him how many runs the team should score, and the linear weight value of each hit came from how many additional runs BaseRuns expected the team to score if it had one more of that type of hit (the marginal value of each hit type).  Here are just the basics of his results, in graphical form:
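To make the marginal-value idea concrete, here is a minimal Python sketch. It assumes the simple version of BaseRuns quoted later in this article, counts times on base as hits plus walks, and uses made-up team totals (the `TEAM` numbers are hypothetical, not real data), so treat the output as illustrative rather than as Tango's actual table.

```python
# Sketch: linear weights as marginal BaseRuns values.
# Assumptions: the simple BaseRuns version quoted later in this article,
# with times on base taken as H + W; TEAM totals are hypothetical.

def base_runs(ab, h, tb, hr, w):
    """Simple BaseRuns: base runners * score rate + home runs."""
    runners = h + w - hr  # base runners, home run hitters excluded
    advance = (1.4 * tb - 0.6 * h - 3 * hr + 0.1 * w) * 1.02  # runner advancement estimate
    outs = ab - h
    return runners * advance / (advance + outs) + hr

# Hypothetical season totals for one team.
TEAM = {"ab": 5500, "h": 1400, "tb": 2200, "hr": 150, "w": 500}

# What one extra event of each type adds to the team's totals.
EVENTS = {
    "walk": {"w": 1},
    "single": {"ab": 1, "h": 1, "tb": 1},
    "double": {"ab": 1, "h": 1, "tb": 2},
    "triple": {"ab": 1, "h": 1, "tb": 3},
    "homer": {"ab": 1, "h": 1, "tb": 4, "hr": 1},
    "out": {"ab": 1},
}

def marginal_values(team):
    """Runs gained (or lost) by adding exactly one more of each event."""
    baseline = base_runs(**team)
    values = {}
    for name, delta in EVENTS.items():
        bumped = {key: total + delta.get(key, 0) for key, total in team.items()}
        values[name] = base_runs(**bumped) - baseline
    return values

if __name__ == "__main__":
    for name, value in marginal_values(TEAM).items():
        print(f"{name:>6}: {value:+.3f} runs")
```

Swapping in weaker or stronger team totals and rerunning is one way to see how the marginal values move with the run environment.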

[Figure: LWTS by runs per game]

You may have noticed the numbers don’t line up with those in the chart at the end of my last article; that’s because this is without the value of the out subtracted. You’ll also see more or less what I was talking about in the previous article, which is that the run value of each event rises with greater scoring — particularly the non-home run events.

Speaking of outs…

What’s an out really worth?  In the version of linear weights (LWTS) that deals with outs, the default value of an out is about -0.29 runs.  But that’s based around a typical team.  And it’s a pretty abstract concept — it’s all about the cost of “what could have been”.  That cost is entirely dependent on the team; as the great mathematician and/or funk singer Billy Preston often said, “nothin’ from nothin’ leaves nothin,” so if a hypothetically horrible team (or player) can only be expected to make outs, then nothing is really lost when they do in fact make an out.  Hitting into a double play or making an out on the base paths… that has to be considered a little differently, since that’s effectively taking away part of the value of somebody’s walk, single, etc.

Anyway, here’s the chart for when the value of not making an out at the plate is included:

[Figure: LWTS by runs per game, with the value of the out included]

You may have noticed that the slopes are a bit steeper now, as the value of an out goes from -0.06 at 1 run per game to -0.594 at 10 runs per game.

I’m a little more inclined to look at these shifts in the values of the hits from the angle of synergies between on-base percentage (or more accurately, not making outs), extra base tendencies, and base running rather than from the angle of runs per game.  I think this way gets the cause-and-effect order correct, plus it seems a little too close to being circular otherwise (try googling “recursion”… those pranksters).

Consider how many base runners per inning you can expect given the proportion of batters who make it on base and aren’t put out afterwards (I call it “non-out rate,” but just consider it on-base percentage with no screw-ups involved):

[Figure: Runners per inning vs. OBP]

Assuming no outs on the bases, as OBP approaches 1.000, runners per inning approaches infinity.  I cut the graph short of infinity, because I figure I’ve already put you loyal readers from my FanGraphs Community Research articles (1, 2, and especially 3) through enough endless scrolling…

When you think about it, the closer to 1.000 a team’s OBP becomes — at least past a certain point — the closer to 1 the value of any on-base event should be.  If the bases are always loaded, then even a walk is always going to drive in a run, and you’re always going to be driven in by somebody behind you.  Sure, a home run with the bases loaded is going to drive in 4 runs, but then you have to consider that the hitters behind the home run hitter would have driven in the runs anyway.  Basically, it’s a communist utopia of hitters, where all varieties of hits, and even walks, have equal worth.  But it only applies at practically impossible team OBP levels (long-term, anyway), where everybody is performing a lot better than they’re realistically capable of, somehow.

Anyway, if you were wondering, the line follows the formula: Runners per inning = 3/(1-OBP) – 3.  So that you can see why this makes sense:

Reached Base    Plate Appearances    OBP      3/(1-OBP) - 3
0               3                    0.000    0
1               4                    0.250    1
2               5                    0.400    2
3               6                    0.500    3
4               7                    0.571    4

… and so on.
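If you'd rather check those rows than take my word for it, here is a small Python sketch. The function name `runners_per_inning` is just an illustrative label, and it carries the same assumption as the formula: no outs on the bases.

```python
def runners_per_inning(obp):
    """Base runners per inning implied by an OBP, assuming no outs on the bases."""
    if not 0 <= obp < 1:
        raise ValueError("obp must be at least 0 and less than 1")
    return 3 / (1 - obp) - 3

if __name__ == "__main__":
    print("Reached  PA  OBP    Runners/inning")
    for reached in range(5):
        pa = reached + 3  # three outs end the inning
        obp = reached / pa
        print(f"{reached:<8} {pa:<3} {obp:.3f}  {runners_per_inning(obp):.3f}")
```

An inning with N batters reaching base and 3 outs has an OBP of N/(N+3), and plugging that into the formula gives back N, which is all the table is saying.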

You don’t have to be a rocket scientist to figure out that if you have a lot of plate appearances in a single inning, you’re going to score a lot of runs.  Each plate appearance over 6 in an inning is automatically worth a run (a maximum of 3 on base and a maximum of 3 outs).

Now, what if runners get thrown out trying to stretch a single into a double, or trying to steal?  You could then have a 1.000 OBP in an inning, yet only have 3 plate appearances total… so that screws that idea up.  Then there are double and even triple plays to deal with (OK, at least you know there won’t be a 1.000 OBP for the whole inning in those cases).  These events complicate things considerably…  so not to distract you from that little problem or anything, but… HEY, WHAT’S THAT BEHIND YOU?!  Seriously, though, I’ll attempt to address that issue in a future article, but it’s probably not going to be pretty.

Moving on, here’s a relationship that Tom Tango pointed out:

[Figure: Runs per time on base (R/OB) vs. OBP]

Runs per Time on Base (R/OB), defined as R/(H + BB + HBP), has a fairly strong positive correlation to OBP.  Recall the BaseRuns principle that runs = (base runners except HR) * (score rate) + HR; R/OB is the scoring rate, so the implication here is that having a higher OBP actually raises both parts of that equation.  With a little bit of algebra, you can therefore say that total runs has a strong relationship to OB^2 / PA.  Specifically, 1.1 * OB^2 / PA turns out to be a pretty decent estimator of runs, with a 0.929 correlation to runs and a Mean Absolute Error of 30 runs over a season (1960-2012).
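For anyone who wants the "little bit of algebra" spelled out, here is a sketch. It rests on a simplifying assumption that goes beyond the graph itself: that R/OB is roughly proportional to OBP, with some slope k and no meaningful intercept.

```latex
\frac{R}{OB} \approx k \cdot OBP, \qquad OBP = \frac{OB}{PA}
\quad\Longrightarrow\quad
R \approx k \cdot OB \cdot \frac{OB}{PA} = k \cdot \frac{OB^{2}}{PA}, \qquad k \approx 1.1
```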

If you modify the above graph to instead compare OBP to (R – HR) / (OB – HR) — BaseRuns style, considering HR separately — the R^2 shoots up to 0.5181.  The conversion now leads to the formula:

R = 0.955 * ((OB – HR) * OB/PA + HR)

… which turns out to be a better estimator of runs: a 0.962 correlation to runs and a Mean Absolute Error of 22 runs over a season.  (For the uninitiated in stats: the MAE of 22 means the formula’s estimate misses a team’s actual season run total by 22 runs on average, and the 0.962 correlation is extremely strong, since 1 is as high as a correlation can be.)
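Here is a minimal Python sketch of that estimator, along with the Mean Absolute Error calculation used to grade it. The function names and the sample inputs are hypothetical placeholders for illustration; they are not the 1960-2012 team data behind the numbers above.

```python
def estimate_runs(ob, hr, pa, scale=0.955):
    """R = scale * ((OB - HR) * OB/PA + HR), with OB = H + BB + HBP."""
    return scale * ((ob - hr) * ob / pa + hr)

def mean_absolute_error(actual, predicted):
    """Average size of the misses, ignoring their direction."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

if __name__ == "__main__":
    # Hypothetical team season: 2000 times on base, 150 HR, 6200 PA.
    print(round(estimate_runs(ob=2000, hr=150, pa=6200), 1))
```

To grade the formula on real seasons, you would call `estimate_runs` once per team-season and pass the actual and estimated run totals to `mean_absolute_error`.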

Now’s as good a time as any to get into more details regarding David Smyth’s BaseRuns.  Paraphrasing the simple version of the formula:

Runs = (OB-HR) * (runner advancement estimate) / (runner advancement estimate + Outs) + HR

The middle section of the formula represents the score rate.  This means what I essentially showed earlier is that OBP (or OB/PA) works quite well as a stand-in for the score rate component of BaseRuns.  Interesting, no?  BaseRuns is basically as good as it gets amongst run estimator formulas, but it’s really only a little bit better than that formula, with a 0.973 correlation to runs, and an MAE of 18.2.  Of course, the accuracy of my formula there could be partly due to the tendency of high-OBP teams to also have more power.

The score rate component of BaseRuns is probably the only component that can stand to be improved upon.  In the simple version of the formula, it’s:

((1.4*TB – .6*H – 3*HR + .1*W)*1.02) / ((1.4*TB – .6*H – 3*HR + .1*W)*1.02 + (AB – H))

(AB-H) is the aforementioned Outs component, by the way.  Anyway, as you can probably imagine, scoring rate isn’t entirely solved by this equation.  There’s a more complex version that deals with steals, caught stealing, double plays, hit by pitches, and intentional walks, but it only makes for a slight improvement.  The truth of the scoring rate is more complicated than that.
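As a rough way to see the score rate and its OBP stand-in side by side, here is a Python sketch of the simple score rate formula above. The helper names and the sample team line are hypothetical, and the OBP helper uses only hits and walks (an assumption that ignores hit by pitches and sacrifice flies).

```python
def score_rate(ab, h, tb, hr, w):
    """Score rate component of the simple BaseRuns formula."""
    advance = (1.4 * tb - 0.6 * h - 3 * hr + 0.1 * w) * 1.02  # runner advancement estimate
    outs = ab - h
    return advance / (advance + outs)

def simple_obp(ab, h, w):
    """OB/PA from hits and walks only (assumption: HBP and SF ignored)."""
    return (h + w) / (ab + w)

if __name__ == "__main__":
    # Hypothetical team line, not real data.
    line = {"ab": 5500, "h": 1400, "tb": 2200, "hr": 150, "w": 500}
    print(f"score rate: {score_rate(**line):.3f}")
    print(f"OB/PA:      {simple_obp(line['ab'], line['h'], line['w']):.3f}")
```

One made-up team line proves nothing about the general relationship, of course; the point is only to show which inputs each quantity depends on.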

So, the BaseRuns formula holds up to extreme run environments better than the other run estimators you probably know and love, which is why Tango used it to derive runs-per-game-based linear weights.  However, increased runs per game don’t cause higher linear weights; a better, more synergistic offense is the root cause of both.  Therefore, what we want is to start over and take these synergies more into account next time.  When we do that, and we see how the whole can be greater than (or less than) the sum of its parts, we can transcend linear weights, BaseRuns, and the combination thereof.  Next time, I’ll show you a bit of how that can be done.





Steve is a robot created for the purpose of writing about baseball statistics. One day, he may become self-aware, and...attempt to make money or something?


27 Responses to “Linear Weights + BaseRuns = Good”

  1. Dan says:

    Why would FG let you post these? Now, whenever they analyze a trade or acquisition we masses will shout “But what about in team x’s particular run environment?”

    Great article, again.


    • Steve Staude. says:
      FanGraphs Supporting Member

      Haha, thanks


    • philosofool says:
      FanGraphs Supporting Member

      There is actually a response to the “What about particular run environment x?” It goes like this: while the run value of the player’s performance will be lower/higher than expected in the current average environment, (1) half of all games played are on average in an average environment, (2) valuable skills are valuable in every environment and (3) the relative differences in those values are typically small. Then you say “the burden of proof is on you to show that these differences undermine the present analysis.”


  2. Hans says:

    Great series. I love these kinds of articles that dig into the math behind sabermetrics.


  3. Oh, Beepy says:

    Not to knock any of the other FG writers but I find your no-nonsense level of Mathing(tm) to be a refreshing break from some of the more pseudo-humourous posts that have been increasingly pervasive here. (I’m aware that the funny brings in the readers, and it was likely a humorous article which brought me here, I’m only advocating continually writing articles like these instead of relegating everything to the ‘guts’ pages.)

    Congratulations again on your much-deserved place on the FG staff!


    • philosofool says:
      FanGraphs Supporting Member

      I’m not knocking anyone either, but I agree that too much jocularity tends to cause me to stop reading analytical articles. Sometimes I just want the data and the conclusions.


    • Steve Staude. says:
      FanGraphs Supporting Member

      Hey, I attempted to be funny a few times in the article! Haha, thanks.


    • B N says:

      Fangraphs: One of the few sports sites with a reader base so serious that you are congratulated for being humorless.

      Not entirely untrue though. I can find funny things all over the internet. I come here for baseball knowledge.


  4. philosofool says:
    FanGraphs Supporting Member

    Just to clarify, the Runs/Game axis in the graphs is runs per TEAM per game, not Runs of Home Team plus Runs of Away Team, right?


    • Steve Staude. says:
      FanGraphs Supporting Member

      Yup, you got it. This is about team run-scoring effects, not park effects (not directly, anyway).


  5. Tim says:

    I think there’s something wrong here. As OBP falls (and therefore R/G) triples should become more valuable relative to singles, doubles, and especially walks, in the same way that HRs do, because runners from third are far more likely to score without a hit. Instead they’re nosediving. A walk maintaining its value as runs decrease better than a triple does is clearly wrong. Walks should be losing the most relative value of all the positive outcomes as the run environment decreases.


    • Dan says:

      I believe it is logical because triples get the most value from having runners on base. If you are in an environment with a very high obp, then it is very likely that a triple will drive in runners on base. Triples have the most to lose as obp goes down.


    • Steve Staude. says:
      FanGraphs Supporting Member

      Hm, I think your idea is correct, but I don’t see how it disagrees with the charts. Triples do become more valuable relative to the lower on-base events as runs per game drops. A triple goes from being worth 2.9 times a walk at 10 runs per game, up to 4 times a walk at 1 run per game. Make that 1.8x to 3.1x, if you include the value of not making an out.


  6. rotowizard says:

    R = 0.955 * ((OB – HR) * OB/PA + HR)

    IMO, this is on par with E=MC^2

    You get my vote for the Fields Medal


    • Steve Staude. says:
      FanGraphs Supporting Member

      Heh, yeah right. You know, thinking about it a little bit more, though, HR should stand alone and not be multiplied by 0.955. R = 0.934*(OB-HR)*OBP + HR makes more sense, and works a tiny bit better.


  7. William Bean says:

    I’m not following the point of this. Is it something like…Given X environment, Y situation, do Z? Or does this just make you feel better about other assumptions that have been made regarding BaseRuns?


    • Steve Staude. says:
      FanGraphs Supporting Member

      An illustration of the main point is this: if you took two hitters with identical wOBAs (or wRC+), they won’t necessarily have the same value to a particular team. For example, the higher-OBP, lower-HR hitter of the two will be less valuable to a low-OBP team; the reverse will be true for a high-OBP team.


      • Baltar says:

        This is extremely important to a team that is trying to maximize value with a low budget. I’m wondering if the Rays and A’s already use this insight.
