# The monster in the mirror

*“I saw a monster in my mirror when I woke up today / A monster in my mirror but I did not run away / I did not shed a tear or hide beneath my bed / Though the monster looked at me and this is what he said / He said ‘wubba wubba wubba wubba woo woo woo … wubba wubba wubba and a doodly do’ …”*

This is probably the first time you’ve ever read a baseball article that not only directly quotes from a “Sesame Street” song, but uses the subject of it in the title. It probably feels a bit weird to read it, and to be honest, it feels weird to write it. Now, the $64,000 question: “How the hell does this relate to baseball?”

Believe it or not, this song was the direct inspiration for Tom Tango’s Weighted On-Base Average, or wOBA. wOBA, pronounced exactly as it is in the song, is a rate measurement of a hitter’s offensive performance that goes above and beyond the traditional rates (batting average, on-base percentage, slugging percentage, and on-base plus slugging). wOBA, in a nutshell, is linear weights raised above the value of an out and scaled to resemble on-base percentage. But what does that mean?

There have been a number of wOBA primers circulating around the Internet recently to help answer this question. I love primers, and I love reading people’s explanations of the more advanced statistics. These primers on wOBA are accessible to even the most casual fans. They answer the “what,” and occasionally the “why,” but not the “how.” This is important: we can’t trust a statistic unless we really know what goes into it and how it compares to other statistics that attempt to accomplish the same goal.

A word of caution: This will get a bit technical at times, and if you’re not particularly interested in learning the intricate details, Sections 1 through 4 can actually be skipped over. There’s a summary in the conclusion that briefly describes each section in here.

Now, let’s get down to business. First, let’s take a closer look at the traditional rate measures of performance—why do we need something different? What we have to begin with works just fine, right?

### 1. Revisiting and evaluating the basics

**Batting average (AVG):** *H / AB*

Batting average implies two things: 1) that the hitter is getting on base, and 2) that he is moving the runner over or driving him in. Since it fails to incorporate walks, it is limited in how well it tracks the on-base process. And since it fails to incorporate the hit types, it is limited in how well it tracks the run-driving process. We can have two players that are 10-for-30 and have a .333 average, but they can be distinctly different players. If a player has 10 singles in those 30 at-bats, he’s considered equal to the player that has five singles, three doubles and two home runs. The latter provides more value than the former, but batting average labels them the same. It’s a descriptive statistic, not a value one.
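To make the two 10-for-30 hitters concrete, here is a minimal Python sketch. The player lines are the hypothetical ones from the paragraph above, the slugging calculation borrows the total-bases weighting described below under slugging percentage, and the function names are illustrative rather than from any library.

```python
def batting_average(hits: int, at_bats: int) -> float:
    """AVG = H / AB."""
    return hits / at_bats


def slugging(singles: int, doubles: int, triples: int, homers: int, at_bats: int) -> float:
    """SLG = total bases / AB, weighting 1B, 2B, 3B, HR at 1, 2, 3, 4."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * homers
    return total_bases / at_bats


# The two hypothetical 10-for-30 hitters from the text.
player_a = {"singles": 10, "doubles": 0, "triples": 0, "homers": 0, "at_bats": 30}
player_b = {"singles": 5, "doubles": 3, "triples": 0, "homers": 2, "at_bats": 30}

for name, line in (("A", player_a), ("B", player_b)):
    hits = line["singles"] + line["doubles"] + line["triples"] + line["homers"]
    avg = batting_average(hits, line["at_bats"])
    slg = slugging(**line)
    print(f"Player {name}: AVG {avg:.3f}, SLG {slg:.3f}")
```

Both hitters come out at .333, while player B has 19 total bases to player A's 10, which is exactly the difference batting average hides.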

**On-base percentage (OBP):** *(H + BB + HBP) / (AB + BB + HBP + SF)*

This is meant to tell us how often a player reaches base (hence the name). OBP doesn’t weight events, as a home run is worth the same as a walk. For all intents and purposes OBP does just fine for what it is meant to do.

**Slugging percentage (SLG):** *(H + 2B + (2 x 3B) + (3 x HR)) / AB*

This is a common measurement of power. SLG weights singles at 1, doubles at 2, triples at 3, and home runs at 4—a logical assumption, but not necessarily correct. A double is not always worth two singles, nor is a home run worth four singles. In any case, it works fairly well at measuring power, and that’s all we’re looking for here.

**On-base plus slugging (OPS):** *((H + W) / PA) + (TB / AB)*

We’ve got two rates that measure a player’s offensive value—one is OBP, which measures the player’s on-base ability, and the other is SLG, which measures his driving power. It is only natural to combine the two to give an overall measurement of a player’s hitting, and that is exactly what OPS attempts to accomplish.

But (and I’m sure you saw this coming) there is a glaring flaw. Look at the denominators in the formula. See the problem? We’re adding two rates with different denominators. Since OBP uses PA and SLG uses AB, slugging events will be overvalued and walks will be undervalued.

Ignoring the denominators, OPS breaks down into:

OPS = *(1B + 2B + 3B + HR + TBB) + (1B + 2 x 2B + 3 x 3B + 4 x HR)*

Which simplifies to:

OPS = *(TBB + 2 x 1B + 3 x 2B + 4 x 3B + 5 x HR)*

I don’t know about you, but I don’t think a single is worth twice as much as a walk. And while a home run is undoubtedly more valuable, I have a hard time believing that it’s worth five times as much as a walk, or two and a half times as much as a single. There’s nothing particularly wrong with using OPS for a quick-and-dirty valuation of a hitter, but it would be misleading to use it as a definitive measure of a hitter’s production.
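One way to watch the mismatched denominators at work is to add a single walk or a single single to the same batting line and compare the OPS gain. The sketch below assumes a hypothetical season line (not real player data) and uses the OBP and SLG formulas given above.

```python
def obp(h, bb, hbp, ab, sf):
    """OBP = (H + BB + HBP) / (AB + BB + HBP + SF)."""
    return (h + bb + hbp) / (ab + bb + hbp + sf)


def slg(singles, doubles, triples, hr, ab):
    """SLG = TB / AB."""
    return (singles + 2 * doubles + 3 * triples + 4 * hr) / ab


def ops(singles, doubles, triples, hr, bb, hbp, ab, sf):
    """OPS = OBP + SLG; the two rates use different denominators."""
    h = singles + doubles + triples + hr
    return obp(h, bb, hbp, ab, sf) + slg(singles, doubles, triples, hr, ab)


# Hypothetical season line.
base = dict(singles=100, doubles=30, triples=3, hr=20, bb=50, hbp=5, ab=500, sf=5)
plus_walk = dict(base, bb=base["bb"] + 1)
# A single adds both a hit and an at-bat.
plus_single = dict(base, singles=base["singles"] + 1, ab=base["ab"] + 1)

gain_walk = ops(**plus_walk) - ops(**base)
gain_single = ops(**plus_single) - ops(**base)
print(f"OPS gain from a walk:   {gain_walk:.5f}")
print(f"OPS gain from a single: {gain_single:.5f}")
print(f"single / walk ratio:    {gain_single / gain_walk:.2f}")
```

With this line the single adds roughly 1.9 times as much OPS as the walk, close to the 2-to-1 ratio implied by the simplified expansion.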

As a reference point, here’s a general idea of how well each rate relates to actual team run scoring from 1954 to 2009 (strike-shortened seasons have been excluded)*:

| Rate | r | MAE | RMSE |
|------|-------|-------|-------|
| AVG | 0.791 | 36.43 | 45.58 |
| OBP | 0.858 | 31.86 | 39.73 |
| SLG | 0.869 | 26.71 | 33.52 |
| OPS | 0.908 | 22.04 | 27.47 |

The first column is the correlation coefficient, which measures the strength of the linear relationship between the rate in question and runs scored (an “r” of 1.00 indicates a perfect relationship). Batting average correlates strongly with runs scored, but not nearly as well as the other rates. MAE is the Mean Absolute Error: the average distance between estimated runs scored and actual runs scored. RMSE is the Root Mean Square Error, which gives a higher weight to large errors. For both, the lower the number, the better.
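For readers who want to run this kind of test themselves, here is a minimal sketch of the three measures in plain Python. The team totals in the example are made-up placeholders, not the 1954-2009 data behind the table.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def mae(estimated, actual):
    """Mean absolute error: the average |estimate - actual|."""
    return sum(abs(e - a) for e, a in zip(estimated, actual)) / len(actual)


def rmse(estimated, actual):
    """Root mean square error: squaring weights large misses more heavily."""
    return math.sqrt(sum((e - a) ** 2 for e, a in zip(estimated, actual)) / len(actual))


# Placeholder team run totals (hypothetical).
actual_runs = [700, 750, 810, 680]
estimated_runs = [710, 735, 800, 700]
print(round(pearson_r(estimated_runs, actual_runs), 3))
print(round(mae(estimated_runs, actual_runs), 2), round(rmse(estimated_runs, actual_runs), 2))
```

Here the largest miss (20 runs) pulls RMSE above MAE, which is the "higher weight to large errors" behavior described above.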

OPS clearly leads the pack, which isn’t surprising: statistics that incorporate more data are usually more accurate. Compared to traditional run estimators like Bill James’ Runs Created or David Smyth’s BaseRuns, which usually have an RMSE around 23-24, it’s not terrible, so it makes for a decent estimator at the team level. The reason, as Dan Fox pointed out years ago, is that “it is a linear approximation of more complex run estimation formulas.”

So here we are, left with the triple slash rates (AVG/OBP/SLG) and OPS. The slash rates give us a nice overall picture of a player’s value, but they still fall a bit short of what we want: a comprehensive rate of total offensive production. OPS attempts to solve this, and it has the right idea, but it too falls short of what we’re really looking for. We need something that puts everything together in a neat little package while avoiding the weighting issues in OPS. What we need are linear weights.

### 2. Linear weights: the heart of the monster

Oh, my. I’m terrible with titles.

The goal of baseball is to score more runs than your opponent in order to win. How are these runs created? Well, there are two ways a player creates a run—by getting on base or through moving the runners over or driving them in. We know that a single is more valuable than a walk, a double more than a single, and so on.

The problem is that we don’t know how valuable each event is. One way we can help quantify the impact of each event is through a run expectancy chart. Here’s one for 2009, courtesy of Baseball Prospectus:

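| Men on | 0 outs | 1 out | 2 outs |
|--------|--------|-------|--------|
| 0-0-0 | 0.517 | 0.279 | 0.106 |
| 1-0-0 | 0.883 | 0.533 | 0.223 |
| 0-2-0 | 1.142 | 0.688 | 0.322 |
| 0-0-3 | 1.315 | 0.965 | 0.370 |
| 1-2-0 | 1.484 | 0.922 | 0.456 |
| 1-0-3 | 1.769 | 1.202 | 0.522 |
| 0-2-3 | 2.014 | 1.414 | 0.562 |
| 1-2-3 | 2.279 | 1.558 | 0.750 |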

As the name implies, it tells us how many runs we expect to score in any one of the 24 base/out states until the end of the inning. We know that a runner on third with no outs has a high probability of scoring; RE tells us that, on average, a team with a runner on third and with no outs scores 1.3 runs from that state to the end of the inning.

The nice thing about RE is that it models intuition: we know that a team has a better chance of scoring a run with a runner on second and no outs than with a runner on first and two outs. RE simply looks at every base/out situation and finds the average number of runs scored from that state to the end of the inning. It backs our intuition with empirical numbers, and we can use these charts to find the value of an event in each situation. This is determined by:

*(Runs Scored + End State RE) – Beginning State RE*

Let’s say we’ve got a runner on first with one out. The batter lines a single, and the runner makes it to third. The team moves from a runner on first with one out to runners on first and third with one out:

(0 + 1.202) – .533 = .669 runs

Runner on first, no outs, and the hitter strikes out:

(0 + .533) – .883 = -.350 runs
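The run value formula is simple enough to express directly in code. This sketch hard-codes the 2009 run expectancy table shown above, keyed by base state and outs, and assumes a third out ends the inning with zero expected runs; the function names are hypothetical.

```python
# 2009 run expectancy table from above: RE_2009[bases][outs],
# with bases written first-second-third ("1-0-3" = runners on first and third).
RE_2009 = {
    "0-0-0": (0.517, 0.279, 0.106),
    "1-0-0": (0.883, 0.533, 0.223),
    "0-2-0": (1.142, 0.688, 0.322),
    "0-0-3": (1.315, 0.965, 0.370),
    "1-2-0": (1.484, 0.922, 0.456),
    "1-0-3": (1.769, 1.202, 0.522),
    "0-2-3": (2.014, 1.414, 0.562),
    "1-2-3": (2.279, 1.558, 0.750),
}


def run_expectancy(bases: str, outs: int) -> float:
    """Expected runs from this base/out state to the end of the inning."""
    if outs >= 3:
        return 0.0  # the inning is over
    return RE_2009[bases][outs]


def run_value(start_bases, start_outs, end_bases, end_outs, runs_scored=0):
    """(Runs Scored + End State RE) - Beginning State RE."""
    return (runs_scored + run_expectancy(end_bases, end_outs)
            - run_expectancy(start_bases, start_outs))


# Single with a runner on first and one out; the runner takes third.
print(round(run_value("1-0-0", 1, "1-0-3", 1), 3))
# Strikeout with a runner on first and no outs.
print(round(run_value("1-0-0", 0, "1-0-0", 1), 3))
# Solo home run with the bases empty and no outs.
print(round(run_value("0-0-0", 0, "0-0-0", 0, runs_scored=1), 3))
```

The first two calls reproduce the worked examples (.669 and -.350 runs), and the solo homer comes out to exactly one run because the base/out state doesn't change.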

We can then expand on this. We can look at every event that occurs in each base/out state, and then we can find the average value the event adds in terms of run expectancy. These are what we refer to as linear weight values (lwts). It’s not an elegant name by any means, but it is an elegant system. Here are the linear weight values for some events:

| Event | NIBB | HBP | 1B | 2B | 3B | HR | SB | CS |
|-------|------|-----|----|----|----|----|----|----|
| Value | 0.308 | 0.333 | 0.461 | 0.76 | 1.033 | 1.402 | 0.194 | -0.435 |

These are the standard values as provided by Tango. There are other events we could include, but I’m sticking strictly to the events used in wOBA. The run values also vary depending on the run environment.
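Averaging per-play run values by event type is what produces linear weights. The sketch below assumes you already have a list of (event, run value) pairs, for example from the run value formula. The four plays listed are hypothetical (their values come from the 2009 table above), so the output is not Tango's table.

```python
from collections import defaultdict


def linear_weights(plays):
    """Average run value for each event type.

    plays: iterable of (event_label, run_value) pairs.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for event, value in plays:
        totals[event] += value
        counts[event] += 1
    return {event: totals[event] / counts[event] for event in totals}


# Hypothetical plays, with run values taken from the 2009 RE table above.
plays = [
    ("1B", 0.669),   # runner on first, 1 out -> first and third, 1 out
    ("1B", 0.366),   # bases empty, 0 outs -> runner on first, 0 outs
    ("K", -0.350),   # runner on first, 0 outs -> runner on first, 1 out
    ("K", -0.238),   # bases empty, 0 outs -> bases empty, 1 out
]
print(linear_weights(plays))
```

With a full season of plays, the same averaging gives values like those in the table above.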

We can begin to see the relationship between events—a non-intentional walk is about 0.153 runs less than a single is, on average. A double is about 0.299 runs more, meaning that it is worth less than two singles, and this makes sense: Two singles usually result in a man on first and third (two men on), whereas a double results in only one man on base. A triple is about 0.273 runs more than a double, and the home run naturally trumps all. It’s interesting to see that the run value of a stolen base is tiny, but the value of a caught stealing is quite large. Obviously, this is because a CS not only creates an out, but it removes a runner from the basepaths as well.

Now, let’s return to OPS for a quick second. Remember how I said it undervalues walks? If we divide each of its implied weights by four, we get marginal values we can compare with the run values above:

*OPS = (.25 x TBB) + (.5 x 1B) + (.75 x 2B) + (1.00 x 3B) + (1.25 x HR)*

Just as we expected! OPS undervalues walks. The weight for home runs is also off by quite a wide margin, so we cannot trust these weights.
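Here is that comparison side by side: OPS's implied weights divided by four against the run values from the table above. The walk row uses the non-intentional walk value as a stand-in for all walks, which is a simplification.

```python
RUN_VALUES = {"BB": 0.308, "1B": 0.461, "2B": 0.76, "3B": 1.033, "HR": 1.402}
# Implied OPS weights from the simplified expansion: TBB, 1B, 2B, 3B, HR = 1..5.
OPS_IMPLIED = {"BB": 1, "1B": 2, "2B": 3, "3B": 4, "HR": 5}


def ops_weight_gaps():
    """OPS weight / 4 minus the run value (negative = OPS undervalues the event)."""
    return {e: OPS_IMPLIED[e] / 4 - RUN_VALUES[e] for e in RUN_VALUES}


for event, gap in ops_weight_gaps().items():
    print(f"{event}: OPS/4 = {OPS_IMPLIED[event] / 4:.2f}, "
          f"run value = {RUN_VALUES[event]:.3f}, gap = {gap:+.3f}")
```

The walk comes out about .06 runs light while the single comes out about .04 heavy, and the home run is the largest miss at roughly .15 runs.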

Now, before I get into the calculation of wOBA, I want to point something out: a hitter, as we all know, has no control over the base/out state he comes up in. Some players get more opportunities than others to drive in runners or move them over. This is why we take the average value of each event.

So while it is absolutely true that some players are poor with men on base and others excel, linear weights do not take this into account. Each event is therefore treated as if it creates the same amount of runs no matter the situation, and this is why we refer to linear weights as being a context-neutral statistic.

And of course, not everyone has access to empirically derived linear weight values—but we can estimate them based on what we know about the relationship between events. These are Tango’s approximations:

| Event | Calculation | AL | NL | MLB |
|-------|-------------|----|----|-----|
| Runs per Out | Runs / (IP x 3) | 0.181 | 0.166 | 0.173 |
| NIBB | Runs per Out + 0.14 | 0.321 | 0.306 | 0.313 |
| HBP | lwts(NIBB) + 0.025 | 0.346 | 0.331 | 0.338 |
| 1B | lwts(NIBB) + 0.155 | 0.476 | 0.461 | 0.468 |
| 2B | lwts(1B) + 0.3 | 0.776 | 0.761 | 0.768 |
| 3B | lwts(2B) + 0.27 | 1.046 | 1.031 | 1.038 |
| HR | 1.4 | 1.400 | 1.400 | 1.400 |
| SB | 0.2 | 0.200 | 0.200 | 0.200 |
| CS | -1 x (2 x Runs per Out + 0.075) | -0.436 | -0.406 | -0.420 |

Why is NIBB dependent on the runs scored per out? The short answer is that it lets us estimate the weights for a specific run environment. Linear weights work as well as they do because they are tailored to the setting they come from.

As shown in the table, more runs are being scored per out in the AL (.181 R/O) than in the NL (.166). That’s quite a bit of disparity, and it’d be wrong to assume that the run values will remain static. As mentioned earlier, and as Tango states in one of his articles, we can identify patterns between events. I pointed out the value of a single is about 0.15 runs greater than a non-intentional walk, a double 0.3 runs more than a single, and so on. The estimations you see in the table above are simply a reflection of that.

The run values for HR and SB are fixed because those two events change the least as the run environment changes. As shown in Tango’s table, the values of a HR and a SB show the least movement among the events included in wOBA (.218 and .105, respectively), whereas events like triples (.693) and doubles (.540) fluctuate a lot.
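Tango's approximations translate directly into a function of runs per out. This is a sketch of the table's "Calculation" column. One assumption of the code, not of the table: innings pitched must be a true decimal, so a line shown as 100.1 IP should be entered as about 100.333.

```python
def runs_per_out(runs: float, innings_pitched: float) -> float:
    """Runs / (IP x 3); innings_pitched must be in true thirds."""
    return runs / (innings_pitched * 3)


def approx_linear_weights(rpo: float) -> dict:
    """Tango's approximate run values for a given runs-per-out environment."""
    nibb = rpo + 0.14
    single = nibb + 0.155
    double = single + 0.3
    triple = double + 0.27
    return {
        "NIBB": nibb,
        "HBP": nibb + 0.025,
        "1B": single,
        "2B": double,
        "3B": triple,
        "HR": 1.4,  # fixed: changes little with the run environment
        "SB": 0.2,  # fixed for the same reason
        "CS": -1 * (2 * rpo + 0.075),
    }


for league, rpo in (("AL", 0.181), ("NL", 0.166)):
    weights = approx_linear_weights(rpo)
    print(league, {event: round(value, 3) for event, value in weights.items()})
```

Feeding in the rounded AL and NL runs-per-out values reproduces the table to within a thousandth; the small CS differences come from rounding runs per out.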

### 3. Transforming linear weights into a rate

Now that we have our run values, we’re ready to begin the transformation from runs to rate. First things first, we need to figure out what the league on-base percentage is so we can match the coefficients to fit it. Since we’re excluding intentional walks in wOBA, we need to exclude them from OBP as well. OBP is rewritten as:

OBP = *(H + NIBB + HBP) / (PA – IBB)*

This gives us an OBP of .330 for the AL, .322 for the NL, and .326 for MLB. This is what we’re aiming for. Next, we multiply our run values by the frequency of each event and add them together. These are our linear weight runs. By dividing those linear weight runs by outs made, defined as (AB – H + SF), we get the run value of an out. The 2009 AL has a value of .283, the NL .263, and MLB .272; the value generally sits in the .26-.30 range. What does this do? Why are we doing this?

In terms of standard linear weights, it effectively sets the league average to 0. If we write out a linear weights equation for the AL, we get:

LWTS = *.32 x NIBB + .35 x HBP + .47 x 1B + .78 x 2B + 1.05 x 3B + 1.40 x HR + .20 x SB – .44 x CS – .28 x (AB – H + SF)*

This is a necessary step for any linear weights equation. If we plug in the frequency of each event for the league, we get a value of exactly 0 (not for the sample provided, though, due to rounding). If a hitter produces better than the league average—i.e. he has a higher ratio of runs created to outs made than an average hitter—he will have a positive runs total, meaning that he added x runs above an average hitter. Conversely, if a hitter has a lower ratio of runs created to outs made, he will have a negative number, meaning that he added x runs less than what an average hitter would do in his at-bats.
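Here is a sketch of those two steps: computing the out value from league totals, then using it to express a batting line as runs above average. The league totals below are invented purely to show that an average line nets out to zero; they are not 2009 figures, and the helper names are hypothetical.

```python
def outs_made(line):
    """Outs as defined above: AB - H + SF."""
    return line["AB"] - line["H"] + line["SF"]


def lw_runs(line, run_values):
    """Sum of run value x frequency over the events in run_values."""
    return sum(value * line.get(event, 0) for event, value in run_values.items())


def out_value(league, run_values):
    """Linear weight runs per out made for the league."""
    return lw_runs(league, run_values) / outs_made(league)


def lwts_above_average(line, run_values, out_val):
    """Runs above (or below) an average hitter using the same number of outs."""
    return lw_runs(line, run_values) - out_val * outs_made(line)


# AL-style approximate run values and hypothetical league totals.
run_values = {"NIBB": 0.321, "HBP": 0.346, "1B": 0.476, "2B": 0.776,
              "3B": 1.046, "HR": 1.400, "SB": 0.200, "CS": -0.436}
league = {"AB": 80000, "H": 21000, "SF": 700, "NIBB": 7000, "HBP": 800,
          "1B": 14000, "2B": 4300, "3B": 450, "HR": 2250, "SB": 1200, "CS": 500}

ov = out_value(league, run_values)
print(round(ov, 3))
# The league as a whole should net out to (essentially) zero.
print(abs(lwts_above_average(league, run_values, ov)) < 1e-6)
```

Because the out value is computed from the same totals, the league line is exactly average by construction; any individual line then lands above or below zero.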

The LWTS formula above gives us the run values relative to the out. In rate statistics, an out is always worth zero in the numerator. So if we absorb the value of an out by adding it to each batting event, we effectively change the values from being relative to outs to being per plate appearance. That gives us these values:

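| Event | AL | NL | MLB |
|-------|-------|-------|-------|
| NIBB | 0.604 | 0.569 | 0.585 |
| HBP | 0.629 | 0.594 | 0.610 |
| 1B | 0.759 | 0.724 | 0.740 |
| 2B | 1.059 | 1.024 | 1.040 |
| 3B | 1.329 | 1.294 | 1.310 |
| HR | 1.683 | 1.663 | 1.672 |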

You’ll notice that SB and CS are not altered. This is because neither event is a plate appearance, so neither one is measured relative to a batting out.
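In code, absorbing the out value is just an addition applied to the batting events only. The sketch below uses the rounded AL values and out value from the text; the event labels and function name are my own shorthand.

```python
BATTING_EVENTS = {"NIBB", "HBP", "1B", "2B", "3B", "HR"}


def weights_per_pa(run_values, out_val):
    """Add the out value to each batting event; SB and CS are left unchanged."""
    return {event: value + out_val if event in BATTING_EVENTS else value
            for event, value in run_values.items()}


al_run_values = {"NIBB": 0.321, "HBP": 0.346, "1B": 0.476, "2B": 0.776,
                 "3B": 1.046, "HR": 1.400, "SB": 0.200, "CS": -0.436}
print({e: round(v, 3) for e, v in weights_per_pa(al_run_values, 0.283).items()})
```

With an out value of .283, this reproduces the AL column of the table above while leaving SB and CS alone.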

We’re almost finished! Now here’s the fun part—multiply the revised weights (and SB/CS run values) by the frequency of the event, and divide by plate appearances (sans the intentional walk). This gives us a rate of .282 for the AL, .260 for the NL, and .270 for MLB. In other words, it essentially gives us a weighted batting average.

But since we’re looking to match OBP, we need to create a scale. If our desired OBP is .330 for the AL, we have .282/.330 = 85.5% of our desired rate. From there, we can solve for the scale by dividing our desired OBP by our current rate. Doing this gives us a value of 1.17 for the AL, 1.24 for the NL, and 1.21 for MLB. This is the scale that allows us to generate coefficients that match OBP. All that’s left is to multiply the weights by the scale, and that gives us our coefficients:

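| Event | AL | NL | MLB |
|-------|------|------|------|
| NIBB | .71 | .70 | .71 |
| HBP | .74 | .73 | .74 |
| 1B | .89 | .89 | .89 |
| 2B | 1.24 | 1.27 | 1.25 |
| 3B | 1.56 | 1.60 | 1.58 |
| HR | 1.98 | 2.06 | 2.02 |
| SB | .23 | .25 | .24 |
| CS | -.51 | -.50 | -.51 |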
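The scaling step can be sketched as follows. OBP here excludes intentional walks as described earlier. The raw rate (.282) and target OBP (.330) plugged in at the end are the AL figures from the text; the function names are hypothetical.

```python
def obp_without_ibb(h, nibb, hbp, pa, ibb):
    """OBP with intentional walks removed: (H + NIBB + HBP) / (PA - IBB)."""
    return (h + nibb + hbp) / (pa - ibb)


def raw_weighted_rate(line, pa_weights):
    """Weighted events divided by plate appearances, excluding IBB."""
    numerator = sum(w * line.get(event, 0) for event, w in pa_weights.items())
    return numerator / (line["PA"] - line["IBB"])


def woba_scale(target_obp, raw_rate):
    """Multiplier that lifts the raw rate onto the OBP scale."""
    return target_obp / raw_rate


def woba_coefficients(pa_weights, scale):
    """Multiply every weight (including SB and CS) by the scale."""
    return {event: w * scale for event, w in pa_weights.items()}


al_pa_weights = {"NIBB": 0.604, "HBP": 0.629, "1B": 0.759, "2B": 1.059,
                 "3B": 1.329, "HR": 1.683, "SB": 0.200, "CS": -0.436}
scale = woba_scale(0.330, 0.282)
print(round(scale, 2))
print({e: round(c, 2) for e, c in woba_coefficients(al_pa_weights, scale).items()})
```

Because the inputs are already rounded, a coefficient can land a hundredth off the table (the home run comes out near 1.97 rather than 1.98).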

Voilà! We’re finished. Our formula for MLB in 2009 would then be:

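$$\text{wOBA} = \frac{.71 \times \text{NIBB} + .74 \times \text{HBP} + .89 \times \text{1B} + 1.25 \times \text{2B} + 1.58 \times \text{3B} + 2.02 \times \text{HR} + .24 \times \text{SB} - .51 \times \text{CS}}{\text{PA} - \text{IBB}}$$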

It’s important to remember that since we are excluding IBB from the numerator, it will be excluded from the denominator as well. It’s being treated as a non-event. This is something that tends to be overlooked in published wOBA formulae. Once we plug in the numbers with the coefficients, we get a wOBA of .330 for the AL, .322 for the NL, and .326 for MLB. Exactly what we were aiming for. wOBA essentially works in three steps:

1. Determine linear weight values.

2. Raise it above the value of an out.

3. Multiply it by a scale to match OBP.
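Putting the finished formula to work, here is a minimal wOBA function using the 2009 MLB coefficients above. The batting line is hypothetical, and the dictionary keys are labels I chose.

```python
MLB_2009 = {"NIBB": 0.71, "HBP": 0.74, "1B": 0.89, "2B": 1.25,
            "3B": 1.58, "HR": 2.02, "SB": 0.24, "CS": -0.51}


def woba(line, coefficients=MLB_2009):
    """Weighted events over plate appearances, with IBB removed from the denominator."""
    numerator = sum(c * line.get(event, 0) for event, c in coefficients.items())
    return numerator / (line["PA"] - line["IBB"])


# Hypothetical season line.
line = {"PA": 600, "IBB": 10, "NIBB": 50, "HBP": 5, "1B": 100,
        "2B": 30, "3B": 3, "HR": 20, "SB": 10, "CS": 4}
print(f"{woba(line):.3f}")
```

Note that the 10 intentional walks appear nowhere in the numerator and are subtracted from the denominator, so they are treated as non-events.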

Furthermore, we can figure runs created from wOBA rather easily:

wRC = *(((wOBA – LgwOBA) / Scale) + (LgRuns/LgPA)) x PA*

And runs above average:

wRAA = *(wOBA – LgwOBA) / Scale x PA*

And there you have it—linear weights that can be expressed as a rate statistic, runs created or runs above average in a few simple steps. wOBA can be park-adjusted if you divide by the square root of the park factor.
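The two run conversions and the park adjustment fit in a few lines. This sketch assumes the park factor is expressed as a ratio where 1.00 is neutral (e.g. 1.05 rather than 105). The league figures in the example are placeholders.

```python
import math


def wraa(woba, lg_woba, scale, pa):
    """Runs above average: (wOBA - LgwOBA) / Scale x PA."""
    return (woba - lg_woba) / scale * pa


def wrc(woba, lg_woba, scale, lg_runs, lg_pa, pa):
    """Runs created: ((wOBA - LgwOBA) / Scale + LgRuns / LgPA) x PA."""
    return ((woba - lg_woba) / scale + lg_runs / lg_pa) * pa


def park_adjusted_woba(woba, park_factor):
    """Divide by the square root of the park factor (assumed 1.00 = neutral)."""
    return woba / math.sqrt(park_factor)


# Placeholder inputs: a .376 hitter in a .326 league with a 1.21 scale.
print(round(wraa(0.376, 0.326, 1.21, 600), 1))
print(round(wrc(0.376, 0.326, 1.21, 20000, 180000, 600), 1))
print(round(park_adjusted_woba(0.376, 1.05), 3))
```

A league-average hitter gets zero wRAA and a wRC equal to league runs per PA times his own PA, which is a handy sanity check on any implementation.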

### 4. EqA or wOBA?

There are two powerhouses on the Web that carry advanced, modern-day statistics. One is Baseball Prospectus, and the other is FanGraphs. And each has its own über rate. At BP, they use Equivalent Average. At FanGraphs, they use wOBA. A clash of the titans is almost inevitable, and the battle for “who has the better rate statistic” is sure to ensue (as I’m writing this, I hear EqA developer Clay Davenport is going to present an article testing the two).

Not long ago, BP writer Jay Jaffe offered some reasons as to why EqA—now being renamed as TAv (“True Average”)—is a better choice than wOBA. He made three claims:

1. The scale of Equivalent Average is easier for the casual fan to understand than the scale of OBP. This is undoubtedly true, but as I demonstrated above, the transition from the OBP scale to a batting average scale is a minor one: simply divide by the scale, and we have a weighted batting average. If we know which scale is being used, we can make the switch from OBP to BA in a matter of seconds.

In addition, one could argue that if we truly want to increase awareness of advanced metrics, we shouldn’t encourage a rate statistic on the batting average scale due to the limited relevance of BA to the run-scoring process. This isn’t to say that it’s wrong to use BA as a scale, though. Really, it comes down to the preference of the creator—Clay Davenport wanted to make it accessible for the casual fan, while Tango wanted to make his accessible for the more statistically savvy people. It does not mean that one is “better” than the other.

2. wOBA is not park-adjusted. Well, the raw form isn’t, but it can be. As noted earlier, all one has to do is divide wOBA by the square root of the park factor. To play Devil’s Advocate again, one could argue that it would behoove us to present a non-park-adjusted version of the rate. Not all park factors are created equal, as some are far less rigorous than others. If we present both an unadjusted rate and a park-adjusted one, we can use the park factors of our choice.

3. Equivalent Runs yields a lower RMSE than Weighted Runs Created. Well, this very well may be true. It might not. Christina Kahrl of BP wrote a while back that EqA is “testably more accurate.” Colin Wyers followed up with a test between the two and found that they were in a virtual dead heat. Wyers’ test from 1974-2008 showed EqR winning by a slim margin, but wOBA took the lead from 1993-2008.

It seems that both rates track the run-scoring process equally well, or with one marginally better than the other. For the sake of argument, though, let’s assume EqR is the better run estimator over a greater length of time. If this is true, does this mean that it’s better than wOBA?

Not necessarily. We know that EqA works pretty darn well on the team level, but we can’t be so sure about the individual level. After all, OPS works pretty well on the team level despite its ugly weighting. If the weights are distorted, then it’s not wise to use it on individuals. The raw form of EqA is calculated through this formula:

$$\text{Raw EqA} = \frac{\text{H} + \text{TB} + 1.5 \times (\text{BB} + \text{HBP}) + \text{SB}}{\text{AB} + \text{BB} + \text{HBP} + \text{CS} + \text{SB}/3}$$
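The raw EqA formula is straightforward to compute. This sketch stops at the raw ratio: the further conversion Baseball Prospectus applies to reach the published EqA scale isn't described here, so it isn't reproduced. The sample inputs are hypothetical.

```python
def raw_eqa(h, tb, bb, hbp, sb, ab, cs):
    """(H + TB + 1.5 x (BB + HBP) + SB) / (AB + BB + HBP + CS + SB/3)."""
    numerator = h + tb + 1.5 * (bb + hbp) + sb
    denominator = ab + bb + hbp + cs + sb / 3
    return numerator / denominator


print(round(raw_eqa(h=150, tb=250, bb=60, hbp=5, sb=12, ab=500, cs=3), 3))
```

Notice that hits count twice here (once in H and again in TB), which is why a single ends up worth more than a walk despite the 1.5 coefficient on walks.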

One thing really sticks out to me: a coefficient of 1.5 is placed in front of walks. This suggests that walks won’t be undervalued as badly as they are in OPS. Brandon Heipp’s “plus-one” method, another means of finding linear weight values, gives us the intrinsic weights from EqR (these are from 1990-2005):

EqR = *.347 x TBB + .501 x 1B + .810 x 2B + 1.119 x 3B + 1.428 x HR + .225 x SB – .238 x CS*

The weights seem a bit off, but if we set the value of total walks to a normal linear weight value (around .33 for total walks), we get:

EqR = *.33 x TBB + .484 x 1B + .793 x 2B + 1.102 x 3B + 1.411 x HR + .208 x SB – .221 x CS*

This looks much better. The important thing is that the gaps between the main events are correct: a single is correctly about .15 runs more than a walk, and the gap between successive hit types is exactly .309, which is right around where it ought to be. The run value for CS, obviously, is off.
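The shift and the gaps can be checked quickly. This sketch moves the walk and hit weights down by one constant so that walks land on .33, then confirms that the gaps between events are unchanged. SB and CS are deliberately left out, since the text treats the CS value separately.

```python
EQR_WEIGHTS = {"TBB": 0.347, "1B": 0.501, "2B": 0.810, "3B": 1.119, "HR": 1.428}
ORDER = ["TBB", "1B", "2B", "3B", "HR"]


def shift_to_walk_value(weights, walk_value=0.33):
    """Subtract one constant from every event so that TBB equals walk_value."""
    delta = weights["TBB"] - walk_value
    return {event: w - delta for event, w in weights.items()}


def gaps(weights):
    """Difference between each event and the one before it in ORDER."""
    return [weights[b] - weights[a] for a, b in zip(ORDER, ORDER[1:])]


shifted = shift_to_walk_value(EQR_WEIGHTS)
print({e: round(w, 3) for e, w in shifted.items()})
print([round(g, 3) for g in gaps(EQR_WEIGHTS)])
```

The shifted singles, doubles, triples, and homers match the adjusted equation above, and the gaps (.154, then .309 three times) are the same before and after the shift.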

I should note, however, that EqA is converted into an absolute runs created formula, whereas traditional linear weights are typically expressed as runs above or below the league average. When we switch to linear weights runs created, the value of a CS changes from around -.44 to around -.27. So in a sense, EqA has it right.

That being said, the caught stealing is being treated as essentially a polar opposite of a SB. This isn’t true. When all is said and done, the weights in EqA may or may not be optimal for individual hitters—but they work all right, except for the odd treatment of CS.

It would appear to be more practical to use wOBA for individual hitters, to avoid any issues with SB/CS or any weighting problems not brought to light in the sample provided. This isn’t to say that it would be wrong to use EqA on individual hitters, though. It ultimately comes down to the preference of the user.

### 5. Concluding thoughts

A quick summary:

1. While the triple slash rates work well as descriptive statistics, they don’t provide all of the information we look for in a performance metric. Each rate individually is limited in its usage, but together they are useful for painting a picture of a player’s offensive profile. OPS is an improvement in that it attempts to provide a single number for a player’s total offensive contributions, but it ultimately fails for individual players because it weights events improperly.

2. One way to identify the correct value of each event is to look at how much it contributes to the run-scoring process via run expectancy charts. This allows us to weight events accurately.

3. We raise the marginal run values above the value of an out, then create a scale and multiply the weights by it. This gives us a rate that resembles on-base percentage (hence the name “Weighted On-Base Average”).

4. Baseball Prospectus’ Equivalent Average is quite similar to wOBA, and might be slightly better at predicting team runs scored. But the intrinsic weights in EqA aren’t as fine-tuned as those in wOBA, so it may be better to use wOBA for individual hitters. This does not, however, mean that it is wrong to use EqA on individuals.

wOBA is great, but it’s certainly not perfect (this goes without saying—no statistic is perfect). I should note that since all outs are created equal in wOBA (aside from CS), there is no distinction made for players that have a propensity to hit into double plays or strike out more than the average player. A strikeout is barely worse than a regular batting out, but grounding into a double play is extremely harmful.

wOBA is a linear weights formula, but a relatively simple one, so a more rigorous lwts equation will most likely yield better results. The important thing to keep in mind is that it is a clear upgrade over the traditional statistics, and that it works exceptionally well for measuring an individual hitter’s rate of production.

*” … Going wubba wubba wubba is the thing to do / every time you wubba us, we’ll wubba you …”*

**References & Resources**

*The conversion of rates to runs is (2 x Rate / League Rate – 1) x (LgRuns/LgPA) x PA. I used 1.5 x – 1 for SLG instead, at the suggestion of Patriot in a conversation we had a while back. I held the league rate and runs per plate appearance constant, with .259 for AVG, .327 for OBP, .396 for SLG, and .723 for OPS. The point isn’t to give a precise figure; rather, I wanted to demonstrate the discrepancy in accuracy. **Please do not use this test as gospel for accuracy.** Thank you.
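The general rate-to-runs conversion described in this note can be sketched as below. Only the standard 2 x form is shown; the SLG variant mentioned above isn't fully specified, so it is not reproduced. The inputs are placeholders.

```python
def rate_to_runs(rate, league_rate, lg_runs_per_pa, pa):
    """(2 x Rate / League Rate - 1) x (LgRuns / LgPA) x PA."""
    return (2 * rate / league_rate - 1) * lg_runs_per_pa * pa


# Placeholder team: .340 OBP in a .327 league, 0.115 league runs per PA, 6,200 PA.
print(round(rate_to_runs(0.340, 0.327, 0.115, 6200), 1))
```

A team exactly at the league rate gets league runs per PA times its own PA, and each point of rate above league average adds runs linearly.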

I’d like to thank Patriot for his outstanding (and enlightening) work on EqR, and I’d like to extend a **huge** thank you to Tom Tango—I don’t know where I’d be without his help.

Great write up on wOBA, JT!

Excellent article.

There was a discussion in The Book blog a while back that utilized a simplified version of wOBA, which I think is a little more accessible. (It does not include stealing.) The formula is:

.36*(2*(NIBB+HBP)+1.5*H+TB)/(AB+NIBB+HBP)

You can simplify further by ignoring IBBs and HBPs, in which case the formula is:

.36*(2*BB+1.5*H+TB)/(AB+BB)
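These simplified formulas are easy to turn into code, and doing so exposes the implied weights: a single works out to .36 x 2.5 = .90, a home run to .36 x 5.5 = 1.98, and a walk to .36 x 2 = .72. The function names are hypothetical.

```python
def simple_woba(ab, h, tb, nibb, hbp):
    """.36 x (2 x (NIBB + HBP) + 1.5 x H + TB) / (AB + NIBB + HBP)."""
    return 0.36 * (2 * (nibb + hbp) + 1.5 * h + tb) / (ab + nibb + hbp)


def simpler_woba(ab, h, tb, bb):
    """.36 x (2 x BB + 1.5 x H + TB) / (AB + BB), ignoring IBB and HBP."""
    return 0.36 * (2 * bb + 1.5 * h + tb) / (ab + bb)


# Implied per-event weights: one event in a one-PA "line".
print(round(simple_woba(ab=1, h=1, tb=1, nibb=0, hbp=0), 2))  # single
print(round(simple_woba(ab=1, h=1, tb=4, nibb=0, hbp=0), 2))  # home run
print(round(simple_woba(ab=0, h=0, tb=0, nibb=1, hbp=0), 2))  # walk
```

Those implied weights sit close to the full 2009 coefficients, which is why the shortcut works well for quick estimates.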

Thanks for the kind words, guys!

Craig, I’ve never seen that simplified version before. Thanks for sharing!

If I remember right, another approximation is written along the lines of (1.8*OBP + SLG)/3. It’s the same as Aaron Gleeman’s GPA, but we divide by 3 rather than 4.

Also, Kincaid at 3-D Baseball wrote an equation using straight OBP and SLG as well. You can find the article (which is a fantastic read) here:

http://www.3-dbaseball.net/2009/11/converting-obpslg-to-woba.html

It’s pretty cool that we can get some good approximations using simpler methods.

Question: If wOBA is scaled to AL and NL values, then is a NL player’s wOBA expected to change upon a move to the AL?

Evan,

Let’s use Albert Pujols as an example. If we apply the coefficients for both leagues to his batting line, we get .445 for the AL and .452 for the NL.

IOW, a straight translation (assuming no difference in the counting stats) will give a different wOBA because of the differences in the run environment.

I hope this answers your question!

It does, thank you. One more: do the coefficients change over time, depending on the year, or will the coefficients remain the same in perpetuity, even if run environments change?

Oh, by the way, I think I see a typo. “It’s important to remember that since we are excluding NIBB from the numerator…”—shouldn’t it be “excluding IBB”?

Good question!

The coefficients are going to vary from year to year. It all depends on the run environment. Let’s say we’re calculating wOBA for the 1968 season. We get these coefficients:

NIBB: .68

HBP: .72

1B: .91

2B: 1.35

3B: 1.74

HR: 2.34

SB: .29

CS: -.48

Compare that to the present era (i.e. 2000-2009), and there’s quite a big difference between some of the events. You can get a better idea of how the coefficients vary over time with Tango’s table:

http://tangotiger.net/bdb/lwts_woba_for_bdb.txt

It’s from 1871-2008 using the method outlined above.

Thanks for pointing out the typo! It’s been fixed.

Kincaid hit the nail on the head. Thanks for the assist!

thx for the help guys.