## Seeing the Complete Picture: Building New Statistics to Find Value in the Details

Attempting to accurately estimate the number of runs produced by players is one of the most important tasks in sabermetrics. While there is value in knowing that a player averages four hits every ten at-bats, that value comes from knowing that more hits tend to lead to more runs. On-base percentage became popularized through *Moneyball* in the early 2000s because the Oakland Athletics, among other teams, realized that getting more runners on base would lead to more opportunities to score runs.

Knowing a player’s batting average or on-base percentage can be informative, but that information does nothing to quantify how the player contributed to a team’s ability to score runs. The classic method for determining how many runs a player contributes to his team is to look at his RBI and runs scored totals. However, both of those statistics are extremely dependent on timely hitting and the quality of the rest of the team. A player will not score many runs nor have many RBI opportunities if the rest of the players on his team, particularly the players around him in the lineup, are not productive.

One of the more popular sabermetric methods to estimate a player’s run production is to find the average number of runs that certain offensive events are worth across all situations and then apply those weights to a player’s stat line. In this way, it doesn’t matter if a player comes to the plate with the bases loaded every time or the bases empty every time, just that he produced the specific type of event.

Here is a chart that shows the average number of runs that scored in an inning following each combination of base and out states in 2013^^.

| Base State | 0 OUT | 1 OUT | 2 OUT |
| --- | --- | --- | --- |
| 0** | 0.47 | 0.24 | 0.09 |
| 1 | 0.82 | 0.50 | 0.21 |
| 2 | 1.09 | 0.62 | 0.30 |
| 3 | 1.30 | 0.92 | 0.34 |
| 1-2 | 1.39 | 0.84 | 0.41 |
| 1-3 | 1.80 | 1.11 | 0.46 |
| 2-3 | 2.00 | 1.39 | 0.56 |
| 1-2-3 | 2.21 | 1.57 | 0.71 |

We can see in the chart that in 2013, with no men on base and zero outs, teams scored an average of 0.47 runs through the end of the inning. If a batter came to the plate in that situation and hit a single, the new base/out state would be a man on first with zero outs, a state in which teams scored an average of 0.82 runs through the end of the inning. If the batter had instead caused an out, the new base/out state would have become bases empty with one out, a state in which teams only averaged 0.24 runs through the remainder of the inning. Consequently, we can say that a single in that situation was worth 0.58 runs (0.82 minus 0.24) in relation to the value of an out in the same situation. If we repeat this process for every single in 2013, applying the averages from the chart to each one depending on when it occurred, we find that an average single in 2013 was worth approximately 0.70 runs in relation to the average value of an out.

This is known as the linear weights method for calculating the context-neutral value of certain events. Check this article from the FanGraphs Library, and the links within, for more information on linear weights estimation methods.
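The arithmetic in the single-versus-out example can be sketched in a few lines of Python. This is only an illustration of the idea, not the code used to build the statistic: the table values are copied from the 2013 chart, while the names `RUN_EXPECTANCY`, `run_expectancy`, and `event_value` are made up for this sketch.

```python
# Run expectancy by base state, indexed by outs (0, 1, 2), from the 2013 chart.
RUN_EXPECTANCY = {
    "0":     (0.47, 0.24, 0.09),
    "1":     (0.82, 0.50, 0.21),
    "2":     (1.09, 0.62, 0.30),
    "3":     (1.30, 0.92, 0.34),
    "1-2":   (1.39, 0.84, 0.41),
    "1-3":   (1.80, 1.11, 0.46),
    "2-3":   (2.00, 1.39, 0.56),
    "1-2-3": (2.21, 1.57, 0.71),
}


def run_expectancy(bases, outs):
    """Average runs through the end of the inning; zero once the third out is made."""
    if outs >= 3:
        return 0.0
    return RUN_EXPECTANCY[bases][outs]


def event_value(start, end, runs_scored=0):
    """Change in run expectancy caused by an event, plus any runs scored on the play."""
    return run_expectancy(*end) - run_expectancy(*start) + runs_scored


start = ("0", 0)                         # bases empty, zero outs
single = event_value(start, ("1", 0))    # 0.82 - 0.47
out = event_value(start, ("0", 1))       # 0.24 - 0.47
print(round(single - out, 2))            # 0.58, the single relative to an out
```

Averaging `event_value` over every occurrence of an event type in a season, rather than over one situation, is what produces a context-neutral weight such as the roughly 0.70 runs for an average single.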

There have been a variety of statistics created over the last few decades to estimate a player’s performance in a context-neutral environment using the linear weights method. Recently, one of the more popular linear weights run estimators, particularly here at FanGraphs, has been *weighted On-Base Average* (wOBA), introduced in *The Book: Playing the Percentages in Baseball*. wOBA is arguably the best publicly available run estimator, but I think it has potential for improvement by incorporating more specific and different kinds of events into its estimate.

wOBA is traditionally built with seven statistics: singles, doubles, triples, home runs, reaches on error, unintentional walks, and hit by pitches. While some versions may exclude reaches on error and others may include components like stolen bases and caught stealing, I will focus exclusively on the version presented in *The Book* that uses those seven statistics. By limiting the focus to just those seven components, wOBA can be calculated perfectly in every season since at least 1974 (as far back as most play-by-play data goes), and can be calculated reasonably well for the entire history of the game.

While it can be informative to see what Babe Ruth’s wOBA was in 1927, when analyzing players in recent history, particularly those currently playing, accuracy in the estimation should be the most important consideration. Narrowing the focus to just seven statistics, some broadly defined, will limit how accurately we can estimate the number of runs a player produced in a context-neutral environment. The statistics I refer to as “broadly defined” are singles and doubles. I say that because it is a relatively easy task to convince even a casual baseball fan that not all singles are created equally.

If we compare singles hit to the infield with singles hit to the outfield, we’ll notice that outfield singles cause runners on base to move farther ahead on the basepaths, on average, than infield singles. For example, in 2013, with a man on first, only 3.2% of infield singles ended with men on first and third base compared to 29.9% of outfield singles. If outfield singles create more “1-3” base states than infield singles, and we know from the chart above that “1-3” base states have a higher run expectancy than “1-2” base states in the same out state, then we know that outfield singles are producing more runs on average than infield singles. If outfield and infield singles are producing different amounts of runs on average, then we should differentiate between the two events.
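To get a feel for the size of that gap, here is a deliberately simplified Python sketch. It is not how the actual weights are computed: it assumes zero outs, and it assumes every single that does not end with men on first and third ends with men on first and second, with no runs scoring and no runners thrown out. Only the two percentages and the two run expectancy values come from the article; the function name is hypothetical.

```python
# Run expectancy with zero outs, from the 2013 chart.
RE_FIRST_AND_SECOND = 1.39   # "1-2" base state
RE_FIRST_AND_THIRD = 1.80    # "1-3" base state


def expected_re_after_single(share_first_and_third):
    """Expected run expectancy after a single with a man on first.

    Simplifying assumption: any single that does not produce a "1-3"
    state produces a "1-2" state instead.
    """
    share_first_and_second = 1.0 - share_first_and_third
    return (share_first_and_third * RE_FIRST_AND_THIRD
            + share_first_and_second * RE_FIRST_AND_SECOND)


infield = expected_re_after_single(0.032)    # 3.2% of infield singles
outfield = expected_re_after_single(0.299)   # 29.9% of outfield singles
gap = outfield - infield
print(round(gap, 2))                         # about 0.11 runs under these assumptions
```

The real difference depends on every base/out state and every possible outcome of the play, so this number should be read only as an indication that the two kinds of singles are not interchangeable.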

Beyond just breaking down hits by fielding location, we can refine hit types even further. If we differentiate singles and doubles by direction (left, center, right) and by batted ball type (bunt, groundball, line drive, fly ball, pop up) we can more accurately reflect the value of each of those offensive events. While the difference in value between a groundball single to right field compared to a line drive single to center field is minimal, about 0.04 runs, those minimal differences add up over a season or career of plate appearances. Reach on error events should also be broken down like singles and doubles, as balls hit to the third baseman that cause errors are going to have a different effect on the base state than balls hit to the right fielder that cause errors.

The two other ways that wOBA accounts for run production by a batter are through unintentional walks and hit by pitches, notably excluding intentional walks. If a statistic is attempting to estimate the number of runs produced by a player at the plate, I believe the value created by unskilled events should also be counted. While it takes no skill to stand next to home plate and watch four balls go three feet wide of the strike zone, the batter is still given first base and affects his team’s run expectancy for the remainder of the inning. Distinguishing between runs produced from skilled and unskilled events is something that should be considered when forecasting future performance as unskilled events may be harder to repeat. However, when analyzing past performance, all run production should be accounted for, no matter the skill level it required to produce those runs.

There is an argument that the value from an intentional walk should just be assigned to the batting team as a whole, as the batter himself is doing nothing to cause the event to occur; that is, the batter is not swinging the bat, getting hit by a pitch, or astutely taking balls that could potentially be strikes. However, the players on the field are the only ones who directly affect run production (it isn’t an abstract “ghost runner” on first base after an intentional walk, it’s the batter), so the value from the change in run expectancy must be awarded to players on the field. While it can be difficult to determine how to award that value for the pitching team with multiple fielders involved in every event (pitcher and catcher most notably, and the rest of the fielders for balls put into play), the only player on the batting team who can receive credit for the event is the batter.

If we accept that the intentional walk requires no skill from the batter, but agree that he should still receive credit for the event, then we can extend that logic to all unskilled events in which the batter could be involved. Along with intentional walks, that would include “reaching on catcher’s interference” and “striking out but reaching on an error, passed ball, or wild pitch.” In those cases, it is the catcher rather than the pitcher causing the batter to reach base but it doesn’t matter to the batting team. If the team’s run expectancy changed due to the batter reaching base, it makes no difference if it was the pitcher, catcher, or any other fielder causing the event to occur.

When building wOBA, the value of the weight for each component is calculated with respect to the value of an average out, like in the example above. Using the average value of all outs is very similar to using the broad definition of “single,” as discussed earlier. Very often we hear about productive outs, and yet we rarely see statistics quantify the value of different types of outs in a context-neutral manner. If a batter were to exclusively make all of his outs as groundballs to the right side of the infield, he would hurt his team less than if he were to make all of his outs as groundballs to the center of the infield. Groundouts to the right side of the infield allow runners on second and third base to advance more easily than groundouts to the center of the infield. Additionally, groundouts to the center of the infield have more potential to turn into double plays than groundouts to the right side of the infield. As above, the differences in value are minimal — around 0.04 runs in this case — but they add up over a large enough sample.

To deal with the difference in the value of outs, all specific types of outs should also be included in any run estimation, weighted in relation to the average value of an out. For instance, in 2013 the average value of all outs in relation to the average value of a plate appearance was -0.258 runs while the average value of a fly out to center field in relation to the average value of a plate appearance was -0.230 runs. Consequently, we can say that a fly out to center field is worth +0.028 runs in relation to the average value of an out. We can do the same for groundouts to the left side of the infield (-0.015) or lineouts to center field (+0.021), as well as every other type of out broken down by direction, batted ball type, and fielding location. Interestingly, and perhaps not surprisingly, all fly outs and lineouts to the outfield are less damaging than an average out while all types of outs in the infield are more damaging than an average out, except for groundouts to the right side of the infield and sacrifice bunts.
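The re-centering described above is a single subtraction, sketched here in Python. The two run values are the 2013 figures quoted in the text; the function and constant names are invented for this illustration.

```python
# 2013 average value of all outs, relative to an average plate appearance.
AVERAGE_OUT = -0.258


def relative_to_average_out(value_vs_plate_appearance, average_out=AVERAGE_OUT):
    """Re-express an out's run value relative to the average out."""
    return value_vs_plate_appearance - average_out


fly_out_center = relative_to_average_out(-0.230)
print(f"{fly_out_center:+.3f}")   # +0.028
```

A positive result means the out is less damaging than an average out, and a negative result means it is more damaging.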

Taking the weights for each of these 104 components, applying them to the equivalent statistics for a league average hitter, and dividing by plate appearances, generates values that tend to fall between .280 and .300 based on the scoring environment, somewhat similar to the batting average for a league average player. In 2013, a league average player would have a score of .256 from this statistic compared to a batting average of .253. To make the statistic easily relatable in the baseball universe, I’ve chosen to scale the values in each season to batting average. The end result is a statistic called *Offensive Value Added rate* (OVAr) which has an average value equal to that of the batting average of a league average player in each season. So, if a .400 batting average is an historic threshold for batters, the same threshold can be applied to OVAr. Since 1993, as far back as this statistic can be calculated with current data, Barry Bonds is the only qualified player to post an OVAr above .400 in a single season, and he did it in four straight seasons (2001-2004).

Where OVAr mirrors the construction of the rate statistic wOBA, another statistic, *Offensive Value Added* (OVA), mirrors the construction of the counting statistic *weighted Runs Above Average* (wRAA). Here is the equation for OVA followed by the equation for wRAA.

OVA = ((OVAr – league OVAr) / OVAr Scale) x PA

wRAA = ((wOBA – league wOBA) / wOBA Scale) x PA
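Because the two equations share the same shape, one small Python function can stand in for both. This is a sketch only: the function name is made up, and the inputs in the usage example are hypothetical placeholders (apart from the .253 league figure mentioned earlier), not actual OVAr scale values.

```python
def runs_above_average(rate, league_rate, scale, plate_appearances):
    """Convert a scaled rate stat (OVAr or wOBA) into runs above average.

    Mirrors ((rate - league rate) / scale) x PA from the equations above.
    """
    return (rate - league_rate) / scale * plate_appearances


# Hypothetical hitter: a .300 rate in a .253 league over 600 plate appearances,
# with a placeholder scale of 1.0 (the real scale varies by season).
example = runs_above_average(0.300, 0.253, 1.0, 600)
print(round(example, 1))
```

A league average hitter comes out at exactly zero regardless of playing time, which is what makes the result a "runs above average" figure.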

OVA values tend to be very similar to their wRAA counterparts, though they can potentially vary by over 10 runs at the extremes. In 2013, David Ortiz produced 48.1 runs above average according to OVA and “just” 40.3 runs above average according to wRAA, a 19.4% increase from his wRAA value. Of Ortiz’s extra 7.8 runs estimated by OVA, 4.3 of those runs came from the inclusion of intentional walks, and 2.5 of those runs came from Ortiz’s ability to produce slightly less damaging outs through his tendency to pull the ball to the right side of the field.

You won’t find many box scores or player pages that list direction, batted ball type, or whether the ball was fielded in the infield or outfield, but the data is publicly available for all seasons since 1993. While wOBA gives non-programmers the ability to calculate an advanced run estimator relatively easily, if we have data that makes the estimation more precise, then programmers should take advantage. Due to the relative difficulty in calculating these values, I’m providing links to spreadsheets with yearly OVAr and OVA values for hitters, Opponent OVAr and OVA values for pitchers, splits for hitters and pitchers based on handedness of the opposing player, and team OVA and OVAr values for offense and defense, with similar splits. Additionally, I’ve included wRAA values for comparison. Those values may not exactly match those you would find on FanGraphs due to rounding differences at various steps in the process, but they should give a general feel for the difference between OVA and wRAA.

I’ve obviously omitted the meat of the programming work, as I felt it was too technical to include every detail in an article like this. For more information on run estimators built with linear weights methodology I’d highly recommend reading *The Book*, *The Hidden Game of Baseball* by John Thorn and Pete Palmer, or any of a variety of articles by Colin Wyers over at Baseball Prospectus, like this one. I used ten years of play-by-play data to get a substantive sample^{++} of when each type of event happened on average, and I used a single season of data to create the run environments. Otherwise, the general construction of OVAr mirrors the work done by Tom Tango, Mitchel Lichtman, and Andrew Dolphin in *The Book*.

The next step for this statistic is to make it league and park neutral (nOVAr and nOVA). I chose to omit this step for the initial construction of these statistics as it was also omitted in the initial construction of wOBA and wRAA. Also, the current methods to determine park factors used by FanGraphs and ESPN, among other sites, are somewhat flawed and not something I want to implement. Until that next step, enjoy a pair of new statistics.

OVAr and OVA, Alphabetical Batters

OVAr and OVA, Ordered Batter Splits

OVAr and OVA, Alphabetical Batter Splits

OVAr, Ordered Qualified Batters

OVAr, Ordered Qualified Batter Splits

Opponent OVAr and OVA, Ordered Pitchers

Opponent OVAr and OVA, Alphabetical Pitchers

Opponent OVAr and OVA, Ordered Pitcher Splits

Opponent OVAr and OVA, Alphabetical Pitcher Splits

Opponent OVAr, Ordered Qualified Pitchers

Opponent OVAr, Ordered Qualified Pitcher Splits

OVAr and OVA, Alphabetical Weights

^^ These averages exclude all events in home halves of the 9th inning or later to avoid biases created by walk-off hits and by the home team’s inability to score an unlimited number of runs in the 9th inning or later, as it can in any other inning.

** A number in the Base State column represents a runner on that base, with 0 representing bases empty.

++ I have one note on sample size that I didn’t think fit anywhere comfortably in the main body of the article. The biggest issue with a statistic built with very specific events is that some of those events are extremely rare. For instance, groundouts to the outfield have happened just 111 times since 1993, compared to groundouts to the infield that have happened 891,175 times since 1993. Consequently, the average value of outfield groundouts, split up by direction, can vary substantially from year to year as different events are added or taken away from the sample. I chose to use a ten-year sample to attempt to limit those effects as much as possible, but they will still be evident upon close examination. With that sample size, in 2013 a groundout to left field was worth -0.447 runs in relation to the average value of an out. In 2006 the same event was worth -0.089 runs, while in 2000 it was worth +0.154 runs.

As long as the statistic is built in a logically consistent manner, I don’t mind that low frequency events like outfield groundouts and infield doubles vary somewhat from year to year in estimated value, as the cumulative effect will be quite minimal. That being said, as I am trying to estimate the value of events as accurately as possible, the variation in value is a bit off-putting. It may be that a sample of 20 or more years would be necessary for those rare events, with a smaller sample size for the more common events. That adjustment will be considered for the nOVAr and nOVA implementations, but for OVAr and OVA I wanted the construction to be completely consistent.


This is really fantastic work and I’d love to see more from you.

Cameron and Appelman, hire this man now before someone else does.

And OVA should be added to the site post haste

Be careful with your intentional walk rationale. I am thinking of the credit that should be given to a player who “earns” an intentional walk because he is a good hitter and the other team decides to walk him because of his skill at the plate.

Otherwise, I think this is great work. If you can quantify a value more specifically than traditional wOBA, then do it, as you have done. Specifically I really liked your breakdown of infield vs. outfield singles and their differing values, as well as the incorporation of the different values of outs.

Thanks also for sharing the spreadsheets. I will enjoy going through them.

For me, it doesn’t matter why the pitching team chooses to issue an intentional walk only that the batter was given first base. The pitching team’s manager may decide to intentionally walk a batter for a variety of reasons, some of which may be in the best interest of the pitching team when considering the quality of the batter on deck or the specific game context, but the batter still increased his team’s run expectancy in a context-neutral manner. It isn’t the fault of the intentionally walked batter that the next batter in the lineup isn’t good enough to make the pitching team pay, all he can do is take his base.

Thanks for the feedback, though. Enjoy the spreadsheets.

I know there’s no way to remedy this, but it seems to me that either wOBA or OVA are going to overestimate the value of hitters batting 8th in the National League. With the pitcher batting behind them, they draw a lot more walks, both explicitly intentional and unintentional “intentional” walks. As a quick example, Kozma walked once in 77 plate appearances when he wasn’t hitting eighth. Hitting eighth, he walked 33 times in 371 plate appearances. But the value of those walks was much less than average, because the expected runs in an inning when the next batter is a pitcher is much less than average.

Great stuff! I second what MustBunique said. Differentiating between infield and outfield singles is a great idea.

Hmm, it seems odd that you’re seeing so much variation in the value of something as common as an infield ground out to the left side. I can’t help but wonder if that’s due to a bug, or some methodological problem.

It’s not infield groundouts to the left side, which as you said are quite common, it’s OUTFIELD groundouts to the left side, which are quite rare. They generally occur when the left fielder is playing shallow in late game situations, or when a batter hits a rocket to left field and the left fielder is able to throw a runner out at second or third base. Since 1993, there have been a total of 34 groundouts to left field, so we’re not talking about an event that is going to wildly change any estimation, but the variation in value is still bothersome.

Ah, OK, sorry — I read that part a little too quickly. Yeah, a groundout to left field is a total fluke of a play. How does that even happen? A runner trips on his way to third base and gets forced out? Anyway, what’s the point of even factoring in some fluke play like that?

I agree they are definitely flukey, but plenty of plays are flukey, even more common ones. The difference between a triple and an inside-the-park home run is often a ball hitting a wall awkwardly or a fielder falling or something else weird happening.

The goal of the statistic is to ignore skill and just measure the average value of all types of events, even the flukey ones. I believe players should still receive credit for those kinds of plays even if all they did was put the ball in play and allow random forces to cause the fluke.

As I’m not a member over at Tango’s blog, and can’t seem to become one, I figured I’d use this space to address a post he made over there yesterday.

http://tangotiger.com/index.php/site/comments/he-had-me-until-he-said-batting-average#comments

As well as point people towards this discussion from a few months ago.

http://tangotiger.com/index.php/site/comments/are-we-ready-to-move-on-from-batting-average-and-tav#comments

I went back and forth on the scaling issue for quite some time. Initially, OVAr mirrored the wOBA construction completely for ease of comparison between the statistics, so OVAr was scaled to on-base percentage. Then I began to consider not scaling OVAr at all, as the unscaled version actually reflects the estimated number of runs above average produced per plate appearance, which is what we’re trying to get in the first place. However, it was a comment by MGL in that second article above that ended up swaying me towards batting average.

It is far easier for me to immediately recognize a good, bad, or average batting average than on-base percentage. I know what .200, .300, and .400 batting averages mean for hitters without a second thought. For players in the last few decades, when I see someone hit .300 in a season, I immediately know that he had a strong, above-average year. A .250 mark is mediocre in general, and below .200, the dreaded Mendoza line, is just plain bad. I know that Ty Cobb hit .400 three times in his career, that Ted Williams was the last to do it, and that it’s a huge deal when anyone is close these days, which never happens. So, when I see that Barry Bonds is the only player to have a .400 OVAr in a season since 1993, and he did it in four straight seasons, I immediately know he did something really remarkable.

I believe that the determining factor when choosing a scale should be to choose the scale that most successfully facilitates interpretation and understanding of the raw data. As the language of offense has been in terms of batting average for so long, it only makes sense that a modern statistic trying to communicate offensive value would speak in that language as well. In my opinion, the batting average scale is the best choice.

I think the problem with saying we “immediately recognize a good, bad, or average batting average” is that that intuition doesn’t actually align with offensive value, which takes us all the way back to arguments about the value of Joe Morgan or Kevin Youkilis-as-draft-prospect.

Given all the great data you’re using with the intention of moving the conversation forward, it just doesn’t make sense to me to drop in a scaling variable at the end to revisit batting average.

“Then I began to consider not scaling OVAr at all, as the unscaled version actually reflects the estimated number of runs above average produced per plate appearance, which is what we’re trying to get in the first place.”

I really think this is the place to go. My one problem with wOBA is that the number doesn’t mean anything by itself. Why scale something when the number actually represents something? If a .356 OVA means that the player averaged 0.356 runs above average per plate appearance, let’s just leave it at that.

Have you analyzed what sort of hitters are likely to overachieve or underachieve in OVA as compared to RAA? Just browsing the list it seems like the guys that hit the ball really hard are more likely to overachieve.

I have not done a rigorous analysis yet of the statistics, but here are a few general trends I’ve seen while producing the data, all of which are pretty intuitive.

Hitting the ball hard and in the air definitely leads to the most positive events. Flyball and line drive hits and outs are almost always worth more than their groundball equivalents. The tradeoff is that flyballs tend to be outs more often than groundballs, so it can be more favorable for some batters to hit the ball into the ground if it boosts their batting average on balls in play significantly enough.

Flyballs hit to center field tend to be the most valuable type of hit and out, presumably because that is where the park is the largest. While flyouts to right field are less damaging than flyouts to left field, it’s the opposite for lineouts. I have yet to really get a grasp for why left field lineouts are less damaging than right field lineouts, but it is a consistent trend. Perhaps the generally inferior arms of left fielders make it easier for baserunners to advance.

The best direction to hit groundballs is the right side, for the reasons discussed in the article. Consequently, left-handed pull hitters tend to produce more valuable events when putting the ball into play than right-handed pull hitters. If two opposite-handed, heavy-pull hitters had the same traditional stat line, the left-handed player would likely have produced more runs above average than the right-handed player.

Brilliant stuff Colin. While I’m not 100% on board with the intentional walk methodology, the rest of the ideas concerning the different value of different batted ball types is something that definitely needs to be considered more in player evaluation.

Being able to quantify it into a stat makes it even that much better.