The Intrinsic Value of a Batted Ball

It isn't surprising to see Miguel Cabrera's Intrinsic Value is among the best in baseball. (via Cbl62)

Back in 2014, the Yankees sponsored a competition for their fifth-starter role in spring training. Following a spirited audition, left-hander Vidal Nuno was awarded the consolation prize of a spot in the major league bullpen. But five starters is never enough. The Yankees were rained out on Jackie Robinson Day, which led to a doubleheader the next day and the need for a sixth starter five days later.

On April 20, Nuno got the nod in Tampa and capitalized by striking out six batters in five scoreless innings before watching the New York bullpen surrender his lead in the seventh. But even a casual fan with a box score, perhaps armed with an education delivered by Brian Kenny’s relentless anti-win crusade, could look past the no-decision and conclude the guy had pitched pretty well.

The still-winless Nuno got another start against the Rays on May 2 back in New York. This time he took a 2-1 lead into the fourth and, after retiring the leadoff man, appeared to get a quick second out when Evan Longoria hit a first-pitch curveball high in the air to center field. But Jacoby Ellsbury “lost it in the sky,” as he described it, and the ball landed for a triple. Nuno’s stat line absorbed two earned runs in the inning, but neither would have scored had his center fielder caught the fly ball. Nuno’s first win would have to wait. And this time there was not a trace of forensic evidence in the box score suggesting he deserved better.

Perhaps we need to dig deeper to get things right. Technologies like HITf/x and Statcast characterize a batted ball at contact, which provides the opportunity to separate a batted ball’s intrinsic value from its outcome. In the process, we remove the effect of factors such as the defense, the weather, the ballpark and random luck. As a result, we can define batted-ball statistics for batters and pitchers that are less subject to random variation than statistics that are based on batted-ball outcomes.

The HITf/x System

HITf/x is a system developed by Sportvision that estimates the initial trajectory of a batted ball in three dimensions using the same video sequences acquired by PITCHf/x. Each batted ball is described by three parameters. The speed, s, is an estimate of the ball’s initial speed in miles per hour. The vertical launch angle, v, is the angle the batted ball’s initial direction makes with the plane of the playing field where a vertical angle of -90 is straight down and a vertical angle of +90 is straight up.

The horizontal (or spray) angle, h, specifies the direction of the batted ball in the plane of the playing field where the direction toward third base has a horizontal angle of -45 and the direction toward first base has a horizontal angle of +45.

The result of a batted ball has a strong dependence on the (s,v,h) vector. For example, a vector of (75, 70, 35) typically is a pop-up to the first baseman, a vector of (60, -10, -15) typically is a ground ball to shortstop, and a vector of (100, 25, -35) usually results in a home run to left field. Early HITf/x studies by researchers including Peter Jensen, Brian Cartwright and Mike Fast demonstrated that HITf/x measurements provide significant advantages for analysis over previous data that included only a ground ball, line drive or fly ball descriptor for each batted ball.

Learning the Value of a Batted Ball

Sportvision generously provided data acquired by the HITf/x system for every regular-season major league game in 2014. Using these data, I generated a model for the intrinsic value of a batted ball as a function of its s, v and h parameters. The model was constructed using the data for all balls in play with a horizontal angle in fair territory that were tracked by the system, where bunts are excluded. This results in a set of 124,364 batted balls, which represents more than 97 percent of the major league total for 2014.

A Bayesian framework was used to derive the mapping from batted-ball parameters to intrinsic value. If we consider the six batted-ball outcomes R_0=out, R_1=single, R_2=double, R_3=triple, R_4=home run, and R_5=batter reaches on error, then Bayes’ rule states that the probability of an outcome R_j given a measured HITf/x vector x=(s,v,h) is:

P(R_j | x) = p(x | R_j) P(R_j) / p(x)

where p(x | R_j) is the conditional probability density function for x given outcome R_j, P(R_j) is the prior probability of outcome R_j, and p(x) is the probability density function for x. For example, P(R_1 | (90,15,0)) is the probability of observing a single for a ball hit up the middle at 90 miles per hour with a vertical launch angle of 15 degrees.

Bayes’ rule is important because we can represent some of our favorite baseball statistics as a weighted sum of the P(R_j | x) values using:

S(x) = w_0 P(R_0 | x) + w_1 P(R_1 | x) + w_2 P(R_2 | x) + w_3 P(R_3 | x) + w_4 P(R_4 | x) + w_5 P(R_5 | x)

If we treat sacrifice flies as ordinary outs, then a weight vector of (w_0,w_1,w_2,w_3,w_4,w_5) = (0,1,1,1,1,0) turns S(x) into the expected batting average for a batted ball with parameter vector x. If we instead use the vector (0,1,2,3,4,0), then we get slugging percentage. The key point is that we have a way to estimate a batted ball’s value that is separate from the batted ball’s particular outcome.

As we know, batting average and slugging percentage are deficient for describing the value of a batted ball. Weighted on-base average (wOBA), on the other hand, also can be represented in the form of S(x), but it uses coefficients derived from the average run value of each event. Therefore, we define the intrinsic value of a batted ball as wOBA(x) using the weight vector (0.000, 0.892, 1.283, 1.635, 2.135, 0.920), where w_0, w_1, w_2, w_3, and w_4 for 2014 were obtained from the FanGraphs guts page, and w_5 was obtained from The Book. Thus, wOBA(x) is proportional to the expected run value of a batted ball given only what occurs at contact.
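The weighted sum S(x) can be sketched in a few lines. The three weight vectors below are the ones quoted in the text for batting average, slugging percentage and wOBA. The posterior probabilities are hypothetical and stand in for the output of the model at one vector x, and the function name `expected_stat` is likewise an illustrative choice.

```python
# Weight vectors (w_0, ..., w_5) for out, single, double, triple,
# home run, and reached on error, as given in the text.
WEIGHTS = {
    "batting_average": (0, 1, 1, 1, 1, 0),
    "slugging": (0, 1, 2, 3, 4, 0),
    "wOBA": (0.000, 0.892, 1.283, 1.635, 2.135, 0.920),
}


def expected_stat(posterior, weights):
    """Compute S(x) = sum_j w_j * P(R_j | x) for one batted ball."""
    if len(posterior) != len(weights):
        raise ValueError("posterior and weights must have the same length")
    return sum(w * p for w, p in zip(weights, posterior))


# Hypothetical posterior P(R_j | x) for a single batted ball.
post = (0.50, 0.30, 0.10, 0.02, 0.07, 0.01)

values = {name: expected_stat(post, w) for name, w in WEIGHTS.items()}
for name, value in values.items():
    print(f"{name}: {value:.3f}")
```

The same posterior feeds all three statistics; only the weights change, which is why the choice of weight vector fully determines what "value" means.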

For this process to work, we still need to find the functions on the right side of Bayes’ rule. This is accomplished by applying machine learning techniques to the HITf/x data. The details of the approach are somewhat technical, but the curious reader is encouraged to follow the link. The Cliffs Notes version is that nonparametric estimates for the probability density functions are generated using a kernel method that employs cross-validation to learn an optimal set of anisotropic smoothing parameters.
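To make the kernel idea concrete, here is a rough sketch of a nonparametric density estimate with one smoothing parameter per dimension (the anisotropic part), along with a leave-one-out likelihood score for choosing among candidate bandwidths. This is not the author's implementation: the Gaussian product kernel, the small candidate grid, the function names and the sample vectors are all assumptions made for illustration.

```python
import math


def gaussian_product_kde(x, samples, bandwidths):
    """Estimate the density at x with a product Gaussian kernel.

    Each dimension of (s, v, h) gets its own bandwidth, so the
    smoothing is anisotropic.
    """
    norm = 1.0
    for b in bandwidths:
        norm *= b * math.sqrt(2.0 * math.pi)
    total = 0.0
    for sample in samples:
        z2 = sum(((xi - si) / b) ** 2
                 for xi, si, b in zip(x, sample, bandwidths))
        total += math.exp(-0.5 * z2)
    return total / (len(samples) * norm)


def loo_log_likelihood(samples, bandwidths):
    """Leave-one-out log-likelihood: score each point with the others."""
    score = 0.0
    for i, x in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        density = gaussian_product_kde(x, rest, bandwidths)
        score += math.log(max(density, 1e-300))  # guard against log(0)
    return score


def pick_bandwidths(samples, candidates):
    """Return the candidate bandwidth vector with the best LOO score."""
    return max(candidates, key=lambda b: loo_log_likelihood(samples, b))


# Hypothetical (s, v, h) vectors for balls that became singles.
singles = [(88.0, 10.0, -5.0), (92.0, 14.0, 8.0), (85.0, 6.0, 20.0),
           (95.0, 12.0, -18.0), (90.0, 9.0, 2.0)]
# Hypothetical candidate bandwidths for (s, v, h).
candidates = [(2.0, 2.0, 5.0), (5.0, 4.0, 10.0), (10.0, 8.0, 20.0)]

best = pick_bandwidths(singles, candidates)
density = gaussian_product_kde((90.0, 15.0, 0.0), singles, best)
print(best, density)
```

Running the same estimate separately on the batted balls for each outcome would give the p(x | R_j) terms that Bayes' rule needs. A real data set of this size would call for a vectorized or tree-based implementation rather than these plain loops.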

Visualizing wOBA

Now that the math is out of the way, we can enjoy the fruits of our labor. Since wOBA depends on the three variables — s, v, and h — we can visualize its structure by taking lower-dimensional slices through the wOBA cube. If we fix the batted-ball speed at 93 miles per hour, for example, we get a wOBA plane that depends on the other two variables:

[Figure: heat map of wOBA as a function of vertical angle v and horizontal angle h for s = 93 mph]
For this value of s, the best results for batters occur for balls hit with vertical angles between 25 and 40 degrees that are near the left-field line (-45° to -35° in h) or the right-field line (35° to 45° in h) where ballpark dimensions are typically the shortest. These batted balls often result in home runs. Batted balls hit at the same speed with the same vertical angle are less valuable at horizontal angles near zero, which correspond to larger ballpark dimensions in center field. For this initial speed, batted balls with vertical angles near 12 degrees tend to carry over the infielders and land in front of the outfielders and have a high value for all horizontal angles.

Typical horizontal angle positions for the three outfielders are evident from the three cold zones for balls hit in the air with vertical angles between 15 and 20 degrees, and typical horizontal positions for the four infielders are evident from the four cold zones for ground balls (v < 0).

We can also examine one-dimensional slices through the wOBA volume. Let’s look at ground balls with a vertical angle of -2° that are hit at 85 and 93 miles per hour:

[Figure 3: wOBA versus horizontal angle for ground balls (v = -2°) hit at 85 and 93 mph]
Minima in the two curves correspond to the typical position of infielders with the minima near -36, -14, 14, and 37 degrees, corresponding to the third baseman, shortstop, second baseman and first baseman, respectively. Over most horizontal angles, balls hit at 93 mph have a higher value than balls hit at 85 mph since ground balls hit at a higher speed have a higher probability of eluding a defender.

We also can consider balls hit in the air with a vertical angle of +15° at the same speeds:

[Figure 4: wOBA versus horizontal angle for balls hit in the air (v = +15°) at 85 and 93 mph]
Here minima in the two curves correspond to the typical position of outfielders with the minima near -20, zero, and 20 degrees, corresponding to the left fielder, center fielder and right fielder, respectively. For this vertical angle, balls hit in the direction of an outfielder have a higher value for a speed of 85 mph because these balls often fall in front of the outfielder for hits, while balls hit at 93 mph more frequently carry to the outfielder for outs. For both the ground balls and fly balls, the largest wOBA values occur for balls hit near the foul lines (h=-45° or h=45°) which often result in extra-base hits instead of singles.

Significant wOBA differences for the same s, v and h occur between left-handed and right-handed batters due to differences in the positioning of defenders. Thus, we can define separate values, wOBAl for left-handed batters and wOBAr for right-handed batters. Let’s look at the effect of batter handedness on ground balls with a vertical angle of -2° that are hit at 93 miles per hour:

[Figure 5: wOBAl and wOBAr versus horizontal angle for ground balls (v = -2°) hit at 93 mph]
As before, we observe four minima in each curve that correspond to the typical position of the four infielders. We see, however, that the minima for left-handed batters are shifted several degrees toward the first-base line (h=45°) compared to the corresponding minima for right-handed batters. This shift corresponds to the difference in fielder positioning as a function of batter handedness. We also see that ground balls near the first-base line have a higher value for right-handed batters since there is a lower probability of a defender in that region, and that ground balls near the third-base line (h=-45°) have a higher value for left-handed batters.

We also can look at balls hit in the air with a vertical angle of +15° and a speed of 93 miles per hour:

[Figure 6: wOBAl and wOBAr versus horizontal angle for balls hit in the air (v = +15°) at 93 mph]
Once again, the three minima in each curve correspond to the typical positions of outfielders, but the minima are shifted several degrees toward the right-field line (h=45°) for left-handed batters. We also see left-handed batters have an advantage for batted balls hit in the direction of the right fielder (h near 20°) since the right fielder is typically playing deeper for left-handed batters, which allows additional batted balls to fall safely for hits. We observe the opposite effect for batted balls hit in the direction of the left fielder (h near -20°) since the left fielder is typically playing deeper for right-handed batters.

Intrinsic Contact Statistics

A batted ball with the parameters s, v, and h can be assigned the intrinsic value given by either its wOBAl or wOBAr value depending on the handedness of the batter. Batted balls may also be assigned an observed value given by the wOBA coefficient for the result of the batted ball. The observed value depends on several factors that are beyond the control of the batter and the pitcher, such as the defense, the weather, and the ballpark. That high fly ball Ellsbury lost in the sky had a minuscule intrinsic value of 0.040, but it scored an observed value of 1.635 after landing for a triple. We’ve used the intrinsic and observed values to derive statistics that describe batters, pitchers, defense, and park effects. Today we’ll focus on batters and pitchers.

Batters

Analysts sometimes quantify the value of a hitter’s batted balls using the average, O, of his observed batted ball values over a period of time. The statistic O is referred to as “wOBA on contact” or wOBAcon. As we’ve pointed out, however, O depends on a number of variables that are independent of the batter’s quality of contact. Thus, we propose the average, I, of the intrinsic values as a more accurate valuation of a hitter’s collection of batted balls. The batters with the highest I in 2014 among players who hit at least 300 batted balls that were tracked by HITf/x are:

Batter I
Giancarlo Stanton 0.526
Mike Trout 0.498
Miguel Cabrera 0.488
J.D. Martinez 0.482
Matt Kemp 0.477
Brandon Moss 0.476
Jose Abreu 0.469
Michael Morse 0.468
Corey Dickerson 0.465
Edwin Encarnacion 0.461

No surprises here. These guys tend to hit the ball hard.
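The two averages defined above, O and I, can be sketched as follows. Each batted ball carries its actual outcome and the intrinsic value assigned by the model. The first record uses the numbers quoted in the text for the fly ball Ellsbury lost (intrinsic 0.040, scored as a triple); the remaining records and the function name `observed_and_intrinsic` are hypothetical.

```python
from statistics import mean

# wOBA coefficients for each outcome, as given in the text.
WOBA_WEIGHT = {
    "out": 0.000, "single": 0.892, "double": 1.283,
    "triple": 1.635, "home run": 2.135, "error": 0.920,
}


def observed_and_intrinsic(batted_balls):
    """Return (O, I) for a list of (outcome, intrinsic_value) pairs.

    O averages the wOBA coefficient of what actually happened;
    I averages the model's intrinsic value at contact.
    """
    o_value = mean(WOBA_WEIGHT[outcome] for outcome, _ in batted_balls)
    i_value = mean(value for _, value in batted_balls)
    return o_value, i_value


batted_balls = [
    ("triple", 0.040),    # the lost fly ball described in the text
    ("out", 0.350),       # hypothetical
    ("single", 0.610),    # hypothetical
    ("home run", 1.900),  # hypothetical
]

o_value, i_value = observed_and_intrinsic(batted_balls)
print(f"O = {o_value:.4f}, I = {i_value:.4f}, O - I = {o_value - i_value:.4f}")
```

With only four batted balls, a single misplayed fly ball moves O a long way while leaving I nearly untouched, which is the sense in which I is less sensitive to context.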

For an individual batter, several factors can contribute to differences between the average observed outcome, O, and the intrinsic value, I, of his batted balls. Batters who are fast runners, for example, force infielders to play shallower, which compromises range and leads to additional hits. Fast runners also tend to beat out more infield hits and garner additional bases on hits to the outfield. Thus, a faster runner will tend to achieve a higher O for a given I.

Batters with a high degree of predictability in their batted balls, such as left-handed batters who hit a large majority of their ground balls to the right of second base, are easier to defend than batters who produce a more uniform distribution of batted balls. Batters with a higher degree of predictability, therefore, will tend to have a lower O for a given I. Luck also can play a role in creating differences between O and I.

The next table presents batters with the largest values of O-I during the 2014 season where both O and I are computed using the batted balls tracked by HITf/x:

Batter O−I
Starling Marte 0.072
Jose Abreu 0.063
Yasiel Puig 0.060
Adam Eaton 0.060
Billy Hamilton 0.060
Lorenzo Cain 0.059
J.D. Martinez 0.053
Josh Harrison 0.052
Andrew McCutchen 0.051
Hunter Pence 0.049

Most of the players in this list have above-average running speed, with Hamilton and Cain having exceptional speed. The top two players on the list also benefited from good luck. Marte led major league baseball by reaching base on an error 14 times in 2014, which contributed to his major league-leading O-I. Jose Abreu also experienced significant good fortune, as many of his 36 home runs just barely cleared the fence, causing his home runs to have an average intrinsic value I of 1.461, which is significantly less than the corresponding O of 2.135.

Finally, we have the batters with the lowest values of O-I during 2014:

Batter O−I
Billy Butler -0.055
Brandon Moss -0.043
Yadier Molina -0.042
Miguel Cabrera -0.042
Matt Dominguez -0.041
Alberto Callaspo -0.039
Mark Teixeira -0.038
Albert Pujols -0.038
Carlos Santana -0.036
Buster Posey -0.032

All of these batters have below-average running speed, and several (Moss, Teixeira, Santana) also had batted-ball distributions predictable enough that opposing teams were able to employ extreme defensive shifts against them.

Pitchers

In 2001, McCracken suggested pitchers have little control over the result of opponent batted balls that are not home runs. Since then, however, a number of researchers (really, lots of them) have presented evidence that pitchers can have some effect on the expected outcome of balls in play. Despite this progress, models that isolate the impact of the pitcher on the fate of batted balls have been elusive due to the confounding effects of the defense, ballpark, weather and luck on a batted ball’s outcome.

Since the HITf/x system characterizes a batted ball at contact, the influence of these confounding factors can be removed. As with batters, we can assign the intrinsic value, I, to the collection of batted balls allowed by a pitcher. The statistic, I, provides a context-invariant measure of a pitcher’s opponent contact, which allows this aspect of his performance to be accurately quantified.

The pitchers with the lowest I values in 2014 among those who allowed at least 300 batted balls that were tracked by HITf/x are:

Pitcher I
Garrett Richards 0.304
Anibal Sanchez 0.309
Danny Duffy 0.314
Chris Sale 0.319
Matt Garza 0.328
Dallas Keuchel 0.329
Jarred Cosart 0.329
Clayton Kershaw 0.332
Alex Cobb 0.336
Johnny Cueto 0.337

Eight of the 10 pitchers in the table had an average fastball speed in 2014 above the league average, and Richards, who earned the top spot in the list, enjoyed one of the highest average fastball speeds in the majors. The success of the two softer-tossing pitchers on the list was due in part to an exceptional sinker for Keuchel and an exceptional split-fingered fastball for Cobb. An interesting topic for future research will be to study pitcher characteristics that lead to low values of I.

Value for Modeling and Forecasting

The statistics that measure the intrinsic quality of contact for batters and pitchers are influenced less by random variation from contextual variables than traditional statistics that depend on batted-ball outcomes. In addition, the new statistics can be used to separate the various skill components that contribute to a player’s performance on batted balls.

A batter’s performance, for example, can be partitioned into statistics that measure his intrinsic contact, running speed, and batted-ball distribution, which determines susceptibility to defensive shifts. An important advantage of generating separate statistics to represent distinct skills is that each statistic can be regressed and projected using its individual reliability and aging curve during forecasting.

The new statistics also allow us to investigate how players control quality of contact. We observed, for example, that many of the pitchers who were the most effective at controlling contact also exhibited an above-average fastball velocity. Given the wealth of descriptors measured by the PITCHf/x system, we have the opportunity to characterize the relationship between the quality of a pitcher’s opponent contact and his distribution and sequencing of pitches. Similarly, we can study the relationship between a batter’s intrinsic contact and his swing parameters.

Acknowledgment

I am grateful to Sportvision for providing the HITf/x data, which made this work possible. I also thank Alan Nathan and Tom Tango for helpful comments on a previous draft of this article. I am happy to acknowledge the help of Qi Shi in the preparation of this document.

Glenn Healey is a professor of electrical engineering and computer science at the University of California, Irvine where he is director of the computer vision laboratory.
Jim S.

Excellent stuff. Even a dummy like me can understand it. Except most of the math, of course.

Scott

I always liked what wOBAcon was trying to get at but the results were so noisy. These results really pass the smell test with speedy slap hitters topping the O-I leaderboards and pull happy lumbering DH/1B types leading the I-O charts. The pitchers also made a ton of sense. Richards yielded a .264 babip and measly 3.9% HR/FB in 2014 working his way to a fantastic run prevention season with a good but not elite K/BB ratio.

Studes

This is awesome. Is it possible to publish a list of all batter and pitcher I as a spreadsheet or list?

Glenn

Thanks! Yes, I can put that together.

Ron Mexico

Great stuff Glenn! Would it be possible to post your accuracy rates as well? Maybe just something as simple as pr(HR|HR predicted) or something? I’m curious to know how accurate your system is.

Glenn

Good question. We could easily build a Bayes classifier where we assign a batted ball vector x to the outcome R_j with the largest P(R_j | x). I haven’t done that. The error rate for such a classifier would be highly dependent on x since, for example, certain fly ball vectors will be almost certain home runs while some ground ball vectors will have about a 50/50 chance of being a single or an out.

Ron Mexico

That makes sense. I’m just curious as to how accurate your method is. I’m sure it’s a huge improvement over just using LD/FB/GB classification, and I’d like to get an idea of just how big that improvement is. Classification accuracy seems like the easiest approach.

Glenn

I have appended the lists for batters and pitchers to the end of the addendum file which can be accessed at

http://fs23.formsite.com/viXra/files/f-2-2-8801494_ykeQdm4s_addendumplus.pdf

Enjoy!

Bob Backlund

This was really great research.

Brian Cartwright

Pleasantly surprised to see my name mentioned, thank you! Less than a week ago I had a presentation similar to this article showing the relationships between vertical angles and exit velocities to different types of batted balls, in the process ‘proving’ DIPS.

If you’d like I can send you my slides, and I’d like to ask if your batted ball model is available. It would be great to match it up with some research I’m doing now.

Glenn

I’d love to get a copy of the slides and to compare notes. I’m ghealey at uci dot edu.

tz

Brian, having not seen your presentation, would it be safe to say that pitchers have a large influence on the frequency of “long hits” (HR plus deeply hit 2B and 3B) and the high run value that comes with them, but there’s a lot of random variation in how many long hits become homers?

This could simultaneously explain (1) meaningful variation across pitchers for wOBA allowed on balls in play AND (2) the high correlation McCracken found between BB/SO/HR and runs allowed, given that HR serve as a good proxy for long hits allowed.

Eric
Man oh man, I did this already, two years ago. This is why I created HEWCO, CCR and BSM. The HEWCO is how you set your lineup by the value of plate appearance outcomes, CCR is contact consistency rate, and BSM is bases moved. HEWCO doubles as a player development tool and triples as a hitting philosophy. At the heart of it, HEWCO teaches you the value of contact, saying that IF you DECREASE BOTH your walks and strikeouts, all that is left is the higher value proposition of contact, whether a 4 base “home run” error like Alcides Escobar…
Lee Trocinski

I believe the common Billy Madison quote applies quite well here…

Andy
California has twice the population of New York. Think of all the gasoline not used if people used that train instead of driving their cars. Glenn, I much appreciate your work, I’ve had interest in this approach for several years, and definitely think it’s the way to go in the future. It does raise one fundamental problem: why play the game at all? I.e., why not have just a pitcher throwing to a batter, with no defenders, and value determined from the trajectory of the batted ball? Add up all the values to determine who wins the game? I know…
Tangotiger

The LH/RH is exactly the reason.

Note also that the pull tendency is on low launch angles (i.e., GB). The higher the launch angle (i.e., FB), the more spread of the ball gets pushed toward a 0 angle spray.

You see the evidence of this in putouts of LF and RF, where RF have more putouts than LF, even though 60% of batters are RHH.

Andy

Sorry about those first three lines. That was from something else I forgot to delete before posting.

Peter Jensen

Glenn – An excellent article. If you contact me by email I would like to discuss your article in more detail. Fangraphs has my email address and I am giving my permission for them to release it to you.

Glenn

Peter – Thanks. I would be happy to discuss. I’ll check with fangraphs. If you’d like to start the conversation I am ghealey at uci dot edu

Gary Edwards
Wow, excellent article. Thank you. A question for the author/readers/commenters: While we SHOULD give the hitter credit for “squaring up” a ball and hitting it hard (good exit speed=s and good launch angle=v), perhaps we should NOT give the hitter credit for the horizontal direction the ball was hit. Why should the intrinsic value of a batted ball (for a high exit speed, optimal launch angle ball to center) be lower if it’s hit at h=0 instead of h=10 or h=-10? (Other than any “pull hitter” tendencies, mentioned in the article.) Another thought: Isn’t the intrinsic value of a pitch…
Eli Ben-Porat

Awesome. Quite envious of your HitFx access and I still wonder why they don’t release it alongside PitchFx.
I’m curious – if you were to throw the three variables into a simple probit regression (or tease it out of your model), what are the relative weights in terms of predicting intrinsic value? I.e. is it 5 parts velocity, 3 parts launch angle, 2 parts spray angle? Also curious if hitters hit the ball harder to their pull side, which would imply some co-linearity (as I understand it) between velocity and spray angle.

Mark

Really good stuff! I really think this was a great study. I just have one question. For your wOBA estimate, you said you used The Book’s number for sac flies. I figured that number must have gone down since The Book was published, as a result of overall decreased run production. I understand that that is the data you had to work with, but is there a reason that it is okay to overlook this? Thanks, once again this was a fantastic article!

Kevin Wickering

Mr. Healey, this is an excellent article and the wOBA graph makes a lot of sense, it makes it a lot clearer. I still do not think it explains the Oberkfell / Stapleton syndrome though. 🙂

Glenn Snieder

Mr. Wickering… that’s the chaos theory.
