The Physics of RoboUmp

Are human umpires likely to make fewer mistakes than RoboUmps? (via Keith Allison)

“It ain’t nothin’ ‘til I call it.” Bill Klem thus defined the role of the umpire. He would know, since he officiated major league baseball for 37 years and still holds the record for working 18 World Series.

There has been a lot of discussion lately about having balls and strikes called robotically using MLB’s Statcast technology. In response, Commissioner Rob Manfred stated, “In all candor, that technology has a larger margin of error than we see with human umpires” as reported by Patrick Saunders of The Denver Post.

That sounds like an invitation to think about the physics associated with Statcast to understand the potential sources of these errors. First, let’s just be sure we understand the meaning of “error” as it is used by physicists.

Two Types of Error

In common language, an error is something that can be remedied. In scientific analysis, we call this type of error “systematic error.” In this case, the devices we are using do not report the correct values and, in principle, we can find the problem and correct it.

However, there is another type of error called “random error.” This kind of error is intrinsic to the measurement process. It can be minimized but never completely eliminated. Random error is associated with the fact that measurements are never perfectly reproducible.

Suppose you want to get a pitching machine to fire the ball right down main street. You immediately notice all the pitches are low and outside. You can probably adjust the machine to correct this systematic error. Now, the pitches are right down the middle, but they vary pitch-to-pitch by a few inches up, down, inside, or outside of the center. This random error can only be minimized, perhaps by using a better pitching machine, but it will never be eliminated completely.
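To make the distinction concrete, here is a minimal Python sketch of the pitching machine. The offset and scatter values (BIAS and SIGMA) and the function name simulate_pitches are made up for illustration; none of this comes from real machine data.

```python
import random
import statistics

def simulate_pitches(n, bias, sigma, seed=0):
    """Return n horizontal pitch locations (inches from the center of the plate).

    bias  : systematic error, the average miss that an adjustment can remove
    sigma : random error, the pitch-to-pitch scatter that always remains
    """
    rng = random.Random(seed)
    return [rng.gauss(bias, sigma) for _ in range(n)]

# Hypothetical numbers, chosen only for illustration.
BIAS = -3.0   # inches; the machine misses to one side on average
SIGMA = 2.0   # inches; pitch-to-pitch scatter

before = simulate_pitches(10_000, BIAS, SIGMA)

# "Adjusting the machine" amounts to measuring the average miss and removing it.
estimated_bias = statistics.mean(before)
after = [x - estimated_bias for x in before]

# The average miss is now zero, but the scatter is unchanged.
remaining_scatter = statistics.stdev(after)

print(f"estimated systematic error: {estimated_bias:.2f} in")
print(f"random error after adjustment: {remaining_scatter:.2f} in")
```

The adjustment recenters the pitches, yet the spread of the adjusted pitches stays close to SIGMA. Only a better machine, meaning a smaller SIGMA, would shrink it.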

There is evidence to suggest Statcast suffers from demonstrable systematic errors. Rob Arthur of FiveThirtyEight wrote about these systematic errors in April of 2017 in “Baseball’s New Pitch-Tracking System Is Just A Bit Outside.” He compared Statcast pitch locations this year with PITCHf/x pitch locations from previous years. He found the average systematic error in horizontal position across major league parks for Statcast was only slightly higher than PITCHf/x – about 0.2 inches. However, the vertical systematic error increased from under half an inch to almost 0.75 inches.

One would suspect Statcast is working to fix these systematic errors, and the current errors should drop over time as happened with PITCHf/x. After all, systematic errors can be fixed with better calibration, data analysis, and measurement techniques.

Let’s go back to the properly adjusted pitching machine to deal with random errors. If one fired thousands of pitches and recorded the number of pitches as a function of their horizontal (x) positions, the result would likely look like the graph below.

The most likely position for a given pitch is right in the middle. However, due to random errors, there is an ever-decreasing chance of the pitch actually turning up farther and farther away from the center. This type of distribution is called a normal distribution, and it is a common way to deal with random errors because one can use this curve to estimate the probability of getting any given value (or range of values) for x.
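As a rough illustration of how the curve turns into probabilities, the Python sketch below computes the chance that a pitch lands within a given range of x. The 2-inch scatter is a hypothetical value for the pitching machine, and the helper names normal_cdf and prob_between are mine.

```python
import math

def normal_cdf(x, mean=0.0, sigma=1.0):
    """Probability that a normally distributed value falls below x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sigma * math.sqrt(2.0))))

def prob_between(lo, hi, mean=0.0, sigma=1.0):
    """Probability that the value lands between lo and hi."""
    return normal_cdf(hi, mean, sigma) - normal_cdf(lo, mean, sigma)

sigma = 2.0  # inches; hypothetical scatter of the pitching machine

# Chance of a pitch landing within one and within two standard deviations
# of the center of the plate.
within_one_sigma = prob_between(-sigma, sigma, 0.0, sigma)
within_two_sigma = prob_between(-2 * sigma, 2 * sigma, 0.0, sigma)

print(f"within {sigma:.0f} in of center: {within_one_sigma:.1%}")
print(f"within {2 * sigma:.0f} in of center: {within_two_sigma:.1%}")
```

For any normal distribution, about 68 percent of the values fall within one standard deviation of the center and about 95 percent within two, whatever the actual scatter turns out to be.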

The key parameter describing the normal distribution is the width of the curve. It is related to the standard deviation. The standard deviation for Statcast pitch locations is not publicly available. So, I’ll have to make some estimates.

The Statcast Mistake Rate

The goal here is to use the normal distribution to estimate the mistake rate for ball and strike calls produced by a RoboUmp using Statcast data. So, imagine a pitch that actually crosses the plate at a horizontal position, x, as shown below. We’ll assume Statcast has some standard deviation, so it could report the position of the ball in different locations with probabilities given by the normal distribution.

In the sketch above, x = 0 is the center of home plate. The ball actually crosses the plate at a position x. The position labeled D is the edge of the strike zone that is equal to half the width of the plate plus half the diameter of the ball. The blue curve is the normal distribution of Statcast-reported positions for this event. Note there is some chance this strike will be reported as a ball because the distribution is non-zero at locations greater than D.

Using this idea for every possible actual position of the ball, one can find the probability Statcast will report the pitch incorrectly. Below is a graph of the mistake probability as a function of actual position for a random error standard deviation of 0.25 inches.


You can see the error probability is 0.5 if the edge of the ball aligns with the edge of home plate (x = D = 9.95 inches). That is, this location is a 50/50 call. Farther from the edge, the probability of a missed call drops and is essentially zero when a pitch lands an inch away from the edge. This curve includes both x < D being called a ball and x > D being called a strike.
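The calculation behind a curve like this can be sketched in a few lines of Python. This is my own minimal version, not the code used to make the graph. It considers only the one edge of the zone at x = D, and the ball radius of 1.45 inches is an assumption chosen so that D matches the 9.95 inches quoted above.

```python
import math

PLATE_HALF_WIDTH = 8.5  # inches; home plate is 17 inches wide
BALL_RADIUS = 1.45      # inches; assumed so that D comes out to 9.95
D = PLATE_HALF_WIDTH + BALL_RADIUS

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def mistake_probability(x, sigma, edge=D):
    """Chance the reported position falls on the wrong side of the edge.

    x     : actual horizontal position of the pitch (inches)
    sigma : random error standard deviation of the tracking system (inches)
    """
    # Probability that the reported position lands beyond the edge.
    p_reported_ball = 1.0 - normal_cdf((edge - x) / sigma)
    if x <= edge:
        return p_reported_ball      # a true strike reported as a ball
    return 1.0 - p_reported_ball    # a true ball reported as a strike

sigma = 0.25  # inches, the value used for the graph
for x in (D - 1.0, D - 0.25, D, D + 0.25, D + 1.0):
    print(f"x = {x:5.2f} in -> mistake probability = {mistake_probability(x, sigma):.5f}")
```

Right at the edge the function returns exactly 0.5, and one inch to either side it returns a value far below one percent, in line with the description of the graph.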

If pitches were uniformly distributed across the strike zone, the total mistake rate could be found by just adding up these probabilities. However, we know pitchers try to keep the ball near the edges of the plate. If they are actually successful, the total mistake rate should increase because the ball is more often in the mistake-prone region.

The plot above is the probability of Statcast reporting a pitch from July of 2017 in the region between the center of home plate and 16 inches to the catcher’s right. You can see pitchers are only somewhat successful at keeping the ball near the edge of the strike zone. Combining the probability of a given pitch location with the probability of a missed call by Statcast as a function of the random error standard deviation results in the plot below.

I have always heard–but have no verification of the fact–that major league umps are expected to have less than a five percent error rate. I don’t know whether this means five percent of called pitches or five percent of all pitches, but I suspect the former. Anyway, this analysis shows that as long as Statcast has small systematic errors and random errors less than about 0.9 inches, it should be as good as umpires at calling inside or outside pitches.
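The combination step can be sketched in Python as well. One loud caveat: I am not reproducing the July 2017 pitch location data here. The function uniform_density below is a placeholder assumption that spreads pitches evenly between the center of the plate and 16 inches out, so the numbers it produces are only a rough cross-check and not the values in the plot.

```python
import math

D = 9.95      # inches; edge of the strike zone, from the text
X_MAX = 16.0  # inches; outer limit of the region considered in the text

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def mistake_probability(x, sigma):
    """Chance a pitch at x is reported on the wrong side of the edge at D."""
    return normal_cdf(-abs(x - D) / sigma)

def uniform_density(x):
    # ASSUMPTION: pitches spread evenly from 0 to X_MAX. Real pitch
    # locations are not uniform; swap in a measured density if available.
    return 1.0 / X_MAX

def total_mistake_rate(sigma, density=uniform_density, steps=16_000):
    """Integrate density(x) * mistake_probability(x) with the midpoint rule."""
    dx = X_MAX / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * dx
        total += density(x) * mistake_probability(x, sigma) * dx
    return total

rates = {sigma: total_mistake_rate(sigma) for sigma in (0.25, 0.5, 0.9)}
for sigma, rate in rates.items():
    print(f"sigma = {sigma:.2f} in -> total mistake rate = {rate:.2%}")
```

Under the uniform assumption, the mistake rate grows in proportion to the standard deviation and stays just under five percent at 0.9 inches. A density that piles more pitches near the edge would push the rate higher for the same standard deviation.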

The Top and Bottom of the Zone

Now we should investigate high and low pitches. Ball and strike calls here are not as cut-and-dried. The horizontal piece of the strike zone is carefully and quantitatively defined by the width of home plate. The vertical strike zone is much more nebulous. The MLB definition of the strike zone states:

“The STRIKE ZONE is that area over home plate the upper limit of which is a horizontal line at the midpoint between the top of the shoulders and the top of the uniform pants, and the lower level is a line at the hollow beneath the kneecap. The Strike Zone shall be determined from the batter’s stance as the batter is prepared to swing at a pitched ball.”

It is accompanied by the sketch below:

This definition leaves plenty of room for interpretation as far as the vertical part of the zone is concerned. Many batters stand upright and move into a crouch only as they swing. Others start in a deep crouch and become more upright as they unload. Not to mention that the kneecap may be hard to spot if the batter wears loose pants.

PITCHf/x originally used poorly paid “stringers” to sit in a dark room under the stands and manually turn a dial to set the top and bottom of the zone on the video image of the batter. Saunders reports that Statcast uses the previous calls of major league umpires to build a database of the top and bottom of the strike zone for each hitter.

Isn’t that ironic? Until MLB comes up with a machine-comprehensible definition of the top and bottom of the strike zone, machines will need the assistance of humans to define the strike zone for the machines.

Other Issues

One obvious problem is that, on occasion, Statcast simply misses a pitch or a hit. Although these incidents seem to be occurring less and less frequently, when one does happen, would the RoboUmp have to declare a “do-over”? Several times during the World Series telecast, the strike zone box disappeared. Of course, we don’t know if that was a Statcast failure or a production mistake.

I also noticed during the World Series on several occasions, the replay of a pitch showed the ball in a noticeably different position than the “live action” did. Again, it is not clear if Statcast is to blame or the problem was a production issue.

One last concern for using Statcast data to power a RoboUmp involves the time required to collect the video and radar data, process it into meaningful numbers, and transmit those values to a RoboUmp. When one watches a broadcast, it appears as though the system produces the results in real time. The speed and location of the pitch appear on your TV as the pitch is caught by the catcher. It is easy to forget that the broadcast has been delayed by a few seconds for the express purpose of adding those graphics.

The time for data processing and transmitting is not available publicly. However, I have noticed it takes at least one second, sometimes longer, for the pitch speed to be posted on the scoreboard in most ballparks. It is not clear if this data comes from Statcast or some radar gun positioned behind the plate. If it is from Statcast, it would be an estimate of the processing and transmission time needed to alert a RoboUmp.

Travis Sawchik has suggested that perhaps inside/outside calls could be made by the RoboUmp while high/low calls are made by the human umpire. So, when the game comes down to the winning run on second in the bottom of the ninth and the closer fires a two-strike pitch on the black, we should wait a second or two for the scoreboard to tell us whether the game is over. That can’t happen.

Of course, processing and transmission times may drop as Statcast improves, allowing more instantaneous pitch calls. Nonetheless, we’ll still have the random errors and the issues associated with the definitions of the top and bottom of the strike zone to address.

I guess we’ll leave the last words to Bill Klem, who once replied to a rookie pitcher complaining about the strike zone, “Son, when you pitch a strike, Mr. Hornsby will let you know.” The point is that, for now, as the Commish says, calling balls and strikes must remain a human endeavor.

David Kagan is a physics professor at CSU Chico, and the self-proclaimed "Einstein of the National Pastime." Visit his website, Major League Physics, and follow him on Twitter @DrBaseballPhD.
Jim

VERY interesting, David. Thank you.

Kristopher
David, this is a great piece. I think it’s also fun to study the systematic error of the umpires. As a physicist, you probably understand this much better than I, but I remember sitting around watching a ball game unable to grasp umpire position. They’d place themselves in a position that essentially guaranteed a poor optical angle. If you position yourself in “the slot,” you’re creating an angle, on purpose, that makes it impossible for the human eye to judge depth! Why would you ever do this?! I really knew nothing about umpiring at that point and my quest to…
Jetsy Extrano

Removing context is a tricky idea. As long as you have measurement error, you’re going to combine your measurement with a Bayesian prior to get your estimate. You can choose a flat prior if you want, but that actually hurts your accuracy.

In game terms, umpires today call more strikes on hitters’ counts and more balls on pitchers’ counts. Maybe from Bayes, maybe from other motivations. But if you change that you change the game quite a bit. We’ll have significantly more K and BB and fewer balls in play.

Hank G.
Not necessarily. Once the hitters and pitchers realize that they are not going to get a freebie on certain counts, they will adjust their behavior. The thought that the umpire changes his zone based on the pitch count has always been an anathema to me. I understand that others apparently are not bothered by this. Do we as a group accept it because it’s always been that way? I am firmly in favor of getting the call right, if possible (and practicable). An automated strike zone might be larger than the human umpires. How many umpires will consistently call a…
The Stranger
I’ve generally been against RoboUmp for a variety of reasons, but I hadn’t given much thought to the challenge of calling high/low strikes and how subjective the top and bottom of the zone really are. I can think of a couple workarounds for that, but I’m not convinced that they’re better than the current system. As a fan, though, I can accept that human umpires make bad calls. Even if it’s pretty egregious, I can still recognize that these are humans doing the best they can (and that I wouldn’t do any better). If a computer botched a critical call,…
Michael

I agree with your last point. It seems from the info here that even if it improves, this technology is not as inherently perfect (i.e., random error) as people might assume a “robot” would be. I’m open to it, and it may be better than a human ump especially with time, but there will be disputes seemingly no matter what and it seems easier to accept human error for me for whatever reason.

DancingInPDX
Awesome article! One thing that is not clear to me is the degree to which “accuracy” is actually the heart of the problem, as opposed to “consistency”. I’d argue that hitters and pitchers alike can adjust to a K-zone that is slightly inaccurate (relative to the stated rules). What I sense is far more frustrating is a K-zone that moves around. So to me there are two important margins of error, where the margin of error for accuracy, once below a certain threshold (e.g., an inch), becomes acceptable, making the margin of error for consistency more important. Regarding the top…
francis_soyer

They’ve been doing it in Tennis for years. Zero controversy.

As for hi-lo calls. Measure the players and have a strike zone based off of their heights.

It never made sense for a Rickey Henderson to crouch his way into the record books to begin with.

It’s about time the old hi-lo zones were abandoned, they were never consistent to begin with.

The Stranger

Honestly, you could make a reasonable case that the strike zone should just be the same height above the ground for all players regardless of height. It would be a fundamental change to the rules, but there’s an inherent fairness in making every batter responsible for hitting pitches in the same area. It’s not like short basketball players get to shoot at lower baskets.

Hank G.
It sounds as though they are very close to being able to use automated balls and strikes, if they can get the processing and transmitting times down to an acceptable level. The error rate seems to be better than many human umpires already. The vertical strike zone issue could be addressed by processing all players (say in spring training) and then keeping a database on each player and his vertical strike zone. Then the identity of the player would be entered and the system would know his strike zone. There would probably have to be a process where a player…
v2micca

Honestly, if the technology advances that far, they could simply re-calibrate the system before each game during the players’ batting practice.

Marc Schneider

If you did this, how would players argue with the ump and waste more time? The point about tennis is interesting; the advent of the challenge system has eliminated all the McEnroe-esque arguments and, to me, made it more enjoyable. The players are usually wrong in challenging a call, and not surprisingly. Even at my hacker level, it’s often hard to tell if a ball is in or out. I suspect it’s no different with baseball players.

Hank G.

Another side effect would be that catcher framing would no longer be a useful skill.

v2micca

I am okay with that.