Does the Call Need to Stand?

Does the standard for replay reviews need to be clear and convincing evidence? (via Robert Rescot)

I don’t have too many complaints about baseball’s replay review system. It’s not great that it leads to a lot of standing around, though I find the manager/umpire banter before the video coordinator gives a thumbs up/thumbs down pretty funny, and it involves challenges rather than using strictly booth review, but those have turned out to be small issues relative to the benefit of fixing some important calls.

There is one thing, though, that I find pretty unsatisfying about the replay system. Section III of the portion of the rules applying to instant replay says that in the absence of “clear and convincing evidence” the original ruling should stand, and Section II.J.3 outlines how calls are confirmed or overturned when such evidence exists, but “stand” when the review is unclear. This is the part of the actual replay review mechanism (as opposed to the system for triggering reviews) that might need an overhaul, and I think that overhaul, handled properly, could provide additional transparency, which baseball could always use a bit more of.

On its face, allowing a call to stand seems pretty unobjectionable. After all, the umpire on the field is an expert at these sorts of things and got to see it up close, and so we should give him the benefit of the doubt. In practice, though, that means you end up with calls like this:

Call One

Call Two

Call Three

Those aren’t cherry-picked; in fact, they’re three challenges with a result of “call stands” selected at random. If you made me guess, I’d say the calls on the field were wrong, right, and wrong, respectively, though I certainly wouldn’t claim any of those judgments is conclusive. These calls exemplify two problems with the current system. First, having calls that look wrong but stand is a problem. It makes the umpires look bad (unfairly), it makes the broadcasters look bad, it upsets fans and players, and it undermines everyone’s faith in the system a little bit more. (It could also be a source of substantial controversy. Think about what would have happened had the original call on Eric Hosmer’s double play in Game 7 of this year’s World Series been upheld due to lack of evidence.)

The second issue is revealed in the second clip, where, as Vin Scully points out, it’s unlikely that the ump actually saw the foot off the base; put another way, he didn’t have a better angle than the camera did. Given that an umpire’s hidden expertise is at least part of the basis for letting the call stand if replay is inconclusive, we should probably reconsider whether it’s actually the best method in light of results like that. With 361 inconclusive challenges last year, or approximately one every seven games, this is a real area for improvement.

In my eyes, a replay system should have three goals: it should correct as many calls as possible, it should be quick, and it should be transparent, or failing that, simple. (It should also be reasonably consistent, in the sense that two independent reviewers would come to the same conclusion most of the time, but that goes along with correctness and simplicity.) The current system, when facing a borderline call, is reasonably transparent and simple, but it’s not correcting as many calls as it could be. The three calls above might not have been wrong (though I think two of them were), but some of the ones that stand must have been. Whether the current system is fast enough is a matter of opinion, but it’s clear that the borderline reviews are slower than others. Depending on how you choose to measure it (mean or median, with or without controlling for type of call and initiator of challenge), the typical call that stands takes 40 to 50 seconds longer to review than a call for which there is conclusive evidence. It’s thus clear that, at least in theory, there’s some room for improvement.

One way of changing how inconclusive replays are resolved could be called population-based resolution (PBR), which would entail predicting the probability that a given call was correct based on certain characteristics of the population of replay challenges, then drawing a random number to determine the result. For instance, let’s stipulate that there was an 80 percent chance that the call on the field in the second video was correct. Ignoring for a moment how we got that probability, the ump in New York would use a computer to pick a random number such that 80 percent of the time the call stood, and the other 20 percent it was overturned.
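To make the mechanism concrete, here is a minimal Python sketch of the random draw described above. The function name resolve_inconclusive and the fixed seed are placeholders I made up for illustration, and the 80 percent figure is just the stipulated number from the example, not an estimate from any data.

```python
import random


def resolve_inconclusive(p_call_correct, rng=random):
    """Resolve an inconclusive review by a weighted coin flip.

    p_call_correct is the assumed probability that the call on the
    field was right. Where that number would come from is left open here.
    """
    if not 0.0 <= p_call_correct <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    # rng.random() is uniform on [0, 1), so this is True with
    # probability p_call_correct.
    return "stands" if rng.random() < p_call_correct else "overturned"


# Simulate many hypothetical reviews at the stipulated 80 percent.
rng = random.Random(2014)  # fixed seed only so the run is repeatable
outcomes = [resolve_inconclusive(0.80, rng) for _ in range(10_000)]
share_stands = outcomes.count("stands") / len(outcomes)
print(f"Share of simulated calls that stood: {share_stands:.3f}")
```

Over many reviews the share of calls that stand settles near 80 percent, but any single call is still decided by the draw rather than by what the video shows. Passing a probability of 1.0 always returns "stands", which is how the current rule behaves.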

PBR has some noticeable shortcomings, both in principle and in practice. There’s the obvious issue of how the probabilities would be determined, but what’s even more important is that while PBR would be overturning the right number of calls, it probably wouldn’t overturn the right calls. The reason I suggest it at all is that this is a more general case of the current method used to adjudicate inconclusive cases.

The current system can be interpreted as follows: “The call was inconclusive, so we are assuming it is like all other calls we didn’t overturn and has a 100 percent chance of being correct.” The proposed population-level inference could be viewed similarly: “The call was inconclusive, so we are assuming it is like all other force plays where the challenge was initiated by the umpire and has a 64 percent chance of being correct on the field.” From the perspective of getting a call correct, the current system is just a very blunt form of PBR, which is why it seems to me like a frustrating abdication of the purpose of replay.

Thankfully, though, there’s what I think is a much superior option, which is to lower the standard of evidence. Instead of requiring that the replay provide “clear and convincing evidence” that the original call was wrong, just instruct the replay official to pick whichever call looks better from the replay (a “preponderance of the evidence” standard). The reviewer can consider whatever is necessary, so if the field umpire had a good view, that will count, but if the original angle was poor, the call on the field can be disregarded. (In a perfect world, it’d be great to do the review as blindly as possible, with no knowledge of the direction of the original call or who made it, but it’s unlikely that all of that can be consistently edited out of video in enough time to keep replays proceeding at an appropriate pace.)

How does this rate on the three criteria I mentioned above? Unfortunately, it’s hard to say right now; fortunately, it wouldn’t be very hard for the commissioner’s office to study. Bring the umps to the league office in New York, and have 10 umpires look at each call that stood for lack of conclusive evidence plus some additional conclusive calls for benchmarking. That might seem like a lot of work, but if you do the math, it works out to a bit less than a business day of reviewing per ump, which is hardly an extreme expense.

These data would tell us (or really, the league) several important things. They would provide estimates of how many calls this would actually affect over the course of a season, how long these reviews would take, and how reliable the umps are when forced to make a call about 50-50 reviews (which they currently aren’t forced to do). These estimates are essential in figuring out how to rejigger the current replay system.

My suspicion, based on little but guesswork, is that the new system would be a few seconds slower on average and a bit less consistent (meaning that different umpires would disagree about calls more frequently), but would probably increase the probability that a review yields the correct result by a substantial amount.

How substantial? We can do a rough calculation using assumptions about the fraction of inconclusive reviews in which the call on the field was actually correct and the probability that an umpire will make the correct determination under the new system. For instance, if 60 percent of the inconclusive reviews were of incorrect calls, then tacitly the current system makes the right call only 40 percent of the time. If a replay official will make the call correctly 80 percent of the time, then a non-obvious review is more likely to be correct by a margin of 40 percentage points, which, across last year’s 361 inconclusive challenges, would correspond to roughly 145 extra correct calls.
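The arithmetic behind that estimate can be sketched in a few lines of Python. The function name extra_correct_calls is a placeholder of mine, and the 60 and 80 percent inputs are the assumed values from the paragraph above, not measured quantities; only the 361 inconclusive challenges is an actual count from last season.

```python
def extra_correct_calls(n_inconclusive, share_wrong_on_field, reviewer_accuracy):
    """Estimate the gain from forcing a decision on inconclusive reviews.

    share_wrong_on_field and reviewer_accuracy are assumptions,
    not estimates from data.
    """
    # Letting every inconclusive call stand is right exactly as often
    # as the call on the field was right.
    current_accuracy = 1.0 - share_wrong_on_field
    # Gain in accuracy, as a fraction (0.40 means 40 percentage points).
    margin = reviewer_accuracy - current_accuracy
    return margin, margin * n_inconclusive


margin, extra = extra_correct_calls(
    n_inconclusive=361,         # inconclusive challenges last year
    share_wrong_on_field=0.60,  # assumed
    reviewer_accuracy=0.80,     # assumed
)
print(f"Margin: {margin:.0%}; extra correct calls: {extra:.0f}")
```

With these inputs the margin is 40 percentage points and the gain is about 144 calls, in line with the rough figure above. The result is sensitive to both assumptions: if most inconclusive calls were right on the field, the margin shrinks and can turn negative.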

It is, of course, also possible that this proposal wouldn’t work so well. If the new review standard requires longer reviews, inconclusive plays are hard to reliably assess, or most of them are found to be correct calls, then the gains won’t be in line with my guess above (or, if the drawback is speed, won’t be worth the added delay). In that case, the current system can be left in place, and everyone can at least know that other options have been considered.

Ultimately, though, there’s no use substituting speculation for research. With results in hand (and ideally publicly released), it wouldn’t be too hard for the league to crunch some numbers and figure out the costs and benefits of a more nuanced replay system. It’s not a pressing issue, but it’s an area of the game that’s reasonably straightforward to improve, and unlike other pushes for umpiring transparency, it wouldn’t involve the criticism of individual umpires. As the league keeps trying to improve its umpiring, it’s an obvious place to start.

References & Resources

  • Retrosheet’s Expanded Replay Usage data
  • Baseball Savant’s MLB Instant Replay Database


Frank Firke crunches numbers for a tech company. He writes about baseball at The Hardball Times and irregularly about other sports at his blog, Clown Hypothesis. Follow him on Twitter @ClownHypothesis.
Erik
I’ll preface this with the fact that I hate replay. To me, the goal has always been to let the players play the game, with as little incursion from outside forces as possible. When we were kids playing in the street, we didn’t need umpires, let alone replay. Sure there were arguments, challenges, maybe even a couple of fights, but we kept the games going and I don’t ever remember going home thinking I got cheated. We knew that if we didn’t want a call to go against us then we should’ve hit the ball a little further, or ran…
CLiddle
I can only completely agree to Erik’s comments. These interminable delays are not a part of baseball. Umpires making their best judgments on the field and folks on the field arguing about the calls, then abiding by the calls are parts of baseball. And, for us fans, part of the legacy of fan-dom, is enjoying arguing about some of the calls made on the field — now that facet is lost to us, and, unfortunately, to generations of fans to come. Shame on MLB for kowtowing to the short-sighted, anti-purist, techies who have lost sight of the century-old panorama of…
JohnH
Forgive my intrusion on baseball’s sacrosanct history, but just because umpires have been making wrong calls for 100 years, doesn’t mean they have to continue to do so if there are better ways to get things done. Look at how many wrong calls were overturned this year because of clear and convincing evidence where the ump was just plain wrong! Why would you want calls to be wrong when you can make them right and the players/teams can play baseball games where the results are based more closely on what actually happened, rather than what human eyes think happened? Just…
Erik
I’m not completely against the idea of replay John. I am a fan of getting calls right. What I am not a fan of is delaying games and replaying parts of it based on the ‘what if’ scenario where the umpire got it right the first time. Implementations need to be seamless, and what we have now is not. We exaggerate the importance of getting calls right over the importance of the pace of the game and the excitement of watching what actually happens on the field unfold. Could you have imagined if Alex Gordon went home with 2 outs…
PackBob
Three replay officials look at the same play independently, have a time limit, and then vote what they think is the correct call. Or, two replay officials and the third vote is the umpire’s original call. Same time limit so that all reviews take a similar amount of time. No burden of having to override an umpire’s call.