Umpires are Improving

Fact: one of the most exciting areas of study right now is catcher defense, and catcher pitch-framing. A little bit of the shine is off, but we’re still making discoveries, and the whole thing is exciting because at last we’re able to put some numbers to something that’s long been suspected or known. Previously, we were left with guesswork and anecdotal evidence. Now we have an understanding of who’s good and who’s not good, and though it’s all still evolving, more and more people are aware of it, and more and more people are talking about it.

Yet conversations about pitch-framing are seldom just about pitch-framing. Practically every time it comes up, the conversation turns to whether or not this ought to be left to skill. Sure, some catchers receive better than others, and it can make a meaningful difference. But why should it be that way? Why can’t umpires just call consistent strike zones for everybody? Why can’t we just have automated, perfect strike zones, to even the playing field? And so on and so forth. It’s exciting that we’ve identified pitch-framing as a talent, but people are split on whether or not they want this talent to keep having an effect.

So, conversations about receivers often end up as complaints about umpires. Indeed, we can all recall instances in which blown calls were made that were all but inexcusable. I think most people agree that strikes should be strikes and balls should be balls, or else the integrity of the game is jeopardized. But for all the complaining people do about umpires, we have to acknowledge one fact: umpires are getting better. They’ve been getting better for at least a few years.

In terms of calling the strike zone, at least. I don’t know if they’re getting any better at bang-bang plays at bases or traps. Probably not? But calling the strike zone is a part of their job, too, and it’s the biggest part of their job, and the numbers say they’ve been making progress as a group.

I mentioned this briefly on Wednesday in talking about Justin Masterson, but I figured this is worthy of its own post. Occasional FG writer Matthew Carruth runs StatCorner, and on the player pages you can find measures of the rate of pitches in the zone called balls, and the rate of pitches out of the zone called strikes. Bad calls, basically. Check out what’s been happening to the league averages during the recent PITCHf/x era:

Starting Pitchers

Year   zTkB%   oTkS%
2007   22.0%   9.2%
2008   19.2%   8.2%
2009   17.4%   7.9%
2010   15.2%   8.1%
2011   15.2%   7.5%
2012   14.4%   7.4%

Relief Pitchers

Year   zTkB%   oTkS%
2007   23.5%   8.5%
2008   19.8%   7.7%
2009   17.9%   7.6%
2010   15.7%   7.5%
2011   15.9%   7.0%
2012   15.0%   6.9%

The column headers should be intuitive, since I already noted what they would be. Just a few years ago, one in five pitches in the strike zone was called a ball. Last year it was more like one in seven. There’s been steady progress in that department, and there’s also been steady, if slighter, progress in pitches out of the zone getting called strikes. The trends exist for starters and relievers alike, which, yeah, why wouldn’t they? And they’re pretty hard to ignore.
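
For anyone who wants to check the "one in five" and "one in seven" shorthand, here is a minimal sketch that turns the league-average rates from the two tables into "one in N" figures and relative declines. The rates are copied from the tables; the helper names (`one_in_n`, `relative_drop`, `summarize`) are made up for illustration and are not anything StatCorner provides.

```python
# League-average miscall rates copied from the tables above, in percent.
# zTkB% = pitches in the zone called balls.
# oTkS% = pitches out of the zone called strikes.
# Each entry is year: (zTkB%, oTkS%).
STARTERS = {
    2007: (22.0, 9.2), 2008: (19.2, 8.2), 2009: (17.4, 7.9),
    2010: (15.2, 8.1), 2011: (15.2, 7.5), 2012: (14.4, 7.4),
}
RELIEVERS = {
    2007: (23.5, 8.5), 2008: (19.8, 7.7), 2009: (17.9, 7.6),
    2010: (15.7, 7.5), 2011: (15.9, 7.0), 2012: (15.0, 6.9),
}


def one_in_n(rate_pct):
    """Express a percentage as 'one in N', with N rounded to one decimal."""
    return round(100.0 / rate_pct, 1)


def relative_drop(first_pct, last_pct):
    """Decline from first_pct to last_pct, as a percentage of first_pct."""
    return round(100.0 * (first_pct - last_pct) / first_pct, 1)


def summarize(table):
    """Compare the earliest and latest seasons in a table.

    Returns (one-in-N zTkB first year, one-in-N zTkB last year,
             relative zTkB% drop, relative oTkS% drop).
    """
    first, last = min(table), max(table)
    z_first, o_first = table[first]
    z_last, o_last = table[last]
    return (
        one_in_n(z_first),
        one_in_n(z_last),
        relative_drop(z_first, z_last),
        relative_drop(o_first, o_last),
    )


if __name__ == "__main__":
    for label, table in (("Starters", STARTERS), ("Relievers", RELIEVERS)):
        print(label, summarize(table))
```

For starters, this works out to roughly one in 4.5 in-zone pitches called a ball in 2007 against roughly one in 6.9 in 2012, a relative drop of about a third. The out-of-zone strike rate fell by a smaller share, which matches the "steady, if slighter" description.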

What might be driving this? Any number of things. Maybe there’s something to the idea that catchers are getting better at receiving quality pitches, but then it’s curious that they wouldn’t also be getting more balls called strikes. As PITCHf/x has become available to us, it’s also become available to umpires and to their superiors. Umpires all try to get better, their superiors all want them to get better, and maybe umpires have just become more aware of their previous flaws. Only in the past few years have they been confronted with so much information. It wouldn’t be surprising for there to be a response. The more data there is, the better umpires can be evaluated, and the more umpires can grow. And maybe the more consistent umpires have been rewarded while the lousier ones have been worked with or penalized.

The StatCorner strike zone, naturally, isn’t perfect, but strike-zone imperfection or inconsistency isn’t going to explain away the trends above. I’d consider that not a major source of error, but a minor one.

Of course, we’re still looking at fewer than 90% of pitches in the strike zone getting called strikes. That’s not close to good enough by many people’s standards, and some people won’t accept even 1% mistakes. Umpires, without question, remain flawed when it comes to calling the zone, and they’ll never be perfect so long as they’re human, because humans are incapable of perfection at even the simplest tasks, and calling balls and strikes isn’t simple. I’ve never done it myself, but it’s probably terrifying! Balls fly fast and pitchers annoyingly make them move around, as if the velocity didn’t make judgment tricky enough. The argument for an automated strike zone is always going to have reason to exist, until or unless an automated strike zone is implemented.

But so long as we have red-blooded humans crouching behind the catchers, everyone can agree that it’d be good for them to get better, and the numbers say they’ve been getting better. It’s something, at least. It’s not about either having problems or resolving problems; life isn’t that binary. Reducing the frequency of mistakes is progress, and better than the alternative.

Incidentally, the other day on Clubhouse Confidential, they were talking about rising strikeout rates, a trend of which many of you were probably already aware. There’s no question that there are a ton of factors at play there, but one wonders if improving umpires might be a contributing variable. Five years ago, roughly one in five pitches in the strike zone was called a ball, and pitchers struck out 17.5% of batters. Last year, roughly one in seven pitches in the strike zone was called a ball, and pitchers struck out 19.8% of batters. It’s probably not not a factor, even if it isn’t a major one. But maybe it’s a major one, along with the other major ones.
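
To put those two shifts on the same footing, here is a rough sketch that computes the percentage-point and relative changes for each. It uses only the figures quoted in the paragraph above, treating "one in five" and "one in seven" as 20% and about 14.3%. The function names are hypothetical, and lining the two changes up says nothing about whether one causes the other.

```python
def point_change(before_pct, after_pct):
    """Change in percentage points, rounded to one decimal."""
    return round(after_pct - before_pct, 1)


def relative_change(before_pct, after_pct):
    """Change as a percentage of the starting value, rounded to one decimal."""
    return round(100.0 * (after_pct - before_pct) / before_pct, 1)


# Strikeout rates quoted in the text (percent of batters).
K_BEFORE, K_AFTER = 17.5, 19.8

# In-zone pitches called balls, taken from the "one in five" and
# "one in seven" shorthand in the text (an approximation of the table values).
ZONE_BALL_BEFORE = 100.0 / 5
ZONE_BALL_AFTER = 100.0 / 7

if __name__ == "__main__":
    print("K%:        ", point_change(K_BEFORE, K_AFTER), "points,",
          relative_change(K_BEFORE, K_AFTER), "% relative")
    print("Zone balls:", point_change(ZONE_BALL_BEFORE, ZONE_BALL_AFTER), "points,",
          relative_change(ZONE_BALL_BEFORE, ZONE_BALL_AFTER), "% relative")
```

On these inputs the strikeout rate rises by 2.3 points (about 13% in relative terms), while the in-zone ball rate falls by nearly 6 points (a bit under 30% relative).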

Umpires: flawed. But, umpires: improving. As long as we’re going to have human umpires calling the strike zone, steady improvement seems like a welcome compromise.




Jeff made Lookout Landing a thing, but he does not still write there about the Mariners. He does write here, sometimes about the Mariners, but usually not.


41 Responses to “Umpires are Improving”

  1. Maverick Squad says:

    How many of these missed calls are line-ball decisions? Maybe a lot of these uncalled strikes are pitches right on the edge of the zone: 50-50 calls.
    Maybe all the virtual strike zone stuff we see on TV is helping umpires improve their game. They see where they make mistakes and maybe work to correct them.
    It would be interesting to see (maybe someone has done it) where most of these missed strike calls are in the zone, and maybe on what types of pitches. Anecdotally, it seems to me that umpires don’t like calling the high strike.
    Also, maybe the percentage of uncalled strikes is so high because the umpires use a strike zone different from the one defined by the rules, an ‘unofficial’ strike zone. These calls, while technically incorrect, are correct for the zone umpires use, and so aren’t so bad, since the players understand these zones.

    • Bad Bill says:

      Either a pitch catches the black, or it doesn’t. There is no 50/50 margin on inside/outside. (Knees and shoulders, I’ll give you.) Tennis has already demonstrated that a technological means for making that same call can be INCREDIBLY accurate, without impacting the flow of the game, even though the served ball, as hit by a top male player, is moving even faster than in baseball. I agree that it would be interesting to know where the misses are.

      • Tim says:

        Judging the ball as it hits the ground makes things quite a bit easier in tennis. Not to say that it’s impossible in baseball, but it’s harder.

        • Aaron (UK) says:

          If anything, it’s harder in tennis, as you have to track (or model) the deformation of the ball as it bounces.

          Cricket manages pretty well with Hawk-eye, including a predictive element. Though when decisions are reviewed, the original decision stands unless there was a clear error (i.e. a ball that would have just been clipping the stumps isn’t turned into an lbw, but if an lbw was originally given, that is upheld).

        • ttnorm says:

          I agree. It is harder. Every player has a different strike zone. There is no perfect line at the top and bottom of the zone. And it has to be adjusted every time a new batter enters the box. I am sure that batters will find a way to influence the data entry in their favor. I can see A.J. Pierzynski decking Peter Bourjos for not standing tall enough in the batter’s box.

      • Greg Simons says:

        As my father, a long-time umpire, has drilled into my brain, the black is not part of the strike zone. The ball has to cross over the white part of the plate to be a strike.

  2. Nick O says:

    Could it be that Pitch f/x has gotten better at identifying balls and strikes rather than the umpires? I know that most missed ball/strike calls are high/low, and pitch f/x sets each hitter’s knees and letters somewhat arbitrarily.

    • Paul says:

      This is a fair question, since the system has changed over time and one wonders if the software is exactly the same as when it was first rolled out.

      However, if your query is true then wouldn’t that mean that the improving Pitch f/x is confirming a strike zone that the eyeball test has for years noted as laughable? I’d say it’s much more likely that umpires are using the system – which we know they do now as a matter of course – to improve their performance.

    • Chris says:

      The systematic error biases documented in are strong evidence that what we’re seeing is umpire behavior, not Pitchf/x errors. You would expect a different shape to the error distribution if it were intrinsic to the measurement system.

  3. edgar4evar says:

    The most compelling thing (among several) to me in this piece is the overall error rate. A single blown call can completely alter the course of a game, and umpires are blowing something like 10% of them. Robots, please.

    • Paul says:

      Would you say that NBA or college basketball refs get charge/block calls right more than 90% of the time? What about holding/pass interference in the NFL, or the even more outrageous various roughing calls? I’d say it’s more like 90% of the time that replay shows the defender clearly put his shoulder into a receiver’s torso – exactly as defenders are instructed to do – and still drew a 15-yard penalty.

      A little perspective, maybe. MLB umpires are improving, and the strike zone, to me, is harder to call than fouls in the other sports.

      • LK says:

        If there were an automated system that could determine block/charge calls, I’d imagine a lot of people would be in favor of using that.

        • wallbanger says:

          Seems to me the more things moving around, the harder it is on refs/umps. Blind spots are constantly being created and disappearing. There are no Aroldis Chapman-type speed elements, but there are big, fast men being chased around by zebra-striped mortals, as opposed to a ball and a strike zone. Baseball seems like an easier task imo.

      • Chris says:

        Frankly it’s a huge weakness of high-level basketball that, in essence, the refs pick the winners. Any game closer than ~ 5 points was basically decided by what fouls the refs did or didn’t call.

  4. ChuckO says:

    Calling balls and strikes is hard. I know from experience. Thirty years ago, I taught HS for a year. The teacher who coached baseball became a buddy of mine. He was going to have a practice game against another school and they needed an umpire, so he asked me if I would do it. I did, and I was horrible. It was incredibly difficult to tell whether a pitch was in the strike zone. Like an idiot, I called the game from behind the plate. In that situation, I would have probably done better standing behind the pitcher.

    I also wonder how much more accurate umpires can become. If a ball is traveling 90 mph, you do not actually see the pitch during the last ten feet that it travels. That compounds the difficulty. The only way to ensure complete accuracy would be to employ a technological solution, and I don’t know how that would go over with players and fans.

  5. Jackson says:

    I strongly believe that this “effect” is the result of pitch f/x more accurately recording the location of pitches, as opposed to umpires getting better. The fact that the biggest improvement in umpire accuracy was from 2007-2008, when pitch f/x was still working out its kinks, signals that the improvements are in the data, not the umpires. The improvements also start to plateau in 2009/2010, when Sportvision started focusing more on hit f/x and less on improving pitch f/x. I would be interested to see if these “improvements” are still there when using corrected pitch f/x data. I would imagine there would be little if any improvement across years.

  6. Matt says:

    I’m curious as to how much the improvement in accurately calling the strike zone has decreased offense. The general consensus is that PED testing is solely responsible for lower power numbers, but a significant improvement in calling an accurate strike zone would have to have an impact.

  7. Wally says:

    I wonder how the individual pitches rate in oTkS% and zTkB%? Do bad sliders more often get called strikes than bad fastballs? It seems like that would be interesting data, too.

  8. All right, so it’s come to my attention that what the StatCorner numbers show is improved umpire consistency, as opposed to improved umpire accuracy with regard to the rulebook strike zone. This is because the StatCorner strike zone is based on the average strike zone called by all umpires in the league. But the FanGraphs plate-discipline data supports the idea of improved accuracy as well over the past three years, so that’s reassuring. And consistency, naturally, is important too. So basically, the overall message above remains intact, but I wanted to leave this comment in order to clear something up.

    • Robbie Griffin says:

      I actually think consistency is MORE important (both being very important). If umpires don’t want to call the high strike, or if they want left handed hitters to flail at pitches outside, that’s frustrating, but if they do it consistently, at least it’s fair.

      • Greg Simons says:

        It’s not fair to lefties compared to righties if lefties have to defend a wider strike zone than righties.

  9. Detroit Michael says:

    As others have said, at least some of this effect is that the Pitch F/X data has become more consistent, with fewer calibration errors than when it was first installed.

  10. Bryce says:

    I like the article, but have one request: please always explain abbreviations used in your charts. Yes, it’s possible to figure it out, but it would’ve taken the same amount of space to actually explain them, and it makes your writing much more accessible.

  11. Cidron says:

    In defense of the umpires, they have to:
    1. Set up behind and above the catcher, resulting in an above-and-to-the-side view of the pitch (which isn’t ideal)
    2. Adjust to different catchers and their sizes, mannerisms, etc. through the course of the year
    3. Adjust themselves for comfort over the course of the game and season
    4. Hear it all from the fans, media, analysts, players, and coaches about a given call, or game, or their other faults

    As a result of these four (and more, no doubt), the umpires cannot be 100% consistent. Their views of a pitch vary, and their mental makeup and attitude at a given moment all factor into the game, its pace, and a given call. Yes, they are professionals, and they try hard to be accurate. But even so, it just can’t be done with these and other factors.

    • Laison says:

      Yes….and…
      No one is saying umpires can be 100% consistent. Everyone (should) know it’s very tough to be an umpire.

    • Wil says:

      Where they set up, I have noticed, has a huge effect on how they call the game. The Braves have one of the best camera setups in the majors, so it’s easy for the viewer to see what is a ball and what is a strike. Often I’ll see an ump set up on the inside and completely botch calling the outside of the plate (probably because he can’t see it as well). But I am not sure if there is any remedy.

  12. Tim says:

    It would be useful to have this broken down by count. If you stripped out 3-0 and 0-2, the error rate might be significantly reduced.

    • Jay29 says:

      ^This.

      I think 3-0 and 0-2 (and 1-2 to a lesser extent) are where you see a vast majority of the bad calls. And I think if umpires somehow managed to call the zone more accurately in these counts, their numbers would look a lot better.

      I’d really like to see this broken down by count.

    • Chris says:

      Read — errors are indeed much more common when the end of the at-bat is at stake. Unfortunately, the error rate is also high even when it isn’t.

  13. fergie348 says:

    When Laz Diaz retires or is reassigned to the Pacific Coast League, I expect a couple-percent drop in both categories.

  14. crzy_guy says:

    As much as we’re “discovering,” I wonder how much of this stuff has already been studied at length and accurately modeled by major league teams. From people who have worked for them at length, there have been many hints that they’re much further along than the sabermetric community at large.

  15. rotowizard says:

    This is EXACTLY the reason K rates are improving at the rate they are. It doesn’t take 3 good/bad calls to make a strikeout, just one. It almost makes TOO much sense.

    • Paul says:

      It’s just intuitive, isn’t it? Let’s look at the alternate arguments. K rates are way up and all the offensive metrics are way down… because of global warming? Um, the balls are being wound looser? The steroid use rate was actually more like 90% and for some reason pitchers did not benefit from them?

      The rate differences are dramatic and the only thing that we have any evidence of changing (or can be reasonably concluded to contribute) is that umpires are using Pitch f/x to improve their performance, and their bosses are evaluating them based on it.

  16. Shane T. says:

    Okay, so more strikes are being called strikes and more balls are being called balls. This should mean there is less latitude for pitch framing to influence the calls, for good or bad. That means we should see the range of run values from pitch framing decreasing year-to-year as zone consistency increases. To wit, it should narrow from [+35 to -35] down to [+30 to -30], or numbers to that effect.

    Is this happening?

  17. ChartNazi says:

    As a chemist, I think you should label your charts better. I know it’s intuitive, but including two straightforward sentences or a caption like “zTkB% stands for (pitches in the ZONE TAKEN called a BALL?)” would be clearer.

  18. Greg Simons says:

    Interesting that relievers consistently get worse calls than starters in both tables. I imagine this is tied somewhat to speed of pitch, but that’s just a guess.

    • legendaryan says:

      I would guess that it’s due to the changing of the pitcher in general. Umpires are human and likely get into a rhythm when just 2 pitchers pitch each inning for 5-7 innings.

      Suddenly changing to a new delivery/ style/ pitch type for only an inning or 2 probably affects the umpires.

    • Paul says:

      It might also be type of pitch. Even though relievers throw more fastballs, they throw about 7% more sliders. Plus, relievers throw them harder. I would guess it’s the hardest pitch to call.

  19. Fatbot says:

    Really fun area to delve into, thanks for the writeup! I’d love to see more research on the extreme corners of the strike zone rectangle and around the rim of the elliptical heatmaps of called strikes, and as Tim said, we need a breakdown by count, and even by umpire! I’d love a discussion of which pitchers are most successful at getting those 50/50-type calls, and whether we can build a pitcher type. Does the data support the idea that guys with a reputation for “being around the plate” do indeed get more favorable treatment from the umps (correlate BB% to zTkB%)? Are the power pitchers getting squeezed while the junk guys are thriving (we need to understand why the MLB-wide percentage of fastballs thrown keeps falling every year while strikeouts go up every year; this could be a factor)? Or are some guys just lucky in getting more calls, or is their deception/reputation the key, etc.?
