The State and Future of Pitch-Framing Research

Defensive metrics couldn't reach a consensus on Jason Varitek's final seasons (via Eric Kilby).

We need to better understand catcher defense. It is not a minor matter, nor a final flourish on player evaluation. It is a vital step in baseball analytics, both the present one and the next.

Over the last decade, our metrics have blossomed, and we can propose with decent comfort: Miguel Cabrera has been worth around seven wins annually; Brandon Crawford has been worth around two wins annually; Mark Buehrle has lately managed an ERA and innings combo worth about 3.5 wins per season.

What we cannot say with good comfort, and what we should not be saying: Buster Posey was worth about five wins in 2013, or Jonathan Lucroy was worth 3.6 wins in each of his last two seasons. Why? Because a hearty chunk of the catcher’s role is not accounted for in any present form of WAR.

Of course, few good baseball writers make this error: assuming that catcher WAR carries anything like the reliability of non-catcher WAR. A catcher estimated at two WAR is nowhere near as reliable as a pitcher estimated at two WAR. The inscrutable wordsmith Donald Rumsfeld once summed up the issue of catcher WAR in perfectly accurate and inaccessible language:

… [T]here are known knowns; there are things we know that we know. There are known unknowns; that is to say, there are things that we now know we don’t know. But there are also unknown unknowns – there are things we do not know we don’t know.

In the world of catching, the unknown unknowns are few. Why? Because we know how catchers are taught and how current catchers teach other catchers. An unknown unknown would be something such as body type, ethnicity or fingernail length affecting a catcher’s ability. Nobody is claiming these elements affect a catcher’s ability, but that may be because they are unknown unknowns.

Knowns

Right now, there are two enormous known unknowns: pitch framing and game-calling. Until the invention and implementation of PITCHf/x, neither had any hope of becoming a known known. The baseball community has celebrated progress in these areas, mostly in pitch framing. But further progress requires renewed urgency.

Matthew Carruth, a THT alumnus whose StatCorner site features the only publicly available pitch-framing database, explained the state of pitch-framing knowledge this way:

Observed Framing RAA = a * (framing skill) + b * (other skills) + c * randomness

“I think it’s important to remember that [the StatCorner] numbers aren’t a total reflection of a catcher’s skill,” Carruth says. “My hunch is that a is only at about 0.25 right now…”

And if we plumb the depths of PITCHf/x? If we beseech the heart of that dark and rich database and finally reach a point where we can alter the formula to a more confident:

Actual Framing RAA = a * (actual framing skill) + c * randomness

We may still have only threads in our hands. By Carruth’s expectations, even if we managed to move to the second formula, to a universe of known framing impact, the randomness term c would still greatly outweigh a.

“Baseball,” Carruth says, “is just incredibly random.”

How much can it matter?

I often discuss pitch framing with my colleagues. The most common source of doubt I hear: The numbers don’t pass the sniff test. The infamous Jose Molina has too many smart people crinkling their brows. This is a determination each of us has to make. Could a 5-win inefficiency really have persisted through 100-plus years of baseball history? Could something that big have been missed for so long?

We have to ask ourselves: How important is pitch framing and receiving? How important can it be? Let’s look at this both qualitatively and quantitatively.

The qualitative value

Let’s say it’s the fourth inning, runners on the corners. You’re the starting pitcher, your slider is not cooperating, and a walk and a duck-snort single have you in a jam. The manager has brought the infield in, which improves your chances of preventing a run on a groundball but also raises the odds of a seeing-eye single, maybe a double.

It’s a 1-1 count. From a 2-1 count, hitters in 2013 smashed a .351/.352/.580 slash line. From 1-2, they hit .166/.173/.239. If you get a strike, you can even try throwing your slider, hopeless as it is today; from 2-1, the slider is a non-option, possibly for the rest of the at-bat.

The next pitch matters. Even if the hitter leaves the bat on his shoulder, it could decide the fate of the inning, your outing and the result of the game. You throw a two-seamer that tails toward the hitter’s knees. It misses the zone by at least two inches, but Lucroy, a master of receiving low strikes, snares it for strike two: 1-2.

The batter loses 185 points of batting average, 179 points of on-base percentage and 341 points of slugging percentage. He might as well be Neifi Perez with a tube of hard salami in his hands. The wild slider, which you’ve thrown only a few times, is now actually a weapon. He flails at it low in the zone, unsure where the zone ends, for a key strikeout. The infield moves back.
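
Those gaps are simple subtraction of the 2013 slash lines quoted for 2-1 and 1-2 counts. A small sketch for checking them, expressed in “points” (thousandths):

```python
# 2013 slash lines (AVG, OBP, SLG) quoted in the text.
after_2_1 = (0.351, 0.352, 0.580)
after_1_2 = (0.166, 0.173, 0.239)


def points_lost(better, worse):
    """Gap in points (thousandths) for each slash-line component."""
    return tuple(round((b - w) * 1000) for b, w in zip(better, worse))


print(points_lost(after_2_1, after_1_2))  # (185, 179, 341)
```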

This hypothetical scenario may seem dramatic, but its most absurd-seeming element, the historical difference between a 2-1 hitter and a 1-2 hitter, is its truest part. Cabrera, the best, most fearsome man with a bat right now, has an 80 wRC+ after a 1-2 count. After 2-1, he smushes pitchers with a 167 wRC+. The power of a 1-1 pitch cannot be overstated. A first strike is good, but even from 0-1, hitters had a .311/.319/.461 slash line in 2013.

Altogether, received pitches play a big role early in an at-bat and, since about 37 percent of 1-1 pitches are taken, a significant role on 1-1 pitches:

Pitch Taking

With over a third of 1-1 pitches landing in a catcher’s glove as either a called strike or a called ball, flipping the borderline ones can be nearly as valuable as framing an 0-2 or 3-0 pitch, given how dramatically the hitter’s ability changes. Even the great Greg Maddux considered the 1-1 pitch the key determinant of a plate appearance’s outcome (though sheer quantity suggests 0-0 pitches matter more).

Let’s change perspectives and look at this from the batter’s box.

It’s the fifth inning. The pitcher, A.J. Burnett, is getting tired, and you’ve seen his every pitch. The game’s become a 5-4 slugfest, and this is your third at-bat. You fell behind early in your second at-bat and grounded out. During that at-bat, you got two good looks at his plus-plus curveball, with one whiff and one weak dribbler to short. The pitch is deadly.

Burnett misses the strike zone with a high fastball. The next pitch–the 1-0 pitch–matters. It matters a lot. From 2-0, Burnett will throw either a fastball or sinker, and you can handle either. The curveball, in a 2-0 count, is a three-percent event. From 1-1, there is almost a 40 percent chance he goes knuckle curve.

As he often does, Burnett goes fastball with his 1-0 pitch. It’s inside, and though you turn in, the pitch is not very tight. You’re hoping to influence the ump a bit, because the fastball definitely brushed the zone; you could sense it a moment after you stopped your swing.

But Russell Martin isn’t catching today. Michael McKenry is behind the dish, and because he did not set up inside and his elbow moved too much, the ump sees a ball. It’s 2-0, and Burnett has lost his best pitch, the curveball.

As the hitter, you don’t know it’s the catcher’s fault; you can’t credit him immediately. That’s the first inside pitch you’ve seen today; maybe the zone’s been tight all day? It makes you more patient. And when that surprise 2-0 curveball comes looping in over the outside corner, you take it for a happy strike.

And when the fourth pitch comes inside, you take it again for a ball. After that 1-0 pitch went for a ball, you get the feeling Burnett’s lost his control. He’s actually been popping the mitt, but the count is 3-1 when it should be 2-2. The Pirates have pitchers warming up fast in the bullpen. Whether the at-bat finishes with a single or strikeout, this could be Burnett’s last hitter, and the next batter will face a reliever with a rushed warm-up job.

Burnett throws a 3-1 sinker, but it’s headed for that narrow zone you’ve been focusing on since the 1-0 pitch. You blast the pitch right at Pedro Alvarez, who snares it for the out. Pedro is trying to shake feeling back into his fingers, and Clint Hurdle is marching to the mound before you even trot off the basepath.

The poorly framed 1-0 pitch has played a key role in changing the general course of the game. The bullpen enters earlier; the next pitcher has rushed his warm-up; the qualitative impact of that missed pitch has been potentially enormous.

Granted, these are high-leverage situations, illustrative scenarios of the importance of a called pitch. If the first pitch of a game is a borderline pitch framed into a strike, it probably will not have an enormous effect on the game if the next pitch goes for a homer. But the point is this: Called pitches occur in both high- and low-leverage situations, and they are far more frequent than most other defensive events we observe.

Jeff Sullivan of FanGraphs has explored the edges of PITCHf/x and framing data. In a recent conversation, Jeff explained why PITCHf/x data may provide even more reliable defensive numbers than UZR, and how pitch framing could be a catcher’s biggest defensive contribution:

A typical defender, like a middle infielder, sees a few hundred defensive opportunities each season, and the overwhelming bulk of those will either be fairly routine plays or unmakeable plays, so you have a pretty limited sample of plays where a really good or really bad defender can separate himself. But a catcher can catch up to 5,000 to 10,000 called pitches–and of course many of those are going to be automatic balls or automatic strikes. But a whole lot of those are going to be towards the fringes, and the fringes are actually pretty big.

I can buy that the very best and the very worst framers can make a pretty significant difference just by adding up the volume of pitches they actually catch around the corners.

In 2013, J.J. Hardy and Starlin Castro tied at 159 games started at shortstop. They played 1,417 and 1,418 innings, respectively. Neither defender had more than 700 chances. Hardy’s teammate, backup catcher Taylor Teagarden, appeared in 23 games and caught 1,296 called pitches.
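
Put on a per-game basis, the volume gap is stark. A quick sketch using only the 2013 figures above; treating 700 as Hardy’s chance total is an assumption, since the text gives it only as an upper bound, and Teagarden’s rate is per game appeared, not per full game caught:

```python
def per_game(total, games):
    """Average opportunities per game played."""
    return total / games


# 2013 figures quoted in the text. Hardy's 700 is an upper bound
# ("neither defender had more than 700 chances").
hardy = per_game(700, 159)
teagarden = per_game(1296, 23)

print(f"Hardy: at most {hardy:.1f} chances per game")
print(f"Teagarden: {teagarden:.1f} called pitches per game")
print(f"Ratio: roughly {teagarden / hardy:.0f}x")
```

Even as a backup, Teagarden saw about 56 called pitches per appearance against at most about 4.4 chances per game for Hardy.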

Where does that lead us quantitatively?

The quantitative value

Tom Tango, in many ways an emissary of linear weights, offered this simple breakdown in 2011:

BB = +0.30 runs

SO = -0.27 runs

Therefore, dividing the walk by its four balls and the strikeout by its three strikes:

Called ball = +0.075 runs

Called strike = -0.09 runs

That’s a 0.16-run swing per framed pitch.
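
Tango’s called-pitch values line up exactly with spreading the walk’s run value over four balls and the strikeout’s over three strikes. A minimal sketch of that arithmetic, assuming that even split is the intended derivation:

```python
WALK_RUNS = 0.30        # Tango's 2011 run value for a walk
STRIKEOUT_RUNS = -0.27  # Tango's 2011 run value for a strikeout
BALLS_PER_WALK = 4
STRIKES_PER_STRIKEOUT = 3


def called_pitch_values(walk=WALK_RUNS, strikeout=STRIKEOUT_RUNS):
    """Spread each outcome's run value evenly over the pitches it takes.

    Returns (value of a called ball, value of a called strike, swing between them).
    """
    ball = walk / BALLS_PER_WALK
    strike = strikeout / STRIKES_PER_STRIKEOUT
    return ball, strike, ball - strike


ball, strike, swing = called_pitch_values()
print(f"ball {ball:+.3f}, strike {strike:+.3f}, swing {swing:.3f}")
```

It gives about +0.075 for a ball, -0.09 for a strike and a swing of 0.165, which the text rounds to 0.16.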

This could then extrapolate to about 20 runs over a full season (between 120 and 150 games) if the catcher can frame just one pitch per game.
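
The season figure is plain multiplication. A sketch using the rounded 0.16-run swing; `extra_strikes_per_game` is a hypothetical parameter for trying framing rates other than one per game:

```python
SWING_PER_PITCH = 0.16  # runs per ball flipped to a strike, as rounded in the text


def season_framing_runs(games, extra_strikes_per_game=1, swing=SWING_PER_PITCH):
    """Runs saved if the catcher steals the same number of strikes every game."""
    return games * extra_strikes_per_game * swing


for games in (120, 150):
    print(games, round(season_framing_runs(games), 1))
```

At one pitch per game, 120 to 150 games works out to roughly 19 to 24 runs, the neighborhood of the 20 runs cited above.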

This is the core of our analysis. But it does not consider how far the pitch was from the strike zone, nor does it consider an umpire’s tendencies or the leverage of the event. It does not consider the impact on later events, the potential for ripple effects (as in the second qualitative example).

At its simplest, we can believe an average pitch, converted from ball to strike by a superior reception, is worth 0.16 runs. This is a good starting point and a good sounding board, a structure for our sniff test of future numbers, but we should expect the truth to be more complex.

Three major methodologies have dominated the pitch-framing conversation. The first is the Mike Fast study. In 2010, Fast, then a Baseball Prospectus writer and a Hardball Times alumnus, published his seminal “Removing the Mask Encore Presentation” article, in which he exposed a then little-known Molina brother, Jose, as perhaps the game’s greatest pitch-framer. He pegged Molina’s framing ability at about 3.5 wins per 120 games.

Fast admits he was not the first to explore the impact of pitch framing and receiving, but he appears to be the first researcher to put the pieces together so seamlessly. It certainly seems the baseball community took note, too. The winter following the publication of Fast’s data, the Rays signed a rare free agent deal with Molina, a deal made even more unusual in that it guaranteed Molina, then 37 years old, the first starting catcher job of his career.

One of the core difficulties of analyzing PITCHf/x framing data is that normal, publicly available PITCHf/x data does not include information about the defenders active in the field. It lists a pitcher and a hitter but no other players. Fast and Carruth both had to develop proprietary code magic in order to add catcher information to their PITCHf/x databases.

But another THT alumnus and BP author, Max Marchi, used Retrosheet data to slip around that problem (“Catcher Framing Before PITCHf/x”). Marchi had worked with no small amount of PITCHf/x pitch-framing data before turning to Retrosheet, so he is among the best qualified to confirm or deny a correlation between the two. What he found, in short, was a more conservative estimate of framing impact, but a strongly similar one.

According to Marchi, the RetroFraming model and his PITCHf/x model had a correlation of 0.72, which is pretty good.

The third major study, one conducted relatively quietly (both Marchi and Fast got attention from the whole of the baseball community, including SABR), is Matthew Carruth’s methodology for his aforementioned publicly available framing data. While Carruth did not generate the waves Fast or Marchi did, his work has become the go-to resource for any baseball researcher without the coding know-how to build his own database. When I talked with Carruth, he was quick to admit his methodology wasn’t the trimmest or shiniest. But it does have a strong connection with Fast’s results:

Framing Comparisons and Distributions

The strong correlation is nice to see, but it can be more illuminating to look at some specific entries in this data. We can even fold in the few samples we have from Marchi’s RetroFraming study. Unfortunately, because he did not release a complete dataset (with individual season numbers and a full list of players), we are left with a rather select collection of catchers who played exceptionally and did so in the PITCHf/x era:

[Chart: framing runs from Fast, StatCorner and RetroFraming for select catchers]

These comparisons don’t follow any easy pattern. Marchi says the RetroFraming numbers tended to be more conservative, but here they appear bullish on both Joe Mauer and Jason Varitek. Meanwhile, StatCorner, which in my mind was always the most radical, looks conservative on both Mauer and Varitek. I imagine those two perceptions, StatCorner as aggressive and RetroFraming as conservative, hold up over the full data, but in these small overlaps we cannot see that.

If we break this comparison down season by season, we can see how these metrics diverge. Again, RetroFraming comes without a complete dataset. We can, however, prorate the career numbers across innings played as a catcher and plot the select seasons that we know. That gives us a plot like this:
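
Prorating is a proportional split. A minimal sketch, assuming the career total is divided strictly by innings caught; the career total and innings below are invented for illustration:

```python
def prorate_career(career_runs, innings_by_season):
    """Split a career framing total across seasons in proportion to innings caught."""
    total_innings = sum(innings_by_season.values())
    return {
        season: career_runs * innings / total_innings
        for season, innings in innings_by_season.items()
    }


# Hypothetical career total and innings, for illustration only.
innings = {2008: 1100.0, 2009: 900.0, 2010: 1000.0}
print(prorate_career(30.0, innings))  # {2008: 11.0, 2009: 9.0, 2010: 10.0}
```

Every season gets the same runs-per-inning rate, which is exactly why prorated lines look flatter than real seasons.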

[Chart: framing runs by season, with RetroFraming career numbers prorated by innings caught]

The RetroFraming numbers, as in the non-prorated dots in 2008 and 2012, appear very similar to the numbers available via Mike Fast and StatCorner. But the prorated numbers, which would be lower still if the two known RetroFraming seasons were backed out of the career total first, show that RetroFraming is indeed the most conservative.

[Chart: Russell Martin framing runs by season]

With Martin, we see StatCorner and RetroFraming matching in 2008, but in 2010, RetroFraming suggests an aggressive 18-run season. Accounting for those known seasons within his career total would push his RetroFraming estimates for the remaining seasons far below StatCorner’s and Fast’s.
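
Backing known seasons out of a career total before prorating shows why one big season drags the rest down. A sketch with invented numbers: a 30-run career over three equal seasons prorates to 10 runs each, but if one season is known to be 18 runs, the other two drop to 6 apiece.

```python
def prorate_remaining(career_runs, innings_by_season, known_runs):
    """Prorate a career total by innings after removing seasons with known values."""
    unknown = {s: i for s, i in innings_by_season.items() if s not in known_runs}
    if not unknown:
        return dict(known_runs)
    remaining_runs = career_runs - sum(known_runs.values())
    remaining_innings = sum(unknown.values())
    estimates = {s: remaining_runs * i / remaining_innings for s, i in unknown.items()}
    estimates.update(known_runs)  # keep the known seasons as reported
    return estimates


# Hypothetical career: three seasons of equal innings, one known 18-run season.
innings = {2008: 1000.0, 2009: 1000.0, 2010: 1000.0}
print(prorate_remaining(30.0, innings, known_runs={2010: 18.0}))
```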

[Chart: Jason Varitek framing runs by season]

Varitek has the most curious divergence in the group. In 2007, the first year with PITCHf/x data, RetroFraming believed he had a 26-run season. Both Fast and StatCorner had Varitek in negative run values.

It is quite possible, considering both Fast and StatCorner showed fairly precipitous declines late in Varitek’s career, that he was, in fact, a great receiver earlier in his career. And given the unreliability of PITCHf/x data in the early days, it would not be unreasonable to think the 2007 data is incorrect.

A Call for Progress

What I find most fascinating is the histogram from above (“Distribution of Framing Talent”). Granted, that uses a minimum of 10,000 pitches received, so certain players may have had unusual seasons (10,000 pitches constitutes about a full season of catching). But I’m flummoxed that the three studies suggest there are about 15 catchers worth more than 1.5 wins and about 15 catchers costing more than 1.5 wins per season. This is mostly culled from data across the last seven seasons.
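
The histogram’s thresholds are in wins while the studies report runs. A minimal sketch of counting catcher-seasons beyond ±1.5 wins, assuming the common rule of thumb of about 10 runs per win (not stated in the text) and using made-up season totals:

```python
RUNS_PER_WIN = 10.0  # common rule-of-thumb conversion; an assumption here


def count_extremes(framing_runs, threshold_wins=1.5, runs_per_win=RUNS_PER_WIN):
    """Count seasons above +threshold_wins and below -threshold_wins."""
    wins = [runs / runs_per_win for runs in framing_runs]
    above = sum(1 for w in wins if w > threshold_wins)
    below = sum(1 for w in wins if w < -threshold_wins)
    return above, below


# Hypothetical season framing totals, in runs.
seasons = [22.0, 16.5, 3.0, -1.0, -8.0, -17.0, -25.0, 11.0]
print(count_extremes(seasons))  # (2, 2)
```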

Can we trust the results from these recent studies? Consider this: The researchers have found steady year-to-year correlations. The different research methods have yielded strong inter-study correlations. Moreover, this is not a skill invented in the last 10 years, but one that has been taught since pitchers were no longer required to throw strikes.

Are the data perfect? No. The qualitative examples above demonstrate how there are impacts beyond the raw run value of a ball flipped to a strike. I think we may be many years away from an accurate distribution of credit and run values when it comes to framing, but that does not mean these early offerings are any worse than our present shift-weakened defensive run values.

It leads me to ask: Why can’t we fold pitch framing into catcher WAR?

A catcher’s contributions at the plate are obvious; we can grade them just as easily as any other hitter. A catcher’s job at controlling the running game appears less and less important as our understanding grows. The matter of blocking pitches and preventing passed balls also appears to offer only a minimal edge: a half-win advantage or penalty for the best or worst in 2013.

But the effect a catcher may have on a called pitch is potentially enormous. It’s time our data reflected that.


Bradley writes for FanGraphs and The Hardball Times. Follow him on Twitter @BradleyWoodrum.
The Stranger

This is pretty fascinating stuff. I wonder, is this a zero-sum game in a lot of ways? If you increase catcher WAR to show the impact of pitch framing (and later, game calling), do you then have to decrease pitcher WAR by a comparable amount? You can’t prevent the same run twice, after all, and it seems like pitchers are currently getting credited with preventing the runs saved via pitch framing.

Baltar

You took the words right out of my fingers. The pitchers have been credited or discredited for the catchers’ framing skill or lack thereof for more than a century.

Redsoxmaniac
One thing about normalizing the strength of the strike/ball is that the data will minimize the expected micro and macro outcomes of the catching/pitching battery. For example, Guys like Zaun were specifically catching certain pitchers toward the end of his career. Those comfort tandems (ie: Ace throws only to backup catcher) can influence a lot on how a game is pitched, and possibly the discomfort with pitcher throwing to another catcher, or the catcher catching someone he doesn’t usually play with, can skew the stats in favor of the comfortable pair. Even the pair baseline is out for the pitcher,…
*****

if you think about it, pitcher performance is also very difficult to quantify. Does this guy really have a good K rate, or is it because his catcher turns his balls into strikes?

I Believe that a pitcher’s numbers should be adjusted based on his catcher, maybe a pitcher is worth 2 WAR on paper, but there is a good chance that his catcher heavily contributed to his WAR values.

Michael Guetti
Asking a pitcher to hit a moving target, a catcher shifting around behind the plate, is as ridiculous as asking an umpire to make a decision on an on-the-corner pitch when said catcher is reaching across his body to receive it: With so much activity from the catcher, why would an ump call the pitch a strike? I’m not sure what the definition of “framing” is. Is it holding the glove steady so the (sometimes myopic) ump gets a clear view where it was caught? Is it the movement to the edge of the strike zone when the…
Eric Chalek
W/R/T game calling: Redsoxmaniac’s point about MGRs calling pitches is important. I remember TLR or his staff calling a lot of pitches. Lots of camera shots of catchers looking into the dugout. That one seems beyond parsing. At least framing has some discreetness to it as a skill, and I would love to see it included in WAR in some way. And now here’s another wrinkle. Once we have a really firm grasp on what framing does to counting stats, can we find a way to apply it to pre-FX eras? Can we find some relationship among the counting stats…
Hank G.

It would be ironic if sabermetricians figure out the true value of pitch framing right at the point that MLB institutes automatic ball/strike calling.

Avery B.

One thought about the chained effects of pitch framing. Your examples cover a few ways in which the impact of a single framed pitch can cascade through a game, but I wonder if it might have a similar impact on umpires. If a quality pitch framer (or just a really comfortable pitcher / catcher battery) successfully hit targets or frame pitches on the edges of the zone early in a game, could that have an impact on the likeliness of similar pitches being called for a strike later on in the game?

M.Cook

I’d love to see a follow-up that goes in-depth on the year-to-year correlation on pitch framing runs.
