The Hall of Fame’s Bullpen Problem

What exactly makes a reliever Hall worthy? (via Alex Kim, Terren Peterson, Dirk Hansen & Howell Media Solutions)

Editor’s Note: This is the third post of “Hall of Fame Week!” For more info, click here.

The Hall of Fame tends to bring about a lot of arguing. Arguments over players like Jim Rice (“the fear”) and Bert Blyleven (strikeouts are awesome!) went on for some time, as did the squabbling over Jack Morris and the importance of pitcher wins. Then we harangued each other about steroid users and the character clause. Now, over the next few years, the big Hall of Fame argy-bargy will be about relief pitching.

The debuts of Trevor Hoffman and Billy Wagner on the Hall of Fame ballot this year are about to expose the fact that the baseball community still has no idea what to make of relievers. They are a relatively new phenomenon, by baseball standards (40 or 50 years old), and their role and usage have changed a lot even within that short lifespan. As a result, modern-style relief pitchers didn’t appear regularly on the Hall of Fame ballot until the 2000s.

As we stand today, only five pitchers have been elected to the Hall primarily for their efforts out of the bullpen. (Compare this to 62 starting pitchers and 61 outfielders.) To this point, voters have been able to get away with treating reliever candidates on a case-by-case basis (though it has produced erratic results). But with two of the relief corps’s strongest candidates yet embarking on their Hall of Fame campaigns in 2016, we need a reliable way to compare Wagner to Hoffman, those two to ballot holdover Lee Smith, and those three to the five standing honorees. Above all, it’s time we established a fair Hall of Fame standard against which to measure relievers’ careers.

But first, let’s get one thing out of the way: Relief pitchers belong in the Hall of Fame. Reliever is a distinct and established position in today’s game, albeit a less valuable one, and the Hall doesn’t adequately capture the full story of baseball without including them. Keeping Hoffman or Wagner out of the Hall because you don’t like the position they played is about as fair as barring entry to Edgar Martinez for being a designated hitter. Some positions inevitably will be worth more than others, which is why voters should enshrine the best players at each position, rather than comparing relievers to starters, left fielders to catchers, or apples to oranges.

Leaving that debate for another day, the search is on for the best statistics to use to evaluate relievers. We can’t use traditional stats like wins or saves—artificial constructs that assign pitchers credit or blame for patterns of use that are out of their control. But we also can’t use more accepted advanced metrics like WAR. Because of how little they play, relievers, no matter how good they are, never have the chance to sniff the high win-contribution totals amassed by elite starters and position players.

At most positions, if a player has more than 60–70 career WAR, he’s in Hall of Fame territory, but not relievers. Is their magic number closer to 40? Thirty? Twenty? (This is an important question to answer for one of the more popular sabermetric Hall of Fame barometers, Jay Jaffe’s/Sports Illustrated’s JAWS, which relies on comparing candidates to their enshrined brethren at the same position.) The small sample size of relievers in the Hall makes the ideal positional reference point uncertain.

For the record, the average fWAR of the five bullpen inductees is 33.4. But we can’t in good conscience use that number as a benchmark; it’s skewed too high by the exceptional circumstances of the five. One, Dennis Eckersley, earned 41.4 of his career 61.8 fWAR as a starting pitcher before he was converted to relief. The others all played part or most of their careers as multi-inning closers.

SELECTION OF MULTI-INNING CLOSERS
Pitcher            IP in Relief   Relief Appearances   Average Relief IP/G
Hoyt Wilhelm            1,872.3                1,018                  1.84
Rollie Fingers          1,505.7                  907                  1.66
Rich Gossage            1,556.7                  965                  1.61
Bruce Sutter            1,042.0                  661                  1.58

In essence, guys like Gossage, Fingers, and Wilhelm got into the Hall of Fame throwing 150 to 200 percent as many career innings as modern closers do. Of the four, Sutter was the closest to a modern reliever (he never started a game, for instance, unlike the other three), yet he still pitched an average of 1.58 innings per appearance. That’s a chance to pick up 1.58 times as much WAR as a strict one-inning reliever gets, thanks solely to managerial discretion.
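For anyone who wants to check the arithmetic, the innings-per-appearance column is simply relief innings divided by relief appearances. Here is a minimal Python sketch using the figures from the table above. The names `relief_usage` and `innings_per_appearance` are my own, and the innings totals are treated as plain decimals exactly as printed, which is an assumption about how the table was rounded.

```python
# Relief innings and relief appearances, copied from the table above.
# Assumption: innings are ordinary decimals as printed (e.g. 1,872.3).
relief_usage = {
    "Hoyt Wilhelm": (1872.3, 1018),
    "Rollie Fingers": (1505.7, 907),
    "Rich Gossage": (1556.7, 965),
    "Bruce Sutter": (1042.0, 661),
}


def innings_per_appearance(innings, appearances):
    """Average number of innings thrown per relief outing."""
    return innings / appearances


# Round to two decimals to match the table's precision.
ip_per_g = {
    name: round(innings_per_appearance(innings, appearances), 2)
    for name, (innings, appearances) in relief_usage.items()
}

for name, value in ip_per_g.items():
    print(f"{name}: {value:.2f} IP/G")
```

A reliever used for exactly one inning every time out would come in at 1.00 on this scale, which is the baseline the multi-inning closers are being measured against.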

Clearly, one-inning closers are still uncharted territory for the Hall of Fame. That’s bad news for Hoffman (1.05 IP/G) and Wagner (1.06 IP/G), who are likely to be unfairly compared to their more heavily used predecessors, despite the fact that on an inning-by-inning basis, they were actually far superior. There’s an easy way to quantify this, of course: Look at WAR per inning pitched. To make the numbers easier to conceptualize, let’s scale it to WAR per 200 innings.

MULTI-INNING CLOSERS, 200 IP SCALE
Pitcher            Career fWAR   fWAR as Reliever   IP in Relief   Relief fWAR/200
Dennis Eckersley          61.8               20.4          807.3               5.1
Rich Gossage              31.1               28.8        1,556.7               3.7
Bruce Sutter              19.2               19.2        1,042.0               3.7
Rollie Fingers            27.4               25.9        1,505.7               3.4
Hoyt Wilhelm              27.3               19.4        1,872.3               2.1

That’s an average of 3.6 fWAR per 200 innings among current Hall of Fame relievers. Now let’s see how the three relief pitchers on the 2016 ballot stack up.

2016 HALL OF FAME RELIEVER CANDIDATES, 200 IP SCALE
Pitcher            Career fWAR   fWAR as Reliever   IP in Relief   Relief fWAR/200
Billy Wagner              24.2               24.2          903.0               5.4
Trevor Hoffman            26.1               26.1        1,089.3               4.8
Lee Smith                 26.6               25.8        1,252.3               4.1

Those are significantly better numbers than four of the five current Hall of Famers. On this scale, only Eckersley was a better closer than Hoffman and Smith, and Wagner tops them all.
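The 200-inning scale is easy to reproduce: divide relief fWAR by relief innings and multiply by 200. The sketch below does that for all eight pitchers, using only the figures from the two tables above. The function name `war_per_200` and the dictionary names are my own, and I am assuming the Hall of Fame benchmark is a simple unweighted average of the five individual rates rather than pooled WAR over pooled innings.

```python
from statistics import mean

# (relief fWAR, relief innings), copied from the two tables above.
HALL_OF_FAMERS = {
    "Dennis Eckersley": (20.4, 807.3),
    "Rich Gossage": (28.8, 1556.7),
    "Bruce Sutter": (19.2, 1042.0),
    "Rollie Fingers": (25.9, 1505.7),
    "Hoyt Wilhelm": (19.4, 1872.3),
}
CANDIDATES = {
    "Billy Wagner": (24.2, 903.0),
    "Trevor Hoffman": (26.1, 1089.3),
    "Lee Smith": (25.8, 1252.3),
}


def war_per_200(war, innings):
    """Scale a WAR total to a 200-inning workload."""
    return war / innings * 200


hall_rates = {name: war_per_200(*line) for name, line in HALL_OF_FAMERS.items()}
candidate_rates = {name: war_per_200(*line) for name, line in CANDIDATES.items()}

# Assumption: the benchmark is the unweighted mean of the five rates.
hall_average = mean(hall_rates.values())

print(f"Hall of Fame average: {hall_average:.1f} fWAR/200")
for name, rate in candidate_rates.items():
    # Count how many enshrined relievers this candidate out-rates.
    beaten = sum(rate > hall_rate for hall_rate in hall_rates.values())
    print(f"{name}: {rate:.1f} fWAR/200, ahead of {beaten} of {len(hall_rates)} Hall of Famers")
```

Averaging the rates gives every pitcher equal weight regardless of workload. Pooling WAR and innings instead would tilt the benchmark toward the heaviest-used relievers, so the choice is worth keeping in mind when moving the bar.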

Still not convinced? There are other advanced metrics that do proper justice to relievers. Win probability, for instance, is an even more precise way to get at dominance, drilling down to the level of individual plays elicited by these bullpen aces. Two analogous stats on this front are context-neutral wins (a.k.a. WPA/LI) for a player’s effect on win expectancy and REW (RE24 converted to a wins scale) for its effect on run expectancy. Win-probability stats only go back to 1974, so data for Fingers (debuted in 1968) and Gossage (debuted in 1972) are incomplete, and Wilhelm (retired in 1972) is missing entirely, but here are the data we do have.

WIN PROBABILITY RELIEF STATS COMPARISON
Pitcher            WPA/LI in Relief   REW in Relief
Rich Gossage                  17.67           20.96
Billy Wagner                  18.03           19.91
Trevor Hoffman                17.50           18.84
Lee Smith                     13.88           17.79
Dennis Eckersley              13.29           13.59
Rollie Fingers                 9.80           12.72
Bruce Sutter                  12.69           12.10

The four current Hall of Famers average 13.36 WPA/LI and 14.84 REW. As for the hopefuls? Well, it’s decisive. Again, the modern closers far surpass all but one of their decorated peers—this time Gossage, who bests everyone in REW but still trails Wagner and effectively ties Hoffman in WPA/LI.
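The two averages can be checked the same way. This is a small sketch, with names of my own choosing, that averages the WPA/LI and REW figures for the four Hall of Famers in the table above and then lists which ballot candidates clear both marks. It takes the table's numbers at face value, including the incomplete totals for Fingers and Gossage.

```python
from statistics import mean

# (WPA/LI in relief, REW in relief), copied from the table above.
HALL_OF_FAMERS = {
    "Rich Gossage": (17.67, 20.96),
    "Dennis Eckersley": (13.29, 13.59),
    "Rollie Fingers": (9.80, 12.72),
    "Bruce Sutter": (12.69, 12.10),
}
CANDIDATES = {
    "Billy Wagner": (18.03, 19.91),
    "Trevor Hoffman": (17.50, 18.84),
    "Lee Smith": (13.88, 17.79),
}

# Unweighted averages across the four enshrined relievers with data.
hall_wpa_li = mean(wpa_li for wpa_li, _ in HALL_OF_FAMERS.values())
hall_rew = mean(rew for _, rew in HALL_OF_FAMERS.values())

# Candidates who beat the Hall of Fame average on both measures.
clears_both = [
    name
    for name, (wpa_li, rew) in CANDIDATES.items()
    if wpa_li > hall_wpa_li and rew > hall_rew
]

print(f"Hall of Fame average: {hall_wpa_li:.2f} WPA/LI, {hall_rew:.2f} REW")
print("Clear both marks:", ", ".join(clears_both))
```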

The verdict is clear on the preponderance of evidence: Pound-for-pound, Hoffman, Wagner, and even the much-maligned Lee Smith were far more dominant than four of the five current Hall of Famers. This isn’t to say Wilhelm et al. aren’t worthy—they do deserve some extra credit for their durability and the value some of them brought as starters. But as pure relievers, the current ballot’s trio were more fearsome. They should be immortalized in the Hall, low-end WARs and all.

And leading the pack overall: Not 600-save club member and AC/DC fan Trevor Hoffman, but scorching southpaw Billy “The Kid” Wagner. Throw out saves (though Wagner still has 422, good for fifth all time), and it’s hard to make a case for Hoffman over the less heralded lefty. Wagner’s 2.31 ERA and 2.73 FIP far outstrip Hoffman’s (2.87 and 3.08). He is second all time among relievers in both RE24 and WPA/LI (Hoffman is fourth in both). Among pitchers with at least 500 innings pitched, Wagner’s 1.00 WHIP is the second-lowest in baseball history. His strikeout rate of 33.2 percent stands alone as the highest.

Wagner, simply put, is the best reliever ever to appear on a Hall of Fame ballot. Of course, this won’t be true in three years’ time, when Mariano Rivera makes what will surely be a brief appearance on the ballot, as he is the best reliever in history by virtually every measure (his 652 saves, 6.4 fWAR per 200 innings, 34.38 WPA/LI, and 34.75 REW are all the best marks ever amassed out of a bullpen).

Rivera will be elected easily when he’s eligible in 2019, and rightfully so, but hopefully his specter won’t keep others out in the meantime. Wagner may not be Rivera, but using that as a reason to keep him out of the Hall makes about as much sense as saying Tim Raines is unworthy because he’s not Rickey Henderson.

If you can only find room on your ballot for one reliever, make it Wagner. But measuring by more precise yardsticks, all three closers on the 2016 ballot clear the bar that voters have set for relief pitchers: an average of 3.6 WAR/200, 13.36 WPA/LI, and 14.84 REW, to go along with enough total value (a floor that has been set around 20 fWAR based on modern bullpen usage) to guarantee some longevity along with that dominance. You can quibble with where those thresholds should be set, but by precedent alone, Wagner, Hoffman, and Smith would all be credits to their position’s standing in the Hall of Fame.


Nathaniel Rakich writes about politics and baseball at Baseballot. He has also written for The New Yorker, Grantland, The New Republic, and Let's Go Travel Guides. Follow him on Twitter @baseballot.
Jim S.

Well done. Wagner it is.

John DiFool

“At most positions, if a player has more than 60–70 career WAR, he’s in Hall of Fame territory, but not relievers. Is their magic number closer to 40? Thirty? Twenty? ”

More to the point, does WAR truly capture their value?

John

Obviously not. Clearly there is a serious problem if there is general recognition that a different WAR standard has to be used for relief pitchers getting into the Hall of Fame.

The Dude

I think this is one of the most thoughtful explorations of relief pitchers’ place in the Hall of Fame I’ve ever read (and I’ve spent a lot of time trying to find analysis on this subject). I feel this should be mandatory reading for all HOF voters.

Bpdelia

Fair enough. Though I’m a big hall guy. I’m predicting Hoffman gets over 50% but Wagner checks in at less than 20% making it a tough slog for him.

And the fact is, unless these guys get in over the next 3 years, the looming candidacy of Rivera will hurt them.

For a few years after Rivera is elected voters will compare them to Rivera and say, “these guys are far lesser players”.

Add the fact that Sutter is already viewed as a mistake and it’s easy to see saves being the benchmark for a while to come.

mac

Which would be like measuring outfielders against Willie Mays.

Studes

FYI, Baseball Reference has WPA stats for the entire WPA era.

Carl

Echoing The Dude above in that this was great, thoughtful analysis. Thank you for writing and sharing.

Rally
I was surprised to see Wilhelm so low in WAR. That’s because of the differences in fWAR (27.3) and rWAR (50.1). In his case, it comes down to beating his FIP (2.52 ERA, 3.06 FIP) considerably by limiting hits on balls in play (.245). I don’t mean to restart the debate on which method is best generally, but in this case it is obvious that fWAR underrates Wilhelm, since we’ve known that knuckleballers have consistently low BABIP since Voros first published his DIPS research. Using the current HOF relievers to establish precedent would lead to a flood of closers in…
Mike Green

Seconded. Very well put, Rally. Steve Treder’s work here many years ago on translation of starting pitching performance to relief pitching performance provides a basis for your views on Gossage/Guidry.

Carl

Actually for about 1/2 season Guidry was moved to the pen when Gossage was hurt and Guidry did quite well.

Rich Lipinski
I love sabermetrics, but on relief pitchers it has to go deeper. As sabr people say, you can’t judge on small sample sizes. Closers, especially today, are a small sample size. So to totally judge a closer, a few other things must be taken into consideration. The purpose of the closer is also to be the man at the biggest times. At the biggest times (post season), Billy Wagner was so far beyond horrid it can’t be ignored. Add to that his 2005 meltdown in back-to-back games against Houston that cost the Phillies a playoff appearance. A year later against…
Marc Schneider
Talk about small sample size. You are taking a couple of bad games Wagner had-admittedly important games-and using it to judge his career? The problem with this is you get into the issue of what is an important game. It can’t only be a couple of late-season games. I’m sure Wagner also saved some important games; albeit maybe not direct win-or-go home games. What about the games he saved to get the Astros in position for the playoffs; those games weren’t important? It just seems very unfair to pick a couple of games from his career. I can think of…
Matt

Hold on. So you start off the post admitting to the problem of small sample sizes… and then blast Wagner for a handful of anecdotal games you remember.

Bold strategy, sir.

Dave B

While I agree that Wagner was better than Hoffman, I don’t think it’s fair to compare rate stats between the 70’s relievers and the modern closers. Clearly, it’s easier to be dominant, and throw all-out, knowing that you’re only going one inning, versus having to maybe go two or three innings. The 70’s relievers also came into the game with runners on far more often, necessitating pitching from the stretch.

Jaack
I think it really comes down to how many relievers ought there be in the Hall. Currently there are 73 starters (counting Eck as a starter), 4 relievers and between 16 and 24 at each position. Of course relief pitching is a relatively modern phenomenon, so you obviously wouldn’t want 16 relievers, unless you really like Sparky Lyle or Doug Jones as candidates. Since Wilhelm is the earliest reliever with any support, we can estimate the reliever era to be 1960-present, which covers about half of Wilhelm’s career. That is just about 40% of baseball history, which translates to…
Rich

The article mentions comparing apples and oranges as something to avoid. Comparing Wagner to Gossage without somehow factoring in Innings Pitched (903 to 1556.7) *is* comparing apples and oranges. I like the approach in general, but feel it is missing that element.

Tim Johnson

Agreed- the one inning “closer” is a different animal than the multi-inning “fireman”. Rivera, Hoffman, Wagner, etc, should be considered on their own merits against each other, not against Gossage and his gang. Comparing closers to firemen in a HOF discussion is like comparing third basemen to shortstops…

Jerry Skurnik

One inning closers have a much easier job than earlier closers. Saying that Wagner’s numbers are better than Wilhelm’s or Fingers is like comparing Doc Gooden’s first inning numbers to Tom Seaver’s total numbers. I hate to say but the only way for HOF voters to rate closers is by using their own sense of how good they really are. I’d vote for only Wilhelm, Gossage, Fingers & Rivera at this time.

Jetsy Extrano

I think you’re being too hard on relievers when you use WPA/LI and REW. They should get some credit for leveraged use, by virtue of being good enough to pick in those high-leverage situations.

But on the other hand, if a reliever doesn’t match up with other Hall of Famers even by raw WPA — giving them 100% credit for leverage, probably too much — then how do you really make the case?

Rivera 57
(Randy Johnson 53)
(Schilling 35)
Hoffman 34
Gossage 32
(Tim Hudson 30)
Wagner 29
(Troy Percival 24)
Smith 21
(Todd Jones 19)
Sutter 18

Compare Helton 53, Olerud 33, Burrell 19.

Barney Coolio

One inning closers: Hoffman and Wagner were clearly one inning closers, but Hoffman did make 9 appearances of 4 or more innings. He also threw 4 innings once. I haven’t looked very closely at Wagner. Both of them retired with about 50 more innings pitched than appearances.

Compare that with a current guy like Craig Kimbrel who has fewer innings pitched than appearances in each of his seasons. He will surely retire with fewer innings than appearances. I wonder what the voters will feel about that?

Hot Toddie

Will Dan Quisenberry’s career be reevaluated?

Trace Juno

Why is WHIP hardly considered? Looks to me like that is one reason Lee Smith doesn’t get in (and I tend to agree).

Also, I like what Jaack wrote, and it seems to me it should be Mo, Hoffman, Wagner, and not much after that for quite some time.

pt
Great article. It seems strange to give rate stats (WPA/LI or WAR/IP) primary importance when WAR is the gold standard when talking about Hall candidacy for non-relievers. Raw WPA (http://www.baseball-reference.com/leaders/wpa_def_career.shtml) seems like a great first thing to look at to me. It suggests that Rivera is an inner-circle Hall of Famer (something almost everybody agrees with, unless you just don’t like the idea of relievers in the Hall). Raw WPA also suggests that Hoffman might come out ahead of Wagner (something many could agree with). Using WPA also reflects why teams want to use great pitchers in a relief role…
John

The problem with only using raw WPA is that it doesn’t reflect luck. It should be combined with FIP- (or something like that) in some way.

Michael Bacon
I am no fan of the yankees, but to read Marc Schneider write that “Mariano Rivera blew Game 7 of the 2001 World Series,” is a bit too much to take. The Closer known as Mariano induced Luis Gonzalez to hit an infield pop-up that would have been caught had the manager, Joe Torre, not drawn the infield in. The Closer did his job. If anyone should be questioned for the outcome of that game, I suggest that person be Joe Torre. (And how about IBB, called for by the manager, being assigned to the poor pitcher? Manager gives batter…
Barney Coolio

Yeah, sometimes pitchers get screwed by bad defense. And sometimes they get saved by bad defense. You say the pitchers should not be held responsible for their teammates’ bad defense, but what if a fielder makes a truly spectacular play that really “should not have been made”? Should that out not be recorded?

Marc Schneider
I concede your point about Joe Torre’s blunder in Game 7. But, remember, Rivera had given up the tying run before Gonzalez’s bloop hit. Now, granted, some of that was due to his own error, but he did give up a couple of normal hits. In fact, while I agree it was a mistake to play the infield in, it would not have been in if the winning run had not already been on third base. So, the closer DID NOT do his job. The point I was trying to make, which you seem to have missed, is not that…
Michael Bacon
I am saying exactly what I said, Barney. A pitcher should not take the blame for something out of his control. Mariano Rivera should not take the blame for “blowing” a save in Game 7 of the 2001 WS because he did not “do his job.” Baseball is a team game and as such if any blame is given, it should be given to the “yankees” as a team because the manager, Joe Torre, a great manager (and ballplayer, who is HOF worthy on just what he did as a catcher alone since a catcher should be judged differently because…
Bruce

It seems like you are rewarding Billy Wagner for retiring while he was still at his peak. It seems to me that his having 20% less career IP than Hoffman should probably ding him a bit. If you subtract Hoffman’s last four seasons you get to about the same IP as Wagner and I bet that Hoffman would probably have better rate stats. Since Hoffman produced positive WAR during those last four seasons, why should Wagner get the nod over him?

CoolWinnebago

I’m not sure it’s as simple as just lopping off the end of Hoffman’s career.

Wagner was unquestionably more dominant with a career K/9 of 11.92 vs Hoffman’s 9.36

That being said, Hoffman is certainly more deserving than Sutter.

John

There is clearly a serious problem with relief pitchers and WAR. According to this article, radically different criteria for WAR have to be applied for relief pitchers in terms of evaluating whether they should be in the Hall of Fame, and I think everyone would agree with that. But the entire idea of WAR is supposed to be that it’s a universal measure of value for any position. This is a contradiction. The method of calculating WAR for relief pitchers has to be changed significantly so that this mismatch isn’t so glaring.
