How Should We Evaluate a Manager?

I’ve got a vote for American League Manager of the Year this season and I’m terrified. My first vote as a member of the Baseball Writers’ Association of America, and it’s the impossible one.

Maybe impossible is too tough a word. I’m sure I’ll figure something out in time to submit a vote. But evaluating the productivity of a manager just seems so difficult. We’ve seen efforts that use the difference between projected and actual wins, or between “true talent” estimations for the team and their actual outcomes. But those attribute all sorts of random chance to the manager’s machinations.

I’d like to instead identify measurable moments where a manager exerts a direct influence on his team, assign those values or ranks, and see where each current manager sits. So what are those measurable moments?

As far as I, noted idiot, can figure, it’s these variables that define a manager:

  • When he uses his best relievers.
  • How rigid his approach to the bullpen is.
  • Where he puts his best hitters in the lineup.
  • How often he bunts with non-pitchers.

What follows is a difficult, probably error-laden, and almost certainly futile effort to score managers on how they’ve done in these four categories. And the worst part is that, when it’s done, not only can I not reveal my vote, but the work won’t really be done — because there’s no doubt that managers are also in charge of personalities and shepherding a clubhouse full of athletes into a cohesive unit. We won’t find a number for that one, and it seems a necessary moment of chaos that threatens to upend our best efforts.

For the first of the above variables, however, what we’d like to do is have a measure of the reliever’s true talent, and then we can match it up with leverage index, which tells us how important the moment is in the context of the game. We’ve got two ways to do this.

The first, called Bullpen Management Above Random (BMAR) by creator Tim Kniker, uses each reliever’s weighted on-base average (wOBA) allowed to lefties and righties to judge his true talent. Kniker was kind enough to run this year’s BMAR for us.

Bullpen Management Above Random, 2016
Manager | Manager Above Monkey | Optimal Above Monkey | % of Optimal | Runs Above Monkey | WAM
Bryan Price | 0.018 | 0.091 | 19.3% | 34.2 | 3.5
Buck Showalter | 0.019 | 0.084 | 22.4% | 33.0 | 3.4
Joe Girardi | 0.020 | 0.100 | 20.5% | 32.8 | 3.3
Ned Yost | 0.017 | 0.066 | 25.4% | 28.5 | 2.9
Pete Mackanin | 0.016 | 0.081 | 19.7% | 27.5 | 2.8
Bob Melvin | 0.015 | 0.071 | 21.7% | 27.3 | 2.8
Brad Ausmus | 0.016 | 0.053 | 31.0% | 27.0 | 2.8
Mike Matheny | 0.017 | 0.084 | 20.1% | 26.9 | 2.7
Andy Green | 0.015 | 0.061 | 24.3% | 26.8 | 2.7
Clint Hurdle | 0.014 | 0.063 | 22.4% | 26.3 | 2.7
Jeff Banister | 0.015 | 0.073 | 21.0% | 25.7 | 2.6
Kevin Cash | 0.016 | 0.064 | 24.6% | 25.4 | 2.6
John Farrell | 0.016 | 0.059 | 27.0% | 23.9 | 2.4
Joe Maddon | 0.015 | 0.068 | 21.6% | 21.9 | 2.2
AVERAGE | 0.012 | 0.109 | 11.2% | 20.8 | 2.1
Craig Counsell | 0.012 | 0.061 | 19.0% | 20.6 | 2.1
Terry Francona | 0.013 | 0.061 | 21.1% | 20.4 | 2.1
Chip Hale | 0.010 | 0.081 | 12.7% | 19.9 | 2.0
Robin Ventura | 0.012 | 0.061 | 20.2% | 19.4 | 2.0
Don Mattingly | 0.011 | 0.068 | 15.6% | 18.7 | 1.9
Paul Molitor | 0.009 | 0.068 | 13.9% | 17.8 | 1.8
Walt Weiss | 0.010 | 0.093 | 10.6% | 17.5 | 1.8
A.J. Hinch | 0.009 | 0.084 | 11.0% | 15.5 | 1.6
Terry Collins | 0.009 | 0.072 | 12.3% | 14.3 | 1.5
Fredi Gonzalez | 0.027 | 0.109 | 24.6% | 13.3 | 1.4
Dave Roberts | 0.007 | 0.069 | 9.7% | 12.0 | 1.2
John Gibbons | 0.008 | 0.077 | 10.1% | 11.8 | 1.2
Mike Scioscia | 0.006 | 0.069 | 9.3% | 11.3 | 1.2
Bruce Bochy | 0.007 | 0.067 | 10.7% | 10.9 | 1.1
Brian Snitker | 0.007 | 0.079 | 9.3% | 10.2 | 1.0
Scott Servais | 0.005 | 0.067 | 8.1% | 8.9 | 0.9
Dusty Baker | -0.003 | 0.062 | -4.9% | -4.6 | -0.5
SOURCE: Retrosheet, Tim Kniker
Manager Above Monkey is expressed in points of wOBA the manager suppressed using reliever choice. WAM = Wins Above Monkey.
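
For a rough sense of how the columns relate, % of Optimal appears to be Manager Above Monkey divided by Optimal Above Monkey. That reading is an assumption based on the column names, not something Kniker spelled out, and the helper name below is made up for illustration. Because the table’s inputs are rounded to three decimals, the sketch lands close to the published percentages rather than exactly on them.

```python
# Hypothetical helper: the ratio is an assumption inferred from the column
# names in the BMAR table, not Kniker's published formula.
def percent_of_optimal(manager_above_monkey: float, optimal_above_monkey: float) -> float:
    """Share of the available wOBA suppression the manager captured, in percent."""
    if optimal_above_monkey == 0:
        raise ValueError("optimal_above_monkey must be non-zero")
    return 100.0 * manager_above_monkey / optimal_above_monkey


# Rounded inputs copied from the table above: (Manager Above Monkey, Optimal Above Monkey).
rows = {
    "Brad Ausmus": (0.016, 0.053),
    "John Farrell": (0.016, 0.059),
    "Buck Showalter": (0.019, 0.084),
}

for name, (mam, oam) in rows.items():
    print(f"{name}: {percent_of_optimal(mam, oam):.1f}% of optimal")
```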

Many Cards fans don’t respect Mike Matheny’s bullpen management, so perhaps this passes the sniff test. For the purposes of my later vote, I find it interesting that John Farrell is second in the league in the percent of optimal usage, and that Buck Showalter has a really good bullpen and uses it well. Terry Francona is above average by percent of optimal, but third in this trio.

However, wOBA allowed may not be the best way to judge a player’s true talent. Rob Arthur and Rian Watt developed the Weighted Reliever Management (wRM) score to judge the same thing, and they used Baseball Prospectus’ Deserved Runs Allowed (DRA) as the measure of talent. Arthur is a generous person, so he ran a one-year version of wRM for this year for our benefit.

Weighted Reliever Management Scores by Team, 2016
Team | DRAcor
Brewers | -0.88
Red Sox | -0.82
Pirates | -0.80
Astros | -0.70
Yankees | -0.68
Braves | -0.66
Royals | -0.66
Padres | -0.65
Phillies | -0.62
White Sox | -0.61
Mets | -0.60
Tigers | -0.54
Giants | -0.52
Rays | -0.51
Indians | -0.49
Orioles | -0.43
Reds | -0.41
Athletics | -0.41
Twins | -0.40
Mariners | -0.36
Blue Jays | -0.35
Angels | -0.32
Rockies | -0.25
Marlins | -0.25
Nationals | -0.22
Dodgers | -0.17
Rangers | -0.15
Cubs | -0.15
Diamondbacks | 0.00
Cardinals | 0.05
SOURCE: Rob Arthur, FiveThirtyEight

Arthur wanted to point out that these are not the same as the wRM+ scores in the article, because they aren’t based on multiple years and regressed in the same way. Instead, shown here is the raw correlation between DRA and the leverage index of the relievers on the team, weighted by innings pitched. More negative means that the better relievers are being used with the most leverage, which means the manager is doing well.
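
To make that description concrete, here is a minimal sketch of an innings-weighted Pearson correlation between DRA and leverage index. It assumes one row per reliever with a DRA, an average leverage index, and an innings total. The function name and the four-man bullpen are invented for illustration, and this is not Arthur’s actual code.

```python
import math


def weighted_correlation(x, y, w):
    """Pearson correlation of x and y with non-negative weights w."""
    total = sum(w)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    mean_x = sum(wi * xi for xi, wi in zip(x, w)) / total
    mean_y = sum(wi * yi for yi, wi in zip(y, w)) / total
    cov = sum(wi * (xi - mean_x) * (yi - mean_y) for xi, yi, wi in zip(x, y, w)) / total
    var_x = sum(wi * (xi - mean_x) ** 2 for xi, wi in zip(x, w)) / total
    var_y = sum(wi * (yi - mean_y) ** 2 for yi, wi in zip(y, w)) / total
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable never changes")
    return cov / math.sqrt(var_x * var_y)


# Made-up bullpen: (DRA, average leverage index, innings pitched).
bullpen = [
    (2.50, 1.9, 65.0),  # best arm, biggest moments
    (3.40, 1.3, 60.0),
    (4.10, 0.9, 55.0),
    (4.80, 0.6, 40.0),  # worst arm, mop-up duty
]
dra = [row[0] for row in bullpen]
leverage = [row[1] for row in bullpen]
innings = [row[2] for row in bullpen]

score = weighted_correlation(dra, leverage, innings)
print(f"DRA-to-leverage correlation: {score:.2f}")  # negative: better arms get more leverage
```

Because a lower DRA is better, a manager who hands his best relievers the biggest moments pushes this number toward -1, which matches the direction of the table above.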

By this measure, Farrell does well; Matheny, not so much.

Finally, there might be a third, better way to evaluate reliever usage: using projections instead of true-talent estimators. Because Arthur used full-season DRA and leverage numbers in his analysis, he was really answering the question “Looking back, did the manager make the right decisions?” It might make sense, however, to ask “Based on the info he had then, did the manager make the right decisions?” We have the benefit of looking at the results here, but since we also don’t have the full season or multiple years of regression, there’s more uncertainty with these rankings than the ones he published.

I’m a big fan of flexibility in approach. Flexibility, to me, is the sign of a supple mind, one that allows for considerations beyond one’s own worldview. So it’s fun that Ben Lindbergh and Dan Brooks developed a stat that measures the rigidity of reliever usage. They looked at the number of outs the team had recorded before a given reliever entered, and the variation in that number, to see if some managers were less flexible in their usage. Lindbergh was kind enough to run this stat for 2016, leading up to September, which is fine for me because September baseball is played on the moon by aliens.

Reliever Role Rigidity by Team, 2016
Team | Team Reliever Role Rigidity
ANA | 3.64
MIN | 3.57
OAK | 3.52
CIN | 3.33
TBA | 3.26
SLN | 3.22
CLE | 3.16
LAN | 3.12
BOS | 3.10
HOU | 3.09
SDN | 3.05
PIT | 3.02
ARI | 3.00
CHA | 2.99
MIL | 2.94
KCA | 2.93
SEA | 2.92
MIA | 2.89
ATL | 2.87
WAS | 2.85
BAL | 2.84
CHN | 2.82
NYA | 2.82
TEX | 2.78
PHI | 2.77
COL | 2.73
NYN | 2.68
SFN | 2.68
DET | 2.64
TOR | 2.29
RRR is the standard deviation of the number of team outs recorded before a reliever’s appearance. In this case, we calculated the RRR for relievers with at least ten innings and then did a weighted average for each team. Lower = more rigid.

Here, a lower number means a more rigid approach to the bullpen. Brad Ausmus scores poorly even as he tried to find the right combination of relievers to lead up to closer Francisco Rodriguez. The Indians have been better at moving the pieces around once they got the right pieces — that was the reason Lindbergh brought out RRR this year, since Cody Allen has been moved around more than most closers.
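
The RRR recipe in the table note can be sketched in a few lines. Two choices here are assumptions, since the note doesn’t specify them: the population standard deviation, and innings as the weight in the team average. The function name and the sample relievers are hypothetical.

```python
from statistics import pstdev


def team_rrr(relievers, min_innings=10.0):
    """Innings-weighted average of each reliever's entry-point standard deviation.

    relievers: list of (innings_pitched, outs_recorded_before_each_appearance).
    Assumptions: population standard deviation, innings as the weight.
    """
    qualified = [
        (innings, pstdev(outs))
        for innings, outs in relievers
        if innings >= min_innings and outs
    ]
    if not qualified:
        raise ValueError("no relievers meet the innings minimum")
    total_innings = sum(innings for innings, _ in qualified)
    return sum(innings * spread for innings, spread in qualified) / total_innings


# Made-up relievers. 24 outs recorded means entering to start the ninth.
closer = (60.0, [24, 24, 24, 24, 24])   # always the ninth: zero spread
fireman = (60.0, [15, 18, 21, 24])      # enters anywhere from the sixth on
call_up = (5.0, [3, 27])                # under ten innings, so ignored

rigid_pen = team_rrr([closer, call_up])
flexible_pen = team_rrr([closer, fireman, call_up])
print(f"rigid: {rigid_pen:.2f}, flexible: {flexible_pen:.2f}")
```

A bullpen where everyone has a fixed inning scores near zero, and the number climbs as relievers get moved around the game.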

For the next piece, Jonah Pemstein helped assess the ability of a manager to put the right players in the right spots in the lineup. He looked at the projected daily wOBA for each player in each lineup slot for each team, which was awesomely provided by Matt Hunter and SaberSim. Pemstein then ranked those wOBAs within each team and correlated each player’s actual lineup slot with his ideal lineup slot, using Tom Tango’s The Book as a guide for the best way to put a club’s assorted players into the lineup.

What we have here, then, is a correlation between actual and ideal lineup slots. The closer a team is to one, the closer they were to putting out an ideal lineup every night.
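
Here is a rough sketch of that kind of calculation for a single game’s lineup. The slot priority below is a simplified reading of The Book (best bats in the first, second, and fourth slots, then third and fifth, then sixth through ninth in order) and is an assumption, not Pemstein’s actual mapping. How he pooled games over the season isn’t described, and all names and projections are invented.

```python
import math

# Assumed, simplified priority: the best hitter gets slot 1, the next slot 2,
# the third-best slot 4, and so on. Not Pemstein's actual mapping.
SLOT_PRIORITY = (1, 2, 4, 3, 5, 6, 7, 8, 9)


def ideal_slots(projected_woba):
    """Map each player to an ideal lineup slot, best projected wOBA first."""
    ranked = sorted(projected_woba, key=projected_woba.get, reverse=True)
    return {player: slot for player, slot in zip(ranked, SLOT_PRIORITY)}


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable never changes")
    return cov / math.sqrt(var_x * var_y)


def lineup_correlation(actual_slots, projected_woba):
    """Correlate the slots a manager used with the assumed ideal slots."""
    ideal = ideal_slots(projected_woba)
    players = list(actual_slots)
    return pearson([actual_slots[p] for p in players], [ideal[p] for p in players])


# Hypothetical hitters A through I with made-up projected wOBAs.
projected = {
    "A": 0.390, "B": 0.370, "C": 0.355, "D": 0.340, "E": 0.330,
    "F": 0.315, "G": 0.305, "H": 0.295, "I": 0.280,
}
by_the_book = ideal_slots(projected)
weak_leadoff = {"I": 1, "A": 2, "B": 3, "C": 4, "D": 5, "E": 6, "F": 7, "G": 8, "H": 9}

print(f"by the book:  {lineup_correlation(by_the_book, projected):.3f}")
print(f"weak leadoff: {lineup_correlation(weak_leadoff, projected):.3f}")
```

Putting the weakest projected hitter at the top drags the correlation down sharply even when everyone else is roughly in order.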

Correlation of Actual to Ideal Lineup by Team
Team | Correlation of Actual to Ideal Lineup
Pittsburgh | 0.836
Anaheim | 0.806
Chicago (N.L.) | 0.751
Baltimore | 0.724
Miami | 0.716
San Diego | 0.715
Toronto | 0.707
Boston | 0.676
St. Louis | 0.674
Washington | 0.672
Houston | 0.668
Cincinnati | 0.667
Colorado | 0.652
Detroit | 0.647
Arizona | 0.641
New York (N.L.) | 0.639
San Francisco | 0.637
Tampa Bay | 0.634
Philadelphia | 0.617
Cleveland | 0.611
Milwaukee | 0.611
Atlanta | 0.607
Seattle | 0.605
Minnesota | 0.598
Texas | 0.593
Los Angeles (N.L.) | 0.571
Chicago (A.L.) | 0.515
Oakland | 0.480
New York (A.L.) | 0.444
Kansas City | 0.426
SOURCE: SaberSim, Jonah Pemstein

To see what these distributions look like, Pemstein also created some visuals. Let’s look at the extremes and find the distribution of lineups for Pittsburgh and Kansas City. Clint Hurdle has his team’s lineups bunched up over by the right side, the ideal side, while Kansas City has a whole lot of Alcides Escobar at the top of the lineup.


For the purposes of my vote, Showalter gets some more love, while Farrell and Gibbons also do well. Francona, though, takes a little hit. Looks like SaberSim liked Tyler Naquin better than other projection systems, and he batted eighth for most of the year.

We’ve saved the easiest for last. Rather than going through each bunt situation in a granular way, I thought we could say something wide-ranging that should still ring true: it’s rarely a good idea to bunt with a non-pitcher. Non-pitchers should be good enough with the bat that trading their out for a base lowers win expectancy.

The American League plays the National League enough that a simple leaderboard tweak should give us what we want. Which teams bunted the least often with their non-pitchers? Answer: the Red Sox, with the Orioles in third, and the Tigers and Blue Jays in fifth and sixth.

In sum, we have some arguments for the managers who will be on the short list for American League Manager of the Year. John Farrell, John Gibbons, and Buck Showalter do well across the board, while Jeff Banister and Terry Francona have their strengths and weaknesses. Picking between them will be difficult. Almost impossible, even.

Thanks to Ben Lindbergh, Dan Brooks, Tim Kniker, Jeff Zimmerman, Jonah Pemstein, Rob Arthur, Rian Watt, and Matt Hunter for their help in putting this together.

Graphs: Baseball, Roto, Beer, brats (OK, no graphs for that...yet), repeat. Follow him on Twitter @enosarris.
