## Team-Specific Hitter Values by Markov

In my first **article**, I wrote about the limitations of the linear weights system that wOBA is based on when it comes to the context of unusual team offenses. In my **second**, I explained how Tom Tango, wOBA’s creator, also came up with a way of addressing some of these limitations by deriving a new set of linear weights for different run environments, thanks to BaseRuns. Today, I will tell you about the next step in the evolution of run estimators — the Markov model. Tom Tango created such a model that can be accessed through his website, and I’ve turned that model into a spreadsheet that I’ll share with you here.

I’ve told you that the problem with the standard run estimator formulas is that they make assumptions about what a hit is going to be worth, run-wise, based on what it was worth to an average team. That means it’s not going to apply very well to an unusual team. What’s so great about the Markov is that it makes no such assumptions — it figures all of that out itself, specific to each team. And when I say it figures it out, I mean it basically calculates out a typical game for that team, given the proportion of singles, walks, home runs, etc. the team gets in its plate appearances. It therefore estimates the run-scoring of typical teams better than just about anything, but it also theoretically should apply much, much better to very unusual or even made-up teams.
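To make "calculates out a typical game" a little more concrete, here is a minimal Python sketch of a Markov-style run calculation. It is not Tango's model or the spreadsheet. The function names, the event rates in the usage lines, and especially the base-running rules (every runner advances exactly as many bases as the batter, and outs never move runners) are simplifying assumptions made for illustration. The real model relies on estimated extra-base-taking rates instead.

```python
from collections import defaultdict

def advance_on_hit(bases, n_bases):
    """Batter and every runner advance exactly n_bases (an assumed simplification)."""
    # bases is a 3-bit mask: bit 0 = 1st, bit 1 = 2nd, bit 2 = 3rd.
    # Put the batter "at home" in bit 0, then push everyone forward n_bases.
    shifted = ((bases << 1) | 1) << n_bases
    runs = bin(shifted >> 4).count("1")      # anyone pushed past 3rd has scored
    return (shifted >> 1) & 0b111, runs

def advance_on_walk(bases):
    """Walk or HBP: only forced runners move."""
    if not bases & 0b001:
        return bases | 0b001, 0
    if not bases & 0b010:
        return bases | 0b010, 0
    if not bases & 0b100:
        return bases | 0b100, 0
    return bases, 1                          # bases loaded: a run is forced in

def expected_runs_per_inning(rates, tol=1e-12):
    """rates: per-PA probabilities for 'BB', '1B', '2B', '3B', 'HR'; the rest are outs."""
    p_out = 1.0 - sum(rates.values())
    if p_out <= 0:
        raise ValueError("the out probability must be positive")
    mass = {(0, 0): 1.0}                     # probability of each (bases, outs) state
    total_runs = 0.0
    while sum(mass.values()) > tol:          # stop once the inning is almost surely over
        nxt = defaultdict(float)
        for (bases, outs), m in mass.items():
            if outs < 2:                     # on the third out, the mass leaves the chain
                nxt[(bases, outs + 1)] += m * p_out
            new_bases, runs = advance_on_walk(bases)
            total_runs += m * rates["BB"] * runs
            nxt[(new_bases, outs)] += m * rates["BB"]
            for n_bases, event in enumerate(("1B", "2B", "3B", "HR"), start=1):
                new_bases, runs = advance_on_hit(bases, n_bases)
                total_runs += m * rates[event] * runs
                nxt[(new_bases, outs)] += m * rates[event]
        mass = nxt
    return total_runs

# Hypothetical per-PA rates for a made-up team (not taken from any real season).
team = {"BB": 0.085, "1B": 0.150, "2B": 0.045, "3B": 0.005, "HR": 0.025}
print(round(9 * expected_runs_per_inning(team), 3), "runs per 9 innings")
```

Nothing in the sketch says what a single or a home run is "worth." Those values fall out of the team's own event rates, which is the property described above.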

**Will this spreadsheet thing make my life complete?**

Well, not really. But it is fun to explore. The thing I think it’s most useful for is to guess how many runs a team would score with or without certain players. To demonstrate why this may be eye-opening for you, I’m going to show you how even two players with identical wOBA and wRC+ ratings could have significantly different offensive values to different teams.

### Markov: I must break you…r perceptions of player values

In 2011, Mark Trumbo and Alberto Callaspo had identical wOBAs (0.328) and therefore identical wRC+ as well (108), seeing as how they both played for the Angels. However, they achieved these above-average wOBAs in very different ways: Callaspo with a 0.366 OBP and 0.375 SLG, and Trumbo with a 0.291 OBP and 0.477 SLG. So, let’s place these two onto various teams to see what happens. To keep things simple, let’s just pretend there’s no such thing as park effects.

Now, before I get into this, let me remind you that teams don’t have a fixed number of *plate appearances* per season, but their number of *outs* in a season is close to fixed: 162 games/season × 9 innings/game × 3 outs/inning = 4,374 outs. Of course, it’s not exactly that, mainly because of extra innings and the fact that the home team won’t have a full 9 innings of offense in games they win. Anyway, I’m going to try to equalize Trumbo and Callaspo for playing time by giving them the same number of outs, defined as: *Outs = PA - H - BB - HBP + CS + GDP*. Ideally, that would add outs on the bases as well, but FanGraphs doesn’t provide that as of yet.
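The arithmetic and the outs definition above can be written out in a few lines of Python. The function names and the sample batting line are hypothetical and only there to show the calculation.

```python
def nominal_team_outs(games=162, innings_per_game=9, outs_per_inning=3):
    """Season outs if every game ran exactly nine full innings of offense."""
    return games * innings_per_game * outs_per_inning

def batting_outs(pa, h, bb, hbp, cs, gdp):
    """Outs = PA - H - BB - HBP + CS + GDP (outs on the bases are not included)."""
    return pa - h - bb - hbp + cs + gdp

print(nominal_team_outs())                                     # 4374
# A made-up batting line, not any real player's season:
print(batting_outs(pa=600, h=150, bb=50, hbp=5, cs=3, gdp=12))  # 410
```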

Another thing: I really ought to be removing a player from each of these teams to make room for Trumbo or Callaspo, but so as not to add the additional variable of different players being removed from different teams, we’ll just reduce each team’s outs (and the rest of their numbers proportionally) to make room. This means we’re basically just pretending that all the original players on that team had their playing time reduced a bit to make room.
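Here is a rough Python sketch of that kind of blending, assuming each batting line is a dictionary of counting stats that includes an `outs` entry. The function name, the stat keys, the sample numbers, and the choice of how many outs to hand the new player are all hypothetical, so treat it as one possible reading of the procedure and not the spreadsheet's exact arithmetic.

```python
def blend_by_outs(team, player, player_outs):
    """Shrink the team's line to free up `player_outs` outs, then fill them with the player.

    Both lines are scaled proportionally, so the blended team keeps its original out total.
    """
    team_scale = (team["outs"] - player_outs) / team["outs"]
    player_scale = player_outs / player["outs"]
    return {stat: team[stat] * team_scale + player[stat] * player_scale for stat in team}

# Made-up counting stats, for illustration only.
team = {"outs": 4300, "pa": 6000, "h": 1350, "bb": 450, "hr": 110}
player = {"outs": 420, "pa": 600, "h": 150, "bb": 35, "hr": 25}

blended = blend_by_outs(team, player, player_outs=430)
print({stat: round(value, 1) for stat, value in blended.items()})
```

Because the player is allotted outs and not plate appearances, a hitter who makes outs less often ends up with more plate appearances in the blend.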

So, without further ado, here’s what happens when 2011 Trumbo’s (T) or Callaspo’s (C) numbers are inserted into various especially good or bad offenses:

Season | Team or Player | OBP | SLG | Aggro | Actual | Markov (tweaked) | Markov (default) | BaseRuns | Runs Created |
---|---|---|---|---|---|---|---|---|---|
2011 | Mark Trumbo | 0.291 | 0.477 | -0.193 | ? | 4.440 | 4.765 | 4.828 | 5.066 |
2011 | Alberto Callaspo | 0.366 | 0.375 | -0.043 | ? | 4.988 | 5.211 | 5.125 | 5.219 |
1963 | Colt .45s | 0.283 | 0.301 | 0.190 | 2.864 | 2.837 | 2.774 | 2.921 | 2.959 |
1963 | Colt .45s+T | 0.284 | 0.318 | 0.154 | ? | 2.997 | 2.975 | 3.115 | 3.156 |
1963 | Colt .45s+C | 0.292 | 0.308 | 0.165 | ? | 3.023 | 2.978 | 3.114 | 3.162 |
1965 | Mets | 0.277 | 0.327 | 0.119 | 3.018 | 2.956 | 2.968 | 3.121 | 3.153 |
1965 | Mets+T | 0.278 | 0.342 | 0.089 | ? | 3.187 | 3.144 | 3.289 | 3.327 |
1965 | Mets+C | 0.286 | 0.332 | 0.105 | ? | 3.215 | 3.145 | 3.292 | 3.343 |
1968 | Mets | 0.281 | 0.315 | 0.238 | 2.902 | 2.945 | 2.850 | 3.035 | 3.110 |
1968 | Mets+T | 0.282 | 0.331 | 0.199 | ? | 3.094 | 3.040 | 3.214 | 3.289 |
1968 | Mets+C | 0.290 | 0.321 | 0.208 | ? | 3.120 | 3.042 | 3.216 | 3.300 |
2011 | Mariners | 0.292 | 0.348 | 0.195 | 3.432 | 3.454 | 3.385 | 3.538 | 3.608 |
2011 | Mariners+T | 0.292 | 0.361 | 0.159 | ? | 3.554 | 3.525 | 3.670 | 3.749 |
2011 | Mariners+C | 0.300 | 0.351 | 0.171 | ? | 3.590 | 3.537 | 3.681 | 3.763 |
1994 | Yankees | 0.374 | 0.462 | -0.283 | 5.929 | 5.904 | 6.516 | 6.404 | 6.630 |
1994 | Yankees+T | 0.364 | 0.464 | -0.271 | ? | 5.663 | 6.227 | 6.163 | 6.427 |
1994 | Yankees+C | 0.373 | 0.450 | -0.246 | ? | 5.774 | 6.331 | 6.223 | 6.423 |
1996 | Mariners | 0.366 | 0.484 | -0.197 | 6.168 | 6.098 | 6.526 | 6.452 | 6.765 |
1996 | Mariners+T | 0.360 | 0.483 | -0.196 | ? | 5.911 | 6.328 | 6.279 | 6.602 |
1996 | Mariners+C | 0.366 | 0.473 | -0.178 | ? | 5.989 | 6.397 | 6.323 | 6.607 |
1999 | Indians | 0.373 | 0.467 | -0.161 | 6.228 | 6.119 | 6.547 | 6.454 | 6.688 |
1999 | Indians+T | 0.366 | 0.468 | -0.162 | ? | 5.925 | 6.340 | 6.279 | 6.538 |
1999 | Indians+C | 0.373 | 0.457 | -0.148 | ? | 6.006 | 6.414 | 6.321 | 6.535 |

A bit more explanation: besides the default version of the Markov that Tango has on his site, as well as the simple versions of BaseRuns and Bill James’ Runs Created that the webpage also produces, I’ve listed the results for a slightly altered version of the Markov that I came up with, which attempts to account for certain factors that are missing from the Markov (I’ll talk more about this later). The “aggro” factor is my stab at measuring base running aggression and effectiveness that I use in the tweaked Markov.
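For readers who haven't met the two simpler estimators, here is a Python sketch of commonly cited basic forms. The article doesn't say exactly which versions the webpage uses, so these are assumptions: Bill James' basic Runs Created, (H + BB) × TB / (AB + BB), and a simple BaseRuns of the form A × B / (B + C) + D, with one widely published set of B coefficients. The team totals in the usage lines are made up.

```python
def runs_created_basic(ab, h, bb, tb):
    """Bill James' basic Runs Created: (H + BB) * TB / (AB + BB)."""
    return (h + bb) * tb / (ab + bb)

def base_runs_simple(ab, h, bb, hr, tb):
    """A simple BaseRuns: A * B / (B + C) + D.

    A = baserunners, B = advancement, C = outs, D = automatic runs (home runs).
    The B coefficients are one commonly cited set and are assumed here;
    they are not confirmed to match the webpage.
    """
    a = h + bb - hr
    b = (1.4 * tb - 0.6 * h - 3.0 * hr + 0.1 * bb) * 1.02
    c = ab - h
    d = hr
    return a * b / (b + c) + d

# Hypothetical season totals for a made-up team.
totals = dict(ab=5500, h=1400, bb=500, hr=150, tb=2200)
print(round(runs_created_basic(totals["ab"], totals["h"], totals["bb"], totals["tb"]), 1))
print(round(base_runs_simple(**totals), 1))
```

Both formulas work from season totals alone. Unlike the Markov, neither steps through base-out states.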

So, at the top two spots on the list, we have the theoretical runs scored of teams full of clones of either Trumbo or Callaspo. This is basically the same idea as the RC27 you can find amongst ESPN.com’s sabermetric stats (which places Trumbo at 4.47 and Callaspo at 5.22, by the way). You can see right away that the Markovs favor Callaspo over Trumbo more than you might expect from their wOBAs and wRC+. Do you remember seeing the exponential growth curve of runs depending on team OBP in my **last article**? That explains why this is the case — it’s an important team effect that wOBA doesn’t try to account for.

You’ll also notice that relative to Trumbo, Callaspo is worth a lot more to the good offenses than to the bad ones. In particular he’s worth more to the high-OBP teams, as besides the exponential impact his better OBP has on runs, his relative lack of power hurts less. That’s because the value of a single to a high-OBP team is greater than it is to a low-OBP team, especially relative to a HR (see the graphs in my **second article** if that confuses you). There is a threshold of team suckitude at which 2011 Trumbo’s offense would become more valuable to a team than 2011 Callaspo’s, but it appears that even a bad team in the deadball era of the 60s is still a little bit short of that.
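To see that pattern in numbers, this short Python snippet takes the tweaked-Markov runs per game from the table above and subtracts the Trumbo version of each team from the Callaspo version. The values are copied from the table. The split into "weak" and "strong" offenses is just a convenient label for the two groups of teams in it.

```python
# Tweaked-Markov runs per game from the table: (team + Trumbo, team + Callaspo)
tweaked = {
    "1963 Colt .45s": (2.997, 3.023),
    "1965 Mets":      (3.187, 3.215),
    "1968 Mets":      (3.094, 3.120),
    "2011 Mariners":  (3.554, 3.590),
    "1994 Yankees":   (5.663, 5.774),
    "1996 Mariners":  (5.911, 5.989),
    "1999 Indians":   (5.925, 6.006),
}
weak = ["1963 Colt .45s", "1965 Mets", "1968 Mets", "2011 Mariners"]
strong = ["1994 Yankees", "1996 Mariners", "1999 Indians"]

# Callaspo-minus-Trumbo gap in runs per game for each team
gaps = {team: round(c - t, 3) for team, (t, c) in tweaked.items()}

for team, gap in gaps.items():
    print(f"{team}: {gap:+.3f} runs per game")
print("largest gap among weak offenses:", max(gaps[t] for t in weak))
print("smallest gap among strong offenses:", min(gaps[t] for t in strong))
```

Every gap is positive, and the smallest gap among the strong offenses is still about twice the largest gap among the weak ones.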

### Play along at home or work

I took a page out of Bradley Woodrum’s book and I’m giving you a peek via the Excel Web App. Just click on the green Excel icon in the bottom right area of the app to download the spreadsheet (about 1 MB in size). Once you’ve downloaded it, you’ll be able to paste data from the Standard section of team batting numbers from FanGraphs into the “Enter Data Here” tab of my spreadsheet, or enter whatever you want manually. You’ll then be able to see the results of the calculations on the “Results” tab (surprise), which you should be able to find near the bottom of the spreadsheet. Here ya go:

### The Perfect Run Modeler? Almost.

Tom Tango says his model is “mathematically perfect,” but readily acknowledges that it’s a bit simplistic, ignoring not only steals (SB) and caught stealing (CS), but grounded into double plays (GIDP) and other outs on bases (OOB). To properly account for these factors would require a much more complicated model, but I’ve come up with some modifications that attempt to account for those factors, without fundamentally changing Tango’s model.

The first thing I did was to reduce each team’s expected plate appearances per game by their expected GIDP and CS per game, along with an empirically-derived OOB constant tied to their on base rates. It’s not a perfect solution, because, for one, OOB rates aren’t so constant, as James Gentile recently pointed out at THT. You can, however, get OOB data from Baseball-Reference.com, if you have the patience and the desire. Another issue (I think) is that GIDP rates are dependent on how likely it is for a batter to have men on base, which would mean, for example, that I shouldn’t be penalizing a team full of 9 Trumbos so much for GIDP, because that team would be less likely to be able to hit into one. That could be worked out better, but it’s tricky.

The other main thing I did was to create the aforementioned base running aggressiveness modifier to the extra-base-taking rates that are essential to the model (they’re really the main assumptions in the model that are a bit tricky to estimate). It’s based on things like steals and caught stealing per runner on 1B, as well as 3B/2B. It’s probably not so proper that I’ve also included GIDP/PA as a major factor here, but the last trick I did didn’t fully account for the negative impact of GIDPs. I also included team OBP and SLG as factors, as one can expect weaker teams to be more aggressive on the base paths due to low odds of scoring without taking extra bases.

Finally, I changed the default extra-base-taking rates to be more in-line with Tango’s empirical findings. Of course, those rates aren’t entirely stable. Feel free to change anything in the “Results” tab that is bordered in red, as you see fit. You can even mess around with the “Calculations” tab if you know your stuff.

Well, that’s my time. Hope you’ve enjoyed. There’s plenty more I can say about this subject, if you’re interested — let me hear your questions and comments, and if you’d like to see me apply this to something else or make changes.


can’t wait to download this sheet when I get home (have to do work now). this looks like great stuff – thanks!

Excellent.

good freakin lord.

I heart this. I also feel this is why people hate advanced statistics. It’s difficult for most people to support something they can barely read, let alone understand.

I generally support most advanced metrics, but what’s the possible application of this particular one? I’m not saying there isn’t one but it’s not readily apparent. If the answer is that there doesn’t have to be one and it’s just a “fun theoretical exercise,” well then yeah it’s a venture into the uber-esoterica.

For a team that has decided who 7 or 8 of its everyday hitters are, and has an estimate of the run environment that they create, it could be a good way to figure out which FAs are most worthwhile to pursue.

… and evaluate prospective trades. I’d like to think that there are smart teams (Rays?) already doing something like this.

Exactly, guys. It has a practical application to a GM and his staff, in analyzing potential moves. To most of us… well, what application does *any* stat or tool have, really? It’s not like we have a say in what our favorite teams do. It’s just something to improve our understanding of the game, and for fun.

I would imagine there are teams out there that have something more advanced than this, really. Something that takes more factors and lineups into account, even.

That is the most ignorant comment I’ve possibly ever read. Who would ever want an accurate run estimator!!

“the most ignorant….”

jeez, what an imbecile.

He asked a legitimate question. Don’t be a dick, just politely answer it.

Oh my god.

Digging deep for the answers, I like it. Could you now add in the park effects please. (jus KIDDIN) Great work Steve!

I will be busy for weeks with this stuff. Thanks.

I have nothing smart to say to this. Just wow.

Nice work. Confirms the rule of thumb that the value of slugging goes up as OBP decreases while the value of OBP goes up as slugging increases. I always say that the handy way to think about it is this:

Imagine a team that draws a walk 95% of the time. They will score dozens of runs in a game and a home run won’t add much.

Imagine a team that gets out 95% of the time. A player that reaches base will almost never score. Home runs are nearly the only way to score.

This is basically what Steve showed in his previous article.

I know this is going to infuriate the thinnest skinned sabres in the community, but isn’t this sort of instinctual on its own? I mean don’t we already look at a team and say that guys like Mark Trumbo need players in front of him like Callaspo to get on base so that his XBH ability can be maximized?

Isn’t there (in theory) already some thought process that goes into constructing a lineup with the idea of maximizing run creation? Obviously not all managers look at this the same way, but I’m just saying that no one looks at each player without considering his effect on the rest of the lineup.

Sort of. I’m not sure that everybody realizes, or at least knows how to quantify, the synergy that a bunch of high-OBP guys on the same team creates, though.

Doesn’t this article claim that Trumbo would be relatively more valuable on a bad OBP team?

This actually runs counter to instinct. A team with a good leadoff hitter would actually be better served by adding another leadoff hitter.

Well, it’s like philosofool said earlier — a low-OBP team will be unlikely to string a bunch of hits or walks together in the same inning, so won’t score much outside of home runs.

As for the team with a good leadoff hitter adding another good leadoff hitter — well, that depends on the rest of the team. If there are enough high-OBP guys to string together rallies, then singles, walks, and doubles have a lot more potential to score runs, which makes home runs less critical.

I explain this stuff more in my previous two articles, in case you missed them.

Hot.

You are already one of my favorite FanGraphs authors. Keep it up!

Thank you! (and everybody else, for the nice comments above)

could you use this at the start of the offseason to determine the “best fit” for free agents?

Me personally? I’m not sure I’ll be around writing that long, but that’s a great idea for whoever wants to try.

Sorry to be clueless, but I’m pretty unsure of what to make of this. For the first part, it seems like the upshot is that maybe wOBA’s weights are a little off such that OBP is a little undervalued vis a vis SLG. Then it looks like you do a lot of stuff and show that in the end, you kind of get back where you started: wRC+/ wOBA gets transformed, but in the end when you look at Runs Created (last column) after the magic, it turns out that in most cases the teams get very nearly the same improvement from two players with identical wOBAs even if the wOBAs are achieved differently. If the author or another commenter could clarify a little more I’d appreciate it. Thanks.

Oh, sorry, I was unclear there. “Runs Created” is a run estimator formula by Bill James, which has nothing to do with the Markov. So I have the results of 4 different run estimators there — 2 versions of the Markov, BaseRuns, and Runs Created (RC). The RC formula basically sees no difference between Callaspo and Trumbo, and that’s the problem with it.

That helps, thanks!

No problem. I also failed to explain that “Actual” refers to actual runs per game, which is the target for all the run estimators (except for the hypothetical teams, of course).

beautiful.

I think the only step up from this is a good Monte Carlo or Simulator. With a Monte Carlo you are able to get a sense of how lineup construction mixes in with skillsets (high/low OBP, high/low SLG). Kind of like this.

Definitely. The perfect simulator is the ultimate run estimator. I’d be interested in seeing how simulators stack up against this. I don’t have any, though.

I know someone that does. What kind of test do you want to run?

Hm, what are my options? I’m open to suggestions, of course.

I’d probably want at least 10 years’ worth of runs scored estimates for every MLB team, analyzed in terms of correlation to the actual runs per game, as well as the mean absolute error and/or RMSE vs. actual runs per game.

The more years, the better, though.

I was thinking more of, plugging players in and out of lineups. Less science projecty. :)

Haha, sorry, I’m a sciency guy… I like large sample sizes whenever possible. That’s a big part of why I went through the trouble of making the spreadsheet — I wanted to test it out on a ton of teams without having to enter them one-by-one on Tango’s site.

Great! Thank you so much for the article and the spreadsheet.

I think that this could also be used as a developmental tool, not just something to help evaluate FAs. Specifically, since most smaller market teams depend on player development and homegrown talent, this could be used to develop a consistent philosophy.

What I’m getting outside of the compounding effect that OBP and slugging have when surrounded by “like” talent, is that it might be less beneficial to construct a diverse lineup. But I don’t have the savvy to properly look into that.

This looks very neat.

I never went anywhere with it, but Mike Hunnersen and I used to talk about narrowing the context for batters in an offence to the 3 or 4 batters around them, with the idea that someone batting 2nd doesn’t have so much influence over the person batting 7th, and the other way around. We would probably use this kind of idea, but perhaps limit the context to half the batting order, to at least arrive at a good estimate not only of a player’s impact on the overall offence, but roughly speaking which groups of batters should hit together in the lineup, which could aid in lineup construction, or at the least lead to strange insights like “if X is forced into the lineup, put Y in there, too”.

(Of course, you might believe that batting order is totally unimportant. I think it matters, but it’s nowhere near the bottleneck in tweaking team offensive performance, so I’d attack bigger questions first.)

No, I agree — lineup effects definitely matter. Believe it or not, I was bouncing around the idea of splitting up the batting order just last night. I just couldn’t decide how to do it.

Here’s my conundrum: going along the lines of your example, if the 2nd hitter leads off the inning and gets on, you could have 2 outs following, 2 more men on base, and then the next batter drives him in — the 7th batter. That’s not likely, but even if you say it was the 6th batter who drove him in, you’re still saying plus or minus 4 positions in the lineup are relevant, which of course is almost the entire lineup.

I’m thinking one idea would be to do something where only the batter in question plus or minus 3 lineup positions is considered, then 4, then 5, and then take the weighted average of the results based on the relevance of each result to the batter?

Of course the real solution would be a more sophisticated model that works off of lineups.

Like a simulator.

Sure, assuming it’s well-made.

Yes. Assuming it has done well enough vs Vegas in the past.

Are you referring to the simulator you use on your site?

Something’s bothering me here. wOBA comes from linear weights, and linear weights come from modern offense, so in theory, the offense provided by two players with the same wOBA should be equally valuable to an average modern offense, and nearly equally valuable to anything close to an average modern offense, but what you’ve done shows Callaspo anywhere from beating to clobbering Trumbo over the entire range of modern offenses. That can’t be right (unless wOBA is way wrong).

I think there’s a problem in the way you handled the outs. Holding team outs constant is fine, but you can’t multiply by 8/9 and then add 1/9 Callaspo or 1/9 Trumbo outs in because they don’t make outs at close to the same rate (which is the whole point). By allocating Callaspo the same number of outs as Trumbo, you’re effectively adding in a lot more Callaspo PAs than Trumbo PAs, and since he’s an above average hitter, that improves offenses. I think you need to figure out how many outs each is expected to make on a team with 8/9 original out% and 1/9 new player out% (Callaspo less than 1/9th, Trumbo more than 1/9th of team outs) and then add in their stats based on that number of outs.

Ah, good catch on the outs issue. I spent a bunch of time on the Markov itself, but then rushed that part, sorry…

Anyway, I did as you suggested for all the teams, and it did of course change the numbers a bit, but Callaspo still came out ahead in every case, believe it or not (even with the default Markov). Both Markovs agreed that the Callaspo over Trumbo advantage was smallest in the ’65 Mets (the lowest OBP team) — 0.002 runs per game in the default and 0.027 in the tweaked. I think the tweaked version favors Callaspo more because of factors that aren’t included in the default or in wOBA — GIDP, SB, and CS (really, though, 0.027 RPG amounts to only 4.37 runs per 162 games).

I’d appreciate if somebody could double-check that.

Regarding wOBA… the Markovs and BaseRuns all do seem to suggest that it underrates players who are good at avoiding outs. The linear weights produced by both Markovs especially seem to suggest that wOBA’s weights underrate walks and HBP.