In my first article, I wrote about the limitations of the linear weights system that wOBA is based on when it is applied to unusual team offenses. In my second, I explained how Tom Tango, wOBA’s creator, addressed some of these limitations by using BaseRuns to derive a new set of linear weights for different run environments. Today, I will tell you about the next step in the evolution of run estimators: the Markov model. Tom Tango created such a model, which can be accessed through his website, and I’ve turned that model into a spreadsheet that I’ll share with you here.
I’ve told you that the problem with the standard run estimator formulas is that they assume a hit is worth, run-wise, what it was worth to an average team. That means they won’t apply very well to an unusual team. What’s so great about the Markov is that it makes no such assumptions; it figures all of that out itself, specific to each team. And when I say it figures it out, I mean it basically plays out a typical game for that team, given the proportion of singles, walks, home runs, etc. the team gets in its plate appearances. It therefore estimates the run-scoring of typical teams better than just about anything, and in theory it should also apply much, much better to very unusual or even made-up teams.
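To make the idea concrete, here is a minimal sketch of a Markov-style run estimator in Python. It is not Tango's model or my spreadsheet: the runner advancement rules are fixed, simplified assumptions (a single scores runners from 2B and 3B and moves a runner from 1B to 2B; a double scores runners from 2B and 3B and moves a runner from 1B to 3B), there are no steals, double plays or outs on the bases, and the function names and sample rates are made up for illustration.

```python
# Simplified Markov run estimator: expected runs in one half-inning.
# ASSUMPTIONS (not Tango's actual rates): fixed runner advancement, no SB/CS,
# no double plays, and outs never move runners.

def advance(bases, event):
    """Return (new_bases, runs) for a non-out event.

    bases is a 3-bit mask: bit 0 = runner on 1B, bit 1 = 2B, bit 2 = 3B.
    """
    on1, on2, on3 = bases & 1, (bases >> 1) & 1, (bases >> 2) & 1
    if event == "BB":  # walk or HBP: only forced runners move
        if on1 and on2 and on3:
            return 0b111, 1
        if on1 and on2:
            return 0b111, 0
        if on1:
            return bases | 0b011, 0
        return bases | 0b001, 0
    if event == "1B":  # assumed: 2B and 3B score, 1B goes to 2B
        return (on1 << 1) | 0b001, on2 + on3
    if event == "2B":  # assumed: 2B and 3B score, 1B goes to 3B
        return (on1 << 2) | 0b010, on2 + on3
    if event == "3B":
        return 0b100, on1 + on2 + on3
    if event == "HR":
        return 0b000, on1 + on2 + on3 + 1
    raise ValueError(f"unknown event: {event}")


def expected_runs_per_inning(rates, tol=1e-12):
    """rates maps event name -> probability per plate appearance.

    Whatever probability is left over is treated as an out.
    """
    p_out = 1.0 - sum(rates.values())
    dist = {(0, 0): 1.0}  # (bases, outs) -> probability; start empty, 0 outs
    total_runs = 0.0
    while sum(dist.values()) > tol:
        nxt = {}
        for (bases, outs), mass in dist.items():
            if outs + 1 < 3:  # the third out ends the inning, so drop that mass
                key = (bases, outs + 1)
                nxt[key] = nxt.get(key, 0.0) + mass * p_out
            for event, prob in rates.items():
                new_bases, runs = advance(bases, event)
                total_runs += mass * prob * runs
                key = (new_bases, outs)
                nxt[key] = nxt.get(key, 0.0) + mass * prob
        dist = nxt
    return total_runs


if __name__ == "__main__":
    # Hypothetical per-PA rates, invented for illustration only.
    high_obp = {"BB": 0.10, "1B": 0.17, "2B": 0.04, "3B": 0.005, "HR": 0.015}
    power = {"BB": 0.05, "1B": 0.13, "2B": 0.05, "3B": 0.005, "HR": 0.05}
    for name, rates in (("high-OBP", high_obp), ("power", power)):
        print(name, round(9 * expected_runs_per_inning(rates), 2), "runs per 9 innings")
```

The point of the sketch is that nothing in it says what a single or a home run is "worth". The run value of each event falls out of how often that particular lineup has runners on base when the event happens.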
Will this spreadsheet thing make my life complete?
Well, not really. But it is fun to explore. I think it’s most useful for estimating how many runs a team would score with or without certain players. To demonstrate why this may be eye-opening for you, I’m going to show you how even two players with identical wOBA and wRC+ ratings could have significantly different offensive values to different teams.
Markov: I must break you…r perceptions of player values
In 2011, Mark Trumbo and Alberto Callaspo had identical wOBAs (0.328) and therefore identical wRC+ as well (108), seeing as how they both played for the Angels. However, they achieved these above-average wOBAs in very different ways: Callaspo with a 0.366 OBP and 0.375 SLG, and Trumbo with a 0.291 OBP and 0.477 SLG. So, let’s place these two onto various teams to see what happens. To keep things simple, let’s just pretend there’s no such thing as park effects.
Now, before I get into this, let me remind you that teams don’t have a fixed number of plate appearances per season, but their number of outs in a season is close to fixed: 162 games/season * 9 innings/game * 3 outs/inning = 4374 outs. Of course, it’s not exactly that, mainly because of extra innings and the fact that the home team won’t have a full 9 innings of offense in games they win. Anyway, I’m going to try to equalize Trumbo and Callaspo for playing time by giving them the same number of outs, defined as: Outs = PA – H – BB – HBP + CS + GDP, where GDP is grounded into double plays (written GIDP later in this article). Ideally, that would add outs on the bases as well, but FanGraphs doesn’t provide that as of yet.
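The arithmetic above is simple enough to sketch in a few lines of Python. The function names and the sample stat line are hypothetical; the formulas are the ones given in the text.

```python
def nominal_team_outs(games=162, innings_per_game=9, outs_per_inning=3):
    """Nominal outs in a season, ignoring extra innings and unplayed bottom-9ths."""
    return games * innings_per_game * outs_per_inning


def batter_outs(pa, h, bb, hbp, cs, gdp):
    """Outs = PA - H - BB - HBP + CS + GDP (outs on the bases are not included)."""
    return pa - h - bb - hbp + cs + gdp


if __name__ == "__main__":
    print(nominal_team_outs())  # 162 * 9 * 3
    # Made-up stat line, for illustration only.
    print(batter_outs(pa=600, h=150, bb=50, hbp=5, cs=3, gdp=12))
```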
Another thing: I really ought to be removing a player from each of these teams to make room for Trumbo or Callaspo, but so as not to add the additional variable of different players being removed from different teams, we’ll just reduce each team’s outs (and the rest of their numbers proportionally) to make room. This means we’re basically just pretending that all the original players on that team had their playing time reduced a bit to make room.
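Here is a rough Python sketch of that "make room" step, assuming each batting line is a plain dictionary of counting stats that includes an "Outs" entry. The function name and all of the numbers are invented for illustration; the spreadsheet may handle the details differently.

```python
def insert_player(team, player):
    """Scale every team stat down so that, after adding the player's line,
    the team's total outs are unchanged.

    team and player are dicts of counting stats; both must include "Outs".
    """
    scale = (team["Outs"] - player["Outs"]) / team["Outs"]
    return {stat: team[stat] * scale + player.get(stat, 0) for stat in team}


if __name__ == "__main__":
    # Hypothetical totals, not real 2011 numbers.
    team = {"PA": 6200, "H": 1400, "BB": 500, "HR": 150, "Outs": 4374}
    player = {"PA": 600, "H": 150, "BB": 40, "HR": 25, "Outs": 420}
    combined = insert_player(team, player)
    print({stat: round(value, 1) for stat, value in combined.items()})
```

Because every original stat is multiplied by the same factor, the rest of the roster keeps its rate stats and simply plays a bit less.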
So, without further ado, here’s what happens when 2011 Trumbo’s (T) or Callaspo’s (C) numbers are inserted into various especially good or bad offenses:
| Season | Team or Player | OBP | SLG | Aggro | Actual | Markov (tweaked) | Markov (default) | BaseRuns | Runs Created |
A bit more explanation: besides the default version of the Markov that Tango has on his site, and the simple versions of BaseRuns and Bill James’ Runs Created that the webpage also produces, I’ve listed the results for a slightly altered version of the Markov that I came up with. It attempts to account for certain factors that are missing from the default Markov (I’ll talk more about this later). The “aggro” factor, which I use in the tweaked Markov, is my stab at measuring base running aggression and effectiveness.
So, at the top two spots on the list, we have the theoretical runs scored of teams full of clones of either Trumbo or Callaspo. This is basically the same idea as the RC27 you can find amongst ESPN.com’s sabermetric stats (which places Trumbo at 4.47 and Callaspo at 5.22, by the way). You can see right away that the Markovs favor Callaspo over Trumbo more than you might expect from their wOBAs and wRC+. Do you remember seeing the exponential growth curve of runs depending on team OBP in my last article? That explains why this is the case — it’s an important team effect that wOBA doesn’t try to account for.
You’ll also notice that relative to Trumbo, Callaspo is worth a lot more to the good offenses than to the bad ones. In particular, he’s worth more to the high-OBP teams: besides the exponential impact his better OBP has on runs, his relative lack of power hurts less. That’s because the value of a single to a high-OBP team is greater than it is to a low-OBP team, especially relative to a HR (see the graphs in my second article if that confuses you). There is a threshold of team suckitude at which 2011 Trumbo’s offense would become more valuable to a team than 2011 Callaspo’s, but it appears that even a bad team in the second deadball era of the 1960s is still a little bit short of that.
Play along at home or work
I took a page out of Bradley Woodrum’s book and I’m giving you a peek via the Excel Web App. Just click on the green Excel icon in the bottom right area of the app to download the spreadsheet (about 1 MB in size). Once you’ve downloaded it, you’ll be able to paste data from the Standard section of team batting numbers from FanGraphs into the “Enter Data Here” tab of my spreadsheet, or enter whatever you want manually. You’ll then be able to see the results of the calculations on the “Results” tab (surprise), which you should be able to find near the bottom of the spreadsheet.
The Perfect Run Modeler? Almost.
Tom Tango says his model is “mathematically perfect,” but readily acknowledges that it’s a bit simplistic, ignoring not only steals (SB) and caught stealing (CS), but also grounded into double plays (GIDP) and other outs on bases (OOB). Properly accounting for these factors would require a much more complicated model, but I’ve come up with some modifications that attempt to approximate them without fundamentally changing Tango’s model.
The first thing I did was to reduce each team’s expected plate appearances per game by their expected GIDP and CS per game, along with an empirically derived OOB constant tied to their on-base rates. It’s not a perfect solution because, for one, OOB rates aren’t all that constant, as James Gentile recently pointed out at THT. You can, however, get OOB data from Baseball-Reference.com, if you have the patience and the desire. Another issue (I think) is that GIDP rates depend on how likely it is for a batter to come up with men on base, which would mean, for example, that I shouldn’t be penalizing a team of 9 Trumbos so much for GIDP, because that team would have fewer chances to hit into one. That could be worked out better, but it’s tricky.
The other main thing I did was to create the aforementioned base running aggressiveness modifier to the extra-base-taking rates that are essential to the model (they’re really the main assumptions in the model that are a bit tricky to estimate). It’s based on things like steals and caught stealing per runner on 1B, as well as the ratio of triples to doubles (3B/2B). It’s probably not entirely proper that I’ve also included GIDP/PA as a major factor here, but the plate appearance adjustment described above didn’t fully account for the negative impact of GIDPs. I also included team OBP and SLG as factors, as one can expect weaker teams to be more aggressive on the base paths due to their low odds of scoring without taking extra bases.
Finally, I changed the default extra-base-taking rates to be more in line with Tango’s empirical findings. Of course, those rates aren’t entirely stable. Feel free to change anything in the “Results” tab that is bordered in red, as you see fit. You can even mess around with the “Calculations” tab if you know your stuff.
Well, that’s my time. I hope you’ve enjoyed it. There’s plenty more I can say about this subject if you’re interested, so let me hear your questions and comments, and tell me if you’d like to see me apply this to something else or make changes.