In last episode’s thrilling cliffhanger, I left you with a formula that I brashly proclaimed “does a great job of explaining the trends” in strikeout rates for meetings between specific groups of batters and pitchers. Coming up with a formula to explain what was going on wasn’t pure nerdiness — making formulas to predict these results is the point of this research project. You see, the goal of my FanGraphs masters is to come up with a system by which we can look at a batter and a pitcher, and tell you, our loyal followers, some educated guesses of the chances of pretty much every conceivable outcome that could result from these two facing off against each other. Getting a sense of the expected strikeout rate is merely the first step in what will likely be a long process of continuous improvement.
The idea of this matchup system is to not only give you estimates that are more free from the whims of randomness than “Batter A is 8-for-20 with 5 Ks and 1 HR in his career against Pitcher B,” but also to provide some evidence-based projections for matchups that have never even happened. How do we propose this can be done? By looking at the overall trends and seeing how players fit within them. Can it really be done? It definitely looks that way to me. Today’s installment will be about attempting to convince you of that.
A quick recap: I looked at about 1.5 million plate appearances (over 2002-2012), considering the handedness and historical K% (strikeouts per plate appearance) of the pitcher and batter involved in each. For example, if the pitcher in a particular PA was a lefty, I used the batter’s historical K% against lefties for that PA. Rounding off to the nearest percent, I looked at the results of the PAs between all the different combinations of groups (e.g. 8% K batters vs. 20% K pitchers) to find the patterns.
Now, in coming up with formulas, the main goal was of course to use only the batting and pitching historical K% numbers (by relevant handedness) as inputs, and to output a “best fit” of the observed K% for all the matchups. In doing so, I put more importance on the types of matchups that happened more often (i.e., I PA-weighted). The other goal was for the results of the formula to stay within the range of 0-100% no matter the K percentages that were input; this meant linear formulas were not a good option.
After I came up with several formulas (and wrote most of this, by the way), I received some great feedback from commenters like Peter Jensen, MGL, and Pizza Cutter (and, on his own site, Tom Tango) that I should try out the “odds ratio” method instead; Pizza Cutter gives a good explanation of it in the comments, if you want the details. J. Cross suggested that, as an alternative, I could run a logistic regression on the odds ratio’s components (and Pizza Cutter elaborated on how that’s done). So I’ll be showing you results for those as well.
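For readers who want something concrete, here is a minimal Python sketch of the odds ratio method as it is commonly described (matchup odds = batter odds × pitcher odds ÷ league odds), next to a logistic form that generalizes it. The function names, the `league_k` parameter, and the default coefficients are illustrative assumptions; they are not the fitted values behind the “Logistic” results in this article.

```python
import math

def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)

def odds_ratio_k(batter_k, pitcher_k, league_k):
    """Common odds ratio form: matchup odds = batter odds * pitcher odds / league odds."""
    o = odds(batter_k) * odds(pitcher_k) / odds(league_k)
    return o / (1.0 + o)  # convert odds back to a probability

def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))

def logistic_k(batter_k, pitcher_k, league_k,
               b0=0.0, b_bat=1.0, b_pit=1.0, b_lg=-1.0):
    """Logistic form on the same components.

    The default coefficients are placeholders (an assumption): they reproduce
    the odds ratio exactly. A regression would estimate them from data instead.
    """
    z = (b0 + b_bat * logit(batter_k)
            + b_pit * logit(pitcher_k)
            + b_lg * logit(league_k))
    return 1.0 / (1.0 + math.exp(-z))

# Example inputs: an 8% K batter, a 25% K pitcher, and an assumed 18.7% league rate.
print(odds_ratio_k(0.08, 0.25, 0.187))
print(logistic_k(0.08, 0.25, 0.187))  # same value with the placeholder coefficients
```

Both forms stay between 0% and 100% for any inputs strictly between 0 and 1. The only difference is that the logistic version lets the data decide how much weight the batter, pitcher, and league terms each receive.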
Take a look at this interactive tool that will tell you the expected results for any pitcher-batter combo according to these formulas. It will also give you an idea of how large a role you’d expect randomness to play, based on how many PAs you have to go by. “Formula 2,” by the way, is the one I showed you at the end of the last article, while “Formula 1” is the supposedly “fine-tuned” version of it (so named because I started off with something more complicated than either formula and kept whittling factors down). “Logistic” is the formula that came out of J. Cross’ suggestion.
You can download it by clicking the green icon at the bottom, if you want to actually see the formulas involved. Feel free to change around the numbers in the white boxes to get a better feel for how these formulas work (you can do that right here within the browser). To explain: in the default spreadsheet above, it says that if a batter with an 8% strikeout rate goes against a pitcher with a 25% strikeout rate (both rates being specific to the handedness of their opponent), then Formula 2 expects the batter to strike out 11.31% of the time.
However, to temper your expectations of how much you can depend on that projection, the numbers at the bottom tell you that even if 11.31% were a perfectly accurate estimate of the matchup K% between the two, if they were to face each other 100 times, there’s only a 24.50% chance that the batter would actually strike out somewhere between 10.31% and 12.31% of the time (plus or minus 1% from the estimate). Now, of course these estimates aren’t perfect — but these numbers are just telling you the highest standard you could possibly measure them against. They’re based on the binomial distribution, which I provided an interactive illustration of in a previous piece.
The same technique applies to flipping a coin: if you only flipped it 100 times, there’d only be a 15.76% chance of getting 50 or 51 heads, and still only about a 23.56% chance of landing anywhere from 49 to 51. So it depends not only on the number of trials (or observations) but also on the expected percentage itself. You can actually write over the expected matchup K%s and the Plus/Minus numbers if you feel like messing around with numbers like those. This kind of calculation has plenty of other baseball stat-related applications, when you think about it.
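The binomial calculation behind those percentages can be sketched in a few lines of Python. This is an illustration rather than the spreadsheet’s actual formula: the function names are made up, and I am assuming the window is computed as a difference of two cumulative probabilities, which leaves out the lower endpoint when it falls on a whole number of successes. That assumption reproduces both figures quoted above.

```python
from math import comb, floor

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(0, k + 1))

def chance_within(n, p, plus_minus):
    """Chance that the observed rate over n trials lands within p +/- plus_minus.

    Assumption: computed as CDF(upper) - CDF(lower), so a lower bound that is
    exactly a whole number of successes is excluded.
    """
    lower = floor((p - plus_minus) * n + 1e-9)  # small epsilon guards float error
    upper = floor((p + plus_minus) * n + 1e-9)
    return binom_cdf(upper, n, p) - binom_cdf(lower, n, p)

# 11.31% expected K rate over 100 PA, plus or minus 1%: counts 11 or 12 strikeouts.
print(chance_within(100, 0.1131, 0.01))  # about 0.245

# Fair coin over 100 flips, plus or minus 1%: counts 50 or 51 heads.
print(chance_within(100, 0.5, 0.01))     # about 0.1576
```

Raising `n` narrows the spread, so the same plus-or-minus window captures a larger share of outcomes as the number of PAs grows.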
[Table: PA Minimum | n | Average Error | Mean Absolute Error (MAE) | wMAE (PA-weighted)]
Here, we have some indications of how well the four formulas (#1, #2, Odds Ratio, and Logistic) perform over the 2002-2012 set. The average error is just an indication of whether a formula tends to overestimate or underestimate the rates; it shows that the odds ratio method guesses the K% to be higher than it actually is, on average. That’s what hurts it in the other measures; otherwise, it works about as well as the other formulas. The Mean Absolute Error takes the absolute value of each error and averages those out, so it tells us how closely a formula tracks the movements in the data. The wMAE, then, is a version of MAE that puts more emphasis on the groups with more PAs behind them, since you can’t take the results for a group with 100 PA as seriously as those for a group with 10,000. Anyway, I’ll call this a slight victory for the logistic regression over Formula 1, with Odds Ratio in last place due to its overestimations.
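As a rough sketch of how those three measures could be computed, here is a small Python function. The name `error_summary`, the tuple layout, and the sample numbers are all made up for illustration, and I am assuming the error is defined as predicted minus observed, so a positive average error means overestimation.

```python
def error_summary(groups):
    """Return (average error, MAE, PA-weighted MAE) for a list of matchup groups.

    Each group is a tuple: (predicted_k, observed_k, pa).
    Assumption: error = predicted - observed, so positive means overestimate.
    """
    errors = [pred - obs for pred, obs, _ in groups]
    n = len(errors)
    avg_error = sum(errors) / n
    mae = sum(abs(e) for e in errors) / n
    total_pa = sum(pa for _, _, pa in groups)
    wmae = sum(abs(e) * pa for e, (_, _, pa) in zip(errors, groups)) / total_pa
    return avg_error, mae, wmae

# Made-up example: a small group that is missed badly, a large group that is close.
sample = [(0.12, 0.10, 100), (0.20, 0.21, 900)]
avg_error, mae, wmae = error_summary(sample)
print(avg_error, mae, wmae)  # roughly 0.005, 0.015, 0.011
```

In the made-up sample, the wMAE comes out lower than the MAE because the big miss happens in the group with few PAs, which is exactly the kind of result the weighting is meant to discount.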
Each point represents one matchup group (e.g. a point for 15% K batters vs. 20% K pitchers). There are 635 groups represented when the minimum PA requirement is 100, and 473 groups when the requirement is 400 PA.
Notice that the R-squareds look good for the odds ratio, but only because the regression line through the points has a slope in the low 0.9 area. That indicates overestimation once again. A perfect formula would have a slope of 1 and an R-squared as close to 1 as randomness allows (so an extremely large sample size helps). So in this contest, we have a draw between Formula 1 and the Logistic formula.
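To see why a high R-squared and a slope below 1 can coexist, here is a minimal sketch of an ordinary least-squares fit of observed K% on predicted K%. The function name and the sample data are invented for illustration, and I am assuming the predicted rate sits on the x-axis and the observed rate on the y-axis.

```python
def slope_and_r2(predicted, observed):
    """Least-squares slope and R-squared for observed (y) against predicted (x)."""
    n = len(predicted)
    mean_x = sum(predicted) / n
    mean_y = sum(observed) / n
    sxx = sum((x - mean_x) ** 2 for x in predicted)
    syy = sum((y - mean_y) ** 2 for y in observed)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predicted, observed))
    slope = sxy / sxx
    r2 = (sxy * sxy) / (sxx * syy)  # squared correlation, equal to R^2 for a simple fit
    return slope, r2

# Made-up example: every prediction is too high by the same proportion.
predicted = [0.10, 0.20, 0.30]
observed = [0.9 * x for x in predicted]
print(slope_and_r2(predicted, observed))  # slope near 0.9, R-squared near 1
```

The points fall on a perfectly straight line, so R-squared is as high as it can be, yet every prediction overshoots. That is why the slope needs to be checked alongside the R-squared.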
Now, let’s look at 2013 data (through Saturday, June 8th, that is). I took a different approach in a couple of ways here. First of all, I made this about projecting into the future, rather than explaining the past. I used players’ 2009-2012 data to forecast the matchup K% here, as I do think it’s better to go by more recent data, if you can get a decent sample size. I therefore applied the 2009-2012 league K% of 18.7% where applicable.
The second thing I did differently was to calculate the formulas directly from the un-rounded K% for the batter and pitcher of each matchup, then group the matchups by their expected K% according to each formula. I just wanted to show it could work both ways. Here’s how things looked over the 7%-31% expected K% range, for which all of the formulas had at least 100 PA per group:
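The grouping step described above might look something like the following Python sketch. The function name, the input layout (one expected K% and one strikeout flag per PA), and the sample data are assumptions for illustration only.

```python
from collections import defaultdict
from math import floor

def group_by_expected_k(plate_appearances, min_pa=100):
    """Bucket PAs by expected K% (nearest whole percent) and compute observed K%.

    plate_appearances: iterable of (expected_k, struck_out) pairs, where
    expected_k is a formula's un-rounded estimate and struck_out is True/False.
    Returns {bucket_percent: (pa_count, observed_k)} for buckets with >= min_pa.
    """
    totals = defaultdict(lambda: [0, 0])  # bucket -> [PA count, strikeouts]
    for expected_k, struck_out in plate_appearances:
        bucket = floor(expected_k * 100 + 0.5)  # round to the nearest percent
        totals[bucket][0] += 1
        totals[bucket][1] += 1 if struck_out else 0
    return {
        bucket: (pa, ks / pa)
        for bucket, (pa, ks) in sorted(totals.items())
        if pa >= min_pa
    }

# Made-up data: 100 PAs expected near 18%, plus a 25% bucket that is too small to keep.
sample = [(0.181, True)] * 20 + [(0.184, False)] * 80 + [(0.250, True)] * 5
print(group_by_expected_k(sample))  # {18: (100, 0.2)}
```

Each surviving bucket then supplies one point for comparing expected K% against observed K%.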
There are a lot fewer PAs to work with, so you’d expect the R-squareds to be lower. As you can see from my spreadsheet near the top of the article, there’s a LOT of randomness that can get thrown into the mix when your sample size is 100. The PAs per group generally peak at around 1000, in the 1%-17% range.
Given the small sample size of a partial 2013, I don’t think a victor can be declared, but I think it’s safe to say there is definitely a good amount of predictability going on here. I’d lean towards giving the victory to Formula 2 — the simplest of the bunch — followed by Logistic, F1, and Odds Ratio here. Odds Ratio still has issues with calibration, but so does F1 here.
For these future projections, it would be great to have some projected K% splits — those could be more reliable inputs than the 4-year averages I arbitrarily decided to use. J. Cross and the Steamer Projections gang? Dan Szymborski (ZiPS)?
For some reason, it seems like something is holding back the accuracy of the odds ratio method. I was speculating that the skewed shape of the K% distribution might have something to do with it. Or maybe there really is some different weighting that needs to come into play to produce accurate results — which is why the logistic formula worked well.
Let’s take a look at the two main types of formulas involved here — first the ones I came up with on my own:
It looks like a quarter of a dome — pretty simple, much like the formula.
And now the Odds Ratio-related ones (the odds ratio’s shape is almost identical to this):
Which do you think makes more sense? I’m thinking the logistic one now. It does make sense to me that a batter bad enough to strike out close to 100% of the time will still strike out at an extremely high rate even against a very low-strikeout pitcher. It has to do with taking the league mean (around 18%, let’s say) into account; a 99% K batter, for example, would have to be ridiculously, hopelessly bad, whereas a 1% K pitcher could be merely terrible. Advantage: pitcher. There’s probably a better way of explaining that, but there you go.
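To put a number on that extreme case, here is a tiny sketch using the common odds ratio form (matchup odds = batter odds × pitcher odds ÷ league odds). The function name and the 18% league rate are assumptions for illustration, and the fitted logistic equation would give a somewhat different value.

```python
def extreme_matchup_k(batter_k, pitcher_k, league_k):
    """Matchup K% under the common odds ratio form (illustrative only)."""
    def odds(p):
        return p / (1.0 - p)
    o = odds(batter_k) * odds(pitcher_k) / odds(league_k)
    return o / (1.0 + o)

# A 99% K batter against a 1% K pitcher, with an assumed 18% league rate.
print(extreme_matchup_k(0.99, 0.01, 0.18))  # about 0.82
```

The two extreme odds cancel each other out, leaving only the league term, so the estimate lands at roughly 82% (one minus the league rate). Because the league mean is well below 50%, the strikeout side comes out ahead in that standoff.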
Where Do We Go from Here?
I would like some more feedback from you experts out there on what might be wrong with the odds ratio method here. Although the simple formula of mine (#2) does work very well here, I think I’ll have to go with the complicated, but more theoretically sound (I guess) logistic regression equation. Make things as simple as possible, but not simpler, as the saying goes.
Once the dust has settled, I plan to get to work on BB%, batted ball stats, BABIP… you name it.