Batter-Pitcher Matchups Part 2: Expected Matchup K%

In last episode’s thrilling cliffhanger, I left you with a formula that I brashly proclaimed “does a great job of explaining the trends” in strikeout rates for meetings between specific groups of batters and pitchers.  Coming up with a formula to explain what was going on wasn’t pure nerdiness — making formulas to predict these results is the point of this research project.  You see, the goal of my FanGraphs masters is to come up with a system by which we can look at a batter and a pitcher, and tell you, our loyal followers, some educated guesses of the chances of pretty much every conceivable outcome that could result from these two facing off against each other.  Getting a sense of the expected strikeout rate is merely the first step in what will likely be a long process of continuous improvement.

The idea of this matchup system is to not only give you estimates that are more free from the whims of randomness than “Batter A is 8-for-20 with 5 Ks and 1 HR in his career against Pitcher B,” but also to provide some evidence-based projections for matchups that have never even happened.  How do we propose this can be done?  By looking at the overall trends and seeing how players fit within them.  Can it really be done?  It definitely looks that way to me.  Today’s installment will be about attempting to convince you of that.

A quick recap: I looked at about 1.5 million plate appearances (over 2002-2012), considering the handedness and historical K% (strikeouts per plate appearance) of the pitcher and batter involved in each.  If the pitcher was a lefty for a particular PA, I considered the historical K% against lefties of the batter involved in that PA, for example.  Rounding off to the nearest percent, I looked at the results of the PAs between all the different combinations of groups (e.g. 8% K batters vs. 20% K pitchers) to find the patterns.

Now, in coming up with formulas, the main goal was of course to use only the batting and pitching historical K% numbers (by relevant handedness) as inputs, and to output a “best fit” of the observed K% for all the matchups.  In doing so, I put more importance on the types of matchups that happened more often (i.e., I PA-weighted).  The other goal was for the results of the formula to stay within the range of 0-100% no matter the K percentages that were input; this meant linear formulas were not a good option.

After I came up with several formulas (and wrote most of this, by the way), I received some great feedback from commenters like Peter Jensen, MGL, and Pizza Cutter (and on his site, Tom Tango) that I should try out the “odds ratio” method instead (of which Pizza Cutter gives a good explanation in the comments, if you want the details).  J. Cross suggested I could try out a logistic regression on the odds ratio’s components as an alternative to that (and Pizza Cutter elaborated on how that’s done).  So I’ll be showing you results for those as well.
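
For readers who want the gist without digging through the comments, here is a minimal sketch of the odds ratio method in its commonly used form: convert each rate to odds, multiply the batter's and pitcher's odds, divide by the league's odds, and convert back to a probability. The function names are made up for illustration, and this is the standard textbook form rather than a copy of the spreadsheet's cells.

```python
def odds(p):
    """Convert a probability (0 < p < 1) into odds."""
    return p / (1.0 - p)


def odds_ratio_k(batter_k, pitcher_k, league_k):
    """Expected matchup K% under the odds ratio method (assumed standard form)."""
    matchup_odds = odds(batter_k) * odds(pitcher_k) / odds(league_k)
    return matchup_odds / (1.0 + matchup_odds)  # odds back to a probability


# 10% K batter vs. 10% K pitcher in a 17.5% K league
print(f"{odds_ratio_k(0.10, 0.10, 0.175):.1%}")
# Same matchup restated as "not striking out": 90% vs. 90% in an 82.5% league
print(f"{odds_ratio_k(0.90, 0.90, 0.825):.1%}")
```

Those two calls give 5.5% and 94.5%, the same figures quoted for the odds ratio method in the comment thread, and they add up to 100% as a strikeout and a non-strikeout should.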

Take a look at this interactive tool that will tell you the expected results for any pitcher-batter combo according to these formulas.  It will also give you an idea of how large of a role you’d expect randomness to play, based on how many PAs you have to go by.  “Formula 2,” by the way, is the one I showed you at the end of the last article, while “Formula 1” is the supposedly “fine-tuned” version of it (so named because I started off with something more complicated than either formula and kept whittling factors down).  “Logistic” is the formula that came out of J. Cross’ suggestion.

You can download it by clicking the green icon at the bottom, if you want to actually see the formulas involved.  Feel free to change around the numbers in the white boxes to get a better feel for how these formulas work (you can do that right here within the browser).  To explain: in the default spreadsheet above, it says that if a batter with an 8% strikeout rate goes against a pitcher with a 25% strikeout rate (both rates being specific to the handedness of their opponent), then Formula 2 expects the batter to strike out 11.31% of the time.

However, to temper your expectations of how much you can depend on that projection, the numbers at the bottom tell you that even if 11.31% were a perfectly accurate estimate of the matchup K% between the two, if they were to face each other 100 times, there’s only a 24.50% chance that the batter would actually strike out somewhere between 10.31% and 12.31% of the time (plus or minus 1 percentage point from the estimate).  Now, of course these estimates aren’t perfect, but these numbers are just telling you the highest standard you could possibly measure them against.  They’re based on the binomial distribution, which I provided an interactive illustration of in a previous piece.

The same technique applies to flipping a coin: if you only flipped it 100 times, there’d only be a 15.76% chance of getting exactly 50 or 51 heads, and only about a 23.56% chance of landing anywhere from 49 to 51.  So it depends not only on the number of trials (or observations) but also on the expected percentage itself.  You can actually write over the expected matchup K%s and the Plus/Minus numbers if you feel like messing around with numbers like those.  The binomial distribution has plenty of other baseball stat-related applications, when you think about it.

Formula Comparisons

PA Minimum    n  | Average Error                   | Mean Absolute Error (MAE)       | wMAE (PA-weighted)
                 | F1      F2      OR      Logit   | F1      F2      OR      Logit   | F1      F2      OR      Logit
100         635  | -0.12%  -0.54%  1.80%   -0.12%  | 1.25%   1.44%   2.09%   1.27%   | 0.57%   0.68%   1.36%   0.55%
1000        327  | -0.09%  -0.03%  1.26%   -0.11%  | 0.57%   0.66%   1.35%   0.55%   | 0.48%   0.57%   1.26%   0.46%
5000        114  | 0.10%   0.39%   1.17%   0.06%   | 0.40%   0.51%   1.18%   0.38%   | 0.39%   0.51%   1.16%   0.37%

Here, we have some indications of how well the four formulas (#1, #2, Odds Ratio, and Logistic) perform over the 2002-2012 set.  The average error is just an indication of whether the formula tends to overestimate or underestimate the rates; it shows that odds ratio, on average, guesses the K% to be higher than it actually is (by roughly 1.2 to 1.8 percentage points, depending on the PA minimum).  That’s what hurts it in the other measures; otherwise, it works about as well as the other formulas.  The Mean Absolute Error takes the absolute value of each error and averages those out, so it’s something we want to look at to see whether a formula is matching up well with the movements in the data.  The wMAE, then, is a version of MAE that puts more emphasis on the groups with more PAs to represent them, realizing that you can’t take the results for a group with 100 PA as seriously as those of a group with 10,000.  Anyway, I’ll call this a slight victory for the logistic regression over Formula 1, with Odds Ratio in last place due to its overestimations.
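
To make the three measures concrete, here is a small sketch of how they could be computed from per-group results. It assumes an error is defined as expected minus observed (so a positive average means overestimation) and that the plain average error and MAE weight every group equally; the group data below is invented purely for illustration.

```python
def error_summary(groups):
    """Summarize formula errors.

    groups: list of (expected_k, observed_k, pa) tuples, one per matchup group.
    Returns (average error, MAE, PA-weighted MAE).
    """
    errors = [expected - observed for expected, observed, _ in groups]
    pas = [pa for _, _, pa in groups]
    avg_error = sum(errors) / len(errors)            # sign shows over/underestimation
    mae = sum(abs(e) for e in errors) / len(errors)  # every group counts equally
    wmae = sum(abs(e) * pa for e, pa in zip(errors, pas)) / sum(pas)
    return avg_error, mae, wmae


# Hypothetical groups: (expected K%, observed K%, plate appearances)
example = [(0.10, 0.12, 100), (0.20, 0.19, 10000), (0.30, 0.30, 5000)]
avg_error, mae, wmae = error_summary(example)
print(f"avg error {avg_error:+.2%}, MAE {mae:.2%}, wMAE {wmae:.2%}")
```

In this toy case the 100 PA group has the biggest miss, so it drags the MAE up to 1.00% while barely moving the PA-weighted version.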

Graphically:

[Figure: K formulas]

Each point represents one matchup group (e.g. a point for 15% K batters vs. 20% K pitchers). There are 635 groups represented when the minimum PA requirement is 100, and 473 groups when the requirement is 400 PA.

Notice that the R-squareds look good for the odds ratio, but that’s only because the regression line through its points is free to take a slope in the low 0.9 area rather than 1.  A slope below 1 means the observed K% comes in lower than the formula expects, which indicates overestimation once again.  A perfect formula would have a slope of 1 and an R-squared as close to 1 as randomness allows (so an extremely large sample size helps).  So in this contest, we have a draw between Formula 1 and the Logistic formula.
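
A quick sketch of why a high R-squared and a sub-1 slope can coexist. It assumes expected K% on the x-axis, observed K% on the y-axis, and a line forced through the origin (which is how the slopes are described in the comments). The R-squared here is one common definition, 1 minus residual over total sum of squares about the mean; a spreadsheet's trendline may use a different one. The data points are invented.

```python
def slope_through_origin(expected, observed):
    """Least-squares slope for observed = slope * expected (no intercept)."""
    return sum(x * y for x, y in zip(expected, observed)) / sum(x * x for x in expected)


def r_squared(expected, observed, slope):
    """1 - SS_residual / SS_total, with SS_total taken about the mean of observed."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((y - slope * x) ** 2 for x, y in zip(expected, observed))
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical formula that overestimates every group by the same proportion
expected = [0.10, 0.20, 0.30]
observed = [0.09, 0.18, 0.27]
slope = slope_through_origin(expected, observed)
print(round(slope, 3), round(r_squared(expected, observed, slope), 3))
```

Here the points sit perfectly on a line, so the R-squared is as high as it can be, yet the slope of 0.9 says the formula runs about 10% hot across the board.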

Now, let’s look at 2013 data (through Saturday June 8th, that is).  I took a different approach in a couple of ways here.  First of all, I made this about projecting into the future, rather than explaining the past.  I used players’ 2009-2012 data to forecast the matchup K% here, as I do think it’s better to go by more recent data, if you can get a decent sample size.  I therefore applied the 2009-2012 league K% of 18.7% where applicable.

The second thing I did differently was to calculate the formulas directly from the un-rounded K% for the batter and pitcher of each matchup, then group the matchups by their expected K% according to each formula.  I just wanted to show it could work both ways.  Here’s how things looked over the 7%-31% expected K% range, for which all of the formulas had at least 100 PA per group:

[Figure: K formulas 2013]

Formula: F1 F2 OR Log.
wMAE: 1.97% 1.58% 2.50% 2.03%
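
The grouping step for the 2013 test (compute each PA's expected K% from un-rounded inputs, round to the nearest percent, then compare with what actually happened in each bucket) could be sketched like this. The odds ratio function stands in for whichever formula is being tested, the 18.7% league rate is the 2009-2012 figure mentioned above, and the plate appearance data is invented.

```python
from collections import defaultdict


def odds_ratio_k(batter_k, pitcher_k, league_k=0.187):
    """Stand-in formula: odds ratio method with the 2009-2012 league K%."""
    odds = lambda p: p / (1.0 - p)
    o = odds(batter_k) * odds(pitcher_k) / odds(league_k)
    return o / (1.0 + o)


def group_by_expected(pas, formula, min_pa=100):
    """Bucket PAs by expected K% (nearest whole percent).

    pas: iterable of (batter_k, pitcher_k, struck_out) tuples, one per PA.
    Returns {expected percent: (observed K rate, PA count)} for buckets
    with at least min_pa plate appearances.
    """
    tallies = defaultdict(lambda: [0, 0])  # bucket -> [strikeouts, PAs]
    for batter_k, pitcher_k, struck_out in pas:
        bucket = round(formula(batter_k, pitcher_k) * 100)
        tallies[bucket][0] += int(struck_out)
        tallies[bucket][1] += 1
    return {b: (k / n, n) for b, (k, n) in sorted(tallies.items()) if n >= min_pa}


# Hypothetical data: 150 PAs between league-average types (30 strikeouts),
# plus 50 PAs in a low-K bucket that falls short of the 100 PA minimum
sample = [(0.187, 0.187, i < 30) for i in range(150)] + [(0.05, 0.05, False)] * 50
print(group_by_expected(sample, odds_ratio_k))
```

Only the 19% bucket survives the PA minimum in this toy sample, with an observed rate of 20% over 150 PA.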

There are a lot fewer PAs to work with, so you’d expect the R-squareds to be lower.  As you can see from my spreadsheet near the top of the article, there’s a LOT of randomness that can get thrown into the mix when your sample size is 100.  The PAs per group generally peak at around 1000, in the 1%-17% range.

Given the small sample size of a partial 2013, I don’t think a victor can be declared, but I think it’s safe to say there is definitely a good amount of predictability going on here.  I’d lean towards giving the victory to Formula 2 — the simplest of the bunch — followed by Logistic, F1, and Odds Ratio here.  Odds Ratio still has issues with calibration, but so does F1 here.

For these future projections, it would be great to have some projected K% splits — those could be more reliable inputs than the 4-year averages I arbitrarily decided to use.  J. Cross and the Steamer Projections gang?  Dan Szymborski (ZiPS)?

Getting Theoretical

For some reason, it seems like something is holding back the accuracy of the odds ratio method.  I was speculating that the skewed shape of the K% distribution might have something to do with it.  Or maybe there really is some different weighting that needs to come into play to produce accurate results — which is why the logistic formula worked well.
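
One way to picture the "different weighting" idea: the odds ratio method is what you get from a weighted sum of log-odds with weights of +1, +1, and -1 on the batter, pitcher, and league. The sketch below assumes the logistic formula has that same shape with fitted weights and no intercept, which is a guess at its form (the article doesn't print the equation). The weights used are the rounded coefficients reported in the comment thread, so the outputs will not match the spreadsheet exactly.

```python
from math import exp, log


def logit(p):
    """Log-odds of a probability."""
    return log(p / (1.0 - p))


def inv_logit(z):
    """Convert log-odds back into a probability."""
    return 1.0 / (1.0 + exp(-z))


def logistic_k(batter_k, pitcher_k, league_k, w_batter, w_pitcher, w_league):
    """Weighted log-odds combination (assumed form, no intercept)."""
    z = (w_batter * logit(batter_k)
         + w_pitcher * logit(pitcher_k)
         + w_league * logit(league_k))
    return inv_logit(z)


# Weights of +1, +1, -1 reproduce the odds ratio method
print(f"{logistic_k(0.10, 0.10, 0.175, 1.0, 1.0, -1.0):.1%}")
# Rounded fitted weights from the comments: an average matchup drifts off 17.5%
print(f"{logistic_k(0.175, 0.175, 0.175, 0.92, 0.92, -0.77):.1%}")
# Weights constrained to sum to 1: an average matchup returns the league rate
print(f"{logistic_k(0.175, 0.175, 0.175, 0.91, 0.94, -0.85):.1%}")
```

With the unconstrained weights, a 17.5% batter against a 17.5% pitcher in a 17.5% league comes out near 16%, which is the objection raised in the comments; once the weights sum to 1, that matchup returns 17.5%.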

Let’s take a look at the two main types of formulas involved here — first the ones I came up with on my own:

[Figure: K Formula]

It looks like a quarter of a dome — pretty simple, much like the formula.

And now the Odds Ratio-related ones (the odds ratio’s shape is almost identical to this):

[Figure: K logistic]

Umm, I have no idea how to describe that one’s shape.

Which do you think makes more sense?  I’m thinking the logistic one now.  It does make sense to me that a batter bad enough to strike out close to 100% of the time will still strike out at an extremely high rate even against a very low-strikeout pitcher.  It has to do with taking the league mean (around 18%, let’s say) into account; a 99% K batter, for example, would have to be ridiculously, hopelessly bad, whereas a 1% K pitcher could be merely terrible.  Advantage: pitcher.  There’s probably a better way of explaining that, but there you go.
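
Putting numbers on that intuition, here is a tiny sketch using the odds ratio method as a stand-in for the odds-ratio-shaped surface (the fitted logistic formula would give somewhat different values). The 99% K batter and 1% K pitcher are the hypothetical extremes from the paragraph above, and the function is the standard odds ratio form rather than anything taken from the spreadsheet.

```python
def odds(p):
    """Convert a probability into odds."""
    return p / (1.0 - p)


def odds_ratio_k(batter_k, pitcher_k, league_k):
    """Expected matchup K% under the odds ratio method (assumed standard form)."""
    o = odds(batter_k) * odds(pitcher_k) / odds(league_k)
    return o / (1.0 + o)


# 99% K batter vs. 1% K pitcher, first in an 18% K league, then a 50% K league
for league_k in (0.18, 0.50):
    print(league_k, round(odds_ratio_k(0.99, 0.01, league_k), 3))
```

In an 18% league the batter still strikes out 82% of the time, because his 99% is much further from the league norm (in odds terms) than the pitcher's 1%. Only in a 50% league, where the two are equally extreme, does the matchup come out even at 50%.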

Where Do We Go from Here?

I would like some more feedback from you experts out there on what might be wrong with the odds ratio method here.  Although the simple formula of mine (#2) does work very well here, I think I’ll have to go with the complicated, but more theoretically sound (I guess) logistic regression equation.  Make things as simple as possible, but not simpler, as the saying goes.

Once the dust is settled, I plan to get to work on BB%, batted ball stats, BABIP… you name it.

Steve is a robot created for the purpose of writing about baseball statistics. One day, he may become self-aware, and...attempt to make money or something?

42 Responses to “Batter-Pitcher Matchups Part 2: Expected Matchup K%”

  1. rusty says:

    The logistic formula graph looks like a saddle that got cut off just past the saddle point (in the higher green band).

    The difficulty in extrapolating which graph is better based on its extrema (i.e. where batter or pitcher K% approaches 100) is that we don’t have any tool by which to evaluate it besides intuition. Even during Michael Jordan’s lone year of pro ball (in AA, no less), he only had a K% of 26% or so.

    • Ahh, OK I kind of see that.

      Only one solution to the problem of not enough super-high-K% data points — put me in the majors! For science!

      • rusty says:

        Haha, I appreciate the response.

        That said, since we’re really only looking at that left-most quarter or so of the shape, the two share the same type of convexity, so the analytic story you’re telling shouldn’t change that much. I’m using as high-water marks Kimbrel’s 50.1% K last year, and Adam Dunn’s 35.7% K from 2011…

  2. tangotiger says:

    Steve: can you tell me what your formulas would give you for:

    a. 10% K v 10% K, in league of 17.5% K?

    b. 90% K v 90% K, in league of 82.5% K?

    • Sure, Tango. A) 5.5%; B) 94.5%

      The spreadsheet app near the top of the article is interactive, for whoever is interested in figuring out more of these — you can input the numbers directly into it on this page.

      • tangotiger says:

        It’s blocked at the office. Those numbers are for the odds ratio method.

        What do you get using your other methods?

        • Whoops, I assumed that’s what you were interested in, and misread. OK, now listing Formula 1, 2, and Logistic:

          A) 5.35%, 5.94%, 5.55%
          B) 98.25%, 96.38%, 94.45%

          Formulas 1 and 2 won’t take a different league K% as an input, though (I know they should, ideally).

        • tangotiger says:

          A+B must equal 100%.

          Because all I did for B was say “not strike out”. So, if you have a 5% chance to strike out, you have a 95% chance to not strike out.

        • Ah, true. Another mark in favor of the logistic and/or odds ratio.

        • tangotiger says:

          On my site, I also suggest that maybe you should limit it to 2007, because K rates jumped starting in 2008 or 2009. The league average is nothing close to constant.

        • tangotiger says:

          The other thing that is wrong with the logit: the three coefficients must add up to 1. In your case, it’s +.92, +.92, -.77.

          So, a 17.5% hitter v 17.5% pitcher in a 17.5% league will NOT give you 17.5%!

          I mean, if you want to argue for +.92, +.92, -.84, we can have that discussion.

          Otherwise, you are overfitting.

        • Ooh, good point. I’ll set that constraint and see what happens.

          I’ll give your 2007 suggestion a shot, but it’ll take some doing…

        • OK, with that constraint, the coefficients go to 0.91, 0.94, and -0.85.

          The 100+ PA line (over the ’02-’12 set) now has a slope of 0.935 (going through the origin) and R^2 of 0.958, vs. the 0.906 and 0.956 for odds ratio.

          At 1000+ PA, it’s a 0.932 slope and 0.987 R^2, vs. 0.919 and 0.984 for odds ratio.

          So that constraint definitely brings the logistic function a lot more in line with odds ratio.

          Can you think of any way to adjust for the shifting league K%? My thought was that it would all average out, but I don’t know…

        • Tangotiger says:

          The easiest way is to limit your data to 2002-07. Is there a reason you can’t do that?

        • No, I can (eventually). I was just curious about what kind of math would be involved in fixing the problem for this data set.

        • Tangotiger says:

          You won’t like the answer: Odds Ratio Method!

        • Haha, might I then need an odds ratio to adjust the odds ratio that’s adjusting the odds ratio?

          OK, so how might that work? Something like:
          (2002-2006 Odds)*(2007-2011 Odds) / (2002-2011 Odds) ?
          (then converted back into a probability)

  3. Ecological Fallacy (not really, I guess) says:

    I wrote about this being an example of an ecological fallacy yesterday. I take that back. It is interesting and obviously took a lot of effort! Congrats!

    I was merely pointing out that the analysis may not tell us very much about Adam Dunn because it appears to be based on grouped data and seems to assume that the outcome of each PA is independent of all others. I suppose my concern is that your model may be an overly simplistic representation of reality and thus may not account for some theoretically relevant variables. In particular, it seems as though you are modeling matchups using historical K% only, as if those are the only variables that determine matchup outcomes.

    An alternative modeling strategy would be to control for unobserved differences between pitchers and hitters by including fixed or random effects. Of course, it could get awfully complicated since it would probably make sense to also account for game, team and year effects as well.

    Just my .02.

    • Yeah, this is the simplest way I could have done it; it’s just a baseline. Some batters may have a particular weakness to a curve, or a certain type of delivery, which throws them off of the baseline. But I don’t think we can really judge that until we have that baseline.

      You have to admit — this system does at least explain/predict a huge share of what’s going on, right?

    • Oh, thanks, by the way!

  4. J. Cross says:

    First off, I really enjoyed these articles. I think you’re right about using projected K%’s instead of historical K%’s. Off hand, my best guess as to why you’d get coefficients of .92 and .92 for the logistic regression instead of 1 and 1 is that the equation is actually serving to regress the observed odds ratios to the “true” odds ratios. Steamer doesn’t (yet) have projected splits but I think we should start working on them.

    • Thanks! Yes, please do — those splits will be awesome for more applications than just this. I think the conclusions here could be much stronger when taking aging (and whatever else you do) into account. What else do you take into account for K%, if that’s not top-secret?

  5. Colin Wyers says:

    Are you reporting MAE/RMSE on the same sample used to derive the F1/F2 formulas?

    • Not on the 2013 data, but yes on the ’02-’12 (and also for the logistic). I know it’s not the best way to do it, but I did use 1.5 million data points and simple (rounded, even) formulas. I don’t think they’re picking up much random error; systematic error, maybe.

  6. Cyril Morong says:

    I am way late to this party. I just saw it mentioned a while ago on the weekly SABR notes email. So great work. This is lots of fun.

  7. Cyril Morong says:

    About a month ago Darvish faced the Astros. I posted the following on my local SABR chapter’s email list

    “He has struck out 40.4% of the batters he has faced. The league
    average is 20.2%. Astro batters have struck out 26.7% of their PAs.”

    Then I predicted about 49% (which is what I think the Odds Ratio gives here). But he only struck out 25.8%.

    • Interesting. Given they faced Yu 27 times that game, my binomial distribution calculator above thinks there’d be at least a 3% chance of the observed K% being that far off of the mark (looks like 8 of 27 K’ed, so 29.6%?), even if all those numbers are perfect estimates of the “true” rates, and even if the formula is perfect.

      If you use Yu’s (say that 3 times fast) career K% of 29.2%, and the Astros’ 2012-2013 23.5% against RHP, then the formulas all guess around 30-34%.

  8. Dan Rosenheck says:

    Have you tried this with other components, like walk rate or groundball rate? Also, how do these compare to the venerable old log5 formula?

  9. Nivra says:

    I’m not sure I’d go with an odds ratio approach. If it were me, I’d explore using a skill comparison similar to ELO that predicts winning%. Then I’d use a modifier to both the pitcher and the batters’ skills based on handedness, home park, etc. Then do ELO or whatever skill comparison system you want to predict outcome %.

    • Would that involve starting everybody out with an equal rating, and raising their rating based on the ratings of who they beat (and dropping based on who they lose to)? If so, sounds good but kind of tough (at least for me).

  10. Kevin says:

    oh…. this is the project you were referencing in the reliability thread a few weeks back…. interesting.

    A couple of brief points: first, I’m not really sure why you are including the league-wide strikeout rate here. Well, I understand why, but I think you’re wrong to do it, as the league-wide strikeout rate really shouldn’t have anything to do with your prediction for an individual at bat, unless you are trying to bias yourself in the correct direction. Basically, including it is a version of the gambler’s fallacy, in a way. But that’s a minor point.

    More importantly (and nice work again, by the way), I would like to persuade you, in the future (and present if possible), to use positive predictive power, hit rate and negative predictive power instead of mean averaged error in reporting your findings. I, and presumably other informed readers, would not find it trivial to know in what direction each formula made its errors. To be honest, information like that is usually how I determine the relative utility of predictive models. I also have a hunch that if you break down the distributions of the errors a bit more, you may find what is “wrong” with the odds ratio method.

    • Heh, I can’t win on the league rates thing — I leave it out, and MGL and Tango criticize that…

      Anyway, Formulas 1 & 2 don’t directly include league K% the way Odds Ratio and the Logistic function do, but they indirectly include approximately 2002-2012’s league K% through their weighting. It’s important, I think, in light of the fact that the historical K% of each player (which I’m using as an approximation of their “true” rate) has a lot to do with who they faced in the past, and league K% is a proxy for that.

      That reminds me of: by the way.

      Positive and negative predictive power — how could I apply those here? In what I’m seeing, it looks like they’re used for yes or no predictions. I’m not making any yes or no predictions on strikeouts — only giving a probability.

      Not sure what you mean about the direction of the errors — I thought the slopes of the regression lines on the scatterplots, along with the average errors (as opposed to the mean absolute errors) were indicators of that.

      • Kevin says:

        I would think this is more an example of the ecological fallacy than the false positive paradox. The false positive paradox would be relevant if you were trying to predict what the strikeout rate would be for a group of individuals. But you are not. You are trying to predict the outcome of a single PA, so including league averages is an example of the ecological fallacy, IMO at least. I think reasonable minds may differ on this point, but for me at least, this is a clear example of confusing the correlations of individuals and groups.

        Regarding PPP/NPP, it might not be something you can use here explicitly, but as you have described it thus far, your goal is to create essentially a probability matrix of outcomes for each at bat. Which means you will have a probable outcome (along with other less probable outcomes) for each PA. Although we may never have a K% above maybe 30% for a given at bat, there are likely to be situations in which a K is the most probable outcome. Also, since there are a finite number of outcomes to a PA, computing PPP/NPP will be possible (it’s not strictly binary). Regardless, the basic idea here is that by comparing the amount of error variance above and below the regression lines to the total amount of variance above and below the lines (which is what PPP/NPP does, in a way), I think you can essentially accomplish what you are trying to do with the league K% without explicitly adding it into your formula. Or something like that, at least. I wanted to see it in table form because… well… I like tables better than graphs. In any case, the reason I think this matters is that in this experiment, since strikeouts are the low probability event, I would give more weight to the formula that over-predicts strikeouts than the formula which under-predicts them.

        BTW, is this moving towards a measure of managerial effectiveness (manager WAR if you will)?

  11. Well, I think the example Tango trapped me into earlier is a good point in favor of the need for a league average:
    If you say the league K% is 20%, and you’re talking about a batter who Ks 20% of the time, and a pitcher who Ks 20% of batters, you’d probably expect that pitcher to strike out that batter about 20% of the time, right? That’s what all the formulas say, more or less. How about a batter and pitcher who both have a K% of 80? With the league K% of 20, all the formulas say something in the 90% range will be the outcome. But if you adjust the league K% up to 80%, then odds ratio instead says the batter will K 80% of the time against that pitcher.

    The trick is, the chance of not striking out is 1 minus the chance of striking out; that means, if the formula tells you that a strikeout will happen 20% of the time in a certain circumstance, then it ought to also tell you that it won’t happen 80% of the time in that same circumstance. It seems to me that taking the overall average into account is the only way you can do that.

    I’m still not getting it regarding PPP/NPP — everything I’m seeing shows it dealing with True Positives, False Positives, etc. I don’t think, even if a K% is the likeliest outcome of a particular PA, that there could be enough confidence in that assessment to label it a “positive.”

    I noticed that the following form of the PPP formula is actually similar to the odds ratio method: (sensitivity x prevalence) / (sensitivity x prevalence + (1-sensitivity) x (1-prevalence)), as prevalence is analogous to league average, and both factor in the (1-…) of the terms. Both are based off of Bayes’ Theorem, I bet.

    Hm, not a bad idea regarding managerial effectiveness. Not easy, though — there are a lot of variables to consider, some of which we are not always privy to (e.g. health status of bench players or relievers).

  12. Steve says:

    Have you taken a stab at other stats yet?

    • Hi there, my fellow Steve. Thanks for asking. You know, I was working on batted ball stats, when I got distracted by my THT Annual project, which dealt with a more in-depth look at this sort of stuff (specifically, whether some hitters’ K% were legitimately better or worse than expected by the odds ratio against certain types of pitchers). There were a couple other big projects around then which haven’t been released yet, btw.

      Point is, it’s definitely on my to-do list, though I’ve been busy lately with other stuff.

      • Steve says:

        Steve, I’m actually also looking at batted ball stats. My goal is to quantify why GB(FB) pitchers have an edge vs. GB(FB) hitters. So, I’m very interested in your work.

        I really enjoyed your work in THT Annual by the way.
