Formulating a Hitter xK% Metric

Since I develop my own player forecasts, I am always looking for better ways to use advanced metrics to project the more basic ones. You may remember the quest Chad Young and I engaged in earlier this year to predict HR/FB ratio using batted ball distance. That didn’t go as far as I had hoped, but it did highlight the value of the batted ball distance data. When I started doing the research for my Contact% articles last week, I figured there might be a combination of plate discipline metrics, including Contact%, that does a good job of estimating a hitter’s strikeout percentage. I was right. Of course, this is nothing earth-shattering: Jeff Zimmerman reminded me that he looked at that very same concept late last season, which had clearly already escaped my brain. Since I had the data and regressions done, I decided to do a version 2.0 of formulating a hitter’s expected K%.

Initially, I used a smaller data set than Jeff. But if I was going to duplicate his work, it would be much more useful to use the same data set so I could compare my results to his. So I did, collecting all hitters with at least 200 plate appearances from 2002 to 2012, which gave me a total of 3,796 player seasons. My initial thought was that a hitter’s strikeout rate would depend largely on his Contact% and the rate of pitches he saw inside the strike zone, indicated by Zone%. I then tested a host of different combinations that made logical sense to me (if anyone knows how to tell Excel to automatically test every single regression combination from a series of variables, PLEASE share!) until I found the best one, as follows:

xK% = 1.095 – (Z-Swing% * 0.250) – (Contact% * 0.888) – (Zone% * 0.076)
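
For anyone who wants to plug numbers in, here is a minimal Python sketch of the formula above. It assumes each input is a fraction rather than a whole-number percentage (0.80 for an 80% Contact%), which is the scale the 1.095 intercept implies. The function name and the sample hitter are made up for illustration.

```python
def expected_k_rate(z_swing, contact, zone):
    """xK% from the regression above.

    Assumption: inputs and output are fractions (0.65 means 65%).
    """
    return 1.095 - (z_swing * 0.250) - (contact * 0.888) - (zone * 0.076)


# Hypothetical hitter, not a real stat line.
xk = expected_k_rate(z_swing=0.65, contact=0.80, zone=0.45)
print(f"xK% = {xk:.1%}")  # prints "xK% = 18.8%"
```

Comparing this estimate with a hitter’s actual K% gives the gap between what his plate discipline suggests and what he has done so far.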

I wasn’t sure whether it was even worth posting a version 2.0 given that Jeff’s R-squared was at 0.79, so this was only a slight improvement. Still, it has one fewer variable, plus it includes Zone%. My initial assumption was that Zone% would have a positive correlation with K%, since it seems obvious that a hitter who sees more strikes will strike out more often. However, this equation, and the others that I generated, all gave Zone% a negative coefficient. I wonder if that is because a hitter who sees fewer pitches in the zone is liable to chase more pitches out of the zone, which are harder to make contact with. That’s the only explanation I can think of, but it does seem to make sense.

On the other hand, the negative value for Z-Swing% makes complete sense. If you’re getting thrown strikes and not swinging at them, they will be called strikes, and the hitter is more likely to get called out looking. The strong negative correlation with Contact% is self-explanatory. I tried breaking Contact% up into Z-Contact% and O-Contact% instead of the umbrella term, but it led to a worse R-squared (and more terms! gasp!).
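
On the question of testing every regression combination automatically, one option outside Excel is a short script. The sketch below uses Python with NumPy, which is not what was used for this article, and it runs on synthetic data generated purely for illustration; none of the numbers are real player seasons. The function names `r_squared` and `best_subsets` are hypothetical helpers, and the inclusion of O-Swing% as a candidate variable is an assumption.

```python
from itertools import combinations

import numpy as np


def r_squared(X, y):
    """R-squared of an ordinary least squares fit of y on X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)


def best_subsets(data, target, max_terms=3):
    """Fit every combination of up to max_terms predictors.

    Returns a list of (R-squared, predictor names), best fit first.
    """
    names = [name for name in data if name != target]
    y = data[target]
    results = []
    for k in range(1, max_terms + 1):
        for combo in combinations(names, k):
            X = np.column_stack([data[name] for name in combo])
            results.append((r_squared(X, y), combo))
    return sorted(results, reverse=True)


# Synthetic, made-up data (fractions, not whole-number percentages).
rng = np.random.default_rng(0)
n = 500
data = {
    "Contact%": rng.uniform(0.65, 0.90, n),
    "Z-Swing%": rng.uniform(0.55, 0.75, n),
    "Zone%": rng.uniform(0.40, 0.55, n),
    "O-Swing%": rng.uniform(0.20, 0.40, n),
}
# Fake K% built from three of the columns plus noise, so there is a known answer.
data["K%"] = (
    1.095
    - 0.250 * data["Z-Swing%"]
    - 0.888 * data["Contact%"]
    - 0.076 * data["Zone%"]
    + rng.normal(0.0, 0.02, n)
)

for r2, combo in best_subsets(data, "K%")[:3]:
    print(f"{r2:.3f}  {', '.join(combo)}")
```

One caution with this approach: in-sample R-squared never goes down when a variable is added, so the largest combination will always look at least as good as its subsets. Comparing models with the same number of terms, or checking the fit on seasons held out of the regression, is a fairer test.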

There was also an interesting comment on Jeff’s article by slash12, known around here for his work on an xBABIP formula. He noted that he had looked at an equation for estimating a hitter’s strikeout rate in the past, but did not find it to be a good predictor, and players who beat the model seemed to do so consistently. Unfortunately, this is going to happen with any estimator metric we derive, with outliers always screwing up our dreams of the metric working for every player. There are always going to be other factors involved that either don’t show up in our specific stats or in any stats, including those not available here, or that do show up, but we just haven’t figured out exactly where to look.

Given that plate discipline metrics stabilize sooner than results-based statistics like strikeout rate, it is still worth using an xK% formula this early in the season. As usual, I will follow up on this with the names of hitters striking out more and less often than the xK% formula estimates.




Mike Podhorzer produces player projections using his own forecasting system and is the author of the eBook Projecting X: How to Forecast Baseball Player Performance, which teaches you how to project players yourself. His projections helped him win the inaugural 2013 Tout Wars mixed draft league. He also sells beautiful photos through his online gallery, Pod's Pics. Follow Mike on Twitter @MikePodhorzer and contact him via email.

26 Responses to “Formulating a Hitter xK% Metric”

  1. Ruki Motomiya says:

    For some reason, I read hitter as Hitler.

    Never read FanGraph article titles at 5:32 AM, kids.

    • Dan Greer says:

      Now if it were a Notgraphs article penned by Dayn Perry, you’d have more than likely read it correctly.

  2. Skin Blues says:

    I’ve had pretty good success using a simple multiplier for each player based on weighted data from the 3 prior seasons. It’s basically just saying “we know that this guy is usually x% better or worse than his K%/SIERA/etc but we don’t know why, so we’ll just use this multiplier”. It’s not very scientific, I don’t think. But it works, and it helps get around the fact that the reason a player is consistently better or worse than certain metrics is different for each player. Of course, the biggest issue with doing something like this is that you need a couple of seasons’ worth of data. I suppose you could do a PECOTA-type thing where you just use multipliers for players that historically have been similar to the rookie in question.

    By the way, I’m also very interested in finding out an answer to your request of “how to tell Excel to automatically test every single regression combination from a series of variables”. That would save a lot of time, and I’ve often wondered about it.

  3. jfree says:

    Wondering whether First Strike % lends extra value to the prediction. Yes it is an additional variable but it is the one variable that best captures whether the pitcher is able to use what are often the better pitches in his repertoire – esp forcing the batter to potentially chase bad pitches out of the zone.

  4. MightyJimcat says:

    I’m thinking that maybe what the data is showing with Zone% is correlation and not causation. Hitters who see fewer pitches in the zone are hitters that pitchers are afraid to throw to. They’re generally power hitters, and power hitters tend to strike out more.

  5. labe says:

    Awesome. I would love to see an xBB% metric for hitters as well!

  6. dcs says:

    Nice article. How about doing the same with xBB% ?

  7. shapular says:

    Why not use R?

  8. deezy333 says:

    Mike,

    No Roto Riteup today, so I’m going to burden you with my unrelated question.

    I was just offered Castro for Bumgarner. I wasn’t high on Castro going into the season and he seems to be underperforming across the board. Power is down, walks are down, stolen bases are down, and strikeouts are up. However, he also has Segura, of whom I am a big fan. Even if the power surge doesn’t continue, the rest of his game looks solid.

    Right now I have Rutledge at short. My pitching staff consists of Bumgarner, Zimmerman, Harvey, Peavy, Lester, Miller, Buchholz, Santiago.

    I just counter offered with Segura for Buchholz and am waiting to hear back. If he is adamant about keeping Segura, I’m thinking Lester or Peavy is the most I would offer for Castro.

    What should I do? Thanks, Pod!

    • Wow, that’s one hell of a staff. You are right in trying to shop some of them and I think Buchholz is a good guy to try selling high. I can’t imagine what it would take to get Segura, but I like your offer, though I’m not so sure Rutledge should be riding the bench. Castro is still a 10-20 guy with a good batting average. As usual, start off low; Lester and Peavy are both good starting points, then see what he says. Actually, I’d prob just start with Buchholz if he doesn’t give you Segura.

      • deezy333 says:

        Thanks for getting back with me. I know you guys are always crazy busy.

        I forgot to preface that this is a 10-team league, but either way my pitching is dominating.

        Are you concerned with Castro at all and do you think Segura is the real deal? I’m a big Rutledge fan and I’m worried Segura/Castro would not be a huge upgrade. However, when you have the chance to trade pitching for hitting at your weakest position, you have to consider it.

        I’m OK with Rutledge on the bench as he has both 2B and SS eligibility. Currently have Altuve at 2B and Hill on DL.

        If you don’t mind, could you rank my pitching staff in order of guys you would trade so I know what direction to move in counter trades?

        Thanks so much and I love your work like crack (I don’t actually love crack).

      • Starlin is a tough one. In his rookie season, he only stole 10 bases and was successful at a poor rate. So we just can’t be sure he’ll rebound to the 20 steal plateau like the last two years given his poor success rates. Also, was last year’s small power surge real? He could easily fall back to being a 10 HR guy. Again though, that’s basically a min of 10 HRs and 15 steals with solid contributions everywhere else. But suddenly that means he’s really not much better than Rutledge.

        No idea where this Segura power is coming from. His batted ball distance is 300 feet though, which actually validates the power spike. The BABIP and avg will obviously decline, and he’s not going to hit over 20 HRs I’m sure. But maybe 15-20 HRs with 30+ steals, so an upgrade over Rutledge.

        Sell High rank, guessing also at perceived value:

        Buchholz
        Zimmerman
        Lester
        Miller
        Peavy
        Santiago

        I guess if you could trade Harvey like he’s a top 5 pitcher, obviously consider it. But I think he has a decent chance of being top 10 rest of the way. No point in trading Bumgarner, he’s doing what’s expected.

      • deezy333 says:

        Awesome. Thank you so much for the in depth reply.

        The only ranking I was having trouble with was Lester/Peavy. I think Peavy is the better pitcher but also more fragile.

        Would you trade either/both for Segura?

      • deezy333 says:

        * obviously not both for Segura, just meant would you trade either or just one.

      • Prob agree on Peavy vs Lester. Lester’s peripherals aren’t much different than last year, but his SwStk% is down to a career low. Given your pitching depth, trading one of them is a good idea. But again, if this pushes Rutledge to the bench, I just don’t know how big an upgrade Segura will be. He may very well be an upgrade, but seems like maybe instead you pair a pitcher and a hitter to upgrade that hitter at that position.

  9. Konoldo says:

    Have you considered looking at interactions in your models? I know they can be complicated and hard to explain but might be worth checking out.

    I don’t know much about using Excel as a statistical program for analysis, but it wouldn’t be too difficult to export the data from Excel to “R” (a free statistical program) and program that to test all combinations of variables or use some kind of stepwise variable selection algorithm.

    • Way above my head! I’m really a novice at this stuff so don’t even know what you mean by looking at interactions. And that’s funny, a commenter above suggested I use R, and I thought he meant R for correlation, rather than R-squared, HA!

      • Bill says:

        I’ve heard great things about R, but I am stuck in the Excel world myself. However, I have spent many a year nerding out in the program, and could probably guide you through some simulation methods so you can do some testing without taking classes. Let me know if you want me to take a stab at something. I also play in a league with Ben Pasinkoff if you need a reference.

  10. gnomez says:

    As usual, great work! This is just my own preference, but I wish these sorts of studies were in the main Fangraphs section instead of Rotographs. As a non-fantasy player, I tend to only read FG and NG, which means I have a decent chance of overlooking these types of cross-over posts amid the fantasy rankings and “waiver wire” posts.

  11. slash12 says:

    I’ve recently been yearning for this equation again, not so much for year to year strikeout surgers, but to build an xK% for small sample sizes where K% hasn’t stabilized yet. I find myself comparing one guy’s swinging strike % to another’s wondering where their K% might end up based on it.

    I want to convince fangraphs brass to let us put in our own custom equations for our player pages/leaderboards. Help me out mr. podhorzer! I know you want this too.
