This ShH Just Got Real!

Should Hit, or ShH – pronounced like: “Shh! Be quiet or the Nazis will hear!”

Last week, while rifling through the lump of cold numbers that is the 2011 season, I stumbled upon a self-illuminating chest of gold coins: a reliable, fielding-independent hitting formula. Today, we’re going to take it to the next level and get nerdy up in this beach.

Before we proceed, let’s do some of the research I did not care to do the first time around. Here are some of Should Hit’s predecessors (though they did not directly influence the creation of ShH):

FIP for Hitters
In 2010, Matt Klaassen wrote “FIP for Hitters? Defense Independent Offense.” His work in that article probably mirrors mine the most, though it stays truly defense independent and does not bring BABIP into the conversation. However, his work differs in both process (he excludes BABIP and does not use wRC+) and intention (my tool focuses on regression, not really true talent levels).

Four Factors
In the middle of last year, Jack Moore got freaky with walks, strikeouts, a special ISO, BABIP, and home runs. Moore’s work also resembles my own in many ways. The key differences are: (1) the greater complexity of formula, (2) the inclusion of POWH (an ISO on hits only), and (3) Moore’s impressive display of algebraic prowess.

Ultimately, though, Moore’s work is neither park nor league adjusted (a benefit ShH accrues through using wRC+), and it also is not defense independent.

Someone mentioned “ProOPS” in the comments in my previous article and I was like: “Say what?!”

Well, PrOPS, spelled with one O, is a tool Hardball Times had been using for many an age. It’s an OPS predictor that uses a player’s batted ball data (line drives, groundballs, etc.), walk rates, strikeout rates, homer rates, and home ballparks to spit out the OPS his hitting profile suggests he should have.

Outside of being regression-based hitting-prediction tools, though, PrOPS and ShH have little in common. Where ShH is almost suspiciously simple, PrOPS is all-encompassing — and oddly less reliable (apparently a .81 R-squared, go figure).

Still, each of these three metrics provides interesting insight and a different angle on the same pursuit: understanding hitters.

Well, ShH just took it to the next level today.

Last week, many commenters (and I) began to suspect ShH was systematically undervaluing speedy hitters, who hit more doubles and fewer homers. Because ShH has no doubles or triples component (other than the unfairly weighted BABIP, which counts each of those events as just one hit), the Jose Reyes types appeared to be getting short shrift from ShH.

So, one of the suggestions was to include steals in the regression because steals ought to act as an acceptable (though not perfect) proxy for doubles and triples. Well, first, I wanted to see if this relationship really existed. It didn’t:

The hitters from 2009 through 2011 showed a weak (maybe even nonexistent) relationship betwixt doubles/triples and steals.
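For anyone who wants to run the same kind of check, here is a minimal Python sketch of a correlation test between steals and doubles-plus-triples. The player rows are made-up placeholders, not the 2009 through 2011 data, and the helper name `pearson_r` is an assumption for illustration; it shows the calculation only, not the result above.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from each mean
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)

# Hypothetical (steals, doubles + triples) season totals, one pair per hitter
steals = [2, 5, 11, 18, 27, 40, 52]
doubles_triples = [38, 25, 41, 30, 44, 29, 36]

r = pearson_r(steals, doubles_triples)
print(f"r = {r:.2f}, r^2 = {r * r:.2f}")
```

An r near zero (and so an r-squared near zero) is what a weak or nonexistent relationship looks like; values near 1 or -1 would mean steals track doubles and triples closely.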

Still, when we look at the top and bottom tiers of speed, we see there’s a problem. Looking at the best 37 thieves (i.e., those with 40 or more steals) from 2009 through 2011, we find that ShH undervalues them by an average of 2 points of wRC+. Conversely, the 38 slowest lumberers (9 steals or fewer) were overvalued by 2 points.

As a whole, the two groups were still crazy close to their Should Hit predictions:

The “sluggers” (I put sluggers in quotes because this group included the likes of Yuniesky Betancourt) performed a little better, but not by much:

So, yeah, maybe ShH undervalues speed hitters? FIP — ShH’s mother — does the same thing with groundball pitchers and other specific types (thus: we need SIERA).

HOWEVER: Should Hit’s value is not in telling us a player’s present wRC+; that’s what wRC+ does on its own. Should Hit allows us to adjust for a player’s BABIP, given his present peripherals. We are comparing the player internally, not externally.

In other words, we would not want to use Jose Reyes’s ShH to compare him to another player, but to compare him to himself with a different BABIP (or walk rate, strikeout rate, etc.).

To clarify this process, I created ShHAP!, a new Google Doc that should make everything clearer:

NOTE: You’ll need to save the Doc to your own Google Docs collection or download it as an Excel spreadsheet. Unfortunately, I can’t let just anybody edit it, or ’twill likely be abused.

Using the same ShH formula as before, I have concocted the consummate Google Doc. Instead of having the Dying Ball Era-skewed wRC+ as before, this present rendition offers an internally adjusted formula:

Simply, ShHAP! takes the difference between a player’s career and current-BABIP-based ShH and applies that difference to the player’s present wRC+. Not simply, it looks like this:
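A minimal Python sketch of that idea, not the spreadsheet’s actual cell formula. It assumes ShH is linear in its four inputs and borrows the BABIP coefficient of 465 from the simplification quoted in the comments; the other three coefficients and the intercept are placeholder values chosen purely to show that they cancel out of the adjustment.

```python
# BABIP coefficient taken from the simplified formula quoted in the comments
# (an assumption here, not re-derived from the regression).
BABIP_COEF = 465.0

# Placeholder values: NOT the real ShH regression coefficients.
HYPOTHETICAL_COEFS = {"intercept": -50.0, "bb": 300.0, "k": -150.0, "hr": 1500.0}

def shh(bb_pct, k_pct, hr_pct, babip, coefs=HYPOTHETICAL_COEFS):
    """Linear ShH-style estimate of wRC+ from rates expressed as decimals."""
    return (coefs["intercept"]
            + coefs["bb"] * bb_pct
            + coefs["k"] * k_pct
            + coefs["hr"] * hr_pct
            + BABIP_COEF * babip)

def shhap(wrc_plus, bb_pct, k_pct, hr_pct, babip, career_babip):
    """Present wRC+ minus the gap between current-BABIP and career-BABIP ShH."""
    current = shh(bb_pct, k_pct, hr_pct, babip)
    career = shh(bb_pct, k_pct, hr_pct, career_babip)
    return wrc_plus - (current - career)

def shhap_simplified(wrc_plus, babip, career_babip):
    """Same adjustment once the walk, strikeout, and homer terms cancel."""
    return wrc_plus - BABIP_COEF * (babip - career_babip)

# Hypothetical season line: 120 wRC+ with a .340 BABIP against a .300 career mark
full = shhap(120, 0.10, 0.18, 0.04, babip=0.340, career_babip=0.300)
short = shhap_simplified(120, babip=0.340, career_babip=0.300)
print(round(full, 1), round(short, 1))
```

Both versions agree because the walk, strikeout, and homer terms are identical in the two ShH calls; at this coefficient, 40 points of BABIP are worth 18.6 points of wRC+. The same arithmetic appears consistent with the Michael Morse estimate in the comments, where 143 at a .358 BABIP falls to about 116 at .300.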

Let’s look at Babe Ruth’s 1923 season, a great season by any standard, wherein Ruth posted a 226 wRC+ with a hefty .423 BABIP. Using his career BABIP, ShHAP! says he would have hit closer to his career norm (actually, worse).

Altogether, the ShHAP! formula narrowly follows Ruth’s career, indicating the changes in his performance mostly came from varying walk, strikeout, and homer rates — not from BABIP:

If we look at Lou Brock’s career (the career of the prototypical high-speed, fewer-homers guy), we see a similar trend:

Here we see ShHAP! predict Brock’s late-career resurgence as a low BABIP plagued his 1977 and ’78 seasons. Overall, ShHAP! keeps very close to Brock’s wRC+, a good sign with regard to the speedy guy problem, especially seeing as Brock had almost as many triples as he did homers.

So: Go! Play with ShHAP! and let me know what you think! Does it appear to still struggle with quick guys?

Bradley writes for FanGraphs and The Hardball Times. Follow him on Twitter @BradleyWoodrum.

44 Responses to “This ShH Just Got Real!”

  1. Crpls says:

    Can’t mess with the predictor ’cause it’s read/view only.

  2. This is really cool. I wonder if one could make it (or a similar formula) work as a predictor for how minor leaguers could adjust to the majors when their BABIPs regress. Minor league player lines are frequently skewed by BABIPs that exceed .340, and I’ve always felt that is the biggest problem in looking at minor leaguers’ stats. Now, I’m sure it would be difficult to do, as changes in difficulty of competition would also affect peripherals, but if such a thing could work, it could be a very valuable tool.

  3. Justin says:

    You should simplify that formula since most of those terms drop out.

    • Oh. My. Holy crap.

      It’s just this:

      ShHAP! = (present wRC+) – (465*BABIP – 465*xBABIP)

      • Seriously. Like, this is huge. This has to mean something, but my brain can’t comprehend it.

      • However, I don’t want to change the equation because we can use ShHAP! to examine what effect a change in walks, strikeouts, or homers might cause.

      • You might call it the Aaron Rowand finder.

      • Matt P says:

        If for hitters, HR%, BB% and K% are due primarily to skill, while BABIP is primarily due to luck, then the only variable you need to look at to determine whether a player is lucky is BABIP. That’s why all the rest of the variables cancel out, because you’re trying to come up with ways to determine how lucky a batter has been in a specific year.

        Now, if you were to use a player’s CAREER HR% rate, BB% rate, K% rate as well as xBABIP then it would have a connection with ShH and would allow you to examine the effect a change in walks, strikeouts or homers might cause.

        In other words, make ShHAP = (wRC+) – (Current ShH – Career ShH).

        But in order for that to make sense, you’d have to argue that either all four factors are due to luck or skill.

    • mattinm says:

      Beat me to it. All ShHAP! is is BABIP vs xBABIP adjusted wRC+. Unless I’m misreading something somewhere.

  4. benjipants says:

    Ok, so here’s the test for me: Michael Morse. I get an expected wRC+ of 143 compared to his 152 wRC+
    Dude really doesn’t walk much, but he doesn’t strike out a ton, and he hits home runs. Assuming his career BABIP of .358 is remotely sustainable, he’s a damn fine player, but if it regresses to .300, he’s something like a 116 wRC+ player.

    Here’s the question though–with such an extreme outlier like Morse in terms of BABIP, career arc, etc. do we really gain much predictive power here? He’s not exactly a TTO guy, but he’s not a speedy hitter’s hitter.

  5. Crpls says:

    My favorite thing so far is Bonds’ 2001 wRC actually goes up to 245. Is there a way to see career BABIP through a particular season? Just curious what it’d be with his career BABIP through 2001 rather than the latter end of his career.

    • You could use the leaderboards, looking at, say, 1986 through 2001 with a min PAs of something like 8000 so that only Bonds and a few others come up.

      • Since the original xBABIP regression model attributed about 35% of the variance of hitter performance to the variables that were regressed, are you indicating you’ve attributed the rest of the variance, or pulled the key variables from the xBABIP model and reformulated them by something akin to renormalization?

      • Toffer Peak says:

        Even better just go to Baseball Reference and let them add it up. This is nice since it works for all players.

        Scroll down and you’ll see he had a .284 BABIP from 1986-2000.

      • @channelclemente: Boy, I’m having a doozy of a time trying to understand your question.

      • Bradley, what I originally read about xBABIP suggested that, when used as a measure of hitter performance, it explained about 35% of hitter performance. The 8 or so variables used in the regression would have to be scaled (normalized) properly for that to be meaningful, I think. Your approach/model seems to explain 90+% of the variance in hitter performance. What I was, perhaps inartfully, asking was: did you find additional factors, or did you simply (or maybe not so simply) adjust the scales of the factors where regression yielded a linear relationship, i.e., renormalization? I think the original regression contained BABIP, hitter eye (BB/SO), pitches per extra-base hit, line drive rate, etc. To regress these effectively, I would suspect you’d have to scale them at least, or render them dimensionless.

      • JDB says:

        The article and the comments are very interesting and entertaining.

        I too cannot follow channelclemente. I don’t think you are being “inartful” – just over our heads.

      • @channelclemente: My regression was simply a linear regression of wRC+ on BB%, K%, HR%, and BABIP. I go into more detail in the original post, here:

        I never used any xBABIPs in the process, just a player’s current BABIP. As I mention in the first post, there are several credible xBABIP calculators out there, and each uses a different process. For ShH, I prefer to use career BABIPs (unless they’re a rookie or second year player) and a little common sense (such as older players tend to have decreased BABIPs).

        In my regression, I don’t use pitches per extra-base hit or line drive rates or any truly fancy stuff.

        Hope this answers your question! If not, consider this as me saying, “Sorry, I might be too dumb to help you.”

  6. Oscar says:

    Okay, can you explain to me how this is any different than just normalizing a player’s batting average by using xBABIP or career BABIP instead of current BABIP and then recalculating their OPS (or wRC+ or whatever)? I can’t see a difference.

    • Well, the method of normalization is one of the key differences. Also, no one’s using batting average (yuck!), but yeah, ShHAP! is entirely a regression tool. You can use it to regress a hitter based on a different K%, BB%, HR%, or — most likely — BABIP.

      • Oscar says:

        But the only inherently “defense-independent” aspect is taking the framework you’ve set up and using a regressed BABIP, right?

        I’m not trying to be snarky. I really just don’t understand. Maybe I’m missing something. But is anything happening here (e.g. the Lou Brock chart) besides calculating what a player’s wRC+ would have been if their BABIP didn’t fluctuate at all?

      • Yeah, I think so. And yes, the Lou Brock ’78 example shows how he would have performed with a normal BABIP in that season. The graph, however, shows how most of the time, Brock had a reasonable BABIP and performed appropriately given his other inputs (walks, Ks, and homers).

        And no worries, I could see you weren’t being a snark.

  7. MauerPower says:

    Too much math x_x

  8. Matt says:

    Yes, but the real question is, are strike-outs bad?

  9. mbt online says:

    to regress a hitter based on a different K%, BB%, HR%, or — most likely — BABIP.

  10. ezb230 says:

    i asked this question of neyer/james in a chat years ago, and got shot down. yet to this day, i don’t get why FIP works for pitchers and not for hitters. if pitchers can’t control what happens after the ball leaves their hands, why do we assign credit/blame to hitters for what happens after the ball leaves their bats? either the pitchers get too little credit/blame or hitters get too much. anything else seems like a contradiction, imo.

    • Cliff says:

      What does that even mean? Obviously there is no physical way for a pitcher to control the ball after it is out of his hand or for a hitter to control a ball after it is off his bat. But clearly what the pitcher does with the ball in his hand does affect where the ball ends up, and the same goes for the hitter and his bat.

      • Will says:

        “Obviously there is no physical way…for a hitter to control a ball after it is off his bat.”

        That’s absolutely false. I once saw Carlton Fisk wave a home run fair.

    • Matt P says:

      Pitchers do control what happens after the ball leaves their hands. That’s why monitoring strikeouts, walks and home runs makes sense. What they don’t control is what happens when a ball is put into play. Which is why penalizing a pitcher for having a high BABIP doesn’t make sense.

      Pitchers usually pitch behind a set number of defensive players. Unless they’re traded, then their defense is going to be roughly the same for the whole season. Hitters hit against many different groups of fielders. So, it makes sense to say that fielding has a significant impact on how a pitcher performs but not on how a batter performs.

  11. holdenpi says:

    another small simplification is ShHAP= wRC+ – 465*(BaBIP-xBaBIP)
