Last week, while rifling through the lump of cold numbers that is the 2011 season, I stumbled upon a self-illuminating chest of gold coins: a reliable, fielding-independent hitting formula. Today, we’re going to take it to the next level and get nerdy up in this beach.
Before we proceed, let’s do some of the research I did not care to do the first time around. Here are some of Should Hit’s predecessors (though none of them directly influenced the creation of ShH):
FIP for Hitters
In 2010, Matt Klaassen wrote “FIP for Hitters? Defense Independent Offense.” His work in that article probably mirrors mine the most, though it does stay truly defense-independent and does not bring BABIP into the conversation. However, his work differs from mine in both process (he excludes BABIP and does not use wRC+) and intention (my tool focuses on regression, not really on true talent levels).
In the middle of last year, Jack Moore got freaky with walks, strikeouts, a special ISO, BABIP, and home runs. Moore’s work also resembles mine in many ways. The key differences are: (1) the greater complexity of his formula, (2) the inclusion of POWH (an ISO on hits only), and (3) Moore’s impressive display of algebraic prowess.
Ultimately, though, Moore’s work is neither park- nor league-adjusted (a benefit ShH accrues through using wRC+), nor is it defense-independent.
Someone mentioned “ProOPS” in the comments on my previous article and I was like: “Say what?!”
Well, PrOPS, spelled with one O, is a tool The Hardball Times has been using for many an age. It’s an OPS predictor that uses a player’s batted-ball data (line drives, groundballs, etc.), walk rates, strikeout rates, homer rates, and home ballparks to spit out the OPS his hitting profile suggests he should have.
Outside of being regression-based hitting-prediction tools, though, PrOPS and ShH have little in common. Where ShH is almost suspiciously simple, PrOPS is all-encompassing — and oddly less reliable (apparently a .81 R-squared, go figure).
Still, each of these three metrics provides interesting insight and a different angle on the same pursuit: understanding hitters.
Well, ShH just took it to the next level today.
Last week, many commenters (myself included) began to suspect ShH was systematically undervaluing speedy hitters, who hit more doubles and fewer homers. Because ShH has no doubles or triples component (other than BABIP, which unfairly counts a double or a triple the same as a single: one hit), the Jose Reyes types appeared to be getting short shrift from ShH.
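To make the weighting problem concrete, here is a minimal Python sketch of the standard BABIP formula, (H − HR) / (AB − K − HR + SF), with hits split out by type. The two stat lines are invented for illustration, not real players: the gap hitter turns 20 of the singles hitter’s singles into 10 extra doubles and 10 triples, yet both land on the same BABIP because every non-homer hit counts as exactly 1.

```python
def babip(singles: int, doubles: int, triples: int,
          hr: int, ab: int, k: int, sf: int) -> float:
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    # Every non-homer hit adds exactly 1 to the numerator, whatever its bases.
    hits_in_play = singles + doubles + triples
    balls_in_play = ab - k - hr + sf
    return hits_in_play / balls_in_play


# Hypothetical stat lines: same playing time, strikeouts, and homers.
shared = dict(hr=10, ab=550, k=90, sf=5)
singles_hitter = babip(singles=120, doubles=20, triples=0, **shared)
gap_hitter = babip(singles=100, doubles=30, triples=10, **shared)

print(f"{singles_hitter:.3f} vs {gap_hitter:.3f}")  # prints 0.308 vs 0.308
```

The gap hitter collects 30 more total bases on those balls in play (190 vs. 160), and BABIP sees none of them.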
So, one of the suggestions was to include steals in the regression, because steals ought to act as an acceptable (though not perfect) proxy for doubles and triples. Well, first, I wanted to see whether this relationship really existed. It didn’t:
The hitters from 2009 through 2011 showed a weak, maybe even nonexistent, relationship betwixt doubles/triples and steals.
Still, when we look at the top and bottom tiers of speed, we see there’s a problem. Looking at the best 37 thieves (i.e., those with 40 or more steals) from 2009 through 2011, we find that ShH undervalues them by an average of 2 points of wRC+. Conversely, the 38 slowest lumberers (9 steals or fewer) were overvalued by 2 points.
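The tier comparison is just an average residual per group. Here is a minimal sketch, assuming each player record carries hypothetical keys `sb`, `wrc_plus` (actual), and `shh` (the Should Hit prediction). A positive result means ShH undervalues the group; a negative one means it overvalues it. The rows are invented to show the mechanics.

```python
from statistics import mean


def mean_residual(players, keep):
    """Average (actual wRC+ - ShH prediction) over players where keep(p) is True.

    Positive: ShH undervalues the group. Negative: ShH overvalues it.
    """
    return mean(p["wrc_plus"] - p["shh"] for p in players if keep(p))


# Invented rows; keys are hypothetical.
players = [
    {"sb": 45, "wrc_plus": 110, "shh": 108},
    {"sb": 52, "wrc_plus": 98, "shh": 96},
    {"sb": 3, "wrc_plus": 120, "shh": 122},
    {"sb": 8, "wrc_plus": 90, "shh": 92},
    {"sb": 20, "wrc_plus": 105, "shh": 105},  # falls in neither tier
]

speedsters = mean_residual(players, lambda p: p["sb"] >= 40)
lumberers = mean_residual(players, lambda p: p["sb"] <= 9)
print(speedsters, lumberers)  # prints 2 -2
```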
As a whole, the two groups were still crazy close to their Should Hit predictions:
The “sluggers” (I put sluggers in quotes because this group included the likes of Yuniesky Betancourt) performed a little better, but not by much:
HOWEVER: Should Hit’s value is not in telling us a player’s present wRC+; that’s what wRC+ does on its own. Should Hit allows us to adjust for a player’s BABIP, given his present peripherals. We are comparing the player internally, not externally.
In other words, we would not want to use Jose Reyes’s ShH to compare him to another player, but to compare him to himself with a different BABIP (or walk rate, strikeout rate, etc.).
To make this process easier to follow, I created ShHAP!, a new Google Doc:
NOTE: You’ll need to save the Doc to your own Google Docs collection or download it as an Excel spreadsheet. Unfortunately, I can’t let just anybody edit it, or ’twill likely be abused.
Using the same ShH formula as before, I have concocted the consummate Google Doc. Instead of the Dead Ball Era-skewed wRC+ from before, this rendition offers an internally adjusted formula:
Simply, ShHAP! takes the difference between a player’s career and current-BABIP-based ShH and applies that difference to the player’s present wRC+. Not simply, it looks like this:
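In symbols, the description above amounts to ShHAP! = wRC+ + [ShH(career BABIP) − ShH(current BABIP)], with the season’s walk, strikeout, and homer rates held fixed inside both ShH calls. Here is a minimal Python sketch of that adjustment. The ShH formula itself is not reproduced here, so the code takes it as a function of BABIP; `toy_shh` is a made-up linear stand-in for demonstration only, not the actual Should Hit regression.

```python
from typing import Callable


def shhap(current_wrc_plus: float,
          shh: Callable[[float], float],
          current_babip: float,
          career_babip: float) -> float:
    """Shift current wRC+ by how much ShH moves when the season's BABIP
    is replaced with the career BABIP (other peripherals held fixed in `shh`)."""
    return current_wrc_plus + (shh(career_babip) - shh(current_babip))


def toy_shh(babip: float) -> float:
    # Hypothetical stand-in: NOT the real Should Hit formula.
    # Pretend this player's other rates are baked in, and each .010 of BABIP
    # is worth 4 points of wRC+.
    return 100 + 400 * (babip - 0.300)


# A lucky season: .350 BABIP against a .320 career mark.
adjusted = shhap(150, toy_shh, current_babip=0.350, career_babip=0.320)
print(round(adjusted))  # prints 138
```

A season BABIP above the career mark pulls ShHAP! below the actual wRC+, which matches the direction of the Babe Ruth example.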
Let’s look at Babe Ruth’s 1923 season, a great season by any standards, wherein Ruth posted a 226 wRC+ with a hefty .423 BABIP. Using his career BABIP, ShHAP! says he would have hit closer to his career norm (actually, worse).
Altogether, the ShHAP! formula closely follows Ruth’s career, indicating the changes in his performance mostly came from varying walk, strikeout, and homer rates, not from BABIP:
If we look at Lou Brock’s career, the career of the prototypical high-speed, low-homer guy, we see a similar trend:
Here we see ShHAP! predict Brock’s late-career resurgence, as a low BABIP plagued his 1977 and ’78 seasons. Overall, ShHAP! keeps very close to Brock’s wRC+, a good sign with regard to the speedy-guy problem, especially since Brock had almost as many triples as homers.
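One way to put a number on “keeps very close” is the mean absolute gap between a player’s seasonal wRC+ and his ShHAP! values. The sketch below is a hypothetical helper, not part of the ShHAP! Doc, and the season values are placeholders rather than Brock’s or Ruth’s actual lines.

```python
from statistics import mean


def mean_abs_gap(wrc_plus: dict[int, float], shhap: dict[int, float]) -> float:
    """Mean |wRC+ - ShHAP!| over seasons present in both mappings."""
    seasons = sorted(wrc_plus.keys() & shhap.keys())
    if not seasons:
        raise ValueError("no overlapping seasons")
    return mean(abs(wrc_plus[s] - shhap[s]) for s in seasons)


# Placeholder values keyed by season number, for illustration only.
actual = {1: 90, 2: 80, 3: 120}
adjusted = {1: 95, 2: 88, 3: 118}
print(mean_abs_gap(actual, adjusted))  # prints 5
```

A smaller number means ShHAP! hugs the actual line more tightly; comparing that number for speedsters versus sluggers would be one way to test whether the quick-guy bias survives.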
So: Go! Play with ShHAP! and let me know what you think! Does it appear to still struggle with quick guys?