Estimating Hitter Platoon Skill
I don’t think I’m all that different from most fans who glance at stats — when I see them, I automatically tend to view them as a player’s real talent. But one thing I’ve taken away from my reading of baseball analysts far more intelligent than I (granted, that’s not a very high standard), is that there’s an important distinction to be made between observed performance and true talent. Past performance should certainly inform how we estimate future performance. But it isn’t enough on its own. One of the most important tools for estimating true talent relative to observed performance and its sample size is regression to the mean. A good place to start reading with reference to the current discussion is The Book.
One bad habit many of us might get into is looking at the platoon splits of two players at the same position, one with a career wOBA of .390 vs. RHP, the other with a career wOBA of .400 vs. LHP, and thinking, “Wow, that platoon would be almost as good as Ryan Braun!” It isn’t that simple. As in most other things, regression shows us that true talent is closer to average than observed performance makes it appear. Technical explanations aside, I’ll simply summarize what is relevant for estimating platoon skills.
How much we regress depends on the variation of skill in the relevant population. The less variation there is, the more likely deviations from the mean are random occurrences. Practically speaking, left-handed hitters display more variation in platoon skill than right-handed hitters, so in estimating the platoon skills of left-handed hitters, we use less regression.* According to The Book, we regress lefties’ platoon skills against 1000 PA against LHP of league average splits for left-handed hitters, and righties against 2200 PA against LHP of league average splits for right-handed hitters. This means that when hitters have fewer than 1000/2200 PA vs. LHP, we estimate their platoon skill to be closer to league average than to their observed platoon performance. In practical terms, it also means that for righties, we’re usually safe in assuming they have near-average platoon skills.
* Switch-hitters display the most platoon skill variation as a population, but that is a can of worms for another day. The Book says that after 600 career PA against LHP, one has a pretty good idea of a switch-hitter’s platoon skill.
Some concrete examples might help. For my league average, I’ve taken MLB-wide splits from 2007 to 2009 from Baseball Reference and converted them to wOBA. This is just going to be a very basic demonstration, as, e.g. I wasn’t able to exclude pitchers from the splits, or remove switch-hitters, or leave out steals, weighted, and so on, but I think it will give the general idea. From 2007 to 2009, the average wOBA split for left-handed hitters was about 8.6%, and for right-handed hitters, about 6.1% (following The Book [I think], I use a percentage split to avoid potential logical absurdities and to reflect the reality that better hitters usually have larger splits).
We’ll begin with everyone’s favorite example of a “big splits” guy: Curtis Granderson. For his career, Granderson is a .358 wOBA hitter. However, while he has hit a robust .380 vs. RHP, in 685 PA versus LHP, he’s been 2009 Yuniesky Betancourt with a .270 wOBA. That’s a whopping 110 points of wOBA difference, about 30.7% of his overall wOBA in observed performance.
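To make the percentage concrete, here is a minimal sketch of how the observed split can be computed: the wOBA gap between the two sides, divided by overall wOBA. The function name and the `bats` argument are my own illustrative choices, and the sign convention (positive for a normal split, negative for a reverse one) is an assumption for illustration.

```python
def observed_split(woba_vs_rhp, woba_vs_lhp, woba_overall, bats="L"):
    """Return the platoon split as a fraction of overall wOBA.

    Positive means a "normal" split (the hitter is better against
    opposite-handed pitching); negative means a "reverse" split.
    """
    if bats == "L":
        # A lefty's normal advantage is against right-handed pitching.
        diff = woba_vs_rhp - woba_vs_lhp
    else:
        # A righty's normal advantage is against left-handed pitching.
        diff = woba_vs_lhp - woba_vs_rhp
    return diff / woba_overall


# Granderson's career numbers as quoted in the post.
granderson = observed_split(0.380, 0.270, 0.358, bats="L")
print(f"{granderson:.1%}")  # 30.7%
```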
But remember — skill is closer to average than it appears. Regressing Granderson’s 685 PA of 30.7% against 1000 PA of league average (8.6%) — (.307*685+.086*1000)/(685+1000) — we get an estimated platoon skill of 17.6%. “Centering” the split is a bit of a challenge, but I weighted it by the number of PAs the player has against LHP in his career (for Granderson, about 23.7%). For Granderson’s split, then, I have +4.2% vs. RHP, and -13.4% vs. LHP. Applying this to his 2010 CHONE projection of .359 wOBA, we’d forecast his 2010 wOBA against RHP as .374, and against LHP as .311. .311 is below average, but it’s far better than .270, and given Granderson’s skill in the field, you’d be hard-pressed to find a right-handed platoon partner that would offer an overall advantage to just playing Granderson. You’d also need a pretty good right-handed bench bat in order to overcome the “pinch-hitting penalty” when hitting for Granderson.
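The regression and centering steps can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above, not a definitive implementation: the function names are hypothetical, and the handling of right-handed hitters in `center_split` is my assumption about how the same weighting would mirror for them.

```python
def regress_split(observed, pa_vs_lhp, regress_pa, league_split):
    """Weighted average of the observed split and the league-average split.

    regress_pa is the number of league-average PA to regress against
    (1000 for lefties, 2200 for righties, per the figures quoted from The Book).
    """
    return (observed * pa_vs_lhp + league_split * regress_pa) / (pa_vs_lhp + regress_pa)


def center_split(split, share_vs_lhp, bats="L"):
    """Divide a split into (adjustment vs RHP, adjustment vs LHP).

    The two adjustments differ by `split`, and their PA-weighted average is
    zero, so the overall projection is left unchanged.
    """
    if bats == "L":
        return split * share_vs_lhp, -split * (1 - share_vs_lhp)
    # Assumption: a righty simply mirrors the lefty case.
    return -split * share_vs_lhp, split * (1 - share_vs_lhp)


# Granderson: 30.7% observed split in 685 PA vs LHP, 23.7% of career PA vs LHP.
est = regress_split(0.307, 685, 1000, 0.086)
adj_rhp, adj_lhp = center_split(est, 0.237, bats="L")

projection = 0.359  # 2010 CHONE projection quoted in the post
woba_rhp = projection * (1 + adj_rhp)
woba_lhp = projection * (1 + adj_lhp)

print(f"{est:.1%}")       # 17.6%
print(f"{woba_rhp:.3f}")  # 0.374
print(f"{woba_lhp:.3f}")  # 0.311
```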
For a right-handed example, let’s use Ryan Garko, recently acquired by the Mariners as a platoon 1B/DH. Garko’s career wOBA is .347, .332 vs. RHP in 1229 PA, and .382 vs. LHP in 485 PA — a 14.4% difference. But he’s a righty, so we regress toward 2200 PA of the average (6.1%): (.144*485+.0611*2200)/(485+2200) for an estimated platoon skill of 7.6%. Using the CHONE projection of .345 wOBA, we’d estimate Garko to be a .338 hitter versus RHP, and .364 versus LHP. That’s a good hitter versus lefties, and while the .338 isn’t great for a 1B/DH, it isn’t as if he’s helpless against RHP.
Before I call it a post, I thought it would be interesting to quickly estimate the platoon skills of two players who have “reverse” splits for their careers.
Right-handed hitting Matt Holliday has a career wOBA of .400, but has hit .402 vs. RHP (2793 PA) and .377 vs. LHP (845 PA), a -6.3% split (negative indicating “reverse”). After regression, we get a 2.7% estimated platoon skill. Given CHONE’s .389 wOBA forecast for Holliday, we’d estimate his skill as .387 wOBA vs RHP, and .397 vs. LHP. Not quite a “reverse,” but you don’t really want to “burn” a ROOGY against Holliday, either.
Colorado’s Ian Stewart has a career .337 wOBA, .334 vs RHP (655 PA) and .346 vs LHP, a -3.6% split. After regression, it comes to a 6.7% split. Given CHONE’s .358 wOBA forecast, we’d expect Stewart to hit around .363 vs. RHP and .339 vs. LHP, a nice split for a lefty, but not a reverse one.
Like all forecasts, these are estimations (and crude ones, at that). To be more thorough, we’d have to assign confidence intervals/reliability scores. We’re simply trying to minimize our error. But keep in mind that splits in the retrospective mirror are almost always smaller than they appear.
[Note: After completing this post, I realized that Tom Tango had already posted about this on his blog, using Granderson as an example. D'oh. Fortunately, my results are almost exactly the same]
What about pitcher platoon splits? I have a few questions that I’ve been unable to answer myself, so perhaps you can help.
1. Do pitchers exhibit splits in a similar manner (similar effect, similar distribution of effect, etc.)?
2. When a pitcher with a strong split faces a hitter with a split, is the effect combined?
The only reason I could see the pitcher not exhibiting splits would be if the true effect actually resided in the hitter, i.e., the pitcher is handedness-neutral, while the hitter’s altered ability creates the delta in results. This almost makes the most sense to me, personally, what with what we know about things like pitcher BABIP, etc.
Other question: in your Granderson example, isn’t it possible that he exhibits a huge platoon split as HIS true talent, whereas your analysis is only in the aggregate? After all, if we were talking about regressing to the mean in the HR category, we wouldn’t pick out Albert Pujols to suggest anything, would we? I understand the sample size concerns, but it seems to me that while the aggregate split might be minimal, a guy like Granderson (or Ryan Howard is another one who comes to mind, although I don’t have the data handy) is a bad example to illustrate this, as they both look to be rarities in the platoon-split component.
Travis —
1. According to The Book, not only do pitchers exhibit splits, but they can be measured much more reliably: at around 700 PA (BFP) vs. LHH for RHP, and 450 vs. LHH for LHP. In The Book, they use wOBA generally. I’m not sure if the same numbers could be used for FIP or RA or whatever, but the general principle is the same.
2. I would assume that the effect would be combined. I’m not sure exactly how it would go, but my guess is that one would figure it using something like the Odds Ratio Method. I’m not an expert on that (or anything else), but see
http://www.insidethebook.com/ee/index.php/site/comments/the_odds_ratio_method/
As for the Granderson/Howard example: yes, it is POSSIBLE that Granderson, Howard, or any hitter (or any skill for any player) deviates greatly from the mean. But the question, as in any of these cases, is: how do you know (without hindsight) which player is the exception to the general rule? In the regression above, we still have Granderson with a larger than usual split. That accounts for the “rarity” as best we know how. Is it your “gut” telling you this when you look at the numbers?
I don’t mean any of that in a sarcastic manner, but that’s the whole point of looking at data as a whole, in group context: I’d need to know why we’d want to ignore the usual methods in these particular cases.
Again, note that this is an “estimation” of the skill. It’s making the best estimation based on the same methods we use for all players (and again, this does _not_ ignore Granderson’s numbers; that’s why we estimate a 17.6% split for him instead of an average 8.6%). By applying the same methods to all players, we try to minimize our error as a whole. But I’d like to know how people would pick out the exceptions _ahead of time_ who aren’t going to regress, then have them do that for a large number of players, and have reasons for why they picked the players they did.
How about using a learning model? In a sense you are doing it: you are putting lesser weight on the average (8.6) with every passing year (1000/1685 as of now), and more weight on the player’s own numbers (685/1685 as of now), but this seems to be somewhat arbitrary unless there is good theoretical foundation behind it.
Why not use something like Bayesian learning? I.e., use the prior distribution of 8.6 percent (with the adjoining standard deviation) to begin with, but after each period, update the distribution for the player using Bayes’ rule to obtain the posterior distribution, and keep doing so after every year.
It would be hard to pick out those outliers, but I think that is where scouting can come into play. Or, at the very least, a scout’s recommendations can give you more things to consider and another perspective beyond looking at a player’s splits from a pure statistical viewpoint. I’m not entirely sure how they would do this, but I’m also not a scout. A stats guy is supposed to provide the research and stats, like you correctly did in this article. And then the scouts are supposed to give the GM their recommendations. Maybe Granderson has some mechanical flaw against lefties that prevents him from hitting them well and maybe his true talent level against them will end up being below a .300 wOBA? From a statistical perspective, it is too early to tell. But, scouting and statistics/research should be, imo, split about 50/50 when making roster decisions, so the statistical point of view only gives one side of the argument.
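For what it’s worth, regressing against a fixed number of league-average PA already has a Bayesian reading. Here is a minimal sketch, assuming a normal prior on true platoon skill and normally distributed sampling noise with a known per-PA variance. Both are simplifying assumptions, and the variance values below are made up purely so that their ratio equals the 1000 PA quoted for left-handed hitters; they are not estimates from real data.

```python
def bayes_update(mean, var, obs_split, n_pa, per_pa_var):
    """Normal-normal conjugate update of a hitter's platoon-skill estimate.

    mean, var   : current (prior) mean and variance of the true split
    obs_split   : split observed over n_pa plate appearances
    per_pa_var  : assumed sampling variance contributed by a single PA
    """
    obs_var = per_pa_var / n_pa                   # noise shrinks as PA grow
    post_var = 1.0 / (1.0 / var + 1.0 / obs_var)  # precisions add
    post_mean = post_var * (mean / var + obs_split / obs_var)
    return post_mean, post_var


PRIOR_MEAN = 0.086             # league-average split for lefties, from the post
PRIOR_VAR = 0.05 ** 2          # hypothetical spread of true talent
PER_PA_VAR = 1000 * PRIOR_VAR  # hypothetical, chosen so the ratio is 1000 PA

posterior, _ = bayes_update(PRIOR_MEAN, PRIOR_VAR, 0.307, 685, PER_PA_VAR)
shrunk = (0.307 * 685 + PRIOR_MEAN * 1000) / (685 + 1000)

print(f"{posterior:.4f} {shrunk:.4f}")
```

Under these assumptions the posterior mean and the regressed estimate coincide: the "1000 PA" plays the role of the ratio of per-PA noise variance to true-talent variance, so the weights are not arbitrary. Updating season by season gives the same answer as one pooled update in this model.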
Scottwood:
I completely agree that scouts (and Pitch f/x and Hit f/x “heat zone” sort of stuff) come into play here. Of course, the challenge would be to integrate that with a statistical analysis.
However, since almost all LHH are worse against LHP, I would guess that most of them also have more holes in their swing against LHP, etc., so one would need to show 1) how Granderson, Howard, or whoever has “worse” mechanical flaws — and it would need to be other than saying, “well, they just do, just look at their platoon splits numbers,” since the numbers are what we’re supposedly trying to get beyond, and 2) again, how that integrates into estimating the platoon skills.
The possibility isn’t denied by this method of estimating skills. But I would object to the claim that someone can tell “just from the numbers” that someone is an exception to the rule.
I would agree 100% that you cannot tell from the numbers that someone is an exception to the rule or an outlier. If I implied anything but that in my post, then I apologize b/c that was not my intention. My main point was that this is an area where a GM can lean on his scouts to see if they can see anything, b/c looking strictly at the numbers won’t let us conclude all that much.
In the grand scheme of things, it sucks b/c this is a tough area to get a read on. But, there will always be outliers and the best teams and front offices find ways to discover those outliers and use them to their advantage. A combination of looking at various data points from Hit F/x and Pitch F/x might help and leaning on team scouts is always a nice option. This is just another area where we need to grow and learn more about in the coming years.
Scottwood:
No need to apologize at all, even if we were disagreeing, since disagreement is part of rational life.
In any case, I don’t even think we’re disagreeing!
Thanks for the thoughtful comment.
Great reply, thanks!
This is totally off the top of my head (or pulled out of my ass, depending on your anatomical preference), but could there be something to the fact that both of your “reverse split” examples have played most (all for Stewart, I guess) of their home games in Coors? Some kind of effect on the breaking balls in the thin air, maybe?
Good point, and I’ve wondered about that myself, but decided to leave the park issue out of the already-over-long post. I don’t have component factors by handedness in front of me… (looking on hard drive…) I have some older component factors, and they’re pretty even, maybe a bit favoring RHH, but, of course, we’d really want it broken down by platoon matchups (RHH vs. LHP vs. RHH vs. RHP).
I also left out a graph that’s worth looking at: Stewart’s wOBA split “unreversed” itself for 2009. Sometimes regression to the mean happens before our very eyes.
On the other hand, Holliday’s reverse split continued into 2009, when he didn’t play in COL.
Food for thought, though.
Interesting. Have to keep an eye on it next year (especially for Stewart), but looking at the graph you pointed out, that definitely looks more like an aberration than a true talent thing (his BABIP shot up against lefties in ’08 too, although obviously there are sample size issues).
Dumb question: what is the % split referring to, specifically? It doesn’t mean that, for example, Granderson’s numbers are 30% better against RHP.
Not dumb at all — it helped me realize just how unclear I was being.
When I say that his (observed) split is 30%, I mean that the difference between his observed wOBA vs. RHP (.380) and vs. LHP (.270), which is 110 points of wOBA, is about 30% of his observed total wOBA (110/358).
Does that make more sense? I’m trying to think of how I could express this better in the future…
That makes more sense. Another question – when you ‘centered’ Granderson’s splits, you said you used the fact that 27.3% of his PAs were against LHP. If you simply multiply the 17.6% you got by that percentage that gives 4.17, which is close to what you put in for credit towards RHP. I’m missing the intuition as to why that would be the case. To me it seems like the best way to do it would be to scale the old numbers back while keeping the ratio between them the same – i.e. you want
x/y = const (from old split)
x + y = new split
where x,y are the percentage above and below the ‘regular’ wOBA. That way a guy who is 3 times worse, percentage-wise vs LHP than RHP would still have that hold.
So for Granderson that would mean
x + y = 17.6
x = 7.34y
which would give
x = 14.5
y = 2.1
So he would be 2.1% better vs RHP and 14.5% worse vs lefties, giving wOBAs of
v LHP: .306
v RHP: .365
That’s just my intuition though, and is probably wrong (you do need to weight playing time somehow to make sure that the final wOBA is actually .358!)
Actually, I backed out the equation (that makes sure the overall wOBA matches) and you get the simple equations
x = (1-P_L)*S
y = S – x
where S is the regressed split percentage, P_L is the percentage of PAs vs lefties and x and y are the percentages you discount vs LHP and RHP, respectively
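For anyone who wants the intermediate step, here is a short derivation of those two equations. It assumes, as a simplification, that overall wOBA is the PA-weighted average of the two platoon wOBAs; the symbol W for overall wOBA is my own addition.

```latex
% x: fractional penalty vs LHP, y: fractional bonus vs RHP (left-handed hitter)
% P_L: share of PA vs LHP, S: regressed split, W: overall wOBA (assumed symbol)
\begin{align*}
P_L \, W (1 - x) + (1 - P_L) \, W (1 + y) &= W \\
-P_L \, x + (1 - P_L) \, y &= 0 \\
\text{with } x + y = S:\quad -P_L \, x + (1 - P_L)(S - x) &= 0 \\
x &= (1 - P_L) \, S, \qquad y = S - x = P_L \, S
\end{align*}
```

With S = 17.6% and P_L = 23.7%, this gives x of about 13.4% and y of about 4.2%, which matches the centered numbers in the post.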
berselius:
quickly, I wrote that 23.7% of Granderson’s previous PAs have been against LHP, not 27.3%…
whoops, thanks. I’ve been able to muddle through enough of it to feel like I know what I’m doing now
To be frank, I’m saddened by the lack of Matt Diaz.
Very good article, analysis and discussion. While it is indeed difficult to “tweak” the estimate of a batter’s true platoon split from something other than the numbers (say, a “scouting report” about his hitting mechanics), not so with pitchers.
Just as we can regress a batter’s HR rate, for example, toward the mean of those players with a similar body build (e.g., surely you would not regress Juan Pierre’s HR rate toward the same mean as Howard or Pujols, and that has NOTHING to do with their numbers), we can regress a pitcher’s platoon ratio or differential toward that of pitchers with similar pitching styles (arm angles and pitch types and their frequencies).
For example, we know intuitively that a pitcher who throws from the side rather than over the top is likely to have a larger true platoon split, such that you would certainly not want to regress Chad Bradford’s platoon split to the same mean as Justin Verlander, simply because one throws over the top and the other from the extreme side (underhand actually). Or say Zito and Fuentes, for lefties.
As well, in some brilliant research written up in the THT Annual (and on their web site, I think) from one or two years ago – unfortunately I forgot who did it – it was determined that fastballs and sliders have by far the largest platoon splits and that curve balls, contrary to popular belief, do not. So, you can use a pitcher’s repertoire to also determine the proper mean to adjust his platoon numbers towards.
And yes, batter and pitcher platoons get “combined” using an odds ratio method when you want to determine the overall expected platoon ratio of a particular matchup. For example, if a pitcher with a true reverse split (maybe like a Zito or someone with a screw ball) would face a batter with an extreme true split like Grandy or Howard, you might expect that the overall effect would be a normal platoon split, or basically somewhere in between. As with most things that are a reasonably significant talent among pitchers AND batters, we find that an odds ratio combination works just fine. It is a “myth” or a misconception that the batter or pitcher would “dominate” in a matchup. The fact that one or the other may have much more significant control over the “skill” or outcome in question (such as the fact that batters have a much larger spread in HR per PA or FB than do pitchers) is already accounted for when we estimate the batter’s and the pitcher’s true talent with respect to that skill or outcome by doing the regression.
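As a rough illustration of the odds ratio combination, here is a minimal sketch for a single probability-type outcome (say, the chance of reaching base). The rates in the example are hypothetical placeholders rather than real player estimates, and the function names are my own. The inputs should already be regressed true-talent estimates for the relevant handedness matchup.

```python
def to_odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)


def from_odds(odds):
    """Convert odds back to a probability."""
    return odds / (1.0 + odds)


def odds_ratio_matchup(batter_rate, pitcher_rate, league_rate):
    """Expected rate when this batter faces this pitcher.

    All three rates refer to the same outcome in the same handedness
    matchup (e.g. LHH vs LHP). The batter's and pitcher's odds are
    multiplied and divided by the league odds.
    """
    odds = to_odds(batter_rate) * to_odds(pitcher_rate) / to_odds(league_rate)
    return from_odds(odds)


# Hypothetical numbers: a lefty hitter who is weak vs LHP (.300) facing a
# lefty pitcher who is unusually hittable by LHH (.350), league rate .330.
expected = odds_ratio_matchup(0.300, 0.350, 0.330)
print(f"{expected:.3f}")
```

A useful sanity check on the design: if either side is exactly league average, the result is just the other side’s rate, so neither batter nor pitcher "dominates" by construction.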
Thanks, MGL. Great stuff. I hope people are reading this…
MGL: that was John Walsh in one of the THT Annuals.
I wonder what part of “regression” is actually learning. It seems to me that any player with terrible platoon splits is going to make an effort, probably in concert with his coaches, to alter his approach. I don’t think the answer to that question affects the analysis, by the way.
Well, if we’re going to talk about reverse splits, we really have to talk about Ichiro:
career wOBA vs LHP: .365
career wOBA vs RHP: .342
Actually, that’s 2002-2009 so it’s not really his career (Fangraphs doesn’t have splits for 2001 or, obviously, for Japan). But it’s 1708 PA vs LHP and 4161 PA vs RHP, so it’s not a small sample.
It’s late and I’m tired so I’ll let somebody else do the regression vs average over those years.
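Taking up that invitation, here is a rough sketch of the regression for Ichiro. It leans on several assumptions: overall wOBA is approximated as the PA-weighted average of the two splits, and the league baseline is the 2007 to 2009 figure for left-handed hitters from the post (8.6%, regressed against 1000 PA) rather than an average over 2002 to 2009.

```python
# Ichiro's 2002-2009 splits as quoted above.
woba_lhp, pa_lhp = 0.365, 1708
woba_rhp, pa_rhp = 0.342, 4161

# Assumption: overall wOBA is the PA-weighted average of the two sides.
overall = (woba_lhp * pa_lhp + woba_rhp * pa_rhp) / (pa_lhp + pa_rhp)

# For a left-handed hitter a normal split is positive (better vs RHP),
# so a reverse split comes out negative.
observed = (woba_rhp - woba_lhp) / overall

# Regress against 1000 PA of the league-average lefty split (8.6%).
LEAGUE_SPLIT, REGRESS_PA = 0.086, 1000
regressed = (observed * pa_lhp + LEAGUE_SPLIT * REGRESS_PA) / (pa_lhp + REGRESS_PA)

print(f"observed {observed:.1%}, regressed {regressed:.1%}")
```

With these inputs the observed split is about -6.6% and the regressed estimate comes out close to -1%, so under these assumptions he would be estimated as having essentially no platoon split rather than a large reverse one.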
Something that I don’t see here is that different managers have different strategies for different players in platooning. Leyland would sit Granderson only vs Top flight LHers, which would lead me to believe that his stats vs Southpaws would be even worse had he faced the better ones.
This is quite different from the stats of strict LH/RH platoons.