## The Right Denominators

Rates are obviously important to sabermetrics, particularly when discussing player skill. That’s why we don’t just look at a player’s raw numbers like total hits, walks, or home runs. That’s why batting average, and later, on-base percentage and slugging became popular. If we want to break things down more precisely to examine specific skills, we can look at things like walks, strikeouts and home runs per plate appearance. That works pretty well, and depending on how careful you want to be, at a certain point practicality outweighs precision. But if you are really trying to look at a player’s skills carefully, is the good ol’ plate appearance always the right denominator?

The answer to the rhetorical question is “no,” as you might suspect. In a way (loosely) analogous to different stats becoming reliable at different sample sizes, different skills need different denominators. Fortunately for a statistically limited person such as myself, the process of deriving those denominators is much simpler than figuring out sample stability! Of course, I’m not smart enough to come up with this stuff myself. This sort of analysis was (as far as I know) originally applied to baseball by the legendary Voros McCracken and extended by Tom Tango and others. I originally read about it a few years ago on the dearly-departed Statistically Speaking blog in posts by Brian Cartwright (the brains behind the OLIVER projections) and Russell Carleton (a.k.a. Pizza Cutter).

For the sake of this post, we’ll work from the perspective of the hitter (although it could go either way). Although not all would agree, I will leave out sacrifice hits (SH) and, of course, times reached on error, since they introduce their own difficulties of context and/or scoring. We will also leave out intentional walks because those are also determined by situation and the decision of the pitching team. The method we use to go through and figure out the “right denominators” is called the binomial method. As the name implies, that means that we break down the plate appearance (given the exceptions noted above regarding sacrifices and intentional walks, for the sake of this post, a plate appearance will mean AB + uBB [unintentional walks] + HBP) such that at each “step,” two possibilities are available.

[**Side note on rates and ratios:** on FanGraphs and most other stats sites, most stats are expressed as “rates,” and these are the easiest to understand in many ways. However, analysts like Tango prefer that these sorts of things be expressed as ratios. Our own Matt Swartz, on the other hand, says that the rates versus ratio issue ultimately is one of aesthetic preference. ~~It was my understanding that there would be no math.~~ Rather than get sidetracked in this debate, I will simply put both expressions down below.]
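For anyone who wants to move between the two expressions, here is a minimal Python sketch of the conversion. The function names and the counts are made up for illustration; the only thing it assumes is the usual definition of a rate as successes over all opportunities and a ratio as successes over failures.

```python
def rate_to_ratio(rate: float) -> float:
    """Convert a rate (successes / opportunities) into a ratio (successes / failures)."""
    return rate / (1.0 - rate)


def ratio_to_rate(ratio: float) -> float:
    """Convert a ratio (successes / failures) back into a rate."""
    return ratio / (1.0 + ratio)


# Hypothetical counts, invented for illustration only.
successes, failures = 25, 75
rate = successes / (successes + failures)   # successes per opportunity
ratio = successes / failures                # successes per failure
```

Both numbers carry the same information, so converting one way and back returns the original value. Note that a rate of exactly 1 has no finite ratio, so `rate_to_ratio(1.0)` will raise a `ZeroDivisionError`.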

[**Author’s note, August 22, 2011**: Originally, I excluded sacrifice flies (SF) from this analysis, but it bothered me, so I’ve put them back in where relevant.]

At the beginning of a plate appearance (AB + uBB + HBP + SF), before we get to either a play ending on contact, a walk, or a strikeout, the batter might get hit. What is this rate or ratio?

**Rate: HBP/(AB+uBB+HBP+SF), Ratio: HBP/(AB+uBB+SF)**

The next step is a bit trickier. It is obvious that we should separate “contact events” before “non-contact events,” but should walks or strikeouts come next? We could express this as a multinomial, but that would violate the relative simplicity we are going for here. Most versions of this I’ve seen put walks “before” strikeouts, and I agree. If you’re looking for specific grounds for this ordering, perhaps the fact that some strikeouts end with contact (foul tips) would be the reason.

Walks:

**Rate: uBB/(AB+uBB+SF), Ratio: uBB/(AB+SF)**

Strikeouts:

**Rate: K/(AB+SF), Ratio: K/(AB-K+SF)**

What if the hitter ends the plate appearance on contact (other than foul tips, which aren’t separated out in the official statistics)? Does it stay in the park or not? We need to separate out home runs. What is the appropriate rate or ratio for home runs?

**Rate: HR/(AB-K+SF), Ratio: HR/(AB-K-HR+SF)**

[Note that the official statistics as given on the player pages don’t separate out inside-the-park homers; given that information, one would want to include those as balls in play.]

So if a ball is in play, how often is it a hit or an out? This is just good-ol’ BABIP.

**Rate: (H-HR)/(AB-K-HR+SF), Ratio: (H-HR)/(AB-K-H+SF)**

Now we get to types of hits in play (excluding inside-the-park homers given the information we are working with). Once a player gets to first base, he can turn it into an extra-base hit or not.

**Rate: (2B+3B)/(H-HR), Ratio: (2B+3B)/(H-2B-3B-HR)**

Of the hits where the player can reach second, does he go to third?

**Rate: 3B/(2B+3B), Ratio: 3B/2B**
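The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `BattingLine` class, the `binomial_rates` function, and the stat line itself are hypothetical and not taken from any real player or projection system. Each denominator is simply the previous denominator minus the events already peeled off.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """Counting stats for one hitter (field names are illustrative)."""
    ab: int
    h: int
    doubles: int
    triples: int
    hr: int
    ubb: int  # unintentional walks
    hbp: int
    k: int
    sf: int


def binomial_rates(s: BattingLine) -> dict:
    """Return the rate at each step, using the denominators given in the post."""
    pa = s.ab + s.ubb + s.hbp + s.sf   # AB + uBB + HBP + SF
    not_hbp = pa - s.hbp               # AB + uBB + SF
    not_bb = not_hbp - s.ubb           # AB + SF
    contact = not_bb - s.k             # AB - K + SF
    in_play = contact - s.hr           # AB - K - HR + SF
    hits_in_play = s.h - s.hr          # H - HR
    xbh_in_play = s.doubles + s.triples
    return {
        "HBP": s.hbp / pa,
        "uBB": s.ubb / not_hbp,
        "K": s.k / not_bb,
        "HR": s.hr / contact,
        "BABIP": hits_in_play / in_play,
        "XBH": xbh_in_play / hits_in_play,
        "3B": s.triples / xbh_in_play,
    }


# Made-up stat line for illustration; not a real player.
line = BattingLine(ab=500, h=150, doubles=30, triples=5, hr=20,
                   ubb=50, hbp=5, k=100, sf=5)
rates = binomial_rates(line)
```

One way to check the bookkeeping is to multiply back down the chain: the chance of not being hit, not walking, not striking out, and not homering, times BABIP, the extra-base rate, and the triple rate, should equal triples per plate appearance. The sketch does no guarding against zero denominators, so a line with, say, no doubles or triples would need special handling.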

There, in simplified form, you have it. Each “step” of the plate appearance is broken down in binomial form to determine what the “right” denominator for the skill really is. I would guess this is how most of the more sophisticated projection systems analyze these events. This doesn’t mean the player pages should be revised to conform with this sort of thing. As I alluded to at the beginning of this post, there is a trade-off between precision and practicality. Indeed, Marcel projections stand up pretty well to the big boys by using PA as the universal denominator and regressing every skill the same amount. But, if you want to be more precise…

*My thanks to Matt Swartz and Tom Tango for discussions on this topic, although neither should be held responsible for my errors.*


Wow, lots to discuss here! I wonder why you feel SF’s should be left out of the denominator? I would include them.

Also, I’m surprised you think that Tango prefers ratios. Personally, I avoid them at all costs because interpreting ratios is not a straightforward process.

I’ll just leave it at that for now. :)

Thanks, Studes.

I don’t necessarily think they “should” be left out, I just did it here for the sake of simplicity. Other people can fight it out. I do think SFs should stay in the “official” OBP calculation, of course (and SHs should arguably be put in as well, they aren’t at the moment).

I guess on the side of “should be left out,” one could argue that they depend on the base-out state, and since we’re going for context-independent skills, they muddy the water. On the side of “should be left in,” it could be argued that unlike SHs, they aren’t really deliberate acts (although some will point out that SHs aren’t scored consistently, either).

It’s all a rich tapestry…

/cop-outs

Oh, they should definitely be left in. Not gray at all.

And serious analysts should not use ratios. It’s not just aesthetic, it’s fundamental.

Fight fight fight!

‘Qalks’ sounds like something dirty

oh baby

That’s a baseball stat from when the Starfleet Commanders play vs the Ferengi Barterers.

Nerd.

I get what you’ve done, but I’m not quite sure what it’s given us in a practical sense. Maybe you could apply it to a real set of data for some players and show us how this new lens gives us a better look at their skills.

Yes. This. Does changing the denominator really show us something different about the player?

What Telo says, except I’m not quite sure I get what you’ve done. Even just an example rather than a data set would help. What does this tell us about Albert Pujols/Yunesky Betancourt/Joe Shlabotnik that I misunderstood before you pointed it out to me?

Ok, I’m having a hard time wrapping my head around what you just did. Actually, I guess I’m less confused about what you did but about why you did it. What is the point of this? I don’t really understand why 2B+3B/H is better than 2B+3B/PA. And why is SF in the denominator for BB but not for HBP?

The SF thing is a typo, as tangotiger pointed out and I fixed (originally I was going to leave them in, then decided to simplify and take them out as Tangotiger does here: http://www.tangotiger.net/agepatterns.txt).

Maybe an overly broad, non-mathematical way of putting it is by analogy. Humans and dogs are both classed as mammals, and that gives you a lot of their characteristics, but if you want more specific information about them, you’d want to get down to the genus and species: *Canis lupus* and *Homo sapiens*. Strikeouts and triples are both plate appearances, but by analyzing them as non-contact outs and extra-base hits in play you get more specific information about the player’s true skills.

A SF is a regular batting out, and should be treated as such at all times.

A SH is more questionable.

There’s a typo here:

Rate: uBB/(AB+uBB+SF), Ratio: uBB/(AB+SF)

You intended to not have SF, so you need to remove that, if you want to be consistent.

As for ratios/rates: you can present it as rates, which is fine and preferable. But when you do the calculations, you have to do it as ratios, since a ratio is the odds. And calculations are applied on the odds, not the rates.

Sorry, but could you explain that last part more? Why are calculations applied on the odds instead of the rates?

Why are calculations applied on the odds instead of rates? Aren’t stats like wOBA and FIP rates, not ratios?

Tango has posted a reply here:

http://www.insidethebook.com/ee/index.php/site/article/why_do_we_apply_adjustments_to_ratios_and_not_rates/

It’s a good reminder that adjustments–those that apply one “rate” on another “rate”–are best done using the Odds Ratio method. I think that’s fine, but I don’t think it really is germane to what Matt has presented here.

i’ll come back to this when im more sober

Can someone explain or clarify some things for me.

1. “At the beginning of a plate appearance (AB + uBB + HBP), before we get to either a play ending on contact, a walk, or a strikeout, the batter might get hit.”

How about…before we get to a play ending with a batter getting hit by the pitch, the batter might draw a walk or strike out or end the play on contact. Why can’t this be the case? I’m still just not sure why HBP’s have to be accounted for first.

2. “It is obvious that we should separate ‘contact events’ before ‘non-contact events,’ ”

This is not obvious at all to me. I suspect this is related to the troubles I have with the first point I mentioned. Can someone explain why this is supposed to be obvious?

I’ve been wondering forever why Fangraphs uses AB (as opposed to PA) as the denominator for some of the rates around here, like K-rate. And this post is explaining why that is, which is good to see. But until I understand more about the points mentioned above, I still won’t get why using PA is not the ideal choice. Thanks.

I believe Fangraphs changed the denominator of K-rate to PA recently

I was under the impression that successive branching binomial distributions simplify to a multinomial distribution, that in fact, the two are identical. The former makes more sense when explaining the theory, but is mathematically a pain in the butt to deal with so statisticians use the multinomial when modeling.

Matt,

Don’t really see any sense in your strikeout denominators.

Strikeouts: Rate: K/(AB), Ratio: K/(AB-K)

Why would you not use PA instead of AB? Using AB is like saying the batter had no chance of striking out when he drew a walk or got hit by a pitch. You are not measuring contact when swinging here since people can be called out on strikes.

I think some of the reasoning for Fangraphs’ approach to “descending denominators” can be found here:

http://www.hardballtimes.com/main/article/a-quick-look-at-four-hitting-rates/

http://www.insidethebook.com/ee/index.php/site/article/why_do_we_apply_adjustments_to_ratios_and_not_rates/

I think there’s an error with the (H-HR)/(AB-K-HR-H) ratio. It should be (H-HR)/(AB-K-H) because in the one you presented, HR are double counted in the denominator.

No, because you’re trying to take HR out of the equation. Also, shouldn’t the XBH rate be (2B+3B)/(H-HR) instead of (2B+3B)/H? HR are out of play, so they shouldn’t be in the denominator if you’re asking how many times a player can turn a single that is in play into an extra-base hit.

To answer the question as to why this is important — a few years ago Coors had the expected high HR/Game park factor, but it turned out to be entirely an artifact of its depressed SO rate. The actual HR / (AB + SF – SO) (aka HR/Contact or HRC) park factor was dead average.

You get the denominators right in order to understand what’s really going on. For instance, triples are pretty much a speed skill, but a low 3B/PA or 3B/AB might just reflect an inability to hit the ball in the gaps. 3B / (2B + 3B) isolates the actual skill.

Re the question of whether K or BB should be taken out first, I have to admit to violating the binomial logic and taking them out simultaneously. Imagine a hitter taking a 3-2 pitch on the black and you’ll see an argument for regarding them as equals.

One thing that I do sometimes is to take out both SO+BB, and then do SO/(SO+BB).

This is similar to taking out 2B+3B together, and then isolating the 3B relative to the 2B+3B.

Bottom line–there are no ‘right’ denominators. The right denominator is the one that best answers the question that you are asking. End of story.


If K/AB is a better rate stat, then why did fangraphs change K% for batters from K/AB to K/PA?