# Introducing: xW, xBABIP, xLOB%, xHR/FB, and more

Alright guys, prepare yourselves for stat overload. I’m about to introduce 12 new stats that will help us better understand pitchers. Now I know that 12 sounds like a lot, but don’t worry too much — most of them are related to stats you’re already familiar with and/or come in pairs. I’ll explain everything as plainly as I can (while leaving enough of the guts in there for people who care), and if you have any questions, I’ll be happy to answer them.

### Where we’re at

As it stands now, many fantasy analysts are starting to make use of stats like BABIP and HR/FB, but the analysis often goes something like, “Player A has a .315 BABIP and 15% HR/FB. He is getting unlucky and both will regress toward league average.” There’s nothing wrong with this — it will usually be correct — but I think we can take things a little further. After all, not all pitchers should be expected to post an exactly league average BABIP or HR/FB or LOB%.

For example, a pitcher’s home park will affect his HR/FB, so a guy throwing half his games in Coors should be expected to have a higher HR/FB than a guy who pitches in PETCO. We also know that ground balls become hits at a higher rate than fly balls, so groundball pitchers should be expected to have a higher BABIP than flyball pitchers. These are the kinds of things that my new stats will try to account for. So, without further ado, here they are:

### 12 new pitching stats

**xBABIP**: We often say that a pitcher has little control over his BABIP, and that is true, but he does not relinquish all control. Most importantly, we know that a pitcher has a lot of control over his groundball and flyball rates, a good amount of control over his pop-up rate, and little control over his line-drive rate. To calculate xBABIP, we first neutralize line-drive rate and adjust the other three rates accordingly (as we do to calculate xGB%). Then we assume a league-average hit rate on each type of batted ball. Add up those expected hits, and we can calculate an expected BABIP.
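To make those two steps concrete, here is a minimal Python sketch. It is not the SQL-based process behind the numbers in this post. It assumes that "neutralizing" line-drive rate means swapping in the league LD share, that the other three shares are rescaled proportionally so everything still sums to 1, and that league hit rates by batted-ball type are supplied by the caller. The rates in the example are made-up round numbers, not real league data.

```python
# Minimal xBABIP sketch. Assumptions (not from the post): "neutralize" means
# using the league LD share, GB/FB/PU shares are rescaled proportionally, and
# hit_rates holds league-average hits per ball in play for each type.

def x_babip(gb, fb, pu, ld, league_ld, hit_rates):
    """Expected BABIP from a pitcher's batted-ball mix.

    gb, fb, pu, ld: ground-ball, outfield fly, pop-up and line-drive shares
        of the pitcher's balls in play.
    league_ld: league-average line-drive share.
    hit_rates: dict keyed "gb", "fb", "pu", "ld" with league hit rates.
    """
    total = gb + fb + pu + ld
    gb, fb, pu, ld = (share / total for share in (gb, fb, pu, ld))

    # Step 1: replace the pitcher's LD share with the league share and
    # rescale the other three so the four shares still sum to 1.
    scale = (1 - league_ld) / (gb + fb + pu)
    mix = {"gb": gb * scale, "fb": fb * scale, "pu": pu * scale, "ld": league_ld}

    # Step 2: the share-weighted sum of league hit rates is the expected
    # number of hits per ball in play, i.e. the expected BABIP.
    return sum(share * hit_rates[kind] for kind, share in mix.items())


# Made-up round numbers for illustration only (NOT real league rates).
RATES = {"gb": 0.24, "fb": 0.14, "pu": 0.02, "ld": 0.72}
groundballer = x_babip(gb=0.55, fb=0.20, pu=0.05, ld=0.20, league_ld=0.19, hit_rates=RATES)
flyballer = x_babip(gb=0.35, fb=0.38, pu=0.10, ld=0.17, league_ld=0.19, hit_rates=RATES)
print(f"groundballer {groundballer:.3f}, flyballer {flyballer:.3f}")
```

With those placeholder rates, the groundballer comes out with the higher xBABIP, and the flyballer's extra pop-ups pull his number down further.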

What we’ll see is that extreme GB pitchers have higher xBABIPs and extreme FB pitchers have lower xBABIPs (and pitchers who induce a lot of pop-ups will have low xBABIPs too). This past season, for example, GB’er Aaron Cook had a .314 xBABIP while FB’er Jered Weaver had a .291 xBABIP.

**xHR/FB**: This is calculated very simply using park factors. We assume a 50/50 home/road split for the pitcher and a neutral road schedule (HR/FB park factor of 1.00), and then account for the HR tendencies of the pitcher’s home ballpark. It is very important to note, as I have in the past, that even if a pitcher calls an extreme HR park home, his expected HR/FB will still remain pretty close to neutral. The xHR/FB for Rockies pitchers, for example, was just 12.39 percent in 2009 (with a league average of 11.18 percent).
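Under those assumptions, the calculation amounts to a weighted average of the home factor and a neutral 1.00, applied to the league rate. The sketch below treats park factors as multipliers on league HR/FB (an assumption about how the factors are expressed) and includes a hypothetical inverse helper so you can see what home factor a given xHR/FB implies.

```python
def x_hr_fb(league_hr_fb, home_pf, home_share=0.5, road_pf=1.00):
    """Expected HR/FB given the home park's HR/FB factor.

    Mirrors the post's assumptions: half of games at home and a neutral
    road schedule (factor 1.00). A factor of 1.10 is assumed to mean
    10 percent more HR per fly ball than a neutral park.
    """
    blended_pf = home_share * home_pf + (1 - home_share) * road_pf
    return league_hr_fb * blended_pf


def implied_home_pf(league_hr_fb, x_rate, home_share=0.5, road_pf=1.00):
    """Invert x_hr_fb: which home factor produces a given xHR/FB?"""
    blended_pf = x_rate / league_hr_fb
    return (blended_pf - (1 - home_share) * road_pf) / home_share


# Percentages from the Rockies example in the text.
print(round(implied_home_pf(11.18, 12.39), 2))
```

If the Rockies figure was produced this way, 12.39 percent against an 11.18 percent league average implies a home factor of only about 1.22, and because half of a pitcher's games are on the road, even that moves his expectation by roughly 11 percent.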

Analysts often attribute larger deviations from the mean than this to a pitcher’s home park, but that simply is not justified (unless the pitcher has thrown a disproportionate number of his games at home, and even then, that shouldn’t be expected to continue going forward). Simply put, HR park factors are not quite as extreme as most people seem to believe.

**xLOB%**: Of the three main ‘luck indicators,’ LOB% has the most room for skill-based variation. This is because LOB% is actually an exponential function of how often a pitcher lets batters reach. To put it simply, if Pitcher A allows hits at a 24 percent rate and Pitcher B allows hits at a 30 percent rate, a larger share of the runners who reach base against Pitcher B will score, because the batters who follow are also more likely to get hits and drive them in. In other words, his hits will be clumped closer together. As such, LOB% has a fairly strong relationship with the rate at which batters reach base.

xLOB% is calculated using a regression formula derived from BAA and BB%. Of course, BAA is subject to extreme variation since it is driven largely by BABIP. So instead of using actual BAA, we use xBAA, which accounts for the pitcher’s actual K rate (as with hitters, the more Ks, the fewer opportunities for hits) and his xBABIP. What we see is that, relative to the league average (71.9 percent), good pitchers leave more runners on base (Tim Lincecum: 75.6 percent) while bad pitchers let more of them score (Jeremy Sowers: 68.1 percent).
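Here is one way the two pieces could fit together, as a sketch. The xBAA function rearranges the standard BABIP definition, BABIP = (H - HR) / (AB - SO - HR + SF), to solve for hits; using the pitcher's actual HR total there is an assumption, since the post only names K rate and xBABIP. The regression coefficients for xLOB% aren't given in the post, so `x_lob` (a hypothetical helper) takes them as arguments instead of inventing values.

```python
def x_baa(ab, so, hr, x_babip, sf=0):
    """Expected batting average against.

    Solves BABIP = (H - HR) / (AB - SO - HR + SF) for H, then uses
    xBABIP in place of actual BABIP. More strikeouts mean fewer balls
    in play, and therefore fewer expected hits.
    """
    expected_hits = hr + x_babip * (ab - so - hr + sf)
    return expected_hits / ab


def x_lob(x_baa_value, bb_rate, intercept, b_baa, b_bb):
    """Linear xLOB% model: intercept + b_baa * xBAA + b_bb * BB%.

    The coefficients are caller-supplied placeholders; in practice they
    would come from a regression fit on league data.
    """
    return intercept + b_baa * x_baa_value + b_bb * bb_rate


# Same xBABIP and HR total, different strikeout counts (hypothetical lines).
high_k = x_baa(ab=600, so=200, hr=15, x_babip=0.300)
low_k = x_baa(ab=600, so=100, hr=15, x_babip=0.300)
print(f"high-K xBAA {high_k:.4f}, low-K xBAA {low_k:.4f}")
```

Feeding xBAA rather than actual BAA into the regression is what keeps one season's BABIP noise from leaking into the LOB% expectation.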

**R/HR and xR/HR**: HR/FB has become a common stat for measuring a pitcher’s luck with home runs, but it doesn’t tell us everything. For example, a pitcher can have a seemingly lucky 4 percent HR/FB but could actually have been unlucky with HRs if he happened to give up all of them with the bases loaded. On average, about 1.4 runs score per HR, but not every pitcher allows runs on homers at that rate (some justifiably, some as a result of luck). R/HR tells us how many runs actually scored per home run allowed, while xR/HR tells us how many runs should have scored (the process for this is a little complicated, but I’d be happy to explain it to anyone interested).
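R/HR itself is simple arithmetic: every homer scores the batter plus whoever is on base. Here is a sketch, assuming you have the number of runners on base for each HR allowed (for example, pulled from play-by-play data); the function name is illustrative. xR/HR is not attempted here, since its process isn't described.

```python
def runs_per_hr(runners_on_per_hr):
    """R/HR: runs scored on home runs divided by home runs allowed.

    Each entry is the number of runners on base (0 to 3) when a home run
    was hit; each HR scores the batter plus every runner on base.
    """
    if not runners_on_per_hr:
        raise ValueError("no home runs allowed")
    runs = sum(1 + on_base for on_base in runners_on_per_hr)
    return runs / len(runners_on_per_hr)


# Hypothetical logs of runners on base for each HR allowed.
mostly_solo = [0, 0, 0, 1, 0]
slam_heavy = [3, 3, 0, 2]
print(runs_per_hr(mostly_solo), runs_per_hr(slam_heavy))
```

Against the roughly 1.4 league figure, the first pitcher looks fortunate and the second looks unfortunate, even before asking whether his HR/FB was high or low.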

**Home Run Runs per Fly Ball (HRR/FB) and expected Home Run Runs per Fly Ball (xHRR/FB)**: Absolutely my favorite of this new crop of stats. A mixture of HR/FB and R/HR, HRR/FB tells us how many runs scored on home runs per outfield fly. xHRR/FB, naturally, tells us how many should have scored. You can consider this a **super-powered HR/FB** since it accounts not only for how many HRs are allowed but also for the total damage done by those HRs, which is what truly matters. Ten solo home runs do just as much damage as five two-run homers, yet HR/FB would make the first pitcher look twice as bad; HRR/FB treats them the same.
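Because HR runs per fly equals HRs per fly times runs per HR, HRR/FB is simply the product of HR/FB and R/HR. The hypothetical pitchers below mirror the ten-solo-homers versus five-two-run-homers example, each over 100 outfield flies.

```python
def hr_per_fb(home_runs, outfield_flies):
    """HR/FB: home runs allowed per outfield fly ball."""
    return home_runs / outfield_flies


def hrr_per_fb(hr_runs, outfield_flies):
    """HRR/FB: runs scored on home runs per outfield fly ball."""
    return hr_runs / outfield_flies


# Hypothetical pitchers, each allowing 100 outfield flies.
FLIES = 100
solo_pitcher = {"hr": 10, "hr_runs": 10 * 1}     # ten solo homers
two_run_pitcher = {"hr": 5, "hr_runs": 5 * 2}    # five two-run homers

for name, p in (("solo", solo_pitcher), ("two-run", two_run_pitcher)):
    hr_fb = hr_per_fb(p["hr"], FLIES)
    r_hr = p["hr_runs"] / p["hr"]
    # HR/FB differs by a factor of two, but HRR/FB (= HR/FB * R/HR) matches.
    print(name, hr_fb, r_hr, hrr_per_fb(p["hr_runs"], FLIES), hr_fb * r_hr)
```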

**Run Support (RS) and xRun Support (xRS)**: These two stats are just what they sound like. Run Support is the number of runs that a starting pitcher’s offense scores in games that he pitches. xRun Support is the number of runs per game the pitcher’s team scores in all games during a season. Since pitchers have little influence over how well their offense performs in games that they pitch, we should expect the offense to perform at its usual level each time the pitcher takes the mound.
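As a quick sketch of the comparison: the post describes RS as a count of runs, so putting both stats on a per-game basis here is an assumption made so they can be compared directly. The season totals are hypothetical.

```python
def run_support(runs_in_starts, starts):
    """RS on a per-game basis: team runs scored in the pitcher's starts."""
    return runs_in_starts / starts


def x_run_support(team_runs, team_games):
    """xRS: team runs per game across the whole season."""
    return team_runs / team_games


# Hypothetical season: 120 runs in his 32 starts, 780 runs in 162 team games.
rs = run_support(120, 32)
x_rs = x_run_support(780, 162)
print(f"RS {rs:.2f}, xRS {x_rs:.2f}, gap {rs - x_rs:+.2f}")
```

A negative gap suggests his offense underperformed its usual level behind him, which is exactly the kind of thing we shouldn't expect to persist.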

**Bullpen Support (BS) and xBullpen Support (xBS)**: Very similar to the Run Support stats. BS measures how well the pitcher’s bullpen performs in games he pitches and xBS measures the bullpen’s performance during all games.

**xWins (xW)**: While many fantasy analysts call Wins a fickle stat — and they’re right — they aren’t wholly unpredictable. Axioms like “don’t chase wins” or “draft skills” are thrown around often, and while one can be successful by simply following this advice, I feel as though we can do a little bit better. And if we **can** do better, why shouldn’t we?

Essentially, xW uses Bill James’s Pythagorean Theorem to estimate the expected number of games a pitcher should have won. Using this formula, I plug in the pitcher’s LIPS RA (weighted by his IP per game), his xBS (weighted by the IP the starter doesn’t pitch per game), and his xRS.

This gives us the number of games the SP’s team should win on days he pitches. From there, we calculate the percentage of those games for which he should be credited with the Win, based on how deep into games he goes: pitchers who last into the eighth inning are far more likely to receive a win than those who last only four or five innings, because there’s more time for their offense to score runs while they’re still in the game. One small problem is that unlucky pitchers won’t go as deep into games as they should, and vice versa for lucky pitchers, but I haven’t accounted for this yet.
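The xW pieces can be sketched in Python as follows. This assumes James's original exponent of 2 (1.83 is a common refinement), that LIPS RA and xBS are both expressed as runs per nine innings, and that innings per start are true innings (6.2 IP is 6.67, not 6.2). The post ties the starter's share of the wins to his depth but doesn't spell out the mapping, so `credit_share` is a hypothetical input rather than a formula.

```python
def pythag_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Bill James's Pythagorean expectation: RS^k / (RS^k + RA^k)."""
    rs_k = runs_scored ** exponent
    return rs_k / (rs_k + runs_allowed ** exponent)


def blended_ra(starter_ra9, bullpen_ra9, starter_ip_per_game):
    """Team runs allowed per 9, weighting starter and bullpen by innings."""
    starter_share = starter_ip_per_game / 9
    return starter_ra9 * starter_share + bullpen_ra9 * (1 - starter_share)


def x_wins(starts, starter_ra9, bullpen_ra9, starter_ip_per_game, x_rs,
           credit_share):
    """Expected wins for a starting pitcher.

    credit_share is a HYPOTHETICAL input: the fraction of his team's wins
    in his starts that are credited to him.
    """
    ra = blended_ra(starter_ra9, bullpen_ra9, starter_ip_per_game)
    team_wins = starts * pythag_win_pct(x_rs, ra)
    return team_wins * credit_share


# Hypothetical starter: 32 starts, 3.50 LIPS RA, 4.20 bullpen RA,
# 6.5 IP per start, 4.80 xRS, and an assumed 70 percent credit share.
print(round(x_wins(32, 3.50, 4.20, 6.5, 4.80, 0.70), 1))
```

Depth shows up twice in this setup: it lowers the bullpen's weight in the blended RA, and (per the post) it raises the share of team wins the starter gets credited with.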

### Concluding thoughts

Now, I’m not saying that all of these stats are perfect, and they all assume randomly sequenced events (which may or may not be a 100 percent fair assumption), but I do think that they largely serve our purposes and are certainly better than making mental estimates (as we all currently do) or simply assuming everyone will be league average. Again, if you have any questions, absolutely feel free to let me know. Tomorrow, be on the lookout for an article centered on Ricky Nolasco that will use these stats, so you can see them in action.

### Prior work done on the subject

**EDIT:** Thanks to Will Larson for bringing to my attention that prior work has been done on some of these topics. Will created his own versions of xW, xBABIP, and xLOB% that can be found here.

THTF’s own Paul Singman also did work on the link between BAA and LOB% here.

David Appelman also created a basic xBABIP formula here.

I’ve been waiting a long time for those first three, especially. That said, I’m waiting for further refinement of HR/FB% rather than assuming that all pitchers regress to the mean equally.

Formulas would be *super*.

Derek,

I missed “Advanced Baseball Stats 101”… do you have a similar guide, or short summary page, for stats such as BABIP, HR/FB, etc?

Will these stats be posted/updated somewhere?

Awesome. Can’t wait to see what you do with this.

Very interesting, Derek. I very much like where you’re going.

Also, I heartily second Josh’s request for formulas.

Once again Derek you make this one of the best sites, great post !!

It seems as though park factors vary tremendously from place to place. Even between The Hardball Times article ( http://www.hardballtimes.com/main/blog_article/hr-fb-park-factors/ ) and the ESPN list ( http://espn.go.com/mlb/stats/parkfactor/_/sort/HRFactor ), there is tremendous variation. Anyone have a good idea as to why?

@bender:

The THT one is HRs per fly ball, the ESPN is just home runs.

So the THT one is the one closer to what Derek is identifying here with xHR/FB.

Derek, we exchanged emails about this last year. I asked you if you wanted me to write about these stats and you declined. You need to cite my work, especially with regard to xLOB, xBABIP, and xW. I shared my work with you in good faith.

Google “luck adjusted pitching” and you’ll see my work as the top search result.

See the spreadsheet of last year’s stats at http://www.williamlarson.com/baseball_spreadsheet.pdf

I am familiar with Will’s work, use it on occasion, and recall a conversation I had with him some months ago regarding his lengthy correspondence with you. This is gross plagiarism and an obvious theft of intellectual property.

Actually, our conversation wasn’t via email. It was on a THT comments thread at http://www.hardballtimes.com/main/fantasy/article/for-those-who-still-dont-believe-fip-is-poor-for-fantasy-analysis/

My email correspondence was with Victor Wang here at THT.

Ah, re-reading your post, Dan, it looks like I misunderstood your point about Yankee Stadium – that it would alter the ‘Away’ environment for all the other stadiums. This is certainly true, although I think 5/3/2/1 seems awfully heavy to account for this. I’d need to look into it more, but I think the effects would be negligible, or close to, for most parks.

“Now I’m not saying that all of these stats are perfect….”

Aren’t statistics by definition imperfect?

interesting stuff

Can you post the formulas used?

The problem is that most of these don’t really have formulas, per se. I’ve created them with a series of SQL statements that draw from a bunch of different databases and tables. Instead of trying to simplify them into statements, I think it’ll be easier just to run them myself for players (especially since a lot of the necessary data comes from Retrosheet, which is not the easiest thing in the world to work with). I’ll have some more info on how I’ll be releasing the data soon.

Derek,

Some of us have Retrosheet databases ourselves, and it would be good to have others check and attempt to improve on the work.

A file with the SQL queries, or an outline of how they are generated would be really useful.