## Tool: Basically Every Pitching Stat Correlation

In doing my research, I often like to take a look at correlations to get an idea about whether factors might be connected.  At the end of this season, I put together a spreadsheet to help me with that.  Well, I haven’t finished the research yet (FG+ subscribers will probably soon find out what’s been keeping me from it), but in the meantime, I thought I’d share what I hope will be a pretty handy tool for whoever out there might be interested in what lies a little beneath the surface of all these stats on FanGraphs.  And I do mean all of them.  Any pitching-related stat on FanGraphs should be represented in this tool.  You can compare one stat to another, or to itself in a different year.  Or, what the heck, you can even compare a stat to a different stat in a different year.  And, for you sticklers out there, it will even give you a confidence interval on these correlations (by default, it gives you the range of correlations that the true correlation has a 95% chance of being within).

What can you do with this?  Well, let’s say you want to see whether a stat is predictive of the next year’s ERA.  You could, for example, set Stat 1 to K% (after selecting the correct white box, type it in, or select from the drop-down list via the arrow to the right of the box), with the year set to 0 (meaning the present year), then set Stat 2 to ERA, with the year set to 1 (meaning the next year).  If you don’t change the IP or Season filters, you should see a correlation of -0.375.  That shows there’s a pretty decent connection between the two stats, in that if a pitcher has a high strikeout percentage in one season, he’ll likely have a low ERA the next (relative to the rest of the pitchers in the comparison).  If you change the year under ERA to 0, you’ll see the correlation gets stronger, whereas if you change it to 2 or 3, you’ll see it gets weaker.  That has a lot to do with the unpredictability of K%, and especially of ERA.  You’ll notice if you compare year 0 K% to year 1 K%, the correlation is a very strong 0.702, whereas if you do the same for ERA, it’s a moderate-to-weak 0.311.  Hopefully the graph will give you an idea of how strong those connections really are.
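For anyone who would rather reproduce this kind of year-to-year comparison outside the spreadsheet, here is a minimal sketch in Python. It is not the tool's actual implementation: the function names (`pearson`, `lagged_pairs`), the row layout, and all the numbers are made up for illustration. It pairs a pitcher's Stat 1 in year 0 with his Stat 2 in a later year, applies the IP minimum to both seasons, and computes an ordinary Pearson correlation.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def lagged_pairs(rows, stat1, stat2, lag, min_ip=30):
    """Pair stat1 in season Y with stat2 in season Y + lag for the same pitcher.

    rows: list of dicts with keys "pitcher", "season", "IP" and the stat names.
    Both seasons must meet the IP minimum (an assumption about how the
    tool's IP filter behaves, based on the description in the post).
    """
    by_key = {(r["pitcher"], r["season"]): r for r in rows if r["IP"] >= min_ip}
    xs, ys = [], []
    for (pitcher, season), row in by_key.items():
        later = by_key.get((pitcher, season + lag))
        if later is not None:
            xs.append(row[stat1])
            ys.append(later[stat2])
    return xs, ys

# Made-up numbers purely for illustration (not real pitchers).
rows = [
    {"pitcher": "A", "season": 2012, "IP": 180, "K%": 0.27, "ERA": 3.10},
    {"pitcher": "A", "season": 2013, "IP": 175, "K%": 0.26, "ERA": 3.30},
    {"pitcher": "B", "season": 2012, "IP": 150, "K%": 0.15, "ERA": 4.60},
    {"pitcher": "B", "season": 2013, "IP": 160, "K%": 0.16, "ERA": 4.20},
    {"pitcher": "C", "season": 2012, "IP": 60, "K%": 0.21, "ERA": 3.90},
    {"pitcher": "C", "season": 2013, "IP": 25, "K%": 0.22, "ERA": 5.00},  # under 30 IP, so dropped
    {"pitcher": "D", "season": 2012, "IP": 200, "K%": 0.19, "ERA": 3.70},
    {"pitcher": "D", "season": 2013, "IP": 190, "K%": 0.20, "ERA": 4.00},
]

xs, ys = lagged_pairs(rows, "K%", "ERA", lag=1)
r = pearson(xs, ys)
print(len(xs), round(r, 3))
```

Setting `lag=0` gives the same-season comparison, and `lag=2` or `lag=3` corresponds to the further-out years mentioned above.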

About those filters: 30 innings pitched (in a season) is the minimum for this data set.  If you change either the minimum or maximum IP setting, it will apply to whichever seasons are in question.  However, the Season filters will only affect the range of Year 0s.  It’s assumed that you’re going to set either or both of Stats 1 and 2 to Year 0, and doing otherwise will limit your sample a bit (by cutting out those who didn’t pitch in Year 0).

If you’re wondering, the “PU%” you see as one of the default stats is what I’ve been calling “Popup Percentage,” which I define as IFFB/Batted Balls.  [Edit: I made wOBA against the default instead of PU%, to show you all this new addition to the spreadsheet.]  You might think IFFB% would cover that, but IFFB% is actually IFFB/FB.  Batted Balls can be calculated by adding the three main batted ball types together (FB+GB+LD).  PU% is sort of weakly anticorrelated with the next year’s BABIP, but it’s actually a stronger correlation than BABIP has with itself, year-to-year.  It gets stronger when you raise the IP minimum, which helps weed out a bit of the randomness.
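To make the difference between the two denominators concrete, here is a small sketch. The function names and the batted-ball counts are invented for the example; only the two ratios come from the definitions above.

```python
def iffb_pct(iffb, fb):
    """IFFB% as described above: infield flies per fly ball."""
    return iffb / fb

def popup_pct(iffb, fb, gb, ld):
    """PU% as defined above: infield flies per batted ball (FB + GB + LD)."""
    batted_balls = fb + gb + ld
    return iffb / batted_balls

# Hypothetical season line: 200 FB (20 of them infield flies), 250 GB, 100 LD.
iffb, fb, gb, ld = 20, 200, 250, 100
print(round(iffb_pct(iffb, fb), 3))           # share of fly balls that were popups
print(round(popup_pct(iffb, fb, gb, ld), 3))  # share of all batted balls that were popups
```

Two pitchers with the same IFFB% can have quite different PU% if one of them allows far fewer fly balls overall, which is the reason for using the wider denominator.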

Some more about correlations, in case anybody is unclear: a -0.5 correlation is not weaker than a +0.5 correlation; they’re the same strength, only in opposite directions.  In a +0.5 correlation, when one stat gets higher, the other also tends to; in -0.5, when one goes up, the other goes down.  Notice I said “tends to”; in a +1.00 correlation relationship, when one goes up, the other does go up, by a very predictable amount.  But, unless you’re correlating a stat with itself (same season), you probably won’t see anything like a +1.00 correlation.

Here are some ideas of things to try:

• Clutch, year 0 vs. Clutch year 1.  You should see pretty much no connection at all — just a circle of points on the scatterplot, pretty much.
• WPA, year 0 vs. WPA, year 1.  Now, at least there’s a little bit of a correlation, and it’s more of an oval.
• vFA (pfx), year 0 vs. vFA (pfx), year 1.  This is four-seam fastball velocity.  It’s almost a straight line.  You’re not going to find many stats more consistent than this one, year-to-year.
• ERA, year 0 vs. BB%, year 0.  The correlation is surprisingly low (~0.2).  Now set ERA to year 1: pretty much no correlation at all, at least until you raise the IP minimum a bit.
• ERA, year 0 vs. K%, year 0.  OK, this is a pretty good correlation (~0.5).  When you set ERA to year 1, the correlation weakens, of course, but it’s still pretty decent (~0.37… actually, it’s stronger than ERA’s correlation to itself between years).
• K-BB% (something I added at the end) year 0 vs. ERA year 0.  This is even better than K%, at around a 0.56 correlation.  However, when comparing it to next-year ERAs, it’s pretty much as predictive as simple K%, if not a little lower.  I also added kwERA, which correlates the same as K-BB%, but in the opposite direction (the formula used is 5.40 – 12*((K-BB)/PA)).
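The reason kwERA and K-BB% correlate identically but with opposite sign is that kwERA is just a linear function of K-BB% with a negative slope. A quick sketch, using the kwERA formula quoted above and otherwise made-up numbers and helper names:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def k_minus_bb_pct(k, bb, pa):
    """K-BB% expressed as a fraction of plate appearances."""
    return (k - bb) / pa

def kwera(k, bb, pa):
    """kwERA using the formula quoted in the post: 5.40 - 12 * (K - BB) / PA."""
    return 5.40 - 12 * (k - bb) / pa

# Invented season lines: (K, BB, PA, ERA). Not real pitchers.
sample = [
    (220, 45, 800, 2.90),
    (150, 60, 750, 4.10),
    (180, 70, 820, 3.60),
    (110, 55, 700, 4.80),
    (95, 30, 400, 3.95),
]

kbb = [k_minus_bb_pct(k, bb, pa) for k, bb, pa, _ in sample]
kw = [kwera(k, bb, pa) for k, bb, pa, _ in sample]
era = [e for *_, e in sample]

r_kbb = pearson(kbb, era)
r_kw = pearson(kw, era)
print(round(r_kbb, 3), round(r_kw, 3))  # same magnitude, opposite sign
```

Any stat of the form a + b*x has the same correlations as x (flipped in sign when b is negative), so the two entries carry the same information for this purpose.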

At the end of the drop down stat lists, I added some bonus stats:

• Foul%, which Christopher Carruthers brought up in a very interesting Community article, as did Russell Carleton/Pizza Cutter before him.  I think there’s some potential for this stat.  It seems to have a better connection to HR/FB than just about anything, for one (though it’s still a weak one).
• TIPS, the ERA estimator from Christopher’s article linked above
• BERA, my ERA estimator from last offseason.  It was designed to match long-term ERA as well as to be predictive of next-season ERA
• SBERA, another of my ERA estimators.  It was purely aimed at being predictive of next-season ERA.
• pFIP, by Glenn DuPaul, which is a differently-weighted FIP meant to be predictive of the next season.
• (Late addition): MBRAT, by Dan Greenlee, which includes the pitcher-related fielding stats rSB and rPM, also late additions to the spreadsheet

So if you’re interested, you can take a look at all these ERA estimators (including FIP, xFIP, SIERA, and tERA), and see how they compare to ERAs of surrounding or same years, with different IP or season ranges.  Here’s a comparison (out-of-sample) of the ERA estimators, matching each pitcher’s 2012 stat against their 2013 ERA (30 IP minimum):

| ERA Estimator | Correlation | Low | High |
|---|---|---|---|
| SBERA | 0.394 | 0.295 | 0.484 |
| BERA | 0.361 | 0.260 | 0.454 |
| SIERA | 0.356 | 0.254 | 0.449 |
| MBRAT | 0.335 | 0.232 | 0.430 |
| pFIP | 0.327 | 0.224 | 0.423 |
| -K% | 0.313 | 0.208 | 0.410 |
| xFIP | 0.311 | 0.206 | 0.408 |
| TIPS | 0.297 | 0.191 | 0.395 |
| tERA | 0.292 | 0.186 | 0.390 |
| FIP | 0.290 | 0.184 | 0.389 |
| kwERA | 0.281 | 0.175 | 0.381 |
| ERA | 0.238 | 0.130 | 0.341 |

Here are the next-season ERA correlations over the entire sample (2007-2013, 30+ IP):

| ERA Estimator | Correlation | Low | High |
|---|---|---|---|
| SBERA | 0.433 | 0.395 | 0.470 |
| SIERA | 0.417 | 0.378 | 0.455 |
| BERA | 0.408 | 0.369 | 0.446 |
| MBRAT | 0.402 | 0.362 | 0.440 |
| pFIP | 0.397 | 0.357 | 0.435 |
| -K% | 0.375 | 0.335 | 0.414 |
| xFIP | 0.372 | 0.331 | 0.411 |
| kwERA | 0.367 | 0.326 | 0.406 |
| FIP | 0.358 | 0.317 | 0.398 |
| tERA | 0.355 | 0.314 | 0.395 |
| TIPS | 0.334 | 0.293 | 0.375 |
| ERA | 0.311 | 0.269 | 0.352 |

The low and high correlation estimates are all at 95% confidence.  There’s quite a bit of overlap between the 95% ranges of all of these, so it’s not exactly conclusive which is best, but it’s pretty clear that a lot of them are better predictors of next-year ERA than is ERA itself.
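For readers curious how a low/high range like this can be computed, one common approach is the Fisher z-transformation. The post does not say which method the spreadsheet uses, so treat the following as a sketch of one standard technique rather than a description of the tool. The sample size `n = 309` is also an assumption: the post does not give the number of pitcher pairs, and this value was picked only because it yields bounds close to the SBERA row of the 2012-to-2013 table.

```python
from math import atanh, tanh, sqrt

def correlation_interval(r, n, z_crit=1.96):
    """Approximate confidence interval for a Pearson correlation.

    Uses the Fisher z-transformation: atanh(r) is treated as roughly normal
    with standard error 1 / sqrt(n - 3). z_crit = 1.96 corresponds to a
    two-sided 95% interval.
    """
    z = atanh(r)
    se = 1.0 / sqrt(n - 3)
    return tanh(z - z_crit * se), tanh(z + z_crit * se)

# r from the SBERA row above; n is a hypothetical sample size (see lead-in).
low, high = correlation_interval(0.394, 309)
print(round(low, 3), round(high, 3))  # roughly 0.295 and 0.484
```

Because the standard error shrinks with the square root of the sample size, the full 2007-2013 sample gives much narrower ranges than the single-year comparison.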

Well, I’ll leave you guys to it.  Let us all know if you discover anything interesting!

• HBP% (hit by pitches per batter faced)
• SV% (saves/(saves + blown saves))
• RA9 (runs allowed per 9 IP)

Just for the heck of it, I also added ShO% and CG%, or shutouts and complete games per game started.  I also replaced wins and losses with W/(W+L).  To make room for these new stats, I had to cut out a lot of superfluous counting stats and rarely used pitch types (eephus, knuckle curve, knuckler, etc.).  I’ll try to accommodate other requests… well, if I think they’re interesting, anyway.

Edit #2: I added wOBA against as a stat.


Steve is a robot created for the purpose of writing about baseball statistics. One day, he may become self-aware, and...attempt to make money or something?

### 93 Responses to “Tool: Basically Every Pitching Stat Correlation”

1. Brandon Firstname says:

This is wonderful Steve.

I’ve always been interested in the predictiveness of underlying pitching statistics, and this gives me an opportunity to quickly reference some information for just that.

• Thanks!

• tz says:

I finally had to log off an hour ago.

Ran out of excuses why I had to keep going back to my laptop. My wife has promised she’ll be shopping for an internet porn blocker first thing after she wakes up.

Thanks for this awesome toy Steve.

2. Chummy Z says:

Damn.

This is all sorts of awesome. Thanks a ton, especially for including TIPS! That community article was super interesting in its relevance to relievers, so I’m glad to have an interactive tool here to work with.

3. Matthew S says:

If stats like SBERA and SIERA are the most predictive, why aren’t we using them for a WAR system?

• Pale Hose says:

I believe it’s because WAR is meant to be descriptive.

Then, for pitchers, why use FIP instead of ERA? ERA is what actually happened, with luck included.
This argument has come up before many times, but WAR is inconsistent on this account; it’s descriptive for batters, and meant to be semi-predictive for pitchers.

Why not split WAR into two, a descriptive stat (dWAR) and a predictive stat (pWAR)?

• Pale Hose says:

Well, FIP is just strikeouts, walks and home runs, all of which actually happened.

• Summertime says:

You could just look at FanGraphs’ and Baseball Reference’s respective WARs. Plus, descriptive can mean different things. Getting the same groundball could be the same description even if one results in a hit because Yuniesky Betancourt is at short and the other results in an out because Andrelton Simmons is at short.

“Well, FIP is just strikeouts, walks and home runs, all of which actually happened.”

uh huh. That’s true, but the point stands. If it’s meant to be descriptive, why not use the maximally descriptive statistic–runs allowed.

Conversely, if it’s meant to be predictive, why not use the maximally predictive statistic–SIERA (or xFIP or whatever newfangled stat is actually maximally predictive).

Right now pitcher WAR exists in a bizarre intermediate world where it is neither completely descriptive nor totally predictive.

to Summertime’s question:
Neither B-Ref nor Fangraphs has “sieraWAR” (to my knowledge–if you have found it let me know).

Anyway, I think Matthew S’s question is a good one. If WAR is meant to be luck-subtracted, then why not use SIERA? If WAR isn’t meant to be luck-subtracted, then why not use runs allowed?
(this is what I take “descriptive” to mean–with the influence of luck. Part of that luck is who stands behind the pitcher, Andrelton Simmons or Yuniesky Betancourt)

• gump says:

RA9-WAR works pretty well as a descriptive WAR for pitchers…

• Brandon Firstname says:

I’m in the process of writing up a community article on my very own DIST-WAR right now, which aims to be more descriptive than anything else (although it will be all kinds of not predictive).

• Bip says:

I’ve made this exact argument before.

I think the idea of FIP is not to be predictive, but to give pitchers credit for only the things they could control, while not counting things they can’t control, when evaluating their past performance. The problem I see is that the difference between what a pitcher “can” and “can’t” control is a big gray area.

How much control does a pitcher have over whether he gives up a hit? Not a lot. But definitely some. He has more control over whether he gets a strikeout. How much more? No one knows. Is there any useful, measurable event that a pitcher has complete control over? No.

Basically FIP is saying that pitchers don’t have enough control over their BABIP or LOB% for us to count it while assessing their performance, so we will assume league-average BABIP and LOB% while calculating FIP. However, it also says that they do have enough control over their HR/FB rate to count that against them, so it does not substitute in league average HR/FB. Same for BB and K, which a pitcher has “enough” control over.

I get your discomfort, and I wonder if there is an explanation of why the line was drawn where it was. Why not credit pitchers with a league-average HR/FB, when the same is being done for BABIP? I also think the difference between stats for evaluating past performance and those for predicting future performance is not well defined. The idea that FIP “measures what actually happened” is clearly not true.

• AC_Butcha_AC says:

You guys really miss the point about FIP, and not one of you has nailed it.

FIP is just context-neutral. That means a HR is always a HR is always a HR and always is the pitcher blamed for that HR (that actually and in real life left the park).

FIP doesn’t make a difference between sequence A:
BB, BB, K, K, HR, K

and sequence B:
HR, BB, K, K, BB, K

what happened? what has the pitcher done? he has given up a jack, struck out three and walked two guys. In sequence A he would be charged with 3 runs in ERA, in sequence B he would be charged with 1 run in ERA.
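The two sequences can be checked with a short sketch. The run-counting function below is a deliberately simplified model (only K, BB and HR events, one pitcher, runners cleared after three outs) with made-up names. The FIP numerator uses the standard published weights of 13 per HR, 3 per BB and -2 per K, leaving out HBP and the league constant, which are assumptions of this example rather than anything stated in the comment.

```python
from collections import Counter

def runs_scored(events):
    """Count runs for a sequence of 'K', 'BB' and 'HR' events.

    Simplified model: a walk adds a runner (forcing in a run if the bases
    are loaded), a home run scores everyone, a strikeout is an out, and the
    bases are cleared after three outs.
    """
    runners = outs = runs = 0
    for event in events:
        if event == "BB":
            if runners == 3:
                runs += 1
            else:
                runners += 1
        elif event == "HR":
            runs += runners + 1
            runners = 0
        elif event == "K":
            outs += 1
            if outs == 3:
                outs = 0
                runners = 0
    return runs

def fip_numerator(events):
    """13*HR + 3*BB - 2*K, the order-independent core of FIP (HBP omitted)."""
    c = Counter(events)
    return 13 * c["HR"] + 3 * c["BB"] - 2 * c["K"]

seq_a = ["BB", "BB", "K", "K", "HR", "K"]
seq_b = ["HR", "BB", "K", "K", "BB", "K"]

print(runs_scored(seq_a), runs_scored(seq_b))      # 3 1
print(fip_numerator(seq_a), fip_numerator(seq_b))  # 13 13
```

The event counts are identical, so anything built only from those counts cannot distinguish the two innings, while the runs charged differ by two.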

This is exactly how offensive WAR for hitters is calculated… context neutral. There are other context-dependent stats that tell you a HR is worth more with the bases loaded than with the bases empty. But do you want to give the batter credit for his teammates? Welcome to evaluating players via the RBI again.

So in this way pitcher-WAR and hitter-WAR is created equally on FG…

FIP only uses the outcomes that a pitcher is known to have the most control over. Is wOBA predictive or descriptive? Does it make more sense to evaluate a player by the “standard value” of an offensive event or do you want to become context-dependent?
Does a grand slam really tell you more than just the information of a HR?

Think about that and you will see that FIP the way it is used in WAR makes perfect sense and is coherent with our ideas about assigning value to players.


“This is exactly how offensive WAR for hitters is calculated… context neutral.”
Context, but not luck neutral.

If a hitter causes contact with the ball, the ball goes somewhere. Where? That depends on how he hit it–hard, and it has a decent chance of falling for a hit. Really hard, maybe a home run. Pretty hard, on a flat plane, probably a hit. Not that hard, into the ground, probably a groundout. That stuff matters, and it’s been proven time and time again to matter. And yet–for whatever reason–when somebody has a ridiculous year where somehow all of their balls happen to fall between outfielders and their BABIP is inflated, we calculate the WAR as though the hitter is entirely responsible for those hits.

Flip it, now. Let’s consider the pitcher. We know that contact matters because of research done on hitters. Let’s say a certain pitcher doesn’t allow hitters to get good contact (repeatedly, year-on-year-on-year; there are such pitchers). Nearly every time a hitter makes contact with one of their pitches, the ball comes off the bat soft and straight in the air. The BABIP is low, because popups have very low BABIP (being almost impossible to field incorrectly). Does the pitcher get credit? Nope. Why? “FIP says so.”

“Is wOBA predictive or descriptive?”
Descriptive. It doesn’t take into account whether a hitter’s hits would be home runs with different fielders or in a different park. It only takes into account what happened, regardless of whether it’s repeatable. If a hitter gets a tiny bloop single that a lumbering, bad-fielding 3B like Miguel Cabrera can’t field, it DOESN’T PENALIZE THE HITTER. Even though the hitter doesn’t control who plays the field.

“Does it make more sense to evaluate a player by the “standard value” of an offensive event or do you want to become context-dependent?”

BOTH. We can do both. It’s numbers on a spreadsheet–it’s not that hard to provide both a context-adjusted and a context-subtracted stat. And why not? If Pete Kozma next year turned in an incredible playoff run whereby he somehow managed to hit .400 with 10 HR that just barely passed over the fence, that would be pretty amazing. I don’t think anyone would then look at Kozma and say, “yep, that’s repeatable”, and we could have a stat that said that nope, Kozma isn’t a great hitter, not really. But the regular stats, like wOBA, might rightly point out that somehow Kozma had an amazing run that resulted in tremendous value to his team, whether or not it was repeatable.

“Think about that and you will see that FIP the way it is used in WAR makes perfect sense and is coherent with our ideas about assigning value to players.”

No. If you want to assign skill to players, subtract out ALL of the sources of luck, and keep ANYTHING that players have the ability to control, weighted by how much they can control it (e.g. BABIP, somewhat–see LD%). Such a stat is useful for prediction and figuring out how much players are worth.

If you want to figure out how much a player contributed to his team’s wins as they actually happened, use a context-rich, luck-full stat. Such a stat is useful for description–as in, “Gee, Pete Kozma sure was AMAZING that one year, even though he isn’t really a very good player”, and in figuring out how luck influences outcomes.

Why not both? It’s probably not a terrible burden for Fangraphs’s servers to bear, and it would clarify what WAR means immensely.

• Erik says:

FIP is used for a simple accounting reason. Pitchers get full credit for things they are responsible for fully (K’s, BB’s, HR’s) and partial credit for everything else (IP). Defensive players are then given defensive credit, and all sequencing issues are left out.

That said, perhaps it is time to replace FIP with SIERA. The only difference is that SIERA takes into account batted ball type and weighs that accordingly (though not necessarily accurately).

I have a feeling the reason FG doesn’t do this already is because the SIERA page on FG says this about trying to calculate the stat yourself: “Good luck.”

To Erik:

SIERA = 6.145 – 16.986*(SO/PA) + 11.434*(BB/PA) – 1.858*((GB-FB-PU)/PA) + 7.653*((SO/PA)^2) +/– 6.664*(((GB-FB-PU)/PA)^2) + 10.130*(SO/PA)*((GB-FB-PU)/PA) – 5.195*(BB/PA)*((GB-FB-PU)/PA)

BAM. Formula for SIERA (as per http://www.baseballprospectus.com/article.php?articleid=10027). Looks terrifying until you realize that it’s just addition, multiplication, and a few exponents. At any rate, it’s not like Jeff Sullivan and Dave Cameron are sitting in a shack somewhere with abaci and clipboards cranking away at these calculations to the 50th decimal; it turns out that computers are real good at that stuff, and they’d make quick work of a few hundred pitchers’ SIERA calculations per year. So I reject this explanation–it’s not computationally intensive, at least not for a computer.
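To illustrate how little computation is involved, here is the formula above transcribed into Python. The coefficients are copied from the comment. Two things are assumptions of this sketch: the function and argument names, and the handling of the “+/–” term, which is taken here to be negative when (GB-FB-PU)/PA is positive and positive otherwise. Check the linked article before relying on that sign rule.

```python
def siera(so, bb, gb, fb, pu, pa):
    """SIERA from raw counts, following the formula quoted in the comment.

    so, bb: strikeouts and walks; gb, fb, pu: ground balls, fly balls and
    popups; pa: plate appearances (batters faced).
    """
    k_rate = so / pa
    bb_rate = bb / pa
    net_gb = (gb - fb - pu) / pa

    # Assumed sign convention for the "+/-" term (see lead-in):
    # negative for ground-ball-leaning pitchers, positive otherwise.
    squared_sign = -1.0 if net_gb > 0 else 1.0

    return (
        6.145
        - 16.986 * k_rate
        + 11.434 * bb_rate
        - 1.858 * net_gb
        + 7.653 * k_rate ** 2
        + squared_sign * 6.664 * net_gb ** 2
        + 10.130 * k_rate * net_gb
        - 5.195 * bb_rate * net_gb
    )

# Hypothetical pitcher: 200 K and 50 BB in 1000 PA, with ground balls
# exactly balancing fly balls plus popups, so every net_gb term drops out.
print(round(siera(so=200, bb=50, gb=300, fb=250, pu=50, pa=1000), 3))
```

With the batted-ball terms zeroed out, the result depends only on the strikeout and walk rates, which makes the example easy to verify by hand.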

• I’m probably taking you too literally, Erik, but I don’t think it’s fair to say that pitchers are fully responsible for BBs, Ks, or especially HRs — the batters have a lot of say in that. As the name FIP implies, however, they are fielding-independent (more or less).

• AC_Butcha_AC says:

Sorry, but you are still not really getting my point.

SIERA is superior to FIP in telling you what you EXPECT to happen in the future, not what actually happened.

You are right that wOBA gives Kozma full credit for a 10-HR playoff run. It does not tell you that his HR/FB was unsustainably high or anything. Or that the context mattered (wOBA = context neutral). Common sense and a different approach would tell you that this was unsustainable. How would you do it? Probably HIT/fx or some kind of how hard was it hit in location x with hangtime y and so on. I think you get it. This can tell you he got lucky. Maybe all of his HRs had incredible aid from the wind so they just barely made it over the fence. wOBA will give him credit for that.

BUT wOBA isn’t the best at telling you what WILL happen in the future. Would you really like WAR to be calculated for what WILL HAPPEN? If yes, then your SIERA-WAR is just like that. It is a stat that is superior in predicting the future, whereas FIP is also relatively good at predicting the future but tells you more than ERA about what the pitcher is responsible for.

And another point of yours: Hitters have been shown to have considerable control over their BABIP and HR/FB%. Pitchers have been shown to have very little control over their respective BABIP and HR/FB%.

SIERA is predictive because it is thought to better estimate a player’s “true talent”. In other words, it’s better at separating the luck part from the skill part than FIP. So if we want skill-based WAR, use SIERA. If we want results-based WAR, use something else (like runs or win probability or who knows). FIP is somewhere awkward between those.

Pitchers have control over BABIP.
http://www.baseballprospectus.com/article.php?articleid=20140
The argument is simple. We know that pitchers control batted-ball type stably (there are groundball pitchers and flyball pitchers, for instance). We know that the BABIPs of groundballs and flyballs are different. Ergo, pitchers control BABIP via batted-ball mix (and other stuff too).

For pitchers, we treat BABIP as though it is luck and necessary to be regressed out (even though it isn’t all luck; see above).
For batters, we treat BABIP as though it is skill and worth keeping in (for WAR or wOBA or what have you), even though we know some BABIP fluctuations are luck. By the way, while perhaps batters have some control over BABIP and HR/FB, certainly they experience lucky swings which are nothing more than fortune–and we can figure out which batters have luck-driven high BABIPs (e.g. slow, slap hitters) vs. which ones have skill-driven high BABIPs. It’s not an indefatigable mystery, it’s a place where we could more accurately hone in on player skill, if that was desired.

It’s a strange and confusing triple-standard; we regress out lucky things for one player type, pitchers, but not all the way (as far as we could, via SIERA), while we leave lucky things like BABIP and HR/FB in for batters. All I’m saying is, respectfully, that decision does not make sense.

• B N says:

Re: “FIP is just context-neutral.”

Are baseball parks not a context anymore? Because I can assure you, some balls are going to be a HR in some parks, but not in others. And given that each pitcher plays about half his games in one park, that would seem like a pretty big penalty for a guy playing in Yankee stadium instead of Petco, no?

Face facts: FIP does give pitchers credit for all sorts of context.

Context included in FIP:
– Batter interactions (Facing pitchers; AL vs. NL)
– Park effects
– Sequencing/Leverage (More likely to pitch around and give up a BB in certain situations)

Context included in wOBA:
– Pitcher interactions (but not as much league split)
– Park effects
– Sequencing/Leverage (Same issue as pitchers, plus shift effects)
– Lineup position (8-hole in NL gets more walks)

Worse are the issues of attributing no control over things a pitcher clearly does have some control over. These are the things that FIP sticks its head in the sand about:
– Batted ball types (particularly LD% and GB%)
– HR/FB% (Small but consistent differences exist)
– Stretch vs. Windup performance (impacts LOB%)

FIP is useful. It’s a great one-year quick stat, due to the variance it excludes. However, I’d certainly trust RA/9 as a more meaningful career stat. If I then wanted to factor out defense and park effects, I’d do it by scaling for the fielders and parks behind them. FIP is not “fielding independent.” It’s fielding “ignorant.” Factoring out defense would be making an independent stat. It’s important to know the difference.

For the record also, ERA is garbage. I can’t imagine why it’s the gold standard for comparison. It’s basically RA, with a near-arbitrary adjustment factor. Ground ball pitchers result in more fielding, which results in more errors, which results in more runs.

• AC_Butcha_AC says:

Okay this is like talking against a wall now…

Let me get some things clear:

First of all, FIP-WAR on FG is park and league adjusted. Period.

Secondly, FIP DOES INDEED give indirectly credit to low-BABIP pitchers. How? – Because IP are in the denominator.

Example: Let’s say Rivera has a true talent BABIP of .270. Since IP is included in the FIP-formula as a denominator he gets real credit for batted balls turned into outs. Not K’s or HR’s or BB’s just batted balls turned into outs. Flyouts, groundouts, popups… there you have it.

If you were to use SIERA for pitcher WAR, one could also call it ASSUMPTION-WAR.

Luck is clearly a factor in both hitting and pitching. FIP and wOBA have the same approach via the linear weights and by being context neutral. wOBA gives the same credit for a HR scratching the wall and a mammoth shot. Who is likely the player with greater “true talent”? Probably the one with the mammoth shot. Does it matter? NO. Same with FIP. Exactly the same. No assumptions about how we would expect the wall scratcher to be caught 75% of the time and therefore giving the hitter only credit for 0.25 HR. That’s what SIERA-WAR would give you.
IMO it doesn’t make sense to use SIERA for WAR calculations. That is not what WAR wants to measure.

SIERA is a great stat for predicting the future. Your approach is more of a “prediction WAR”. It doesn’t tell you though what really happened but what we expect to happen going forward.

HR/FB can fluctuate greatly y-t-y for pitchers. But FIP doesn’t regress this by any means. We got xFIP for that. Do you want to take HRs out of WAR?

http://www.fangraphs.com/blogs/on-context-or-evaluating-hitters-and-pitchers-differently/

This is a great tool, and the fact that I’m about to suggest one improvement to it doesn’t diminish in the slightest my appreciation for your having put it there to begin with. Here’s that improvement: it would be really nice to be able to put in, on one of the axes, not merely a defined stat, but a simple arithmetic operation relating two stats. Example: for years I have wondered whether pitchers with unusually high HBP rates have other oddities in their performance, and whether high HBP rates track over the course of a career. (I’m pretty sure the answer to that second one is yes.) There is no HBP% stat in your list, but it’s easy enough to define one by just taking the ratio HBP/PA. Add the ability to take such ratios (or products) in defining the stats for the plot, and we’ve achieved analysis nirvana…

• Great idea! I think that would have to be more of a redesign than a quick change. In the meantime, you can just download the spreadsheet (green icon at the bottom of the Web App) and pick out some column on the “Data” sheet that you don’t care about. Then you just would enter a formula on that whole column (and change the name at the top of the column to whatever you want). But in a bit, I’ll add HBP% and whatever other suggestions I can accommodate.

5. Mac says:

Ctrl+Shift+B

6. TKay says:

WOW. Thank you!

7. Bill says:

This is really spectacular. I’ve been thinking of trying some modeling on pitcher success in year n+1, and these year to year correlations make picking between correlated stats (e.g. swinging strike, K/9, K %) a lot easier.

8. cass says:

I had forgotten about SIERA. Seems it holds up quite well even against the more recent estimators. Most predictive stat easily found on FanGraphs, it would seem.

Of course, this made me look up the 2013 leaderboard for SIERA with Matt Harvey atop it.

I do wonder, though. I notice that Drew Storen fares very well by SIERA (and xFIP), yet had a poor year last year. I saw an article showing the location of his pitches before and after coming back from demotion to AAA and it was clear that beforehand, he was throwing tons of fastballs over the heart of the plate. I can’t help but think this pitch f/x data could be used to make a better predictor. I know there’s been work with edge%, but what about just identifying terrible pitches likely to be clobbered in order to find some of these pitchers who put up good peripherals but still allow a number of runs?

Article about Drew Storen’s pitch location in 2013: http://www.washingtonpost.com/blogs/nationals-journal/wp/2013/11/22/inside-drew-storens-turnaround/

• gump says:

most of the difference in ERA-xFIP for Storen comes from his unusually low LOB%

9. TKay says:

Contact% has a pretty strong correlation to next season ERA (.310)

I’ve been obsessing over Contact% because I think it’s a big factor in the Pirates signing of Volquez, and Liriano and Burnett before him.

• Indeed it does. Contact% and Z-Contact% also correlate [relatively] well with BABIP. That’s why I included Z-Contact% in those BERA formulas of mine.

10. Paul says:

If ERA is so hard to predict, then why not focus on predicting earned runs or innings pitched individually?

• Bip says:

Earned runs is a function of both innings pitched and effectiveness (which is measured by ERA and related rate stats). So in order to guess ER, we have to have a guess for his ERA already.

Let’s put it this way: We predict a pitcher will give up 50 earned runs next year. Because he has had durability problems, we also think he will throw about 150 innings, for a 3.00 ERA. Next season he actually throws 210 innings and gives up 70 earned runs. So our IP estimate was way off, but we guessed his effectiveness perfectly. However, by missing on his durability, we also missed on his ER estimate.

Let’s say the same guy pitched 200 IP with 50 ER. Our IP and ERA guesses were way off, but our ER guess was perfect. Did we do a good job of predicting? No, we made a guess of 50 ER and only by chance were we correct.
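The arithmetic behind these scenarios is just ERA = 9 × ER / IP. A small sketch (the helper name is made up) that works through the three cases in the comment:

```python
def era(earned_runs, innings):
    """Earned run average: earned runs allowed per nine innings."""
    return 9 * earned_runs / innings

projected = era(50, 150)   # the projection: 50 ER in 150 IP
scenario_1 = era(70, 210)  # more innings, same effectiveness
scenario_2 = era(50, 200)  # ER guess exactly right, effectiveness guess wrong

print(projected, scenario_1, scenario_2)  # 3.0 3.0 2.25
```

In the first scenario the ER projection misses by 20 runs while the ERA projection is exact. In the second, the ER projection is exact while the ERA projection is off by three quarters of a run.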

11. Man, “export” and “cor()” are going to get lonely on a lot of computers. Nice!

12. cass says:

A serious question: Why are we predicting ERA rather than RA?

Also: Why does Fangraphs feature ERA and not RA?

It’s known that to measure run prevention, RA is more reliable than ERA for a variety of known reasons. So why, coming up on 2014, are we even talking about ERA on a baseball analytics website? I know they’re close, but I’d much rather know RA than ERA. There is literally no reason to prefer ERA to RA. Can we just replace ERA with RA everywhere on this site? I’d be very happy with that.

• Summertime says:

The statement “there is literally no reason to prefer ERA to RA” is pretty tough to back up. You want to try that? Show how there is absolutely no way in which ERA is preferable?

• Brandon Firstname says:

It over-rates groundball pitchers, since there are roughly six times as many errors on ground balls as on other batted ball types. I’ve always found this to be a flaw in ERA. Maybe I’ll examine this more closely later to get some exact effect numbers.

• Jay says:

2007-2013, MinIP 200:
469 pitchers
RA9-ERA
Top 10% FB% = .28
Top 25% FB% = .29
Average = .35
Top 25% GB% = .40
Top 10% GB% = .43

• Thanks for doing that, Jay! Definitely worth consideration, but not a drastic difference, I think.

I just did the same sort of thing for team RA9-ERA vs. team UZR (also 2007-2013):

Bottom 10% UZR = .41
Bottom 25% UZR = .39
Top 25% UZR = .30
Top 10% UZR = .32

The correlation between RA9-ERA and UZR was only -.38, btw, so it’s not an amazing predictor or anything. But the RA9-ERA vs FB% correlation in your sample is only -.28, so neither is that.

Anyway, it seems RA9’s ignorance of defensive differences is basically as big of a shortcoming as is ERA’s failure to appreciate the difference between GB and FB pitchers when it comes to errors.

• Would you guys be supportive of team UZR-adjusted RA9?

• Newcomer says:

The only problem with UZR adjustments is that you have to factor in team defense not just when the pitcher is on the mound, but when all of the pitchers are on the mound, but I still think there’s value in making the adjustment.

You can probably adjust RA9 with UZR and come up with a better number. It would just be inexact. I would probably prefer something like SIERA, which takes the batted ball distribution and weights it more or less to resemble a typical defense. There’s so much luck involved with turning BIP into outs (hence the very small spread of BABIP talent that takes such large samples to determine). But UZR-adjusted RA9 would be another angle to look at, and if it doesn’t agree with similar metrics, then you might see where something is up.

• Newcomer says:

The beauty of using FIP and xFIP is that they focus so precisely on the factors that pitchers really can control. With defense, sequencing, and the simple luck of whether the ball goes 3 feet to the left and thus within the range of your shortstop, there’s just so much noise in the RA9 (and ERA) data. Attempting to correct for some of the biases in that data and look for the signal within the noise is a good thing, but the vast majority of that signal is seen clearly, without so much noise, in the linear weights-based metrics that ignore what the defense actually does with the ball: FIP, xFIP, and SIERA (and the others).

I like RA9 as a raw metric, with all of the noise in it, and then the other metrics for specificity. Any incongruities can then be examined more closely, and that’s where a UZR-adjusted RA9 could have a use. I wouldn’t replace RA9/ERA with uzrERA, but it could be a useful addition. I can’t remember, but isn’t that essentially what PZR is? I couldn’t tell you off the top of my head where you can find PZR…

• tz says:

“Would you guys be supportive of team UZR-adjusted RA9?”

Yes, absolutely. I’d also love to see a UZR-adjusted tRA, on the grounds that tRA seems to be the best descriptive stat for a pitcher’s influence on batted ball type, and UZR right now is the best descriptive stat on the fielders’ influence on those batted balls. The remainder of RA9 after those two items would be a descriptive stat of “sequencing plus noise”.

• Thanks tz. Ideally, it would just be the UZR of the team specific to when the pitcher was on the mound, I’m thinking. They talked about this idea a bit here: http://tangotiger.com/index.php/site/comments/war-for-pitchers-on-fangraphs

On the subject of the descriptiveness of tRA (or its ERA-scaled version, tERA), I thought I’d show the following list of same-year correlations with ERA:
wOBA 0.87
FIP 0.75
BERA 0.74
tERA 0.74
MBRAT 0.69
pFIP 0.68
SBERA 0.66
SIERA 0.62
xFIP 0.61
kwERA 0.56
K% -0.51
TIPS 0.45

I didn’t make it clear here, but batted ball profiles are central to my stats BERA and SBERA (as they are for MBRAT and SIERA). tERA doesn’t fare so well as far as prediction goes, though, as you can see in the tables in the article. I think that’s probably because LD%, while critical, has a ton of randomness to it — it’s less predictable than BABIP. Pitchers probably have only a tiny bit of influence over it (at least at the MLB level — MLB hitters could probably hit line drives off of me pretty much at will). Which brings us to the question — is the important thing what did happen, or what should have happened?

• cass says:

ERA relies on scorer judgement and does not describe what actually happened. If something deemed to be an “error” occurred, then everything after two outs is ignored even though it actually happened. The scorer basically makes guesses about what would have happened if that “error” hadn’t occurred.

And yet nothing happens if a fielder makes an extraordinary play. If a fielder robs a home run on an incredible catch, the pitcher still gets credit. So all the runners on base, the batter, and anyone who would have scored afterward all do not count as earned runs even though the pitcher had nothing to do with that.

Errors make the situation of separating fielding from pitching worse, not better. It’s always better to use RA than ERA. Often, the difference is minor, but it can add up over a season.

I’ll also flip this: Why on earth would you want to use ERA rather than RA? When expressing a preference for a convoluted stat that involves the judgment of the person scoring the game heavily, the onus is on the person preferring it. If we do not have a substantial reason to prefer ERA, then the default should be RA.

Also: I am rehashing things expressed many times by many people. There is a vast consensus in the sabermetric community that RA is superior to ERA. This is not something about which there is much disagreement at all. And yet we’re still using ERA. Why?

• You and Brandon make great points. Yes, there are big problems with using ERA, but I really don’t see how the problems with RA/9 aren’t just as big, if not bigger. I just don’t believe that the judgment of the scorers is so bad that we have to discount it entirely. Would the average of RA9 and ERA be an acceptable compromise? Perhaps some adjustment for fly ball and ground ball pitchers is also in order?

Really, I think what needs to happen is for a model involving HitF/X and/or Inside Edge fielding data (like in the new spray charts) to take precedence.

• Bip says:

I totally agree with cass. First of all, yes they both have the same problem, but they exist because ultimately runs allowed is what determines the outcomes of games. However, unearned runs count as much as earned runs in the game, so we should be trying to predict both.

Predicting ERA and RA will always be difficult because of factors like defense, but RA gives us a much more straightforward way of interacting with defensive metrics. Fielding metrics simply measure how efficient a defense is at converting balls in play into outs, and both errors and great plays are weighted appropriately with those measures.

If we predict a pitcher’s ERA, and then try to adjust it based on his defense’s expected defensive efficiency, you’re going to overestimate the ERA of players on defenses that are rangy and error-prone, and underestimate it for defenses that are sure-handed but not flashy.

• Newcomer says:

Steve, I don’t see any reason to include ERA in the discussion. ERA involves the scorer’s judgment only on those plays that are within the range of a fielder. If the fielder has poor range, then there is no “error opportunity.” Thus errors, and consequently ERA, are a flawed measure of defense and the interaction between fielders and pitchers. They introduce a systematic bias into the data, crediting pitchers who concede ground balls and pitchers who play in front of a rangy (read: good) infield. That’s before you even get into the discussion of whether the scorers are doing a good job a) judging whether or not a play would require “ordinary effort” and b) re-constructing the inning to ostensibly remove the influence of the error.

For more solid arguments, consult Tango’s and MGL’s sites (I’m not sure if MGL has yet written about this on his blog, but he’s definitely written about it in other places). The only reasonable pro-ERA argument is tradition and inertia, and if those are convincing arguments, then why bother with SBERA and xFIP when you can use pitcher Wins and RBIs?

• Newcomer says:

And because that sounded snarky, understand that this is good work you’re doing, and we appreciate it! Using ERA is not an egregious error, because it correlates so well to what we really want, which is RA9. It’s a bigger difference philosophically than it is numerically. It still does make a difference, however, especially over a larger sample (where the systemic bias will grow with the signal, as opposed to random noise that will be slowly overwhelmed by increased data).

So, good work! It’s a minor objection that doesn’t really detract from the quality of your contribution, but we hope to convince you (and the whole community) to just ditch ERA altogether. :)

• Thanks for your input, guys. Yes, ERA is not great, but neither is RA9, really. Clearly, some defenses legitimately do commit more errors on non-difficult plays than others, and it seems that should be taken into account in some way. Do you not think they could be combined in a way that partially covers each of their weaknesses?

For what it’s worth, the ERA estimators I tested just now (BERA, SBERA, SIERA, and FIP) all correlate very slightly better with RA9 than with ERA.

• Newcomer says:

Amen! RA9 is superior to ERA, which is a poor construct that tells us “how many runs the pitcher would have allowed, if an arbitrary selection of plays had been made instead of not made, and events that followed would have happened as the scorer guesses they would happen,” which is not something anyone should really want to know. I prefer “RA9” instead of “RA” just to avoid confusion with the counting stat Runs Allowed.

13. Anthony says:

What is -K%? I’m surprised I’ve never seen this stat before, considering how strong the correlation is (relative to other estimators).

• cass says:

It’s just negative K%. It’s to show that there is an inverse relationship between K% and ERA. As K% goes down, ERA goes up.

• Right, thanks cass. I should have explained that, sorry.

• Bip says:

It’s confusing, because many stats here are denoted with ‘-’ to show that they are scaled so that 100 is average, and lower values are “better”. -K% seems like it would be that concept for K%.

• Yeah, I know what you mean. But the use of “-” in those types of stats doesn’t make a ton of sense to me, seeing as how “-” actually means something entirely different. Maybe stats like that need a symbol like “a” (for adjusted), or something uncommon, like “~”.

14. Pitnick says:

I’ve always wondered what the correlation would be if you picked a static number for all pitchers (say, last year’s average ERA)

• someone says:

What do you mean? You mean set all pitchers to have an ERA of 4.00 and see how that correlates to their actual ERAs? In that case, mathematically it’s very easy to show that there would be no correlation at all.

• Pitnick says:

Really? How?

• Bip says:

I forget the math, but conceptually, a correlation measures a relationship. How much and how reliably do we see one observed value change in a predictable way relative to variation in our predictor. As SIERA goes up, we expect that observed next-year ERA will go up, in general.

In this case, 4.00 is our predictor, but it never changes, so you can’t find a relationship to it. The idea of correlation is to say that variation in a predictor explains a certain amount of variation in the result. Variation of 4.00 cannot be said to predict anything, because it does not vary.

• Pitnick says:

I see. Thanks.

• someone says:

Bip explained it well, but if you are interested in the math, look at the first formula for “r” here:
http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient#For_a_sample

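In the numerator, let’s say league ERA is X. For every Xi you would have 4.00, but Xbar (the average of all the 4.00’s) would be the same value, hence Xi – Xbar is 0 for all i. Thus the numerator of r would necessarily be 0. (Strictly speaking, the denominator is 0 as well, so r is undefined rather than exactly 0, but either way there is no relationship to measure.)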
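To make the constant-predictor case concrete, here is a small sketch of the sample Pearson formula written out by hand. The ERA values are made up, and returning `None` for the zero-variance case is a choice made for this sketch; other tools may report an error or NaN instead.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample Pearson r, or None when either input has zero variance."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # numerator and denominator are both 0, so r is undefined
    return sxy / sqrt(sxx * syy)

actual_era = [3.10, 3.85, 4.40, 4.95, 2.75]    # made-up next-year ERAs
constant_guess = [4.00] * len(actual_era)      # same guess for every pitcher

print(pearson_r(constant_guess, actual_era))   # None
```

Every deviation from the mean on the constant side is zero, so both the covariance term and the variance term vanish, which is why no correlation can be computed.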

15. dang says:

This is the most awesome thing I’ve seen in a long time. I just spent hours playing with the different stats and just looking at correlations. Amazing. There are a couple of small things missing that I’d like to see, specifically SV%. SV and BS have a good correlation, and that makes sense because they are counting stats and the more save opportunities you allow someone to have, the more blown save opportunities they will have. I’d like to be able to compare SV% to things like FBv, K%, etc.

Regardless I could probably play around with this for days. Everything I’m doing is telling me a little bit more about the game which allows me to be more informed when I try to evaluate a pitcher. Really fantastic stuff, thanks.

• Thanks! Good idea! I’m thinking I’ll upload a new version that replaces most of these counting stats with rate stats that make a lot more sense to compare.

16. Interested says:

Do you have any reading on SBERA? Can’t seem to find anything linked anywhere.

17. tangotiger says:

Runs allowed includes the contributions of the fielders. Therefore, it is not “descriptive” to the pitcher’s contributions, but rather, to the team’s contributions while the pitcher was on the mound.

FIP takes an agnostic position on all balls in park (as well as baserunning events, and sequencing of events).

WAR (at Fangraphs), by not taking an explicit position on all of that, can either be said to be:
(a) remaining agnostic
(b) presuming average for all players regardless of what happened

• tangotiger says:

We wouldn’t, for example, consider a pitcher’s Won-Loss record to be “descriptive” of his contributions would we? No, because it’s obvious how much of that is polluted by his team’s offense, fielding, and bullpen.

A pitcher’s ERA, while not as bad as his W/L record, still shares the same kind (if not degree) of pollution.

wRC+ also includes the contributions of the fielders (of the opposite team). It includes this primarily through BABIP, which we know has large year-to-year swings; while sometimes these swings are a result of batted ball profile changes (LD% and so on), other times these year-to-year swings are not predictive and a result of dumb luck (playing lots of teams with poor fielders, or balls just happening to fall in for hits, or whatever).

I know various commenters have been down this road before, and I am also well aware that I know much less about baseball than you, tango. So I’m probably wrong, but here goes.

It still strikes me as odd that Fangraphs attempts to disentangle the effect of pitcher ability vs. luck, but only somewhat (FIP is not as good at that as other stats [e.g. SIERA]), and only for pitchers (there’s no xBABIP-adjusted batter WAR).

The thing is, WAR isn’t agnostic–and no statistic could be. If WAR is an estimate of a player’s value, then we need to consider what value it is estimating. Specifically, is it estimating the contribution of a player to a win as the game was played [sidenote: forget ERA, as everyone has noted, it is no good. Maybe some tweaked version of Win Probability Added would be better]–or is it estimating the contribution of a player if the game were replayed 1000 more times in controlled, neutral circumstances (under a frequentist paradigm). Descriptive WAR might be the former, predictive WAR the latter.

FIP estimates the pitcher’s value insofar as it is described by those events (HR, BB, K). It exists in the space between those extremes. It subtracts out some of the events that are nonpredictive, like the fielding contributions of other players, but then fails to consider other things that are predictive (things which SIERA apparently does consider, because it’s more predictive).

I’m not advocating for a revolution, and I understand the idea that FIP-WAR is a useful tool, and all that. I just think it exists in a weird intermediate space where it regresses out some of the contributions of other players, but not others. If SIERA really is better at isolating a player from his surroundings, and we want to know specifically how good that player is independent of his surroundings, maybe we should use SIERA-WAR.

“WAR (at Fangraphs), by not taking an explicit position on all of that, can either be said to be:
(a) remaining agnostic
(b) presuming average for all players regardless of what happened”

In summary:
To a) I would say: the thing is, there’s no such thing as being agnostic–you are assigning skill and so how you partition the variance in outcomes to error as well as each player’s skill determines how we view the skill of each player. To say that we assume some amount of variance is due to error is implicitly to take that variance away from the player.

To b): we no longer have to assume that a player’s performance on balls in play was league average. We can actually look that up, and to some extent at least, isolate out how much the pitcher had to do with it (in terms of preventing line drives or hard fly balls, etc.) and how much was luck. Because of this, we can incorporate it into our model of WAR and thus get a better idea of whether a given sequence of events was due to the player or his teammates.

• tangotiger says:

How about pitch-framing skill? We know it exists. And yet, in the Fangraphs version of WAR, by not EXPLICITLY giving Molina any value for it, it either
a) remains agnostic
b) presumes he’s league average at framing

We can do this about everything.

FIP does the EXACT same thing. By only considering the 30% of PA that excludes fielders, it’s saying “hey, Felix is great without his fielders”. And how about when he uses his fielders? Well, FIP remains agnostic.

WAR takes a position of either a or b, however, you, the reader, want to interpret that.

I choose to take it that it’s a, agnostic. WAR on Fangraphs simply says “I don’t know, and so, I’m not going to give him plus or minus on that particular performance; we don’t have that performance properly noted enough, and so, I’m not going to assume he was great, or terrible, with balls in play”.

Can’t we just leave it at that (until more evidence comes in)?

• Bip says:

WAR is not agnostic about how good the batters Felix faces are at working walks and avoiding strikeouts. It’s not agnostic about whether Felix has benefitted from pitch framing. It assumes he is 100% responsible for his walks and strikeouts, but he isn’t.

There is a line that has been drawn that says “pitchers don’t exert enough influence on anything on this side for them to be included in FIP/WAR, but they do exert enough influence on things on that side.” Obviously the line is not totally arbitrary. Obviously it is also useful, as drawing that line gives you a stat that is more useful than ERA. However, that line is usually taken for granted, and the elements of pitching are often oversimplified as being “in a pitcher’s control” or “out of a pitcher’s control.” I, for one, would like to see more discussion on the degrees of control, and some reasoning as to why the line was chosen where it was (ex. why is WAR not “agnostic” towards HR/FB rate?*)

*A little more about HR/FB rate. xFIP is the same as FIP, except xFIP is agnostic to HR/FB rate and FIP is not. However, FIP and xFIP are often represented quite differently here. FIP is said to be “descriptive” and xFIP “predictive”. How does becoming agnostic to BABIP and LOB% not make FIP predictive as well? How can FIP exclude some events and still be said to be a record of what happened, while xFIP cannot?
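For readers following the FIP vs. xFIP distinction, here is a rough sketch of the two formulas as they are commonly published. The league constant and the league HR/FB rate change every season, so they are left as parameters; the numbers passed in below are placeholders chosen for illustration, not real league values, and innings are assumed to be a true decimal.

```python
def fip(hr, bb, hbp, k, ip, constant):
    """Fielding Independent Pitching from HR, BB, HBP, K and innings pitched."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant

def xfip(fb, bb, hbp, k, ip, constant, lg_hr_per_fb):
    """Same as FIP, but actual HR is replaced by fly balls times the league HR/FB rate."""
    expected_hr = fb * lg_hr_per_fb
    return fip(expected_hr, bb, hbp, k, ip, constant)

# Placeholder season line and league values (assumptions for this sketch).
line = {"bb": 50, "hbp": 5, "k": 200, "ip": 200.0, "constant": 3.10}
print(round(fip(hr=12, **line), 2))                         # pitcher allowed few homers
print(round(xfip(fb=200, lg_hr_per_fb=0.10, **line), 2))    # homers set to league rate
```

The only input that differs is the home run term, which is the sense in which xFIP ignores a pitcher's own HR/FB rate while FIP does not.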

“Can’t we just leave it at that (until more evidence comes in)?”

I guess I would say no. I understand your opinion and can grasp the good reasoning behind it… But we know that pitch-framing is a skill. We know that pitchers+hitters generate lower BABIPs on pitcher’s counts. We know that, for example, Mr. Felix is good at getting batters into pitcher’s counts, and so some of his BABIP variation is attributable to that skill.

The evidence is here! Let’s bring it in. Let’s incorporate pitch framing and xBABIP and all the other neat stuff very smart people have discovered. Because SIERA is better at predicting pitcher ability than FIP. Because we can actually regress hitter BABIP to discover a hitter’s true ability.

I hate to keep banging this drum, but I think it’s important. When you regress the outcome of a play, there’s a certain amount of variance in that outcome (basically from K, the worst outcome, to HR, the best outcome). The variance can be ascribed to the batter, the pitcher, the fielders (including the catcher, let’s say Mr. Molina), and also luck (also known as the error term of a statistical model). When you ascribe the outcome of a certain kind of play only to luck, that is to say, the error term, you take away a certain amount of the variance from the players–specifically the pitcher and batter and fielders. It is a zero-sum game. Anything that isn’t luck has to be skill, and vice versa. And since we can now measure the luck vs. the skill, especially with pitch framing, let’s do it.

We know now that some of the stuff that FIP says is luck isn’t. For that reason, FIP is not agnostic. It is systematically underrating the awesomeness of King Felix, because he kicks ass in a way that FIP says is dumb luck. Because the variance in outcome is finite, FIP-WAR isn’t agnostic. It’s somewhere along the continuum of descriptive-vs-predictive, because it has to be. It can’t escape it by saying “I don’t know”; it explicitly assigns some things to error, like BABIP and pitch calls, which aren’t all error. Which we know aren’t all error. So I say–let’s improve these measurements.
____
(I’m totally surprised anyone responded to my wall-of-text rant, but thank you)

• Brandon Firstname says:

The note about Felix getting into pitcher’s counts is interesting. I wonder if we can look at which counts a pitcher allows balls in play in so that we can get smarter about what their BABIP should be.

This is definitely something that requires some further digging. Maybe we’ll finally be able to explain Matt Cain.

18. Charlie says:

Wow. Incredible. Because this is the season for greed, having one of these for hitters would be mind-blowing.

19. OK everybody, I just made an update that includes some of the stuff you’ve been asking for. Let me know if I just broke something, alright?

20. Tanner says:

Really awesome stuff Steve. Keep up the amazing work.

• Thank you very much! (Ditto for everybody else I forgot to thank earlier)

this tool is totally awesome and thank you.

22. reillocity says:

This is great, Steve. As neat as the one-to-one correlation tool is, the data file itself is awesome and essentially enables the downloaders to go even a step further and build their own multiple regression equations using Excel’s Regression feature in the Data Analysis tab.

23. teufelshuffle says:

Is there any way to include wOBA-against in the correlation? Why use different stats to credit pitchers and hitters for the same event, when runs scored is a zero-sum game between them?

• Yeah, I’ve always wanted to see wOBA against, too. I just now stuck it in there (and made it one of the defaults). Somebody may want to double check the numbers on it, because I calculated them myself.

Notice FIP correlates better with next-year ERA than does wOBA against. I imagine that has a lot to do with the unpredictability of doubles rates for pitchers. Of course, wOBA’s correlation to same-season ERA is extremely high, as it’s more descriptive.

24. Stephen Brown says:

Hey Steve,
Great work on this!
You may have already done this, but I made a macro for this workbook that, when given a list of stats, spits out a sheet of all the correlations for those stats, instead of having to go through them one-by-one. You just input a stat for your “Stat 2” cell, a respective year for that stat, and the list of stats you want, and it does the rest. If you want it, just send me an email and I can send you either the file or just the code.

• Thanks Stephen!

You know, for my own use, what I tend to do is just insert rows above the data, and on each row I’ll show the correlation of a particular stat to whichever stat is in the same column. I just use fixed references for the aforementioned “particular stat” and copy it all the way across. Then I’ll sometimes copy all those correlations somewhere else and sort them in order of their absolute value. So it’s more efficient for looking at a lot of stats, but less efficient for customizing one particular comparison than what I did here. Plus it’s probably more boring, with no graph and a lot of numbers to comb through.
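Outside of Excel, the "correlate one stat against every column, then sort by absolute value" routine described above might look roughly like this. It is only a sketch: the column names and values are made up, and it assumes no column is constant (a constant column would make the correlation undefined).

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample Pearson correlation; assumes neither input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def rank_correlations(table, target):
    """Correlate every other column with `target`, strongest |r| first."""
    pairs = [(name, pearson_r(column, table[target]))
             for name, column in table.items() if name != target]
    return sorted(pairs, key=lambda pair: abs(pair[1]), reverse=True)

# Made-up columns for illustration only; one entry per pitcher.
table = {
    "ERA": [3.10, 3.85, 4.40, 4.95, 2.75, 3.60],
    "K%":  [0.27, 0.21, 0.18, 0.15, 0.30, 0.22],
    "BB%": [0.06, 0.08, 0.09, 0.10, 0.05, 0.08],
    "GB%": [0.45, 0.50, 0.42, 0.47, 0.44, 0.52],
}
for name, r in rank_correlations(table, "ERA"):
    print(f"{name:>4} {r:+.3f}")
```

Sorting on the absolute value keeps strong negative relationships (like K% with ERA) near the top instead of burying them at the bottom of the list.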

Anyway, yeah, I’d love to see your macro code, and I imagine others would as well, if you feel like posting it here. I’ll e-mail you, though.

• Stephen Brown says:

Per request:

Sub Stats_to_Include()
    Dim i As Long
    i = 2
    Do
        ' Set the stat selector on the Main sheet to the ith stat in the list
        Sheets("Main").Cells(3, 3).Value = Sheets("Loops").Cells(i, 1).Value
        ' Copy the stat name and paste it as a value into column 2 of row i
        Sheets("Main").Cells(3, 3).Copy
        Sheets("Loops").Cells(i, 2).PasteSpecial Paste:=xlPasteValues, _
            Operation:=xlNone, SkipBlanks:=False, Transpose:=False
        ' Copy the correlation and paste it as a value into column 3 of row i
        Sheets("Main").Range("B11").Copy
        Sheets("Loops").Cells(i, 3).PasteSpecial Paste:=xlPasteValues, _
            Operation:=xlNone, SkipBlanks:=False, Transpose:=False
        i = i + 1
    Loop While i < Sheets("Loops").Range("F1").Value
End Sub

You'll have to add a sheet named "Loops" with headings in the first row (columns A through C) as follows:
Stats to Include, Stats, Correlation

In cell F1 of this sheet add this formula:
=COUNTA(A:A)+1

Then just pick the stat and year you want to compare everything to on the main sheet, add the stats you want to include under the heading, and run the macro.

• Anthony says:

Hey Stephen, I’m very interested in your macro. Would you be able to help out with a problem I’m having? I set up the Loops sheet, including the headings, and input the macro code, but I keep getting an error saying “compile error: syntax error”. Any idea how to get around this? Do I need to add the stats that I want in the Loops sheet first for the macro to work?

25. TedWilliams8 says:

Hey Steve – Great work, I really enjoy reading and learning from all your posts.

I was just wondering, if you were feeling ambitious, whether you could think about including xxFIP as introduced by Chris Carruthers at:
http://www.breakingblue.ca/2013/12/18/one-estimator-to-rule-them-all-xxfip-part-3/

Thanks again!

• Thanks! Well, I gave it a shot, but I can’t get my results to match up with his. I’m assuming he meant FB%, not FB in his formula — otherwise the results end up way out of whack. He uses two constants that it seems he doesn’t disclose (“c” and “C”), so I tried to infer those, but nothing really works. So, one of us messed up, or we’re using slightly different data, I guess. Anyway, the xxFIP I’m getting for his formula (darn, I came up with a stat I was going to call by the same name, btw) has a 0.367 correlation to the next year’s ERA (2007-2013), for what it’s worth. Same as kwERA, better than TIPS, maybe a little better than FIP, but probably worse than most of the other estimators.

26. GWR says:

Would it be possible for the PITCHf/x x-movement values to be relative to the pitcher’s throwing hand, so that negative means the pitch breaks toward the pitcher’s arm side while positive means it breaks toward the pitcher’s glove side? (This way pitches from right-handers and left-handers would look more similar.) In the current state, the horizontal movement is essentially being multiplied by a dummy variable for handedness. This is very obvious when looking at the correlation between the horizontal movement of a pitcher’s fastball and his curveball, which is -.675. That looks like they are closely related, but they are not.

• Great point. Yes, it’s a problem. I think in the not-too-distant future, I’ll make a version of this tool that deals with handedness splits, so I’ll try to include your suggestion on that. For the version here, though, the file size is pretty close to maximum capacity — I don’t think it can handle that many more stats.
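As a rough illustration of the handedness adjustment suggested above, the sketch below flips the sign of horizontal movement for left-handers and compares the correlation before and after. It assumes the raw data uses a convention where a right-hander's arm-side movement is negative (reverse the flip if your source uses the opposite sign), and every number in it is made up.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample Pearson correlation; assumes neither input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def hand_relative(x_movement, throws):
    """Re-sign movement so negative is arm side and positive is glove side."""
    return x_movement if throws == "R" else -x_movement

# (throws, fastball x-movement, curveball x-movement), made-up values in inches.
pitchers = [
    ("R", -6.0, 5.0), ("R", -4.0, 6.0), ("R", -7.0, 4.0),
    ("L", 6.0, -5.0), ("L", 5.0, -4.0), ("L", 7.0, -6.0),
]

raw_r = pearson_r([fb for _, fb, _ in pitchers], [cu for _, _, cu in pitchers])
adj_r = pearson_r([hand_relative(fb, t) for t, fb, _ in pitchers],
                  [hand_relative(cu, t) for t, _, cu in pitchers])
print(f"raw: {raw_r:+.3f}  hand-relative: {adj_r:+.3f}")
```

In this toy data the strongly negative raw correlation comes almost entirely from mixing the two hands; once both pitches are expressed relative to the throwing arm, most of it goes away.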