A guy like Halladay makes his living off of being able to allow guys to hit the ball, but taking away nearly all their power.
Guys that don’t strike batters out and have a lot of ground balls against run the risk of the Josh Towers effect (being able to make good pitches which induce ground balls but having them all go through because the batters are still getting solid wood on the ball).
Thanks for the suggestion. Most baseball statistics are normally distributed if you limit the IP or PA restriction enough. ERA is pretty much normally distributed over 40 innings, enough that this would be a go.
I only have some exposure to non-parametric regression, but I’m curious what about this data makes you think kernel regression or anything non-parametric would be necessary. I figure that ERA should be monotonic with respect to each of GB, BB, and SO, but I admit that I don’t know enough about non-parametric regression to know. I do know that QERA by Nate Silver effectively assumes a log-normal distribution of ERA in its modeling and is less accurate. That’s not non-parametric, but maybe in the direction of what you’re thinking?
Yes, that would do better if I was trying to model same-year ERA, but what I’m trying to pick up with the net ground ball term ((GB-FB)/PA) is just a general angle off the bat skill tendency. Some pitchers are better at inducing IFFB/FB but those pitchers are often the ones that get higher FB% in general, so in general I’m trying to pick up a skill level as shown by angle off the bat.
I’m not sure how ERA or SIERA is actually distributed, so I don’t know how much more effective any nonparametric model might be. With a large enough sample size, though, you could probably increase R^2, although you would lose confidence intervals and coefficients for the specific variables. NP is essentially a black box, where you’d just get the output.
If you run the SIERA results through a residuals histogram, that will give you a sense if your results are distributed normally across the regression (basically bell curve) or not. If they’re not, then you may get different effective results at farther ends of the spectrum. In which case, a nonparametric model might help.
Even if that’s not the case, non parametric essentially goes with the grain of the wood, as it were, so with a large enough sample size (which it sounds like you may have) you might still get an increased R^2 over trying to superimpose a normal distribution (which it may be close to, but not exact)… You’re doing some excellent work, I was just curious if you’d thought to try it out that way… your inputs seem great, this is just a possible tweak in the modeling, which you may not need…
Okay. I just checked and SIERA, ERA, and (SIERA-ERA) are all pretty much normally distributed with at least 40 IP. Definitely a very sharp unimodal distribution, albeit not completely symmetric I don’t think. I guess since I’m not really trying to max out R^2 directly as much as develop something that would work out of sample (but in a similar run environment), it’s probably not worth the investment of learning non-parametric estimation, but I do appreciate the out-of-the-box idea.
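The residual check discussed above can be sketched roughly as follows. This is only a minimal illustration in plain Python: the residuals are synthetic draws standing in for (SIERA - ERA), not real pitcher data, and the helper names are made up for this example.

```python
import random
import statistics

def skewness(values):
    """Sample skewness: close to 0 for a symmetric distribution."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

def excess_kurtosis(values):
    """Excess kurtosis: close to 0 for a normal distribution."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 4 for v in values) / len(values) - 3.0

def text_histogram(values, bins=9):
    """Return one text line per bin: lower edge plus a bar of '#' marks."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    scale = max(counts) / 40 or 1  # longest bar is about 40 characters
    return [f"{lo + i * width:6.2f} {'#' * int(c / scale)}"
            for i, c in enumerate(counts)]

# Synthetic stand-in for (SIERA - ERA) residuals; NOT real pitcher data.
rng = random.Random(2011)
residuals = [rng.gauss(0.0, 0.9) for _ in range(3000)]

print("\n".join(text_histogram(residuals)))
print(f"skewness        = {skewness(residuals):+.3f}")
print(f"excess kurtosis = {excess_kurtosis(residuals):+.3f}")
```

A roughly bell-shaped bar chart with both statistics near zero is consistent with approximately normal residuals; a long tail on one side would show up as a clearly non-zero skewness.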
Kinda wonder about Glavine in terms of BABIP too. He was a moderate GB pitcher, but still had a .285 BABIP on a team (not league) with a .291 BABIP.
He gave up a higher % of LD than an average MLB pitcher, but his BABIP on GB was much, much better than average…what you would expect from an extreme groundballer.
With lots of LD, few Ks and a good, but not great GB rate, he should have had a terrible BABIP, but he still saved 70-80 hits compared to his mates.
I wonder how many “exceptions” there are to these guidelines. Enough to go back to the drawing board or too few to worry about changing the construct.
Increasing the R^2 should increase the out of sample results as well. But it totally sounds like your #’s are pretty close to normal, seems like you’re doing it right anyways, your results are coming in better than QERA and XFIP. I’m loving your series, really glad someone’s going through the process and explaining the steps. Thanks for the amazing efforts!
I did some research on Glavine. He threw a ton of change-ups and most of them low and away, both things that tend to coincide with weak contact. Of course he gave up his fair share of FB’s too, which is surprising. In fact, despite having a good career GB rate, there were a few years in which he gave up more FBs than GBs.
But the biggest clue seems to be in his splits: Glavine’s BABIP was about the same as his mates with nobody on, with batters leading off innings, when ahead in a game, when the game was a blow-out, etc. All the times in which it makes sense to go right after hitters.
His BABIP was far, far better when he had little run support, had runners in scoring position, was behind, and when the game was close. These were all situations in which his BB rate famously and drastically increased too. Not a coincidence.
Seems pretty obvious that he expanded his zone in these situations, causing batters to make weaker contact. Glavine’s BABIP seems very situation-based, which explains why he had a ho-hum BABIP in normal situations and “only” 5-6 points better than his mates overall.
I love finding the mysteries behind the outliers.
Maddux’s BABIP compared to mates, on the other hand, just seems to be a case of an extreme GBer whose BABIP on GB was much, much better than expected from a GB pitcher.
I don’t like the use of examples, because single examples are meaningless. I guess you are trying to use them to explain what you are saying, which is ok, but they are not evidence to support the claims.
• Pitchers who have higher fly ball rates allow fewer home runs per fly ball.
Example: Matt Cain
With a career 45% fly ball rate, Matt Cain is among the best at keeping his fly balls in the yard. He gives up mostly infield flies and shallow outfield flies, which is why his career HR/FB is just 6.8%. SIERA assumes that pitchers who allow more fly balls have below-average HR/FB rates, and that’s exactly what happens.
I thought average HR/FB was 10%? Cain is a FB pitcher, and his rate is 6.8%. Should it read “SIERA assumes that pitchers who allow more fly balls have ABOVE-AVERAGE HR/FB rates, and that’s exactly what happens”?
As I was typing that I realized you might mean below average in number, not in rank.
Several previous posters have mentioned IFFB’s. I’d like to bring them up again but suggest a different approach. No one has seriously questioned that IFFB’s should be included in FB’s. I’d like to propose that they are not merely FB’s that are extremely easy to field but that they actually have more in common with strikeouts than they do with OFFB’s. Certainly that’s true in so far as the result is concerned. I would argue that a ball hit almost straight up in the air reflects as much of a failure on the hitter’s part as a strikeout does. So why not treat IFFB’s as K’s and keep them out of your FB data entirely? Then the HR/FB stat would be much more meaningful. More importantly, BABIP* would become less GB/FB dependent because, just as every GB has a chance to become a hit, so too would every FB. In short, every ball would really be “in play”, excluding the virtually automatic outs.
See my response above to slash12 on the same topic. I’ll pose a question back– what is the goal of doing an ERA Estimator in the xFIP/SIERA mold instead of FIP or just using regular ERA? If you don’t include HR as a variable, you shouldn’t include IFFB either. They’re just the two extremes on the “fieldability of fly balls” spectrum.
For murdering the offense in baseball, I hope you get arrested by banditos who say things like “We don’t have to show you no stinkin’ badges!” So I will not “treasure” SIERA.
To think that I endorsed Eric’s book. You guys should be sentenced to play Strat-o-matic with nothing but dead ball era teams for the rest of your natural lives. Have fun matching Mordecai Brown against Christy Mathewson. Enjoy all the popouts and GBAs.
I really like all these articles, Matt, but did you find that fly-ball pitchers might have a survivor bias? What I mean is that if a pitcher had a really high fly-ball rate, wouldn’t he need a lower-than-average HR/FB rate, and most likely a higher-than-average K rate a la Matt Cain, to remain a viable major league pitcher? If not, his ERA could jump from a little bit above league average to probably replacement level, no? Could you also see this effect play out in ground-ball pitchers as well? Thank you for your time.
While I am in general a big fan of linear regression, I look at some of the variables used to arrive at SIERA and cannot help but think there must be a great deal of collinearity among them, that is, variables that are correlated at .7 or higher (though there is no hard and fast rule). Typically, this is viewed as problematic for linear regression, since it essentially uses the same variable twice.
Did you check for this problem? If so, what were the results?
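One way to run the kind of check being asked about is a pairwise correlation screen over the regressors. The sketch below uses made-up, independently drawn rates (not real pitcher data) and a hypothetical `flag_collinear` helper with the .7 rule of thumb from the comment; a squared term is included to show that it correlates with its own base variable by construction.

```python
import math
import random
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def flag_collinear(columns, threshold=0.7):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged

# Synthetic per-PA rates, drawn independently; NOT real pitcher data.
rng = random.Random(7)
so = [rng.uniform(0.10, 0.30) for _ in range(500)]
bb = [rng.uniform(0.04, 0.12) for _ in range(500)]
net_gb = [rng.uniform(-0.15, 0.20) for _ in range(500)]

columns = {
    "SO": so,
    "BB": bb,
    "netGB": net_gb,
    "SO^2": [s * s for s in so],  # squared term tracks its base variable
}

for name_a, name_b, r in flag_collinear(columns):
    print(f"{name_a} vs {name_b}: r = {r}")
```

On this synthetic input only the SO and SO^2 pair is flagged. With real data, correlated regressors mainly inflate the uncertainty of individual coefficients rather than the quality of the overall fit, so the screen says more about how far to trust each coefficient than about the estimator's predictions.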
Jacob Smith says:
July 19, 2011 at 1:09 pm
Excellent work, look forward to the rest of it.
Ryan says:
July 19, 2011 at 1:11 pm
“Pitchers who have higher fly ball rates allow fewer home runs per fly ball.”
I imagine that ground-ball machines like Tim Hudson have higher than average HR/FB rates because if they don’t keep the ball down on a certain pitch, it’s likely to go a long way. That’s why his xFIP and SIERA easily beat his FIP in 2010.
Telo says:
July 19, 2011 at 1:12 pm
In yesterday’s article when you introduced the idea that a high K rate correlates to a lower BABIP and HR/FB, I didn’t see any mention of SwStr% or BB% as an additional factor. A simple thought experiment leads me to believe (as you’ve shown) that a pitcher’s ability to control the ball and induce swings and misses correlates with his ability to induce weak contact and to limit HR on FB. My question is, instead of using K rate, would you gain a more accurate projection of BABIP and HR/FB if you used the components that drive K rate, and not K rate itself?
Tom B says:
July 19, 2011 at 1:26 pm
I’d like to see some analysis of the following:
AJ Burnett’s ERA is almost 2 runs LOWER when he walks 3 batters than when he walks less than 3. Discuss.
RC says:
July 19, 2011 at 1:34 pm
“Pitchers who allow less contact see weaker contact from hitters.”
Fewer home runs != weaker contact.
AustinRHL says:
July 19, 2011 at 1:42 pm
It’s interesting that for a fair number of pitchers who outpitch their peripherals, like Jair Jurrjens, SIERA actually makes them look WORSE than xFIP does.
Matt, you really confused me with this sentence: “Since SIERA only looks at ERAs for high-strikeout-rate pitchers like Lincecum, it gives them credit for the run-prevention effects of the strikeout and for the lower BABIPs and HR/FBs they generate.”
Are you saying that SIERA actually factors in raw ERA numbers for certain pitchers? That doesn’t make sense at all to me.
Person says:
July 19, 2011 at 1:43 pm
I’m wondering what some people might say about using Lincecum and Cain to demonstrate points about HR/FB…
Matt Swartz says:
July 19, 2011 at 1:45 pm
They also allow lower BABIP though.
Person says:
July 19, 2011 at 1:46 pm
Your first point applies to John Lannan as well, and as you suggested many others as well.
Not sure what it all means though.
Matt Swartz says:
July 19, 2011 at 1:48 pm
Not on an individual level. The formula is based on a best fit of over 3,000 pitchers, but since pitchers with high strikeout rates have (on average) lower ERAs than they would with league-average BABIPs, the group of high-strikeout pitchers gets good SIERAs.
There’s a mixture of guys who outpitch their peripherals. Jurrjens does better with xFIP, Wakefield does better with SIERA, Cain does a tiny bit better with SIERA. It all depends on the pitcher.
Jon says:
July 19, 2011 at 1:55 pm
His curveball is nastier but more wild? So he gives up fewer hits but a few more walks?
novaether says:
July 19, 2011 at 1:55 pm
I like the idea that SIERA is pursuing, but I have to take issue with assuming that the correlations are linear or quadratic. For example, in the extreme case where a pitcher has a SO/PA = 1, his ERA would be 0. Rather than having a good SO/PA ratio contribute “negative” ERA and subtract from a constant, a SO/PA ratio of less than 1 should add to ERA. A more meaningful fit would obey distinct boundary conditions.
That being said, it’s likely that the current method is a good approximation of the “true” correlation equations for common ratios. I sincerely hope for the best for SIERA and I look forward to seeing how it stacks up against the other ERA estimators.
Telo says:
July 19, 2011 at 1:57 pm
I kind of rushed this post, could’ve made it more clear:
Instead of deducing or projecting BABIP and HR/FB from K rate, why not use more granular composite factors, things that make up K rate, to get a more accurate projection?
novaether says:
July 19, 2011 at 2:00 pm
I agree, but I’d imagine that the K rate data is easier to obtain and work with than the components that drive it. Furthermore, you have to make sacrifices at some point by limiting your number of parameters or else your fit becomes meaningless.
Matt Swartz says:
July 19, 2011 at 2:03 pm
I’ve found that things like swinging strike rate are useful in small sample sizes if you want to deduce strikeout rate (so they’re perfect for analyzing the first half performances of rookies for example, or checking if a pitcher is injured or something), but they stop telling you more than strikeout rate itself tells you directly as you get more data on a pitcher. Strikeout rate predicts strikeout rate better than swinging strike rate predicts strikeout rate, once you have a year or two of data under your belt.
Matt Swartz says:
July 19, 2011 at 2:05 pm
Yeah, any of these estimators will fall apart at the boundary. FIP, xFIP, and SIERA all are negative if you strike out 100% of batters faced. In case it wasn’t clear, strikeout rate still decreases ERA at the extremes, just by less when strikeouts are already super high and there are fewer runners on base.
Hoof says:
July 19, 2011 at 2:11 pm
“Pitchers who have higher fly ball rates allow fewer home runs per fly ball.”
I wonder if this is because any pitcher with a high fly ball rate and a high HR/FB ratio would not be in the Major Leagues in the first place.
novaether says:
July 19, 2011 at 2:18 pm
Yes, but from your model it appears that a strikeout rate of .9 is actually better than a strikeout rate of 1. I understand that these ratios are nothing like anything we ever see and that the model behaves appropriately in the common range. I excuse xFIP because its purpose is to be extremely simple to calculate. If you’re going to toss simplicity aside and go completely for accuracy, the model should be meaningful over the whole spectrum of possible values.
To be clear, I’m not saying that’s what SIERA should do – you obviously have to take shortcuts at some point. My point is that a choice other than a quadratic fit (perhaps something logarithmic) would have more meaning. I would hope that this extra meaning would imply better predictive powers.
Still, this is good work and I’m looking forward to parts 3-5.
batpig says:
July 19, 2011 at 2:19 pm
if you think about it, it’s the logical corollary to “The more ground balls a pitcher allows, the easier they are to field.”
basically, for pitchers who are at the extremes — either extremely good at getting batters to hit above the ball (GB%) or extremely good at getting them to hit below the ball (FB%) — the “distribution” of their outcomes will skew to the “easy outs”, compared to pitchers with skills closer to the median.
it makes total sense to me — an extreme flyball pitcher will not only induce MORE fly balls, but more WEAK fly balls. And IMHO it’s one of the biggest failings of more linear estimators like xFIP which throw out a chunk of data…. they are going to work very well as a predictive tool for the center of the distribution, but start to break down more severely with outlier types.
Jack Nugent says:
July 19, 2011 at 2:21 pm
Re: relievers and their HR/FB rates— how about Carlos Marmol. Guy has given up 5 home runs since the start of 2009, and only ~2.7% of his flyballs have left the park.
Matt Swartz says:
July 19, 2011 at 2:23 pm
Ah, yeah, that’s true. If you have BB=0 & GB=FB, then SIERA would technically start going up at 85% Ks. But anything above 85% Ks and SIERA is negative anyway, so like you said, minor issue but fair point.
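To illustrate the turning point described above: with BB = 0 and GB = FB, only the linear and squared strikeout terms remain, and a quadratic a*x + b*x^2 with b > 0 bottoms out at x = -a / (2b). The coefficients in this sketch are hypothetical, picked only so that the minimum lands at 85%; they are not SIERA's published values.

```python
# Hypothetical coefficients (NOT SIERA's published values), chosen so the
# turning point falls at a strikeout rate of 0.85.
A_SO = -17.0   # linear SO/PA coefficient (assumed)
B_SO2 = 10.0   # squared SO/PA coefficient (assumed)

def so_contribution(so_rate):
    """Strikeout part of the estimate when BB = 0 and GB = FB."""
    return A_SO * so_rate + B_SO2 * so_rate ** 2

def turning_point(linear_coef, quad_coef):
    """Rate where a*x + b*x**2 stops falling and starts rising (b > 0)."""
    return -linear_coef / (2.0 * quad_coef)

vertex = turning_point(A_SO, B_SO2)
print(f"turning point: {vertex:.2f}")
for rate in (0.25, 0.85, 0.90, 1.00):
    print(f"SO/PA = {rate:.2f} -> contribution {so_contribution(rate):+.3f}")
```

Past the vertex the strikeout contribution creeps back up, which is why a rate of .9 can come out ahead of a rate of 1 in a quadratic fit even though both sit far outside the range of real pitchers.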
fakdaddy says:
July 19, 2011 at 2:24 pm
I have a question for anyone that wants to help out. I feel like SwStr% should be exactly the same as 100% minus Contact%. Because wouldn’t the number of pitches that a batter doesn’t make contact on be the same as the number of pitches that a batter swings and misses? This is just something that I cannot seem to sort out in my head. Thanks to anyone that answers this.
Telo says:
July 19, 2011 at 2:29 pm
Interesting. That I can definitely believe – that previous K rate is a better predictor of K rate when your sample is big enough. But I still think the essence of what we are trying to capture – the ability to induce the batter to put the NON sweetspot of the bat on the ball, or if connected on the sweetspot, a propensity for the contact point to be a high/low miss radially on the barrel (grounders/popups) – is deeply connected to: making the batter swing and miss, and controlling the baseball. Now, maybe K rate really is the best, most practical summary of those two skills (control and whiff), it’s obviously a very good one, but I just have the feeling that these more granular components can be measured with the tools we currently have that would allow us more insight on BABIP and HR/FB.
batpig says:
July 19, 2011 at 2:39 pm
Matt — How does SIERA handle pitchers who split innings between the bullpen and rotation? Cory Luebke is an obvious example for this season.
novaether says:
July 19, 2011 at 2:40 pm
SwStr% = strikes swung at and missed / strikes
Contact% = strikes swung at and hit / strikes swung at
They have different denominators. In other words, SwStr% includes strikes looking.
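The denominator point can be checked with a few made-up pitch counts. This sketch assumes the commonly published definitions, in which SwStr% is swings and misses per pitch thrown (all pitches, which is slightly broader than the "strikes" denominator written above) and Contact% is contact per swing; the counts and the `plate_discipline` helper are hypothetical.

```python
def plate_discipline(pitches, swings, contacts):
    """Rates from raw counts; definitions assumed as described above."""
    whiffs = swings - contacts  # swings that missed
    return {
        "SwStr%": whiffs / pitches,     # whiffs per pitch thrown
        "Contact%": contacts / swings,  # contact per swing
        "Swing%": swings / pitches,     # swings per pitch thrown
    }

# Made-up counts for illustration only.
rates = plate_discipline(pitches=1000, swings=450, contacts=360)

print(f"SwStr%       = {rates['SwStr%']:.3f}")
print(f"1 - Contact% = {1 - rates['Contact%']:.3f}")
print(f"Swing% * (1 - Contact%) = "
      f"{rates['Swing%'] * (1 - rates['Contact%']):.3f}")
```

Under these definitions SwStr% equals Swing% times (1 - Contact%), so it only matches 100% minus Contact% for a pitcher whose every pitch is swung at.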
fakdaddy says:
July 19, 2011 at 2:50 pm
Thank you. That makes it much easier… SwStr% is basically what percentage of strikes were due to swings and misses. And Contact% is what percentage of swings result in contact.
Temo says:
July 19, 2011 at 2:52 pm
Probably the case.
Josh says:
July 19, 2011 at 2:52 pm
Hey Matt- What’s the justification for suddenly adding in BB^2 and SO/BB^2? They weren’t statistically significant in the BP version, what are the p-values on those terms now? And what’s the theoretical justification as well now?
Matt Swartz says:
July 19, 2011 at 3:07 pm
This is done by %, so Cory Luebke would get .367*(24 IP as SP)/(63 total IP) added.
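The arithmetic in that reply works out as follows. The .367 coefficient and the 24 and 63 inning figures come from the comment itself; the function name is just a label for this sketch.

```python
STARTER_COEF = 0.367  # starter term quoted in the comment above

def starter_adjustment(ip_as_sp, total_ip, coef=STARTER_COEF):
    """Scale the starter term by the share of innings thrown as a starter."""
    return coef * ip_as_sp / total_ip

adjustment = starter_adjustment(ip_as_sp=24, total_ip=63)
print(f"share of innings as SP: {24 / 63:.3f}")
print(f"amount added:           {adjustment:.3f}")  # about 0.140
```

A full-time starter would get the whole .367 and a pure reliever would get nothing, with swingmen prorated in between.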
joser says:
July 19, 2011 at 3:08 pm
Yeah, the conventional wisdom is that when a ground-ball pitcher misses up, it goes into the seats (often as a line drive); when a fly-ball pitcher misses up, it becomes a pop-up (and thus an out).
Matt Swartz says:
July 19, 2011 at 3:10 pm
BB^2 had a p-value of about .06 or something IIRC. BB*SO wasn’t significant, but the sample size would limit it. They both came from the fact that I realized what was generating most of the non-linearity was the fact that base-runners are more damaging when you have more of them on base. So high K means less damage from an additional BB. More BB means another BB is more damaging.
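The reasoning above amounts to saying that the cost of one more walk is not constant. If the walk-related part of the model is c1*BB + c2*BB^2 + c3*SO*BB, its derivative with respect to BB is c1 + 2*c2*BB + c3*SO, which rises with walk rate when c2 > 0 and falls with strikeout rate when c3 < 0. The coefficients below are hypothetical, chosen only to show the signs; they are not the fitted values.

```python
# Hypothetical coefficients, for illustration only (not the fitted values).
C_BB = 8.0       # linear BB/PA term (assumed)
C_BB2 = 25.0     # BB/PA squared term, positive (assumed)
C_SO_BB = -20.0  # SO/PA * BB/PA interaction, negative (assumed)

def walk_terms(bb, so):
    """Walk-related part of the estimate for given BB/PA and SO/PA."""
    return C_BB * bb + C_BB2 * bb ** 2 + C_SO_BB * so * bb

def marginal_walk_cost(bb, so):
    """Derivative of walk_terms with respect to bb."""
    return C_BB + 2.0 * C_BB2 * bb + C_SO_BB * so

baseline = marginal_walk_cost(bb=0.06, so=0.15)
more_walks = marginal_walk_cost(bb=0.12, so=0.15)
more_strikeouts = marginal_walk_cost(bb=0.06, so=0.30)

print(f"baseline pitcher:      {baseline:.1f}")
print(f"same K rate, more BB:  {more_walks:.1f}")
print(f"same BB rate, more K:  {more_strikeouts:.1f}")
```

Because the derivative is linear in BB and SO, this is also what a quadratic specification buys: the marginal effect changes at a steady rate rather than staying fixed.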
joser says:
July 19, 2011 at 3:18 pm
Yeah, this was bothering me as well. Do you have a theoretical or empirical basis for using a power of two in your non-linear components, rather than say 1.8 or 2.2? The Pythagorean RS/RA equation is known to be better approximated by something other than squares, but people use a power of two to keep a simple calculation simple. That’s clearly not a consideration in SIERA, so a suspiciously simple integral power seems open to questioning.
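For a sense of how much the exponent matters in the Pythagorean comparison, here is a small sketch using the exponents mentioned in the comment (1.8, 2, and 2.2). The run totals are made up for illustration.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage: RS^x / (RS^x + RA^x)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)

# Made-up season totals for illustration only.
RUNS_SCORED, RUNS_ALLOWED = 800, 700

for x in (1.8, 2.0, 2.2):
    pct = pythagorean_win_pct(RUNS_SCORED, RUNS_ALLOWED, exponent=x)
    print(f"exponent {x:.1f}: {pct:.4f} ({pct * 162:.1f} wins per 162)")
```

The estimates move smoothly with the exponent, so whether a non-integer power is worth having comes down to whether it improves out-of-sample fit enough to justify the extra fitted parameter.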
joser says:
July 19, 2011 at 3:22 pm
The way I sorted it out in my head is
SwStr% is “pitches in the strike zone that are actually swung at”
Contact% is “pitches swung at that are actually hit”
Matt Swartz says:
July 19, 2011 at 3:22 pm
The goal is to make the effect non-linear; in other words, to make it such that ground balls can have different effects for pitchers who throw sinkers for example. A quadratic effect simplifies it to the extent that it makes the derivative linear. I’m not sure there is much value in messing with the exponent.
Josh says:
July 19, 2011 at 4:01 pm
Matt-
Have you tested any of the independent variables or even the dependent (SIERA itself) to see if they’re normally distributed? Have you thought about increasing the accuracy by either modeling a possible skew stable model to increase R^2 further, or non-parametrically with kernel regression or a neural network? Even at a sample of 3000, you would have ample sample size to test non parametrically, with increased R^2, although no specific coefficients obviously…
novaether says:
July 19, 2011 at 4:03 pm
fakdaddy – correct
joser – not quite
When you say, “pitches in the strike zone that are actually swung at”, you’re describing Z-Swing%
slash12 says:
July 19, 2011 at 4:22 pm
“Pitchers with more strikeouts have lower BABIPs.”
While statistically this may appear true, I have to wonder if GB% has a higher correlation with BABIP than K%. Strikeout pitchers tend to allow more fly balls, which tend to produce lower BABIPs, while low-strikeout guys tend to be more groundball guys, which produce higher BABIPs. I wonder if it’s really fair to give a guy a BABIP benefit for K%, as opposed to giving him a BABIP bonus for having a lower groundball rate (which may be more accurate).
slash12 says:
July 19, 2011 at 4:29 pm
“Pitchers who have higher fly ball rates allow fewer home runs per fly ball.”
Would it be more accurate to treat OFFB and IFFB as separate entities, and use HR/OFFB? A high flyball pitcher who posts lowish IFFB rates (which means more fly balls in the outfield, and thus more home runs per fly ball) shouldn’t be getting the same benefit as a pitcher who regularly posts high IFFB rates.
Phils Goodman says:
July 19, 2011 at 5:12 pm
Makes sense to me. See ( http://www.hardballtimes.com/main/fantasy/article/introducing-hr-offb-park-factors/ ) for more on this.
chuckb says:
July 19, 2011 at 5:51 pm
What else is going on here? Does that mean his stuff is nastier and, therefore, he’s also striking out more when he walks more?
chuckb says:
July 19, 2011 at 5:55 pm
I’m guessing that those who have a high FB rate also have a disproportionately high IFFB rate. That is, many of their additional FBs are infield fly balls, which are turned into outs 99.9% of the time. This, in turn, will have an effect on their HR/FB rate since IFFBs become HRs 0% of the time.
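The arithmetic behind this can be sketched with two hypothetical pitchers, assuming home runs are counted among outfield fly balls and that FB = OFFB + IFFB. The counts and function names below are made up for illustration.

```python
def hr_per_fb(hr, offb, iffb):
    """HR per fly ball, with infield flies included in the denominator."""
    return hr / (offb + iffb)

def hr_per_offb(hr, offb):
    """HR per outfield fly ball only."""
    return hr / offb

# Both hypothetical pitchers allow 200 fly balls and the same HR per OFFB.
pitcher_a = {"hr": 18, "offb": 180, "iffb": 20}   # few infield flies
pitcher_b = {"hr": 16, "offb": 160, "iffb": 40}   # many infield flies

for name, p in (("A", pitcher_a), ("B", pitcher_b)):
    print(name,
          "HR/OFFB =", hr_per_offb(p["hr"], p["offb"]),
          "HR/FB =", hr_per_fb(p["hr"], p["offb"], p["iffb"]))
```

Both pitchers allow home runs on 10% of their outfield flies, but the one with more infield flies shows the lower HR/FB (8% against 9%), purely because of the extra automatic outs in the denominator.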
Brian says:
July 19, 2011 at 6:25 pm
Slash – the reason high K relates to BABIP is because it suggests a level of stuff (ability to move the ball off the barrel of the bat) to induce a large number of weak-average strength groundballs (which are easier to defend).
A guy like Halladay makes his living off of being able to allow guys to hit the ball, but taking away nearly all their power.
Guys that don’t strike batters out and have a lot of ground balls against run the risk of the Josh Towers effect (being able to make good pitches which induce ground balls but having them all go through because the batters are still getting solid wood on the ball).
Matt Swartz says:
July 19, 2011 at 7:14 pm
Thanks for the suggestion. Most baseball statistics are approximately normally distributed if you set the IP or PA minimum high enough. ERA is pretty much normally distributed over 40 innings, enough that this would be a go.
I only have some exposure to non-parametric regression, but I’m curious what about this data makes you think kernel regression or anything non-parametric would be necessary. I figure that ERA should be monotonic with respect to each of GB, BB, and SO, but I admit that I don’t know enough about non-parametric regression to know. I do know that QERA by Nate Silver effectively assumes a log-normal distribution of ERA in its modeling and is less accurate. That’s not non-parametric, but maybe in the direction of what you’re thinking?
Matt Swartz says:
July 19, 2011 at 7:17 pm
Yes, that would do better if I was trying to model same-year ERA, but what I’m trying to pick up with the net ground ball term ((GB-FB)/PA) is just a general angle off the bat skill tendency. Some pitchers are better at inducing IFFB/FB but those pitchers are often the ones that get higher FB% in general, so in general I’m trying to pick up a skill level as shown by angle off the bat.
Matt Swartz says:
July 19, 2011 at 7:18 pm
The point is that for a given GB%, the higher the K%, the lower the BABIP.
Josh says:
July 19, 2011 at 8:06 pm
I’m not sure how ERA or SIERA is actually distributed, so I don’t know how much more effective any nonparametric model might be. With a large enough sample size you could probably increase R^2, but you would lose confidence intervals and coefficients for the specific variables; NP is essentially a black box, where you’d just get the output.
If you run the SIERA results through a residuals histogram, that will give you a sense if your results are distributed normally across the regression (basically bell curve) or not. If they’re not, then you may get different effective results at farther ends of the spectrum. In which case, a nonparametric model might help.
Even if that’s not the case, non parametric essentially goes with the grain of the wood, as it were, so with a large enough sample size (which it sounds like you may have) you might still get an increased R^2 over trying to superimpose a normal distribution (which it may be close to, but not exact)… You’re doing some excellent work, I was just curious if you’d thought to try it out that way… your inputs seem great, this is just a possible tweak in the modeling, which you may not need…
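A rough sketch of the residual check described here, assuming NumPy and synthetic numbers in place of real ERA and SIERA values. The function name `residual_summary` and the simulated data are hypothetical. Skewness and excess kurtosis are both near zero for normally distributed residuals.

```python
import numpy as np

def residual_summary(actual, predicted, bins=20):
    """Histogram counts plus skewness and excess kurtosis of the residuals."""
    resid = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    centered = resid - resid.mean()
    sd = resid.std()
    skew = np.mean(centered**3) / sd**3
    excess_kurtosis = np.mean(centered**4) / sd**4 - 3.0
    counts, edges = np.histogram(resid, bins=bins)
    return {"skew": skew, "excess_kurtosis": excess_kurtosis,
            "counts": counts, "edges": edges}

# Synthetic stand-ins: "predicted" plays the estimator, "actual" plays ERA.
rng = np.random.default_rng(1)
n = 3000
predicted = rng.normal(4.2, 0.6, n)
actual = predicted + rng.normal(0.0, 0.9, n)   # normal errors by construction

summary = residual_summary(actual, predicted)
print("skew:", summary["skew"], "excess kurtosis:", summary["excess_kurtosis"])
print("histogram counts:", summary["counts"])
```

A clearly lopsided histogram or a skewness far from zero would be the signal that the fit behaves differently at the extremes.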
Matt Swartz says:
July 19, 2011 at 8:25 pm
Okay. I just checked and SIERA, ERA, and (SIERA-ERA) are all pretty much normally distributed with at least 40 IP. Definitely a very sharp unimodal distribution, albeit not completely symmetric, I think. I guess since I’m not really trying to max out R^2 directly as much as develop something that would work out of sample (but in a similar run environment), it’s probably not worth the investment of learning non-parametric estimation, but I do appreciate the out-of-the-box idea.
jim says:
July 19, 2011 at 9:50 pm
loving this series
Matthew Cornwell says:
July 19, 2011 at 10:14 pm
Kinda wonder about Glavine in terms of BABIP too. He was a moderate GB pitcher, but still had a .285 BABIP on a team (not league) with a .291 BABIP.
He gave up a higher % of LD than an average MLB pitcher, but his BABIP on GB was much, much better than average…what you would expect from an extreme groundballer.
With lots of LD, few Ks and a good, but not great GB rate, he should have had a terrible BABIP, but he still saved 70-80 hits compared to his mates.
I wonder how many “exceptions” there are to these guidelines. Enough to go back to the drawing board or too few to worry about changing the construct.
Josh says:
July 19, 2011 at 10:35 pm
Increasing the R^2 should increase the out of sample results as well. But it totally sounds like your #’s are pretty close to normal, seems like you’re doing it right anyways, your results are coming in better than QERA and XFIP. I’m loving your series, really glad someone’s going through the process and explaining the steps. Thanks for the amazing efforts!
Matthew Cornwell says:
July 19, 2011 at 10:53 pm
I did some research on Glavine. He threw a ton of change-ups and most of them low and away..both things that tend to coincide with weak contact. Of course he gave up his fair share of FB’s too, which is surprising. In fact, despite having a good career GB rate, there were a few years in which he gave up more FBs than GBs.
But the biggest clue seems to be in his splits: Glavine’s BABIP was about the same as his mates with nobody on, with batters leading off innings, when ahead in a game, when the game was a blow-out, etc. All the times in which it makes sense to go right after hitters.
His BABIP was far, far better when he had little run support, had runners in scoring position, was behind, and when the game was close. These were all situations in which his BB rate famously and drastically increased too. Not a coincidence.
Seems pretty obvious that he expanded his zone in these situations, causing batters to make weaker contact. Glavine’s BABIP seems very situation-based, which explains why he had a ho-hum BABIP in normal situations and “only” 5-6 points better than his mates overall.
I love finding the mysteries behind the outliers.
Maddux’s BABIP compared to mates, on the other hand, just seems to be a case of an extreme GBer whose BABIP on GB was much, much better than expected from a GB pitcher.
evo34 says:
July 19, 2011 at 11:33 pm
It’s called selecting a cutoff after the fact, a.k.a. “noise.”
jim says:
July 20, 2011 at 12:54 am
bizarre, but like evo says, that sounds like noise
Joe says:
July 20, 2011 at 12:16 pm
I don’t like the use of examples, because single examples are meaningless. I guess you are trying to use them to explain what you are saying, which is ok, but they are not evidence to support the claims.
Scott G says:
July 20, 2011 at 12:22 pm
Am I misinterpreting this, or is it a typo?
• Pitchers who have higher fly ball rates allow fewer home runs per fly ball.
Example: Matt Cain
With a career 45% fly ball rate, Matt Cain is among the best at keeping his fly balls in the yard. He gives up mostly infield flies and shallow outfield flies, which is why his career HR/FB is just 6.8%. SIERA assumes that pitchers who allow more fly balls have below-average HR/FB rates, and that’s exactly what happens.
I thought average HR/FB was 10%? Cain is a FB pitcher, and his rate is 6.8%. Should it read “SIERA assumes that pitchers who allow more fly balls have ABOVE-AVERAGE HR/FB rates, and that’s exactly what happens”?
As I was typing that I realized you might mean below average in number, not in rank.
HELP??! haha
Matt Swartz says:
July 20, 2011 at 12:27 pm
Yeah, I meant low/good rates for HR/FB. Thanks for clarifying.
Evan Bruschini says:
July 20, 2011 at 12:36 pm
You can’t explain that.
Scott G says:
July 20, 2011 at 12:53 pm
Yea, I figured that’s what you meant after some thought. Thank YOU for clarifying.
Matthew Cornwell says:
July 20, 2011 at 4:49 pm
I am trying to find an exception and figure out how he beat the “system.” I am not disagreeing with the overall concepts or “claims” of anyone.
Cyril Morong says:
July 20, 2011 at 5:21 pm
“Almost immediately after rolling out the initial version in 2010, run scoring began to decline in baseball.”
So you guys caused the drop in scoring by teaching teams how to improve pitching. Thanks a lot. But seriously, very interesting work.
James M. says:
July 20, 2011 at 8:16 pm
Several previous posters have mentioned IFFB’s. I’d like to bring them up again but suggest a different approach. No one has seriously questioned that IFFB’s should be included in FB’s. I’d like to propose that they are not merely FB’s that are extremely easy to field but that they actually have more in common with strikeouts than they do with OFFB’s. Certainly that’s true in so far as the result is concerned. I would argue that a ball hit almost straight up in the air reflects as much of a failure on the hitter’s part as a strikeout does. So why not treat IFFB’s as K’s and keep them out of your FB data entirely? Then the HR/FB stat would be much more meaningful. More importantly, BABIP* would become less GB/FB dependent because, just as every GB has a chance to become a hit, so too would every FB. In short, every ball would really be “in play”, excluding the virtually automatic outs.
Matt Swartz says:
July 20, 2011 at 9:03 pm
See my response above to slash12 on the same topic. I’ll pose a question back– what is the goal of doing an ERA Estimator in the xFIP/SIERA mold instead of FIP or just using regular ERA? If you don’t include HR as a variable, you shouldn’t include IFFB either. They’re just the two extremes on the “fieldability of fly balls” spectrum.
Matt Swartz says:
July 20, 2011 at 9:04 pm
LOL, and thanks :)
Cyril Morong says:
July 20, 2011 at 9:49 pm
For murdering the offense in baseball, I hope you get arrested by banditos who say things like “We don’t have to show you no stinkin’ badges!” So I will not “treasure” SIERA.
To think that I endorsed Eric’s book. You guys should be sentenced to play Strat-o-matic with nothing but dead ball era teams for the rest of your natural lives. Have fun matching Mordecai Brown against Christy Mathewson. Enjoy all the popouts and GBAs.
sleepingcobra says:
July 22, 2011 at 4:56 pm
Ball gets thrown, ball gets hit.
Kyle Boddy says:
July 23, 2011 at 8:12 pm
Yes, exactly. This is precisely what I wondered. Is Matt controlling for park effects?
Jonathan Comack says:
July 27, 2011 at 5:03 pm
I really like all these articles, Matt, but did you find that flyball pitchers might be subject to survivor bias? What I mean is that if a pitcher has a really high flyball rate, wouldn’t he need a lower-than-average HR/FB rate, and most likely a higher-than-average K rate à la Matt Cain, to remain viable in the major leagues? If not, his ERA could jump from a little bit above league average to probably replacement level, no? Could you also see this effect play out in groundball pitchers as well? Thank you for your time.
Omikron says:
July 28, 2011 at 3:37 am
While I am in general a big fan of linear regression, I look at some of the variables used to arrive at SIERA and cannot help but think there must be a great deal of collinearity among them, that is, variables that are correlated with each other at .7 or higher (though there is no hard and fast rule). Typically, this is viewed as problematic for linear regression, since it essentially uses the same variable twice.
Did you check for this problem? If so, what were the results?
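One way such a check might look, sketched with NumPy on synthetic data. The columns (a strikeout rate, its square, and a walk rate) and the helper name `vif` are assumptions for illustration, not the actual SIERA inputs. The variance inflation factor (VIF) for each column is 1 / (1 - R^2) from regressing it on the others.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X."""
    X = np.asarray(X, dtype=float)
    factors = []
    for j in range(X.shape[1]):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(len(y)), others])   # add an intercept
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - resid.var() / y.var()
        factors.append(1.0 / (1.0 - r2))
    return np.array(factors)

# Synthetic rates per PA (hypothetical ranges).
rng = np.random.default_rng(2)
so = rng.uniform(0.10, 0.30, 3000)
bb = rng.uniform(0.04, 0.14, 3000)

X = np.column_stack([so, so**2, bb])           # raw squared term
corr = np.corrcoef(X, rowvar=False)
vifs = vif(X)

so_c = so - so.mean()                          # center before squaring
X_centered = np.column_stack([so_c, so_c**2, bb])
vifs_centered = vif(X_centered)

print("corr(SO, SO^2):", corr[0, 1])
print("VIF raw:", vifs)
print("VIF centered:", vifs_centered)
```

In this toy setup a raw rate and its square are almost perfectly correlated, and centering the rate before squaring removes most of that. High correlation between a variable and its own square inflates the uncertainty of the individual coefficients rather than the fitted values.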