# FanGraphs Baseball

1. Really interesting; I’d love to see this on an xFIP, SIERA, or tERA basis, too. Not sure how involved all of that is from a math standpoint.

Comment by mike — October 6, 2011 @ 11:44 am

2. If you break down 2010 stats, a lot of the same pitchers have high xK’s that didn’t materialize.

I looked at the 92 pitchers (presumably the innings-qualified ones), and here were the top 11:

| Name | 2010 diff |
|---|---|
| Randy Wells | -6.92% |
| Brett Cecil | -6.34% |
| Hiroki Kuroda | -6.23% |
| R.A. Dickey | -6.03% |
| Clay Buchholz | -5.99% |
| Carl Pavano | -5.69% |
| Shaun Marcum | -5.68% |
| Jaime Garcia | -5.25% |
| Edwin Jackson | -5.05% |
| Johan Santana | -4.84% |
| Francisco Liriano | -4.78% |

Comment by Brian — October 6, 2011 @ 11:59 am

3. Sorry, hit enter before I finished.

The “2010 diff” is the difference between actual 2010 K% and “expected K%” from the author’s formula, y = 2.2619x + 0.0163 (y = K%, x = SwStr%), so a negative number means the pitcher struck out fewer batters than expected.

Not sure what that means… I can’t think of similarities between Kuroda, Liriano, Pavano, Marcum, Wells, Jaime Garcia, etc. Just thought I’d throw it out there.
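A minimal Python sketch of the “2010 diff” calculation, assuming K% and SwStr% are both expressed as fractions (so 10% is 0.10) and that the diff is actual minus expected. The function names and the sample pitcher line are hypothetical, not taken from the article or from Brian’s data.

```python
# Coefficients of the fitted line quoted above: K% = 2.2619 * SwStr% + 0.0163
SLOPE = 2.2619
INTERCEPT = 0.0163


def expected_k_pct(swstr_pct: float) -> float:
    """Expected K% for a given SwStr%, both as fractions (0.10 == 10%)."""
    return SLOPE * swstr_pct + INTERCEPT


def k_pct_diff(actual_k_pct: float, swstr_pct: float) -> float:
    """Actual minus expected K%; negative means fewer strikeouts than expected."""
    return actual_k_pct - expected_k_pct(swstr_pct)


if __name__ == "__main__":
    # Hypothetical pitcher line (illustrative only): 9.0% SwStr%, 16.0% K%
    print(f"expected K%: {expected_k_pct(0.090):.2%}")
    print(f"diff:        {k_pct_diff(0.160, 0.090):+.2%}")
```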

Comment by Brian — October 6, 2011 @ 12:03 pm

4. A) I really liked this post. I thought it was well written and funny, and well put together all around.

B) Would a team philosophy of “pitch to contact” make a difference here? I’m only familiar with the Twins program, and they definitely have that strategy. Is it a coincidence that Liriano and Pavano (two pitchers who didn’t come up in the Twins system but have pitched there for 6 and 2 seasons, respectively) are near the top? Could anyone who knows say whether the pitchers near the top of the list come from organizations with similar philosophies?

Comment by TFINY — October 6, 2011 @ 12:09 pm

5. While the differential between expected and actual K% is interesting in and of itself, it seems to me the cause of the heteroskedasticity is the more interesting question. You talk about “fixing it” statistically (which of course you can do), but wouldn’t it be more interesting to predict it? You could run a heteroskedastic regression model, which doesn’t assume a constant error variance and lets you include predictors of both the outcome (K%) and its variance. I wonder what would predict it? Perhaps pitch selection, like “pitching backwards”? Perhaps some guys throw their highest-SwStr% pitch early in the count and one of their lower-SwStr% pitches once they get to 2 strikes? That could probably be checked most easily with the proper methodology.

Anyhow, interesting stuff.
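One way to “predict” the variance, sketched here under loose assumptions, is two-step feasible GLS with multiplicative heteroskedasticity: fit the mean by OLS, regress the log squared residuals on a candidate variance predictor, then refit with inverse-variance weights. This approach does model the error variance, as exp(c + d*z). The predictor `z` (say, some pitch-selection measure) and the function names are hypothetical; a real analysis would use a statistics package and more than one predictor.

```python
import math
from typing import Sequence, Tuple


def wls_line(x: Sequence[float], y: Sequence[float],
             w: Sequence[float]) -> Tuple[float, float]:
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b


def fgls_multiplicative(x: Sequence[float], y: Sequence[float],
                        z: Sequence[float], eps: float = 1e-12):
    """Two-step feasible GLS assuming Var(error_i) = exp(c + d * z_i).

    Returns ((a, b), (c, d)): the reweighted mean model y = a + b*x and the
    variance model. A positive d means the variance grows with z.
    """
    ones = [1.0] * len(x)
    # Step 1: ordinary least squares for the mean (e.g. K% on SwStr%).
    a0, b0 = wls_line(x, y, ones)
    resid = [yi - (a0 + b0 * xi) for xi, yi in zip(x, y)]
    # Step 2: regress log squared residuals on the variance predictor z.
    # eps keeps log() finite if a residual is exactly zero. For Gaussian
    # errors c is off by a constant, which does not change relative weights.
    log_sq = [math.log(r * r + eps) for r in resid]
    c, d = wls_line(z, log_sq, ones)
    # Step 3: refit the mean, weighting each point by its inverse fitted variance.
    weights = [math.exp(-(c + d * zi)) for zi in z]
    a, b = wls_line(x, y, weights)
    return (a, b), (c, d)
```

The sign and size of `d` is the “what predicts the variance” answer; with several candidate predictors, step 2 would become a multiple regression.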

Comment by Jeff — October 6, 2011 @ 12:09 pm

6. Seems to me the biggest factor you missed was BB%. Imagine a pitcher who took every count to 3-2 but had a league-average SwStr%. He’d be an elite K% pitcher, but he’d also walk a boatload of dudes.

There should be a “nibble” factor for pitchers, which would complete the picture here.
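To see how deep counts feed both K% and BB%, here is a small count-by-count model. It assumes every pitch independently ends up a ball, a strike (called or swinging), a foul, or in play with fixed probabilities, and it ignores HBP and count-dependent pitch selection. The function and its inputs are hypothetical, not something from the article.

```python
from functools import lru_cache
from typing import Tuple

Outcome = Tuple[float, float, float]  # (P(K), P(BB), P(ball in play))


def pa_outcomes(p_ball: float, p_strike: float,
                p_foul: float, p_inplay: float) -> Outcome:
    """Exact K/BB/in-play probabilities for one PA under a fixed per-pitch mix."""
    if abs(p_ball + p_strike + p_foul + p_inplay - 1.0) > 1e-9:
        raise ValueError("per-pitch probabilities must sum to 1")
    if p_foul >= 1.0:
        raise ValueError("a PA of nothing but fouls never ends")

    @lru_cache(maxsize=None)
    def from_count(balls: int, strikes: int) -> Outcome:
        if balls == 4:
            return (0.0, 1.0, 0.0)  # walk
        if strikes == 3:
            return (1.0, 0.0, 0.0)  # strikeout
        in_play = (0.0, 0.0, 1.0)
        after_ball = from_count(balls + 1, strikes)
        after_strike = from_count(balls, strikes + 1)
        if strikes < 2:
            # With fewer than two strikes, a foul advances the count like a strike.
            p_adv = p_strike + p_foul
            return tuple(
                p_ball * b + p_adv * s + p_inplay * ip
                for b, s, ip in zip(after_ball, after_strike, in_play)
            )
        # With two strikes a foul leaves the count unchanged, so the value v
        # here solves v = p_ball*B + p_strike*S + p_foul*v + p_inplay*I.
        return tuple(
            (p_ball * b + p_strike * s + p_inplay * ip) / (1.0 - p_foul)
            for b, s, ip in zip(after_ball, after_strike, in_play)
        )

    return from_count(0, 0)
```

Comparing a few hypothetical mixes that differ only in ball rate is one rough way to put numbers on a “nibble” factor.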

Comment by Slartibartfast — October 6, 2011 @ 12:38 pm

7. I prefer this method because it differentiates between swinging strikes in and out of the zone.

http://www.draysbay.com/2009/7/21/956509/updated-expected-strikeouts-based

The formula:
`K% = (ClStr% * .9) + (Foul% * .5) + (InPly% * -.9) + (InZSwStr% * 1.1) + (OZSwStr% * 1.5)`

An adjusted R^2 of .914 is extremely strong. I’ll be updating this for major starters at some point in the offseason, as it’s a very nice look at which guys can be expected to take a leap forward next season.
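The formula translates directly into code. This sketch assumes each input is a rate expressed as a fraction of total pitches (so 17% is 0.17) and applies the coefficients exactly as quoted, with no intercept; the linked post is the place to confirm the precise definitions. The function name is hypothetical.

```python
def draysbay_expected_k(cl_str: float, foul: float, in_play: float,
                        in_zone_swstr: float, out_zone_swstr: float) -> float:
    """Expected K% from pitch-outcome rates, using the coefficients quoted above."""
    return (
        0.9 * cl_str            # called strikes
        + 0.5 * foul            # foul balls
        - 0.9 * in_play         # balls put in play
        + 1.1 * in_zone_swstr   # swinging strikes on pitches in the zone
        + 1.5 * out_zone_swstr  # swinging strikes on pitches out of the zone
    )
```

The larger weight on out-of-zone swinging strikes is what distinguishes this from a single SwStr% term.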

Comment by Sandy Kazmir — October 6, 2011 @ 12:51 pm

8. I assume foul balls count as swinging strikes. Is there data out there for total foul balls and fouls/PA? Just looking at the list, I know Hamels gets a ton of foul balls because he rarely puts anyone away with the heater, using it instead to set up his changeup, which induces swinging strikes of its own. I’d want to cross-check this list against a fouls/PA leaderboard.

Comment by DD — October 6, 2011 @ 1:21 pm

9. Peep this

Comment by Sandy Kazmir — October 6, 2011 @ 2:21 pm

10. This is a very interesting graph / list of players.

I see the upper section of the list as guys I’d bet on to improve next year… once I factor out all of the guys who throw slower than 91-92 mph or so?

The guys at the bottom of the list I’d expect to not be quite as good as they were this year… all the way up to roughly -.20…

Comment by Jim Lahey — October 6, 2011 @ 4:18 pm

11. No way Nolasco belongs on a should-improve-next-year list…

Comment by Adam — October 6, 2011 @ 4:53 pm

12. Do the majority of sinkerballers reside in the upper half (excluding ultimate sinkerballer Justin Masterson)?

Comment by Choo — October 6, 2011 @ 7:42 pm

13. I don’t think it’s “pitch to contact,” as is so often said, but more “don’t walk people.” But yes, I think it could make a difference.

Also, the philosophy of featuring a change-up as a strikeout pitch could have something to do with it. Obviously Liriano’s best pitch is his slider, but I believe (I haven’t looked this up) he threw more changeups this year.

Comment by Steven Ellingson — October 7, 2011 @ 12:38 am

14. Yes, this sounds awesome. Someone with more time than me should do this.

I have a feeling that the type of pitch getting the swinging strikes will be a significant variable, with the change-up leading to the higher variance.

Comment by Steven Ellingson — October 7, 2011 @ 12:40 am

15. I see a lot of Cardinals and Twins on here, and both teams emphasize control. A logical philosophy in pitch selection might lead to pounding the zone with a sinker on a 2-strike count rather than throwing a true ‘strikeout’ pitch, which is more likely to be a ball or fouled off, extending the at-bat.

Just a thought, anyway.

Comment by Voxx — October 7, 2011 @ 5:24 am

16. Having used SwStk% a lot in my articles, I can tell you that a lot of the pitchers who appear to be due for a K% spike are below average at getting called strikes. Since a strikeout counts the same whether it is the result of a swinging or a called strike, I would think this is important. I know Kuroda specifically has a lower-than-average called strike rate each season, which is likely the primary cause of his lower-than-expected K%.

Statcorner.com has the ClStk% (called strike percentage) stat.

Comment by Mike Podhorzer — October 7, 2011 @ 9:37 am

17. Two Twins starters in the top 5? I now have reason for hope.

Also, your wife tolerates this? Does she have a sister…

Comment by adohaj — October 7, 2011 @ 11:26 am

18. WOW, I just posted this on the forums, and they’re taking the credit.

Comment by Carlcrawfordisawesome — October 12, 2011 @ 7:51 pm

19. Have you been drinking again, Jim?

Comment by eric — November 17, 2011 @ 6:20 am

20. One problem is that this should be plotted on a log-log scale. Why? Because a strikeout rate of 10% is twice 5% and half of 20%, so the distance from 5% to 10% should be the same as the distance from 10% to 20%. I suspect that may eliminate the heteroskedasticity.
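With both axes logged, a straight-line fit becomes a power law, K% = A * SwStr%^b, and equal ratios become equal distances (log 0.10 - log 0.05 equals log 0.20 - log 0.10). A minimal sketch of that fit by ordinary least squares on the logs; the function name is hypothetical, and whether the transform actually removes the heteroskedasticity would need checking on the real data.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(swstr: Sequence[float],
                  k_pct: Sequence[float]) -> Tuple[float, float]:
    """Fit k = A * swstr**b by OLS on log(k) vs log(swstr); returns (A, b).

    All inputs must be positive, since the logs are undefined otherwise.
    """
    lx = [math.log(v) for v in swstr]
    ly = [math.log(v) for v in k_pct]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((x - mx) ** 2 for x in lx)
    sxy = sum((x - mx) * (y - my) for x, y in zip(lx, ly))
    b = sxy / sxx          # slope on the log-log scale is the exponent
    log_a = my - b * mx    # intercept on the log-log scale is log(A)
    return math.exp(log_a), b
```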

The bigger problem, however, may be the scatter around this fit. A high R^2 indicates a strong linear relationship, but if the variance is high, predicting an individual pitcher is more problematic, which is why the precision is lacking. By eye, I’m estimating a spread of about 1/2 log unit (roughly 3X, since 10^0.5 ≈ 3.2) in the data. In other words, at a 10% swinging-strike rate, the K-rate appears to have a 95% prediction interval from 12% to 24%, and a 12% K-rate is a whole lot different from a 24% K-rate.
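The 95% range for individual pitchers is what a prediction interval (as opposed to a confidence interval for the fitted line) measures. A rough sketch for simple linear regression, using a normal critical value (about 1.96 at 95%) in place of the exact Student’s t quantile, which is a reasonable approximation only for samples on the order of 100 pitchers; the function name is hypothetical.

```python
import math
from statistics import NormalDist
from typing import Sequence, Tuple


def prediction_interval(xs: Sequence[float], ys: Sequence[float], x0: float,
                        level: float = 0.95) -> Tuple[float, float, float]:
    """Return (lower, fitted, upper) for a new observation at x0 under OLS.

    Uses a normal critical value instead of Student's t (large-n approximation).
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    # Residual standard error with n - 2 degrees of freedom.
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    # The leading 1 is the new observation's own noise; dropping it would give
    # the (much narrower) confidence interval for the line itself.
    half_width = z * s * math.sqrt(1 + 1 / n + (x0 - mx) ** 2 / sxx)
    fitted = a + b * x0
    return fitted - half_width, fitted, fitted + half_width
```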

Comment by Matt Lee — March 8, 2012 @ 1:01 am

21. Also, the equation provided for the orange line is clearly not correct. At x = 10% (the swinging-strike rate), it gives y ≈ 24.2% for the K-rate (2.2619 × 0.10 + 0.0163 ≈ 0.2425), but the orange line sits at around a 15% K-rate at a 10% swinging-strike rate.

Comment by Matt Lee — March 8, 2012 @ 1:24 am
