Community - FanGraphs Baseball



  1. Cool stuff. A few questions

    Are you sure you aren’t reading too much into the 84 mph velocity issue? Is there any chance some of those may actually be misclassified sliders?

    Also, did you consider summarizing for all pitchers and using a multiple regression to determine which effects (and interactions) are significant? For horizontal location / movement, you could use the absolute values to account for differences between left- and right-handers.

    If you used a multiple regression, you could also use relative importance metrics to tease out which terms are not just statistically significant but are actually meaningful, which could be interesting to see as well.

    I like what you’ve done here and hope to see more soon.

    Comment by jessef — September 12, 2012 @ 6:24 pm

  2. Nice research done here. Short question: where did you get data about pitch types?
    For instance, here at FanGraphs I take some random batter, and here is his play log for 2011:

    There is no data about pitches. I can only have it for the whole game:

    But that’s not enough to analyze! The fact that he saw 79% fastballs on 2011-09-28 @LAA doesn’t say anything about HRs and swinging strikes among those. I’m totally lost here. Maybe those stats are available at FanGraphs+?

    I would much appreciate if you could clarify this.

    Comment by mcuni — September 13, 2012 @ 5:38 am

  3. jessef,

    In the 84 mph group there were 184 pitches swung on and missed and only 3 hit for home runs. Of all the velocity groups I graphed, it did have the fewest pitches, so sample size might be an issue. But in terms of misclassification, I don’t think it matters. Regardless of how the pitches were classified, they were similar enough for MLB Gameday to classify them together, so it looks like there was an 84 mph pitch last season that was very effective.
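
    One rough way to gauge the sample-size concern is to put an interval around the home-run share of the two outcomes quoted above (184 whiffs, 3 home runs). The sketch below uses a Wilson score interval; the function name and the choice of interval are assumptions for illustration, not the method used in the article.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return center - half, center + half

# Counts quoted in the comment for the 84 mph group.
whiffs, home_runs = 184, 3
total = whiffs + home_runs
low, high = wilson_interval(home_runs, total)

print(f"HR share of (whiffs + HR): {home_runs / total:.3f}")
print(f"approx. 95% interval: {low:.3f} to {high:.3f}")
```

    With only 3 home runs the interval is wide relative to the point estimate, which is the sample-size caveat in numeric form.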

    I have considered doing a multivariate regression analysis (and am still considering it). Logistically, the problem is that the relationships between the predictors and the outcome would probably have to be modelled piecewise rather than linearly (especially the horizontal movement). It’s not a trivial problem, but I assure you I do see the value in a regression analysis and am working on it.


    I got all my data from the mlb gameday website:

    MLB gameday classifies each pitch thrown based on an artificial neural network that is trained during spring training (each year I think/hope). It also gives a confidence value to tell you how confident it is that it has classified the pitch right.

    In terms of data acquisition, I wrote a quick script that mined the XML files for all the games from 2008-2011. I just used the 2011 season for this analysis. Then I wrote a second script that parsed out the data I was interested in from the game XML files. It took a few days to get everything working right, but all together, it wasn’t that bad.
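
    The second step described above (pulling pitch data out of a game XML file) might look roughly like the sketch below. The element and attribute names (`pitch`, `pitch_type`, `start_speed`, `pfx_x`, `pfx_z`, `type_confidence`, `des`) are assumptions about the file layout, and the sample document is invented; this is not the script used for the article.

```python
import xml.etree.ElementTree as ET

# Invented sample in the assumed layout; real files would be read from disk.
SAMPLE = """<inning num="1">
  <top>
    <atbat batter="000001" pitcher="000002">
      <pitch des="Swinging Strike" pitch_type="CU" start_speed="78.4"
             pfx_x="5.1" pfx_z="-7.9" type_confidence="0.92"/>
      <pitch des="Ball" pitch_type="FF" start_speed="92.1"
             pfx_x="-4.0" pfx_z="9.3" type_confidence="0.88"/>
    </atbat>
  </top>
</inning>"""

def extract_pitches(xml_text, pitch_type="CU", min_confidence=0.5):
    """Return one dict per pitch of the requested type above a confidence floor."""
    root = ET.fromstring(xml_text)
    rows = []
    for pitch in root.iter("pitch"):
        if pitch.get("pitch_type") != pitch_type:
            continue
        if float(pitch.get("type_confidence", "0")) < min_confidence:
            continue
        rows.append({
            "result": pitch.get("des"),
            "start_speed": float(pitch.get("start_speed")),
            "pfx_x": float(pitch.get("pfx_x")),
            "pfx_z": float(pitch.get("pfx_z")),
        })
    return rows

curveballs = extract_pitches(SAMPLE)
print(curveballs)
```

    Filtering on the classifier's confidence value is optional; the threshold of 0.5 here is arbitrary.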

    Hope that helps!

    Comment by Thomas Karakolis — September 13, 2012 @ 4:50 pm

    The right-hander’s more horizontally breaking curve is effective against right-handed hitters when well located, and meat when thrown to left-handers--it has no intimidation effect and is coming right into their power. Which is why you do not see it thrown much to lefties. The vertical curve is effective to either side because a vertically breaking pitch is not in the plane of a normal swing and is therefore awkward to hit, unless it is inside and can be golfed. Which, as you note, it often is, about 140 yards.

    Still and all, a pitcher may make his living with 60-90% fastballs, but the same could never be said of curveballs. Familiarity will eventually get you creamed.

    Comment by james wilson — September 17, 2012 @ 2:32 am

  5. Thomas,

    I’d think that either 1) using absolute value of horizontal movement or 2) separating pitchers out by handedness should go a long way to solving the multiple regression problem.

    Comment by jessef — September 28, 2012 @ 7:05 am

  6. jessef,

    Using the absolute value of horizontal movement would not be a correct approach, since it would in effect make it seem as though a curveball that moves in on a right-handed hitter is the same as a curveball that moves out on a right-handed hitter. I have heard this suggestion before, and seen it posted elsewhere too, and could not disagree with it more strongly. Movement (and location, for that matter) in the horizontal direction should be considered as a spectrum. Just because the frame of reference (or datum, or zero, or whatever you want to call it) is set as a “straight” pitch does not mean you can, for example, treat pitches with -2 inches of break the same as pitches with +2 inches of break.
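
    The objection can be shown with a toy example: binning by signed movement keeps -2 and +2 inches apart, while binning by absolute value merges them into one group. The pitch data and the helper `whiff_rate_by_bin` below are made up purely for illustration.

```python
from collections import defaultdict

def whiff_rate_by_bin(pitches, key):
    """Swing-and-miss rate per bin, where key maps movement to a bin label."""
    tallies = defaultdict(lambda: [0, 0])  # bin -> [whiffs, swings]
    for movement, whiffed in pitches:
        label = key(movement)
        tallies[label][0] += int(whiffed)
        tallies[label][1] += 1
    return {label: w / n for label, (w, n) in sorted(tallies.items())}

# Made-up swings: (horizontal movement in inches, swung and missed?)
pitches = [(-2, True), (-2, True), (-2, True), (-2, False),
           (2, True), (2, False), (2, False), (2, False)]

signed = whiff_rate_by_bin(pitches, key=lambda m: m)
folded = whiff_rate_by_bin(pitches, key=abs)

print("signed bins:", signed)
print("absolute-value bins:", folded)
```

    In this invented data the two directions have very different rates, and the absolute-value version reports only their average.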

    Comment by Thomas Karakolis — September 29, 2012 @ 9:07 am

    …also, separating out pitchers by handedness is probably a good idea. Unfortunately, we are dealing with a relatively small sample size of curveballs (especially compared to when I looked at fastballs). I think what I lose by including pitchers of both handedness in this analysis, I more than gain back in increased statistical power.

    Comment by Thomas Karakolis — September 29, 2012 @ 9:10 am

  8. Wondering why you have completely ignored called strikes. Seems like half the power of the pitch might come from there and yet success might not stem from the same factors as it does for swinging strikes.

    Comment by Someanalyst — October 24, 2012 @ 3:36 pm

  9. … because that is Part II… my apologies…

    Comment by Someanalyst — October 24, 2012 @ 3:37 pm
