PitchFx
We’ve put up some PitchFx data on the site under a special PitchFx section for each pitcher. This is FanGraphs’ first real foray into PitchFx, so we’ll definitely be taking suggestions on this section and we’ll do our best to implement the ones we think make sense. We’d really like to make our PitchFx section as useful as possible.
A couple things to make note of:
-The averages are based on handedness. This was necessary for the horizontal and vertical movement averages.
-“FA” or fastball includes both the FF pitch type classification and the FA classification to keep things relatively steady from year to year.
-“FT” or 2 seam fastball includes both the SI and FT classifications for the same reasons.
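As a rough illustration of the grouping described in these notes, here is a minimal Python sketch. The FF/FA and SI/FT mapping comes from the notes above; the record layout (`throws`, `type`, `h_mov`, `v_mov`) and the sample numbers are hypothetical, and this is not necessarily how the site computes its averages. Splitting by handedness presumably matters because horizontal movement is mirrored between left- and right-handed pitchers.

```python
from collections import defaultdict

# Grouping described in the notes: FF folds into FA, SI folds into FT.
PITCH_GROUPS = {"FF": "FA", "FA": "FA", "SI": "FT", "FT": "FT"}


def group_pitch_type(code):
    """Return the grouped label; codes not listed pass through unchanged."""
    return PITCH_GROUPS.get(code, code)


def average_movement(pitches):
    """Average (horizontal, vertical) movement per (throws, grouped type).

    `pitches` is an iterable of dicts with hypothetical keys
    'throws' ('L' or 'R'), 'type', 'h_mov' and 'v_mov'.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for p in pitches:
        key = (p["throws"], group_pitch_type(p["type"]))
        sums[key][0] += p["h_mov"]
        sums[key][1] += p["v_mov"]
        sums[key][2] += 1
    return {k: (h / n, v / n) for k, (h, v, n) in sums.items()}


# Made-up sample rows, for illustration only.
sample = [
    {"throws": "R", "type": "FF", "h_mov": -6.0, "v_mov": 10.0},
    {"throws": "R", "type": "FA", "h_mov": -4.0, "v_mov": 8.0},
    {"throws": "L", "type": "SI", "h_mov": 9.0, "v_mov": 5.0},
]
averages = average_movement(sample)
```

With the sample rows, the right-hander's FF and FA pitches land in one "FA" bucket and the left-hander's SI pitch lands in "FT".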
David Appelman is the creator of FanGraphs.


David:
Another great addition to the site. Thanks much.
If this is the thread for PitchFX requests/suggestions, one thing I’d love to see is the ability for users to “tag” specific outings (or innings) with scouting or injury information.
The idea would be that if on 4/18/09 Pitcher X left the game in the 6th inning with an oblique strain (or whatever), a user on the site could go in and note that information, and it would be displayed in some form in future graphs of velocity or movement.
Thanks again.
Hmm… I like this idea. This might be something I could look at implementing on a broader scale, not just for pitchfx.
I also agree with Kent. Are splits and situational stats out of the question for the regular stats?
But I like the pitch f/x section and used it today.
Splits are very much in the works. There are some regular stats where it just can’t happen like UZR, but that’s about it.
Great stuff as always, David!
Man, that is cool. I also like Kent’s suggestion — maybe our comments would have to go through an administrator’s approval first before being posted, so erroneous reports would be filtered out.
This is simply fantastic. Thanks.
Josh Kalk’s pitch f/x player cards had a lot of great info before he took them over to the Rays’ org. (Gotta be happy for him, but I really miss his work.) I don’t know what would be feasible so I’ll just list the first ones that come to mind.
-More accurate pitch type classifications than the raw pitch f/x data (for example, Johan’s “2-seamer” this year is a bizarre composite of his slowest fastballs and fastest change-ups.)
-Frequency of certain pitch types in certain counts.
-Pitch type frequency vs LHB or RHB.
More accurate pitch type classifications are probably not going to happen, unless someone wants to step in and write a new algorithm, because I will definitely not be doing that.
The counts thing is easy enough to do and so is the LHB/RHB. I think those two are probably some of the first things to get done.
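For what it's worth, both breakdowns reduce to the same tally. A minimal sketch, assuming each pitch is a record with hypothetical `count`, `stand` (batter side) and `type` fields; the sample rows are made up:

```python
from collections import Counter


def pitch_mix(pitches, key):
    """Share of each pitch type within each value of `key`.

    `key` is the field to split on, e.g. 'count' or 'stand'.
    Returns {(key_value, pitch_type): share}.
    """
    totals = Counter(p[key] for p in pitches)
    pairs = Counter((p[key], p["type"]) for p in pitches)
    return {pair: n / totals[pair[0]] for pair, n in pairs.items()}


# Made-up sample rows, for illustration only.
sample = [
    {"count": "0-0", "stand": "L", "type": "FA"},
    {"count": "0-0", "stand": "R", "type": "FA"},
    {"count": "0-2", "stand": "L", "type": "CH"},
    {"count": "0-2", "stand": "R", "type": "SL"},
]
by_count = pitch_mix(sample, "count")
by_hand = pitch_mix(sample, "stand")
```

The same function handles both splits because only the grouping field changes.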
The problem is, their pitch classification can be so far off as to become worthless. Here’s Ramon Ramirez so far:
48 FF classified 46 FF, 2 SI
9 possible variant FT classified 9 FF
4 egregiously obvious FT classified 4 FF
32 CH classified 8 CH, 1 FF, 23 SI
23 SL (actually a slutter of sorts) as 11 SL, 8 CH, 2 FF, 2 SI
I’m beginning to think there is no perfect one-size-fits-all algorithm, although I do think the best possible such algorithm could do a better job than theirs (there’s no excuse for not recognizing a pitch at 94.6 with a 238 spin axis as a 2-seamer)
I think the way to go is with algorithms for each pitcher. I think you can get very high accuracy (95%-98%) with about 5-10 minutes work on each pitcher. Getting to 99%+ accuracy takes a lot more time and is something that you’d want to do each offseason.
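To put a number on the Ramon Ramirez tallies above, here is a small sketch that treats the commenter's hand labels as the truth and counts how often the published label agrees. The two FT rows (9 possible variants and 4 obvious ones) are combined into one; everything else is copied from the list.

```python
# Hand label -> {published label: number of pitches}, from the list above.
tallies = {
    "FF": {"FF": 46, "SI": 2},
    "FT": {"FF": 13},  # 9 possible variants + 4 obvious ones
    "CH": {"CH": 8, "FF": 1, "SI": 23},
    "SL": {"SL": 11, "CH": 8, "FF": 2, "SI": 2},
}


def accuracy(tallies):
    """Return (pitches labeled the same both ways, total pitches)."""
    total = sum(sum(row.values()) for row in tallies.values())
    correct = sum(row.get(actual, 0) for actual, row in tallies.items())
    return correct, total


correct, total = accuracy(tallies)

# Agreement rate for each hand-labeled pitch type.
per_type = {
    actual: row.get(actual, 0) / sum(row.values())
    for actual, row in tallies.items()
}
```

On these numbers the labels agree on 65 of 116 pitches, about 56%, and on only a quarter of the changeups.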
After kind of glancing at a bunch of players, the 2009 algorithm appears to be very different or at least more granular than the 2008 one, which really irks me because it makes it difficult to compare between seasons without doing some manual labor. For pitch types I really do prefer the BIS data since it’s at least mostly consistent.
For the pitch classification algorithms, I’m not opposed to it, it’s just something I’d need help with.
E-mail me re possible help.
Here’s an example of why the data you do have may be worse than none at all.
According to their algorithm, Justin Masterson’s platoon splits are:
FF: 16 to RHB, 44 to LHB
SI: 21 / 22
CH: 9 / 8
SL: 22 / 35
And the fascinated user would say, how about that, Masterson throws both his 2-seam sinker and his changeup equally often to RHB and LHB.
The actual splits:
FF: 17 R, 57 L
SI: 27 / 9
CH: 3 / 10
SL: 21 / 33
In reality, he’s throwing the changeup mostly to LHB, like everyone else, and he’s been avoiding throwing the sinker to LHB with just about the opposite proportion.
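The difference is easier to see as shares. A quick sketch using only the Masterson counts quoted above, computing the fraction of each pitch type thrown to left-handed batters under both sets of labels:

```python
# (pitches to RHB, pitches to LHB) for each pitch type, from the lists above.
classified = {"FF": (16, 44), "SI": (21, 22), "CH": (9, 8), "SL": (22, 35)}
actual = {"FF": (17, 57), "SI": (27, 9), "CH": (3, 10), "SL": (21, 33)}


def lhb_share(splits):
    """Fraction of each pitch type that was thrown to left-handed batters."""
    return {pitch: lhb / (rhb + lhb) for pitch, (rhb, lhb) in splits.items()}


classified_share = lhb_share(classified)
actual_share = lhb_share(actual)

# Both tallies should cover the same set of pitches.
classified_total = sum(r + l for r, l in classified.values())
actual_total = sum(r + l for r, l in actual.values())
```

Under the published labels the sinker goes to lefties about 51% of the time and the changeup about 47%; under the hand labels those become 25% and about 77%. Both tallies cover the same 177 pitches.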
How about results by pitch type, where we could see the percentages of balls and strikes, hits, and outs recorded for the different pitch types, sortable by left- and right-handed batters?
Max, actually, Santana DOES throw a 2-seamer and there are several articles with direct quotes from him discussing his development and usage of the pitch. I have an article on him for BPro this week which ironically discusses in part how the 2-seamer looks like the mutant child of his four-seamer and changeup, but it is in fact a separate pitch, something I further confirmed from watching tape.
Cool. Didn’t realize that.
Still, it’s being mischaracterized by pitch f/x. Check his last start (4/18) at Brooks Baseball. It lists 18 2-seamers at an average of 86.92mph.
If you look at the speed/horizontal movement chart, you’ll see he didn’t throw a single fastball (4 or 2-seam) under 88 or a changeup over 83.
Checking the velo histogram for the “FT” pitch shows that it’s taking into account some of Santana’s faster changeups.
I’m guessing his real 2-seamer lies in the 88-90 range, right?
Oh and looking forward to that article. Keep up the good work.
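The velocity gap described above suggests a crude sanity check on "FT" labels for that one start. A minimal sketch: the 88 and 83 mph cutoffs come from the comment, while the function name and the sample speeds are hypothetical. It only separates fastballs from changeups; it says nothing about 2-seam versus 4-seam.

```python
FASTBALL_FLOOR = 88.0    # mph; per the comment, no fastball was under 88
CHANGEUP_CEILING = 83.0  # mph; per the comment, no changeup was over 83


def recheck_ft(speed_mph):
    """Re-label a pitch published as 'FT' using the velocity gap alone."""
    if speed_mph >= FASTBALL_FLOOR:
        return "FT"
    if speed_mph <= CHANGEUP_CEILING:
        return "CH"
    return "unknown"  # falls in the gap between the two cutoffs


# Made-up speeds for pitches published as FT, for illustration only.
speeds = [89.5, 88.7, 82.4, 81.9]
labels = [recheck_ft(s) for s in speeds]
```

Cutoffs like these would have to be set start by start, which is the per-pitcher manual work discussed earlier in the thread.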
Release point. Make an over time graph based on release point. Helps with injuries.
I agree, this would be an important addition.
The thing I don’t see enough of with PitchFX is using it for batters to determine their responses to pitches. What pitches cause a batter to swing and miss? Where in the zone does he connect for power? etc.
Also, it would be great if we could get breakdowns of a pitcher’s individual pitches and plate discipline stats against those pitches. Does this pitch produce called strikes, balls, balls in play, swinging strikes, etc?
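A breakdown like that is mostly a matter of tallying outcomes per pitch type. A minimal sketch, assuming each pitch is a `(pitch_type, outcome)` pair; the outcome names and the sample rows are hypothetical:

```python
from collections import Counter, defaultdict


def outcome_rates(pitches):
    """Share of each outcome within each pitch type.

    `pitches` is an iterable of (pitch_type, outcome) pairs.
    """
    counts = defaultdict(Counter)
    for pitch_type, outcome in pitches:
        counts[pitch_type][outcome] += 1
    return {
        pitch_type: {o: n / sum(c.values()) for o, n in c.items()}
        for pitch_type, c in counts.items()
    }


# Made-up sample rows, for illustration only.
sample = [
    ("CH", "swinging_strike"),
    ("CH", "ball"),
    ("CH", "swinging_strike"),
    ("CH", "in_play"),
    ("FA", "called_strike"),
    ("FA", "ball"),
]
rates = outcome_rates(sample)
```

The rates are only as good as the pitch labels going in, so the classification problems discussed above carry straight through.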
I agree that it doesn’t seem to be used for batters too often. Some of this sounds a lot like the stuff I used to do a few years ago with the BIS data occasionally for batters, but you can add movement into the equation with the PitchFx.
I do fully expect to have PitchFx sections for batters as soon as I feel good about the pitcher’s section.
Completely agree with what philosofool said. Picture graphs don’t say as much as number charts such as swing and miss percentage, called strike percentage and such.
One thing I would like to see with release point is have it broken down by pitch type. So for each pitcher, you will have their release point by pitch type.
Bronson Arroyo throws his fastball at 6.4 feet from the ground but his breaking pitches at 5.8 ft, the largest difference of any pitcher. Pretty interesting to know.
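Release height by pitch type is a simple grouped average. A minimal sketch, assuming each pitch is a `(pitch_type, release_height_ft)` pair; the sample rows are made up so the averages come out to the 6.4 and 5.8 ft figures quoted for Arroyo, and are not real pitch f/x readings:

```python
from collections import defaultdict
from statistics import mean


def release_height_by_type(pitches):
    """Average release height (feet) for each pitch type."""
    heights = defaultdict(list)
    for pitch_type, height_ft in pitches:
        heights[pitch_type].append(height_ft)
    return {pitch_type: mean(v) for pitch_type, v in heights.items()}


# Made-up sample rows, for illustration only.
sample = [("FA", 6.5), ("FA", 6.3), ("CU", 5.9), ("CU", 5.7)]
avg = release_height_by_type(sample)

# Spread between the highest and lowest average release point.
gap = max(avg.values()) - min(avg.values())
```

Ranking pitchers by `gap` would be one way to find the largest difference between pitch types.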
I am thrilled to see this stuff here. It definitely fills a void left by Kalk’s blog’s disappearance. Things I’d like to see:
Pitches by count
Splits by pitches
Release points
Pitcher similarities
Also, I’m not sure if “two seam fastballs” should be called that. “Two seam” describes a grip, but what differentiates “two seam fastballs” and “four seam fastballs” in the algorithm is movement. Perhaps “two seam fastballs” could be called “sinkers.”
Will this info be integrated into the leader boards? It would be great to see who has the most break or gets the most swinging strikes on their curveball, for instance. Another thing that might be cool to show is the velocity difference between a pitcher’s fastball and changeup.
Awesome.
Thank you for this addition. Is it possible to add release point data as well?
Oops, I was a little too slow. :X
What exactly is the “IN” pitch that Johan threw a few times last year?
IN stands for Intentional Balls in the pitch fx data.
Of course, thanks
Any chance you could show the release point for certain pitches?
Great work!
I’ve been playing around a little with pitch f/x myself, and one important thing (that is, if someone is up to the task, along with the better pitch classification algorithm) is to normalize for park and game, since there are obvious differences between the measurements in different ballparks and games (I know this for sure about 2007 and 2008, don’t know what’s up in 2009 yet).
If this is done, then the pitch f/x data could be incorporated into the leaderboards. That could also be done now, but it would mean less…
Anyway, great job Dave!
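One very rough way to picture the park adjustment suggested above is to shift each park's readings so that its average matches the overall average. This is a minimal sketch under a strong assumption, namely that the same mix of pitchers threw in every park; a real correction would need to compare the same pitchers across parks and games. The park names and readings are made up.

```python
from collections import defaultdict
from statistics import mean


def park_adjust(readings):
    """Shift readings so each park's mean equals the overall mean.

    `readings` is a list of (park, value) pairs, e.g. velocity in mph.
    """
    overall = mean(value for _, value in readings)
    by_park = defaultdict(list)
    for park, value in readings:
        by_park[park].append(value)
    # How far each park's average sits from the overall average.
    offsets = {park: mean(vals) - overall for park, vals in by_park.items()}
    return [(park, value - offsets[park]) for park, value in readings]


# Made-up sample readings, for illustration only.
sample = [("A", 92.0), ("A", 94.0), ("B", 90.0), ("B", 92.0)]
adjusted = park_adjust(sample)
```

In the sample, park A reads one mph hot and park B one mph cold, so after the shift both parks show the same 91 and 93.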
Neat tool, highly inaccurate, but probably of some use when looking at large samples. Danger time when using it to determine “how” a pitcher ought to be using his pitches. Too much error in location and the aforementioned labeling problems might lead one who can’t identify pitches (and their effectiveness) through observation to some poor conclusions. The seemingly premature predictions of Lincecum based on 2 starts come to mind.
The yearly pitchfx is excellent, I haven’t found another site that offers that. Would it be possible on the yearly graphs of horizontal/vertical movement to make it so you could deselect/select each pitch? My problem at the moment is that I am trying to read the chart, but the lines/dots of the other pitches are too condensed and make it hard to get a clear reading.
I’m curious about how you are handling Pitch FX’s classification of 4 and 2 seam fastballs.
If you are directly pulling your PFX data from MLB Gameday’s XML datastream, what do you do with players like Jon Lester whose 2 seam fastballs are lumped into the “FF” category with his 4 seam fastballs?