Gameday-PITCHf/x changes for 2010

Every year brings changes and improvements to MLBAM’s Gameday application, and many of them have some bearing on PITCHf/x and related analysis. Let me share with you the differences I’ve noticed so far in 2010.

First, Cory Schwartz of MLBAM notified some of us in February that some redundant information was going to be removed from the directory structure.

Folks, just wanted to give you a heads-up that we are deprecating the individual batter and pitcher .xml files published under these directories:
http://gd2.mlb.com/components/game/mlb/year_$YEAR/month_$MONTH/day_$DAY/gid_*/pbp/batters/
http://gd2.mlb.com/components/game/mlb/year_$YEAR/month_$MONTH/day_$DAY/gid_*/pbp/pitchers/

If you’re using any data in those files you should be able to get it from other files in the gd2 directories, but we no longer need or use these for any of our internal purposes or products. In addition, we are deleting the 2008 and 2009 files from our servers to free up the disc space for other content.

Fortunately, Dan Brooks has been able to adapt his site to the new structure for 2010.

Ross Paul shared that he would be deploying pitcher-specific neural nets for MLBAM’s pitch classification.

The Gameday PITCHf/x data also has a few new fields this year. The at-bat element has a new field called “start_tfs”. This is a time stamp in the Eastern Time Zone. It tracks the actual time more closely than the sv_id time stamp does; the latter can be a few minutes off. Cory tells me that this field wasn’t intended for analysis and is used internally by MLBAM. The speculation is that it may be used for syncing the Gameday data with other data sources, such as video. Since it’s there in the data, I wouldn’t be surprised if someone finds an analytical use for it, too.
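For anyone curious how far the two clocks drift apart, here is a minimal sketch of the comparison. It assumes, without confirmation from MLBAM, that start_tfs is an "HHMMSS" string and that sv_id has the form "YYMMDD_HHMMSS", both on the same clock and the same calendar day. The function name and the sample values are made up for illustration.

```python
from datetime import datetime


def clock_gap_seconds(start_tfs: str, sv_id: str) -> float:
    """Seconds from an at-bat's start_tfs to a pitch's sv_id time.

    Assumed formats (not confirmed by MLBAM): start_tfs = "HHMMSS",
    sv_id = "YYMMDD_HHMMSS", same time zone and same calendar day.
    """
    pitch_time = datetime.strptime(sv_id, "%y%m%d_%H%M%S")
    # start_tfs is assumed to carry no date, so borrow the date from sv_id.
    atbat_time = datetime.strptime(sv_id[:6] + "_" + start_tfs, "%y%m%d_%H%M%S")
    return (pitch_time - atbat_time).total_seconds()


# Hypothetical values, for illustration only.
print(clock_gap_seconds("190512", "100406_190748"))
```

If the real formats differ, only the two strptime patterns should need to change.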

The pitch element has three new fields: “nasty”, “zone”, and “cc”. The zone field appears to correspond to the location of the pitch based on the boxes into which the Gameday app divides the strike zone for its hot/cold zone graphics. The “cc” field is a comment field that appears to my highly-trained eye to be auto-generated, probably also based on the hot/cold zone information that MLBAM tracks. Here are some examples of the sparkling wit and insight produced by the auto-commenter:

A.J. Burnett didn’t read the scouting report; Adrian Beltre loves four-seam fastball in that zone.
A.J. Burnett didn’t read the scouting report; Jacoby Ellsbury loves sinker in that zone.
A.J. Burnett didn’t read the scouting report; Victor Martinez loves four-seam fastball in that zone.
Tim Lincecum has thrown 75 pitches; he holds opposing hitters to a .000 average in the first 75 pitches and .000 after that.
Vicente Padilla didn’t read the scouting report; Jeff Clement loves curveball in that zone.
Vicente Padilla didn’t read the scouting report; Lastings Milledge loves four-seam fastball in that zone.

(Apparently Ted Williams was right.)
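The three new pitch attributes can be pulled out with any XML parser. The sketch below uses Python's standard library. The attribute names come from the data described above, but the tag names "atbat" and "pitch", the value formats, and the sample document are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up sample; the tag names and value formats are assumptions.
SAMPLE = """
<inning>
  <atbat start_tfs="190512">
    <pitch nasty="37" zone="5" cc="" />
    <pitch nasty="62" zone="13" cc="Example comment text." />
  </atbat>
</inning>
"""


def new_pitch_fields(xml_text):
    """Return (nasty, zone, cc) for each pitch; empty or missing values become None."""
    root = ET.fromstring(xml_text)
    rows = []
    for pitch in root.iter("pitch"):
        nasty = pitch.get("nasty")
        zone = pitch.get("zone")
        rows.append((
            float(nasty) if nasty else None,
            int(zone) if zone else None,
            pitch.get("cc") or None,
        ))
    return rows


for row in new_pitch_fields(SAMPLE):
    print(row)
```

Treating an empty "cc" as None keeps pitches without a comment easy to filter out.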

The “nasty” field is presumably a crude attempt to calculate how hard a particular pitch was to hit, on a scale of 0-100. My initial cursory look at the data indicates that they are calculating the “nasty” factor mostly from the location of the pitch: a linear calculation of how close it is to the edges and how far from the heart of the zone. For the fastball, MLBAM does not appear to include anything related to the movement or speed of the pitch in the “nasty” factor. For the curveball, they appear to rate sweeping curveballs as significantly nastier than 12-to-6 curveballs. Anyway, I’m not sure that any of this matters as more than a curiosity. As a sabermetric community, we have much better approaches available for measuring the nastiness of a pitch.
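To make the idea of a purely location-based, linear score concrete, here is a toy version. It is not MLBAM's formula, which has not been published; the function name, the parameter names, and the zone dimensions are all placeholders chosen for illustration.

```python
def toy_location_score(px, pz, half_width=0.83, sz_bot=1.5, sz_top=3.5):
    """Toy 0-100 score: 0 at the heart of the zone, 100 at or beyond its edge.

    px, pz: horizontal and vertical pitch location in feet (assumed names).
    The zone bounds are placeholder defaults, not measured values.
    """
    center_z = (sz_bot + sz_top) / 2
    half_height = (sz_top - sz_bot) / 2
    # Fraction of the way from the center to the edge along each axis.
    frac_x = abs(px) / half_width
    frac_z = abs(pz - center_z) / half_height
    return 100 * min(1.0, max(frac_x, frac_z))


print(toy_location_score(0.0, 2.5))  # dead center of the placeholder zone
print(toy_location_score(0.0, 3.0))  # halfway up toward the top edge
```

Speed and movement never enter the calculation, so a score built this way cannot tell a 98 mph fastball from an 88 mph one thrown to the same spot.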

A.J. Burnett throws a knuckle curve against the Angels in Game 5 of the 2009 ALCS. (Icon/SMI)

Finally, the MLBAM pitch classifications have introduced a new bucket this year: KC, the knuckle curve. I’m not sure why they did this. I suspect it has something to do with the scouting data they got for their training data, although I haven’t asked Ross about it. For my own classifications, I do not classify the knuckle curve separately from other curveballs. I don’t generally classify pitch types separately based on grip unless the grip differences actually produce substantial differences in spin or movement (e.g., two-seam and four-seam fastballs). I don’t classify palmballs, forkballs, circle change-ups, three-finger change-ups, and Vulcan change-ups separately. I do occasionally classify hard curves and slow curves separately when they are two distinct pitch types for the same pitcher, as they are for Roy Oswalt, for example. But the knuckle curve, also called the spike curve, moves just like other curveballs.

A.J. Burnett’s curve is the only pitch that I’ve noticed so far that MLBAM is labeling a knuckle curve, which handily gives me an excuse to include an image of a pitcher’s grip, one of my favorite topics.


Lucas A.

The pitch classifications have been pretty crude so far.  Even more so than usual, it seems.  For example, over the first two games, every David Robertson fastball (which sometimes cuts slightly) that had positive horizontal spin deflection has been misclassified as a curveball.  Personally, I don’t mind going back and working on the classifications myself, but just out of curiosity, do you know if the algorithm was changed at all this year?  In the early going, it’s been pretty rough.

Josh

Lucas – Ross is working on custom neural nets for each pitcher. The results look promising, but I have no idea how far along in the process he is. Mike posted a link in the article.

Mike Fast

Lucas, my impression also is that the classifications have taken a step back this year.  The examples that Ross posted at the Book blog on his new classification nets looked good, but there are some pretty bad/obvious mis-classifications going on in the first couple days of 2010.  I’ll leave it at that lest I get another “open letter” directed my way. :)

Detroit Michael

So where can one access the pitch-by-pitch data from 2008 and 2009?  Just in case one wants to look at the raw data.

Mike Fast

Detroit Michael, what Cory meant in his email was that the redundant directories were not only being removed going forward but that they were also being removed from the old data.  The pitch-by-pitch data is still present for 2007-2009 in the inning/inning_?.xml files.

For example:
http://gd2.mlb.com/components/game/mlb/year_2009/month_10/day_22/gid_2009_10_22_nyamlb_anamlb_1/inning/
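The directory layout can be derived from the game id alone. This is a minimal sketch that infers the pattern from the example URL above; the function name is made up, and the sketch only builds the address without checking that the files are still on the server.

```python
BASE = "http://gd2.mlb.com/components/game/mlb"


def inning_dir_url(gid: str) -> str:
    """Build the inning/ directory URL from a game id such as
    "gid_2009_10_22_nyamlb_anamlb_1" (layout inferred from the example URL)."""
    parts = gid.split("_")
    year, month, day = parts[1:4]
    return f"{BASE}/year_{year}/month_{month}/day_{day}/{gid}/inning/"


print(inning_dir_url("gid_2009_10_22_nyamlb_anamlb_1"))
```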

Lucas A.

Mike, would you be able to expand on the “zone” function?  I’m not sure if I understand what it does.

Mike Fast

Lucas, Gameday Premium has a feature where you can see hot/cold zones for each hitter based on batting average in that zone.  I can’t bring it up from work, but it divides the strike zone into boxes, nine I think, and then there are some zones outside the strike zone. It’s a very crude thing, and I wouldn’t use it for anything other than red and blue entertainment for your eyes.  It’s similar to the kind of drivel you occasionally see on national baseball telecasts where they show you a hitter’s hot and cold zones.  It’s nothing up to the…

Mike Fast

To expand on my point about the “nasty” score being useless and wrong as MLBAM is calculating it, here is how the different pitch results stack up according to their average “nasty” score.

“Nasty”  Result
41.6     In play, out
41.3     In play, run/no out
40.7     Swinging strike
40.4     Called strike
40.1     Foul
38.3     Ball

Remember this is on a scale of 0-100, with an average score of 40 and a standard deviation of 16.  It’s the worst kind of junk stat, purporting to measure something really cool but actually…

Josh

Mike,

At least attention is being paid to things like classification. Ross hasn’t gotten to all starters yet or any relievers.

Also, the metrics like nasty are made for their entertainment products, not for serious analysis. Since they make the data freely available, folks like you and Dave Allen can go the more rigorous route which is awesome.

I’m more than willing to cut them yards of slack on the latter point alone.

Mike Fast

Josh, I wouldn’t want to be a Premium customer and be told that the numbers they are giving me are meaningless and may as well be coming from a random number generator. The reason it irks me so much is that Gameday is the face of PITCHf/x for the majority of the baseball world.  If they are communicating the message that PITCHf/x is just a junk toy for entertainment but rife with errors and bad conclusions when it comes to analysis, that is the message I have to fight against when trying to show that the data can be used…

Alan Nathan

Regarding the “nasty” metric, it is further amusing that it is quoted to 3 significant figures.  Or, equivalently, we are rating nastiness on a 0-1000 scale.  I doubt we can classify anything in baseball to that precision.

Cory Schwartz

Mike, upon reading this I was discouraged by your continued willingness to mock and criticize what we do with the Pitch-f/x data, when you aren’t even willing to take the time to even ask us about it first. However, given that you’ve already demonstrated your willingness to show off how uninformed and presumptuous you are in this regard (http://www.insidethebook.com/ee/index.php/site/comments/open_letter_from_cory_schwartz/), I shouldn’t be surprised that you’re doing it again here. Your snide and dismissive comments reveal that your true motivation is not to evaluate or inform but simply to criticize, which does not serve anyone in any way. New and informative…

Mike Fast

Cory, I am sorry that you feel I am being snide and dismissive.  You guys at MLBAM do a phenomenal job with communicating the game information and experience.  At the same time, the sabermetric community tears apart every stat that’s ever been made, and that’s what I did here with the Nasty Factor.  If a film critic pans a scene in a movie, does that mean he hates the director or needs to get the director’s input before he publishes his review? I love the movie.  Just don’t like a couple scenes.  And I wouldn’t be reviewing at all if…

Dan Brooks

I, for one, learned about the knuckle-curve. It is now added to my site! =)

Now, if there was only an additional flag (KH) for “knuckle-head”, that was reserved for when pitchers threw to first when no one was actually on the bag or outfielders threw the ball into the seats after only 2 outs.

Alan Nathan

Cory… evidently, you are not following this thread at Tango’s blog:
http://www.insidethebook.com/ee/index.php/site/comments/phil_on_rodney_fort_and_academia/
If you were, then you would not be calling me “Dr. Nathan” :)

Alan Nathan

Cory….a more serious comment:

It might actually be helpful for all the baseball analysts out there if we had some knowledge of how you arrive at the “nasty” factor.  …alan
