During the 2006 playoffs, Major League Baseball Advanced Media enhanced its Gameday application with PITCHf/x tracking in real time. By making the data freely available to anyone who wanted it, MLBAM launched a wave of analysis, and analysts, that is still rippling through the game. Each season has seen new features, and 2009 will be no exception. As in the 2007 season, some features will be rolled out gradually as the season progresses.

2008 and the offseason

Last year was a busy one for Gameday. A 3D interface was introduced early in the season, and the seeds of HITf/x were being planted. At the First Annual PITCHf/x Summit, the community got together and saw the first pieces of HITf/x, courtesy of Peter Jensen. Our own Mike Fast talked about his data processing, and general dweebery was rampant.

At the summit in May, Ross Paul of MLBAM presented some of the inner workings of his neural network, which produces the real-time pitch IDs in Gameday. Based on feedback from other attendees, Paul improved his pitch IDs, something you may have noticed during the season. Now the neural net has some new training: 49,000 pitches identified by Fast.

The resulting improvements in pitch classification are already evident, but Paul will be the first to tell you that targeted, post hoc analysis will still beat his general-purpose method. The update is a notable step forward and an example of the level of collaboration between independent analysts and MLBAM. Not to mention Sportvision, which partners with MLBAM in this endeavor and also supports, and makes use of, the analyst community.

Pitch classifications for 2009

The system uses the following classifications to describe pitches, which should look oddly familiar, thanks to Fast:

FF Four-seam fastball
FT Two-seam fastball
FA Fastball (generic, usually when the pitcher is not well known)
SI Sinker, essentially the same pitch as a two-seamer, but the label is applied for some pitchers (Derek Lowe had his two-seam fastball consistently marked SI, for example)
FS Splitter
CH Change-up
CU Curveball
SL Slider (real-time pitch classifications often mix up curves and sliders)
FC Cutter
KN Knuckleball
SC Screwball

One important input is direct knowledge of a pitcher’s stuff, which can be used to make or adjust some classifications—the sinker label is one example.
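To make that concrete, here is a minimal Python sketch of the code table above plus a per-pitcher override, the kind of adjustment that turns Lowe's two-seamers into SI. The `PITCHER_OVERRIDES` table and the `label_pitch` function are hypothetical illustrations; they are not MLBAM's neural net or its actual adjustment logic.

```python
# Pitch-type codes as listed above.
PITCH_TYPES = {
    "FF": "Four-seam fastball",
    "FT": "Two-seam fastball",
    "FA": "Fastball (generic)",
    "SI": "Sinker",
    "FS": "Splitter",
    "CH": "Change-up",
    "CU": "Curveball",
    "SL": "Slider",
    "FC": "Cutter",
    "KN": "Knuckleball",
    "SC": "Screwball",
}

# Hypothetical overrides encoding direct knowledge of a pitcher's stuff:
# (pitcher, raw code) -> preferred code.
PITCHER_OVERRIDES = {
    ("Derek Lowe", "FT"): "SI",
}


def label_pitch(pitcher: str, raw_code: str) -> str:
    """Return 'CODE (name)' after applying any pitcher-specific override."""
    code = PITCHER_OVERRIDES.get((pitcher, raw_code), raw_code)
    if code not in PITCH_TYPES:
        raise ValueError(f"unknown pitch type code: {code!r}")
    return f"{code} ({PITCH_TYPES[code]})"


print(label_pitch("Derek Lowe", "FT"))     # SI (Sinker)
print(label_pitch("Some Pitcher", "FT"))   # FT (Two-seam fastball)
```

Keeping overrides in a separate table leaves the general classifier untouched, which mirrors the idea that pitcher knowledge adjusts, rather than replaces, the automatic label.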

New and premium features

For the first couple of weeks of the 2009 season, MLBAM is making the new “Premium” version of Gameday available at no charge. Here are the features promoted when you enter Gameday, from a screenshot:


Each “situation” ends up in an XML file in a new “premium” folder (see References and Resources for a link to the root MLB games directory). In the Gameday application itself, the strike zone is split nine ways, plus four regions for pitches out of the zone. The actual data number the zones up to 14, but zone 10 is skipped: zones 1-9 make up the strike zone, and zones 11-14 are the out-of-zone regions.
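To illustrate the numbering, here is a minimal sketch that maps a pitch location to a zone. It assumes Gameday-style coordinates in feet (`px` measured from the center of the plate as seen by the catcher, negative to his left; `pz` height above the ground; `sz_top`/`sz_bot` for the batter's zone). It also assumes zones 1-9 run left to right and top to bottom from the catcher's view, and that 11 (high left), 12 (high right), 13 (low left) and 14 (low right) are quadrants split at the center of the zone. That layout is a guess, not something the XML confirms.

```python
PLATE_HALF_WIDTH_FT = 17 / 12 / 2  # home plate is 17 inches wide


def zone_for(px: float, pz: float, sz_top: float, sz_bot: float) -> int:
    """Map a location (feet) to zone 1-9 or 11-14; zone 10 is never used.

    The grid orientation and quadrant numbering are assumptions.
    """
    left, right = -PLATE_HALF_WIDTH_FT, PLATE_HALF_WIDTH_FT
    if left <= px <= right and sz_bot <= pz <= sz_top:
        # Three columns across the plate, three rows from top to bottom;
        # min() keeps pitches exactly on the right or bottom edge in range.
        col = min(int((px - left) / (right - left) * 3), 2)
        row = min(int((sz_top - pz) / (sz_top - sz_bot) * 3), 2)
        return row * 3 + col + 1
    # Outside the zone: four quadrants around the zone's center.
    upper = pz >= (sz_top + sz_bot) / 2
    left_side = px < 0
    if upper:
        return 11 if left_side else 12
    return 13 if left_side else 14


print(zone_for(0.0, 2.5, 3.5, 1.5))   # 5 (middle-middle)
print(zone_for(-1.5, 3.0, 3.5, 1.5))  # 11
```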

For each situation and zone, a variety of values are stored. For hitters, home runs, RBI and batting average are stored and displayed. OPS is stored as well, along with a “rank” value derived from it, and the “hot” and “cold” zones are based on that rank. Hitters have pitches they “love” or “hate,” also based on OPS. Pitchers have an “out” pitch, which is based on batting average, though I’m not sure whether that means BABIP or batting average against all pitches of that type.
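Since the ranking hinges on OPS, here is a minimal sketch of ranking a hitter's zones that way. OPS itself is the standard on-base percentage plus slugging; the `ZoneLine` record, the `rank_zones` function, and the choice of three hot and three cold zones are hypothetical, because the exact rule MLBAM uses isn't visible in the data.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass
class ZoneLine:
    """Hypothetical counting stats for one hitter in one zone."""
    ab: int
    h: int
    bb: int
    hbp: int
    sf: int
    tb: int  # total bases

    def ops(self) -> float:
        """On-base percentage plus slugging percentage."""
        obp_denom = self.ab + self.bb + self.hbp + self.sf
        obp = (self.h + self.bb + self.hbp) / obp_denom if obp_denom else 0.0
        slg = self.tb / self.ab if self.ab else 0.0
        return obp + slg


def rank_zones(lines: Dict[int, ZoneLine], n_hot: int = 3,
               n_cold: int = 3) -> Tuple[Dict[int, int], Set[int], Set[int]]:
    """Rank zones by OPS (1 = best) and pick disjoint hot and cold sets."""
    order = sorted(lines, key=lambda zone: lines[zone].ops(), reverse=True)
    rank = {zone: i + 1 for i, zone in enumerate(order)}
    hot = set(order[:n_hot])
    # Start the cold slice no earlier than the end of the hot slice,
    # so no zone is ever labeled both hot and cold.
    cold = set(order[max(n_hot, len(order) - n_cold):])
    return rank, hot, cold


lines = {
    5: ZoneLine(ab=10, h=4, bb=1, hbp=0, sf=0, tb=8),
    1: ZoneLine(ab=10, h=2, bb=0, hbp=0, sf=0, tb=2),
    9: ZoneLine(ab=10, h=3, bb=0, hbp=0, sf=0, tb=4),
}
rank, hot, cold = rank_zones(lines, n_hot=1, n_cold=1)
print(rank, hot, cold)  # {5: 1, 9: 2, 1: 3} {5} {1}
```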

Right now, not many data are being used for the hot zones and tendencies, so the outputs are flaky. For example, some hitters both “love” and “hate” the same pitch. Once more data are accumulated and/or leveraged, the tendencies and hot zones will become much more useful. Casual fans certainly will find it enjoyable, and even PITCHf/x guys like me can use it as a springboard or measuring stick for our own endeavors.
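The flakiness is what you would expect from thin samples. One simple guard, sketched here as an assumption rather than anything MLBAM is known to do, is to ignore pitch types with too few plate appearances and to make “love” and “hate” mutually exclusive. The `love_hate` function and its `min_pa=50` threshold are hypothetical.

```python
from typing import Dict, Optional, Tuple


def love_hate(ops_by_pitch: Dict[str, float], pa_by_pitch: Dict[str, int],
              min_pa: int = 50) -> Tuple[Optional[str], Optional[str]]:
    """Return (loved, hated) pitch codes by OPS, skipping thin samples.

    Only pitch types with at least min_pa plate appearances (a hypothetical
    threshold) are eligible, and one pitch is never both loved and hated.
    """
    eligible = [p for p in ops_by_pitch if pa_by_pitch.get(p, 0) >= min_pa]
    if len(eligible) < 2:
        return None, None  # not enough data to contrast pitches
    loved = max(eligible, key=lambda p: ops_by_pitch[p])
    hated = min(eligible, key=lambda p: ops_by_pitch[p])
    if ops_by_pitch[loved] == ops_by_pitch[hated]:
        return None, None  # no spread, so no preference to report
    return loved, hated


ops = {"FF": 0.950, "SL": 0.520, "CH": 1.400}
pa = {"FF": 120, "SL": 60, "CH": 4}
print(love_hate(ops, pa))            # ('FF', 'SL')
print(love_hate(ops, pa, min_pa=0))  # ('CH', 'SL')
```

Without the guard, four lucky plate appearances against change-ups make CH the “loved” pitch; with it, the label falls back to pitches with a real track record.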

Some of the new features already are working well, but you have to click one of the premium charts on the play-by-play screen to see them. In addition to the details on the hot zones for the pitcher and the hitter, you get four pitcher tendencies. Release point and speed tracking already are adding value to the experience. You can see if a pitcher mixes his release points, or if he’s dropping his arm or losing velocity on his fastball. Movement trends also are tracked during games, although I feel that is an even harder tea leaf to read. The pitch selection pie graph is another “ready now” feature.
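The in-game tracking described above boils down to a few simple summaries. Here is a minimal sketch, assuming each pitch carries a type code, a starting speed in mph and a release height in feet (named after the Gameday XML attributes `pitch_type`, `start_speed` and `z0`; treat those names as assumptions). It computes the pitch-selection percentages behind a pie graph and compares a pitcher's early and late fastballs for lost velocity or a dropping arm. The `FASTBALLS` set and the 10-pitch window are hypothetical choices.

```python
from collections import Counter
from statistics import mean
from typing import Dict, List, NamedTuple, Optional, Tuple

# Assumed subset of codes treated as fastballs for velocity tracking.
FASTBALLS = {"FF", "FT", "FA", "SI"}


class Pitch(NamedTuple):
    """One pitch; field names assumed to mirror Gameday XML attributes."""
    pitch_type: str
    start_speed: float  # mph
    z0: float  # release height, feet


def pitch_mix(pitches: List[Pitch]) -> Dict[str, float]:
    """Percentage of pitches by type: the data behind a pie graph."""
    counts = Counter(p.pitch_type for p in pitches)
    total = sum(counts.values())
    return {code: round(100 * n / total, 1) for code, n in counts.items()}


def early_late_change(values: List[float], window: int = 10) -> Optional[float]:
    """Mean of the last `window` values minus mean of the first `window`.

    Returns None when there are too few values to compare.
    """
    if len(values) < 2 * window:
        return None
    return mean(values[-window:]) - mean(values[:window])


def fastball_trends(pitches: List[Pitch],
                    window: int = 10) -> Tuple[Optional[float], Optional[float]]:
    """Velocity and release-height change (late minus early) for fastballs."""
    fb = [p for p in pitches if p.pitch_type in FASTBALLS]
    return (early_late_change([p.start_speed for p in fb], window),
            early_late_change([p.z0 for p in fb], window))


# Made-up game: a pitcher who loses 2 mph and drops his arm a bit.
game = ([Pitch("FF", 94.0, 6.0)] * 10 + [Pitch("SL", 85.0, 5.9)] * 5
        + [Pitch("FF", 92.0, 5.8)] * 10)
print(pitch_mix(game))  # {'FF': 80.0, 'SL': 20.0}
speed_change, height_change = fastball_trends(game)
print(speed_change)     # -2.0
```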

Something new that’s not in Gameday itself has appeared in the Gameday XML data files: spin rate and spin direction. The spin direction data appear to be corrected for release point, so the values reflect spin from grip and release action rather than arm slot. Combined with start speed, spin direction and spin rate (in RPM) can be of great importance in post hoc classification. Many PITCHf/x aficionados already calculate and use these values on their own, but now they are available to everyone.
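For a feel of how analysts have derived spin direction on their own, here is a minimal sketch of one approach: take the direction of the movement vector (`pfx_x`, `pfx_z`, in inches as seen from the catcher) and rotate it 90 degrees, so that 180 means pure backspin and 0 pure topspin. This convention is an assumption, it only captures the spin that produces movement (gyro spin is invisible to it), and it does not model the release-point correction noted above, so it is not necessarily how Gameday computes its values.

```python
import math


def spin_direction_deg(pfx_x: float, pfx_z: float) -> float:
    """Estimate spin direction in degrees from PITCHf/x movement.

    Assumed convention: 180 = pure backspin, 0 = pure topspin.
    Spin along the direction of travel produces no movement and is ignored.
    """
    return (math.degrees(math.atan2(pfx_z, pfx_x)) + 90.0) % 360.0


print(spin_direction_deg(0.0, 10.0))  # 180.0 (straight "rise": backspin)
print(spin_direction_deg(0.0, -5.0))  # 0.0 (straight drop: topspin)
```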

The demise of QuesTec

PITCHf/x is the basis for the new umpire strike zone evaluation system. Major League Baseball employs technology in evaluating and training its umpires, and those needs are now being met by PITCHf/x “plus.”

Both systems, QuesTec and PITCHf/x, were used in 2008, and now the system of record is based on the same data made available to the public. “Based” is the key word: post-game processes that we don’t get to see whip the data into shape. I’m not sure if it is related, but the strike zone has returned to the Gameday interface in 2009, with intentionally fuzzy borders.
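MLB’s evaluation process isn’t public, so this is only a sketch of how an independent analyst might grade called pitches from the public data, with a tolerance band standing in for the idea of fuzzy borders. Everything here is an assumption: the zone geometry (a 17-inch plate, `sz_top`/`sz_bot` from the data, a ball radius of about 1.45 inches so that any part of the ball touching the zone counts), the one-inch band, and the function names.

```python
import math

PLATE_HALF_WIDTH_FT = 17 / 12 / 2  # home plate is 17 inches wide
BALL_RADIUS_FT = 1.45 / 12  # approximate baseball radius


def signed_distance_to_zone(px: float, pz: float,
                            sz_top: float, sz_bot: float) -> float:
    """Feet from the ball's center to the zone edge, expanded by the ball radius.

    Negative inside the zone, positive outside (box signed distance).
    """
    half_w = PLATE_HALF_WIDTH_FT + BALL_RADIUS_FT
    top, bot = sz_top + BALL_RADIUS_FT, sz_bot - BALL_RADIUS_FT
    mid, half_h = (top + bot) / 2, (top - bot) / 2
    dx = abs(px) - half_w
    dz = abs(pz - mid) - half_h
    outside = math.hypot(max(dx, 0.0), max(dz, 0.0))
    inside = min(max(dx, dz), 0.0)
    return outside + inside


def grade_call(call: str, px: float, pz: float, sz_top: float,
               sz_bot: float, band_ft: float = 1 / 12) -> str:
    """Grade a called 'S' or 'B' as 'correct', 'incorrect', or 'borderline'."""
    d = signed_distance_to_zone(px, pz, sz_top, sz_bot)
    if abs(d) <= band_ft:
        return "borderline"  # inside the fuzzy band: not graded
    should_be_strike = d < 0
    return "correct" if (call == "S") == should_be_strike else "incorrect"


print(grade_call("S", 0.0, 2.5, 3.5, 1.5))  # correct
print(grade_call("S", 2.0, 2.5, 3.5, 1.5))  # incorrect
```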

What’s next

Both the Mets and Yankees played exhibition games in their new parks, but PITCHf/x data were not available through Gameday. At this writing, I’m not sure when we’ll see PITCHf/x in NYC again. Soon, I hope. Meanwhile, there are two features left to talk about.

Gameday Audio soon will be integrated with the premium version—it’s listed as a feature in the pop-up shown above, but it is not on yet. That will be a convenient combo, and a pretty good deal considering Gameday Audio is a subscription service you can buy separately.

The most anticipated new feature has to be HITf/x. Using the same cameras that track pitches, the initial trajectories and characteristics of most batted balls can be captured. Speed off bat and launch angles will add a new dimension to understanding the quality of contact and how it relates to pitches, locations and balls in play. The system will not capture the full flight of the ball, but there are efforts underway to use the available data and develop new collision models to estimate spin and landing spot. I’ll defer to Mike Fast and others on that topic. I’ll just sit back and wait for the goods to arrive.
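HITf/x data weren’t public at this writing, so the field names below are hypothetical. Assuming the system reports an initial batted-ball velocity vector in feet per second (x toward the first-base side, y out toward the pitcher, z up), speed off the bat and the two launch angles follow from basic trigonometry:

```python
import math
from typing import Tuple

FT_PER_S_TO_MPH = 3600 / 5280  # seconds per hour / feet per mile


def launch_params(vx: float, vy: float, vz: float) -> Tuple[float, float, float]:
    """Return (speed off bat in mph, vertical launch angle, horizontal spray angle).

    Angles are in degrees. Hypothetical frame: x toward the first-base side,
    y out toward the pitcher, z up; a spray angle of 0 is straight up the middle.
    """
    speed_ft_s = math.sqrt(vx * vx + vy * vy + vz * vz)
    launch_deg = math.degrees(math.atan2(vz, math.hypot(vx, vy)))
    spray_deg = math.degrees(math.atan2(vx, vy))
    return speed_ft_s * FT_PER_S_TO_MPH, launch_deg, spray_deg


speed, launch, spray = launch_params(0.0, 100.0, 100.0)
print(round(speed, 1), round(launch, 1), round(spray, 1))  # 96.4 45.0 0.0
```

Landing spot and spin are a different matter: as noted above, those require modeling the rest of the flight, which this sketch does not attempt.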

References & Resources
Major League Baseball Advanced Media is a subsidiary of Major League Baseball.
Gameday can currently be accessed via the scoreboard.
XML Data Archives

Mike Fast provided valuable feedback and insight. Dan Brooks joined me in the online journey through the XML files.

