Will FIELDf/x Go Public? Should It?

For those among you out there who read FanGraphs regularly, chances are you have a copy of the Hardball Times Baseball Annual 2011. If so, pace around your mother’s basement, take your dog-eared copy off that bookshelf, and flip to page eleventy-one with me (or do yourself a favor and purchase a copy here). Take a couple of minutes to (re)read Rob Neyer’s article documenting his giddiness about the potential of FIELDf/x, a new player-tracking system by Sportvision. Fully operational FIELDf/x camera systems will be installed in five stadiums by the end of this season and hopefully in all 30 by 2012. Here’s an excerpt from Rob’s article describing FIELDf/x:

FIELDf/x will manifestly and forever revolutionize the evaluation of defense. In fact, I will venture that the defensive metrics in use today, whether by John Dewan or Sean Smith or David Pinto or Mitchel Lichtman or anyone else, will in five years seem nearly as primitive as range factor does today. Because with FIELDf/x, we’ll know not just (approximately) where the baseball went and whether it was caught and who caught it (or didn’t). We’ll know exactly where the ball went and exactly how long it took a fielder to arrive and exactly how he got there. All the talk about range and getting a good jump and taking a good route: it won’t be just talk anymore. There will be cold, hard data for every bit of it.

What Rob touched on about taking a good route, our very own FanGraphBot Dave Allen investigated. Last summer, the Sideburn King gave a presentation at the annual PITCHf/x Summit on using FIELDf/x to assess fielders’ routes to fly balls. Sportvision had released sample data to several baseball analysts, and Dave took that data to determine the speed of an outfielder as they pursued a fly ball. He looked at the starting points of each fielder at the time of contact and how efficient the fielder was in getting to the ball by comparing the distance traveled by the player against the shortest distance possible. You can view this presentation over at Sportvision’s PITCHf/x Summit website. You’ll also get a great view of Dave Allen’s sideburns and crazy hairdo at work.
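The route comparison described above (distance traveled against the shortest distance possible) can be sketched in a few lines. This is only an illustration, not Dave's actual method: it assumes a fielder's track arrives as a list of (x, y) positions in feet, and the function names are made up for this example. The default sampling rate of 15 positions per second is the figure quoted for FIELDf/x later in this post.

```python
import math


def path_length(points):
    """Total distance along a sequence of (x, y) positions."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def route_efficiency(points):
    """Straight-line distance from first to last position, divided by
    the distance actually run. 1.0 is a perfectly direct route; lower
    values mean a more roundabout one."""
    traveled = path_length(points)
    if traveled == 0:
        return 1.0  # fielder never moved; treat as trivially efficient
    return math.dist(points[0], points[-1]) / traveled


def average_speed(points, samples_per_second=15):
    """Average speed over the track, in position units per second.
    The sampling rate is an assumption taken from the article text."""
    elapsed = (len(points) - 1) / samples_per_second
    return path_length(points) / elapsed if elapsed > 0 else 0.0


# Hypothetical track: the fielder runs 3 ft sideways, then 4 ft back.
track = [(0.0, 0.0), (3.0, 0.0), (3.0, 4.0)]
print(round(route_efficiency(track), 3))
```

A ratio like this says nothing about why a route was indirect, so it is best read as a starting point rather than a grade.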

I’ve become increasingly interested in FIELDf/x myself, dreaming of potential analysis of the data. Several months ago, I wrote an article brainstorming ways we could potentially use FIELDf/x to analyze how Carl Crawford would fit in left field versus right field at Fenway Park.

The problem is, all I (and those of you who commented) could do was speculate. The initial samplings of the data from FIELDf/x’s first home, AT&T Park, were largely unavailable to the public, save for a few analysts who were invited to analyze its potential. MLB Advanced Media has released raw data from Sportvision’s PITCHf/x system to the public for free, for teams, FanGraphs, The Hardball Times, Brooks Baseball, TexasLeaguers.com, you, me, and anyone with a computer and an Internet connection to use. But it’s looking more and more like that won’t be the case for FIELDf/x.

Yes, the potential of the data is massive. Yes, the Dave Allens, Mike Fasts, Harry Pavlidises, Jeremy Greenhouses, and other big-time PITCHf/x analysts of our world could unlock the location of the Holy Grail with full access to that data. Yes, in the public’s eyes and as fans of the game of baseball, we would love to see state of the art analysis and research done in the blogosphere.

But We The Fans will not decide the fate of FIELDf/x and its transparency. There are other stakeholders of the data who have their reasons not to make FIELDf/x data available to the public.

1) Competitive advantage vs. free labor
The release of PITCHf/x to the public allowed statheads to develop cutting edge baseball research, analyzing hitters’ and pitchers’ tendencies and weaknesses. In a few cases, careers were made out of the pool of PITCHf/x analysts who cropped up after PITCHf/x first came on the public scene in 2007. Josh Kalk of the Tampa Bay Rays is one publicized example, whom you can read about in Jonah Keri’s The Extra 2%. PITCHf/x analysts on sites such as this one, Baseball Prospectus, The Hardball Times, etc. are widely read by executives and analysts within baseball, some of whom scan the blogosphere in order to obtain “free labor” and analyses. For such teams, maybe it makes sense to advocate for public release of FIELDf/x data.

At the same time, those several teams known to invest heavily in their analytics departments do so in order to gain a competitive advantage over other teams, especially over those that ignore cutting edge PITCHf/x analysis or settle for free analysis. The teams who invest in analytics (which at this point are probably slightly more than half) hire programmers and analysts who develop proprietary software, databases, and system applications to varying degrees.

For these teams, it’s not in their best interest for FIELDf/x data to be public. The goal of an organization is not to develop state of the art analytics systems — the goal is to win games and championships. Yes, free release of the data opens the doors for dozens and hundreds of freelance analysts, who collectively would be much more progressive than a handful of analysts working for teams. But the more freelance analysts out there (or here, i.e., you and me) get a hold of freely available real time data, the less of an advantage the most analytically-minded organizations have over all the others.

2) Data deluge
I explained the sheer massiveness of the data before in my Crawford/FIELDf/x article:

FIELDf/x records high resolution shots 15 times a second, identifying every human on the field with each shot assigned to a time stamp. It also records events, such as when the pitcher releases the ball, the batter hits the ball, the fielder gains possession of a ball, and the fielder throws the ball. Whereas PITCHf/x gives us about 250 pitches per game, there may be up to 1 million FIELDf/x data entries recorded a game. This comes out to over 2.4 billion lines of data for each season describing the locations of fielders, baserunners, and umpires sorted by game and time stamp, scaling the petabyte level in memory.
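The season total in that excerpt is simple multiplication, sketched below. The per-game figure comes from the excerpt; the game count is the standard 30-team, 162-game regular season. The storage function is purely hypothetical: how many bytes one entry takes depends on what it holds (a few numbers versus image data), so it is left as a parameter rather than guessed.

```python
ROWS_PER_GAME = 1_000_000          # "up to 1 million" entries per game, from the excerpt
GAMES_PER_SEASON = 30 * 162 // 2   # 30 teams, 162 games each, two teams per game

rows_per_season = ROWS_PER_GAME * GAMES_PER_SEASON


def season_bytes(bytes_per_row):
    """Rough size of one season of tracking rows.
    bytes_per_row is an assumption supplied by the caller."""
    return rows_per_season * bytes_per_row


print(GAMES_PER_SEASON)   # 2430
print(rows_per_season)    # 2430000000
```

Plugging different per-row sizes into season_bytes shows how sensitive any storage estimate is to that one assumption.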

I mentioned later in the post that much of the data will need to be filtered out; we don’t need to track a player when he’s warming up or jogging to and from the dugout between innings. Initially, the most useful parts of player tracking are when the ball is actually in play, which only occurs on a fraction of all pitches. The raw FIELDf/x data needs a lot of cleaning up, and it’s going to take more than a personal laptop computer with an Internet connection to be able to handle all of that data.
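As a rough sketch of that filtering step, the snippet below keeps only the tracking rows that fall between a "batter hits the ball" event and the next "fielder gains possession" event. The record layouts here are hypothetical (the real FIELDf/x format is not public): events are assumed to be (timestamp, kind) pairs and frames are assumed to be tuples whose first element is a timestamp.

```python
from bisect import bisect_right


def in_play_windows(events):
    """Pair each 'hit' event with the next 'possession' event and
    return a list of (start_time, end_time) windows."""
    windows = []
    start = None
    for t, kind in sorted(events):
        if kind == "hit":
            start = t
        elif kind == "possession" and start is not None:
            windows.append((start, t))
            start = None
    return windows


def frames_in_play(frames, events):
    """Keep only frames whose timestamp falls inside an in-play window."""
    windows = in_play_windows(events)
    starts = [s for s, _ in windows]
    kept = []
    for frame in frames:
        t = frame[0]
        i = bisect_right(starts, t) - 1  # last window starting at or before t
        if i >= 0 and t <= windows[i][1]:
            kept.append(frame)
    return kept


# Made-up example: two balls in play, five tracking rows for one fielder.
events = [(10.0, "release"), (10.5, "hit"), (14.0, "possession"),
          (30.0, "release"), (30.4, "hit"), (33.0, "possession")]
frames = [(9.0, "CF", 0, 0), (11.0, "CF", 1, 1), (14.0, "CF", 2, 2),
          (20.0, "CF", 3, 3), (31.0, "CF", 4, 4)]
print([f[0] for f in frames_in_play(frames, events)])
```

Ending the window at possession drops the throw that follows, so a fuller version would probably extend each window to the end of the play.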

Sportvision and MLBAM will have the necessary resources to clean up the data, but even then, release of ready and massaged FIELDf/x data may not bring the same analyses that PITCHf/x produced because of the size of the data sets. What’s more likely is that current PITCHf/x analysts or independent baseball analytics consultants will be allowed to get a peek at the data like they have been, if those analysts aren’t already hired away by teams. I’m not sure if the general public would be able to do much with the data, as few will have the databasing skills and computing power to handle such large data sets.

3) We The Fans
As hard as it is for me to say this and to stomach it, I am not sure that We The Fans need this data. At least, for the purposes of entertainment enhancement. ESPN, FOX, and local broadcasts can buy sets of summarized data (as opposed to bulk data dumps more useful to teams) from Major League Baseball to feed the public’s insatiable desire for baseball information. MLB Gameday can replay every fielding play in baseball, showing a replay of fielders’ positions, speeds, and movements during a play. We The Fans will still be able to enjoy the fruits of FIELDf/x through enhanced broadcasting experiences.

One idea is that MLBAM could assemble a team of programmers in order to centralize all of the FIELDf/x data mish-mashing and adjusting. Unlike the Major League Scouting Bureau (which centralizes scouting reports and rankings), these programmers would manipulate the data so that it is presentable and more useful to teams, so that teams can spend most of the analyst work hours on analyzing data and producing their own proprietary rankings rather than making the data workable.

That still raises the question of where teams get their FIELDf/x analysts if they don’t know who out there is able to analyze it. Well, we already have PITCHf/x, do we not? Of course, FIELDf/x analysis will be very different from PITCHf/x analysis, but I would argue that most PITCHf/x analysts have the technical skills and the baseballing knowledge to be able to transition well into FIELDf/x analysis. That’s why Sportvision went to independent PITCHf/x analysts like Allen, Fast, Pavlidis, Greenhouse, etc. with sample FIELDf/x data, right?

Overall, I am very excited for the potential of FIELDf/x. According to this Bloomberg article, FIELDf/x is now installed at Yankee Stadium and PETCO Park in addition to AT&T Park, with Kauffman Stadium and Tropicana Field next in line. By 2012, the hope is that all 30 stadiums have fully operational FIELDf/x systems, with the fruits of the data enhancing what is seen on broadcasts and web applications by 2013. I may never get the opportunity to see the raw data first hand, but I sure am thankful that motion capture technology has evolved so that a significantly heightened baseball fan experience is just around the corner.

For more thoughts and ideas on FIELDf/x by much smarter people than me, see Tango’s blog post about it here.


It’s been a wild ride the past senior year for me at Northwestern. And while I feel like I wouldn’t be writing for the best and certainly the coolest baseball blog on the Internet without my impassioned love for baseball and PITCHf/x, I most definitely would not have received this opportunity if not for the people in my life who have advised me and led me along the way. I feel so blessed to have been given this opportunity by David Appelman and Dave Cameron back in September, who have been, up to this point in my young life, the best bosses I have ever worked for. I am forever indebted to both of you for giving me the opportunity to write in these spaces, for dealing with my incessant emails, and for organizing quite possibly the most awesome spring training trip ever.

In particular, I’d also like to thank Carson, Niv, Eno, Dave (Allen), Eric, Jack, Chris, Paul, Robert, and all the other FanGraphs authors for making me feel at home, who have become more friends than co-workers during my time here. I can’t thank you guys enough for editing and advising my work and for all that I learned from you guys. I would also like to thank the readers and commenters who have kept me accountable on my writing and driven me to be at the top of my game.

In the meantime, I’ll be pursuing other opportunities in my life, and I believe my experiences at FanGraphs have done their part in preparing me for the real world. To quote Admiral Adama from Battlestar Galactica, “Gentlemen, it’s been an honor.”


Albert Lyu (@thinkbluecrew, LinkedIn) is a graduate student at the Georgia Institute of Technology, but will always root for his beloved Northwestern Wildcats. Feel free to email him with any comments or suggestions.

36 Responses to “Will FIELDf/x Go Public? Should It?”

  1. Eddie says:

    As a fan of both the Cubs and FanGraphs, I hope this info is released publicly.


  2. Telo says:

    “As hard as it is for me to say this and to stomach it, I am not sure that We The Fans need this data.”

    We don’t need anything at all relating to advanced baseball statistics, but it’d be amazing to have it.


    • Telo says:

      Per Tango:

      “In any case, how much do you think the average team is worth? 300MM$? 400MM$? Something like that. If MLBAM were to go public, they’d get a valuation 10 times that much. A MLB team is a very small potatoes entity (Yanks notwithstanding). MLBAM is a monster. And they are not a monster by keeping things private for their 31 owners (commish owns 10%). They are a monster because they feed the starving public.

      Hang time on batted balls, run time on steals and close plays, throw times by catchers and outfielders… are you kidding? That’s f-cking gold right there. They’ll show it on Gameday, ESPN/Fox will demand it. Do you really think some small potatoes team worth 400MM$ that doesn’t put any more money into MLBAM is going to influence anything?

      Think big picture.”

      It’s going to happen, and it’s going to be awesome.


  3. RC says:

    “At the same time, those several teams known to invest heavily in their analytics departments do so in order to gain a competitive advantage over other teams, especially over those that ignore cutting edge PITCHf/x analysis or settle for free analysis. The teams who invest in analytics (which at this point are probably slightly more than half) hire programmers and analysts who develop proprietary software, databases, and system applications to varying degrees.

    For these teams, it’s not in their best interest for FIELDf/x data to be public.”

    I disagree with this strongly. I don’t think it makes a difference to these teams whether or not it’s public. The teams that don’t believe in it aren’t going to use it, whether or not the public is compiling it. Some people are still going to believe that Jeter is a good defender, and the Twins are still going to believe that Betancourt can play shortstop.

    The only difference with this stuff being public is that the Saber-crowd will have more evidence that some teams aren’t run well.


    • Rob says:

      Actually it clearly isn’t in the teams’ best interest; knowledge doesn’t grow by keeping things hidden. But it clearly is in the analysts’ best interest. Having no way to measure whether or not what they are doing is better or worse than what anyone else is doing leads to a nice life.


    • B N says:

      Moreover, I would hazard to say that organizations with the resources and ability to put a full team of full-time programmers and statistical modelers to analyzing PitchFX data could produce results much faster and more effectively than simply browsing fan-driven or freelance work:

      1. Organizations with their own departments can look into specific questions in real-time. They get to select the problems they look at, based on their needs. That’s pretty huge.

      2. They can build more advanced stats, in particular, team effectiveness stats. I have yet to see a single defensive stat that is able to even crudely estimate the total defensive capability of putting a whole squad of guys on the field (i.e. you can afford to put a guy with limited range in RF, because your CF can cover so much ground). As a team, these would be the FIRST things I would throw money and research at.

      3. If you don’t have a culture of using advanced stats, no level of availability will help you use such stats. Short of having a stat that says “Buy this guy for your team, up to this price,” there’s no direct correlation between any performance stat and how to manage baseball decisions. For that, you need to be able to put that data into your organizational decision-making (no small feat, for most organizations).

      I mean, let’s imagine for a moment that you are an organization with no advanced stats analysis department. Your organization now can look on the internet on sites like this and find all sorts of nice advanced stats. However, you have no one qualified to even interpret the stats in an actionable way.

      Basically, there’s a divide between descriptive models and decision models. You can give an organization all the descriptive models in the world, but if they don’t have any way to interpret, integrate, and apply that information, it’s just going to sit there. As someone who has made quite a few models, I can say that working with a client on how to interpret and use the results from a new model is basically half the battle.


    • Bryz says:

      You’re mixing up your examples. Do you mean Royals and Betancourt?


  4. Xeifrank says:

    I would put the Fans Scouting Report up there with any of the current defensive metric systems.
    vr, Xei


    • Telo says:

      I agree. But if/when this data gets released, even the crudest implementation will destroy UZR and FSR.


      • Llogan says:

        UZR and +/- are not the problem. The information used in calculating them is. Just swapping out BIS for FieldFX would be huge because there is a huge amount of uncertainty in the stringer data.


      • phoenix2042 says:

        we will finally have a definitive fielding rating that will hopefully stabilize in a season and give us an accurate view on the true value of fielding, and therefore any given player. it might even lead to advancement in the evaluation of pitchers because one can definitively know what should and should not be an out and adjust a pitcher’s value accordingly. i’ll be interested to know if crawford really is worth 2 wins on defense and if jeter really does sacrifice a whole win by playing the field. an exciting time to be a baseball fan!


  5. Barkey Walker says:

    I’ve always thought the pitch f/x was a little light on data. What if I want to know when a pitch breaks? This is obviously most important for a cutter, but still. It’d be nice to have position and first derivative every 10 feet. But you don’t even get the first derivative at the plate.

    They could also do batted ball angle and speed of exit but just don’t.


    • Peter Jensen says:

      “They could also do batted ball angle and speed of exit but just don’t.”

      What you describe is Hit F/x. It is currently being collected in all 30 stadiums and is available to the teams. It also has been computed for all the previous years of Pitch F/x video. Hit F/x data will probably take some years before it is released to the public as well. Hit F/x was the subject of the 2009 Sportvision Summit and those presentations should still be available at the Sportvision web site.


  6. Steven Ellingson says:

    If this gets released, what will we have to argue about?

    Oh yeah, Matt Cain.


    • Small Sample Goodness says:

      Are you suggesting that Cain is actually a better fielder than his advanced defensive metrics would imply?


  7. Lewie Pollis says:

    Thanks for all your work, Albert! You will be missed.


  8. COL Tye says:

    Thanks Albert, good luck!


  9. Choo says:

    The “best route” to a fly ball isn’t always the shortest distance to its point of intercept.

    For example, if the situation involves one or more baserunners and less than two outs, the best route to a fly ball is usually the route that produces the most efficient forward momentum toward the fielder’s anticipated throwing target. In other words, what might appear to be a dreaded “banana route” could be an outfielder effectively defending an unoccupied base. Outfielders even run momentum routes on routine flyballs with the bases empty to maintain good fundamentals and/or lower the overall degree of difficulty of the catch.

    Other times when the best route is not the most efficient:

    – Wind. There are swirls and gusts unique to each environment. A bad looking route on paper might actually be the smart route on a windy day.

    – Visual impediments. Some outfielders, particularly veterans at home, run certain unorthodox routes in order to avoid specific stadium lights and press box glare.

    – Hooks and slices. An outfielder who runs a straight line to a slicing flyball before making an off-balanced catch as it whistles behind his head should not be scored higher than an outfielder who, instead, rounds off his route a bit in order to meet the ball under control.

    Don’t get me wrong. I love defensive metrics and can’t wait to see the results. Just wondering how or if certain issues will be handled.


    • Bryz says:

      I think it’s clear that FIELDf/x still won’t be perfect, but it will be much better than what we currently have for rating defense.


    • SF 55 for life says:

      Just like any statistic it works better with a large sample reinforcing it. Any one event may not be that important, but over an entire season, it will probably make a lot more sense than UZR could ever hope to be.


  10. Ahhh, we’ll miss you, Albert. Best of luck to you in all of your future endeavors.


  11. hglman says:

    It’s really not that much data, as far as lots of data goes.


  12. delv says:

    Does this mean that pitchf/x will be improved, too? —‘Cause it clearly has problems itself.


  13. Boris says:

    What if the data shows that normal defensive scouting is fairly accurate and this is a huge waste of funds?


  14. Jason says:

    Yes it should go public and David Appelman should pay for it. It’s the least he can do with his Bleacher Report money.


  15. Lebowski says:

    Isn’t something being missed here? Sure, there will be lots to be learned from the fielding part, but won’t there be more (or at least equally) interesting information to be mined from how someone hits? Is someone who always hits fly balls to the same general place, but always hits liners somewhere different, then set up as being easier to defend against? You cover the liner, but stay in a place which means you’ll still be able to reach any fly ball. What I could see teams doing with this is radically altering their defensive alignments against some batters who consistently hit the ball to the same general vicinity against a certain pitcher. Giving outfielders in particular a heat map type thing about where a ball is most likely to be hit would be liable to improve defence a lot.

    Of course there will be a serious issue with small sample sizes but against divisional rivals I could see this altering the game significantly within a couple of years of the data being available.


  16. dorasaga says:

    Not to take anything away from this article, Lyu, but “baseballing knowledge” is not a word. You just need “baseball knowledge,” or even “base ball-knowledge” (G.Orwell style) to address your thought. The combining of two nouns is not a sin in English.

    I agree with point 1 and 3. Fans want to be entertained. Too many data can be, you know, too much. The time and effort needed now to turn 24 billion digits into useful language are not entertaining anymore. Pitch f/x looks more readily condensed already. Even a small potato like I can analyze it by just reading it on gameday.

    Though, I don’t think it’s right to leave the data to just a selected few. Whoever is knowledgeable or capable of analyzing baseball, that’s not up to MLB officials and their hired firms, such as Sportvision, to decide.

    Remember Bill James, and what happened before his first publication. Baseball was in a Dark Age, because the ones “in” could easily point fingers on the ones “out.” We don’t need that again with FIELD-f/x.

    By the way, Eddie, as both a Cubs fan and a fan of baseball, I don’t think GM Hendry and his advisers need another guy like, say, McCracken, who didn’t work his way up a baseball organization (does Gary Hughes still feel offended?: danagonistes.blogspot.com/2005/01/scouting-vs-sabermetrics.html )


    • RC says:

      “The time and effort needed now to turn 24 billion digits into useful language are not entertaining anymore.”

      That’s not a lot of data.


  17. tdotsports1 says:

    Good luck Albert, you went out with a bang though, solid story.


  18. FansCommish says:

    Time divided by distance is the number we need for defense
