Projection Systems

A lot of the statistics we use in baseball summarize and analyze the past. If you want to know how well a batter performed last season, you can look at his wRC+ and have a good sense of the season he had. But sometimes you want to know about the present and the future. For that, you need a projection. Simply put, projections are forecasts about the future.

A projection, in general, is the best estimate of a player’s true talent for a given period of time. Conceptually, your beliefs about how good a player is qualify as projections. In the world of advanced stats, we talk about projection systems, which are statistical models that use players’ past stats, age, and other factors to predict how a player will perform in the next game, upcoming season, or entire career.

Projection systems are certainly imperfect. They’re estimates. Projecting a .400 wOBA doesn’t mean you would make a $1,000 bet on that player running a .400 wOBA exactly; it means that’s the best guess for how that player is going to perform. Some players will do better than projected and some will do worse. Projection systems cannot predict luck and random variation, so even a system that perfectly captured a player’s talent would never be 100% accurate. Instead, these systems are best viewed as an estimate of a player’s current, underlying true talent level. If a player is projected to hit for a .370 wOBA, that suggests the system believes he has the talent level of a .370 wOBA hitter.

Why Projections:

You want to make decisions about the future based on every single piece of relevant data and you want to weight that data by its importance. Steamer projected that Miguel Cabrera would have a .407 wOBA in 2015. What that means is that, based on everything Steamer knew about Cabrera’s history and the way players typically age, we should have expected a .407 wOBA. Steamer knew that Cabrera had a “down” year in 2014, but it also knew he had a great 2013 and that hitters of his caliber usually age in a certain way. It’s all built in. You don’t just care how a guy did last year or how he did in his career; you care about the entire body of work and the underlying factors that are driving it. Cabrera finished with a .413 wOBA in 2015.

Think about it like the weather. You want to know if it’s going to rain today. How would you go about predicting whether or not it will rain? You would obviously pay some attention to the recent weather, but you would also look at historical weather patterns, and then you would look at the conditions in and around your area. It rained to your west last night: When that happens, how likely is it that the rain will come your way? There is a certain mix of pressure and air flow: What does that usually lead to? It’s all relevant information.

The same is true for baseball players. You care how Cabrera has hit for the last 600 PA. Those are super important data points, but they aren’t the only ones. You also care about the 600 PA before that. And before that. The older the data, the less important, but it never becomes useless. Additionally, you don’t just care about performance, you care about the underlying numbers.

If a player has a .400 wOBA with a .390 BABIP, you know much of their great season is predicated on getting many more hits on balls in play than average. You wouldn’t automatically expect that .390 BABIP to continue, so you need to determine the typical BABIP regression for players of this type based on everything else you know about them.
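
One simple way to picture BABIP regression is to blend the observed rate with a league-average rate, giving the observed rate more say as the sample grows. The sketch below is a minimal illustration, not how any particular system does it. The function name `regress_to_mean`, the 400 balls in play, the .300 league BABIP, and the 800-ball "stabilizer" are all assumptions chosen for round numbers.

```python
def regress_to_mean(observed, sample_size, league_mean, stabilizer):
    """Shrink an observed rate toward the league mean.

    `stabilizer` is a hypothetical count of league-average trials mixed in;
    the larger it is, the harder the observed rate is pulled toward the mean.
    """
    return (observed * sample_size + league_mean * stabilizer) / (
        sample_size + stabilizer
    )


observed_babip = 0.390   # from the example in the text
balls_in_play = 400      # assumed sample size
league_babip = 0.300     # assumed round league-average figure
stabilizer = 800         # assumed, purely illustrative

expected_babip = regress_to_mean(
    observed_babip, balls_in_play, league_babip, stabilizer
)
print(f"Regressed BABIP estimate: {expected_babip:.3f}")
```

Under these assumptions the estimate lands well below the observed .390 but still above league average, which matches the intuition that some of the hot BABIP is skill and much of it is not guaranteed to repeat.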

You never want to make a decision based on a player’s simple past. You want to use that data to make a valid inference about the future, and the process of doing so constitutes a projection. There are all sorts of different methods. Some are as simple as taking a couple of years of data and weighting them by recency. Others, like ZiPS, Steamer, and Oliver, use much more advanced methodologies to estimate how well a player will perform, using all sorts of information about that player and similar players of years past.
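
The simplest kind of method mentioned above, weighting a few seasons by recency, can be sketched in a few lines. This is only an illustration: the function name `weighted_rate`, the three hypothetical seasons, and the 5/4/3 weights are assumptions, and real systems layer aging, regression, and much more on top.

```python
def weighted_rate(seasons, weights):
    """Recency- and playing-time-weighted average of a rate stat.

    seasons: list of (rate, plate_appearances), most recent season first.
    weights: recency weights in the same order (larger means more important).
    """
    numerator = sum(w * pa * rate for (rate, pa), w in zip(seasons, weights))
    denominator = sum(w * pa for (_, pa), w in zip(seasons, weights))
    return numerator / denominator


# Hypothetical wOBA and PA for the last three seasons, most recent first.
seasons = [(0.360, 600), (0.400, 600), (0.380, 600)]
weights = [5, 4, 3]  # assumed recency weights, chosen for illustration

estimate = weighted_rate(seasons, weights)
print(f"Recency-weighted wOBA: {estimate:.3f}")
```

Older seasons still count here, just less, so one down year moves the estimate without erasing the seasons before it.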

There is no ideal system, but the idea of projection is ideal. You care that the Royals won X number of games last month, but that doesn’t mean they’ll win X games this month. The last month is relevant, but it isn’t the whole story. Baseball is volatile and unpredictable, and any one sample of data is going to deviate from the true, underlying skill of a player. You want to make the best guess you can about their future and then use that to make decisions. That’s projection.

We like projections at FanGraphs. They’re useful for approximating current true talent levels and they help us predict which teams will be successful and which teams won’t. You could guess who was going to win the divisions based on the previous year’s player performances, but those players are going to perform differently this year and you want to account for that.

Many people are turned off by the idea of projection because projections seem like a black box. If you see a guy has a .380 wOBA this year but the projection says he’s a .340 wOBA hitter, you can’t easily internalize that a .340 wOBA hitter has produced a .380 wOBA to date. It’s human nature to assume the outcomes we observe are measures of truth, when in reality, they are influenced by randomness.

So when a stat-geek says they don’t want Player X to hit because they aren’t great, even though the player has a .350 wOBA during the last 400 PA, it can seem like it doesn’t make sense. They have hit well, so they must be good. But that isn’t exactly right. Their last 400 PA matter, but they don’t tell the whole story. A projection is trying to tell the whole story.

The systems aren’t perfect and the nature of the beast means they won’t get very many players exactly right, but they do a better job predicting the future than the last six weeks or six months of data will.

But it all comes down to the question you are asking. You might not care very much about predicting the future or approximating true talent. If you only care about past value, you can stick to the raw stats. But if you want to say something about how well a player is going to perform and what their true talent is, you want a projection. FanGraphs houses many of these each year and you can follow along, not only with the preseason numbers, but also with how they change based on the data of a new season.

How To Use Projections:

Projections are pretty simple to use. All you have to know is the forecasted performance and the time frame. You can then use that information to make an inference about how good a player is and how they will play. Should your team sign this player to a big contract? Was the trade your GM made a good one? Who do you want on your fantasy team next year?

There are a couple of important things to remember beyond that. First, we generally have more confidence in projections when we have more data. A player with 200 MLB PA probably has a less certain projection than a player with 2000 MLB PA because we can draw on more data to make inferences about the second player.
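
To get a feel for why 2000 PA tell you more than 200 PA, here is a rough sketch of how much pure chance can move an observed wOBA at each sample size. It treats every plate appearance as an independent draw from a fixed set of outcomes. The linear weights and event probabilities in `EVENTS` are illustrative assumptions (real wOBA weights change by season), and the sketch only covers sampling noise in observed results, not everything that goes into a projection’s uncertainty.

```python
import math

# Illustrative per-PA outcome model: name -> (linear weight, probability).
# Both the weights and the probabilities are assumptions for this sketch.
EVENTS = {
    "out": (0.00, 0.680),
    "bb": (0.69, 0.085),
    "1b": (0.88, 0.150),
    "2b": (1.25, 0.045),
    "3b": (1.58, 0.005),
    "hr": (2.00, 0.035),
}


def woba_mean_and_sd(events):
    """Return the expected wOBA and the per-PA standard deviation."""
    mean = sum(w * p for w, p in events.values())
    second_moment = sum(w * w * p for w, p in events.values())
    return mean, math.sqrt(second_moment - mean ** 2)


def standard_error(sd_per_pa, pa):
    """Standard error of the observed wOBA over `pa` independent PA."""
    return sd_per_pa / math.sqrt(pa)


true_woba, sd_per_pa = woba_mean_and_sd(EVENTS)
for pa in (200, 2000):
    se = standard_error(sd_per_pa, pa)
    print(f"{pa:>4} PA: true wOBA {true_woba:.3f}, one-sigma noise +/- {se:.3f}")
```

Because the noise shrinks with the square root of the sample, ten times the plate appearances cuts the random swing by a factor of about three, not ten.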

You also want to remember that projections are estimates of true talent, and true talent can change. Players learn new pitches and change their swings. Overall, projections do a good job estimating how players will perform, but the systems will miss on certain players every year. The systems aren’t designed to perfectly predict the future; they are designed to give the best estimate we have about how good a player is.

At FanGraphs we house a number of different systems such as Steamer, ZiPS, and Fans. You might also see something called “Depth Charts,” which is an average of Steamer and ZiPS with our staff’s custom playing time estimates. We use Depth Charts projections to power our projected standings and playoff odds.
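
As a rough picture of what "an average of two systems with custom playing time" could look like, the sketch below averages two per-PA rates and scales the result by a separate playing time estimate. The function name `blend_projection` and all of the numbers are hypothetical, and the exact blending used for Depth Charts is not described here, so treat this as an assumption-laden illustration rather than the actual method.

```python
def blend_projection(steamer_rate, zips_rate, staff_pa):
    """Average two per-PA rates, then scale by a playing time estimate.

    Returns (blended rate per PA, projected counting total).
    """
    blended_rate = (steamer_rate + zips_rate) / 2
    return blended_rate, blended_rate * staff_pa


# Hypothetical home runs per PA from each system, plus a staff PA estimate.
rate, projected_hr = blend_projection(
    steamer_rate=0.050, zips_rate=0.040, staff_pa=600
)
print(f"Blended HR/PA: {rate:.3f}, projected HR: {projected_hr:.1f}")
```

Separating the rate from the playing time keeps the two questions apart: how good the player is per plate appearance, and how often he is expected to play.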

Finally, the most important thing to keep in mind is that we only display the median expected outcome. So a player with a .350 projected wOBA is equally likely to hit better than .350 and worse than .350. The system might say the player has a 25% chance to hit above .375, for example, but that’s hard to communicate. You can find those kinds of things broken down on the Steamer and PECOTA websites, although most systems calculate those odds in some way.
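
One way to think about the odds around a median projection is to imagine a bell curve centered on it. The sketch below uses Python’s `statistics.NormalDist` with an assumed spread of .037 wOBA, a hypothetical value chosen so the numbers resemble the example above. Real systems need not use a normal distribution or this spread.

```python
from statistics import NormalDist

median_woba = 0.350   # the displayed projection
assumed_sd = 0.037    # hypothetical spread around the median

outcomes = NormalDist(mu=median_woba, sigma=assumed_sd)

# Chance of beating the median, and of beating a higher threshold.
p_above_median = 1 - outcomes.cdf(0.350)
p_above_375 = 1 - outcomes.cdf(0.375)

# 10th and 90th percentile outcomes under the same assumption.
p10 = outcomes.inv_cdf(0.10)
p90 = outcomes.inv_cdf(0.90)

print(f"P(wOBA > .350) = {p_above_median:.2f}")
print(f"P(wOBA > .375) = {p_above_375:.2f}")
print(f"10th to 90th percentile: {p10:.3f} to {p90:.3f}")
```

The single displayed number is the middle of this range; the percentiles show how wide the plausible outcomes are around it.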

Projection Systems:

Here are some systems you might see around the game. This likely isn’t a full list and teams typically have their own systems as well. We house Steamer, ZiPS, Depth Charts, and Fans on the site. Marcel and Oliver have been available on the site in the past, while KATOH is the product of one of our writers and is often used in our prospect coverage. If you want historical projection data, the Baseball Projection Project is a great resource.

● Marcel – Developed by Tom Tango.

● Steamer – Created by Jared Cross, Dash Davidson, and Peter Rosenbloom.

● ZiPS – Created by Dan Szymborski at BTF and ESPN.

● Depth Charts – ZiPS/Steamer average with FanGraphs custom playing time.

● KATOH – Created by Chris Mitchell.

● Fans – Crowdsourced by FanGraphs.

● Bill James – Created by Baseball Info Solutions and Bill James.

● Oliver – This system was created by Brian Cartwright.

● CAIRO – A system developed by the folks at Revenge of the RLYW.

● PECOTA – Developed by Nate Silver and Baseball Prospectus.

● CHONE – Developed by Sean Smith.

Links for Further Reading:

Looking at Baseball Projection Systems – FOX Sports

Rich Hill and 50th Percentile Projections – FanGraphs

2009 Forecast Evaluations – Steamer Projections

What Exactly is a Projection? – FanGraphs