## How Well Did the FanGraphs Playoff Odds Work?

One of the more fan-accessible advanced stats is playoff odds [technically postseason probabilities]. Playoff odds range from 0% to 100% and tell the fan the probability that a certain team will reach the MLB postseason. They are determined by a Monte Carlo simulation that runs the baseball season thousands of times [10,000 times specifically for FanGraphs]. If a team reaches the postseason in 5,000 of those simulations, the team is given a 50% probability of making the postseason. FanGraphs runs the simulation every day, so the daily playoff odds can be collected and graphed to show the story of a team’s season.
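
The simulation idea can be sketched in a few lines of Python. This is only a toy version under loud assumptions: each team gets a fixed per-game win probability, games are independent coin flips rather than a real schedule, and the top `n_spots` teams by wins qualify (no divisions or wild cards). The function name, team labels, and win probabilities are invented for illustration and are not the FanGraphs model.

```python
import random

def simulate_playoff_odds(win_probs, n_spots, n_sims=10_000, n_games=162, seed=None):
    """Estimate playoff odds as the share of simulated seasons in which a team qualifies.

    win_probs -- {team: per-game win probability}; hypothetical inputs
    n_spots   -- number of teams that qualify (assumption: best records, no divisions)
    """
    rng = random.Random(seed)
    made = {team: 0 for team in win_probs}
    for _ in range(n_sims):
        # Assumption: every game is an independent coin flip with the team's win probability.
        wins = {
            team: sum(rng.random() < p for _ in range(n_games))
            for team, p in win_probs.items()
        }
        # Rank by wins; ties are broken at random.
        ranked = sorted(wins, key=lambda t: (wins[t], rng.random()), reverse=True)
        for team in ranked[:n_spots]:
            made[team] += 1
    return {team: made[team] / n_sims for team in win_probs}

# Four made-up teams fighting for two spots.
odds = simulate_playoff_odds(
    {"A": 0.60, "B": 0.52, "C": 0.48, "D": 0.40}, n_spots=2, n_sims=2000, seed=42
)
for team, prob in odds.items():
    print(f"{team}: {prob:.1%}")
```

Rerunning something like this each day with updated records and win probabilities would produce the kind of daily odds series described above.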

Above is a composite graph of three different types of teams. The Dodgers were identified as a good team early in the season, and their playoff odds stayed high because of consistently good play. The Brewers started their season off strong but had two steep drop-offs in early July and early September. Even when the Brewers had more wins than the Dodgers, the FanGraphs playoff odds never valued the Brewers more than the Dodgers. The Royals started slow and had a strong finish to secure their first postseason berth since 1985. All these seasons are different, and their stories are captured by the graph. Generally, this is how fans will remember their team’s season: by the storyline.

Since the playoff odds change every day and become either 100% or 0% by the end of the season, the projections need to be compared to the actual results at the end of the season. A playoff probability of 85% means that teams with the given parameters will make the postseason 85% of the time.

I gathered the playoff odds for the entire 2014 season from FanGraphs and put the predictions in buckets of 10% increments of playoff probability. If the model is well calibrated, about 20% of the predictions in the 20% bucket should belong to teams that go on to the postseason. The same logic applies to all the buckets: 0%, 10%, 20%, etc.
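
The bucketing step can be sketched as follows. This is a minimal illustration, assuming each daily prediction is rounded to the nearest 10% (the exact bucket edges used for the chart are not stated) and using a handful of made-up team-days rather than the real FanGraphs data. `calibration_table` is a hypothetical helper name.

```python
from collections import defaultdict

def calibration_table(predictions, made_postseason, step=10):
    """Group daily predictions into buckets and compare them with outcomes.

    predictions     -- playoff probabilities in percent (0-100), one per team-day
    made_postseason -- parallel list of booleans: did that team end up qualifying?
    Returns {bucket_percent: (n_predictions, observed_rate)}.
    """
    counts = defaultdict(int)
    hits = defaultdict(int)
    for pct, made in zip(predictions, made_postseason):
        # Assumption: round to the nearest multiple of `step`.
        bucket = int(pct / step + 0.5) * step
        counts[bucket] += 1
        hits[bucket] += bool(made)
    return {b: (counts[b], hits[b] / counts[b]) for b in sorted(counts)}

# Made-up team-days, not real FanGraphs numbers.
sample_preds = [12.0, 8.5, 11.0, 88.0, 91.5]
sample_made = [False, False, True, True, True]
table = calibration_table(sample_preds, sample_made)
for bucket, (n, rate) in table.items():
    print(f"{bucket:>3}% bucket: {n} predictions, {rate:.0%} made the postseason")
```

For a well-calibrated model, the observed rate in each bucket should land close to the bucket's own label once enough seasons are included.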

Above is a chart comparing the buckets to the actual results. Since this uses only one year of data and only 10 teams made the playoffs, the results don’t quite match up to the buckets. The overall pattern is encouraging, but I would insist on looking at multiple years before drawing any real conclusions. The results for any given year are subject to the ‘stories’ of the 30 teams that played that season. For example, the 2014 season did not have a team like the 2011 Red Sox, who failed to make the postseason after having a playoff probability above 95%. This is colloquially considered an epic ‘collapse’, but a 95% prediction not only implies there’s a chance the team might fail, it PREDICTS that 5% of such teams will fail. So there would be nothing wrong with the playoff odds model if ‘collapses’ like the Red Sox’s happened only once in a while.

The playoff probability model relies on an expected winning percentage. Unlike a binary variable such as making the postseason, winning percentage is closer to a continuous variable, which makes the model easier to evaluate. For the most part, teams stay around their initial predicted winning percentage and come really close to the prediction by the end of the season. Not every prediction is correct, but if there are enough good predictions, the predictive model is useful.

Teams also aren’t static: they can become worse by trading away players at the trade deadline or improve by acquiring those good players who were traded. There are also factors like injuries or player improvement that the prediction system can’t account for because they are unpredictable by definition. The following line graph allows you to pick a team and check to see how they did relative to the predicted winning percentage. Some teams are spot on, like the Pirates, but there are a few, like the Orioles, that are really far off.

The residual distribution [the actual values – the predicted values] should be a normal distribution centered around 0 wins. The following graph shows the residual distribution in number of wins; the teams in the middle had actual results close to the predicted values, while the values on the edges are more extreme deviations. You would expect the teams that improved to balance out the teams that got worse. However, the graph is skewed toward the teams that became much worse, implying that there is some mechanism that makes bad teams lose more often. This is where attitude, trades, and changes in strategy would come into play. I would go so far as to say this is evidence that the soft skills of a team, like chemistry, break down.

Since I don’t have access to more years of FanGraphs projections or other projection systems, I can’t do a full evaluation of the team projections. More years of playoff odds should yield probability buckets that reflect the expectation much better than a single year. This would allow for more than 10 different paths to the postseason to be present in the data. In the absence of this, I would say the playoff odds and predicted win expectancy are on the right track and a good predictor of how a team will perform.

## Run Distribution Using the Negative Binomial Distribution

In this post I use the negative binomial distribution to better model how MLB teams score runs in an inning or in a game. I wrote a primer on the math of the different distributions mentioned in the post for reference, and this post is divided into a baseball-centric section and a math-centric section.

The Baseball Side

A team in the American League will average .4830 runs per inning, but does this mean they will score a run every two innings? This seems intuitive if you apply math from Algebra I [1 run / 2 innings ~ .4830 runs/inning]. However, if you attend a baseball game, the vast majority of innings you’ll watch will be scoreless. This large number of scoreless innings can be described by discrete probability distributions that account for teams scoring none, one, or multiple runs in one inning.

Runs in baseball are considered rare events and count data, so they will follow a discrete probability distribution if they are random. The overall goal of this post is to describe the random process that arises with scoring runs in baseball. Previously, I’ve used the Poisson distribution (PD) to describe the probability of getting a certain number of runs within an inning. The Poisson distribution describes count data like car crashes or earthquakes over a given period of time and defined space. This worked reasonably well to get the general shape of the distribution, but it didn’t capture all the variance that the real data set contained. It predicted fewer scoreless innings and many more 1-run innings than actually occurred. The PD assumes that the mean and variance are equal. In both runs per inning and runs per game, the variance is about twice the mean, so the real data will ‘spread out’ more than a PD predicts.

The graph above shows an example of the application of count data distributions. The actual data is in gray and the Poisson distribution is in yellow. It’s not a terrible way to approximate the data or to conceptually understand the randomness behind baseball scoring, but the negative binomial distribution (NBD) works much better. The NBD is also a discrete probability distribution, but it finds the probability of a certain number of failures occurring before a certain number of successes. It would answer the question: what’s the probability that I get 3 TAILS before I get 5 HEADS if I keep flipping a coin? At first this doesn’t intuitively seem to relate to a baseball game or an inning, but that will be explained later.

From a conceptual standpoint, the two distributions are closely related. So if you are trying to describe why 73% of all MLB innings are scoreless to a friend over a beer, either will work. I’ve plotted both distributions for comparison throughout the post. The second section of the post will discuss the specific equations and their application to baseball.

Runs per Inning

Because of the difference in rules regarding the designated hitter between the two leagues, there will be a different expected value [average] and variance of runs/inning for each league. I separated the two leagues to get a better fit for the data. Using data from 2011-2013, the American League had an expected value of 0.4830 runs/inning with a 1.0136 variance, while the National League had 0.4468 runs/inning as the expected value with a .9037 variance. [So NL games are shorter and more boring to watch.] Using only the expected value and the variance, the negative binomial distribution [the red line in the graph] approximates the distribution of runs per inning more accurately than the Poisson distribution.
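
To make the comparison concrete, here is a small sketch that builds both distributions from nothing but the AL mean and variance quoted above. It assumes the standard method-of-moments conversion for the NBD, p = mean/variance and r = mean²/(variance − mean), and steps through the probabilities with the usual recurrences. The function names are my own.

```python
import math

def nb_params(mean, var):
    """Method-of-moments NBD parameters (requires var > mean)."""
    p = mean / var
    r = mean ** 2 / (var - mean)
    return r, p

def nb_pmf_list(mean, var, k_max):
    """NBD probabilities for 0..k_max runs via P(k+1) = P(k) * (r + k) / (k + 1) * (1 - p)."""
    r, p = nb_params(mean, var)
    probs = [p ** r]
    for k in range(k_max):
        probs.append(probs[-1] * (r + k) / (k + 1) * (1 - p))
    return probs

def poisson_pmf_list(mean, k_max):
    """Poisson probabilities for 0..k_max runs via P(k+1) = P(k) * mean / (k + 1)."""
    probs = [math.exp(-mean)]
    for k in range(k_max):
        probs.append(probs[-1] * mean / (k + 1))
    return probs

al_mean, al_var = 0.4830, 1.0136  # AL runs/inning, 2011-2013 (from the text)
nbd = nb_pmf_list(al_mean, al_var, 5)
pois = poisson_pmf_list(al_mean, 5)
for k in range(6):
    print(f"{k} runs: Poisson {pois[k]:.4f}   NBD {nbd[k]:.4f}")
```

With these inputs the Poisson puts a scoreless inning at roughly 62%, while the NBD lands near 72%, much closer to the 73% of scoreless innings mentioned earlier. The Poisson also assigns noticeably more probability to 1-run innings than the NBD does.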

It’s clear that there are a lot of scoreless innings and very few innings with multiple runs scored. The NBD allows someone to calculate the probability of an MLB team scoring more than 7 runs in an inning, or the probability that the home team, down by a run in the bottom of the 9th, forces extra innings. Using a pitcher’s expected runs/inning, the NBD could be used to approximate the pitcher’s chances of throwing a shutout, assuming he pitches all 9 innings.
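
Two of these calculations can be sketched directly. The code below assumes the method-of-moments NBD parameters (p = mean/variance, r = mean²/(variance − mean)) and, for the nine-inning case, that innings are independent of each other, which is a simplification. The league-average AL numbers stand in for a real pitcher's runs/inning, and the helper names are hypothetical.

```python
def nb_params(mean, var):
    """Method-of-moments NBD parameters (requires var > mean)."""
    p = mean / var
    r = mean ** 2 / (var - mean)
    return r, p

def prob_more_than(k_max, mean, var):
    """P(more than k_max runs in an inning) = 1 - P(0) - P(1) - ... - P(k_max)."""
    r, p = nb_params(mean, var)
    term = p ** r            # P(0 runs)
    cumulative = term
    for k in range(k_max):
        term *= (r + k) / (k + 1) * (1 - p)   # step from P(k) to P(k + 1)
        cumulative += term
    return 1 - cumulative

def prob_scoreless_innings(n_innings, mean, var):
    """P(no runs in n_innings consecutive innings), assuming independent innings."""
    r, p = nb_params(mean, var)
    return (p ** r) ** n_innings

al_mean, al_var = 0.4830, 1.0136  # AL runs/inning, 2011-2013 (from the text)
big_inning = prob_more_than(7, al_mean, al_var)
nine_zeros = prob_scoreless_innings(9, al_mean, al_var)
print(f"P(more than 7 runs in an inning): {big_inning:.4%}")
print(f"P(nine scoreless innings):        {nine_zeros:.2%}")
```

Swapping in a specific pitcher's expected runs/inning and variance would give a rough shutout estimate for that pitcher under the same assumptions.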

Runs Per Game

The NBD and PD can be used to describe the runs scored in a game by a team as well. Once again, I separated the AL and NL, because the AL had an expected run value of 4.4995 runs/game and a 9.9989 variance, and the NL had 4.2577 runs/game expected value and 9.1394 variance. This data is taken from 2008-2013. I used a larger span of years to increase the total number of games.

Even though MLB teams average more than 4 runs in a game, the single most likely run total for one team in a game is actually 3 runs. The negative binomial distribution once again modeled the empirical distribution well, but the PD fit much worse here than it did for runs per inning. Both models, however, underestimate the shutout rate. A remedy for this is to adjust for zero-inflation, which would increase the likelihood of a shutout in the model and adjust the rest of the probabilities accordingly. The need for zero-inflation suggests that baseball scoring isn’t completely random: a manager is more likely to use his best pitchers to preserve a shutout than to assign pitchers from the bullpen at random.
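
A zero-inflated NBD can be sketched as a simple mixture: with some probability the game is an "extra" shutout, and otherwise the run total follows the ordinary NBD. The mixing weight `extra_zero = 0.01` below is purely hypothetical, picked only to show the mechanics; in practice it would be fitted to the data. Note that `mean` and `var` here describe the NBD component, not the mixture as a whole.

```python
import math

def nb_pmf(k, mean, var):
    """NBD probability of exactly k runs, from the mean and variance (requires var > mean)."""
    p = mean / var
    r = mean ** 2 / (var - mean)
    log_coef = math.lgamma(r + k) - math.lgamma(r) - math.lgamma(k + 1)
    return math.exp(log_coef + r * math.log(p) + k * math.log(1 - p))

def zero_inflated_nb_pmf(k, mean, var, extra_zero):
    """Mixture: probability `extra_zero` of a forced zero, otherwise the plain NBD."""
    base = (1 - extra_zero) * nb_pmf(k, mean, var)
    return extra_zero + base if k == 0 else base

al_mean, al_var = 4.4995, 9.9989  # AL runs/game, 2008-2013 (from the text)
extra_zero = 0.01                 # hypothetical weight, not fitted to real data

plain_shutout = nb_pmf(0, al_mean, al_var)
inflated_shutout = zero_inflated_nb_pmf(0, al_mean, al_var, extra_zero)
print(f"Shutout probability, plain NBD:         {plain_shutout:.4f}")
print(f"Shutout probability, zero-inflated NBD: {inflated_shutout:.4f}")
```

The shutout probability goes up and every other run total is scaled down by the same factor, so the probabilities still sum to 1.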

Hits Per Inning

It turns out the NBD/PD are useful with many other baseball statistics like hits per inning.

The distribution for hits per inning is somewhat similar to runs per inning, except the expected value is higher and the variance is lower relative to the expected value. [AL: .9769 hits/inning, 1.2847 variance | NL: .9677 hits/inning, 1.2579 variance (2011-2013)] Since the variance is much closer to the expected value, hits per inning has more values in the middle and fewer at the extremes than the runs per inning distribution.

I could spend all day finding more applications of the NBD and PD, because there are really a lot of examples within baseball. Understanding these discrete probability distributions will help you understand how the game works, and they could be used to model outcomes within baseball.

The Math Side

Hopefully, you skipped down to this section right away if you are curious about the math behind this. I’ve compiled the numbers used in the graphs for the American League for those curious enough to look at examples of the actual values.

The Poisson distribution is given by the equation:

$$P(X = x) = \frac{\lambda^{x} e^{-\lambda}}{x!}$$

There are two parameters for this equation: the expected value [λ] and the number of runs you are looking to calculate [x]. To determine the probability of a team scoring exactly three runs in a game, you would set x = 3 and, using the AL expected runs per game [λ = 4.4995], you’d calculate:

$$P(X = 3) = \frac{4.4995^{3}\, e^{-4.4995}}{3!} \approx 0.169$$

This is repeated for the entire set of x = {0, 1, 2, 3, 4, 5, 6, … } to get the Poisson distribution used throughout the post.

One of the assumptions the PD makes is that the mean and the variance are equal. For these examples, this assumption doesn’t hold, so the empirical data from actual baseball results is overdispersed and doesn’t quite fit the PD. The NBD accounts for the variance by including it in the parameters.

The negative binomial distribution is usually symbolized by the following equation:

$$P(X = k) = \binom{k + r - 1}{k}\, p^{r} (1 - p)^{k}$$

where r is the number of successes, k is the number of failures, and p is the probability of success. A key restriction is that a success has to be the last event in the series of successes and failures.

Unfortunately, we don’t have a clear value for p or a clear concept on what will be measured, because the NBD measures the probability of binary, Bernoulli trials. It’s helpful to view this problem from the vantage point of the fielding team or pitcher, because a SUCCESS will be defined as getting out of the inning or game, and a FAILURE will be allowing 1 run to score. This will conform to the restriction by having a success [getting out of the inning/game] being the ultimate event of the series.

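In order to make this work the NBD needs to be parameterized in terms of the mean [μ], the variance [σ²], and the number of runs allowed [failures, k]. The NBD can be written as

$$P(X = k) = \frac{\Gamma(r + k)}{\Gamma(r)\, k!}\, p^{r} (1 - p)^{k}$$

where

$$p = \frac{\mu}{\sigma^{2}}, \qquad r = \frac{\mu^{2}}{\sigma^{2} - \mu}$$

So using the same example as the PD, with μ = 4.4995 and σ² = 9.9989 [so p ≈ 0.45 and r ≈ 3.68], this would yield:

$$P(X = 3) = \frac{\Gamma(3.68 + 3)}{\Gamma(3.68)\, 3!}\, (0.45)^{3.68} (0.55)^{3} \approx 0.144$$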

The above equations are adapted from this blog about negative binomials and this one about applying the distribution to baseball. The Γ function is used in the equation instead of a combination operator because the combination operator can’t handle the non-whole numbers we are using to describe the number of successes.
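
For anyone who wants to check the numbers, both calculations fit in a short Python sketch. It assumes the standard Poisson formula and the mean/variance form of the NBD with p = μ/σ² and r = μ²/(σ² − μ), and it uses `math.gamma` precisely because r is not a whole number. The function names are my own.

```python
import math

def poisson_pmf(x, lam):
    """Poisson probability of exactly x runs with expected value lam."""
    return lam ** x * math.exp(-lam) / math.factorial(x)

def nb_pmf(k, mean, var):
    """NBD probability of exactly k runs allowed, from the mean and variance."""
    p = mean / var                  # probability of "success" (getting out of the game)
    r = mean ** 2 / (var - mean)    # number of successes; generally not a whole number
    coef = math.gamma(r + k) / (math.gamma(r) * math.factorial(k))
    return coef * p ** r * (1 - p) ** k

al_mean, al_var = 4.4995, 9.9989  # AL runs/game, 2008-2013 (from the text)
runs = range(16)
pd_probs = [poisson_pmf(x, al_mean) for x in runs]
nbd_probs = [nb_pmf(k, al_mean, al_var) for k in runs]

for k in runs:
    print(f"{k:>2} runs: PD {pd_probs[k]:.4f}   NBD {nbd_probs[k]:.4f}")

most_likely_nbd = max(runs, key=lambda k: nbd_probs[k])
print("Most likely run total under the NBD:", most_likely_nbd)
```

For exactly three runs this gives about 0.169 under the PD and about 0.144 under the NBD, and the NBD's most likely run total comes out as 3, in line with the runs-per-game discussion above.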

Conclusion

The negative binomial distribution is really useful in modeling the distribution of discrete count data from baseball for a given inning or game. The most interesting aspect of the NBD is that a success is considered getting out of the inning/game, while a failure would be letting a run score. This is a little counterintuitive if you approach modeling the distribution from the perspective of the batting team. While the NBD has a better fit, the Poisson distribution has a simpler concept to explain: the count of discrete events over a given period of time, which might make it better to discuss over beers with your friends.

The fit of the NBD suggests that run scoring is a negative binomial process, but inconsistencies, especially with shutouts, indicate that elements of the game aren’t completely random. I’m explaining the underestimation of the number of shutouts by the increased use of the best relievers in shutout games compared with other games, which increases the total number of shutouts and subsequently decreases the frequency of other run totals.

All MLB data is from retrosheet.org. It’s available free of charge from there. So please check it out, because it’s a great data set. If there are any errors or if you have questions, comments, or want to grab a beer to talk about the Poisson distribution please feel free to tweet me @seandolinar.

## Pirates Do Not Need Help Against Left-Handed Pitching

Stats in this post are current up to right before the July 31, 2014 PIT-ARZ game.

The MLB non-waiver trade deadline just passed. I’m not interested in debating what teams should or should not have done except to say the price for quality players was very high this year. The whole supply & demand, free market thing really worked in the favor of teams that were already out of the postseason race. It was suggested that the Pirates needed a right-handed batter (RHB), since they don’t do well against left-handed pitching (LHP). I had my doubts that this was really true and believed that adding another RHB wouldn’t improve the team much. MLB teams generally do better against LHP, since most batters are RHB and the RHB/LHP split favors the batter.

Before getting into this, note that LHP account for only 21% of the Pirates’ season-to-date plate appearances. Out of all the problems the Pirates could have, making a roster move to address this one isn’t necessary unless you are looking to platoon. More on that later.

Looking at the team batting splits, the Pirates have an overall .722 OPS and a LHP .670 OPS. On the surface, it appears they are performing worse against LHP, and I will concede the argument the Pirates HAVE performed worse against LHP so far in 2014, but this shouldn’t continue going forward.

The Pirates have racked up 4,152 plate appearances through July 30th, but only 867 of them have occurred against LHP (~21%). To put this in perspective, that is equivalent to less than one month of games. How accurate are batting statistics at the end of April? They aren’t. Put simply, the Pirates’ ‘struggles’ against LHP can mostly be attributed to a small sample size.

I laid out all the outcomes (1B, BB, 2B, etc.) in a vector of plate appearances, had the computer randomly draw 900 of them from the entire Pirates season, and computed the OPS of the sample. I repeated this 1000 times and plotted the results below.
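
A rough sketch of that resampling procedure is below. The outcome counts are invented so that they add up to 4,152 plate appearances; they are not the Pirates' real totals. The OPS helper is simplified (it ignores hit-by-pitches and sacrifice flies), and I am assuming each 900-PA sample is drawn without replacement.

```python
import random

TOTAL_BASES = {"1B": 1, "2B": 2, "3B": 3, "HR": 4}

def ops(outcomes):
    """OPS (OBP + SLG) from a list of plate-appearance outcomes.

    Simplified: only hits, walks ("BB"), and outs ("OUT"); no HBP or sacrifice flies.
    """
    walks = outcomes.count("BB")
    at_bats = len(outcomes) - walks
    hits = sum(1 for o in outcomes if o in TOTAL_BASES)
    total_bases = sum(TOTAL_BASES.get(o, 0) for o in outcomes)
    obp = (hits + walks) / len(outcomes)
    slg = total_bases / at_bats
    return obp + slg

def resample_ops(season, sample_size=900, n_draws=1000, seed=None):
    """OPS of `n_draws` random samples of `sample_size` PAs (drawn without replacement)."""
    rng = random.Random(seed)
    return [ops(rng.sample(season, sample_size)) for _ in range(n_draws)]

# Hypothetical season of 4,152 plate appearances (made-up counts).
season = (["1B"] * 650 + ["2B"] * 190 + ["3B"] * 20 + ["HR"] * 100
          + ["BB"] * 350 + ["OUT"] * 2842)

draws = resample_ops(season, seed=0)
share_below = sum(d < 0.670 for d in draws) / len(draws)
print(f"Full-season OPS: {ops(season):.3f}")
print(f"Share of 900-PA samples below .670: {share_below:.1%}")
```

A histogram of `draws` would be the analogue of the plot described here; with the real plate-appearance vector in place of the made-up one, `share_below` is the simulated counterpart of the probability discussed next.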

Due to the central limit theorem, the mean should hover around .720 (the overall OPS) and the data should be normally distributed. Because of this, I constructed the normal distribution curve and used it to calculate the probability that a 900 plate-appearance sample with an OPS as low as the Pirates’ LHP split could be drawn from the Pirates’ total plate appearances. It turns out that 9% of the time the program will select plate appearances that total a < .670 OPS. 9% isn’t that likely, but it is not outrageous to conclude the Pirates’ low OPS vs. LHP is due to small sample size.

This is not just applicable to LHP vs overall splits, but any low-percentage split including RISP. I wrote about this previously and came to a similar conclusion.

The composite distribution curves below illustrate what happens when sample size increases and why small sample sizes are problematic. The vertical red line is the .670 OPS mark. On the 900-sample distribution (vs LHP) there is a 9% probability of drawing a .670 OPS or lower from the Pirates’ total plate appearances. This is the area underneath the curve to the left of the red line. Using the 3000-sample distribution curve, it’s 0.0016%. There is barely any area under the 3000-PA curve at that point, and this is a huge difference. (3000 samples are approximately how many the team has had against RHP.)
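
The size of that drop can be sanity-checked with a back-of-the-envelope calculation. The sketch below assumes a normal curve with the same mean at both sample sizes, starts from the 9% figure at 900 PA, and rescales the standard error to 3000 PA. When a `population` is supplied it also applies the finite population correction for drawing without replacement from 4,152 PA. That correction is my assumption about how the samples were drawn, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def tail_prob_at_new_size(p_small, n_small, n_large, population=None):
    """Rescale a lower-tail probability from sample size n_small to n_large.

    Assumes a normal sampling distribution with the same mean and threshold.
    If `population` is given, standard errors include the finite population
    correction sqrt((N - n) / (N - 1)) for sampling without replacement.
    """
    z_small = NormalDist().inv_cdf(p_small)      # z-score of the threshold at n_small
    ratio = math.sqrt(n_small / n_large)         # SE(n_large) / SE(n_small)
    if population is not None:
        fpc_small = math.sqrt((population - n_small) / (population - 1))
        fpc_large = math.sqrt((population - n_large) / (population - 1))
        ratio *= fpc_large / fpc_small
    return NormalDist().cdf(z_small / ratio)

with_replacement = tail_prob_at_new_size(0.09, 900, 3000)
without_replacement = tail_prob_at_new_size(0.09, 900, 3000, population=4152)
print(f"3000 PA, with replacement:    {with_replacement:.4%}")
print(f"3000 PA, without replacement: {without_replacement:.4%}")
```

Under these assumptions the without-replacement figure comes out at a couple of thousandths of a percent, the same order of magnitude as the 0.0016% read off the curve, while sampling with replacement would leave it closer to 0.7%. Either way, the tail shrinks dramatically as the sample grows.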

One more graph! This is a histogram of the differences between the LHP OPS and the overall OPS. The Pirates are on the low end of it. Not great, but there’s a lot of variation there.

Switching from statistics to baseball, the Pirates have the second-fewest plate appearances against LHP in MLB. They are 11-9 in games started by a LHP. That alone should discount the poor-performance-against-LHP argument, but the team batting stats suggest that they are struggling, and that has been woven into a narrative.

Looking closely at the Pirates’ roster, there are many solid RHBs: McCutchen (their best hitter), Martin, Marte, Sanchez, and Mercer/Harrison are pretty good against lefties. Now, some of these players are underperforming against LHP this year, but this is where the small sample size comes in again. You wouldn’t determine that any of these batters lost their platoon advantage after only 80 plate appearances. Going forward, almost all of these bats should regress to their normal platoon splits.

Then there are Pedro Alvarez, Gregory Polanco, and Ike Davis. Their platoon splits are pretty atrocious both for 2014 and career-wise. For example, Alvarez has a .787 OPS vs RHP and a .517 OPS against LHP this year. I don’t want to get into analyzing what’s wrong with the Pirates’ left-handed bats, except to say they are terrible against LHP. The argument should change from “the Pirates don’t do well against LHP” to “the Pirates’ left-handed batters are terrible against LHP.”

What can be done about this? The simple answer is to get better left-handed batters. Since that’s not really possible, the next best option would be platooning the left-handed batters. Ike Davis is already platooned with Gaby Sanchez, and Pedro Alvarez is barely starting any games. Polanco has regressed from his debut, but I think the best idea is for him to play every day and deal with LOOGY relievers. I also don’t know how many fans actually want to see him platooned or are suggesting that he should be. With all this in mind, I’m not quite sure what acquiring a right-handed bat would accomplish. The Pirates are already trying to find a place for RHB Josh Harrison to play. He’s been having a good season, no matter what you think about Harrison. Furthermore, the Pirates have a guy who’s been killing LHP this year and has decent splits against them for his career. And that’s Jose Tabata.

Bottom line, adding a RHB wouldn’t help much because the team splits are still a small sample size against LHP. Beyond the statistics, the two big left-handed bats have terrible splits against LHP, and these problems have been already addressed by platooning and benching.