- This study attempts to determine how much the fielders’ prowess, measured by the metric UZR (Ultimate Zone Range), affects a pitcher’s Earned Runs Average.
- The data used for the regression (collected from FanGraphs.com) includes collective ERA, BABIP, HR/9, BB/9, K/9 and UZR for every Major League Baseball team for the past three years.
- ERA (Earned Runs Average) is the amount of earned runs a pitcher allows per nine innings pitched. BABIP (Batting Average per Balls in Play) is the batting average against any given pitcher, but only including the at bats where the hitter puts the ball in play. HR/9 is home runs allowed per nine innings pitched. BB/9 is walks allowed per nine innings pitched. K/9 is batter struck out per nine innings pitched. UZR (Ultimate Zone Range) is a widely used metric to evaluate defense. It summarizes how many runs any given fielder saved or gave up during a season compared to the league average in that position.
- The model passed the F-test, the adjusted “R” squared came out at 91.2 percent and every one of the independent variables passed their respective t-test.
- The model tested negative for both Multicollinearity (using Variance Inflation Factors) and Heteroskedasticity (using the second version of the White’s test).
- The regression equation looks like this: ERA = -2.55 – 0.187 K/9 + 0.413 BB/9 +16.9 BABIP + 1.72 HR/9 – 0.00157 UZR. Even though the independent variable UZR has a low coefficient, it definitely affects a pitcher’s ERA, and in the way it was suspected. As the UZR goes up the ERA goes down.
Since Bill James started to write about baseball in the late 1970’s and started to defy the traditional stats used to evaluate players, hundreds of baseball fans have tried to follow his footsteps creating new ways to evaluate players and defy the existing ones. One of the stats that has been brought to light lately is Earned Runs Average (ERA).
According to several baseball analysts ERA is not an efficient way to evaluate how good or bad a pitcher performs. The rationale behind this thinking is pretty simple; ERA is the amount of earned runs that any given pitcher allows per nine innings pitched, but the pitcher is not always 100 percent responsible for every earned run allowed. Sometimes, a fielder’s lack of defensive prowess will allow hitters to reach base safely (I am not talking about errors), and when it happens, rather often, those hits will translate into earned runs, thus affecting the pitcher’s ERA.
One of the metrics that has been used to determine any given fielder’s prowess is UZR (Ultimate Zone Range). UZR compiles data on the outfielders arms, fielder range and errors and summarizes the amount of runs those fielders saved or gave up during a season compared to the league average in that position. Using that metric along with other metrics that affect the ERA, we can answer the question “How much does defensive prowess impacts a pitcher’s ERA?”
If in fact defensive prowess affects ERA, we could also determine how much it affects it. With that kind of information, cost-effective teams (Tampa Bay Rays and Oakland Athletics) can help improve their pitching staff without investing heavily on new pitchers.
The unit of observation for this study is one Major League Baseball team. And the number of observations is 90. Currently, there are 30 Major League Baseball teams, so data was collected for the past three Major League Baseball seasons. So the time period covered goes from 2010 to 2012, including both seasons.
The dependent variable used in this project was Earned Runs Average, and the independent variables are as follow:
- BABIP: Batting average per balls in play
- HR/9: Homeruns allowed per nine innings pitched
- BB/9: Walks allowed per nine innings pitched
- K/9: Hitters struck out per nine innings pitched
- UZR: Runs saved or given up by any given fielder during a season
All the data for this study is cross-sectional because all the observations have been collected at the same point of time.
All the data for this study was collected from the baseball website FanGraphs.com. FanGraphs is a widely known source of baseball stats and news, but the data they publish on their website is collected by another company called Baseball Info Solutions.
Regression Analysis: ERA versus BABIP, HR/9, BB/9, K/9 and UZR
The regression equation is
ERA = – 2.55 – 0.187 K/9 + 0.413 BB/9 + 16.9 BABIP + 1.72 HR/9 – 0.00157 UZR
Predictor Coef SE Coef T P VIF
Constant -2.5474 0.5594 -4.55 0.000
K/9 -0.18718 0.02428 -7.71 0.000 1.099
BB/9 0.41261 0.04671 8.83 0.000 1.052
BABIP 16.914 1.876 9.02 0.000 1.741
HR/9 1.7222 0.1105 15.58 0.000 1.180
UZR -0.0015743 0.0006219 -2.53 0.013 1.669
S = 0.133650 R-Sq = 91.7% R-Sq(adj) = 91.2%
Analysis of Variance
Source DF SS MS F P
Regression 5 16.5663 3.3133 185.49 0.000
Residual Error 84 1.5004 0.0179
Total 89 18.0668
The first step used to evaluate the model was the F-test, and since the model has a p-value less than 0.05, it is safe to say that the model passed the F-test. The adjusted “R” squared for the model was 91.2 percent, which means that 91.2 percent of the variation in ERA is explained by at least one of the independent variables used in this model. The method used to evaluate the relevance of the independent variables was the t-test, and each one of them, as mentioned earlier, had a p-value below 0.05, so in conclusion, they all passed the t-test. The p-value for K/9, BB/9, BABIP and HR/9 was 0.000 for each one of them, and the p-value for UZR was 0.013.
MODEL ESTIMATION SEQUENCE
- Correct functional form: To check for correct functional form, each one of the independent variables was plotted against the dependent variable. The scatter plots that resulted from this check show a linear relationship between each one of the independent variables and the dependent variable.
- Test for Heteroskedasticity: The data for this study is cross-sectional, so it was necessary to test for Heteroskedasticity, and such test was conducted by the second version of White’s test. To do so, the residuals for the original regression were stored. Those squared residuals were regressed against the Independent variables and the independent variables squared. After running the regression, an the F-test was applied to it and since the p-value was over 0.05, it can be concluded that the regression fails the F-test, therefore Heteroskedasticity does not exist in the initial model.
- Multicollinearity: This model also tested for Multicollinearity and it is done by using the correlation matrix and the Variance Inflation Factors, observed in the initial regression.
- Since none of the VIF’s is larger than 10, it can be concluded that Multicollinearity does not exist and the p-values from the t-tests can be trusted.
- A correlation matrix was calculated using all the independent variables but since every one of them passed the t-test, none will be dropped from the model.
- K/9: p-value (0.000), VIF (1.099), rho (0.252)
- BB/9: p-value (0.000), VIF (1.052), rho (0.195)
- BABIP: p-value (0.000), VIF (1.741), rho(0.604)
- UZR: p-value (0.013), VIF (1.669), rho (0.604)
- Drop any irrelevant variable from the model: Since all the independent variables in this model are relevant, none of them will be dropped from the model.
The final model is exactly the same as the initial model because the it passed the F-test, all of the independent variables passed their t-tests and neither Heteroskedasticity or Multicollinearity are present in the model, so it was not necessary to run another regression or drop any variable.
- K/9: When the team strikes out one extra batter per nine innings, the team’s ERA should go down by 0.187 runs per nine innings holding everything else constant.
- BB/9: When the team walks one extra batter per nine innings, the team’s ERA should go up by 0.413 runs per nine innings holding everything else constant.
- BABIP: If every time a batter puts the ball in play he records a hit, the ERA will go up by 16.9 runs per nine innings. This variable is hard to explain since it will never go up by 1, it will go up or down depending on how many hits the team allows in any given number of at-bats where the batter puts the ball in play. For example, if a team averages eight hits every 27 outs, the BABIP will be 0.296 throughout the entire season. Taking into account that every batter put the ball in play (no strikeouts). The expected increase in ERA given a 0.296 BABIP during a season, and holding everything else constant, would be 5.00.
- HR/9: When the team allows one more homerun per nine innings, ERA should go up by 1.72 runs per nine innings holding everything else constant.
- UZR: When the team saves one extra run defensively, ERA should go down by 0.00157 runs per nine innings holding everything else constant.
The null hypothesis for this project stated that defensive prowess didn’t affect ERA, but the results showed otherwise, so it is safe to reject the null hypothesis. Defensive prowess appears to affect ERA although in a small scale. This might not seem like much, but cost-effective teams like the Rays and Athletics can acquire premium defensive players at a much cheaper cost than a premium pitcher, and although they won’t be “game changers,” they will definitely improve the team’s ERA.
Baseball is a game of numbers, and these numbers don’t lie. A good defender will help his team save runs; a lot of good defenders will help their team save multitude of runs. Is this enough to get to the postseason or win a World Series? Absolutely not, but it has been proven already that finding edges in the game, as little as they might be, will help a team in the long run. The findings in this study are a concise proof that taking advantage of defense is an edge that can be exploited for the betterment of the organization.