Archive for October, 2016

The Home Run Conundrum: Is It a Matter of How You Spin It?

I was looking into a separate but overlapping issue when I ran into the puzzling home run question. As has already been pointed out in prior research, exit velocities (EV) are up about a half a mile per hour over the last year; however, for most, this is not really a satisfying conclusion given the relatively small expected distance change from that amount of an EV increase. There has to be more to the story.

My other overlapping project was initially looking into loft. There seems to be an organizational push for more loft and players have made comments along these lines. Although the benefits of loft in terms of incremental runs are well-known, there has been very little discussion of the cost side of the equation – what is a player sacrificing in terms of optimal bat path / ball path matching? Of the three ways to generate loft, what is the cost for each and how do they rank? More to follow on all that in another article.

Organizations and players have touted backspin even longer than the more recent focus on loft. In terms of additional distance from backspin, it is significant. Research by Alan Nathan indicates spin could add 30-50 ft starting from a low spin rate. What if backspin was a key piece in the missing home run puzzle?

Since spin rates on hits are not yet available, I created a Distance Model based on EV and LA data from Baseball Savant in which combinations of EV and LA could be held constant (to a tenth) in order to isolate Unexpected Distance, of which spin is likely the largest component. I excluded all balls hit at Coors Field and focused on balls hit 90 MPH or more between the launch angles of 15 and 45 degrees. The Unexpected Distance was calculated for each hit in the range above for 2015 and 2016. Since the data showed a clear bias depending on the location of the hit, I made the following adjustments to take out directional bias based on the 2015 data:

Hit Location      Directional Bias (Ft)
Pull-Side Gap     +17
Oppo-Side Gap     +7
Center            +7
Pull              -6
Oppo              -12
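
As a rough illustration of the approach described above, here is a minimal Python sketch (not the model actually used for this article). It assumes the expected distance is the mean distance of all balls sharing the same EV and LA, each rounded to a tenth, and that the directional bias is subtracted from the raw residual. The dictionary keys (ev, la, dist, zone, park) and the zone labels are hypothetical names chosen for the example.

```python
from collections import defaultdict

# Directional bias in feet, from the 2015 adjustment table.
DIRECTIONAL_BIAS_FT = {
    "pull_gap": 17.0,
    "oppo_gap": 7.0,
    "center": 7.0,
    "pull": -6.0,
    "oppo": -12.0,
}


def unexpected_distances(batted_balls):
    """Return the bias-adjusted Unexpected Distance of each qualifying ball.

    Each ball is a dict with hypothetical keys: ev (MPH), la (degrees),
    dist (ft), zone (a key of DIRECTIONAL_BIAS_FT) and park.
    """
    # Sample described in the text: no Coors Field, EV >= 90, 15 <= LA <= 45.
    sample = [
        b for b in batted_balls
        if b["park"] != "Coors Field" and b["ev"] >= 90 and 15 <= b["la"] <= 45
    ]

    # Assumed expected distance: mean of all balls with the same (EV, LA),
    # each rounded to a tenth.
    buckets = defaultdict(list)
    for b in sample:
        buckets[(round(b["ev"], 1), round(b["la"], 1))].append(b["dist"])
    expected = {key: sum(d) / len(d) for key, d in buckets.items()}

    adjusted = []
    for b in sample:
        raw = b["dist"] - expected[(round(b["ev"], 1), round(b["la"], 1))]
        # Assumed sign convention: remove the average directional effect.
        adjusted.append(raw - DIRECTIONAL_BIAS_FT[b["zone"]])
    return adjusted
```

Averaging these values by player or by launch-angle group would give a mean unexpected distance under the same assumptions.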


Clearly, balls hit predominantly with backspin have more lift than those hit flat or with side-spin. Considering that Coors Field alone was a +17.5 average difference, the average ball hit to the pull-side gap is about the same magnitude as hitting at 5,200 feet. Just for fun, I ran the Unexpected Distance for a pull-side gap hit at Coors Field: a whopping 39.8 feet!

Analysis of Launch Angle Buckets

On the whole, exit velocity, launch angle and distance on well-hit balls (>=90 MPH and >=15 degree LA) are all little changed from last year. However, the launch-angle buckets indicate that backspin is likely a factor, particularly in the 30-35 and 35-40 degree segments, which account for a combined 58% of the increase in HRs over 2015 while only representing a combined 32% of the categories. Additionally, the majority of the 6 ft and 7 ft increases in these categories, respectively, comes from the Mean Unexpected Distance (MUD), which is most likely spin.

Launch Angle (deg)     15-20   20-25   25-30   30-35   35-40     >40
Chng EV (MPH)            0.4     0.4     0.6     0.5     0.3     0.1
Chng Avg. Dist (Ft)    (1.1)     1.4     2.5     6.0     7.1     2.8
Chng MUD (Ft)          (3.6)   (0.9)     0.3     3.9     5.6     2.5
Chng HRs                (23)      90     111     190      54     (7)

Note: Home runs in both years only include those with EV and LA data.
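
The 58% figure can be reproduced directly from the home run row of the table. The short Python check below uses only the numbers printed in the table; the variable names are just for illustration.

```python
# Change in home runs by launch-angle bucket, 2016 vs. 2015 (from the table).
buckets = ["15-20", "20-25", "25-30", "30-35", "35-40", ">40"]
hr_change = [-23, 90, 111, 190, 54, -7]

total_change = sum(hr_change)

# Combined change for the 30-35 and 35-40 degree buckets.
mid_high_change = hr_change[buckets.index("30-35")] + hr_change[buckets.index("35-40")]

share = mid_high_change / total_change
print(f"{mid_high_change} of {total_change} additional HRs ({share:.1%})")
```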

Looking at the distribution of balls in the launch-angle groups over the past two years, there has been very little movement between the groups other than a slight move from the lowest to the highest group (below).

Distribution of Balls Hit >=90 MPH and >=15 Degrees

Year     15-20   20-25   25-30   30-35   35-40     >40
2015     23.3%   20.6%   17.8%   13.6%    9.7%   15.0%
2016     22.6%   20.6%   17.8%   13.6%    9.6%   15.8%


As reflected in the data, it is not that there are significantly more lofted balls being hit but the ones in the 30-40 degree range are being hit with significantly more backspin relative to last year.

In diving into the home runs in the 30-40 degree category for both years, I was expecting to see players with either high or increasing MUD values. While there were some of those players…

HRs in the 30-40 Degree Group (Backspin Gainers)

Player              2015 HRs  2016 HRs      Chng  2015 MUD  2016 MUD  MUD Chng
Brad Miller                2         7         5     (3.7)       8.3      12.0
Ryan Braun                 4         9         5     (1.9)       8.1      10.0
Mookie Betts               4         8         4       0.6       8.9       8.3


There were also some in the “flat” hitting group that were simply just hitting the ball “less flat than last year” that are showing up in the positive MUD change group…

HRs in the 30-40 Degree Group (Flat Hitters – Hitting Less Flat)

Player              2015 HRs  2016 HRs      Chng  2015 MUD  2016 MUD  MUD Chng
Kris Bryant               13        25        12    (17.0)    (10.2)       6.8
Evan Longoria              3        13        10     (4.0)       0.0       4.1
Miguel Cabrera             3         9         6     (8.4)     (5.6)       2.8
Victor Martinez            4        11         7     (5.5)     (2.0)       3.5


At this point, I was about to conclude that spin is definitely a factor but it could just be noise rather than an organizational push for more loft and/or backspin…and then I read Jeff Sullivan’s post the other day and now it all fits! Look at the table below of the players with the highest and lowest MUD values for 2016 and see if you can find it.

Top 10 MUD (Backspin Hitters), 2016

Player                  Avg EV    Avg LA  Avg Dist       MUD
Max Kepler                97.3      24.6     362.2      16.7
Melky Cabrera             97.0      24.1     349.3      12.5
Martin Prado              95.8      23.9     346.9      11.7
Ketel Marte               94.9      23.7     340.1      11.2
Aledmys Diaz              97.8      26.4     357.7      11.1
Cheslor Cuthbert          97.4      24.9     346.7      11.1
Aaron Hill                95.9      25.0     345.0      11.0
Yangervis Solarte         97.5      27.1     355.4       9.8
Alexei Ramirez            94.4      29.3     348.1       9.2
Adeiny Hechavarria        95.8      24.6     342.8       9.2
Average                   96.4      25.4     349.4      11.3


Bottom 10 MUD (Flat Hitters), 2016

Player                  Avg EV    Avg LA  Avg Dist       MUD
Freddie Freeman          100.0      27.8     343.2    (14.6)
J.D. Martinez            102.1      27.7     355.7    (13.1)
Addison Russell           99.0      27.1     343.1    (12.4)
Chris Davis              101.5      28.6     358.7    (11.2)
Joe Mauer                 97.7      25.2     330.2    (10.7)
Trevor Story              99.2      28.0     350.1    (10.6)
Kris Bryant              100.1      29.8     353.1    (10.2)
Joey Votto                98.8      28.2     344.2     (9.5)
Mark Teixeira             99.5      26.8     348.1     (9.4)
Nick Castellanos          99.5      28.3     350.0     (8.8)
Average                   99.8      27.8     347.6    (11.0)


Yes, of course! The answer is that it is not just because chicks dig the long ball, it’s that the market that values the players digs the long ball. Notice the significant difference in the exit velocities of the two groups. The players who are relying on spin are doing so because they have to get more distance and HRs out of their existing tool kit and are willing to pay (in terms of consistency) in order to get it. The players with higher exit velocities and hence more “natural power” can continue in their square hitting ways since they have no need to pay a high price for something they already possess. I didn’t average the height and weight of the two groups but I think it is clear that the backspin group is significantly smaller in stature than the flat-hitting group. Note the 2 ft average distance advantage of the backspin group despite an average exit velocity that is a whopping 3.4 MPH lower!

Another interesting tidbit from the above data is the average launch angle is significantly lower for the higher backspin group. While this may seem counter-intuitive, it actually makes complete sense – in order to get backspin, you have to have less loft in the swing and rely on the ball contact point for loft. Since this is no easy feat, balls will tend to come off the bat with more variability with many hits matching the amount of loft in the swing and hence a lower trajectory.

What is happening with the home run issue is not randomness that is going to revert to the mean. It is a secular trend that is the result of the incentives in the system. Hitting for average with no power is out of style and players, particularly those with lower EVs, are likely responding by getting the ball out of the park any way they can – whether it is swinging harder, utilizing more backspin, or hitting to the shorter (pull) side of the field. (Could the latter be the next big trend?) While there will likely be additional findings regarding the home run question, the way I see it, at least part of it is as clear as MUD.

Modeling Walk Rate Between Minor League Levels

After reading through Projecting X by Mike Podhorzer, I decided to try to predict some rate statistics between minor league levels. Mike states in his book, “Projecting rates makes it dramatically easier to adjust a forecast if necessary.” So if a player is injured or will only have a certain number of plate appearances that year, I can still attempt to project performance. The first rate statistic I’m going to attempt to project is walk rate between minor league levels. This article will cover the following:

  • Raw Data
  • Data Cleaning
  • Correlation and Graphs
  • Model and Results


Raw Data

For my model I used the last seven years of minor league data (2009-2015) from Baseball Reference. Accounting for the Short-Season A (SS-A) to AAA affiliates, I ended up with over 28,316 data points for my analysis.

Data Cleaning

I’m using R, and my original dataframe stored each season of data in its own row. To run the calculations I wanted, I needed to move each player’s career minor league data onto a single row. I also needed to filter on plate appearances in a season to remove noise, for example a player on a rehab assignment in the minor leagues, or a player who was injured for most of the year and only had 50-100 plate appearances. I settled on a minimum of 200 plate appearances for a player to be factored into the model. To remove further noise, I only attempt to model player performance between full-season leagues (A, A+, AA, AAA). Once the data was cleaned, I had the following number of data points for each level:

  • A to A+ : 1129
  • A+ to AA: 1023
  • AA to AAA: 705
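
The cleaning steps above can be sketched roughly as follows. This is a plain-Python illustration rather than the R code used for the analysis, and the row keys (player, level, pa, bb) are hypothetical. As a simplification, if a player has more than one qualifying season at the same level, the last one seen wins; the article does not say how repeat seasons were handled.

```python
MIN_PA = 200
FULL_SEASON_LEVELS = ("A", "A+", "AA", "AAA")


def paired_walk_rates(season_rows, lower, upper, min_pa=MIN_PA):
    """Return (lower-level BB%, upper-level BB%) pairs, one per player.

    season_rows holds one dict per player-season with hypothetical keys:
    player, level, pa (plate appearances) and bb (walks).
    """
    by_player = {}
    for row in season_rows:
        # Drop short-season leagues and low-PA seasons (rehab stints, injuries).
        if row["level"] in FULL_SEASON_LEVELS and row["pa"] >= min_pa:
            rates = by_player.setdefault(row["player"], {})
            rates[row["level"]] = row["bb"] / row["pa"]

    # Keep only players who have a qualifying season at both levels.
    return [
        (rates[lower], rates[upper])
        for rates in by_player.values()
        if lower in rates and upper in rates
    ]
```

Calling `paired_walk_rates(rows, "A", "A+")` would then give the one-row-per-player pairs that a correlation or regression needs.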

Correlation and Graphs

I was able to get strong correlation numbers for walk rate between minor league levels. You can see the results below:

  • A to A+ : .6301594
  • A+ to AA: .6141332
  • AA to AAA: .620662
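
For reference, the correlation reported here is presumably the standard Pearson coefficient (the default of R's cor function). A minimal Python version, written out by hand so the calculation is visible, might look like this; the function name and the (lower, upper) pair format are assumptions for the example.

```python
from math import sqrt


def pearson_r(pairs):
    """Pearson correlation for a list of (lower BB%, upper BB%) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n

    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)

    return cov / sqrt(var_x * var_y)
```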

Here are the graphs for each level:




Model and Results

The linear models for each level are:

  • A to A+: A+ BB% = .63184*(A BB%) + .02882
  • A+ to AA: AA BB% = .6182*(A+ BB%) + .0343
  • AA to AAA: AAA BB% = .5682*(AA BB%) + .0342
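
To show how these equations are applied, here is a small Python sketch using the coefficients listed above. The function and dictionary names are made up for the example.

```python
# (slope, intercept) for each level transition, as listed above.
MODELS = {
    ("A", "A+"): (0.63184, 0.02882),
    ("A+", "AA"): (0.6182, 0.0343),
    ("AA", "AAA"): (0.5682, 0.0342),
}


def predict_bb_rate(lower, upper, bb_rate):
    """Project the walk rate at `upper` from the walk rate at `lower`."""
    slope, intercept = MODELS[(lower, upper)]
    return slope * bb_rate + intercept


# A hypothetical hitter who walked in 10% of plate appearances at A.
print(round(predict_bb_rate("A", "A+", 0.10), 4))
```

Because each slope is below 1 and each intercept is positive, the models pull extreme walk rates toward the middle: a .100 walk rate at A projects to roughly .092 at A+.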

In order to interpret the success or failure of my results I compared how close I was to getting the actual walk rate. FanGraphs has a great rating scale for walk rate at the major league level:


Image from FanGraphs

The image above gives a classification for multiple levels of walk rates. While it is based on major league data, it’s a good starting point for deciding on a margin of error for my model. The mean difference between adjacent levels in the FanGraphs table is .0183, which I rounded to a margin of error of .02. If my predicted value for a player’s walk rate was within .02 of the actual value, I counted the prediction as correct; if the error was greater than that, I counted it as wrong. Here are the model’s results for each level:

  • A to A+
    • Incorrect: 450
    • Correct: 679
    • Percentage Correct: ~.6014
  • A+ to AA
    • Incorrect: 445
    • Correct: 578
    • Percentage Correct: ~.565
  • AA to AAA
    • Incorrect: 278
    • Correct: 427
    • Percentage Correct: ~.6056
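
The scoring rule is simple enough to sketch in a few lines of Python. Whether a prediction exactly on the boundary counts as correct is an assumption here (it is treated as correct), and the function name is made up for the example.

```python
def share_within(predicted, actual, tolerance=0.02):
    """Fraction of predictions within `tolerance` of the actual walk rate."""
    hits = sum(abs(p - a) <= tolerance for p, a in zip(predicted, actual))
    return hits / len(predicted)


# The reported A to A+ counts give the same percentage as the list above.
correct, incorrect = 679, 450
print(round(correct / (correct + incorrect), 4))
```

Passing a different `tolerance` makes it easy to see how sensitive the percentage is to the chosen margin.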

When I moved the cutoff up a percentage point to .03, the model’s results improved drastically:

  • A to A+
    • Incorrect: 228
    • Correct: 901
    • Percentage Correct: ~.798
  • A+ to AA
    • Incorrect: 246
    • Correct: 777
    • Percentage Correct: ~.7595
  • AA to AAA
    • Incorrect: 144
    • Correct: 561
    • Percentage Correct: ~.7957


Numbers are cool but where are the actual examples? OK, let’s start off with my worst prediction. The largest error I had between levels was A to A+, and the error was more than 10 percentage points (~.1105). The player in this case was Joey Gallo. A quick glance at his player page will show his A walk rate was only .1076 and his A+ walk rate was .2073, an improvement of about 10 percentage points between levels. So why did this happen and why didn’t my model do a better job of predicting this? Currently the model is only accounting for the previous season’s walk rate, but what if the player is getting a lot of hits at one level and stops swinging as much at the next? In Gallo’s case he only had a .245 BA his year at A-ball so that wasn’t the case. More investigation is required to see how the model can get closer on edge cases like this.
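
The Gallo miss can be reproduced by hand from the A to A+ equation (A+ BB% = .63184*(A BB%) + .02882) and the two walk rates quoted above. A quick Python check, with variable names chosen just for this example:

```python
# A to A+ linear model coefficients from the Model and Results section.
slope, intercept = 0.63184, 0.02882

# Joey Gallo's walk rates as quoted in the text.
a_bb, a_plus_bb = 0.1076, 0.2073

predicted = slope * a_bb + intercept  # projected A+ walk rate
error = a_plus_bb - predicted         # how far the model undershot

print(round(predicted, 4), round(error, 4))
```

The model projected a walk rate of about .097 at A+, so the actual .2073 leaves the ~.1105 error mentioned above.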


Gallo Dataframe Snippet

The lowest I was able to set the error to and still come back with results was ~.00004417. That very close prediction belongs to Erik Gonzalez. I don’t know Erik Gonzalez, so I continued to look for results. Setting the minimum error to .0002 brought back Stephen Lombardozzi as one of my six results. Lombo’s interesting to hardcore Nats fans (like myself) but I wanted to continue to look for a more notable name. Finally, after upping the number to .003 for the A to A+ data, I was able to see that the model successfully predicted Houston Astros multi-time All-Star 2B Jose Altuve’s walk rate within a .003 margin of error.


Altuve Dataframe snippet

What’s Next:

  • Improve algorithm for generating combined season dataframe
  • Improve model to get a lower error rate
  • Predict strikeout rate between levels
  • Eventually would like to predict more advanced statistics like wOBA/OPS/wRC+