Relationship Between OBP and Runs Scored in College Baseball

There is a segment of the population of the United States that meets the following criteria: between the ages of 18 and 21, a devout FanGraphs reader, and mesmerized by the movie “Moneyball.”  I have read the book and watched the movie a number of times, and I have dedicated time to understanding the guiding principles in the book and how they relate to professional baseball.  The relationship between on-base percentage and scoring runs in Major League Baseball is well established, but has anyone ever taken the time to examine the relationship at the collegiate level?

Collegiate baseball is volatile: roster makeups change dramatically each year, no player is around more than five years, and there are hundreds of teams competing against one another. In terms of groundbreaking sabermetric principles, this study is not intended to turn over any new stones, but rather to present information that may have been overlooked up to this point: the relationship between on-base percentage and runs scored in collegiate baseball.

To conduct this study, I compiled Southeastern Conference team statistics from the 2014-2017 seasons (Runs Scored, On-Base Percentage, Runs Against, and Opponents’ On-Base Percentage).  I then performed a linear regression on the distribution, fitting a line of best fit.  A few team seasons were excluded because I could not access that season’s data, and I removed the 2014 Auburn season on the grounds that it was an outlier affecting the output (235 runs, 0.360 OBP).  Below are the resulting R² and the predictive equation:

Runs Scored = (3,537 × OBP) − 933.6791

R² = 0.722849

I am by no means a seasoned statistician, but my interpretation of the R² value is that the relationship between Runs Scored and OBP in this sample is moderately strong, with a team’s OBP accounting for roughly 72.3% of the variation in Runs Scored in a season.  Simply put, OBP is a significant factor in the offensive potency of a team.
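For readers who want to reproduce this kind of fit, here is a minimal sketch with NumPy. The (OBP, runs) pairs below are made up for illustration; the actual study used 2014-2017 SEC team seasons:

```python
import numpy as np

# Hypothetical (team OBP, runs scored) pairs standing in for the SEC data.
obp = np.array([0.330, 0.345, 0.352, 0.362, 0.371, 0.385])
runs = np.array([245, 290, 315, 348, 370, 425])

# Ordinary least squares: runs = slope * OBP + intercept
slope, intercept = np.polyfit(obp, runs, 1)

# R^2 for a simple linear fit is the squared correlation coefficient
r_squared = np.corrcoef(obp, runs)[0, 1] ** 2

print(f"Runs = {slope:.1f} * OBP + {intercept:.1f}, R^2 = {r_squared:.3f}")
```

With the real data this produces the equation and R² quoted above; with the toy numbers it simply shows the mechanics.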

At the professional level, the R² is found to be around 0.90.  The competitive edge the Oakland A’s exploited in “Moneyball” was using this correlation to purchase the services of “undervalued” players.  But what about in college?  Colleges certainly cannot purchase their players, but the above information can still be useful to college programs.

For example, the average Runs Scored per season in my sample was roughly 347.8.  If an SEC team wanted to set the goal of being “above average” offensively, it could determine, roughly, what its target OBP should be by using the predictive equation from the linear fit:
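Concretely, plugging the sample average of 347.8 runs into the fitted equation and solving for OBP gives the target:

```python
# Fitted equation from the article: Runs = 3537 * OBP - 933.6791
slope, intercept = 3537.0, -933.6791

# Solve Runs = slope * OBP + intercept for OBP at the sample-average run total
target_runs = 347.8
target_obp = (target_runs - intercept) / slope

print(f"Target OBP for {target_runs} runs: {target_obp:.3f}")  # roughly .362
```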

Does this mean that if an SEC program produces an OBP of .362 it will score precisely 348 runs? Obviously not. Could it end up scoring exactly 348 runs? Yes, but variation exists, and statistics is the study of variation.  Here are a few seasons in which teams posted an OBP at or around .362, and the resulting run totals:

The average of those six seasons’ run totals was 347.5, which is pretty darn close to 348, and even closer to the average of 347.8 runs derived from the sample.

Another use for this information is lineup construction and tactical strategy in-game.  The people in charge of baseball programs do not need instruction on how to construct their roster and manage their team, but who would disagree with a strategy of maximizing your team’s ability to get on base?

The purpose of this study was to examine the relationship between On-Base Percentage and Runs Scored in college baseball, and how that relationship compares to its professional counterpart.  To conclude, the relationship between OBP and runs does exist at the collegiate level, and it carries considerable weight and value for teams willing to get creative in putting it to use.

 

Disclaimer: I am a beginner-level statistician, and if you have any suggestions or critiques of this article, please feel free to share them with me.

Theodore Hooper is a Student Assistant, Player Video/Scouting, for the University of Tennessee baseball program.  He can be reached at thooper3@vols.utk.edu or on LinkedIn at https://www.linkedin.com/in/theodore-hooper/




Corey2
Member
R-squared is just a measure of model fit: OBP explains 72% of the variation in run scoring, which reflects how tightly the data cluster around the best-fit line. To know how IMPORTANT on-base percentage is to run scoring, you need to look at the coefficient, which tells you how many runs you can expect to gain per one-unit increase in on-base percentage, along with the statistical significance of that coefficient, which tells you how likely it is that the true value is different from zero.

So the lower R-squared tells us that there are more things important to run scoring in the SEC apart from OBP than in MLB. What I think would be interesting is to compare college to different eras in the pros in this regard. From watching a fair amount of college baseball, they bunt A LOT. This means they’re playing more of a deadball-era style of baseball, and it seems like that would reduce the importance of OBP because teams are negating the value of their runners reaching base. If you’ve got the data, I would try including two controls in the model: sacrifice bunts (which I expect to be much higher in the SEC than in MLB) and home-run rate (which I expect to be higher in MLB than in the SEC). R-squared is there to measure model fit, not impact. I would also multiply OBP (and any other rate stat) by 1,000 to get rid of that ridiculous and difficult-to-interpret 3,537 coefficient.
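The multiple-regression-with-controls idea from this comment can be sketched as follows. All the numbers below are hypothetical team-season values, not real SEC data, and OBP is multiplied by 1,000 as suggested so the coefficient reads as runs per point of OBP:

```python
import numpy as np

# Hypothetical team-season data standing in for the SEC sample.
obp_pts = np.array([330.0, 345.0, 352.0, 362.0, 371.0, 385.0])   # OBP * 1000
sac_bunts = np.array([48.0, 55.0, 40.0, 35.0, 30.0, 25.0])        # sacrifice bunts
home_runs = np.array([28.0, 35.0, 41.0, 50.0, 55.0, 62.0])        # home runs
runs = np.array([245.0, 290.0, 315.0, 348.0, 370.0, 425.0])

# Design matrix with an intercept column, then OLS via least squares
X = np.column_stack([np.ones_like(runs), obp_pts, sac_bunts, home_runs])
coefs, *_ = np.linalg.lstsq(X, runs, rcond=None)

for name, c in zip(["intercept", "OBP (pts)", "SH", "HR"], coefs):
    print(f"{name}: {c:.2f}")
```

With real data, the OBP coefficient here would answer the "how many runs per point of OBP, holding bunts and power constant" question the comment is raising.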

Dominikk85
Member

I have not done the statistics on this, but IMO the lower the level, the bigger the influence of OBP. In MLB it is still very important, but the higher you get, the bigger the role of power becomes, while in youth baseball many guys who make it to first score anyway even without extra-base hits, because defenses and batteries are worse at holding runners at first and at making the third out before the guy gets to scoring position (errors, wild pitches, stolen bases...).

At higher levels you usually won’t string together two walks and two hits, so you need some power to score that runner from first after a walk or single.

College baseball is of course not HS ball and is pretty good in its own right, but I could see the power-to-OBP balance tilting a little more in favor of OBP than in pro ball.

That is just my observation from watching amateur ball, it would need to be tested.

Beau Horan
Member

I love the decision to include pictures in your article. Really brings your ideas to life! How did you manage to fit screenshots in the post?

Brad McKay
Member

Nice idea! A constant area of debate in amateur ball is the extent to which analysis of MLB data is applicable at lower levels.

As Corey mentions, R-squared tells you the percentage of variability in runs scored that can be explained by OBP, but it doesn’t tell you how much an increase in OBP will increase run scoring.

I’m interested in this question, so I did a quick follow-up: I looked at 2017 MLB team runs scored and OBP and ran the regression. Since runs scored is a counting stat while OBP is a rate stat, this analysis is never going to be perfect, but just to get a ballpark idea I converted the MLB run totals to a 62-game schedule.

Interestingly, the slope of the line (the impact of OBP on run scoring) is steeper in college ball than in MLB. The MLB slope was 2051.6, while in college it was 3,537… those numbers are weird to interpret and probably very sensitive to the accuracy of my games-played estimate of 62. But my guess is OBP is either equally important or (more likely) more important in the college game.
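The schedule conversion described in this comment is just a proportional scaling. A one-line sketch, using a made-up MLB run total rather than a real 2017 team:

```python
# Scale a full-season MLB run total down to a 62-game college-length
# schedule, as in the comment above. MLB teams play 162 games; the run
# total here is a hypothetical example, not real 2017 data.
mlb_games, college_games = 162, 62
mlb_runs = 810  # hypothetical team-season total

scaled_runs = mlb_runs * college_games / mlb_games
print(f"{mlb_runs} runs over {mlb_games} games ~ {scaled_runs:.0f} over {college_games}")
```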