Author Archive

Making Baseball Slow Again

If you’re a baseball fan, you may have noticed that you’re watching, on average, 10-15 minutes more baseball than you were 10 years ago. Or maybe, like me, you are always switching between games and never stop to notice. If you’re not a fan, it’s probably why you don’t watch baseball in the first place: 3+ hour games, with only 18 minutes of real action. You are probably more of a football guy/gal, right? Believe it or not, NFL games are even longer and, according to a WSJ study, deliver even less action.

The way the MLB is going, however, it may not be long before it dethrones the NFL as the slowest “Big Four” sport in America (and takes away one of my rebuttals to “baseball is boring”). Currently, the MLB is proposing pitch clocks and has suggested limiting privileges such as mound visits.

Before I get into the specific proposal and the consequences of these changes, let me give you some long-winded insight into pace of play in the MLB.

A WSJ study back in 2013 broke the game down into four time elements:

  1. Action ~ 18 minutes (11%)
  2. Between batters ~ 34 minutes  (20%)
  3. Between innings ~ 43 minutes (25%)
  4. Between pitches ~ 74 minutes  (44%)

The time between pitches, or “pace,” is what everyone is focused on, and rightly so. It makes up almost twice as much time as any other time element and is almost solely responsible for the 11-12 minute increase in game length since 2008. Don’t jump to the conclusion that this is all the fault of the batter dilly-dallying or the pitcher taking his sweet time. This time also includes mound conferences, waiting for foul balls or balls in the dirt to be collected, shaking off signs, stepping off, etc. Even if we take all of those factors out, there are still two other integral elements that increase the total time between pitches: the total batters faced and the number of pitches per plate appearance (PA). If either of these increases, the total time between pitches will increase by default. In the graph below, I separated the effects of each by holding the rest constant at 2008 levels to see how each factor would contribute to the total time added.

Any modest game time reduction due to declining total batters faced was made up by a surge in pitches per PA. Increasing pace between pitches makes up the rest.
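The hold-everything-else-at-2008-levels idea can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the graph: it assumes total between-pitch time is roughly batters faced times pitches per PA times seconds per pitch, and every number in it is a made-up placeholder rather than real league data.

```python
# Illustrative decomposition of between-pitch time into three factors.
# All inputs are hypothetical placeholders, NOT the article's data.

def total_seconds(batters_faced, pitches_per_pa, pace):
    """Approximate between-pitch time per game as a product of three factors."""
    return batters_faced * pitches_per_pa * pace

def isolated_effects(base, current):
    """Move one factor at a time to its current level while holding the
    others at the baseline level; return minutes added by each factor."""
    baseline_total = total_seconds(**base)
    effects = {}
    for factor in base:
        scenario = dict(base)               # every factor at baseline...
        scenario[factor] = current[factor]  # ...except the one being studied
        effects[factor] = (total_seconds(**scenario) - baseline_total) / 60
    return effects

# Hypothetical per-game averages for a baseline season and a later season.
base_2008 = {"batters_faced": 77.0, "pitches_per_pa": 3.80, "pace": 21.5}
season_2017 = {"batters_faced": 76.0, "pitches_per_pa": 3.90, "pace": 24.0}

for factor, minutes in isolated_effects(base_2008, season_2017).items():
    print(f"{factor}: {minutes:+.1f} minutes per game")
```

Because the factors multiply, the three isolated effects will not sum exactly to the total change; the leftover is an interaction term.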

As we have heard over and over again in the baseball world, the average game time has increased, as is evident in the graph above. It’s not just that the number of long outlier games has increased; the median game time has actually crept up by about the same amount.

Plenty of players are at fault for the recent rise in game time. You can check out Travis Sawchik’s post about “Daniel Nava and the Human Rain Delays” or just check out the raw player data at FanGraphs. Rather than list the top violators here, I thought it would be amusing to make a useless mixed model statistic about pace of play.

A mixed model based statistic, like the one I created in this post, helps control for opposing batter/pitcher pace and for common situations that result in more time between pitches. Essentially, for the time between each pitch, we allocate some of the “blame” to the pitcher, batter, and the situation or “context”.

I derive the pace from PITCHf/x data, which contains details about each play and pitch of the regular season. I define pace as the time between any two consecutive pitches to the same batter, excluding intervals that include pickoff throws, stolen bases, and other actions documented in PITCHf/x. (This is very similar to FanGraphs’ definition, but they calculate pace by averaging over all pitches in the PA, while I calculate by pitch.) For more specifics, as always, the code is on GitHub.
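To make the per-pitch definition concrete, here is a small Python sketch. It is not the code from the GitHub repository. The `Pitch` record and its field names are hypothetical, and the `interrupted` flag stands in for whatever PITCHf/x marks as a pickoff throw, stolen base, or similar action.

```python
from dataclasses import dataclass

@dataclass
class Pitch:
    pa_id: int         # plate appearance identifier (hypothetical field)
    seconds: float     # timestamp of the pitch, in seconds
    interrupted: bool  # True if a pickoff, steal, etc. preceded this pitch

def pitch_paces(pitches):
    """Seconds between consecutive pitches to the same batter,
    skipping intervals flagged as interrupted."""
    paces = []
    for prev, cur in zip(pitches, pitches[1:]):
        same_batter = prev.pa_id == cur.pa_id
        if same_batter and not cur.interrupted:
            paces.append(cur.seconds - prev.seconds)
    return paces

pitches = [
    Pitch(1, 0.0, False),
    Pitch(1, 22.0, False),
    Pitch(1, 61.0, True),    # pickoff throw before this pitch: excluded
    Pitch(2, 110.0, False),  # first pitch to a new batter: excluded
    Pitch(2, 133.5, False),
]
print(pitch_paces(pitches))
```

Only two intervals survive in this toy sequence: the second pitch of the first PA and the second pitch of the second PA.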

It’s a nice idea and all, but does context really matter?

The most obvious example comes from looking at the previous pitch. Foul balls or balls in the dirt trigger the whole routine involved in getting a new ball, which adds even more time. The graph below clearly shows that time lags when pitches aren’t caught by the catcher.

The biggest discrepancy comes with men on base. Even though pickoff attempts and stolen bases are removed from the pace calculation, that still doesn’t account for the games pitchers play with runners on base. These include changing up their timing after coming set or stepping off the rubber to reset.

The remainder of the context I’ve included illustrates how pace slows with pressure and fatigue as players take that extra moment to compose themselves.

As the game approaches the last inning and the score gets closer, time between pitches rises (with the exception of a score differential of 0, since this often occurs in the early innings).

And similarly, as we get closer to the end of a PA from the pitcher’s point of view, pace slows.

Context plays a large part in pace, meaning that some players who find themselves in notably slow situations are not completely at fault. I created the mixed model statistic pace in context, or cPace, which accounts for all of the factors above. cPace can essentially be interpreted as the pace added above the average batter/pitcher, but can’t be compared across positions.

When comparing the correlation of Pace and cPace across years, cPace seems like a better representation of batters’ true tendencies. My guess is that pitchers’ pace varies more than hitters’, so many batters’ cPace values benefited from controlling for the pitcher and other context.

After creating cPace, I came up with a fun measure of overall pace: Expected Hours Added Per Season Above Average or xHSAA for short. It’s essentially what it sounds like: how many hours would this player add above average given 600 PA (or Batters Faced) in a season and league average pitches per PA (or BF).
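As a rough sketch of the arithmetic behind xHSAA, the conversion might look like the following. It assumes cPace is measured in extra seconds per pitch and that every pitch in a PA carries that extra time; both are simplifying assumptions, and the function name and example inputs are hypothetical.

```python
def xhsaa(cpace_seconds, pitches_per_pa, pa=600):
    """Expected hours added above average over `pa` plate appearances
    (or batters faced), given extra seconds per pitch."""
    extra_seconds = cpace_seconds * pitches_per_pa * pa
    return extra_seconds / 3600  # seconds -> hours

# Hypothetical hitter: 4.5 extra seconds per pitch, 4.0 pitches per PA.
print(xhsaa(4.5, 4.0))
```

Under these assumptions, a hitter needs to add only a few seconds per pitch to cost a full season about three extra hours.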

The infamous tortoise, Marwin Gonzalez, leads all batters with over 3 extra hours per season more than the average batter.

That was fun. Now back to reality and MLB’s new rule changes. Here is the latest proposal via Ken Rosenthal:

The MLB tried to implement pace of play rules in 2015, one of which required batters to keep one foot inside the box with some exceptions. The rules seemed to be enforced less and less, but an 18- or 20-second pitch clock is not subjective and will potentially have drastic consequences for a league that averages 24 seconds between pitches. Some sources say the clock actually starts when the pitcher gets the ball. Since my pace measure includes the time between the last pitch and the pitcher receiving the ball, the real pace relative to clock rules may be 3-5 seconds faster.

Let’s assume that it’s five seconds to be safe. If a pitcher takes 20 seconds between two pitches, we will assume it’s 15 seconds. To estimate the percentage of pitches that would be affected by these new rules I took out any pitches not caught by the catcher, assuming all the pitches left were returned to the pitcher within the allotted five seconds.

Under the 18-second clock, about 14% of the pitches thrown with no runners on in 2017 would have violated the pitch clock. This doesn’t even include potential limits on batters’ time outside the box or time limits between batters, so we can safely say this is a lower bound. If both of the clocks are implemented in 2020, at least 23% of all pitches would be in violation of the pitch clock (excluding the first pitch of each PA). Assume it only takes three seconds to return the ball to the pitcher instead of five, and that number jumps to 36%!
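The estimate described above boils down to a filter and a threshold. Here is a minimal Python sketch of that logic. The 18-second clock and the five- and three-second return times come from the text; the function name and the sample paces are hypothetical.

```python
def violation_rate(paces, caught, clock=18.0, return_time=5.0):
    """Share of eligible pitches whose adjusted pace exceeds the clock.

    paces:  seconds since the previous pitch
    caught: whether the previous pitch was caught by the catcher
    """
    # Drop intervals after uncaught pitches, then subtract the assumed
    # time it takes to get the ball back to the pitcher.
    eligible = [p - return_time for p, c in zip(paces, caught) if c]
    if not eligible:
        return 0.0
    return sum(t > clock for t in eligible) / len(eligible)

# Hypothetical sample of six between-pitch intervals.
paces = [20, 24, 26, 30, 19, 22]
caught = [True, True, True, False, True, True]

print(violation_rate(paces, caught))                   # five-second return
print(violation_rate(paces, caught, return_time=3.0))  # three-second return
```

Shortening the assumed return time leaves more of each interval on the clock, which is why the violation share rises.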

And now we are on the precipice of the 2018 season, which could produce the longest average game time in MLB history for the second year in a row as drastic changes loom ahead. I don’t know who decided that 3:05 was too long or that 15 minutes was a good amount of time to give back to the fans. Most likely just enough time for fans to catch the end of a Shark Tank marathon.

Anyways, if game times keep going up, something will eventually have to be done. However, even I, a relatively fast-paced pitcher in college, worry that pitch clocks will add yet another element to the countless factors pitchers already think about on the mound.

There are certainly some other innovative ideas out there: Ken Rosenthal suggests the possibility of using headsets for communication between pitchers and catchers, and Victor Mather of the NYT suggests an air horn to bring in new pitchers instead of the manager. Heck, maybe it’ll come down to limiting the number of batting glove adjustments per game. Whatever the league implements will certainly be a jolt to players’ habits and hardcore baseball fans’ intractable traditionalist attitude. The strategy, technology, and physicality of today’s baseball is changing more rapidly than ever. When the rules catch up, I have a feeling we will still like baseball.


Thinking Like an MLB MVP Voter

Photo: Yi-Chin Lee/Houston Chronicle

Baseball season is coming to a close and the Baseball Writers’ Association of America (BBWAA) will soon unveil its votes for AL and NL MVP. The much-anticipated vote is consistently under the public microscope, and in recent years has drawn criticism for neglecting a clear winner *cough* Mike Trout *cough*. This being one of the closest all-around races in years, voters certainly have some tough decisions to make. This might be the first year since 2012 where it’s not wrong to pick someone other than Mike Trout for AL MVP.

Of course, wrong is subjective. The whole MVP vote is subjective. Voter guidelines are vague and leave much room for interpretation. The rules on the BBWAA website read:

There is no clear-cut definition of what Most Valuable means. It is up to the individual voter to decide who was the Most Valuable Player in each league to his team. The MVP need not come from a division winner or other playoff qualifier. The rules of the voting remain the same as they were written on the first ballot in 1931:

1.  Actual value of a player to his team, that is, strength of offense and defense.

2.  Number of games played.

3.  General character, disposition, loyalty and effort.

4.  Former winners are eligible.

5.  Members of the committee may vote for more than one member of a team.

It won’t do any good for me to saturate the web with another opinion piece on who deserves to win. It won’t change the vote, and I don’t think I could choose. My goal is rather to illustrate how BBWAA voters have interpreted these rules over time. Have modern sabermetrics driven any shifts in voter consideration? Do voters actually consider team success? Do voters unconsciously vote for players with a better second half?

I thought the best (and most entertaining) way to answer these questions would be to create a model that would act as an MVP voter bot. Let’s call the voter bot Jarvis. Jarvis is a follower.

  1. Jarvis votes with all the other voters.
  2. It detects when the other voters start changing their voting behavior.
  3. It evaluates how fast the voters are changing behavior and at what speed it should start considering specific factors more heavily.
  4. It learns by predicting the vote in subsequent years.

I created two different sides to Jarvis: one that is skilled at predicting the winners, and one that is skilled at ordering the players in the top 3 and top 5 of total votes. The name Jarvis just gives some personality to the model in the background: a combination of the fused lasso and linear programming. It also saves me some keystrokes. If you are interested in the specifics, skip to the end, but for those of you who’ve already had enough math, I will spare you the lecture.

Jarvis needs historical data from which to learn. I concentrated on the past four decades of MVP votes, spanning 1974 to 2016 (1974 was the first year FanGraphs provided specific data splits I needed). I considered both performance stats and figures that served as a proxy for anecdotal reasons voters may value specific players (e.g., played on a playoff-bound team). For all performance-based stats, I adjusted each relative to league average, if it wasn’t already, to enable comparison across years (skip to adjustments here). Below are some stats that appeared in the final model.

Position player specific stats: AVG, OBP, HR, R, RBI

Starting pitcher (SP) specific stats: ERA, K, WHIP, Wins (W)

Relief pitcher (RP) specific stats: ERA, K, WHIP, Saves (SV)

Other statistics for both position players and pitchers:

Wins Above Replacement (WAR) Average of FanGraphs and Baseball Reference WAR

Clutch – FanGraphs’ measure of how well a player performs in high-leverage situations

2nd Half Production – Percent of positive FanGraphs WAR in 2nd half of season

Team Win % – Player’s team winning percentage

Playoff Berth – Player’s team reaches the postseason

Visualizing the way Jarvis considers different factors (i.e. how the model’s weights change) over time for position players reveals trends in voter behavior.

Immediately obvious is the recent dominance of WAR. As WAR becomes socialized and accepted, it seems voters are increasingly factoring WAR into their voting decisions. What I’ll call the WAR era started in 2013 with Andrew McCutchen leading the Pirates to their first winning season since the early 90s. He dominated Paul Goldschmidt in the NL race despite having 15 fewer bombs, 41 fewer RBI, and a lower SLG and OPS. While Trout got snubbed once or twice since 2013, depending on how you see it, his monstrous WAR totals in ’14 and ’16 were not overlooked.

As voters have recognized the value of WAR, they have slowly discounted R and RBI, acknowledging the somewhat circumstantial nature of the two stats. The “No Context” era from ’74 to ’88 can be characterized perfectly by the 1985 AL MVP vote. George Brett (8.3 WAR), Rickey Henderson (9.8), and Wade Boggs (9.0) were all beaten out by Don Mattingly (6.3), likely because of his gaudy 145 RBI total.

Per the voting rules, winners don’t need to come from playoff-bound teams, yet this topic always surfaces during the MVP discussion. Postseason certainly factored in when Miggy beat out Mike Trout two years in a row, starting in 2012. See that playoff-berth bump in 2012 on the graph below? Yeah, that’s Mike Trout. What the model doesn’t consider, however, are the storylines, the character, pre-season expectations: all the details that are difficult for a bot to quantify. For example, I’ve seen a couple of arguments for Paul Goldschmidt as the front-runner to win NL MVP after leading a Diamondbacks team with low expectations to the playoffs. I’ll admit, sometimes the storylines matter, and in a year with such a close NL MVP race, it could push any one player to the top.

What can I say about AVG and HR? AVG is a useless stat by itself when it comes to assessing player value, but it’s ingrained in everyone’s mind. It’s the one stat everyone knows. Hasn’t everyone used the analogy about batting .300 at least once? Home runs…they are sexy. Let’s leave it at that.  Seems like these are always on the minds of MVP voters and that is not likely to change any time soon.

I’m sure some of you are already thinking, “What about pitchers!?” Don’t worry, I haven’t forgotten — although it seems MVP voters have. Only three SP and three RP have won the MVP award since 1974, and pitchers account for only about 7.5% of all top-5 finishers. As you can see in the factor-weight graph below, their sparsity in the historical data results in little influence on the model; voter opinions don’t change often, and their raw weights tend to be lower than position players. Overall, it seems as though wins continue to dominate the SP discussion, along with ERA and team success. While I would expect saves to have some influence, voters tend to be swayed by recency bias and clutch performance along with WHIP and WAR.

What would an MVP article be without a prediction? Using the model geared to predict the winners, here are your 2017 MLB MVPs:

AL MVP: Jose Altuve    Runner Up: Aaron Judge

NL MVP: Joey Votto   Runner Up: Charlie Blackmon

Here are the results from the model tuned to return the best top-3 and top-5 finisher order:

It’s apparent that I adjusted rate and counting stats for league and not park effects, given that both Rockies place in the top 2. Certainly, if voters are sensitive to park effects, Stanton and Turner get big bumps, and Rockies players likely don’t have a chance. Larry Walker is the only Colorado player to win the MVP since the team’s inception in 1993, but in a close 2017 race it might make the difference.

Continue reading below for the complete methodology, and check out the code on GitHub.

A previous version of this article was published at

Statistical Adjustments

Note: lgStat = league (AL/NL) average for that stat, qStat = league average for qualified players, none of the adjusted stats are park adjusted

There were two different adjustments needed for position player rate stats and count stats.

Rate stats: AVG, OBP

Rate stat adjustment:  AVG+ =  AVG/lgAVG

Count stats: HR, R, RBI

Count stat adjustment:  HR Above Average =  PA*(HR/PA – lgHR/PA)

There were three different adjustments needed for starting pitcher (SP) and relief pitcher (RP) rate stats and count stats.

Rate stats: ERA, WHIP

Rate stat adjustment:  ERA+ =  ERA/lgERA  

Count stats I: K

Count stat I adjustment:  K Above Average =  IP*(K/IP – lgK/IP)

Count stats II: Wins (W), Saves (SV)

Count stat II adjustment:  Wins Above Average = GS*(W/GS – qW/GS)
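The adjustments above all reduce to two small formulas, sketched here in Python. The function names and the example stat lines are hypothetical and only illustrate the arithmetic; they are not taken from the model’s data.

```python
def rate_plus(stat, lg_stat):
    """Rate stat relative to league average, e.g. AVG+ = AVG / lgAVG."""
    return stat / lg_stat

def count_above_average(count, opportunities, lg_rate):
    """Counting stat above average: opportunities * (player rate - league rate).

    opportunities is PA for hitters, IP for strikeouts, GS for wins;
    lg_rate is the matching league (or qualified-player) rate.
    """
    return opportunities * (count / opportunities - lg_rate)

# Hypothetical stat lines, for illustration only.
avg_plus = rate_plus(0.330, 0.255)                    # AVG vs. league AVG
hr_above = count_above_average(40, 650, 0.03)         # HR in 650 PA
k_above = count_above_average(250, 200, 0.9)          # K in 200 IP
wins_above = count_above_average(18, 32, 0.35)        # W in 32 GS

print(round(avg_plus, 3), round(hr_above, 1), round(k_above, 1), round(wins_above, 1))
```

Note that the count adjustment simplifies to the player’s total minus what a league-average player would have produced in the same number of opportunities.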

Fused Lasso Linear Program

I combined two different approaches to create a model I thought would work best for the purpose of predicting winners and illustrating change in voter opinions over time. Stephen Ockerman and Matthew Nabity’s approach to predicting Cy Young winners was the inspiration for my framework for scoring and ordering players. A player’s score is the dot product of the weights (consideration by the voters) and the player’s stats.

The constraints in the optimization require the scores of the first place player to be higher than the second place, and so on and so on. This approach, however, doesn’t allow for violation of constraints. I add an error term for violation of these constraints, and minimize the amount by which they are violated.
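A toy version of the scoring and ordering idea can be written in plain Python. This is only a sketch of the concept, not the model itself: the weights, the scaled stat vectors, and the optional `margin` parameter are all hypothetical.

```python
def score(weights, stats):
    """Player score: dot product of voter weights and the player's stats."""
    return sum(w * x for w, x in zip(weights, stats))

def ordering_error(weights, ranked_stats, margin=0.0):
    """Total amount by which the actual vote order is violated.

    ranked_stats lists stat vectors in actual finishing order (1st, 2nd, ...).
    Each player should outscore the next by at least `margin`; any shortfall
    is the error term the optimization would try to minimize.
    """
    scores = [score(weights, s) for s in ranked_stats]
    return sum(max(0.0, margin - (hi - lo)) for hi, lo in zip(scores, scores[1:]))

weights = [0.6, 0.4]              # hypothetical weights on two scaled stats
ranked_stats = [
    [1.0, 2.0],                   # actual winner
    [1.5, 0.5],                   # actual runner-up
    [2.0, 0.5],                   # actual third place
]
print(ordering_error(weights, ranked_stats))
```

With these weights the winner correctly outscores the runner-up, but third place outscores the runner-up by about 0.3, so that shortfall is the only error counted.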

Instead of constraining the weights to sum to 1, I applied concepts from Robert Tibshirani’s fused lasso, which simultaneously applies shrinkage penalties to the absolute values of the weights themselves as well as to the differences between weights for the same stat in consecutive years. This accomplishes two things: 1) it helps perform variable selection on statistics within years, helping combat collinearity between some performance statistics, and 2) it ensures that weights don’t change too quickly by overreacting to a single vote in one year.

However, this approach and formulation cannot be solved by traditional linear optimization methods since absolute value functions are non-linear. The optimization can be reformulated as follows:
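The usual way to linearize absolute values is to bound each one with an auxiliary variable, which turns the problem into an ordinary linear program. The sketch below uses assumed notation and an assumed positive margin, so it should be read as one standard reformulation rather than the exact formulation used for the model.

```latex
% Assumed notation (not from the original article):
%   w_{t,j}   weight on stat j in year t
%   x_{t,i}   scaled stat vector of the player finishing i-th in year t
%   e_{t,i}   slack for the ordering constraint between places i and i+1
%   \delta    a fixed positive margin (an assumption)
%   \lambda_1, \lambda_2   lasso and fusion penalty parameters

% Fused lasso form, with absolute values in the objective:
\begin{aligned}
\min_{w,\,e}\quad & \sum_{t}\sum_{i} e_{t,i}
  + \lambda_1 \sum_{t}\sum_{j} |w_{t,j}|
  + \lambda_2 \sum_{t \ge 2}\sum_{j} |w_{t,j} - w_{t-1,j}| \\
\text{s.t.}\quad & w_t^\top x_{t,i} - w_t^\top x_{t,i+1} + e_{t,i} \ge \delta,
  \qquad e_{t,i} \ge 0 .
\end{aligned}

% Linear reformulation: u bounds |w|, v bounds year-over-year changes.
\begin{aligned}
\min_{w,\,e,\,u,\,v}\quad & \sum_{t}\sum_{i} e_{t,i}
  + \lambda_1 \sum_{t}\sum_{j} u_{t,j}
  + \lambda_2 \sum_{t \ge 2}\sum_{j} v_{t,j} \\
\text{s.t.}\quad & w_t^\top x_{t,i} - w_t^\top x_{t,i+1} + e_{t,i} \ge \delta,
  \qquad e_{t,i} \ge 0, \\
& -u_{t,j} \le w_{t,j} \le u_{t,j}, \\
& -v_{t,j} \le w_{t,j} - w_{t-1,j} \le v_{t,j} \quad (t \ge 2).
\end{aligned}
```

Because the objective pushes each auxiliary variable down, at the optimum u equals |w| and v equals the absolute year-over-year change, so both programs share the same solution for the weights.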

To select the lambda parameters, I trained the model using the first 10 seasons of scaled data, increasing the training set by 1 season each time, and tested with the subsequent year’s vote. After in-season statistical adjustments, I scaled the stats by the mean and standard deviation of the training data to enable comparison across coefficients. All position player stats were replaced with 0 for pitchers and vice versa.
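The expanding-window scheme described here can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names are hypothetical, and the use of the population standard deviation is a guess, since the text does not say which form was used.

```python
from statistics import mean, pstdev

def expanding_splits(seasons, initial=10):
    """Yield (training seasons, test season) pairs, growing by one season."""
    for end in range(initial, len(seasons)):
        yield seasons[:end], seasons[end]

def scale(values, train_values):
    """Standardize values using the training set's mean and std deviation."""
    mu, sd = mean(train_values), pstdev(train_values)
    return [(v - mu) / sd for v in values]

seasons = list(range(1974, 2017))           # 1974 through 2016
splits = list(expanding_splits(seasons))

first_train, first_test = splits[0]
print(first_train[0], first_train[-1], first_test)   # first training window and test year
print(len(splits))                                    # number of train/test rounds
```

Scaling with training-set statistics only, as in `scale`, keeps information from the test season out of the fitted model.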


1. Ockerman, Stephen and Nabity, Matthew (2014) “Predicting the Cy Young Award Winner,” PURE Insights: Vol. 3, Article 9.

2. R. Tibshirani, M. Saunders, S. Rosset, J. Zhu, and K. Knight. Sparsity and smoothness via the fused lasso. Journal of the Royal Statistical Society Series B, 67(1):91–108, 2005.


Stealing Bases and Splitting the Rewards

The contextual revolution (don’t really know if that’s a thing, but it sounds official) has emerged in the MLB over the past few years, attempting to control for more situational effects than current sabermetric-driven baseball stats. These models build upon Bill James’s work, Tom Tango’s all-important linear weights, and similar metrics that account for league, park, and positional production.

Baseball Prospectus (BP) writers developed baseball statistics that further quantify performance using mixed models. You can find a good introduction to mixed models in this article written by Jonathan Judge, Harry Pavlidis and Dan Brooks of BP. In short, if you are familiar with linear or logistic regression, a mixed model attempts to estimate the average performance over the course of the season (fixed linear model) and uses the residuals (or error) to simultaneously quantify the contributions of “random” participants in any given play. Now, why do I say random? It isn’t so much that these participants are random, but that the baseball players are always changing and the number of “random” interactions they have throughout a season is endless, while the effect of an 0-2 count on run production stays relatively consistent or fixed throughout a whole season.

Some existing baseball stats based on mixed models include:

  1. Called Strikes Above Average (CSAA) — defensive statistic that measures catcher framing skills controlling for the batter, pitcher, catcher, and umpire
  2. Swipe Rate Above Average (SRAA) — base running metric that attempts to quantify base stealing ability for batters, and stolen base prevention for pitchers and catchers
  3. Take Off Rate Above Average (TRAA) — player specific effects on base stealing attempts
  4. cFIP — a new version of Fielding Independent Pitching (FIP) taking into account many aspects of a plate appearance. Read more about it here.

By the title, you can probably guess this article is about stolen bases, and you are correct. Specifically, I will be discussing Swipe Rate Above Average, or SRAA for short. SRAA is derived from a mixed model that attempts to account for the inning, the stadium, the quality of the pitcher, and the pitcher, catcher, and lead runner involved. SRAA is directly derived from a player’s random effect and is a single number, generally ranging from -10% to 10%, describing the additional probability a player contributes to a successful steal. For example,  Mike Trout had a 4% SRAA in 2016. Given the average stolen-base situation, Trout is 4% more likely to successfully steal than the average baserunner in 2016.

While SRAA accounts for pitcher skill using cFIP (See above link for more information), the quality of a pitcher can’t necessarily control for all variation in a pitcher’s pitch sequence or the occasional mistake in the dirt. Pitches in the dirt, pitch-outs, off-speed, and fastballs are treated equally in SRAA. Consequently, SRAA values may be lacking for runners that disproportionately get thrown out on pitch-outs or for catchers that consistently block balls in the dirt while still throwing out the runner.

Let’s explore some evidence of these effects before we include them in the pitch-adjusted (pSRAA) model. I started by subsetting Retrosheet play-by-play data from the 2016 season to only stolen-base attempts by lead runners. For example, events with a steal of second base with a man on third were not included. I only included situations where a pitch preceded a stolen-base attempt. I supplemented the play-by-play data with PITCHf/x data, which tracks the trajectory of every pitch in the MLB. I aligned the pitch data with each stolen base, with minimal missing connections between the two data sets. Only three stolen bases did not have PITCHf/x data since there technically wasn’t a pitch that occurred (e.g., steal of third, then steal home on a passed ball). An additional eight did not have valid trajectory readings in PITCHf/x. I ended up with 2,809 total attempts. Excluding some of these stolen bases means, for those who are familiar with SRAA, that my SRAA numbers will not match up directly with BP’s numbers.

I first examined pitch speed and its effects on stolen-base percentage. It’s no surprise that, in 2016, runners succeeded more often on slower pitches.

Notice a slightly higher success rate for pitch speeds that fall above 95 mph. This phenomenon is not unique to 2016, and Jeff Sullivan hypothesized that good base-stealers are the ones stealing against fireballers. Indeed, while only 8% of stolen bases occur during a pitch that is 95 mph or higher, speedsters Billy Hamilton and Starling Marte attempted over 12% of their stolen bases in these situations. These situations tend to arise later (about one inning later on average) in closer games (stealing team is only .39 runs ahead rather than .46 runs ahead on average), meaning base-stealers ought to be more certain of success before attempting to steal.

In addition to pitch speed, we also have access to pitch location data through PITCHf/x. As you can see in the figure below, the SB probability varies more drastically by location, which makes location the more meaningful of the two pitch metrics. The results below mirror the results I would expect. High SB probability along the right side of the plate for left-handed hitters confirms that most catchers (if not all) are right-handed, which makes it hard to throw over left-handed hitters. Similarly, catchers have more success with right-handed hitters and pitches closer to their throwing shoulder. And finally, the most obvious of all: It’s hard to throw a runner out when the ball hits the ground.

I also included the PITCHf/x pitch descriptions since they help improve the model slightly. Some descriptions occurred only a few times, so I combined them into larger categories:

  • Dirt: Ball in Dirt, Swinging Strike (Blocked)
  • Pitch-out: Pitch-out, Swinging Pitch-out
  • Strike/Ball: Ball, Called Strike
  • Swinging Strike: Foul Tip, Missed Bunt, Swinging Strike
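The grouping above amounts to a lookup table. A minimal Python sketch follows, using the description strings exactly as listed; the `"Other"` fallback and the sample attempts are hypothetical additions for illustration.

```python
from collections import defaultdict

# Pitch description -> broader category, as listed above.
PITCH_GROUPS = {
    "Ball in Dirt": "Dirt",
    "Swinging Strike (Blocked)": "Dirt",
    "Pitch-out": "Pitch-out",
    "Swinging Pitch-out": "Pitch-out",
    "Ball": "Strike/Ball",
    "Called Strike": "Strike/Ball",
    "Foul Tip": "Swinging Strike",
    "Missed Bunt": "Swinging Strike",
    "Swinging Strike": "Swinging Strike",
}

def success_rate_by_group(attempts):
    """attempts: iterable of (pitch description, steal succeeded?) pairs."""
    tally = defaultdict(lambda: [0, 0])  # group -> [successes, attempts]
    for description, success in attempts:
        # "Other" is a hypothetical fallback for unlisted descriptions.
        group = PITCH_GROUPS.get(description, "Other")
        tally[group][0] += int(success)
        tally[group][1] += 1
    return {group: s / n for group, (s, n) in tally.items()}

# Made-up attempts, only to show the shape of the output.
sample = [
    ("Ball in Dirt", True),
    ("Swinging Strike (Blocked)", True),
    ("Pitch-out", False),
    ("Ball", True),
    ("Called Strike", False),
    ("Foul Tip", False),
]
print(success_rate_by_group(sample))
```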

Below is a table detailing the SB success rates in each of the four groups. Dirt and Pitch-out are the most extreme categories, with “normal” pitches falling in-between. Something that jumped out at me was the lower success rate on swinging strikes, as I would expect this to distract the catcher. Two explanations I can come up with are: 1) catchers tend to hold the no-swing pitches a split second longer to get the call from the ump, or 2) swinging pitches occur during a hit and run play where runners tend to be less skilled at stealing bases.

Controlling for the lead runner’s base is the last addition I made to the original SRAA model. Adding this effect improved the model (AIC to be specific), indicating runners stealing third were more likely on average to be successful than runners attempting to steal second and especially home. A likely explanation is that runners stealing third need to be more confident in their ability to steal in the current situation and have a right-handed hitter obstructing the catcher’s throw about 65% of the time.

So now that we have this new metric pSRAA, let’s take a look at how it deviates from SRAA. As you can see in the figure below, the distributions of the two metrics are fairly similar.

pSRAA has a slightly tighter distribution for pitchers and runners, meaning pSRAA has absorbed some of the expected SB probability in these new variables and pushed pitcher and runner SB skills closer to the mean. This phenomenon occurs most likely because the variables we are trying to control for are largely out of control for these players and are not rectifiable or exploitable. By that, I mean pitchers can’t control whether the one pitch they throw in the dirt happens to coincide with a runner taking off, but catchers can use this event to prove their skill. While a pitcher “loses control” of the SB situation when the ball is released, a catcher can make a brilliant play, saving a potential wild pitch and converting it into an out. Thus, we see a wider variation in pSRAA for catchers, as pSRAA identifies the increasingly elite talent and the replacement players that struggle to nab runners on pitch-outs.

Examining how players’ metrics improved or worsened after controlling for these additional effects reveals some drastic changes, but mostly small adjustments. The figure below illustrates the change from the old metric to the new metric. The closer a player is to the dotted line (pSRAA = SRAA), the less that player deviated from the original SRAA measure. If a player ends up above this line, it means that pSRAA is higher than SRAA, so when controlling for pitches, pSRAA attributes more success (for runners — less success for pitchers and catchers) to their ability rather than luck.

How does this new pSRAA model help us as baseball fans or analysts? pSRAA can identify where SRAA was under- or overvaluing players’ skills. For example, SRAA undervalues catcher Chris Iannetta at a 0.86% SRAA when pSRAA pegs him at a whopping -4.19% (negative is good for catchers)! In other words, Iannetta jumps from the 43rd percentile of catchers to the 70th percentile!

To give you an idea of the kind of adjustments pSRAA makes, here is a sample stolen-base attempt against Iannetta (video has no sound for those of you who are watching at work; for sound go to 1:51:40 here), specifically a SB attempt that the model predicts will happen 85.5% of the time. Actually, it is more like 88.4% if you account for the runner, Lorenzo Cain, the 15th-fastest baseball player according to Statcast’s speed measure.

Now let’s just freeze that frame. The ball is almost on the ground, and not to mention, only thrown at 80 mph, giving Cain almost an extra tenth of a second to get to second base. Regardless, Iannetta guns him out with an impeccable throw.

Not only can we use pSRAA to uncover insights such as the one above, but we can also abuse pSRAA to easily find awesome plays like this top 5 play. J.T. Realmuto, known for his unbelievable pop time, throws out Ben Revere on this gem of a play. The pSRAA model gives Realmuto a 10% chance of throwing out Ben Revere, but Realmuto pops up in a staggering 1.78 seconds (via Statcast) and throws a perfect 85 mph toss to second.

Or this scenario, which had a 92% stolen-base probability. A.J. Pierzynski picks a throw off the ground, then navigates around Brandon Phillips to beat Suarez by a mile.

And finally, here is an example of a successful stolen base the model predicts will happen 15% of the time — not a surprise when you see where the pitch is thrown (actually 43% when you account for the speedy Rajai Davis and the way below average Kurt Suzuki).

pSRAA does well for these purposes, but may not illustrate the total value a player adds to his team’s success. A runner with a high pSRAA value but only a couple of stolen-base attempts hasn’t added much value to his team since he didn’t utilize his skill often enough. We can leverage pSRAA and stolen base/caught stealing (CS) run values to come up with a more useful metric, which I have aptly named Pitch Adjusted Swipe Rate Runs Above Average (pSRrAA). It’s a mouthful, I know. I based pSRrAA upon linear-weights metrics like FanGraphs’ Weighted Stolen Base Runs (wSB). The term linear weights, often used in the world of baseball statistics, translates to the average run value of a certain action and its effect on run scoring over the course of an inning. For example, let’s say there is a man on first base with no outs. The average number of runs scored in an inning in 2016 starting with this exact situation is 0.8744 runs. He gets caught stealing, and now the situation is nobody on and 1 out. Starting in this situation, the run expectancy drops to 0.2737. Thus, the value of this specific play was about -0.6 runs. Examining these situations over the course of the whole season leaves us with average run values that we can assign to SB and CS. Combining the run values for SB (runSB = .2 runs) and CS (runCS = -.41 runs) produced by FanGraphs for the 2016 season, we can use pSRAA to attribute the run values more accurately:

pSRrAA = pSRAA x (runSB-runCS) x Attempts

This method for calculating pSRrAA works because of the following:
  1. pSRAA already determines the probability a certain player adds to a SB above average.
  2. If a player adds 10% probability to a SB, they are contributing runSB 10% more than the average player and runCS 10% less.
  3. pSRAA x (runSB-runCS) quantifies the average attempt value, so then we just multiply by attempts to get a full run value over the course of the season.
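The run-expectancy example and the pSRrAA formula can be checked with a few lines of Python. The run values and the 4% swipe rate come from the text; the function names and the 30-attempt season are hypothetical.

```python
RUN_SB = 0.2    # FanGraphs 2016 run value of a stolen base (from the text)
RUN_CS = -0.41  # FanGraphs 2016 run value of a caught stealing (from the text)

def play_run_value(re_before, re_after):
    """Run value of a single play: change in run expectancy."""
    return re_after - re_before

def psrraa_runs(psraa, attempts, run_sb=RUN_SB, run_cs=RUN_CS):
    """pSRrAA = pSRAA x (runSB - runCS) x Attempts."""
    return psraa * (run_sb - run_cs) * attempts

# Caught stealing with a man on first and no outs (2016 run expectancies).
cs_value = play_run_value(0.8744, 0.2737)

# A runner who adds 4% success probability over a hypothetical 30 attempts.
season_runs = psrraa_runs(0.04, 30)

print(round(cs_value, 2), round(season_runs, 2))
```

Each attempt is worth the gap between a success and a failure (0.61 runs here) times the probability the player adds, so even a strong swipe rate yields less than a run over 30 attempts.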

Of course, as I alluded to in the beginning, pSRAA doesn’t account for all types of stolen bases, only ones with pitches involved. Consequently, pSRrAA doesn’t account for the total value runners and pitchers contribute to their teams, because attempts in which the catcher isn’t involved are excluded. Finally, to take a look at the top 10 and bottom 10 performers for each position according to pSRrAA, see my original article here. And as always, you can find the code associated with pSRAA/pSRrAA and the analysis on my GitHub page here. Check out my new Facebook page to stay up to date on new articles.

A previous version of this article was published at