(Re) Introducing Hitter Volatility

I suspect many researchers and writers have their own white whale or unicorn; an idea or concept that they are always chasing, regardless of how fruitless or costly that search may ultimately be.

My unicorn is the concept of volatility. I spent a large part of my tenure at Beyond the Box Score exploring the topic for both hitters and pitchers. I even looked at the concept in relation to team performance earlier this year at FanGraphs and other outlets.

Essentially, the idea is to understand whether there are appreciable differences in how players distribute their daily performances over the course of a season. For example, if you have two hitters who are roughly equal in terms of overall skill (e.g., both are 25% better offensively than the league average), is there a difference in how much each is likely to vary from his overall performance on a game-to-game basis? Is one hitter more consistent day in and day out, while the other mixes in phenomenal performances with countless 0-4 days?

My initial work had some problematic issues (as most initial work does), but thanks to some great feedback from readers and colleagues alike I am ready to roll out the new and improved version of Volatility (VOL), starting with hitters.

The biggest issue with my initial formulation was that it assumed that hitters’ daily performances (measured by weighted on-base average, or wOBA) were normally distributed.

As with team run scoring, it turns out that is not the case. To illustrate, here is the distribution of all daily performances for 2012 (note that I am including stolen bases and caught stealing in the wOBA calculation since I want overall offensive production, not just what hitters do at the plate):

This meant that simply looking at something like the standard deviation of daily performances risked creating a metric that was biased against hitters with a higher seasonal wOBA. I tried a few different things, but I still ended up with metrics that were highly correlated with seasonal wOBA.

Enter colleague and mathematical wizard Matt Swartz. Matt suggested an approach that he used in an older study on team-level run scoring where he transformed a team’s seasonal run scoring average exponentially until the correlation between run scoring and the new variable was close to zero.
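A minimal sketch of how such an exponent search could look, assuming one row per player-season with the standard deviation of daily wOBA and the seasonal wOBA already computed. The function name, the grid of candidate exponents, and the use of a plain Pearson correlation are illustrative assumptions, not necessarily the procedure Matt used:

```python
import numpy as np

def find_decorrelating_exponent(sd_daily, yearly_woba, grid=None):
    """Scan candidate exponents k and return the one for which
    sd_daily / yearly_woba**k is least correlated with yearly_woba.

    Returns (best_k, abs_correlation_at_best_k).
    """
    sd_daily = np.asarray(sd_daily, dtype=float)
    yearly_woba = np.asarray(yearly_woba, dtype=float)
    if grid is None:
        # Hypothetical search range: exponents 0.00 to 1.50 in steps of 0.01.
        grid = np.arange(0.0, 1.5001, 0.01)

    best_k, best_abs_r = None, np.inf
    for k in grid:
        candidate = sd_daily / yearly_woba ** k
        r = np.corrcoef(candidate, yearly_woba)[0, 1]
        if abs(r) < best_abs_r:
            best_k, best_abs_r = float(k), abs(r)
    return best_k, best_abs_r
```

An exponent of 0 leaves the raw standard deviation untouched, and an exponent of 1 gives the coefficient of variation, so the search is effectively looking for a point between (or beyond) those two familiar measures.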

Using this technique I managed to come up with a metric that only has a .005 correlation to a player’s seasonal wOBA:

VOL = STD(daily_wOBA)/Yearly_wOBA^.52


VOL = volatility

STD(daily_wOBA) = the standard deviation of a player’s daily batting performance, measured by wOBA

Yearly_wOBA^.52 = a player’s yearly wOBA raised to the .52 power
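For readers who want to try the formula on their own game logs, here is a minimal sketch. It assumes you already have one wOBA value per game for a player plus his seasonal wOBA. The function name is hypothetical, and the choice of the sample standard deviation (ddof=1) is an assumption, since the article does not say whether the sample or population version was used:

```python
import numpy as np

def volatility(daily_woba, yearly_woba, exponent=0.52):
    """VOL = STD(daily_wOBA) / Yearly_wOBA ** exponent.

    daily_woba  : one wOBA value per game played
    yearly_woba : the player's seasonal wOBA (passed in separately, because
                  it is weighted by plate appearances and is not simply the
                  mean of the daily values)
    """
    daily = np.asarray(daily_woba, dtype=float)
    # Sample standard deviation; this is an assumption, see the note above.
    sd = np.std(daily, ddof=1)
    return float(sd / yearly_woba ** exponent)
```

A hitter who posted exactly the same wOBA every game would score 0, and the score grows as his games spread further from each other.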

Armed with this new metric we can now ask a whole slew of questions. I’ll start with some basic descriptive data and get into more inferential analysis in future articles.

Here are the players with the 25 lowest VOL scores for 2012 (min >= 300 plate appearances); VOL- is simply VOL indexed so that league average is 100 (not park adjusted):

Name Plate Appearances Yearly wOBA* VOL VOL-
Derek Jeter 740 0.343 0.380 76
Elvis Andrus 711 0.306 0.384 77
Jon Jay 502 0.342 0.388 77
Jose Reyes 716 0.335 0.389 78
Willie Bloomquist 338 0.303 0.400 80
Shane Victorino 666 0.316 0.400 80
Ryan Hanigan 371 0.323 0.403 80
Shin-Soo Choo 686 0.365 0.405 81
Martin Prado 690 0.347 0.408 81
Denard Span 568 0.331 0.409 81
Joey Votto 475 0.448 0.409 82
Carlos Lee 615 0.308 0.411 82
Alejandro De Aza 585 0.327 0.411 82
Mike Trout 639 0.416 0.412 82
Dustin Pedroia 623 0.347 0.414 83
David Wright 670 0.374 0.415 83
Alex Gordon 721 0.354 0.415 83
Chase Headley 699 0.375 0.417 83
Dustin Ackley 668 0.272 0.417 83
Angel Pagan 659 0.325 0.417 83
Michael Young 651 0.295 0.417 83
Chase Utley 362 0.347 0.418 83
Brett Lawrie 536 0.311 0.420 84
Jayson Werth 344 0.362 0.421 84
Jordan Pacheco 505 0.320 0.421 84
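The VOL- column can be sketched as a simple index. The league-average VOL is not stated in the article; the value of roughly .500 used below is an assumption inferred from the table (a .380 VOL maps to a VOL- of 76), and the function name is hypothetical:

```python
def vol_minus(vol, league_avg_vol):
    """Index VOL so that the league average is 100 (lower = more consistent)."""
    return round(100 * vol / league_avg_vol)

# Assumed league-average VOL of 0.500 (inferred from the table, not stated).
print(vol_minus(0.380, 0.500))
```

Because the published VOL values are already rounded to three decimals and the true league average is only approximately .500, recomputing the index this way can land a point away from the listed VOL- for some players.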

The least volatile player in 2012 was Derek Jeter. This shouldn’t be surprising, since it turns out that Jeter is the least volatile player since 1974 for hitters with at least ten seasons with >= 300 plate appearances in those seasons. Over 17 seasons, Jeter posted an average .397 VOL, four points better than Brett Butler (.401 – 14 seasons).

For reference, here are the 30 least volatile hitters since 1974 (min 10 seasons with >= 300 PAs):

Rank Name # Seasons Ave VOL Ave wOBA*
1 Derek Jeter 17 0.397 0.362
2 Brett Butler 14 0.401 0.334
3 Chuck Knoblauch 12 0.404 0.345
4 Pete Rose 12 0.405 0.333
5 Luis Castillo 11 0.406 0.324
6 Willie Randolph 17 0.406 0.322
7 Ichiro Suzuki 12 0.410 0.344
8 Tony Gwynn 17 0.412 0.363
9 Wade Boggs 18 0.415 0.365
10 Steve Sax 11 0.419 0.305
11 Rickey Henderson 23 0.420 0.371
12 Jason Kendall 15 0.421 0.329
13 Ozzie Smith 17 0.424 0.296
14 Paul Molitor 19 0.424 0.356
15 Vince Coleman 10 0.426 0.302
16 Tony Phillips 14 0.427 0.334
17 Roberto Alomar 16 0.429 0.351
18 Rod Carew 12 0.429 0.358
19 David Eckstein 10 0.430 0.307
20 Mike Hargrove 12 0.430 0.344
21 Mark Grace 15 0.431 0.357
22 Rafael Furcal 12 0.431 0.321
23 Tim Raines 17 0.431 0.363
24 Todd Helton 14 0.433 0.405
25 Kenny Lofton 16 0.433 0.351
26 Buddy Bell 15 0.433 0.326
27 Bobby Abreu 14 0.434 0.380
28 Jose Offerman 10 0.435 0.320
29 Frank Thomas 15 0.435 0.418
30 Tom Herr 10 0.435 0.311

Joey Votto logged a little less than 500 plate appearances, but posted a .409 VOL. That’s incredible when you think about the fact that he had a .448 wOBA for the season. Basically, he was just as consistent as Denard Span, but with a wOBA that was 35% higher than Span’s.

The leader board should illustrate the general point that consistent doesn’t always mean better. For example, Michael Young was 17% more consistent than the league average last year, but he was abysmal at the plate overall. In his case, greater consistency meant that the Rangers didn’t benefit from as many “boom” type games as a less consistent hitter might have provided.

For completeness, here are the 25 most volatile hitters from 2012:

Name Plate Appearances Yearly wOBA* VOL VOL-
Alexi Casilla 326 0.277 0.595 119
James Loney 465 0.275 0.596 119
Alex Presley 370 0.285 0.604 120
Elliot Johnson 331 0.262 0.605 121
Gerardo Parra 430 0.302 0.608 121
Casey McGehee 352 0.288 0.611 122
Ty Wigginton 360 0.286 0.616 123
Matt Carpenter 340 0.340 0.617 123
Mitch Moreland 357 0.331 0.618 123
Jarrod Saltalamacchia 448 0.309 0.620 124
Tyler Colvin 452 0.345 0.627 125
Juan Rivera 339 0.287 0.629 125
Tyler Greene 330 0.281 0.637 127
Carlos Gomez 452 0.348 0.643 128
Gaby Sanchez 326 0.269 0.651 130
Bryan LaHair 380 0.328 0.654 130
Greg Dobbs 342 0.304 0.656 131
Logan Morrison 334 0.333 0.659 131
Nyjer Morgan 322 0.246 0.663 132
Brian Bogusevic 404 0.283 0.667 133
Eric Chavez 313 0.329 0.693 138
Alexi Amarista 300 0.287 0.713 142
Jesus Guzman 321 0.329 0.724 144
Scott Hairston 398 0.348 0.738 147
Justin Maxwell 352 0.347 0.787 157

The one bit of inferential analysis I’ve completed was a look at the year-to-year correlation of VOL. It turns out that this new formulation has a higher year-to-year correlation than my previous one (.39 vs. .23). Overall, it’s still low (basically, it’s as reliable year to year as batting average), but there is a decent relationship, and we do see evidence in the data that, as with BABIP, players sort into generally higher or lower VOL over the course of a career. For example, the correlation between a hitter’s average VOL for years one and two and his VOL in year three is .42. This is something that definitely needs to be examined further, which brings me to next steps.

Well, there is a lot to do.

First, I want to look at what traits might lead a hitter to be more or less volatile. From my earlier research, and observations from others, my initial guess is that high on-base, low strikeout, solid contact hitters will tend to have lower volatility. Judging from the initial leader boards, these might still be the most significant variables, but of course that needs to be verified empirically.

Second, there is the larger question of whether the volatility of hitters matters all that much. How does it factor into team construction? There is evidence that more consistent offenses tend to perform better over the course of the year (i.e. beat their Pythagorean expectation in terms of wins), but the relationship between individual-level volatility and team-level volatility still needs to be addressed.

I’ll turn to these questions (and I’m sure a few more) in the coming months. Until then, comments and suggestions are welcome.

Oh, and here’s the complete VOL and VOL- leader board for 2012 (min >= 300 PAs) — you may need to refresh the page to see it. And, yes, you are welcome, Eno Sarris:


*I used average constants from 2002-2012 in order to conduct some of the year to year correlational analysis. Also, as I mentioned earlier in the article I included stolen bases and caught stealing.


Bill works as a consultant by day. In his free time, he writes for The Hardball Times, speaks about baseball research and analytics, consults for a Major League Baseball team, and has appeared on MLB Network's Clubhouse Confidential as well as several MLB-produced documentaries. Along with Jeff Zimmerman, he won the 2013 SABR Analytics Research Award for Contemporary Analysis. Follow him on Tumblr or Twitter @BillPetti.

36 Responses to “(Re) Introducing Hitter Volatility”

  1. Sylvan says:

    It amuses me that Ruben Tejada and Prince Fielder are tied in this stat.


  2. Muh man, Bill Petti.


  3. CK says:

    Seems to me this could be important for playoff performance. A more consistent team being less likely to go cold at the worst possible time.


    • Jon L. says:

      …and less likely to get hot at the best possible time.


      • B N says:

        Considering that you will in theory be facing the toughest pitchers in the playoffs, I don’t know if “relying on guys getting hot” seems like a great strategy. If I could bank 3 runs in every game and just play defense, I’d win more often than a team who averages 4 runs a game but relies on a long tail to have that number (since it doesn’t matter if they score 4 runs or 4,000 to beat my sure-bet 3). Any distribution of run scoring is going to be truncated (unless you find a way to score less than zero), so volatility in run scoring is mainly going to be extending that right tail.


      • MangoLiger says:

        B N– relying on guys to get hot isn’t a great strategy for the playoffs… it’s the only strategy. Too few samples. Hottest team wins.


      • Garrett J says:

        Generalizing for the purpose of developing playoff strategy at this time is inane.

        A. Here’s the length and breadth of the best strategy we have right now for the playoffs: “Make the playoffs”

        B. This research is in its infancy. It’s premature to draw any conclusions from it.


    • MangoLiger says:

      Conversely, a less consistent team being more likely to get hot at the best possible time.


    • CJ says:

      This is pretty well-established: the better team wants to be consistent, because they’ll win games when nothing goes crazy. The worse team wants to be streaky, because they might as well try to get hot rather than lose on the slow grind.


  4. jfcincotta says:

    What are you trying to answer here? There’s a -.787 correlation between your VOL and PA/G. This makes lots of sense if you think about the wOBA per game method. With Jeter leading the league in PA’s per game, the value of a randomly distributed 15 HRs will be divided by a significantly larger denominator. This makes him look consistent when in fact you’re just getting larger sample sizes for each wOBA datapoint you are taking the standard deviation of.

    If you are really asking who gave the most consistent value to his team day in and day out, this is fine. You are describing what happened well. That said, if you were trying to see which hitters give value in a lumpier distribution, you might want to control for the major factors of batting order, and how often a lineup will get around and drive up PA/G.

    You could order all PA outcomes from least valuable to most valuable and take a Gini coefficient of the players’ distribution of total value, or you could more simply tack on the basepath outcome to hitting outcome of each PA and take wOBA/PA, but right now you’re telling a good deal about where this player batted in a lineup and how good the team around him was in addition to how he produced his value. Also, your stat interestingly demonstrates through the correlation with PA/G that managers are attuned to the factors of consistent production (high OBP, good running, lower power) in their selection of batting order.


    • Haishan says:

      I’m not sure if measuring “volatility by PA” will tell us much useful that we need to know either; can it really capture anything that’s not already measured by various rate statistics, like essentially H/PA, 2B/PA, BB/PA, HR/PA? In other words, would it really give us that much more than a combination of average, on-base, and ISO?


    • David says:

      Exactly. We are dealing with a sample size issue here. Taken to the extreme, if a player with 1 PA/G and a player with 4 PA/G both make exactly 1 BB, 1 K, 1 HR, and 1 GIDP every 4 PAs, one will have a VOL of 0, the other will have a VOL of… I don’t have a calculator, a lot.


      • David says:

        This might be more valuable run with WPA. It’s too late for me to actually think about things, but that might round out some of the rough edges by including more data in the sample…? That makes me think though, they need a WPA with defense; I’d like to see the run value of Trout climbing the wall.


  5. snydeq says:

    Very interesting analysis. Perhaps a dramatic handedness split, especially for players more optimally platooned than are being used by their clubs, could be one underlying factor in high volatility metrics (Loney, LaHair, McGehee, etc.).

    Glaring SwStr discrepancies on specific pitches as well?

    Might also be interesting to take out baserunning influence on wOBA to see how that affects the volatility of some of the “high volatility” bunch (Gomez, Parra, Morgan, etc.). Consistent SB success would likely result in less volatility, while lower OBP hitters with speed may have long dry spells between games where everything comes together for them.


    • snydeq says:

      Just as a follow-up. Of the 25 “most volatile hitters of 2012” only eight are not in the bottom 10 percent of at least one of the standardized pitch value categories (wFB/C, wSL/C, wCT/C, etc.), based on min 300 PA.

      (Presley, McGehee, Carpenter, Moreland, Colvin, Chavez, Guzman, and Maxwell)


  6. gweedoh565 says:

    Very cool and interesting, Bill.

    I guess when I think of ‘volatility’, I tend to think ‘streakiness’, and so I am surprised that Jay Bruce’s VOL is just a bit higher than average as he is one of the streakiest players I’ve witnessed. But, of course, his streaks last months instead of days.

    I realize ‘streakiness’ is a totally different thing in this context, but I do think it would be interesting to also look at time increments of greater than 1 day, such as a week, or, if it is possible, increments of ~20 PA or something. Unless there are already metrics measuring this that I am unaware of, of course.


    • PM says:

      Agree. All he is currently measuring is the ratio of small events (singles and walks) to large events (extra base hits). This is already pretty obvious from glancing at a guy’s stat line. Volatility needs to be measured over a fixed number of observations. So for players with a similar stat profile, who has exceptionally high or low per-30-PA (for example) volatility? And is it predictive?


  7. RMR says:

    How would a player who has extended hot/cold streaks appear in a metric like this? That is, he may be relatively consistent day to day. But a few times during the year he undergoes a massive shift in his production. It would seem that the scale (daily) of the volatility metric could have a significant bearing on its interpretation.

    As a Reds fan, I’m constantly arguing with my fellow fans about whether or not Jay Bruce is inconsistent. He seems to go through long slumps throughout much of the season, interrupted occasionally by a stretch where he’s the best hitter in baseball. And yet, here he shows up as a mild 104.



  8. Pinstripe Wizard says:

    Kudos to this stat. I’m a fan.


  9. Brandon S says:

    Have you considered running a fangraphs version of the Sharpe Ratio using this data? In case you don’t know, the Sharpe Ratio is a risk-adjusted performance metric that is used to measure investment performance; seems like it would be a simple enough conversion using wOBA and the VOL metric. That sort of information would be incredibly useful in a weekly head-to-head league. It should also be fairly simple to construct.


  10. rusty says:

    Stepping back a little bit from the model / stat itself, I’d be interested in some of the implications here. Are more volatile players more likely to polarize their fan base? (e.g. I go to a game, Nyjer Morgan makes a couple great plays, I leave thinking he’s pretty great despite his numbers; my friend goes to two games in a weekend series, sees a bunch of bad contact, thinks he needs to go)

    Another thought: Todd Helton ranks very highly on your since-1974 VOL- leaderboard — but in recent seasons has put up a wOBA far below his prime (he’s just below 300 PAs last season). What has his VOL- curve looked like as he’s aged?


  11. Mike says:

    So I know someone over at THT did a study years ago looking at something similar for pitchers, and found that out of two equally productive SP, the more volatile one was more valuable to the team, meaning he allowed for his team to win more games, since he had more games of excellence than the other pitcher. I think he was looking at AJ Burnett. So my question is, how does hitter volatility relate to team wins? Is it better to have more volatility if it means more games of high-end wOBAs, and therefore helping your team win more games?


  12. jim says:

    it seems strange to see david wright among the “least volatile” hitters, when his offense basically dropped continually all year


  13. Garrett J says:

    I like this research, Bill, but two questions come to mind:

    1. Is VOL a skill? It looks like it is, but there’s got to be SOMEONE at the top and/or bottom of a bell curve. I suggest the old Tom Tango/The Book method: Find the leaders and laggers in year X, then check their tendencies for years X-1 and X+1, but there are lots of ways to check that correlation, YMMV. I THINK it’ll end up correlating, but it may very well take longer than a season to do so.

    2. How do you correct for power? Certainly, wOBA is a decent metric but what strikes me is that you’re going to run into “volatility” created merely by the value of a home run. Don’t forget there are several zeroes in the normal distribution (it’s kinda like a normal distribution – is there a name for a normal bell curve that’s constrained by 1/2?) image at the top of your page, mainly because it’s not possible to put out a wOBA of 0.05 in one game, (and apparently it’s HARD to put out a wOBA of 0.50 in one game though not impossible.)

    By the same token, most power is expressed as home runs, and you can’t have HALF a home run. It just seems to me that you can easily run into what you call volatility, but is in reality the limitation of how granular you can get with some of a player’s contributions: Consider that a guy who goes 1 for 4 with a HR and nothing else has posted a daily wOBA of .487, a nigh-unreachable number for his yearly average. 1 for 5 with a HR is slightly more reasonable, at .390.

    I very much fear that you’ll see players with low volatility who just happen to have a yearly average wOBA that is extremely close to a common daily number, like .390 (1/5, HR) or 0.360 (2/5, two singles.)

    Some cursory examination shows that there exists a weak correlation (r-squared = 0.166) between VOL and ISO:


    I didn’t have time to identify all of the “low-hanging-fruit” for daily wOBAs, the most common numbers like .360 and .390 (those two are just brown numbers – they may not even be all that common)

    I’m curious about that. What’re the most common daily wOBAs and how are they arrived at?


  14. Garrett J says:

    Also the massive difference between 1/4, HR and 1/5, HR really makes me sad. Guys who have a good day at the bottom of the lineup are much more likely to post 4PA days than guys at the top of the lineup, but is there really a difference between a guy who hits 3 home runs over 12 AB from the 8 spot (4 games) and a guy who hits 3 home runs over 12 AB from the 2 hole (3 games?) Not really, but STD( daily_wOBA ) will probably suggest there would be.


    • B N says:

      Well, the primary difference is that the guy hitting 3 HR in 4 games from the #8 spot might be worth putting higher in the lineup ;)


  15. Garrett J says:

    Oops, I missed the section about the correlation already in the article. Apparently my first question (is it a skill) had already been answered.


  16. B N says:

    I find this so interesting I will overlook the wale of a spelling error in the very first sentence… unless some people are really looking for an albino version of Olubowale Victor Akintimehin’s award-winning rhymes?


  17. Colin says:

    Very very interesting analysis. This is something I’ve often thought about.

    Another question I have is whether players who have less volatility are rewarded in free agency at a higher rate already than players of higher volatility?

    With regard to team performance it seems likely to me that, given two teams with equivalent wOBA on offense and identical run differential, that the team with lower VOL will not necessarily outperform the team with higher VOL. However, it does seem likely that the team with lower VOL will consistently have less variance to their overall record than the team with higher VOL.

    In that way, as somebody said before, if you’re a team with weaker talent it might behoove that team to construct itself with an inherently higher VOL lineup in order to attempt to “get lucky” and reach the high end of the variance in any given year. Vice versa, it would appear a team with more talent would prefer to construct a lineup with lower variance to minimize its chances of hitting the low end of their possible outcomes.


  18. CW says:

    I’m pretty new to this, so I apologize if the answer to this question is obvious. What is the reasoning behind transforming the VOL stat so it isn’t correlated with yearly wOBA? I’ve been trying to reason it out on my own, and my best guess is that a high wOBA player (so, a better player?) has more opportunity to have a higher wOBA volatility, or std. dev., because his bad days are as bad as everyone else’s, but his good days are better. So, by transforming VOL you get a measure of daily wOBA volatility that isn’t correlated with player skill and you can compare VOL regardless of skill level. Am I even close?


    • Colin says:


      Yes I think you hit the nail on the head. Bad players are more likely to have low variance. As a result, variance probably does have a somewhat strong correlation to ability. We don’t want to measure ability or rather, we want to avoid measuring ability so the metric is changed so that there is no correlation.


  19. Christian says:

    Very interesting approach; nicely done. Even before reaching the data, the back-of-the-envelope skill that seemed to fit your description of VOL was ability to maintain a hitting streak — it needn’t be a high cutoff like 20 games, but batters that you’d be unsurprised to have a slew of smaller hitting streaks during the season (say, 5 games at a time, just to throw a number out there). All of the VOL leaders from last season and the last few decades easily fit that description.


  20. Justinw303 says:

    “This meant that simply looking at something like the standard deviation of daily performances risked creating a metric that was biased against hitters with a higher seasonal wOBA. I tried a few different things, but I still ended up with metrics that were highly correlated with seasonal wOBA.”

    Can someone briefly explain to me why the standard deviation of daily performance would be biased against high wOBA players?

    Also, is it just coincidence that the league average VOL is .500?
