Archive for Research

Little League Home Runs in MLB History, Part II

This article was originally developed as an oral presentation given by the author to the Society for American Baseball Research at their SABR 45 Convention in Chicago on June 27, 2015. The presentation, which featured the innovative use of video, audio and transitional animation embedded within a PowerPoint deck, was awarded the annual Doug Pappas Research Award as the best of the 32 oral presentations made during the convention that weekend.

This article has been repurposed from that deck. Since the Retrosheet play-by-play data on which this study was predicated were updated just days before the original presentation, all the data provided during the oral presentation have been updated for this article.

In yesterday’s installment of Little League Home Runs in MLB History, we reviewed the proposed definition of the Little League home run and found both the earliest recorded incidence of the event, and the earliest use of the term. In today’s post, we’ll examine some of the numbers and oddities surrounding historical Little League home runs.

But first, let’s enjoy another highly entertaining Little League home run (LLHR), coincidentally by the same guy who made the second error on the Cabrera LLHR we saw in Part I:

The fans in Milwaukee didn’t sound too happy to see that, did they? And Wilin Rosario really looked a little worse for the experience, didn’t he? He should probably consider hitting the treadmill more…

Little League Home Runs: Data and Oddities
In any event: out of the 148,390 games available for us to query through the end of the 2014 season, we discovered, according to our simple definition, a total of 258 Little League home runs in Major League history, which works out to about 1.74 Little League home runs per 1,000 MLB games. If we took that historical average at face value, a typical modern season of some 2,460 games, including the All-Star Game and the postseason, should produce a little more than four Little League home runs, on average. But there’s more to it than just that, as we’ll see shortly.
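The arithmetic behind those rates can be checked in a few lines of Python; the game and home run counts come from the study described above, and the 2,460-game season is the approximate figure just cited:

```python
# Counts from the Retrosheet query described in the text.
total_games = 148_390  # MLB games through the end of 2014
total_llhr = 258       # Little League home runs found

# Historical rate per 1,000 games.
rate_per_1000 = total_llhr / total_games * 1000
print(round(rate_per_1000, 2))  # -> 1.74

# Expected LLHRs in a typical modern season of ~2,460 games
# (regular season plus the All-Star Game and postseason).
games_per_season = 2460
expected_per_season = rate_per_1000 * games_per_season / 1000
print(round(expected_per_season, 1))  # -> 4.3
```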

Of the 258 Little League home runs in total, 253 of them occurred during the regular season. Four occurred during postseason play, including that earliest known one from the 1911 World Series that we shared with you yesterday; and also one that occurred during the 1919 World Series. OK, now, I know what you must be thinking, but counterintuitively, that LLHR was not given up by the cheatin’ Black Sox in order to satisfy their gambler patrons, but instead by the Reds during Game Two of the Series. The batter, Ray Schalk, came around to score on a single to right and two Reds throwing errors, one by the right fielder to second and another by the third baseman to home in a futile attempt to nail Schalk at the plate. (And I’m sure Schalk received a rousing reception in the dugout from his Black Sox teammates once he arrived there, amirite?)

And then there’s the one Little League home run that occurred during an All-Star Game. It was hit by Leo Durocher in 1938, involved a couple of other Hall of Famers, and was called by WGN Radio’s Bob Elson, who was broadcasting the game over the Mutual Network that day:

Pretty exciting stuff, huh? That’s the earliest broadcast of a Little League home run I could manage to locate, and I’m fairly certain it’s the earliest radio clip of any Little League home run that’s available. But you, the crowd, are invited to source an earlier example than this if you can, so we can acknowledge its existence.

Let’s take a look at a few stats and oddities in the history of the Little League home run. Broken down by how many were hit in how many seasons, here’s how the numbers look:


But it actually gets more interesting when you step back and look at the forest a bit, instead of staring at a succession of trees. If you break it down by era, you can definitely see a pattern at work:


You see a really high rate of Little League home run incidence taking place before 1929, which may or may not be an anomaly. But either way, there is a definite trend of Little League home runs diminishing over time, which you can tell because I asked Excel to add a trendline to the data, and there it is, in bright red dots below, and since Excel is pretty good at math, I will trust that it’s accurate:


The main takeaway here is that, even though there looks to be a lot of up and down between eras, the highs and the lows are both trending lower over time, and in fact the five-year period we just concluded saw the lowest rate ever for Little League home runs.

My theory about this drop in numbers is that, quite simply, players play better defense today overall than they ever have in history. Fielding is simply better. Yes, you could argue, “Well, nobody playing today is as good a defender as Willie Mays was”, and OK, that’s fine, I’ll give you that one for the sake of moving the discussion along. But even so, that’s only one guy. I’m talking about defense overall.

What can’t be disputed is that there are many fewer errors per game today than there were a few decades or a century ago. Whether that’s because of better instruction today’s players received starting with — ironically — Little League, or because of better equipment, better gloves, better fields, or even because official scorers aren’t handing out errors on questionable fielding plays like they used to — whatever the reason, the overall trend in baseball toward fewer errors is also reflected in the declining incidence of Little League home runs over the years.

Here’s an analysis across time by month of year:


This basically shows flatness across months, except that March-April looks unnaturally low, and I’m not sure why. The number is roughly a third of any other month’s, and even taking into account a higher percentage of off days and weather postponements, I don’t believe only about a third as many games are played in the season’s first month versus the other months. I’m not sure it’s merely a fluke, either. Maybe there’s something going on there? Thoughts, anyone?

By inning might make a little more sense:


You can see that the 3rd through 7th innings have the highest incidence, and then you start to see fewer from the 8th inning on. This might be because teams become more conservative, perhaps especially when they have small leads to protect or small deficits to overcome, so maybe there’s more putting the ball in their pockets. In the ninth and beyond, of course, teams tend to play even more conservatively. We see fewer Little League home runs in the 9th and extra innings combined than in the 8th inning alone, and that makes intuitive sense.

Little League home runs started by play type strikes me as interesting:


I didn’t expect to see singles making up more than half of all Little League home runs, and doubles almost a quarter. I would have bet money I would see Little League home runs started by infield errors composing a much bigger portion of the total, maybe even more than half overall.

But when I looked closer at the Retrosheet event descriptions, more than a third of those 134 singles were specified as starting on infield hits. So there apparently have been more infield Little League home runs than it might look like at first glance, and in my opinion, the infield Little League home run is definitely the purest form of the art.

Note that of Little League home runs started by infielder errors, the most were by pitchers with 16, and then by third basemen with 12. But out of the 148,390 games available to query on, only one started on a catcher error! Isn’t that wild? You’d think a lot more errant throws by catchers on bunts or dribblers in front of the plate would end up deep down the right field line, sending the batter-runner to third easily and leading to more overthrows of third that would send him home, wouldn’t you? But no…

One last thing about that table: you must be asking yourself, “How the hell can anyone hit a Little League home run on a walk?” Yes, it is absolutely as crazy as it sounds. This one was uncovered just this year with the mid-season update of the Retrosheet play-by-play archives, and occurred with the Yankees’ Lou Gehrig at bat in the fifth inning of the first game of a doubleheader against the White Sox on July 14, 1930. An account of that play shows up in the Retrosheet play-by-play for the game thusly:

Gehrig walked [Lazzeri scored (error by Mulleavy) (error by Tate) (unearned) (no RBI), Gehrig scored (unearned) (no RBI)]; the White Sox argued about the walk; Greg Mulleavy flipped the ball to the mound; Bennie Tate picked up the ball and threw it away

Isn’t that play nuts? More to the point, how can this play be called a Little League home run, even satisfying the definitional requirements of two errors and the batter scoring on the play? That’s a discussion we can take on in (spoiler alert) the third and final post of this series, going up tomorrow.

Here are Little League home run totals by franchises hitting them:


…and by franchises giving them up:


The thing to note here is that, obviously, practically all the top franchises on both lists are “Original 16” franchises (denoted in blue). The top expansion team on the Hit It list is the Astros, and then the Angels and Mets, all welcomed to the big leagues in the early ’60s. The top expansion franchises on the Gave It Up side are the Brewers (née Pilots), circa 1969, and, again, the Mets (1962) and Padres (1969).

You will note that Arizona has never hit one, and Tampa Bay has yet to give one up, and each has a “2” on the other side of its ledger. Both are 1998 teams.
But because these lists are biased towards the Original 16 and other clubs who have been in the majors longer, a better list for comparison might be one that shows Little League home runs per 1,000 games played by each club, which is what these tables, ranked by rate per 1,000 games from highest to lowest, do:



When you closely examine the rates by team, you start to see more expansion franchises float toward the top. But I do find it interesting that even when analyzing rate per 1,000 games, Original 16 teams are still better represented than mere chance would dictate. Here’s what I mean: Original 16 teams make up a little more than half of all teams — actually, 53%. But six of the top seven clubs that have Hit It, by rate, are Original 16s, and that’s 86%; and 10 of the top 12, or 83%, that Gave It Up are Original 16s.
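For reference, the percentages cited above work out like this (a trivial Python check; the club counts are from the text):

```python
# "Original 16" share of today's 30 MLB clubs.
original_16_share = round(16 / 30 * 100)  # -> 53

# Original 16 representation at the top of each per-1,000-games table.
hit_it_top = round(6 / 7 * 100)     # six of the top seven clubs -> 86
gave_up_top = round(10 / 12 * 100)  # ten of the top twelve clubs -> 83

print(original_16_share, hit_it_top, gave_up_top)
```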

This, I believe, is directly related to the chart we saw earlier showing that rates of Little League home run were way higher in the early eras of the game, since the Original 16s were the clubs that were around when those rates were so much higher.

Home teams appear to have a minor advantage in hitting the Little League home run, the inexplicability of which is augmented by the realization that they don’t have to hit in their half of the ninth inning in about 45% of games:


The amount of foul ground available might be a contributing factor to higher rates of Little League home runs. The Ballpark Database allows us to download a zip file that includes a spreadsheet indicating, by ballpark and by year, whether the amount of foul territory in a given “ballpark year” is large, neutral or small (L, N, S) or, for modern parks, how much foul territory it has in square feet (ranging from 40,700 at the various iterations of Stadium, down to 18,100 at Fenway Park). While the inexactitude of the L/N/S scale renders this less than an airtight analysis, the results are still logically directional:


This table indicates that, among ballparks characterized only by descriptive nomenclature for foul ground size, large (+8.3% versus average) and neutral (+12.8%) parks are more likely to experience Little League home runs, together making up 76% of total incidents, with small parks (-22.3%) far less likely to see such plays. However, among the 63 incidents occurring in ballparks which report their foul territory in square feet, the typical Little League home run occurred in parks with roughly 2% less foul ground than average. Twelve Little League home runs occurred in ballparks without any available data.

Here’s a neat little oddity I came across, which I can’t really explain:


National League teams have hit far more, and have given up even more far more (if that’s a thing), than American League teams. NL teams hit +35% more and gave up +41% more than AL teams. Really odd, right?

But this table is even better, I think:


When you break it down by era, you see that in the vast majority of eras, the NL gave up more, and sometimes a lotta lot more, than the AL, right up to the last two periods combined in which NL teams gave up 15 total and the AL gave up just six. So if we initially thought the DH thing might have had anything to do with the difference in Little League home run rates between the leagues, we’d apparently be wrong.

There might be some other non-performance reasons why this difference should exist — harsher official scorers in the NL? Bigger ballparks in square footage in the NL overall? I don’t know — but what is clear is that this is a persistent thing.

The record for the player with the most career Little League home runs — how many would you think that would be? Would you think that it’s two? Or maybe three? How about four or more? When I made this presentation and asked the audience to guess, about half thought it was four or more, and the other half was split between two and three.

Well, the answer is: two. And these are the sixteen players with two:

Tommie Agee
Johnny Bench
Donn Clendenon
Tony Fernandez
Curt Flood
Jim Gilliam
Bobby Grich
Ron Hunt
Carlos Lee
Kenny Lofton
Garry Maddox
Jack Perconte
Luke Sewell
Ted Simmons
Tris Speaker
Gee Walker

You will notice some pretty good base stealers here: all of them except one have at least 65 career steals. (If you can tell right away without looking which one it is, then you’re really good.) This kind of suggests that you have to have a somewhat long career to get two, although Tony Fernandez got his two Little League home runs on back-to-back days against the Tigers in 1988. By contrast, Jim Gilliam’s two times came ten years apart from one another.

And tied for second in career Little League home runs, with one, are 226 other guys.

The last piece of trivia — as if this entire subject weren’t trivial enough already — is this: out of the 258 Little League home runs found in major league history so far, only six of them came on three errors. All the rest came on two.

And here are those six:


You might be asking yourself, exactly what does a three-error play look like? Let’s take a quick look at the earliest one on the list, which was hit by the exquisitely unibrowed Cardinals rookie Wally Moon in 1954. With a runner on third, Moon hit a grounder to the second baseman who exercised his fielder’s choice by trying to nail the runner advancing to the plate, but instead threw wildly past the catcher for the first error; Moon then tried to advance to second, drawing the throw from the catcher who instead heaved it into center for the second error; Moon then chugged around third toward home as the center fielder picked up the ball and threw wildly past the plate to allow Moon to score. And that’s one way in which you can commit three errors on a play.

My favorite three-error Little League home run, though, is by Jeff Leonard in 1988, the only such play during which all three errors were committed by one player: Tommy John.

Yes, they could give him three errors on the play! And they did! And thank God for that, because that helped me for this article.

In the next and last installment, we will contemplate the current proposed definition of the Little League home run and discuss whether it should be reconsidered, and if so, how.

Little League Home Runs in MLB History, Part I

This article was originally developed as an oral presentation given by the author to the Society for American Baseball Research at their SABR 45 Convention in Chicago on June 27, 2015. The presentation, which featured the innovative use of video, audio and transitional animation embedded within a PowerPoint deck, was awarded the annual Doug Pappas Research Award as the best of the 32 oral presentations made during the convention that weekend.

This article has been repurposed from that deck. Since the Retrosheet play-by-play data on which this study was predicated were updated just days before the original presentation, all the data provided during the oral presentation have been updated for this article.

Let’s start off this article the way I started off my presentation to SABR: with a quick poll. And you might as well be honest, now, because otherwise you’re just bullshitting yourself, and that would just be pathetic.

  • How many of you played Little League when you were a kid? Hands up, please. OK… keep them up. Now:
  • How many of you ever hit a home run in a Little League game? If you did, keep your hands up. OK… now, finally:
  • How many of you hit an actual home run clear over the outfielders’ heads and were able to trot all the way around the bases in a Little League game?

Not so many of you, right? Only the very best players on any given Little League team ever hit that kind of home run. If you’re like me, and like most Little Leaguers, if you ever hit a home run in Little League, it probably looked like this:

Read the rest of this entry »

The 2015 Strike Zone, Through July

With strikeout rates soaring and run scoring dipping to generational lows in recent seasons, word came in the offseason that the Competition Committee would be monitoring the expanding strike zone in 2015. Given the scrutiny it is receiving at the league level, I have been tracking the strike zone over the course of the season, with updates at the end of each month. At the following links you can find the updates from the end of April, May and June.
Read the rest of this entry »

A New Way to Look at Sample Size: Math Supplement

This article is co-authored by Jonah Pemstein and Sean Dolinar.

For the introductory, less math-y post that explains more about what this project is, click here.

The concept of reliability comes from classical test theory, which was designed for psychological and educational testing. Classical test theory uses the model of a true score, error (or noise) and observed score. [2]


To adapt this to baseball, the true “score” would be the true talent level we are seeking to find, and the observed “score” is the actual production of a player. Unfortunately, the true talent level can’t be directly measured. There are several methods to estimate true talent by accounting for different factors. This is, to an extent, what projection systems try to do. For our purposes we are defining the true talent level as the actual talent level and not the value the player provides adjusted for park, competition, etc. The observed score is easy to measure, of course — it’s the recorded outcomes from the games the player in question has played. It’s the stat you see on our leaderboards.

The error term contains everything that can cause a discrepancy between the true score and the observed score. It contains almost everything that affects the observed outcome in the stat: weather, pitcher, defenses, park factors, injuries, and so on. This analysis isn’t interested in accounting for those factors but rather in measuring the noise those factors in aggregate impart to our observed stat.
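The model above can be illustrated with a small simulation: observed production is true talent plus an error term, and the error washes out in aggregate as the sample grows. The .300 true-talent figure and the sample sizes below are arbitrary choices for illustration, not values from this study.

```python
import random

# Classical test theory: observed score = true score + error.
# Simulate a hitter whose true talent is a .300 average; in small
# samples the error term dominates, and in large samples it washes out.
random.seed(1)
TRUE_TALENT = 0.300

def observed_average(n):
    """Observed average over n simulated trials."""
    hits = sum(1 for _ in range(n) if random.random() < TRUE_TALENT)
    return hits / n

small_sample = observed_average(20)      # can land far from .300
large_sample = observed_average(20000)   # converges toward .300

print(round(small_sample, 3), round(large_sample, 3))
```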

Read the rest of this entry »

A New Way to Look at Sample Size

Jonah Pemstein and Sean Dolinar co-authored this article.

Due to the math-intensive nature of this research, we have included a supplemental post focused entirely on the math. It will be referenced throughout this post; detailed information and discussion about the research can be found there.


“Small sample size” is a phrase often used throughout the baseball season when analysts and fans alike discuss players’ statistics. Every fan, to some extent, has an idea of what a small sample size is, even if they don’t know it by name: a player who goes 2-for-4 in a game is not a .500 hitter; a reliever who hasn’t allowed a run by April 10 is not a zero-ERA pitcher. Knowing what small sample size means is easy. The question is, though, when do samples stop being small and start becoming useful and meaningful?

Read the rest of this entry »

On Rotation, Part 2: The Effects of Spin on Pitch Outcomes

On Monday, I looked at how different spin rates for different pitches affect the way those pitches move through the air toward a batter. That post was useful for understanding the relationship among spin, velocity and movement. What it didn’t tell us much about, however, is what the spin actually does for the pitcher: does more spin make pitches harder or easier to make contact with? Does more spin induce weaker contact? To answer those questions (as well as others), we can look at the actual production from hitters on these pitches. That’s the goal of this post.

The first such stat we’ll consider is contact rate (Contact%), or times made contact (balls in play or foul balls) per swing.
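As a concrete sketch of that definition (the swing counts below are invented):

```python
# Contact% as defined above: contacts (balls in play plus foul balls)
# divided by total swings.
def contact_rate(balls_in_play, fouls, whiffs):
    """Times made contact per swing."""
    swings = balls_in_play + fouls + whiffs
    return (balls_in_play + fouls) / swings

print(contact_rate(45, 30, 25))  # 75 contacts on 100 swings -> 0.75
```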

Read the rest of this entry »

What Hard-Hit Rate Means for Batters

Recently, one of the hot topics in baseball statistics has been the appearance of a measurement for hard-hit balls: here at FanGraphs, we added hard-hit rate to our leaderboards before this season, adding along with it a wealth of opportunities for analysis. An issue with any new statistic is that it can be cited without its true use or impact being fully understood, and so hard-hit rate has been making the rounds in player analysis, generally cited with respect to how well or how poorly players have been performing.

For hitters, it might go without saying that hitting the ball harder is generally a good thing: the aim of hitting, in a certain sense, would seem to be to hit the ball as hard as possible as often as you can (except in the cases of bunting or other situational circumstances). However, it hasn’t been clear yet how hitting the ball hard impacts other rate and counting statistics, and that seems to be a hole in our understanding of a statistic that is undergoing a moment in the spotlight.

The aim today is, at the very least, to explore how hard-hit rate impacts a few of those stats, as well as to begin a conversation that more astute statistical minds may be able to take to deeper and exciting places. There are a couple levels to this piece today, but there are surely many more that I have not reached: I don’t intend to make hard conclusions, but rather to explore and provide a well-intentioned foray into the data. With that said, onward.

Read the rest of this entry »

On Rotation, Part 1: The Effects of Spin on the Flight of a Pitch

My last article was a look at the effects of pitch location on batted balls. While it ended on a somewhat disappointing note, showing that the results couldn’t really be applied to individual pitchers, it did make me think more about which components of a pitch affect its outcome, and in which ways.

So I decided to examine spin. Spin is captured by PITCHf/x in two measurements: rate (in revolutions per minute) and direction (the angle in degrees). As it turns out, the spin of a pitch has quite the effect on its outcome, much like location. Different spin rates make the pitch move differently (obviously) and get hit differently. (For a look at this topic from a physics standpoint, check out this infographic and this much more complicated article, both from the excellent Alan Nathan. And, to make sure everybody knows: I know little about the actual physics of this past what I can infer from my baseball playing and watching experience. I am just looking at the PITCHf/x data.)

Before we get right to the graphs, a quick note about my methodology. I grouped each pitch from 2009 onward — which is the year PITCHf/x started to record spin rate consistently — into buckets based on spin rate (pitches were rounded to the nearest 50 RPM) and pitch type (I included four-seam fastballs, curveballs, changeups, two-seam fastballs, cutters, knuckleballs, and sliders). I then found a multitude of stats for each bucket: contact rate, average speed, average movement, ground ball rate, and many more. I also did the same with spin angle, grouping pitches into buckets by rounding to the nearest 20 degrees, but the results weren’t particularly meaningful.
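The bucketing step might look something like this in Python; the pitch records and field names are illustrative stand-ins, not actual PITCHf/x columns:

```python
from collections import defaultdict

# Bucket pitches as described above: spin rate rounded to the nearest
# 50 RPM, grouped alongside pitch type.
def spin_bucket(spin_rpm, width=50):
    """Round a spin rate to the nearest `width` RPM."""
    return int(round(spin_rpm / width) * width)

pitches = [
    {"pitch_type": "FF", "spin_rpm": 2212, "contact": True},
    {"pitch_type": "FF", "spin_rpm": 2238, "contact": False},
    {"pitch_type": "CU", "spin_rpm": 1571, "contact": True},
]

buckets = defaultdict(list)
for p in pitches:
    buckets[(p["pitch_type"], spin_bucket(p["spin_rpm"]))].append(p)

# Per-bucket contact rate, one line per (pitch type, spin bucket).
for key, group in sorted(buckets.items()):
    rate = sum(p["contact"] for p in group) / len(group)
    print(key, round(rate, 2))
```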

I also combined two-seam fastballs and sinkers when I was doing this. There has been some discussion in the past about whether there is a difference between those two pitches. While PITCHf/x classifies them separately, they are more or less indistinguishable, and when I first did this without combining them, they overlapped on nearly all of the various graphs.

Read the rest of this entry »

Batted-Ball Rates vs. Velocity Changes

Last year, I revisited Mike Fast’s “Lose a Tick, Gain a Tick” article and found how much a pitcher should expect to see his ERA, FIP and xFIP change with a velocity decline. Additionally, I found the rate of decline of strikeouts and walks. An interesting finding from the work was that FIP and ERA change by the same amount with a velocity decline while xFIP doesn’t follow the other two. I decided to examine some batted-ball stats to see which ones change when a pitcher’s velocity changes.
Read the rest of this entry »

Modeling Salary Arbitration: Stat Components

This post is part of an ongoing arbitration research project and is coauthored by Alex Chamberlain and Sean Dolinar.

April 24: Modeling Salary Arbitration: Introduction

Feb. 25: 2015 MLB Arbitration Visualized

* * *

A couple of weeks ago, we introduced a couple of regressions that modeled arbitration results using a basic formula predicated on wins above replacement (WAR). Ultimately, the models estimated that an arbitration-eligible pitcher could expect his salary to increase by 14 percent, and his raise in salary to increase by 56 percent, for each additional WAR. A hitter could expect increases of 13 percent and 46 percent, respectively.
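One way to read those coefficients, assuming the percent-per-WAR effect compounds multiplicatively (as it would in a log-salary model), is sketched below; the $3 million base salary is an invented example, not a figure from the study:

```python
# Hitter model from the text: roughly a 13% salary increase per WAR.
PCT_PER_WAR = 0.13

def projected_salary(base_salary, extra_war):
    """Expected salary after adding extra_war wins, if the
    percent-per-WAR effect compounds multiplicatively."""
    return base_salary * (1 + PCT_PER_WAR) ** extra_war

print(round(projected_salary(3_000_000, 2)))  # two extra wins -> 3830700
```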

The models, however, were incomplete: they did not incorporate any other stats aside from WAR. This was by design, as we wanted to introduce simple one-variable equations for the sake of demonstration. WAR is, conveniently, a comprehensive variable that attempts to summarize a player’s worth in one easily digestible number. But what about the effects of a player’s age or arbitration year?

Moreover, the r-squared statistic — a quick-and-easy check of a model’s validity — for each specification is not especially strong, clocking in anywhere between .30 and .56. This is partly a result of specifying only one explanatory variable, so including more variables — which we have done in this post — should improve the goodness of fit of the models, assuming the variables are relevant.

With that said, we have new-and-improved models to share with you: one comprised of composite statistics and another comprised of traditional statistics. They are all vanilla, linear ordinary least squares (OLS) regression models, and it is important to remember that the values for each stat can only be used in the context of that specific model.

Non-Traditional Statistics

For each player, we specify…

  • a composite statistic, such as wins above replacement (WAR) for batters and RA9-WAR for pitchers, to measure overall performance (RA9-WAR uses runs allowed per nine innings rather than FIP);
  • a service statistic, such as plate appearances (PA) and innings pitched (IP), to measure playing time;
  • a “glory” statistic, such as home runs (HR) and saves (SV), to account for baseball’s affinity for traditional statistics and social constructs;
  • arbitration year (for pitchers*), indicating a player’s total service time;
  • and his age (for hitters*), to measure as best we can the number of years for which he has inhabited the earth.

We identify these particular stats not only to cover as much analytical ground as possible but also to minimize the use of stats that are highly correlated with one another (multicollinearity). We want to isolate different aspects of player performance or value as best we can.
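A quick check of the kind motivating those choices is to compute the correlation between candidate inputs; highly correlated pairs are the multicollinearity risk. The stat lines below are invented for illustration:

```python
# Pearson correlation between two candidate explanatory variables.
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Invented stat lines for five hypothetical hitters.
war = [1.0, 2.5, 4.0, 5.5, 6.0]
pa = [450, 520, 600, 640, 660]  # playing time tracks overall value
hr = [8, 12, 25, 18, 33]

print(round(pearson(war, pa), 2))  # -> 0.99, a multicollinearity red flag
print(round(pearson(war, hr), 2))  # -> 0.84
```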

Read the rest of this entry »

A Look at Quality of Contact Profiles

It seems like it should matter how hard you hit the baseball. That statement probably seems self-evident, but until this year we haven’t really had a whole lot of evidence to demonstrate whether that’s true. We have an old month of HITf/x data from 2009 and there’s non-public data about exit velocity, but until StatCast data arrived this year, we didn’t really have the tools to determine how much quality of contact matters.

Last week, FanGraphs launched quality of contact statistics courtesy of Baseball Info Solutions to add to this effort. The methodology isn’t based solely on a raw exit velocity, but the data stretches back to 2002 and it’s publicly available now and easy to manage. As soon as people realized the data was available, the sabermetric masses went to work to run preliminary tests on the data. One of the interesting things that showed up right away was that the data didn’t do a great job predicting itself in the future and things like Hard% didn’t correlate with stats like BABIP or LD% as well as we might have otherwise thought.

Read the rest of this entry »

How Contact Ability Might Influence a Hitter’s Transition to the Majors

Back in February, there was some discussion about the transition from Triple-A to the majors, and whether that jump was getting any more difficult. It certainly seemed that way. Several highly-regarded minor leaguers completely flopped in their first tastes of big league action last year. Gregory Polanco, Jon Singleton, Xander Bogaerts, Jackie Bradley Jr. and the late Oscar Taveras all didn’t hit a lick after tearing it up in the minors. And perhaps worst of all, Javier Baez — a consensus top 10 prospect heading into the year — hit a putrid .169/.227/.324 with an unsightly 41% strikeout rate.

Jeff Sullivan and Ben Lindbergh both looked into the validity of this phenomenon, and wrote response articles more or less debunking it. Both concluded that the gap between Triple-A and the majors wasn’t growing after all, or at least not in any meaningful way. So much for that.

However, after thinking about it for a while, I started to wonder if there might be other ways to explain the initial failures of guys like Baez. Perhaps it might be more informative to look at these transitions from a different angle: Not across time, but across skill sets.

Baez’s flaws were easily identifiable. He struggled to make contact, and also showed a tendency to chase pitches out of the zone. But perhaps his rough transition wasn’t unique to him. Maybe his skill set — his poor plate discipline and/or poor bat-to-ball ability — just doesn’t play well against major league pitching. If that’s the case, it might help us be wary of the next Javier Baez. Read the rest of this entry »

Batted Balls: It’s All About Location, Location, Location

BABIP is a really hard thing to predict for pitchers. There have been plenty of attempts, sure, but nothing all that conclusive — probably because pitchers have a negligible amount of control over it. So naturally, when I found something that I thought might be able to model and estimate pitcher BABIP to a high degree of accuracy, I was very excited.

My original idea was to figure out the BABIP — as well as other batted ball stats — of individual pitches from details about the pitch itself. Velocity, movement, sequencing, and a multitude of other factors that are within the pitcher’s control play into the likelihood that a pitch will fall for a hit (even if to a very small degree). But much more than all of those, pitch location seems to be the most important factor (as well as one of the easiest to measure).

I got impressively meaningful results by plotting BABIP, GB%, FB%, wOBA on batted balls, and other stats based on horizontal and vertical location of the pitch. So I came up with models to find the probability that any batted ball would fall for a hit with the only inputs being the horizontal and vertical location (the models worked very well). I even gave different pitch types different models, since there were differences between, for example, fastballs and breaking balls. I found the “expected” BABIP of each pitch a pitcher threw, and then averaged all of those expected BABIPs; theoretically, that average is the BABIP the pitcher should have allowed.
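Here’s a rough sketch of the idea: bin the zone by horizontal and vertical location, compute the empirical hit rate in each cell, and score a pitcher’s batted-ball locations against that grid. The data, grid size, and hit-rate surface below are synthetic placeholders, not the real PITCHf/x sample, and a fuller version would build one grid per pitch type.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic league-wide batted-ball sample: horizontal (px) and vertical (pz)
# pitch location in feet, plus whether the ball fell for a hit.
n = 50_000
px = rng.uniform(-1.5, 1.5, n)
pz = rng.uniform(1.0, 4.0, n)
# Toy "truth": pitches up and over the middle get hit for a higher BABIP.
p_hit = 0.24 + 0.10 * np.exp(-((px / 0.8) ** 2)) * (pz - 1.0) / 3.0
hit = rng.random(n) < p_hit

# Bin the zone into a 12x12 grid and store the empirical hit rate per cell.
x_edges = np.linspace(-1.5, 1.5, 13)
z_edges = np.linspace(1.0, 4.0, 13)
xi = np.clip(np.digitize(px, x_edges) - 1, 0, 11)
zi = np.clip(np.digitize(pz, z_edges) - 1, 0, 11)
hits = np.zeros((12, 12))
balls = np.zeros((12, 12))
np.add.at(hits, (xi, zi), hit)
np.add.at(balls, (xi, zi), 1)
babip_grid = hits / np.maximum(balls, 1)

def expected_babip(pitch_px, pitch_pz):
    """Average the per-cell hit rate over a pitcher's batted-ball locations."""
    i = np.clip(np.digitize(pitch_px, x_edges) - 1, 0, 11)
    j = np.clip(np.digitize(pitch_pz, z_edges) - 1, 0, 11)
    return babip_grid[i, j].mean()
```

The grid lookup is the whole model: a pitcher who lives on the corners gets credited with a lower expected BABIP than one who lives over the heart of the plate.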


The Non-Speed Components of Double Plays

Last week, we rolled out some minor tweaks to WAR, one of which was the addition of wGDP. If you haven’t read the primer, wGDP is a measure of double play runs above average and captures how many runs you save your team by staying out of double plays.

In general, it’s a minor piece of the overall puzzle with the best and worst players separated by less than a win of value over the course of a full season. Staying out of double plays helps your team, but even the best players don’t stay out of a large enough number to swing their value in a big way. Introducing wGDP makes WAR a better reflection of reality and that’s a good thing, but it also allows us to better measure the GIDP column we’ve all seen for years because it puts double plays in the context of double play opportunities.
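As a back-of-the-envelope version of the idea, double-play runs above average is the gap between a player’s actual GDPs and the number an average player would have hit into in the same opportunities, times a run value. The league rate and run value below are illustrative placeholders, not FanGraphs’ actual wGDP inputs.

```python
# Assumed values for illustration only; not FanGraphs' actual inputs.
GDP_RUN_VALUE = 0.44     # runs lost per extra double play grounded into
LEAGUE_GDP_RATE = 0.11   # league GDPs per double-play opportunity

def wGDP(player_gdp, opportunities, league_gdp_rate=LEAGUE_GDP_RATE):
    """Double-play runs above average: grounding into fewer double plays
    than an average player would have in the same opportunities is worth
    positive runs."""
    expected_gdp = league_gdp_rate * opportunities
    return (expected_gdp - player_gdp) * GDP_RUN_VALUE

# A player with 120 opportunities who grounded into only 6 double plays:
# expected is about 13.2, so he saved roughly (13.2 - 6) * 0.44 = 3.2 runs.
```

This is why the opportunity context matters: 10 GDPs from a leadoff hitter who rarely bats with a man on first is very different from 10 GDPs from a cleanup hitter who does constantly.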

Dave and August have already looked at some surprising and obvious players who are great at staying out of double plays, but I wanted to consider this new statistic from another angle. For the most part, it seems like staying out of double plays should be a base running issue, as you have to be fast enough to get to first before the infield turns it.


Investigating the Idea of Scarce Right-Handed Power

I want to put to rest the discussion about the lack of right-handed power in Major League Baseball today. There has been a lot of anecdotal commentary about how scarce right-handed power has become, but there haven’t been many analytical articles supporting this idea. If anything, the handful of articles that have been written question whether the problem even exists in the first place. There are two different arguments on this topic: the first is that right-handed power is scarce in absolute terms (left-handed power is bountiful while right-handed power is not); the second, which I won’t address today, is that right-handed power hitters have declined in number relative to left-handed power hitters.

In a hypothetical choice between players of equal talent, you would almost always prefer a left-handed power hitter to a right-handed power hitter, since the lefty will have the platoon advantage more often and should be more productive as a result. There are valid arguments concerning rounding out lineups, but right-handed batters are not scarce; good left-handed hitters are actually the scarce commodity.

For reference, the general population is estimated to be about 10% left-handed, while roughly 33% of major league batters hit left-handed; lefties are overrepresented in baseball.

This is a box plot of the various player-seasons from 2010 until 2014. I’ve chosen this time span since it’s recent and it falls after the implementation of PITCHf/x, which improved the measurement of the strike zone. I’ve excluded switch hitters for simplicity, and set a floor at 200 plate appearances.

2010-2014 Single Season HR


On the Consistency of ERA

We know that ERA isn’t a perfect indicator of a pitcher’s talent level. It depends a lot on the defense behind the pitcher in question. It depends a lot on luck in getting balls in play to fall where the fielders are. It depends a lot on luck in getting fly balls to land in front of the fence. It depends a lot on luck in sequencing — getting hits and walks at times where it doesn’t hurt too much.

That’s why we have DIPS. Stats like FIP, xFIP, SIERA, my recent SERA, and Jonathan Judge’s even more recent cFIP all attempt to more accurately measure a pitcher’s talent by stripping those things out. But what if there were an easy way to figure out how much ERA can actually vary? How likely a given pitcher’s ERA was? What the spread of possible outcomes looks like? The aforementioned ERA estimators do not address those questions. They can tell you what the pitcher’s ERA should have been with all the luck taken away (or at least what they think it should have been), but they can’t answer any of the questions I just posed.
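To see why the spread itself matters, consider a quick Monte Carlo sketch: hold a pitcher’s “true” runs-allowed rate fixed, simulate many 200-inning seasons, and look at the range of ERAs that pure chance produces. Runs per inning are drawn from a Poisson distribution here, which is cruder than real run-scoring (real innings are clumpier), so treat the numbers as illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def era_spread(true_ra9=3.60, innings=200, sims=10_000):
    """Simulate many seasons for a pitcher with a fixed true runs-allowed
    rate, drawing runs per inning from a Poisson distribution, and return
    the 10th/50th/90th percentiles of the resulting season ERAs."""
    runs = rng.poisson(true_ra9 / 9, size=(sims, innings)).sum(axis=1)
    eras = 9 * runs / innings
    return np.percentile(eras, [10, 50, 90])

low, med, high = era_spread()
# Even with identical underlying talent in every simulated season, a full
# season's ERA can easily land several tenths of a run to either side of
# the true rate.
```

That distribution, not a single point estimate, is what the usual estimators can’t give you.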


Examining SERA’s Predictive Powers

SERA, my attempt to estimate ERA with simulation, started off as an estimator. Then, later, I laid out ways to make it more predictive. Well, here’s the new SERA: a more predictive, more accurate and better ERA estimator altogether.

First, a refresher: The first SERA worked by inputting a pitcher’s K%, BB%, HR% (or HR/TBF), GB%, FB%, LD% and IFFB%. Then, the simulator would simulate as many innings as specified, with each at bat having an outcome with a likelihood specified by the input. A strikeout, walk or home run was simple; a ground ball, fly ball, line drive or popup made the runners advance, score or get out with the same frequency as would happen in real life.
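A stripped-down version of that mechanism looks something like this. The per-PA probabilities, hit rates, and baserunner-advancement rules below are crude placeholders (every hit is treated as a single, with all runners moving up one base), not SERA’s actual tables.

```python
import random

random.seed(2)

# Hypothetical per-PA outcome probabilities and per-type hit rates.
OUTCOME_P = {"K": 0.20, "BB": 0.08, "HR": 0.025, "GB": 0.30, "FB": 0.22, "LD": 0.175}
HIT_RATE = {"GB": 0.24, "FB": 0.14, "LD": 0.68}  # rough BABIP by batted-ball type

def simulate_innings(n_innings=10_000):
    """Play out innings plate appearance by plate appearance and return the
    runs allowed per nine innings."""
    runs = 0
    for _ in range(n_innings):
        outs, bases = 0, [False, False, False]
        while outs < 3:
            r, cum, outcome = random.random(), 0.0, "LD"
            for o, p in OUTCOME_P.items():
                cum += p
                if r < cum:
                    outcome = o
                    break
            if outcome == "K":
                outs += 1
            elif outcome == "BB":
                # Walk: runners advance only when forced.
                if bases[0] and bases[1] and bases[2]:
                    runs += 1
                elif bases[0] and bases[1]:
                    bases[2] = True
                elif bases[0]:
                    bases[1] = True
                bases[0] = True
            elif outcome == "HR":
                runs += sum(bases) + 1
                bases = [False, False, False]
            else:
                if random.random() < HIT_RATE[outcome]:
                    # Treat every hit as a single; all runners move up one base.
                    runs += bases[2]
                    bases = [True, bases[0], bases[1]]
                else:
                    outs += 1
    return 9 * runs / n_innings
```

Swap in realistic advancement frequencies per batted-ball type and you have the skeleton of the simulator described above.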

To make SERA a better predictor of future ERA, I outlined a few major changes: excluding home runs as an input (since they are so dependent on HR/FB rate, over which pitchers have almost no control), excluding IFFB% for the same reason (it is extremely volatile and pitchers have very little control over it), and regressing K%, BB%, GB%, FB% and LD% based on the last three years of available data (or two, or one, if the player hadn’t been playing for three years). There were some other minor things, too.
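The regression step can be sketched like so. The 5/4/3 year weights and the 300-PA regression amount are illustrative choices, not SERA’s actual values.

```python
def projected_rate(seasons, league_rate, regression_pa=300):
    """Blend up to three seasons of a rate stat (most recent first),
    weighting recent years more heavily, then regress the blend toward the
    league rate in proportion to how little data we have."""
    weights = [5, 4, 3]
    num = sum(w * rate * pa for w, (rate, pa) in zip(weights, seasons))
    den = sum(w * pa for w, (rate, pa) in zip(weights, seasons))
    blended = num / den
    eff_pa = sum(pa for _, pa in seasons)
    return (blended * eff_pa + league_rate * regression_pa) / (eff_pa + regression_pa)

# Three years of K% as (rate, PA) pairs, most recent first, regressed
# toward a 20% league strikeout rate:
# projected_rate([(0.28, 600), (0.25, 550), (0.22, 500)], league_rate=0.20)
```

A pitcher with only one short season gets pulled hard toward league average; one with three full seasons mostly keeps his own rates.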


Towards a Better and More Predictive SERA

My last article introduced the concept of estimating a pitcher’s ERA using a simulation called SERA. As I pointed out throughout the article, SERA was strictly an estimator, not a predictor. That is, a pitcher’s SERA in one season wouldn’t do a great job predicting that pitcher’s ERA the next season. It’s more similar to FIP than it is to xFIP; descriptive rather than predictive.

But what if we want to create a simulator that predicts ERA for the future instead of just estimating what the ERA should’ve been? Some things are going to need to be changed — not just the code for the simulation, but also the inputs.


Testing the Lasting Effect of Concussions

Pitchers are expected to lose command after Tommy John surgeries. Prolific base stealers coming back from hamstring injuries are expected to take it slow for a week or two before regularly getting the green light on the basepaths. A broken finger for a slugger is blamed for the loss of power; a blister for a pitcher might mean a loss of feel on their breaking ball. What is not well publicized, however, is how a player recovers from a concussion and performs after returning from one. For an injury that has been talked about in the media so often in the last few years, we know very little about the actual long-term, statistical impacts that concussions have on players who experience them.

Players often talk about being “in a fog” for some time after suffering a concussion – often even after they return to play. The act of hitting is a mechanism that involves identifying, reacting, and deciding on a course of action within half a second. With that in mind, I wondered: do concussions change the quality of a batter’s eye and discipline at the plate? Do brain injuries add milliseconds to those individual steps? Even though each injury is different, do varying lengths of disabled list stints due to concussions change a player’s performance on the field after they return? The most direct route to answering those questions might be studying the impact of concussions on strikeout and walk rates.
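One simple way to frame that comparison is a two-proportion test on a hitter’s strikeout rate before and after the disabled list stint. The sketch below is purely illustrative; a real study would also need to control for aging, park effects, and league-wide strikeout trends.

```python
import math

def rate_change_z(before_k, before_pa, after_k, after_pa):
    """Two-proportion z-statistic comparing a hitter's strikeout rate
    before and after an injury. Positive values mean the rate went up
    after the return; roughly |z| > 2 suggests a change beyond what
    sampling noise alone would produce."""
    p1, p2 = before_k / before_pa, after_k / after_pa
    pooled = (before_k + after_k) / (before_pa + after_pa)
    se = math.sqrt(pooled * (1 - pooled) * (1 / before_pa + 1 / after_pa))
    return (p2 - p1) / se

# A hitter who struck out 100 times in 500 PA before the injury and 130
# times in 500 PA after it shows a rate jump from 20% to 26%.
```

The same machinery applies to walk rate, swing rate, or any other per-PA measure of plate discipline.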


Estimating ERA: A Simulated Approach

ERA, probably the single most cited reference for evaluating the performance of a pitcher, comes with a lot of problems. Neil does a good job outlining why in this FanGraphs Library entry. Over the last decade, plenty of research has cast a light on the variables within ERA that often have very little to do with the pitcher himself.

But what is the best way to use fielding-independent stats to estimate ERA? FIP is probably the most popular metric of this ilk, using only strikeouts, walks, hit batters, and home runs to create a linear equation that can be scaled to look like an expected ERA. Then there’s xFIP, which is based on the idea that pitchers have very little control over their HR/FB rate; to account for this, it estimates the number of home runs a pitcher should have allowed by multiplying his fly balls allowed by the league-average HR/FB rate.
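For concreteness, both formulas are simple enough to write out. The constant (roughly 3.10, and recalculated each year) is set so that league-average FIP matches league-average ERA.

```python
def fip(hr, bb, hbp, k, ip, constant=3.10):
    """Standard FIP: a linear combination of home runs, walks, hit batters,
    and strikeouts per inning pitched, shifted onto the ERA scale."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant

def xfip(fb, lg_hr_fb, bb, hbp, k, ip, constant=3.10):
    """xFIP: identical to FIP, except actual home runs are replaced by the
    number expected from a league-average HR/FB rate on the pitcher's
    fly balls allowed."""
    return fip(fb * lg_hr_fb, bb, hbp, k, ip, constant)

# A pitcher with 20 HR, 50 BB, 5 HBP and 200 K in 200 IP:
# fip(20, 50, 5, 200, 200) -> about 3.23
```

Note how little goes in: nothing about balls in play at all, which is exactly the critique that follows.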

For many people, however, these are too simple. FIP ignores balls in play almost completely; xFIP treats all fly balls equally. Neither one correctly accounts for the effects that any ball in play can have; we know that the wOBA on line drives is much higher than the wOBA on pop ups, but we don’t see that reflected in many ERA estimators. The estimators we use are also fully linear, and may break down at the extremes; FIP tells us that a pitcher who strikes out every batter should have an ERA around -5.70, which is, well, not going to happen.
