## Trevor Bauer’s Peculiar Curveball

Earlier today, Eno Sarris took a look at the arsenals of tonight’s World Series Game Two starters, Trevor Bauer and Jake Arrieta. In this article, I’m going to home in on one of those pitches in particular: Bauer’s curveball.

Pitchers want to disguise their pitches. This is a pretty obvious statement – it’s harder for a batter to hit a pitch if he can’t tell what’s coming. So naturally, conventional wisdom dictates that pitchers should try to make every pitch look the same coming out of their hand. You don’t want drastically different mechanics while throwing one type of pitch than while throwing another.

So when Trevor Bauer throws his curveball from a significantly different height than all his other pitches, that stands out. It’s hard to notice on television, but Bauer releases his curve a full six inches higher than all his other pitches.

## A Long-Needed Update on Reliability

It’s been over a year now since Sean Dolinar and I published our article(s) on reliability and uncertainty in baseball stats. When we wrote that, we had the intention of running reliability numbers for even more statistics, including pitching statistics, of which we had included none.

That didn’t happen. So a little while ago, when I was honing my Python skills by rewriting our code in, well, Python (it was originally in R), I figured, “Hey, why not go back and do this for a bunch more stats?” That did happen. Sean was/is swamped making the site infinitely better, though, so I was on my own rewriting the code.

In case you need a refresher, never read our original article, and/or don’t want to now, here’s a quick description of reliability and uncertainty: reliability is a coefficient between 0 and 1 that gives a sense of the consistency of a statistic. A higher reliability means that there’s less uncertainty in the measurement. Reliability will go up with a larger sample size, so the reliability for strikeout rate after 100 plate appearances is going to be much lower than the reliability for strikeout rate at 600. Reliability also changes depending on which stat is being measured. Since strikeout rate is obviously a more talent-based stat than hit-by-pitch rate (well, maybe not for everybody), the reliability is going to be higher for strikeouts given two identical samples. You can think of it like strikeouts “stabilize” quicker than hit-by-pitches.

Reliability can be used to regress a player’s stats to the mean and then to create error bars around that, giving a confidence interval of the player’s true talent. To continue with the strikeout example, I’ll add another point — namely that, the more plate appearances a player has recorded, the closer the estimate of his true talent will be to the strikeout rate he’s running at the time. In fact, strikeout rate is so reliable that, after a full season’s worth of plate appearances, a player’s strikeout rate will probably be almost exactly reflective of his true talent. The same cannot be said for many other stats, like line drive rate, which is mostly random; the reliability for LD% never gets very high, even after a full season’s worth of batted balls.
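As a rough illustration of that regression step, here is a minimal Python sketch. It assumes the textbook formulation from classical test theory (the estimate is a reliability-weighted blend of the observed rate and the league mean, and the error bar scales with the square root of r(1 - r)); the original article's exact method may differ. The function names, the `observed_sd` input, and the sample numbers are all hypothetical.

```python
import math


def regress_to_mean(observed, league_mean, reliability):
    """Shrink an observed rate toward the league mean.

    With reliability r, the true-talent estimate is
    r * observed + (1 - r) * league_mean.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return reliability * observed + (1.0 - reliability) * league_mean


def true_talent_interval(observed, league_mean, reliability, observed_sd, z=1.96):
    """Return (low, estimate, high) for a player's true talent.

    observed_sd is the spread of the stat across players at this sample
    size (an assumed input). The standard error of the estimate is taken
    as observed_sd * sqrt(r * (1 - r)).
    """
    estimate = regress_to_mean(observed, league_mean, reliability)
    margin = z * observed_sd * math.sqrt(reliability * (1.0 - reliability))
    return estimate - margin, estimate, estimate + margin


# Illustrative numbers only, not taken from the article.
low, est, high = true_talent_interval(
    observed=0.300, league_mean=0.200, reliability=0.80, observed_sd=0.050
)
print(f"estimate {est:.3f}, interval ({low:.3f}, {high:.3f})")
```

A reliability near 1 leaves the observed rate almost untouched, while a reliability near 0 pulls the estimate almost all the way back to the league mean.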

## Introducing the Batter-Specific Run-Expectancy Tool

Today at FanGraphs, we’re introducing an interactive run-expectancy tool that incorporates the batter’s skill into the run-expectancy value. The tool, developed by the rather incredible Sean Dolinar, allows the user to input a few factors, including one to account for the batter, and in turn spits out a number estimating how many runs will be scored for the rest of the inning.

## How (Not) to Set Up a Fastball

Pitch sequencing, in my opinion, is the next big thing in the field of baseball research, and despite what Samsung might like to tell you, it isn’t here yet. There has been some tremendous work done, but we’re still a long way from aggregating findings into one clearly defined picture of exactly how pitch sequencing works.

But we might as well continue to add to the findings. I looked at one aspect of pitch sequencing – shifts in the called strike zone – last month. Next, I’m looking at how best to set up different types of pitches. We’ll start with four-seam fastballs, and, so as to keep it simple for now, focus just on the fastball and on the pitch immediately beforehand. Not pitches before that in the same at-bat, not pitches to the same batter earlier in the game, not pitches to that batter from a different game.

Intuitively, you might expect changing speeds on the batter to be an effective way to mess with their swing and timing. A changeup, then, should be a good pitch to set up a fastball – changeups are generally 10-plus mph slower than the same pitcher’s fastball. Curveballs, too, should be decent setup pitches, as should sliders to a lesser extent. (Sliders are usually thrown harder than curves.) As it turns out, though, it doesn’t quite work that way.

###### Contact% = (Foul balls + balls in play) / swings

There’s some year-to-year variation, but, by and large, changeups are ineffective ways to get swings and misses on the fastballs which follow them. Now, bear in mind, the scale here isn’t so large – it’s a few percentage points each way. But it’s still pretty clear that changeups, as well as curveballs, don’t help the pitcher throw a better fastball the next pitch.

## The Consequences of Changing an Umpire’s Eye Level

A while back, Jeff wrote this article on an Edinson Volquez pitch to Jose Bautista in the ALDS, and a commenter left this comment:

This is a good comment. I like this comment. I decided to investigate this comment. And, as it turns out, StroShow was spot on.

## The Importance of Fly Balls for Hitters

Hitters, we generally accept, are capable of controlling their balls in play (BIP) to some degree. They don’t have complete control — for example, BABIP is a much less reliable statistic than strikeout rate in the absence of huge samples — but when we see a batter with a high BABIP it’s less suspicious than it would be if that were a pitcher.

Interestingly enough, the year-to-year correlation for BABIP for hitters is quite low. The r-squared is just 0.08 (with 1 being a perfect linear relationship and 0 being no linear relationship), even when weighting by the number of balls in play in both years. There isn’t quite a total lack of a relationship: the model’s p-value (that is, the probability of seeing a relationship at least this strong if the input variable truly had no effect on the output) is effectively 0, indicating that there almost certainly is a relationship. But knowing a hitter’s BABIP one year doesn’t tell us all that much about what it will be the next.
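For readers who want to see what a weighted r-squared looks like in practice, here is a minimal Python sketch. It assumes each hitter contributes one (year-one BABIP, year-two BABIP) pair weighted by his combined balls in play; the function name and the sample values are hypothetical, and this is not necessarily the exact model behind the 0.08 figure.

```python
def weighted_r_squared(x, y, weights):
    """Squared weighted Pearson correlation between x and y."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    mean_x = sum(w * a for w, a in zip(weights, x)) / total
    mean_y = sum(w * b for w, b in zip(weights, y)) / total
    cov = sum(w * (a - mean_x) * (b - mean_y) for w, a, b in zip(weights, x, y)) / total
    var_x = sum(w * (a - mean_x) ** 2 for w, a in zip(weights, x)) / total
    var_y = sum(w * (b - mean_y) ** 2 for w, b in zip(weights, y)) / total
    if var_x == 0 or var_y == 0:
        raise ValueError("x and y must each vary")
    return cov * cov / (var_x * var_y)


# Made-up BABIP pairs for four hitters, weighted by total balls in play.
year_one = [0.310, 0.285, 0.330, 0.295]
year_two = [0.300, 0.305, 0.310, 0.280]
balls_in_play = [820, 640, 910, 450]
print(round(weighted_r_squared(year_one, year_two, balls_in_play), 3))
```

Weighting this way lets hitters with more balls in play, whose BABIPs are less noisy, count for more than part-time players.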

In graphical format, it’s easy to see the existing-but-not-very-strong relationship between a hitter’s BABIP one year and his BABIP the year after. (The size of the dots in this graph reflects the total number of balls in play the hitter had in the two years.)

I’ve always wondered, though, if batters have any ability to control things on a more granular level than this. For example, do hitters have a lot of influence over whether their ground balls turn into hits? Maybe something like BABIP on ground balls is pretty stable from year to year, and the rest of the hitter’s BABIP is just pure luck from his other kinds of batted balls. Or maybe the three are all separate skills over which the batters have a good degree of control, and the instability comes from a hitter having a down year in one category but a good year in the others.

## A New Way to Look at Sample Size

Due to the math-intensive nature of this research, we have included a supplemental post focused entirely on the math. It will be referenced throughout this post; detailed information and discussion about the research can be found there.

## INTRODUCTION

“Small sample size” is a phrase often used throughout the baseball season when analysts and fans alike discuss players’ statistics. Every fan, to some extent, has an idea of what a small sample size is, even if they don’t know it by name: a player who goes 2-for-4 in a game is not a .500 hitter; a reliever who hasn’t allowed a run by April 10 is not a zero-ERA pitcher. Knowing what small sample size means is easy. The question is, though, when do samples stop being small and start being useful and meaningful?

## On Rotation, Part 2: The Effects of Spin on Pitch Outcomes

On Monday, I looked at how different spin rates for different pitches affect the way those pitches move through the air towards a batter. That post was useful for understanding the relationship between spin and velocity and movement. What it didn’t tell us much about, however, is what the spin actually does for the pitcher: does more spin make pitches harder or easier to make contact with? Does more spin induce weaker contact? To answer those questions (as well as others), we can look at the actual production from hitters on these pitches. That’s the goal of this post.

The first such stat we’ll consider is contact rate (Contact%), or times made contact (balls in play or foul balls) per swing.
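As a quick sketch of that definition in Python (the function name and the sample counts are hypothetical, and a swing is assumed to be any whiff, foul ball, or ball in play):

```python
def contact_rate(fouls, balls_in_play, swings):
    """Contact% = (foul balls + balls in play) / swings."""
    if swings <= 0:
        raise ValueError("swings must be positive")
    if fouls + balls_in_play > swings:
        raise ValueError("contact cannot exceed swings")
    return (fouls + balls_in_play) / swings


# Illustrative counts: 100 swings, 38 fouled off, 41 put in play, 21 whiffs.
print(f"{contact_rate(fouls=38, balls_in_play=41, swings=100):.1%}")
```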

## On Rotation, Part 1: The Effects of Spin on the Flight of a Pitch

My last article was a look at the effects of pitch location on batted balls. While it ended on a somewhat disappointing note, showing that the results couldn’t really be applied to individual pitchers, it did make me think more about which components of a pitch affect the pitch, and in which ways.

So I decided to examine spin. Spin is captured by PITCHf/x in two measurements: rate (in revolutions per minute) and direction (the angle in degrees). As it turns out, the spin of a pitch has quite the effect on its outcome, much like location. Different spin rates make the pitch move differently (obviously) and get hit differently. (For a look at this topic from a physics standpoint, check out this infographic and this much more complicated article, both from the excellent Alan Nathan. And, to make sure everybody knows: I know little about the actual physics of this beyond what I can infer from my baseball playing and watching experience. I am just looking at the PITCHf/x data.)

Before we get right to the graphs, a quick note about my methodology. I grouped each pitch from 2009 onward — which is the year PITCHf/x started to record spin rate consistently — into buckets based on spin rate (pitches were rounded to the nearest 50 RPM) and pitch type (I included four-seam fastballs, curveballs, changeups, two-seam fastballs, cutters, knuckleballs, and sliders). I then found a multitude of stats for each bucket: contact rate, average speed, average movement, ground ball rate, and many more. I also did the same with spin angle, grouping pitches into buckets by rounding to the nearest 20 degrees, but the results weren’t particularly meaningful.

I also combined two-seam fastballs and sinkers when I was doing this. There has been some discussion in the past about whether there is a difference between those two pitches. While PITCHf/x classifies them separately, they are more or less indistinguishable, and when I first did this without combining them, they overlapped on nearly all of the various graphs.
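The bucketing described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (`pitch_type`, `spin_rate`, `swing`, `contact`), the pitch-type codes, and the choice to round exact ties upward are all hypothetical, and contact rate stands in for the many stats computed per bucket.

```python
import math
from collections import defaultdict

# Assumed pitch-type codes: sinkers ("SI") are folded into two-seamers ("FT").
MERGED_TYPES = {"SI": "FT"}


def spin_bucket(spin_rate, width=50):
    """Round a spin rate to the nearest multiple of width (ties round up)."""
    return int(math.floor(spin_rate / width + 0.5)) * width


def contact_rate_by_bucket(pitches):
    """Map (pitch type, spin bucket) to contact rate on swings."""
    swings = defaultdict(int)
    contacts = defaultdict(int)
    for pitch in pitches:
        if not pitch["swing"]:
            continue  # contact rate only counts pitches that were swung at
        pitch_type = MERGED_TYPES.get(pitch["pitch_type"], pitch["pitch_type"])
        key = (pitch_type, spin_bucket(pitch["spin_rate"]))
        swings[key] += 1
        if pitch["contact"]:
            contacts[key] += 1
    return {key: contacts[key] / swings[key] for key in swings}


# Made-up pitches, for illustration only.
sample = [
    {"pitch_type": "FT", "spin_rate": 2210, "swing": True, "contact": True},
    {"pitch_type": "SI", "spin_rate": 2190, "swing": True, "contact": False},
    {"pitch_type": "FF", "spin_rate": 2260, "swing": True, "contact": True},
    {"pitch_type": "FF", "spin_rate": 2300, "swing": False, "contact": False},
]
print(contact_rate_by_bucket(sample))
```

The same grouping key works for any other per-bucket stat (average speed, ground ball rate, and so on); only the accumulators change.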

## Batted Balls: It’s All About Location, Location, Location

BABIP is a really hard thing to predict for pitchers. There have been plenty of attempts, sure, but nothing all that conclusive — probably because pitchers have a negligible amount of control over it. So naturally, when I found something that I thought might be able to model and estimate pitcher BABIP to a high degree of accuracy, I was very excited.

My original idea was to figure out the BABIP — as well as other batted ball stats — of individual pitches from details about the pitch itself. Velocity, movement, sequencing, and a multitude of other factors that are within the pitcher’s control play into the likelihood that a pitch will fall for a hit (even if to a very small degree). But much more than all of those, pitch location seems to be the most important factor (as well as one of the easiest to measure).

I got impressively meaningful results by plotting BABIP, GB%, FB%, wOBA on batted balls, and other stats based on horizontal and vertical location of the pitch. So I came up with models to find the probability that any batted ball would fall for a hit with the only inputs being the horizontal and vertical location (the models worked very well). I even gave different pitch types different models, since there were differences between, for example, fastballs and breaking balls. I found the “expected” BABIP of each of a pitcher’s pitches, and then I found the average of all of those expected BABIPs; theoretically, this should be the BABIP that the pitcher should have allowed.
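The averaging step is simple enough to sketch in Python. To be clear about what is assumed: the fitted location models are not given here, so `toy_hit_probability` is a made-up placeholder with invented coefficients, and the field names (`pitch_type`, `px`, `pz`) are hypothetical. Only the final averaging reflects the procedure described above.

```python
def toy_hit_probability(pitch_type, px, pz):
    """Placeholder model: NOT fitted to data, purely for illustration.

    px and pz are the horizontal and vertical pitch locations in feet.
    """
    base = 0.30 if pitch_type == "fastball" else 0.29
    penalty = 0.02 * abs(px) + 0.01 * abs(pz - 2.5)
    return max(0.0, min(1.0, base - penalty))


def expected_babip(batted_balls, hit_probability=toy_hit_probability):
    """Average the per-pitch hit probabilities over a pitcher's batted balls."""
    if not batted_balls:
        raise ValueError("need at least one batted ball")
    probabilities = [
        hit_probability(ball["pitch_type"], ball["px"], ball["pz"])
        for ball in batted_balls
    ]
    return sum(probabilities) / len(probabilities)


# Made-up batted balls for one pitcher.
balls = [
    {"pitch_type": "fastball", "px": 0.0, "pz": 2.5},
    {"pitch_type": "curveball", "px": 0.5, "pz": 1.5},
]
print(round(expected_babip(balls), 3))
```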

## On the Consistency of ERA

We know that ERA isn’t a perfect indicator of a pitcher’s talent level. It depends a lot on the defense behind the pitcher in question. It depends a lot on luck in getting balls in play to fall where the fielders are. It depends a lot on luck in getting fly balls to land in front of the fence. It depends a lot on luck in sequencing — getting hits and walks at times where it doesn’t hurt too much.

That’s why we have DIPS. Stats like FIP, xFIP, SIERA, my recent SERA, and Jonathan Judge’s even more recent cFIP all attempt to more accurately measure a pitcher’s talent by stripping those things out. But what if there was an easy way to figure out how much ERA actually can vary? How likely a pitcher’s ERA was? What the spread of possible outcomes is? The aforementioned ERA estimators do not address that issue. They can tell you what the pitcher’s ERA should have been with all the luck taken away (or at least what they think the ERA should have been), but they can’t answer any of the questions I just posed.

## Examining SERA’s Predictive Powers

SERA, my attempt to estimate ERA with simulation, started off as an estimator. Then, later, I laid out ways to make it more predictive. Well, here’s the new SERA: a more predictive, more accurate and better ERA estimator altogether.

First, a refresher: The first SERA worked by inputting a pitcher’s K%, BB%, HR% (or HR/TBF), GB%, FB%, LD% and IFFB%. Then, the simulator would simulate as many innings as specified, with each at bat having an outcome with a likelihood specified by the input. A strikeout, walk or home run was simple; a ground ball, fly ball, line drive or popup made the runners advance, score or get out with the same frequency as would happen in real life.

To make SERA a better predictor of future ERA, I outlined a few major changes: dropping home runs as an input (since they are so dependent on HR/FB rate, over which pitchers have almost no control), dropping IFFB% for the same reason (it is extremely volatile and pitchers also have very little control over it) and regressing K%, BB%, GB%, FB% and LD% based on the last three years of available data (or two or one if the player hadn’t been playing for three years). There were some other minor things, too.

## Towards a Better and More Predictive SERA

My last article introduced the concept of estimating a pitcher’s ERA using a simulation called SERA. As I pointed out throughout the article, SERA was strictly an estimator, not a predictor. That is, a pitcher’s SERA in one season wouldn’t do a great job predicting that pitcher’s ERA the next season. It’s more similar to FIP than it is to xFIP; descriptive rather than predictive.

But what if we want to create a simulator that predicts ERA for the future instead of just estimating what the ERA should’ve been? Some things are going to need to be changed — not just the code for the simulation, but also the inputs.

## Estimating ERA: A Simulated Approach

ERA, probably the single most cited reference for evaluating the performance of a pitcher, comes with a lot of problems. Neil does a good job outlining why in this FanGraphs Library entry. Over the last decade, plenty of research has cast a light on the variables within ERA that often have very little to do with the pitcher himself.

But what is the best way to use fielding-independent stats to estimate ERA? FIP is probably the most popular metric of this ilk, using only strikeouts, walks, hit batters, and home runs to create a linear equation that can be scaled to look like an expected ERA. Then there’s xFIP, which is based on the idea that pitchers have very little control over their HR/FB rate; to account for this, it estimates the number of home runs that a pitcher should have allowed by multiplying their fly balls allowed by the league average HR/FB rate.
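For reference, both estimators fit in a few lines of Python. The weights (13, 3, and 2) are the standard published FIP weights; the constant and the league HR/FB rate change every season, so the defaults below are placeholder assumptions, as is the sample stat line.

```python
def fip(hr, bb, hbp, k, ip, constant=3.10):
    """FIP = (13*HR + 3*(BB + HBP) - 2*K) / IP + constant.

    The constant scales FIP to league ERA and varies by season;
    3.10 is only a typical placeholder.
    """
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


def xfip(fb, bb, hbp, k, ip, lg_hr_per_fb=0.105, constant=3.10):
    """xFIP swaps actual home runs for fly balls times the league HR/FB rate.

    lg_hr_per_fb is an assumed league rate, not a value from the article.
    """
    expected_hr = fb * lg_hr_per_fb
    return fip(expected_hr, bb, hbp, k, ip, constant)


# Made-up stat line: 200 IP, 20 HR, 50 BB, 5 HBP, 200 K, 210 fly balls.
print(round(fip(hr=20, bb=50, hbp=5, k=200, ip=200), 2))
print(round(xfip(fb=210, bb=50, hbp=5, k=200, ip=200), 2))
```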

For many people, however, these are too simple. FIP more or less ignores all balls in play completely; xFIP treats all fly balls equally. Neither one correctly accounts for the effects that any ball in play can have; we know that the wOBA on line drives is much higher than the wOBA on pop ups, but we don’t see that reflected in many ERA estimators. The estimators we use are also fully linear, and may break down at the extreme ends; FIP tells us that a pitcher who strikes out every batter should have an ERA around -2.90 (three strikeouts an inning is worth -6.00, offset by a FIP constant near 3.10), which is, well, you know, not going to happen.

## Did Max Scherzer Really Have His Breakout in 2012?

Max Scherzer was on my fantasy baseball team in 2013. (Note: I recognize you don’t care about my fantasy team. This is in the service of a point, I promise.) My fantasy baseball team that year won the league championship, and Scherzer was a big reason why. I don’t remember if I thought to myself during the draft, “Hey, this guy is going to be really good because he had a 78 xFIP- last year,” or if I said, “Hey, whatever, it’s a late round, this pick won’t really matter. Why not take a flyer on this guy?” Scherzer wasn’t really much of anybody the year before, which is why I could get him late in my draft. Sure, he had a 3.74 ERA in 2012, and he won 16 games, but he certainly didn’t have the hype he does now.

Fast-forward to this offseason. Sooner or later, a real-life team will acquire Scherzer. He will be expensive, there’s no doubting that. And rightly so. Scherzer has established himself as one of the best pitchers in baseball. A true ace who has put up consecutive 5.5-win seasons, Scherzer now has a whole lot more value than pre-2013 Scherzer, who showed signs of promise but was just another pitcher who couldn’t put it together.

But how different is Scherzer now than he was two years ago? He’s two years older, of course. He’s a free agent, as opposed to having two more years of team control. And he’s had three consecutive good (or better) years, instead of just one. But when you look closely, Scherzer is a very similar pitcher to the one he was even before his Cy Young-winning 2013 campaign. And that’s not a bad thing.

## Updating and Improving The Outcome Machine

A little while ago, I wrote an article for the Community Research blog about projecting plate appearances before they happen based on the batter and the pitcher. It was pretty well received (which was nice, because I put some serious work into that thing), and apparently it was good enough for Dave Cameron to ~~foolishly~~ kindly decide to call me up to the big leagues.

If you read through the comments there (or if you left a comment!) you probably realized that no, the Outcome Machine — as the tool was dubbed — was not perfect. There were flaws in the way I conducted my research, and some of the assertions I made probably weren’t 100% true. So in this article, I am going to follow up on that first one and hopefully remedy any errors. Those include: