Hitting Wins Championships(?)

Over the past week or so, there have been baseball playoffs. And, like you, I have heard so many different opinions about what it takes to win a World Series Championship. Usually you hear “pitching wins championships”. This year, it’s “destiny”, “shut down bullpens”, and being a member of the San Francisco Giants. But what about hitting? Why is everyone so down on hitting? Isn’t it weird that the part of baseball people marvel at is brushed aside when trying to explain success in the postseason? Why have we never heard this?

Since I mostly despise the people that exclaim “THEY JUST KNOW HOW TO PLAY IN THE POSTSEASON” without any regard to statistics, I went back and looked at the World Series winners since 2002. I only went to 2002 because some data isn’t available on FanGraphs for the stats that I wanted to use.

The stats I used for this article

Starting Pitching and Relief Pitching

I used GB%, K%-BB%, and WAR (not Wins, Saves, and Beard Length) because these are generally the three most looked-at stats in terms of success for starting pitchers. I also felt it would give me a broader picture of the staff than just looking at WAR and being done with it.

Hitting

I used wRC+ (not Runs, RBI, or Bunts) instead of WAR because I wanted to isolate what the player did at the plate. We’ll look at defense and base running later. I also used K%, BB%, BB/K, ISO, and O-Contact%. I used the percentage and ratio stats to see whether good discipline or free swinging mattered more. ISO is a better indicator of power than SLG and home runs. O-Contact%, however, was a pet stat of mine that I threw in because I’ve always been scared of guys that have a bigger strike zone than others. It was also inspired by this Ken Arneson series of tweets. In theory, guys with higher O-Contact% rates are also harder to strike out, are more prone to BABIP luck, and also “put more pressure on the defense.”

Baserunning

I used BsR to measure both the value of stolen bases and overall base running performance.

Defense

Even though it is far from perfect, I used UZR to quantify defense. Inspired by the Kansas City Royals, I also included outfielder UZR for this exercise.

Methodology

I picked out every WS winner since 2002 and wrote down the number of each stat mentioned above, and the league rank that went along with it. Here is my Excel spreadsheet, if you’re interested. I picked out the importance of each statistic based on top-5 and top-10 rank, and, to mirror the successes, bottom-10 and bottom-5 rank.
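For anyone who would rather script the tallying than do it in Excel, here is a minimal sketch of the rank counting described above. The ranks in it are made-up placeholders, not values from the spreadsheet, and the 30-team league size and the function name `summarize` are assumptions for illustration.

```python
from statistics import mean

# Hypothetical league ranks (1 = best of 30 teams) for three made-up champions.
# These numbers are placeholders, not values from the spreadsheet.
ranks = {
    "wRC+": [3, 12, 7],
    "SP GB%": [25, 4, 28],
}

def summarize(stat_ranks, n_teams=30):
    """Tally average rank plus top-5/top-10 and bottom-10/bottom-5 counts."""
    return {
        "avg_rank": round(mean(stat_ranks), 2),
        "top_5": sum(r <= 5 for r in stat_ranks),
        "top_10": sum(r <= 10 for r in stat_ranks),
        "bottom_10": sum(r > n_teams - 10 for r in stat_ranks),
        "bottom_5": sum(r > n_teams - 5 for r in stat_ranks),
    }

summary = {stat: summarize(r) for stat, r in ranks.items()}
for stat, row in summary.items():
    print(stat, row)
```

Note that for a stat like K% for hitters, where a lower raw number is better, the ranks would need to be flipped before counting.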

Results

If you looked at the spreadsheet that I linked to, you’ll notice that the statistic with the most top-5 rankings, the fewest bottom-10 rankings, AND the highest average ranking is wRC+. In fact, four of the top five stats with the highest average rank were hitting statistics. The top-5 with average rank: wRC+ 7.58, BB/K 9.17, SP WAR 10.17, ISO 10.25, O-Contact% 10.42. I’m not trying to say nothing else matters, but the data seems to suggest that teams need a better offense more than they do starting pitching, if only slightly so.

On the flip side of things, the statistic with the most bottom-10 ranks, and lowest overall ranking (K% would be lowest, but remember, lower is better with K%) is GB% for starting pitchers. Only the ’04 and ’11 Cardinals had a top-5 GB% while also getting league average (Rank > or = to 15) WAR from their starting pitchers. Six out of the 12 teams listed here posted bottom-10 ranks in GB%, which is incredibly interesting, given the theories behind ground ball pitchers that are so commonly found on the web nowadays. Does this mean ground balls are not important? Well, no. But it does mean that they may not be as important as they once were thought to be.

Base running didn’t end up being as big of a factor as I thought it would be, and the Cardinals apparently care not for good defense. But look at O-Contact%! It was the fifth most important stat by average rank, and it finished with only one team (’04 Red Sox) in the bottom ten, as opposed to six top-ten placements. Furthermore, the rate at which teams struck out mattered more than how often they walked, but BB/K is the peripheral that seems to be the most telling.

We’ll probably never hear about how an offense won a team a World Series. In fact, we’ll probably instead hear it spun as a pitcher blowing the game. But at least now we have statistical evidence (even if it is only the past 12 years) that offense IS a major player in deciding who wins the World Series. We also have evidence to suggest that maybe hitters who expand the strike zone to their advantage are more valuable than has been discussed recently. Admittedly, this would take another article to deduce. Any takers?


Searching for a Postseason Fatigue Effect

Introduction:

If you had to pick one specific topic as baseball’s most prominent overarching narrative over the past couple years, there’s a good chance you would say “pitcher injuries”.  An era of high speeds and higher strikeout rates has been colored by constant announcements of elbow blowouts.  This year’s injuries alone included two guys who easily could have won their league’s Cy Young, Masahiro Tanaka and Jose Fernandez.

If you think the problem might be pitcher overuse, you’re in esteemed company. Since the famous “Joba Rules” of 2007, teams have experimented with limiting pitcher workloads to lessen the chance of injury.  The Washington Nationals famously limited Stephen Strasburg to 160 innings in 2012 in his first year back from Tommy John surgery.  (That storyline, by the way, was some of the greatest debate fodder baseball has seen in recent years.)

But sometimes an innings limit just isn’t feasible.  Sometimes a workhorse propels his team to the playoffs in a 33-start season, and then has to crank it up a notch for a playoff run. Surely that’s a form of overuse, right?  After a 250+ inning season — and a short off-season to boot — shouldn’t we be worried about fatigue or injury-susceptibility?  Let’s find out!

Methodology:

Obviously we can’t directly observe the answers to our questions, since we can’t observe the alternate universe in which the previous year’s postseason pitchers didn’t go to the playoffs (though I hear Trackman is working on this). However, we can compare actual performance to projected performance. There are various projection systems out there, and for this study I chose Marcel. Though it’s not the most sophisticated system (you can find the basics here), Marcel compares well to the rest of the field. Keep in mind that all we need here is an unbiased system, not necessarily the most accurate one. Marcel is such a system, and it also has the advantage of being easy to download for multiple seasons (thanks to Baseball Heatmaps) and coming in a very similar format to the Lahman database, including identical player IDs. This makes comparisons between projections and actual performance a breeze.

So what we’re looking for now is whether postseason pitchers show a tendency to underperform their projections the next year, relative to pitchers who did not pitch in the postseason.  This could take the form of pitching less than expected or worse than expected.

Sample:

For the test group, I took all pitchers who started at least 28 regular-season games and at least 3 postseason games in a single year. I used all seasons from 1995 to 2012. Basically, this means that the test group pitched (more or less) a full season and then pitched at least until the Championship Series, and they did so in the wildcard era. For the control group, I took all pitchers with 28+ starts who did not appear in the postseason. For both groups I compared their Marcel projections with their actual performances from the next year (1996-2013). I did not include 2014 because Lahman data is not yet available for this year.
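The selection rules above can be sketched as a small filter. This is only an illustration under assumptions: the records and the field names (`gs`, `post_g`, `post_gs`) are hypothetical stand-ins, not the actual Lahman column layout.

```python
# Hypothetical pitcher-seasons; field names and values are illustrative only.
seasons = [
    {"player": "A", "year": 2005, "gs": 33, "post_g": 4, "post_gs": 4},
    {"player": "B", "year": 2005, "gs": 30, "post_g": 0, "post_gs": 0},
    {"player": "C", "year": 2005, "gs": 31, "post_g": 2, "post_gs": 1},
    {"player": "D", "year": 2005, "gs": 20, "post_g": 3, "post_gs": 3},
    {"player": "E", "year": 1994, "gs": 30, "post_g": 0, "post_gs": 0},
]

def assign_group(s, min_gs=28, min_post_gs=3, first=1995, last=2012):
    """Return 'test', 'control', or None for one pitcher-season."""
    # Outside the study window, or not a full-season starter.
    if not (first <= s["year"] <= last) or s["gs"] < min_gs:
        return None
    if s["post_gs"] >= min_post_gs:
        return "test"
    if s["post_g"] == 0:
        return "control"
    # Appeared in the postseason, but with fewer than 3 starts.
    return None

groups = {s["player"]: assign_group(s) for s in seasons}
print(groups)
```

Pitchers like "C", who made a full season of starts but only a brief postseason appearance, land in neither group under these rules.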

A note about the samples: the test group pitchers are generally better than the control group pitchers. After all, they helped their teams reach the playoffs, and then were good enough to get a few postseason starts. There’s no reason to think this should taint our experiment, though. Remember, we aren’t worried about raw performance, but rather performance relative to projections.

Results:

First let’s look at playing time. If there really is a postseason effect here, we should expect our test-group pitchers to miss more time due to injury and ineffectiveness. Under our null hypothesis (no postseason effect), however, our test group should actually pitch more than the control group, since they’re better pitchers in general and therefore deserve to be given the ball more often.

[Table 1: projected and actual games started for the test and control groups]

N=161 and N=994 for the Test Group and Control Group, respectively.

As we can see, both groups started more games than Marcel projected. This is actually unsurprising, since by definition our sample pitchers are more durable than average. The Marcel projection system regresses players to the mean (to varying degrees based on confidence levels), so the less durable and fringier pitchers we omitted pull the samples’ projections down.

The takeaway, however, is that the postseason pitchers exceeded their projected GS by much more than the control group did. This certainly refutes the hypothesis that postseason pitchers are more likely to go down next season. Let’s take a more detailed look with some density plots.

[Figure: density plot of next-year games started, test group vs. control group]

The higher peak for the test group near 32 games started confirms what we just saw, that the test group generally pitched more. We also see that the control group is more densely populated at the left tail, which means that a higher proportion of these starters pitch very little the next year.  Again, they’re worse in general, so that’s not surprising from the perspective of the null hypothesis.

Now let’s look at the density plot for Games Started minus Projected Games started, to see in detail the ways in which both groups exceeded their projections.

[Figure: density plot of games started minus projected games started, test group vs. control group]

Both groups are equally (un)likely to exceed their projected starts by a great deal, as demonstrated by the near-identical right tails (this makes sense — you can’t exceed a 30-start projection by much).  For both groups, the most common result was to pitch a few games more than projected. However, the control group was somewhat more likely to fall far short of their projected starts. This gives more support to our null hypothesis: assuming no unique fatigue or injury effect, the test group is less likely to be ineffective enough to lose starts, since they’re better overall and may furthermore have built up some organizational goodwill from the previous year’s playoff run.

That seems to put the matter of playing time to rest. But what about results? Do postseason pitchers show a change in per-game performance the next year?  The below tables show the mean rates for both groups over various important pitching categories.

[Tables: mean actual and projected rates by category for the test and control groups]

Note that in every category except Kper9, a small number is preferable. Thus, a positive value for (Actual Kper9 minus Projected Kper9) means the group outperformed projections, but a positive difference for all other categories means it underperformed.
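The sign convention in that note is easy to get backwards, so here is a tiny sketch of it. Only Kper9 is named in the text; "BBper9" and the numbers are hypothetical, used purely for illustration.

```python
# Only the strikeout rate is "higher is better"; every other category is
# "lower is better". "BBper9" is a hypothetical name used for illustration.
HIGHER_IS_BETTER = {"Kper9"}

def outperformed(category, actual, projected):
    """True if the group beat its projection in this category."""
    diff = actual - projected
    if category in HIGHER_IS_BETTER:
        return diff > 0
    return diff < 0

print(outperformed("Kper9", 7.4, 7.1))   # more strikeouts than projected
print(outperformed("BBper9", 3.3, 3.0))  # more walks than projected
```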

In all five categories, the postseason pitchers did better relative to their projections than the non-postseason pitchers. Granted, some of those margins are thin, but this certainly provides more evidence that postseason fatigue doesn’t affect performance going forward.

This table is a bit misleading, however.  Calculating the mean rates for each group gives equal weight to all pitchers.  For our purposes, this is both good and bad. On the one hand, pitchers who only pitched a bit — and are thus liable to have some wacky rates — have a disproportionate effect on the group.   On the other hand, if a pitcher becomes so bad that his team has to pull him from the rotation, we want that to affect our calculations, since that’s exactly the kind of decline we’re researching.

With that in mind, let’s look at the same categories, but with both actual rates and projected rates weighted by actual innings pitched, so that we can get a good sense of each group’s real-world contribution.
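To make the weighting concrete, here is a minimal sketch with two made-up pitchers, one who held his rotation spot and one who lost it. The field names and numbers are assumptions for illustration, not data from the study.

```python
def weighted_mean(values, weights):
    """Mean of values, each weighted by the matching entry in weights."""
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total

# Hypothetical pitchers: actual IP, actual K/9, and projected K/9.
pitchers = [
    {"ip": 200.0, "actual_k9": 8.0, "proj_k9": 7.5},
    {"ip": 50.0, "actual_k9": 4.0, "proj_k9": 6.5},
]

ip = [p["ip"] for p in pitchers]
# Both the actual and the projected rates are weighted by ACTUAL innings.
actual = weighted_mean([p["actual_k9"] for p in pitchers], ip)
projected = weighted_mean([p["proj_k9"] for p in pitchers], ip)
unweighted_actual = sum(p["actual_k9"] for p in pitchers) / len(pitchers)

print(actual, projected, unweighted_actual)
```

In this toy case the unweighted actual mean is 6.0, while the IP-weighted one is 7.2, because the pitcher who barely pitched no longer counts as much as the one who threw 200 innings.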

[Tables: actual and projected rates for the test and control groups, weighted by actual innings pitched]

As expected, this brings the difference between actual and projected performances closer to zero.  Still though, the test group is better than the control group relative to projections in all areas.

Conclusions:

In our search to find an impact of full season + postseason overuse, we’ve found nothing. In fact, if anything, results suggested a long season and postseason might be better for pitchers going forward. However, it’s unlikely that that’s a general truth. As I mentioned with Games Started, Marcel’s regression to the mean makes less sense when you single out durable pitchers as a whole. In terms of rate stats, differences between the two groups were generally small. As before, we can explain a bit of this difference through regression:

Marcel projections include a value for relative confidence, which signals how much the system regresses a player’s projections. The control group had a slightly lower overall value for this (0.78 vs. 0.80, weighted by actual IP), indicating that its values were regressed slightly more. Since the control group — despite being worse than the test group — was projected to be better than average for starters —

[Table: control group’s projected rates, weighted by projected IP, alongside rates for all starters]

The left column is the control group’s projected rates weighted by projected IP (whereas in the previous charts everything was weighted by actual IP). The right column was calculated using FanGraphs data for K, BB, H, HR, R, and IP for all “Starters” over the same time span.

— we can tell that both groups were pulled in the direction of mediocrity. The lower confidence value for the control group means that those starters were pulled a bit harder. This extra pull could account for the fact that the control group was slightly worse relative to projections than the test group.
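As a rough illustration of that pull, here is a simplified shrinkage sketch. It is not Marcel’s actual computation; it only shows how a lower confidence value moves a projection further toward the league mean. The observed rate of 4.00 and league mean of 4.50 are made-up numbers.

```python
def regress_to_mean(observed, league_mean, confidence):
    """Blend a player's observed rate with the league mean.

    A simplified stand-in for regression to the mean, NOT Marcel's
    exact formula: confidence near 1 trusts the player's own record,
    confidence near 0 falls back on the league mean.
    """
    return confidence * observed + (1 - confidence) * league_mean

# Hypothetical runs-per-nine figures: a better-than-average starter.
observed, league = 4.00, 4.50

control_like = regress_to_mean(observed, league, 0.78)
test_like = regress_to_mean(observed, league, 0.80)
print(control_like, test_like)
```

With these placeholder inputs the 0.78 projection lands slightly closer to the league mean than the 0.80 one, which is the "pulled a bit harder" effect in miniature.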

Overall, we’ve seen absolutely no evidence to suggest that a postseason run has a negative impact on a pitcher for the subsequent year. Perhaps a similar study of relievers would yield different results; pitching frequently in short bursts may have a different cumulative fatigue effect.

It’s also possible that the postseason fatigue effect does exist for starting pitchers but is not apparent after just one year, or it requires multiple full seasons plus postseasons to manifest itself. However, those questions pretty much boil down to, “can lots of difficult physical activity over a long period of time cause physical damage?” which is both boring and obvious.  The present study is interested in the immediate consequences of a long season.

We could also re-do the study with a more sophisticated projection system, but such a study would be unlikely to uncover something significant given that Marcel didn’t even hint at an effect. For now, at least, it seems wise not to argue “postseason fatigue” if James Shields has a poor April in 2015.

Player-season data comes from Sean Lahman’s database, both the “Pitching” and “PitchingPost” tables. As stated in the piece, Marcel projections were downloaded from BaseballHeatmaps.com. Finally, data for all starters over the relevant time span was obtained with FanGraphs’ “Custom Table” feature.


Albert Almora’s Inability to Walk

In 2012, the Chicago Cubs used the 6th overall draft pick to select Albert Almora, a high school outfielder from Miami. Almora was considered one of the top prospects in Chicago’s system and all of baseball entering 2014, ranking 36th on Baseball America’s Top 100, 28th on Keith Law’s Top 100, and 25th on Baseball Prospectus’s Top 101.

Almora struggled at the plate for a couple months in High-A this season before finally showing some brief improvement. This led to a promotion after just 89 games despite an OPS of .712. He performed even worse in Double-A, posting an OPS of .605 in the 36 games he played at the level. One of Almora’s most glaring flaws is his low walk rate: in 530 combined PA between the two levels at which he played this year, Almora walked just 14 times, a minuscule 2.6% of his plate appearances.

One explanation for Almora’s low walk rate is that his innate ability to make solid contact on most pitches prevents him from getting deep into counts and working walks. As Keith Law noted last offseason, “[Almora] has great hand-eye coordination that allows him to square up a lot of pitches, but has to learn to rein himself in and wait for a pitch he can drive to make full use of his hit and power tools — and if that means taking a few more walks, well, both he and the Cubs could use that right about now.”

We know that drawing walks is a good offensive skill to possess, but how problematic is it to be unable to do so? I wanted to better understand if it is possible for Almora to still have a successful major league career even if he is never able to overcome his inability to see ball four, and if so, how he might accomplish that.

I was a little surprised to find that out of all qualified major league hitters this year, the five lowest walk rates all belong to players who provided at least 2 WAR to their team, meaning they were at least average players. I examined how each player was able to do so despite posting a walk rate of 3.7% or lower.

Ben Revere owned the lowest walk rate in MLB this year, coming in at 2.1%. Despite this, he was able to put up a respectable wRC+ of 92. Most of Revere’s offensive value comes from his ability to make contact (7.8% strikeout rate) and a high BABIP aided by his tremendous speed. He also provides a lot of value on the bases, where he is once again helped by his speed. While UZR hasn’t loved him in center field this year, he does play a premium position, and he has had better defensive numbers in the past. Revere mostly posted walk rates around 7-8% coming up through the minors, but his complete lack of power means that MLB pitchers are able to challenge him with strikes without having to worry about giving up extra-base hits. Revere has relied upon his speed to find success in the majors.

Adam Jones is the most successful of this bunch, posting a 5.4 WAR even with a walk rate of just 2.8%. Jones rates well in UZR this year and has won three Gold Gloves, but generally defensive metrics have not loved his defense, rating him below average in 2009-2013. Solid baserunning has helped Jones provide value to the Orioles, but his production mainly comes from his power, as he has a career ISO of .181. He has hit at least 25 home runs in each of the past four seasons, topping 30 twice. Jones’s power is his biggest asset and has allowed him to succeed, even with low walk rates and OBPs.

Salvador Perez posted a WAR of 3.3 in 2014, ranking him sixth among all catchers. Perez derives most of his value from two areas: his power and his plus defense at the most difficult position on the defensive spectrum. While the problems with measuring catcher defense have been well-noted, both stats and humans seem to agree that Perez is really good at it. On offense, his .148 career ISO has helped warrant a spot in the lineup, even while posting an OBP under .290 this year.

Next on the list is Alexei Ramirez. Ramirez’s greatest contributions come from playing an above average shortstop and running the bases well. He has put up solid, if unspectacular, offensive numbers thanks to good contact rates and decent power. Ramirez has been an average or above average player for five straight seasons even while walking only 4.4% of the time during that span.

The final player in this group is another Royal—Alcides Escobar, a shortstop known for his plus defense. His walk rates in the big leagues have mostly been around 3-4%, and his offensive production has fluctuated with his BABIP, as he relies on his average to carry his OBP. His strong defense at a premium defensive position and solid baserunning have provided enough value to keep him in the big leagues when his BABIP is low and to make him an average or above average player when it is high. 2014 was Escobar’s best season in the majors, but even when his offensive production is down, his defense and baserunning are able to make up for it enough to warrant a spot on a major league team.

Succeeding in the major leagues with a low walk rate is certainly possible, and these five players show there are multiple ways to do it. I think there are a few major takeaways from this exercise.

1) Players who rarely walk must get most of their value from defense and baserunning. All five of these players play a premium defensive position, allowing them to provide a lot of value on defense while requiring less of them at the plate. Most of them are also above average on the basepaths.

2) Players who rarely walk don’t necessarily have to be even average offensive players, but they can’t be helpless either. They need to derive some sort of offensive value, whether it’s from hitting for power or making lots of contact and having a high BABIP to boost their OBP to a respectable level.

So where does this leave Almora? He checks off the first point, as most people seem to agree he is a plus defensive centerfielder, and although he’s not described as a burner on the basepaths, his instincts will likely allow him to be at least an average baserunner. At the plate, though, Almora still has a ways to go. While he doesn’t necessarily need to get his walk rate up a ton to be successful, he will have to find a way to provide more value than he showed this year.

It seems most likely that if Almora is to be a successful major leaguer, he will wind up in the Escobar/Ramirez mold—a player who makes plenty of contact and hits for a high average to support his OBP enough to keep him in a major league lineup while his defense accounts for most of his value. He still has a ways to go to reach even this level of competency at the plate, but he showed an ability to do it in the Midwest League in 2013, and he is still just 20 years old. 2014 was a step backward for Almora, and he’ll have to prove that he can provide some sort of offensive value if he wants to patrol centerfield on the north side, but he is not a lost cause and has the necessary skill set to succeed in the majors even with the impatience he has shown at the dish in his minor league career.


It Must Be Something: Explaining the Nationals-Giants series

Last week, the Washington Nationals lost their opening-round playoff series against the San Francisco Giants three games to one, falling 3-2 in Game 4 in San Francisco. The series offered a lot of gripping, exciting baseball, and for one Nationals fan, at least, it was an enriching experience even with the loss. (This post is written from a Nationals fan perspective, but may be of wider interest.) After a close playoff series, it is natural to try to understand what happened. I’d like to look at an idea which has surfaced in prominent places in recent days:

** The Nationals suffered from a lack of poise in the face of the heightened pressure in the playoffs; and the Giants exhibited more poise, in a manner which contributed significantly to their victory.

This idea can be found in two recent columns by the Washington Post’s Thomas Boswell (“Washington Nationals must recognize, and embrace, that October is whole new ballgame” and “Hard truth is Nationals are not yet a match for the poised, traditional powers of the NL”, both from October 8). There is similar praise of the Giants in Jayson Stark’s ESPN article “For Giants, it’s ‘ugly, but it works’” (also October 8).

I’m afraid I think reactions like this are superficial. Both teams scored nine runs over four games, so by this familiar measure they were equal. But we all share a tendency to think that the Giants must have won for a good reason: there must be something which distinguishes the two teams. Rather than being unique to inquisitive baseball fans, this desire for an explanation has deep roots far outside the sporting world; it is codified in some circles as “the principle of sufficient reason.”

Regarding the baseball playoffs, this principle is often applied as follows:

Playoff contests between evenly matched teams are often won by the team which possesses more poise. As compared to the regular season, there is more pressure in the playoffs, and what really matters is whether you respond to this with poise. In fact, poise is so important in the playoffs that it often allows a less talented team to beat a more talented team.

Several factors combine to make the poise theory an inevitable diagnosis of the Nationals-Giants series. The Nationals had a better regular season record (96 wins vs. 88 for San Francisco) and are perceived as having more talent. Also, the Giants had established a reputation as a very poised playoff team by winning two of the previous four World Series. From my side of the country, it sounds like they also picked up a reputation for outperforming their regular season record in the playoffs.

Not only that, but in 2012 the Nationals had another excellent regular season before losing to the Cardinals in a five-game first round playoff series. As you know, the Cardinals also have a reputation for being a poised playoff team. And it should not be a surprise that the 2012 Nationals-Cards series seemed to lend itself to the explanation that the Cardinals exhibited more poise.

Our series matched a post-season poise team against a regular-season performer with question marks surrounding its playoff poise. So, after the series concluded in the manner that it did, a logical next step was the appearance of the poise theory.

The problem with the poise theory is that it starts with the winner and works backwards. It cherry-picks moments that are easy to remember, at the expense of more gradual or incremental dynamics. The theory routinely assigns these moments too much significance. Often, this mindset looks at only one side of what happened at various points in the game. The analytical result is that the winner won via poise, and the loser gets no credit for exhibiting poise, or any other positive qualities.

The poise account of the Giants-Nationals series is that the Nationals were frozen by the moment and didn’t hit well, that the Giants tied game 2 when down to their last out (and won it with a poised HR nine innings later, in extra innings), that the Nationals made several on-field errors in game 4 and made two questionable (or just bad) bullpen decisions in games 2 and 4…and that the Giants played gritty, opportunistic, mistake-free baseball throughout the series.

One obvious flaw in the poise account is that the last idea is false: Madison Bumgarner’s throwing error in game 3 allowed the Nationals to score 2 runs in their 4-1 victory. In addition, this error was triggered by a two-strike bunt from Wilson Ramos, which would seem to qualify as an exhibition of playoff poise (and of a player adapting to the moment, etc.).

Why doesn’t Bumgarner’s two-run throwing error count against our attribution of poise to the Giants? One reason is that we are working backwards from the fact that the Giants ultimately won the series. Since the Giants won a close series which can only be explained in terms of poise, elements of the series which clash with this narrative are suppressed to preserve the integrity of the explanation.

The “poise” explanation of the Giants’ victory is also challenged if we admit that the Nationals exhibited poise, because then the two teams do not differ in a way that explains the Giants’ victory.

Unfortunately for the poise theory, the Nationals displayed loads of this quality throughout the series – for example, via a two-strike bunt, via Jordan Zimmermann’s game 2, or via Doug Fister’s game 3. (If you are currently protesting that Ramos’ bunt was very improbable, you are just tracking the series outcome and the prior reputations of the teams).

Also, in game 4, although the Nationals certainly struggled in innings 2 and 7 (loading the bases twice, walking in a run, and throwing a wild pitch), they kept themselves in the game by limiting the total damage to 3 runs. This fact would have played very well in “poise” articles written in the scenario where the Nationals went on to win. It is now somewhat difficult for us to see poise at work in those innings. But again, this is perception shaded by the outcome. This illustrates how in baseball the attribution of poise just tracks who won a close game or series.

The poise theory cherry-picks parts of games; it also cherry-picks parts of plays. In the seventh inning of game 4, after his wild pitch, Aaron Barrett threw a ball over the catcher Ramos’ head; they were trying to walk the batter. But Ramos was able to recover the ball, Barrett covered the plate; and, in a poised, well-executed play, they threw out Buster Posey at the plate, thus preventing another run.

In game 2, with Drew Storen pitching in the 9th, Pablo Sandoval hit a ball down the left-field line which scored one run, which tied the game, and which threatened to score two. But the Nationals made two accurate throws starting from deep left field, and a good tag at the plate, to get Buster Posey (again, so to speak) at the plate.

The poise theory presumably gives Posey credit for pushing the action in close games; and here I agree. But we should also give credit to the Nationals for showing the poise, and, relatedly, the baseball fundamentals, to throw him out twice to prevent runs.

I think a normal look at poise finds it in abundance on both teams in this series. However, the baseball variant of this concept has a different logic. This variant just tracks the winner when the outcome is close.

In addition to the poise issue, there were other interesting aspects of the series.

Although the Nationals were regarded as the better team, the two clubs were not far apart with respect to many regular-season statistical measures.

Nationals batting (pitchers excluded):
.261 avg. / .330 oba / .407 slg. *** 107 wRC+, 151 HR *** 8.6% BB / 20.0% K

Giants batting (pitchers excluded):
.263 avg. / .319 oba / .401 slg. *** 107 wRC+, 128 HR *** 7.2% BB / 19.3% K

The two teams had very similar offenses, although the OBA and HR numbers represent real differences. Also, their K and BB rates cohere (to a small degree) with the idea that the Giants are more of a contact hitting team, in that they swung more (i.e., walked less) and struck out less than the Nationals. One suggestion I’ll make below is that some of the Nationals should have swung a bit more.

Turning to pitching, although the Nationals came in with a better pitching reputation, and although the Nationals have better pitching, this point is not straightforwardly validated by the full range of ERA-like measures made available by contemporary analysis:

Nationals: 3.03 ERA / 3.18 FIP / 3.43 xFIP
Giants: 3.50 ERA / 3.58 FIP / 3.59 xFIP

The pitching stats converge as we move to measures which factor out balls in play (roughly, FIP) and then factor out the home run/fly ball rate (roughly, xFIP).
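For readers who want to see what “factoring out” means here, this is a sketch of the commonly published FIP formula, with xFIP as the same calculation using an expected home run total. The constant of 3.10 is a placeholder assumption (the real constant is set each season so that league FIP matches league ERA), and the pitching line is made up.

```python
def fip(hr, bb, hbp, k, ip, constant=3.10):
    """Fielding Independent Pitching: uses only HR, BB, HBP, and K.

    The constant is a placeholder; the published value varies by season.
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant

def xfip(fb, lg_hr_per_fb, bb, hbp, k, ip, constant=3.10):
    """Same as FIP, but replaces actual HR with fly balls times the
    league-average home run per fly ball rate."""
    return fip(fb * lg_hr_per_fb, bb, hbp, k, ip, constant)

# Hypothetical staff-like line, for illustration only.
example_fip = fip(hr=20, bb=50, hbp=5, k=180, ip=200)
example_xfip = xfip(fb=180, lg_hr_per_fb=0.10, bb=50, hbp=5, k=180, ip=200)
print(example_fip, example_xfip)
```

Nothing about hits on balls in play enters either function, which is why two staffs with different ERAs can look much more alike by these measures.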

FIP and xFIP bring the teams together; so do somewhat blunter measures like runs allowed per game:

Nationals: 3.43
Giants: 3.79

The teams’ xFIPs were very close, and they were closer than I would have guessed in terms of Runs Allowed. The Nationals had a better record, but I think this was due in part to the Giants just playing the Dodgers more! These teams were closer than the lead-in fanfare communicated.

I’ll offer two observations about the Nationals’ hitting, both of which cut somewhat against the playoff poise theory. The first is that while the Nationals’ offense certainly has a high-gear mode, this is not the only face they present to the world on an ongoing basis. For instance, the non-pitchers were .252 avg. // 101 wRC+ in the first half of this season…vs. a .273 avg. // 115 wRC+ in the second half of the season.

The streakiness is due in part to a group of more or less low-average, high-power players (LaRoche, Desmond, Ramos). These players are somewhat prone to 4-0-0-0 nights anyway, and in the playoff series the Giants appeared to have good plans for them. My subjective recollection is that there were many at-bats when these players were not close to getting a hit.

But what about the Nats’ better hitters? I am thinking of Rendon and Werth in particular, and again the Giants appeared to have a plan. Here I do have a concrete suggestion about what was going on. Werth and Rendon each had 20 plate appearances in the series, and they both had 10 appearances where they took the first pitch as a called strike. This may be a surprise to you, but I doubt it’s a surprise to the Giants. Werth and Rendon are both deliberate hitters, and I think the Giants resolved to take advantage of this, and to keep throwing early strikes until Werth and Rendon made them pay.

Of course, Rendon batted .368 for the series, and Werth batted .056. However, Rendon’s hits were all singles, from a 21 HR / 39 2B hitter. The Giants gained an edge here – in a specific, tangible way – and Rendon and Werth didn’t make the requisite adjustment. But this is one piece of a story which could easily have been different. For example, Rendon hit a very deep fly ball in extra-innings game 2, which might have made it to the wall or farther in different wind conditions. Werth had similar misfortune on deep fly balls, the most memorable of which was Hunter Pence’s excellent catch late in Game 4.

The Giants deserve credit for executing a good approach against the Nationals’ hitters. On the other side, the Giants did not exactly light up the Nationals’ pitching. After game 2, the Giants did not score a run off a hit. So I suspect that the Nationals’ pitchers executed similar strategies as well. These layers of the competition are more remote to those of us who observe the game from the outside; but they are probably more significant than psychological differences between the teams.

What about Bryce Harper?

Bryce Harper did more than exhibit poise in this series. Bryce Harper displayed the superlative animal dynamism which our games can extract from us and showcase, the best they can offer. More than any other player, Harper elevated a series marked largely by deadlock and attrition. A series like that does require poise, which both teams showed. A series like that is exciting, but not transcendent. Poets celebrate poise when a contest offers little other inspiration.

OK, what are the proper takeaways?

Boswell writes

If you send the winning run home on a wild pitch (Aaron Barrett); if you can’t field a two-hop grounder back to the mound (Gio Gonzalez); if three players look at each other and none of them picks up a sacrifice bunt attempt (Gonzalez, Anthony Rendon, Ramos); if you can’t throw a strike with the bases loaded and walk home a run (Gonzalez); if you get confused and throw home when no Giant is actually running toward the plate (LaRoche), squandering an out, then you have no business staying at baseball’s October party.

Amen! But why not issue a similar edict against the Giants, who, again, did not score a run off a hit in the last two games? Out of context, that doesn’t sound like a terribly promising formula either.

Boswell also draws an analogy to golf: “Right now, the Nationals are like professional golfers who win a bunch of weekly Tour events but falter under the pressure in major championships.” His remark connects us with a long-running discussion in golf about competitors with various records in the majors (the Masters, the US Open, the British Open, and the PGA Championship) and in regular events. This discussion of golf players is characterized by an all-too-familiar blend of mythology, pop psychology, and information gaps. Nonetheless, I think there are instructive parallels between the majors and the baseball playoffs, which help us understand the recent Nationals-Giants series, and perhaps offer some lessons for the Nationals looking ahead.

Boswell’s peroration about disqualifying mistakes is wrong. Golfers win major tournaments despite serious, embarrassing, incriminating blow-ups. At Carnoustie’s 18th hole on Sunday of the 2007 British Open, Padraig Harrington twice hit his ball into a narrow, winding waterway, but ended up winning a playoff against Sergio Garcia. I am fine with the idea that you have no business trying to win a major if you find the water twice on the 18th hole. But this plausible moral stance is falsified by events. Similarly, in 1999, on the same final hole at Carnoustie, Jean van de Velde elaborated an even greater disaster; he blew a three-shot lead, but still qualified for a playoff.

The significance of an error depends on where you are in the competition and on what your opponents are doing. In a high-pressure situation, they may not be doing very much. At Carnoustie in 2007, the golf course and the moment got the better of everyone, in that the top three finishers (Harrington included) were a combined six over par for the last two holes. At Carnoustie in 1999, the course had been winning all week, in that no one finished under par for the tournament. In fact, van de Velde’s blow-up brought him back to a three-way tie for the lead at 6 over par. Looking at a different golf course, in the 2006 US Open, won by Geoff Ogilvy, the top four finishers all suffered serious damage on the final day, with Phil Mickelson and Colin Montgomerie taking double bogeys on the final hole.

The Nationals should work on their play inside the diamond, but they shouldn’t beat themselves up about it. Everyone is likely to screw up in the furnace of playoff pressure, including the Giants…who yielded two runs on one bunt.

Let’s say that an attrition contest is one in which even the winner takes a beating. Although this model is prominent in major golf, it is not universal. (I’m sure it isn’t in baseball either. But I have a better grasp of recent golf). Some players get a lead early and are never seriously threatened. Many of Tiger Woods’ victories fit this pattern. A recent, more mortal example is Martin Kaymer’s 8-shot victory in the 2014 US Open.

Another interesting major winner is Charl Schwartzel, who birdied the final 4 holes at the 2011 Masters, to resolve a highly fluid final-day horse race in which 8 different players had at least a tie for the lead at different times during the day. Five past or future major winners finished behind Schwartzel in the top 10, as well as Rory McIlroy, who lost a two-stroke lead, shot an 80 for the day, and finished out of the top 10. (McIlroy won the next major in 2011 and has since won three more majors). Schwartzel elevated his play above his competitors at the climax of one of the world’s great sporting events. In this setting, against this group, poise is out as an explanatory variable. Schwartzel won with the sort of imperious dynamism which I have already praised as the most admirable character trait athletic competition reveals to us.

I think the Nationals can win an attrition playoff series, because they almost did. (Just ask the Giants in a candid moment). But playoff success for them is likely to go by a different path. A team which can post a second-half 115 wRC+ (pitchers excluded) without a healthy Ryan Zimmerman and Bryce Harper, while posting a team 2.96 ERA over the same period, may not need to change the way it plays. It may need to embrace the way it plays.

Less poetically, I’m optimistic about what the team can do with a full season of Zimmerman and Harper, Harper, Harper :-).


Roster Doctor: Colorado Rockies

It was a grim year for the Rockies, with the once proud franchise sagging to 96 losses, just ahead of the woeful Snakes in the NL West. For this Dan O’Dowd, one of baseball’s longest serving GM’s, was finally shown the door, resigning rather than accepting the inevitable blindfold and cigarette. Rockies player development director Jeff Bridich now takes the reins, and he has a daunting challenge as he seeks to reinforce Colorado’s status as a purple state.

Faced with numerous roster holes, Bridich will confront perhaps the biggest decision of his GM career almost immediately: whether to trade Troy Tulowi(t)zki. Tulo was having an epic offensive season (.340/.432/.603, with 21 HR in just 91 games) before injuries felled him, as they frequently do. In his 9-year career, Tulo has reached 600 plate appearances just 3 times. On the other hand, Tulo has failed to reach 5.0 bWAR (or, for the more traditionally minded, has failed to hit at least 20 HR) just 3 times. He recently turned 30, and is owed $20 million per year through 2019, during which his performance will inevitably decline as time’s relentless march claims another career. His contract will pay $14 million in 2020, followed by what will likely be a $4 million team buyout.

Trading Tulo is probably the only way the Rockies could even attempt to obtain young, impact starting pitchers who are at or near major-league ready. And the Rox staff is bad. Yes, Coors continues to waterboard pitchers, but the Rox were bad on the road too, regardless of your statistical weapon of choice (last in ERA, last in FIP, and 24th in xFIP). Bridich will need to examine innovative options (humidors? animal sacrifices? precision air strikes?) to aid in constructing an effective staff, but he’ll also need to at least consider trading the Rockies’ only real star.

The Mets, Reds, and Marlins have holes at SS and (perhaps) high-end pitching to trade, although only the Mets have it in quantity. What none of these teams probably has, however, is the will to take on a huge contract. Tulowitzki doesn’t have a no-trade clause, but the high value (both total and average annual) of his contract tends to act like one. If the Rockies could pry one or two of the Mets’ top young starters away, they should probably make the trade, but in the absence of that (and the Mets seem much more likely to trade with the Cubs, who have a glut of young, cheap, and potentially very good middle infielders), the Rockies should hold onto Tulo, and my guess is that they will. He has a legitimate shot at the Hall of Fame and is either still in his prime or just slightly past it.

This puts increased emphasis on finding solutions from the farm; that the team’s owners promoted Bridich, the player development chief, to the GM’s role suggests they have some confidence in the system he has overseen. The reviews this year on the pitching front are mixed: 3 of the Rockies’ top 5 prospects, as ranked by Baseball America during the preseason, were pitchers. Of those, Jon Gray (#1) had a good but not great year at AA Tulsa. His modest 3.91 ERA was worse than the team’s mark, but he was the youngest pitcher on the staff and his peripherals stacked up well. Eddie Butler (#2) on the other hand went backwards, as his strikeouts disappeared. While posting a decent 3.58 ERA at Tulsa, he only managed a 5.2 K/9 rate. Chad Bettis (#5) has already been moved to the pen, where he put up 24 Innings of Horror in the majors. Danny Winkler, not among BA’s Top 30 Rockies prospects, had a breakout year at Tulsa, posting a 1.41 ERA and strikeout and walk rates of 9.1 and 2.2, respectively. This is, however, about it; there aren’t many other horses in this cavalry brigade. It’s likely that none of these guys will develop into a true ace (though Gray still has an outside shot), but as the Orioles have demonstrated this year, it is possible to win without having a starter who even sniffs the Cy Young race.

Another, but probably more tractable, problem is the Rockies’ offensive ineptitude on the road. This isn’t solely because of a drop in power; Rockies hitters on the road this year were last in on-base and 26th in slugging, leading to a wOBA of .278 on the road, better only than the San Diego Padres. Since their last postseason appearance in 2009, the Rox have been rock-bottom in road wOBA.

The good news for Bridich is that the damage isn’t uniformly spread throughout the batting order. Tulowitzki, Justin Morneau, Michael Cuddyer, Corey Dickerson, and Nolan Arenado were all effective on the road in 2014, with Arenado having the lowest road wOBA among that group at .314, a respectable mark compared to the MLB average of .310. The rest of the lineup was … well … let’s just go to the numbers (2014 wOBA):

Wilin Rosario     .235
DJ LeMahieu       .240
Carlos Gonzalez   .242
Ray Oyler         .252

That’s Ray Oyler’s wOBA for his “career year” of 1967. Alert readers will have noted that Oyler did not in fact play for the Rockies in 2014, but his demon spawn did. Even Kershaw would struggle to win games with a 3-Oyler lineup behind him. Each of these guys presents a slightly different problem, so let’s take them in turn.

Wilin Rosario had a face-plant campaign for most of the year, but rallied at the end to put up batting and on-base averages (.267/.305) pretty close to his career numbers.  His power, however, receded (.435 SLG, compared to a career rate of .483). And oh my oh my oh my was he bad on the road, as Scott Strandberg covered in detail a few days ago. But there is some hope; while Rosario has always been weaker on the road than at home, he’s never been anywhere close to his abysmal 2014 performance. For his career (from 2012-2014) his road wOBAs are .305, .342, and <gulp!> .235 (I’m leaving out 24 PAs in 2011).

As Strandberg noted, Rosario actually improved his plate discipline this year, while dealing with rumors that he would eventually be forced to move to first because of his subpar catching skills. I’d be willing to bet that his late-season surge (.470 wOBA in September) was a sign that the swing-tinkering (if that’s what it was) was beginning to take effect, and that Bridich won’t write off his starting catcher based on 184 road PAs, even 184 as bone-chilling as Rosario’s last year. But the team will need to work with Rosario to either improve his fielding enough to keep him behind the plate long-term, or to improve his hitting enough to justify a move to first.

DJ LeMahieu can’t hit on a train. He can’t hit on a plane. He can’t hit a la mode. He can’t hit on the road. From 2012 – 2014, LeMahieu had the third worst wOBA on the road among players with more than 500 road appearances:

Darwin Barney     .237
J.P. Arencibia    .259
DJ LeMahieu       .260

Like Rosario, LeMahieu had some success on the road in the past, but much less of it. For the last three years, LeMahieu’s road wOBAs are .318, .252, and .240. He’s an excellent defender with plus speed who puts up ok numbers in Coors, but this skill set fits much better on the bench. Unlike Rosario, LeMahieu’s 2014 road performance was very much in character. It’s time for the Rox to look elsewhere for their second baseman. Minor leaguer Taylor Featherston might be able to help by the 2015 All-Star Break.

Carlos Gonzalez is a two-time All-Star who is only 28. He also hit like Ray Oyler on the road this year, which entirely accounts for his disappointing 2014 results. He was still very effective at home, posting a .407 wOBA in what was clearly his worst overall season. His road wOBA in 2014 was a full 80 points below his career road number. Some of this (perhaps a lot) is down to bad luck. CarGo had a minuscule .181 BABIP on the road, and he struggled (as usual) with injuries. It’s possible that he had the bad luck to suffer more from these on the road, or that the Rox medical staff did a better job keeping him healthy at home. In any case, Gonzalez is a much better player than his ghastly road numbers this year would suggest, and the Rockies have few alternatives available, in part because CarGo will be hard to trade after this down year. Their best bet here is to stay the course, and to give the plate appearances he inevitably misses to Corey Dickerson if Dickerson’s not starting in center.

Bridich starts his new job with a wonderful ballpark, enthusiastic and knowledgeable fans, and a media market relatively free of piranhas. He won’t face pressure to make splashy moves, which is good, because he doesn’t have many to make.


Job Posting: Manager of IT and Technical Support, TrackMan Baseball

Manager of IT and Technical Support

TrackMan Baseball is looking for a resourceful, innovative, self-starter to take ownership of IT and Technical support for our network of stadium and remote data collection systems.

 

About TrackMan

TrackMan develops, manufactures and sells 3D ball flight measurement equipment used in a variety of sports. Today, TrackMan is the world leader in golf ball flight and club data measurements and the company is considered to have set the industry standards for accuracy in golf and baseball.

 

TrackMan Baseball measures stuff - the location, trajectory and spin rate of pitched and batted balls – and provides real-time feedback for coaching and a new set of statistics for analyzing player performance. TrackMan Baseball is used by the majority of Major League baseball teams and premier NCAA, international and amateur baseball programs. Additionally, TrackMan is used for R&D, marketing, and media purposes by equipment manufacturers to develop more effective products and broadcasters to enhance content and analytical capabilities.

 

Position Description / Responsibilities

Candidate will be responsible for overseeing and maintaining internal IT and Cloud services, and for supporting a network of distributed systems located in Major League, Minor League and NCAA baseball stadiums, and at amateur baseball tournaments. Responsible for effective installation/configuration, operation, and maintenance of systems hardware and software, proactive monitoring of critical and network systems, and troubleshooting. Candidate will support the company in the overall design and implementation of IT systems.

 

Responsibilities include, but are not limited to the following

  • Optimize, develop and implement monitoring efforts and system building.
  • Design, develop and document solutions for troubleshooting
  • Interact with internal and external IT and non-IT personnel when setting systems and diagnosing problems.
  • Manage a team, set schedules and develop escalation policies for a network operations center

Required skills:

  • Comfortable working on Windows & UNIX operating systems
  • Proficient with backup and disaster recovery plans
  • Experience with system building and automation
  • Strong organizational, analytical and problem solving skills
  • Strong ability to multi-task /change focus quickly, ability to deal with unexpected events
  • Strong technical documentation skills

Desired Skills

  • Experience in programming both scripted and compiled languages.
  • Proficient with Microsoft SQL Server, working knowledge of relational database.
  • Knowledge of No-SQL databases
  • Experience with Cloud Services like Azure and Amazon

 

Education and Work Experience

  • Degree in Computer Science or related field experience.
  • 2+ years of experience managing IT

Location, Compensation & Application

Location: This position is full time and based in Stamford, CT.

Compensation: Commensurate with experience.

Application: Send resume and cover letter to: np@trackman.dk

 

About TrackMan Inc.

TrackMan Inc. is a US based subsidiary of TrackMan A/S.

 

TrackMan A/S has developed a range of products for the golf market and is considered the gold standard in measurement of ball flight and swing path. TrackMan’s golf products are used by top touring professionals, teaching pros, broadcasters and governing bodies.

 

TrackMan Inc. is based in Stamford, CT, about 30 miles north of New York City. TrackMan, Inc. introduced 3D Doppler radar technology to the baseball industry and the technology is now used by more than half of Major League Baseball teams. TrackMan, Inc. is revolutionizing baseball data by measuring the full trajectory of both the pitch and hit and has been featured in publications such as the New York Times, Sports Illustrated and ESPN.

Job Posting: Software Development Intern, TrackMan Baseball

Software Development Intern, TrackMan Baseball
 
Join our team as a Software Development Intern at TrackMan Baseball, a US based sports technology firm.  You will take on a critical role in a small, fast moving entrepreneurial company that is breaking new ground in sports.
 
In this position, you will be a contributor on the application development team and work on projects that are actively used within and outside of the organization.
 
REQUIREMENTS:
  • Proficiency in an object-oriented programming language such as Python, C#, Java, etc.
  • Ability to work independently and collaboratively
  • Strong attention to detail and ability to work well with others
DESIRED SKILLS AND EXPERIENCE
  • Bachelors or Masters degree in Computer Science or a related field.
  • Strong knowledge of relational and non-relational databases such as SQL and MongoDB
  • Experience working with large baseball related data-sets.
  • R or another scripting language experience is a plus.
This is a great opportunity for someone who wants to break into the baseball community and get experience with data available exclusively to professional baseball teams.  Full training is provided and you’ll have the opportunity to work closely with all members of the TrackMan staff and interface with our partner teams.  Weekend availability is important.
 
To apply, send a resume to np@trackman.dk.  No phone calls please.
 
Compensation:
This is a paid internship.
 
About TrackMan Inc.
TrackMan Inc. is a US based subsidiary of TrackMan A/S.
 
TrackMan A/S has developed a range of products for the golf market and is considered the gold standard in measurement of ball flight and swing path. TrackMan’s golf products are used by top touring professionals, teaching pros, broadcasters and governing bodies.
 
TrackMan Inc. is based in Stamford, CT, about 30 miles north of New York City. TrackMan, Inc. introduced 3D Doppler radar technology to the baseball industry and the technology is now used by more than half of Major League Baseball teams. TrackMan, Inc. is revolutionizing baseball data by measuring the full trajectory of both the pitch and hit and has been featured in publications such as the New York Times, Sports Illustrated and ESPN.
 
http://www.hardballtimes.com/tht-live/trackman-baseball/
http://www.si.com/more-sports/2011/04/12/fastballs-trackman
http://www.businessweek.com/articles/2014-03-04/major-league-baseball-unveils-an-even-newer-player-tracking-system

The Mariners’ Short Window

The Mariners are in a tough spot.

In 2014, the AL West was baseball’s best division. Yes, Oakland mortgaged its future at the deadline. Yes, the Angels’ minor league system looks weak. Yes, the Rangers aren’t guaranteed to snap back next year and have a healthy, competitive roster. Yes, the Astros aren’t there yet. There will be prominent sports writers picking the M’s to win their division next year, and they will likely get bandied about as a dark horse. But…the Mariners have been baseball’s ninth-best club by BaseRuns and only the third-best in their own division. Next year’s A’s and Angels shouldn’t be drastically different, either.

What makes the Mariners’ situation so tough, though, is their own muddled roster construction. The M’s had a historically good year at preventing runs but still found themselves right on the edge of contending. In large part that’s because they can’t hit, and the biggest reason they can’t hit is that they have only one average or better right-handed bat, Austin Jackson. Aside from Jackson, the M’s outfield has given big chunks of playing time to four different lefties: Dustin Ackley, Endy Chavez, Michael Saunders, and James Jones.

Their biggest hole, however, has been at 1B/DH, and this isn’t a new thing for the M’s. Last year they received solid production from Kendrys Morales and an average campaign from Justin Smoak, but neither has been anywhere near effective this year. The only bright spot this year has been Logan Morrison with his wRC+ of 110. In sum, the Mariners actually had a historically terrible year from their DHs, and that was nothing new.

Looking to the minors, there is hope. 2013 1st rounder DJ Peterson has already made his way to AA, but may start 2015 back in Jackson after posting a .261/.335/.473 line in 248 PAs. Peterson is a fringe candidate to contribute for a stretch run, but probably won’t be a significant contributor for quite some time. In fact, former Rutgers defensive back Patrick Kivlehan may contribute to the big league club sooner after crushing AA pitching with a .300/.374/.485 line in 430 PAs.

But things get trickier as we look toward the offseason.

When the Mariners signed Robinson Cano, they rapidly accelerated the timeline for fielding a competitive team. While Cano and Felix will still be around when Peterson and 2014 1st rounder Alex Jackson are, theoretically, contributing to the big league club, neither is likely to be better than they are now. Both have had incredible seasons, but realistically both players can only get worse.

The window gets even shorter when you consider that Hisashi Iwakuma, Austin Jackson, and Fernando Rodney will be eligible for free agency after the 2015 season. Couple them with Felix, Cano, and a cost-controlled Kyle Seager, and the M’s, who should have about $20 million in budget flexibility next year after arbitration raises, might be best poised to try and seriously compete next season.

Any big trade or free-agent splash, however, is going to block playing time, and if that sounds like a familiar situation for this club, that’s because it is. When they signed Cano, it gutted Nick Franklin’s value, and it took Jack Z almost eight months to make a trade.

The best place for the M’s to look would be for a bat-first, right-handed outfielder who can platoon with Michael Saunders and play DH against righties. Torii Hunter would be a great fit, although he alone probably wouldn’t be enough. Manager Lloyd McClendon has repeatedly referred to the need for two bats.

The M’s could also try to use their prospect surplus to land a more impactful player. In Brad Miller and Chris Taylor the M’s have two capable (if not quite good) shortstops at the big-league level, and there had reportedly been lots of interest in Dustin Ackley at the trade deadline even before his strong second half. It wouldn’t be surprising to see the M’s try to match up with the Dodgers for Matt Kemp (with a lot of swallowed salary) or the Red Sox for a piece of their crowded outfield. Shane Victorino would be a great fit on the M’s and could be out of a job. In DJ Peterson, Taijuan Walker, and James Paxton, the M’s also have chips to land a guy like Yoenis Cespedes, but Jack Z has (wisely) shied away from moving a piece of that caliber.

But if the M’s stand pat, they probably won’t be good enough next year. Chris Young may not be a good pitcher, and regardless he will be looking for a raise and will likely be elsewhere next season. The M’s don’t have much depth behind what still looks to be a strong group in Felix, Iwakuma, Roenis Elias, James Paxton and Taijuan Walker. As it stands right now, their 2015 DH is Logan Morrison and their first baseman is Justin Smoak, but the M’s will have to choose between a $3.6M team option and a $200k buyout on Smoak, and his Mariners days are probably over.

The M’s could write off Kendrys Morales’ 2014 struggles as a result of missing spring training, but his batted ball distance in August and September is down 12 feet from last year, and generally follows what is known of the aging curve for first basemen. Kendrys’ power, at this stage, is probably in the 15-20 home run range, and along with his 49% GB rate, terrible base running, and mediocre defense, that’s not a strong package. What all this means is that, just like last winter, Kendrys will probably look for a lot more than he’s worth, and it wouldn’t be a good gamble for the M’s to be the ones to pay him, even if it’s only a couple million.

In 2018, when the Mariners will theoretically feature DJ Peterson, Alex Jackson, Taijuan Walker and James Paxton in their primes, Oliver thinks Cano will be worth 2.8 WAR. On the plus side, Felix will still only be 32 years old and, theoretically, just beginning his decline phase. Kyle Seager will be eligible for free agency after the 2017 season, so he will either be gone, expensive, or not very good. And even without Seager, the M’s have $50 million committed to Cano and Felix.

As a Mariners fan, it’s been a blessing to watch Cano this year after so many years of offensive mediocrity, but this is the predicament the Mariners have put themselves into with his signing. The M’s were supposed to be about a .500 club this year, and even if you look optimistically at their improvement, put faith in Brad Miller breaking out next year, and call Ackley and Morrison’s strong second halves improvement rather than streaks, this club still needs some work.

And, from the looks of things, the Mariners are going to hurt themselves no matter what road they take. Spend now, and they inhibit playing time and take away from extensions for guys like Seager and Paxton. Trade now and they potentially strike out big. The most likely course is that they pursue players like Delmon Young and Michael Cuddyer, hoping for a big year. Jack Z has repeatedly played the high-risk, low-cost card for his clean-up hitters, from Russell Branyan to Milton Bradley to, more recently, Corey Hart and Kendrys Morales. Jack Z has said the M’s will be reasonably aggressive pursuing free agents this winter, but even money may not be enough to lure talent to the Northwest.

While a 2015 Mariners club with Melky Cabrera and Victor Martinez would be a legitimate contender, and the M’s are flush in TV cash right now, Seattle was a hard sell even after their 116 win season in 2001. Team president Kevin Mather places the blame on the M’s tough travel schedule, but the (at least historically) tough hitting environment, cold and wet weather, and reported organizational dysfunction likely don’t help matters either.

In 2014 the M’s both led the league in attendance growth and failed to sell out important September games. This is a club that needs just a little bit more oomph. A 2018 Mariners club with Cano, Melky Cabrera, and Victor Martinez, however, probably isn’t very good. The 2014 trade deadline had been labeled as make-or-break for Jack Z, and this coming winter won’t be any different.


A Discrete Pitchers Study – Predicting Hits in Complete Games

(This is Part 2 of a four-part series answering common questions regarding starting pitchers by use of discrete probability models.  In Part 1, we dealt with the probability of a perfect game or a no-hitter. Here we deal with the other hit probabilities in a complete game.)

III. Yes! Yes! Yes, Hitters!

Rare game achievements, like a no-hitter, will get a starting pitcher into the record books, but the respect and lucrative contracts are only awarded to starting pitchers who can pitch successfully and consistently. Matt Cain and Madison Bumgarner have had this consistent success and both received contracts that carry the weight of how we expect each pitcher to be hit. Yet, some pitchers are hit more often than others and some are hit harder. Jonathan Sanchez had shown moments of brilliance but pitch control and success were not sustainable for him. Tim Lincecum had proven himself an elite pitcher early in his career, with two Cy Young awards, but he never cashed in on a long term contract before his stuff started to tail off. Yet, regardless of success or failure, we can confidently assume that any pitcher in this rotation or any other will allow a hit when he takes the mound. Hence, we should construct our expectations for a starting pitcher based on how we expect each to get hit.

An inning is a good point to begin dissecting our expectations for each starting pitcher because the game is partitioned by innings and each inning resets. During these independent innings a pitcher’s job is generally to keep runners off the base paths. We consider him successful if he can consistently produce 1-2-3 innings, and we should be concerned if he instead produces innings with an inordinate number of base runners; whether or not the base runners score is a different issue.

Let BR be the base runners we expect in an inning and let OBP be the on-base percentage for a specific starting pitcher, then we can construct the following negative binomial distribution to determine the probabilities of various inning scenarios:

Formula 3.1

If we let br be a random variable for base runners in an inning, we can apply the formula above to deduce how many base runners per inning we should expect from our starting pitcher:

Formula 3.2

The resulting expectation creates a baseline for our pitcher’s performance by inning and allows us to determine if our starting pitcher generally meets or fails our expectations as the game progresses.
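The formula images are not reproduced here, so the following is only a minimal sketch of the standard negative binomial form the text describes, not necessarily the author's exact Formulas 3.1 and 3.2. It assumes every plate appearance is independent and ends either with the batter on base (probability OBP) or with one out, so double plays and outs on the bases are ignored. The function names and the OBP value of 0.2913 are assumptions; that OBP was back-solved from the table and is not stated in the article.

```python
from math import comb


def p_base_runners(k: int, obp: float, outs: int = 3) -> float:
    """Probability of exactly k base runners before `outs` outs are made."""
    return comb(k + outs - 1, k) * obp**k * (1.0 - obp) ** outs


def expected_base_runners(obp: float, outs: int = 3) -> float:
    """Mean of the distribution above: outs * OBP / (1 - OBP)."""
    return outs * obp / (1.0 - obp)


obp = 0.2913  # assumed value, back-solved from Table 3.1; not given in the article
p0 = p_base_runners(0, obp)
p1 = p_base_runners(1, obp)
p2_plus = 1.0 - p0 - p1  # everything that is not 0 or 1 base runners
print(round(p0, 3), round(p1, 3), round(p2_plus, 3),
      round(expected_base_runners(obp), 3))
```

With this assumed OBP the four rounded values agree with the Bumgarner column of Table 3.1, which suggests the sketch is at least consistent with the published numbers.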

Table 3.1: Inning Base Runner Probabilities by Pitcher

| | Tim Lincecum | Matt Cain | Jonathan Sanchez | Madison Bumgarner |
|---|---|---|---|---|
| P(0 Base Runners) | 0.333 | 0.352 | 0.280 | 0.356 |
| P(1 Base Runner) | 0.307 | 0.310 | 0.290 | 0.311 |
| P(≥2 Base Runners) | 0.360 | 0.338 | 0.430 | 0.333 |
| E(Base Runners) | 1.326 | 1.250 | 1.586 | 1.233 |

Based upon career OBPs through the 2013 season, Bumgarner would have the greatest chance (0.356) of retiring the side in order and would be expected to allow the fewest base runners, 1.233, in an inning; Cain should have comparable results. The implication is that Bumgarner and Cain represent a top tier of starting pitchers who are more likely to allow 0 base runners than either 1 base runner or 2+ base runners in an inning. A pitcher like Lincecum, expected to allow 1.326 base runners in an inning, represents another tier who would be expected to pitch from the windup (for an entire inning) in approximately ⅓ of innings and from the stretch in ⅔ of innings. Sanchez, on the other hand, represents a lower tier of starting pitchers who are more likely to allow either 1 or 2+ base runners than 0 base runners in an inning. He has the lowest chance (0.280) of having a 1-2-3 inning and would be expected to allow the most base runners, 1.586, in an inning.

As important as base runners are as potential runs, the hits and walks that make up the majority of base runners reflect two disparate skills. Hits generally result from pitches in the strike zone and demonstrate an ability to locate pitches; conversely, walks result from pitches outside the strike zone and show a lack of command. Hence, we’ll create an expectation for hits and another for walks for our starting pitchers to determine if they are generally good at preventing hits and walks or prone to allowing them in an inning.

Let h, bb, and hbp be random variables for hits, walks, and hit-by-pitches and let P(H), P(BB), P(HBP) be their respective probabilities for a specific starting pitcher, such that OBP = P(H) + P(BB) + P(HBP). The probability of Y hits occurring in an inning for a specific pitcher can be constructed from the following negative multinomial distribution:

Formula 3.3

We can further apply the probability distribution above to create an expectation of hits per inning for our starting pitcher:

Formula 3.4

For walks, we do not have to repeat these machinations. If we simply substitute walks for hits, the probability of Z walks occurring in an inning and the expectation for walks per inning for a specific pitcher take the same form as the ones we deduced earlier for hits:

Formula 3.5

We could repeat the same substitution for hit-by-pitches, but the corresponding probability distribution and expectation are not significant.
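As a hedged illustration of the substitution described above, the sketch below uses the fact that the marginal of one event type in a negative multinomial is itself negative binomial: once the other on-base events are ignored, each hit-or-out trial is a hit with probability P(H) / (P(H) + P(out)), where P(out) = 1 - OBP. The same function then serves for walks. This is a standard result and may not be written the same way as the author's Formulas 3.3 to 3.5; the function names and the per-plate-appearance probabilities are hypothetical.

```python
from math import comb


def p_events_in_inning(count: int, p_event: float, p_out: float,
                       outs: int = 3) -> float:
    """P(exactly `count` events of one type before `outs` outs).

    Other on-base events are marginalized out, so only the event of
    interest and outs matter: q = p_event / (p_event + p_out).
    """
    q = p_event / (p_event + p_out)
    return comb(count + outs - 1, count) * q**count * (1.0 - q) ** outs


def expected_events(p_event: float, p_out: float, outs: int = 3) -> float:
    """Expected events of one type per `outs` outs: outs * p_event / p_out."""
    return outs * p_event / p_out


# Hypothetical per-plate-appearance probabilities (not from the article).
p_hit, p_bb, p_hbp = 0.220, 0.080, 0.010
p_out = 1.0 - (p_hit + p_bb + p_hbp)  # P(out) = 1 - OBP

hit_probs = [p_events_in_inning(y, p_hit, p_out) for y in range(4)]
walk_probs = [p_events_in_inning(z, p_bb, p_out) for z in range(4)]
print([round(p, 3) for p in hit_probs], round(expected_events(p_hit, p_out), 3))
print([round(p, 3) for p in walk_probs], round(expected_events(p_bb, p_out), 3))
```

Reusing one function for hits and walks mirrors the point in the text that only the input probability changes.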

Table 3.2: Inning Hit Probabilities by Pitcher

| | Tim Lincecum | Matt Cain | Jonathan Sanchez | Madison Bumgarner |
|---|---|---|---|---|
| P(0 Hits in 1 Inning) | 0.457 | 0.466 | 0.439 | 0.443 |
| P(1 Hit in 1 Inning) | 0.315 | 0.314 | 0.316 | 0.316 |
| P(2 Hits in 1 Inning) | 0.145 | 0.141 | 0.152 | 0.150 |
| P(3 Hits in 1 Inning) | 0.056 | 0.053 | 0.061 | 0.060 |
| E(Hits in 1 Inning) | 0.896 | 0.870 | 0.947 | 0.936 |

The results of Table 3.2 and Table 3.3 are generated through our formulas using career player statistics through 2013. Cain has the highest probability (0.466) of not allowing a hit in an inning while Sanchez has the lowest probability (0.439) among our starters. However, the actual variation between our pitchers is fairly minimal for each of these hit probabilities. This lack of variation is reaffirmed by the comparable expectations of hits per inning; each pitcher would be expected to allow approximately 0.9 hits per inning. Yet, we shouldn’t expect the overall population of MLB pitchers to allow hits this consistently; our results only indicate that this particular Giants rotation had a similar consistency in preventing the ball from being hit squarely.

Table 3.3: Inning Walk Probabilities by Pitcher

| | Tim Lincecum | Matt Cain | Jonathan Sanchez | Madison Bumgarner |
|---|---|---|---|---|
| P(0 Walks in 1 Inning) | 0.685 | 0.718 | 0.589 | 0.776 |
| P(1 Walk in 1 Inning) | 0.244 | 0.225 | 0.286 | 0.189 |
| P(2 Walks in 1 Inning) | 0.058 | 0.047 | 0.093 | 0.031 |
| P(3 Walks in 1 Inning) | 0.011 | 0.008 | 0.025 | 0.004 |
| E(Walks in 1 Inning) | 0.404 | 0.351 | 0.580 | 0.264 |

The disparity between our starting pitchers becomes noticeable when we look at the variation among their walk probabilities. Bumgarner has the highest probability (0.776) of getting through an inning without walking a batter and he has the lowest expected walks (0.264) in an inning. Sanchez, by contrast, has the lowest probability (0.589) of having a 0 walk inning and has more than double the walk expectation (0.580) of Bumgarner. Hence, this Giants rotation had differing abilities in targeting balls outside the strike zone or getting hitters to swing at balls outside the strike zone.

Now that we understand how a pitcher’s performance can vary from inning to inning, we can piece these innings together to form a 9 inning complete game. The 9 innings provide a complete depiction of our starting pitcher’s performance because they afford him an inning or two to underperform and the batters he faces each inning vary as he goes through the lineup. At the end of a game our eyes still gravitate to the hits in the box score when evaluating a starting pitcher’s performance.

Let D, E, and F be the respective hits, walks, and hit-by-pitches we expect to occur in a game, then the following negative multinomial distribution represents the probability of this specific 9 inning game occurring:

Formula 3.6

Utilizing the formula above we previously answered, “What is the probability of a no-hitter?”, but we can also use it to answer a more generalized question, “What is the probability of a complete game Y hitter?”, where Y is a random variable for hits. This new formula will not only tell us the probability of a no-hitter (inclusive of a perfect game), but it will also reveal the probability of a one-hitter, three-hitter, etc. Furthermore, we can calculate the probability of allowing Y hits or less or determine the expected hits in a complete game.

Let h, bb, hbp again be random variables for hits, walks, and hit-by-pitches.

Formula 3.7

Formula 3.8

Formula 3.9

The derivations of the complete game formulas above are very similar to their inning counterparts we deduced earlier. We only changed the number of outs from 3 (an inning) to 27 (a complete game), so we did not need to reiterate the entire proofs from earlier; these formulas could also be constructed for an 8 inning (24 outs), a 10 2/3 inning (32 outs), or any other performance with the same logic.
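To make the "change the number of outs" point concrete, here is a minimal sketch that parameterizes the hit distribution by outs. It assumes the batting average against can stand in for the per-at-bat hit probability, with walks and hit-by-pitches ignored, which is one reading of the approximation the article mentions and may differ from the author's Formulas 3.7 to 3.9. The function names and the 0.230 average are hypothetical.

```python
from math import comb


def p_hits(y: int, ba: float, outs: int = 27) -> float:
    """P(exactly y hits allowed before `outs` outs are recorded)."""
    return comb(y + outs - 1, y) * ba**y * (1.0 - ba) ** outs


def p_hits_at_most(y: int, ba: float, outs: int = 27) -> float:
    """P(y hits or fewer) as a running sum of the exact probabilities."""
    return sum(p_hits(k, ba, outs) for k in range(y + 1))


def expected_hits(ba: float, outs: int = 27) -> float:
    """Expected hits allowed: outs * BA / (1 - BA)."""
    return outs * ba / (1.0 - ba)


ba = 0.230  # hypothetical batting average against
for outs in (24, 27, 32):  # 8 innings, 9 innings, 10 2/3 innings
    print(outs,
          round(p_hits(0, ba, outs), 4),          # no-hitter over this many outs
          round(p_hits_at_most(3, ba, outs), 4),  # three hits or fewer
          round(expected_hits(ba, outs), 3))
```

Only the `outs` argument changes between an inning, a complete game, and any other outing length.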

Table 3.4: Complete Game Hit Probabilities by Pitcher using BA

| | Tim Lincecum | Matt Cain | Jonathan Sanchez | Madison Bumgarner |
|---|---|---|---|---|
| P(0 Hits in 9 Innings) | 0.001 | 0.001 | 0.001 | 0.001 |
| P(1 Hit in 9 Innings) | 0.006 | 0.007 | 0.004 | 0.005 |
| P(2 Hits in 9 Innings) | 0.023 | 0.026 | 0.017 | 0.018 |
| P(≤3 Hits in 9 Innings) | 0.060 | 0.067 | 0.046 | 0.049 |
| P(≤4 Hits in 9 Innings) | 0.124 | 0.137 | 0.099 | 0.105 |
| E(Hits in 9 Innings) | 8.062 | 7.833 | 8.526 | 8.420 |

The results of Table 3.4 were generated from the complete game approximation probabilities that use batting average (against) as an input. Any of the four pitchers from the Giants rotation would be expected to allow 8 or 9 hits in a complete game (or potentially 40 total batters such that 40 = 27 outs + 9 hits + 4 walks), but in reality, if any of them are going to be given a chance to throw a complete game they’ll need to pitch better than that and average fewer than 3 pitches per batter for their manager to consider the possibility. If we instead establish a limit of 3 hits or fewer to be eligible for a complete game, regardless of pitch total, walks, or game situation (not realistic), we could witness a complete game in at most 1 or 2 starts per season for a healthy and consistent starting pitcher (approximately 30 starts with a 5% probability). Of course, we would leave open the possibility for our starting pitcher to exceed our expectations by throwing a two-hitter, one-hitter, or even a no-hitter despite the low likelihood. There is still a chance! Managers definitely need to know what to expect from their pitchers and should keep these expectations grounded, but it is not impossible for a rare optimal outcome to come within reach.
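The back-of-the-envelope numbers in this paragraph can be checked with a few lines of arithmetic. The sketch below assumes each of the roughly 30 starts is an independent trial with the same 5% chance of a three-hit-or-better outing; independence and the variable names are assumptions layered on top of the article's figures.

```python
starts = 30        # approximate starts in a healthy season (from the text)
p_low_hit = 0.05   # approximate P(3 hits or fewer in 9 innings) (from the text)

# Expected number of qualifying starts in a season.
expected_low_hit_starts = starts * p_low_hit

# Chance of seeing at least one such start, assuming independent starts.
p_at_least_one = 1.0 - (1.0 - p_low_hit) ** starts

# Batters faced in the "expected" complete game described in the text.
batters = 27 + 9 + 4           # outs + hits + walks
pitches_at_three_each = 3 * batters

print(expected_low_hit_starts, round(p_at_least_one, 3),
      batters, pitches_at_three_each)
```

The expectation of 1.5 matches the "1 or 2 starts per season" estimate, and three pitches for each of 40 batters comes to 120 pitches.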


Progressive Pitch Projections

When examining a batter’s strike zone judgment, the analysis is typically done based on where the pitches passed the plane of the front of the strike zone. However, this analysis usually does not include a discussion of the pitches’ trajectories as they approached the plate, which influences whether or not a batter may choose to swing at a pitch. The aim of this research is to apply a simple model to project a pitch to the plane of the front of the strike zone, from progressively closer distances to home plate, and track how the projected location changes as the pitch nears the plate. In order to quantify the quality of a pitch’s projection as it approaches home plate, we will use a model for the probability of a pitch being called a strike to assess its attractiveness to a batter. While the focus of this will be the projections and results derived from them, a discussion of the strike zone probability model will be given after the main article.

To begin, we can start with a single pitch to explain the methodology. The pitch we will use was one thrown by Yu Darvish to Brett Wallace on April 2nd of 2013 (seen in the GIF below screen-captured from the MLB.tv archives) [Note: I started working on this quite a while ago, so the data is from 2013, but the methodology could be run for any pitcher or any year].

 photo Darvish_Wallace_P.gif

The pitch is classified by PITCHf/x as a slider and results in a swinging strikeout for Wallace. The pitch ends up inside on Wallace and, based purely on its final location, does not look like a good pitch to swing at, two strikes or not. In order to analyze this pitch in the proposed manner of projecting it to the front of the plate at progressively closer distances, we will start at 50 feet from the back of home plate (from which all distances will be measured) and remove the remaining PITCHf/x definition of movement (as is calculated, for example, for the pfx_x and pfx_z variables at 40 feet) from the pitches to create a projection that has constant velocity in the x-value of the data and only the effects of gravity deviating the z-value from constant velocity. This methodology is adopted from an article by Alan Nathan in 2013 about Mariano Rivera’s cut fastball. At a given distance from the back of home plate, the pitch trajectory between 50 feet and this point is as determined by PITCHf/x, and the remaining trajectory to the front of home plate is extrapolated using the previously discussed method.
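The projection step can be sketched as follows, assuming the usual PITCHf/x nine-parameter constant-acceleration fit (initial position, velocity, and acceleration at y = 50 feet). This is only an illustration of the description above, not the author's code: the `Pitch` class, the function names, and the sample parameter values are hypothetical, and keeping the fitted y-motion for the remaining flight time is an assumption.

```python
from dataclasses import dataclass
from math import sqrt

G = 32.174                  # gravitational acceleration, ft/s^2
FRONT_OF_PLATE = 17.0 / 12  # front of home plate, feet from the back tip


@dataclass
class Pitch:
    x0: float
    y0: float
    z0: float
    vx0: float
    vy0: float
    vz0: float
    ax: float
    ay: float
    az: float


def time_to_distance(p: Pitch, y: float) -> float:
    """Time after release-point fit at which the pitch is y feet from the plate."""
    if p.ay == 0.0:
        return (y - p.y0) / p.vy0
    return (-p.vy0 - sqrt(p.vy0**2 - 2.0 * p.ay * (p.y0 - y))) / p.ay


def project_to_plate(p: Pitch, y_from: float) -> tuple:
    """Follow the fitted path to y_from, then extrapolate to the plate with
    constant x-velocity and gravity as the only z-acceleration."""
    t = time_to_distance(p, y_from)
    tau = time_to_distance(p, FRONT_OF_PLATE) - t  # remaining flight time
    x = p.x0 + p.vx0 * t + 0.5 * p.ax * t * t
    z = p.z0 + p.vz0 * t + 0.5 * p.az * t * t
    vx = p.vx0 + p.ax * t
    vz = p.vz0 + p.az * t
    return x + vx * tau, z + vz * tau - 0.5 * G * tau * tau


# Hypothetical pitch parameters, for illustration only.
pitch = Pitch(-2.0, 50.0, 6.0, 5.0, -120.0, -4.0, 8.0, 25.0, -30.0)
for y in (50.0, 40.0, 30.0, 20.0, 10.0, FRONT_OF_PLATE):
    px, pz = project_to_plate(pitch, y)
    print(round(y, 3), round(px, 3), round(pz, 3))
```

Projecting from the front of the plate itself returns the fitted plate location, so the printed rows trace how the projection converges on the actual pitch.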

If we examine the above Darvish-Wallace pitch in this manner, the projection looks like this from the catcher’s perspective:

 photo Darvish_Wallace_XZ_250ms.gif

In the GIF, the counter at the top, in feet, represents the distance that we are projecting from. The black rectangular shape is the 50% called-strike contour, where 50% of the pitches passing through that point were called strikes, the inside of which we will call our “strike zone” (for a complete explanation of this strike zone, see the end of the article). Within the GIF, the blue circle is the outline of the pitch and the blue dot inside is the PITCHf/x location of the pitch at the front of the plate. The projection appears in red/green where red represents a lower-than-50% chance of a called strike for the projection and green 50% or higher. As one can see, early on, the pitch projects as a strike and as it comes closer to the plate, it projects further and further inside to the left-handed hitter. If we track the probability of the projection being called a strike, with our x-axis being the distance for the projection, we obtain:

 photo Darvish_Wallace_Probability.jpeg

Based on this graph, the pitch crosses the 50% called-strike threshold at approximately 29.389 feet (seen as a node on the graph). With this consideration, and the fact that the batter is not able to judge the location of the pitch with PITCHf/x precision, it seems reasonable that Brett Wallace might swing at this pitch.

We can also examine this from two other angles, but first we will present the actual pitch from behind as another point of reference:

 photo DarvishWallace_C.gif

Now, we will look at an angle which is close to this new perspective: an overhead view.

 photo Darvish_Wallace_XY_250ms.gif

The color palette here is the same as the previous GIF (blue is the actual trajectory in this case and red/green is as defined above) with the added line at the front of home plate indicating the 50% called-strike zone for the lefty batter. Note that since the scales of the two axes are not the same, the left-to-right behavior of the pitch appears exaggerated. The pitch projects as having a high probability of being called a strike early on and, around 30 feet, starts to project more as a ball.

From the side, the pitch has nominal movement in the vertical direction, and so the projection appears not to move. However, the color-coding of the projected pitch trajectory shows the transition from 50%+ called-strike region to the below-50% region.

 photo Darvish_Wallace_YZ_250ms.gif

With this idea in mind, we can apply this to all pitches of a single type for a pitcher and see what information can be gleaned from it. We will break it down both by pitch type, as identified by PITCHf/x, and the handedness of the batter. We will perform this analysis on Yu Darvish’s 2013 PITCHf/x data and compare with all other right-handed pitchers from the same year.

To begin, we will examine Yu Darvish’s slider, which, according to the data, was Darvish’s most frequently thrown pitch in 2013. Since we are dealing with a data set of over 1000 sliders, we will first condense the information into a single graph and then look at the data more in-depth. We will separate the pitches into four categories based on their final location at the front of the strike zone: strike (50%+ chance of being called a strike) or ball (less than 50%), and swing or taken pitch. We will take the average called-strike probability of the projections in each of these four categories and plot it versus distance to the plate for the projection.

For left-handed batters versus Darvish in 2013:

 photo Darvish_ST_BS_SL_LHB.jpeg

The color-coding is: green = swing/strike, red = take/strike, blue = swing/ball, orange = take/ball. Looking at just pitches that are likely to be called strikes, the pitches swung at have a higher probability of being called strikes throughout their projections, peaking at the node located at 12.167 feet (0.928 average called-strike probability for the projections) for swings and at 1.417 feet (0.91), the front of home plate, for pitches taken. The swings at pitches in the strike zone end at a 0.924 average called-strike probability. Both curves for pitches outside the strike zone peak very early and remain relatively low in terms of probability throughout the projection.

We can also group all swings together and all pitches taken together to get a two-curve representation.

 photo Darvish_ST_SL_LHB.jpeg

For sliders to lefties, the probability of a called strike is higher throughout the projection for swings compared to sliders taken. Similar to the previous graph, the swing curve peaks before the plate, at 20 feet with a 0.627 average called-strike probability and ends at 0.613, whereas the pitches taken peak at the front of the plate with a called-strike probability of 0.402.

To examine this in more detail, we can look at the location of the projections as the pitches move toward the plate, similar to the GIFs for the single pitch to Wallace. Using the same color scheme as the four-curve graph, we will plot each pitch’s projection.

 photo Darvish_Pitch_Proj_SL_LHB_250ms.gif

Of interest in this GIF is the observation that most swings outside the zone (blue) are down and to the right from the catcher’s perspective. In particular, based on the projections, there appears to be a subset of the pitches with a strong downward component of movement that are swung at below the strike zone, while most other pitches have more left-to-right movement. In addition, the pitches taken are largely on the outer half of the strike zone to lefties. To better illustrate the progressive contribution of movement to the pitches, we will divide the area around the strike zone into 9 regions: the strike zone and 8 regions around it: up-and-left of the zone, directly above the zone, up-and-right of the zone, directly left of the zone, etc. In each of these 9 regions, we will display the number of swings and number of pitches taken as well as the average direction that the projections are moving as more of the actual trajectory is added in, or in other words, the direction that the movement is carrying the pitch from a straight line trajectory, plus gravity, in the x- and z-coordinates.

 photo Darvish_Pitch_Proj_Gp_SL_LHB_250ms.gif

Note that the movement of the pitches is predominately to the right, from the catcher’s perspective, with some contribution in the downward direction. In the strike zone, the pitches taken have an average location to the left of those swung at. This may be due to the movement bringing the pitches into the strike zone too late for the hitter to react. Computing the percentage of swings in each region produces the following table:

 

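Darvish – Sliders vs. LHB (percentage of swings by region, catcher’s perspective)

| | Left | Middle | Right |
|---|---|---|---|
| Above | 10 | 25 | 0 |
| Middle | 12.9 | 62.8 | 12.5 |
| Below | 33.3 | 65.4 | 49.2 |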

 

From the table, where the middle square is the strike zone, we can see that the slider is most effective at inducing swings directly below the strike zone, a region with a higher percentage of swings (65.4%) than the strike zone itself (62.8%). (Note that some of these regions may contain small samples, but these can be distinguished by the above GIFs.) Next is the strike zone, followed by the region down-and-right of the strike zone (49.2%). Going back to the projections, pitches in the two aforementioned non-strike zone regions start by projecting near the bottom of the strike zone and, as they move closer to the plate, project into these two regions.
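A swing-percentage grid of this kind could be tallied along the following lines. The sketch replaces the article's 50% called-strike contour with a plain rectangle, which is a simplifying assumption, and the zone boundaries, function names, and sample pitches are all hypothetical.

```python
from collections import defaultdict

# Hypothetical rectangular stand-in for the 50% called-strike contour (feet):
# left edge, right edge, bottom edge, top edge, from the catcher's perspective.
ZONE = (-0.83, 0.83, 1.5, 3.5)


def region(x: float, z: float, zone=ZONE) -> tuple:
    """Return (row, column) of the 3x3 grid cell containing the pitch."""
    left, right, bottom, top = zone
    col = "left" if x < left else "right" if x > right else "middle"
    row = "above" if z > top else "below" if z < bottom else "middle"
    return row, col


def swing_percentages(pitches, zone=ZONE) -> dict:
    """Map each occupied region to the percentage of its pitches swung at.

    `pitches` is an iterable of (x, z, swung) with swung a bool.
    """
    tallies = defaultdict(lambda: [0, 0])  # region -> [swings, total pitches]
    for x, z, swung in pitches:
        tally = tallies[region(x, z, zone)]
        tally[0] += int(swung)
        tally[1] += 1
    return {key: 100.0 * swings / total
            for key, (swings, total) in tallies.items()}


# Made-up sample: two pitches in the zone, one below, two down-and-right.
sample = [(0.0, 2.5, True), (0.1, 2.0, False), (0.0, 1.0, True),
          (1.2, 1.0, True), (1.3, 0.9, False)]
print(swing_percentages(sample))
```

Regions with no pitches are simply absent from the result, which is one way to keep small or empty samples from showing up as misleading zeros.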

Putting these observations in context, the movement on the sliders from Yu Darvish to lefties may allow him to get pitches taken on the outer half of the plate, which is generally in the opposite direction of the movement, and swings on pitches down and inside, in the general direction of the pitch movement. This would signify that movement has a noticeable effect on the perception of sliders to lefties. Also of note is that the pitches up and left of the strike zone have very few swings among them, and those that were swung at are close to the zone. Again using movement as the explanation, the pitches project far outside initially and, as they near the plate, project closer to the strike zone, but not enough to incite a swing from a batter.

We can further illustrate these effects on the pitches outside the zone by treating the direction of the movement at 40 feet, taken from the PITCHf/x pfx_x and pfx_z variables, as a characteristic movement vector and finding the angle of it with the vector formed by the final location of the pitch and its minimum distance to the strike zone. So if the movement sends the pitch perpendicularly away from the strike zone, the angle will be 0 degrees; if the movement is parallel to the strike zone, the angle will be 90 degrees; and if the pitch is carried by the movement perpendicularly toward the strike zone, the angle will be 180 degrees. As an illustrative example, consider the aforementioned pitch from Darvish to Wallace:

 photo SZ_MVMT_Angle.jpeg

In this case, the movement vector of the pitch (red dashed vector) is nearly in the same direction as the vector pointing out perpendicular from the strike zone (blue vector). This means that the angle between the two is going to be small (here, it is 0.276 degrees). If the movement vector in this case were nearly vertical, lying along the right edge of the zone, the angle would be close to 90 degrees.
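The angle computation can be sketched as below, again approximating the strike zone by a rectangle (an assumption; the article uses the 50% called-strike contour). The outward vector runs from the nearest point on the zone to the pitch location, and the angle to the movement vector is taken with `atan2` for numerical stability. The zone boundaries and function names are hypothetical.

```python
from math import atan2, degrees, hypot

# Hypothetical rectangular stand-in for the strike zone (feet):
# left edge, right edge, bottom edge, top edge.
ZONE = (-0.83, 0.83, 1.5, 3.5)


def zone_vector(x: float, z: float, zone=ZONE) -> tuple:
    """Vector from the nearest point on the zone to the pitch location."""
    left, right, bottom, top = zone
    nearest_x = min(max(x, left), right)
    nearest_z = min(max(z, bottom), top)
    return x - nearest_x, z - nearest_z


def movement_angle(x: float, z: float, pfx_x: float, pfx_z: float,
                   zone=ZONE) -> tuple:
    """Return (angle in degrees, distance from zone in feet) for a pitch
    that finishes outside the zone at (x, z) with movement (pfx_x, pfx_z)."""
    out_x, out_z = zone_vector(x, z, zone)
    distance = hypot(out_x, out_z)
    if distance == 0.0:
        raise ValueError("pitch is inside the zone; the angle is undefined")
    dot = out_x * pfx_x + out_z * pfx_z
    cross = out_x * pfx_z - out_z * pfx_x
    return degrees(atan2(abs(cross), dot)), distance


# A pitch finishing to the right of the zone, with three movement directions.
print(movement_angle(1.5, 2.5, 1.0, 0.0))    # movement directly away from the zone
print(movement_angle(1.5, 2.5, 0.0, -1.0))   # movement parallel to the zone edge
print(movement_angle(1.5, 2.5, -1.0, 0.0))   # movement directly toward the zone
```

The three sample calls correspond to the 0, 90, and 180 degree cases described in the text.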

Taking the movement for all sliders thrown to lefties in 2013 by Darvish and finding the angle it makes relative to the vector perpendicular to the zone, we get the following hexplot:

 photo Darvish_Out_SL_LHB.jpeg

Summing up the hexplot in terms of a table:

 

Darvish – Sliders Outside the Zone v. LHB

| Angle | Percentage | Average Distance (ft) |
|---|---|---|
| Less Than 45 Degrees | 31.8 | 0.779 |
| Less Than 90 Degrees | 67.9 | 0.691 |
| All | X | 0.608 |

 

So 31.8% of the sliders thrown outside the strike zone to lefties had an angle of less than 45 degrees between the movement and the vector perpendicular to the strike zone. The average distance of these pitches from the strike zone was 0.779 feet. Loosening the restriction to less than 90 degrees, meaning that some component of the movement points away from the strike zone, 67.9% of pitches outside the zone met this criterion, with an average distance from the zone of 0.691 feet. Finally, for all pitches outside, the average distance was 0.608 feet.
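Given one angle and one distance per pitch, a summary table of this shape could be produced with a short helper. The function name, the row labels, and the sample data are hypothetical, and the sketch only mirrors the three rows described above.

```python
def summarize_angles(angles_deg, distances_ft) -> dict:
    """Return {label: (percentage of pitches, average distance in feet)}
    for angles under 45 degrees, under 90 degrees, and all pitches."""
    pairs = list(zip(angles_deg, distances_ft))
    total = len(pairs)
    rows = {}
    for label, limit in (("<45", 45.0), ("<90", 90.0), ("all", float("inf"))):
        subset = [dist for angle, dist in pairs if angle < limit]
        pct = 100.0 * len(subset) / total
        avg = sum(subset) / len(subset) if subset else float("nan")
        rows[label] = (pct, avg)
    return rows


# Made-up sample of four pitches outside the zone.
angles = [10.0, 50.0, 100.0, 170.0]
distances = [1.0, 0.5, 0.3, 0.2]
for label, (pct, avg) in summarize_angles(angles, distances).items():
    print(label, round(pct, 1), round(avg, 3))
```

The "less than 90" row includes the "less than 45" pitches, matching the nested way the article's percentages are reported.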

As a point of comparison, for all MLB RHP in 2013, the analogous plot and table are:

 

 photo MLB_Out_SL_RHP_LHB.jpeg

 

MLB RHP 2013 – Sliders Outside the Zone v. LHB

| Angle | Percentage | Average Distance (ft) |
|---|---|---|
| Less Than 45 Degrees | 25.3 | 0.652 |
| Less Than 90 Degrees | 52.6 | 0.624 |
| All | X | 0.606 |

 

Note that the range of possible angles is 0 to 180 degrees, with 25.3% lying in the 0-45 degree range and 52.6% in the 0-90 degree range. So based on this and examining the hexplot visually, the pitches are fairly uniformly distributed across the range of angles.

Comparing Darvish to other RHP in 2013, he threw his slider more in the direction of movement outside the zone. In particular, for angles less than 45 degrees, he threw his slider an average of 1.5 inches further outside compared to other MLB RHP. That disparity shrinks when restricting to less than 90 degrees and is virtually the same for all pitches outside.

While this observation on its own does not have much significance, we can look to see if this was an effective strategy by looking only at swings and seeing the effects.

 

 photo Darvish_Swing_Out_LHB.jpeg

 

Darvish – Sliders Swung At Outside the Zone v. LHB

| Angle | Percentage | Average Distance (ft) |
|---|---|---|
| Less Than 45 Degrees | 39.9 | 0.59 |
| Less Than 90 Degrees | 83.2 | 0.526 |
| All | X | 0.478 |

 

Examining both the hexplot and the table, Darvish induced most of his swings outside of the strike zone with pitches having their movement at an angle of less than 90 degrees relative to the strike zone. Note that when the pitch is thrown outside the zone in the general direction of movement (an angle of less than 90 degrees), the pitch can still induce the batter to swing, while pitches not thrown in this general direction are only swung at when very close to the zone. In particular, the majority of pitches that reach the farthest outside the zone and still lead to swings are in the range of 30 to 60 degrees. This is due to many of the swings outside the zone being below the strike zone, where the angle with the down-and-to-the-right movement will be in the neighborhood of 45 degrees.

For all MLB RHP in 2013, the hexplot for swings produces a similar result:

 photo MLB_Swing_Out_SL_RHP_LHB.jpeg

 

MLB RHP 2013 – Sliders Swung At Outside the Zone v. LHB

| Angle | Percentage | Average Distance (ft) |
|---|---|---|
| Less Than 45 Degrees | 31.8 | 0.436 |
| Less Than 90 Degrees | 64.3 | 0.421 |
| All | X | 0.405 |

 

From the hexplot, we can see that the majority of pitches swung at are at an angle of 90 degrees or less; 64.3% to be precise. For less than a 45-degree angle, the percentage is 31.8%. These are both up from the percentages from all pitches. As seen with the Darvish data, as the angle decreases, the average distance tends to increase.

Finally, for pitches not swung at outside the zone, we get a complementary result to the swing data:

 photo Darvish_Take_Out_SL_LHB.jpeg

 

Darvish – Sliders Taken Outside the Zone v. LHB

| Angle | Percentage | Average Distance (ft) |
|---|---|---|
| Less Than 45 Degrees | 26.3 | 0.976 |
| Less Than 90 Degrees | 57.4 | 0.854 |
| All | X | 0.696 |

 

Here, the percentages are lower than for swings and, while the largest distance is for small angles, there is a grouping of pitches present in pitches taken at angles greater than 90 degrees that is virtually nonexistent for swings. So for Darvish, throwing sliders outside the strike zone with an angle greater than 90 degrees does not appear to be a fruitful strategy, unless it plays a larger role in the context of pitch sequencing. To sum up this observation, it would appear that pitching in the general direction of movement outside the strike zone is a necessary but not sufficient condition for inducing swings from left-handed batters.

For MLB right-handed pitchers, this observation appears to still hold:

 photo MLB_Take_Out_SL_RHP_LHB.jpeg

 

MLB RHP 2013 – Sliders Taken Outside the Zone v. LHB

| Angle | Percentage | Average Distance (ft) |
|---|---|---|
| Less Than 45 Degrees | 22.1 | 0.809 |
| Less Than 90 Degrees | 46.7 | 0.765 |
| All | X | 0.708 |

 

As with Darvish, the percentages drop when comparing pitches taken to pitches swung at. The hexplot also bears this out, with the largest concentration of pitches taken outside the strike zone having an angle between movement and the strike zone vector of greater than 90 degrees. These results match in general with what we have seen with Darvish, and based on the numbers, Yu Darvish is able to play this effect to his advantage, with a larger-than-MLB-average percentage of sliders outside the zone to lefties with an acute angle.

Next, we will perform a similar analysis on sliders to righties. This will allow for comparison between the effects of the slider on batters from both sides of the plate.

 photo Darvish_ST_BS_SL_RHB.jpeg

Once again, for pitches in the strike zone, the sliders swung at by righties have a higher probability of being called strikes than those taken. The peak for swings at strikes occurs at 18.333 feet (v. 12.167 feet for LHB) with a 0.945 called-strike probability and ending at 0.931, and taken strikes at 13.667 feet (v. 1.417 feet for LHB) with a 0.892 probability and ending at 0.885.

 photo Darvish_ST_SL_RHB.jpeg

Just examining swings and pitches taken, the peak projected probability is earlier than for lefties at 26.25 feet with 0.672 probability and finishing at 0.629. It also peaks earlier for pitches taken, at 23.147 feet with peak and ending probabilities of 0.454 and 0.442, respectively. Comparing with the results for lefties, the RHB both swing at and take sliders with a higher probability of being called strikes, but have an earlier peak probability.

Breaking it down again in terms of the individual pitches:

 photo Darvish_Pitch_Proj_SL_RHB_250ms.gif

The plot here looks similar to that of the lefties. However, the pitches taken in the strike zone (red) appear more evenly distributed. In addition, the swings outside the zone (blue) appear to be more down and to the right and less directly below the strike zone. To confirm these observations, we can again simplify the plot to arrows indicating the direction of movement in each region and the number of each type of pitch in each region.

 photo Darvish_Pitch_Proj_Gp_SL_RHB_250ms.gif

The table below gives the percentage of swings on pitches in each of the nine regions for Yu Darvish’s sliders to RHB:

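Darvish – Sliders vs. RHB (percentage of swings by region, catcher’s perspective)

| | Left | Middle | Right |
|---|---|---|---|
| Above | 4.3 | 15 | 16.7 |
| Middle | 0 | 54.3 | 26.7 |
| Below | 38.9 | 42.1 | 46.3 |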

To confirm the first observation, note that the red arrow (pitches taken) virtually overlaps with the green arrow (pitches swung at) in the strike zone. Examining the table, the value that differs the most, among the reasonably populated regions, is directly below the strike zone (42.1% to RHB v. 65.4% to LHB). One possible explanation for this is that some of the sliders ending up in this region to LHB have a stronger downward component of the movement than for RHB. This can be seen by comparing the two GIFs.

Moving on to the results for the angle between the movement and the strike zone vector, the hexplot is heavily populated by pitches thrown in the direction of movement:

 photo Darvish_Out_SL_RHB.jpeg

Considering the same metrics for interpreting this plot as before:

Darvish – Sliders Outside the Zone v. RHB

| Angle | Percentage | Average Distance (ft) |
|---|---|---|
| Less Than 45 Degrees | 42.3 | 0.587 |
| Less Than 90 Degrees | 78.9 | 0.618 |
| All | X | 0.572 |

From the table, we see that Yu Darvish threw 42.3% of his sliders outside the zone to RHB with an angle of less than 45 degrees between the strike zone vector and the movement vector, up from 31.8% to LHB. Nearly 79% of his sliders outside the zone were thrown with an angle of less than 90 degrees, again up from 67.9% to lefties. However, the average distance is down across the board as compared to lefties.

As a point of comparison, for MLB righties to right-handed batters, the distribution looks similar to that of Darvish:

 photo MLB_Out_SL_RHP_RHB.jpeg

MLB RHP 2013 – Sliders Outside the Zone v. RHB

| Angle | Percentage | Average Distance (ft) |
|---|---|---|
| Less Than 45 Degrees | 31.6 | 0.671 |
| Less Than 90 Degrees | 62.4 | 0.664 |
| All | X | 0.673 |

Compared to Darvish, MLB RHP tend to throw a lower percentage of sliders with an angle less than 45 and 90 degrees. However, the MLB average distance from the strike zone is greater across the board.

Now, isolating only swings:

 photo Darvish_Swing_Out_RHB.jpeg

Darvish – Sliders Swung At Outside the Zone v. RHB

| Angle | Percentage | Average Distance (ft) |
|---|---|---|
| Less Than 45 Degrees | 46.8 | 0.513 |
| Less Than 90 Degrees | 86.2 | 0.558 |
| All | X | 0.512 |

For RHB versus LHB, Darvish’s percentages are up, if only by a few percent. The average distance for less than 45 degrees is down from 0.59 feet to LHB but up in the other two cases. This can be seen in the hexplot since the protrusion in the distribution is around 60 degrees rather than being closer to 45 degrees as before.

The 2013 MLB data shows a similar result, with a roughly triangular pattern in the hexplot, where the distance from the strike zone for swings increases as the angle between the strike zone vector and movement vector decreases.

 photo MLB_Swing_Out_SL_RHP_RHB.jpeg

MLB RHP 2013 – Sliders Swung At Outside the Zone v. RHB

| Angle | Percentage | Average Distance (ft) |
|---|---|---|
| Less Than 45 Degrees | 32.3 | 0.437 |
| Less Than 90 Degrees | 64.8 | 0.427 |
| All | X | 0.417 |

As in the case of lefties, all metrics for Darvish are above MLB-average.

For the sliders taken by right-handed batters:

 photo Darvish_Take_Out_SL_RHB.jpeg

Darvish – Sliders Taken Outside the Zone v. RHB

| Angle | Percentage | Average Distance (ft) |
|---|---|---|
| Less Than 45 Degrees | 39.8 | 0.634 |
| Less Than 90 Degrees | 74.9 | 0.656 |
| All | X | 0.605 |

For angles less than 45 degrees, the percentage of sliders taken outside is noticeably up, as compared with LHB (39.8% v. 26.3%) as well as for less than 90 degrees (74.9% v. 57.4%). This is not surprising since the distribution for all pitches was markedly different between batters on either side of the plate and, in this case, skewed toward the less-than-90-degrees region. The average distances are, however, down from the case for lefties.

Comparing Darvish to other RHP in 2013, the results are similar:

 photo MLB_Take_Out_SL_RHP_RHB.jpeg

MLB RHP 2013 – Sliders Taken Outside the Zone v. RHB

| Angle | Percentage | Average Distance (ft) |
|---|---|---|
| Less Than 45 Degrees | 31.3 | 0.781 |
| Less Than 90 Degrees | 61.3 | 0.777 |
| All | X | 0.788 |

In contrast to MLB RHP, Darvish’s sliders that are taken outside the strike zone are closer to it across the three measures. As before, Darvish’s sliders taken are thrown more in the direction of movement as compared to MLB righties in 2013.

Discussion

When constructing this algorithm, we need to choose a metric by which to group the pitches at each increment. In this case, we are using distance from the back of home plate. While this may be suitable for analyzing a single pitcher, when dealing with multiple pitchers, or when flipping the algorithm around and using it to evaluate a hitter, the variation in pitch velocity between pitchers may have an effect on the results. Therefore, when working with multiple pitchers or a hitter, it may be better to use time as the metric instead. So rather than tracking the projections at y feet from home plate, we would track them at t seconds from home plate.

Using this method, with further refinement, we could potentially try to measure quantities such as “late break”. Granted, the PITCHf/x data is restricted to its parameterization by quadratic functions, so even if aberrant behavior occurred near the plate, PITCHf/x would not be able to represent it. However, if we define late break as x inches of movement over distance y from home plate (or t seconds from home plate), we could hope to quantify it. Depending on how we construct the projection, such as by including factors other than the PITCHf/x definition of movement, late break could be considered as the difference between the perceived position at a distance and the location at the front of the plate. As seen in the swing/take curves, after a certain distance, the probability of a called strike starts to drop off for Darvish’s sliders, and we could possibly choose to calculate late break for each pitcher from that point on. But to do this, we would first have to figure out all of the elements, including movement, that we wish to use to make up pitch perception.

As we have seen, for both Darvish and MLB RHP in general, throwing sliders outside of the strike zone in the general direction of movement (with less than a 90-degree angle between the movement vector and the vector perpendicular to the strike zone) elicits swings at a higher rate farther outside the strike zone. In the hexplot for swings, this takes the form of a roughly triangular shape in the data, which widens in the distance direction as the angle decreases. This can also be seen in the GIFs for the blue pitches (swings outside of the strike zone).
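The definition of late break suggested above (x inches of movement over the last y feet, or the last t seconds) can be sketched as follows. This is only one possible reading: it assumes the constant-acceleration PITCHf/x fit, treats "movement" as the deflection from a straight line with gravity removed, and takes the front of the plate to be 1.417 feet from the back tip. The function and parameter names are hypothetical.

```python
import math

G = 32.174        # gravitational acceleration, ft/s^2
Y_PLATE = 1.417   # assumed front of home plate, ft from the back tip

def time_to_reach(y_start, vy, ay, y_target=Y_PLATE):
    """Time to travel from y_start to y_target with constant acceleration ay.

    Solves y_start + vy*t + 0.5*ay*t**2 = y_target for the positive root;
    vy is negative because the ball moves toward the plate.
    """
    if abs(ay) < 1e-12:
        return (y_target - y_start) / vy
    disc = vy * vy - 2.0 * ay * (y_start - y_target)
    return (-vy - math.sqrt(disc)) / ay

def late_break_inches(vy0, ax, ay, az, y_from, y0=50.0):
    """Movement (inches) accumulated between y_from and the front of the plate.

    vy0, ax, ay, az are the fitted values at y0 (ft/s and ft/s^2).
    Returns (horizontal, vertical, remaining_time_seconds).
    """
    t_rem = time_to_reach(y0, vy0, ay) - time_to_reach(y0, vy0, ay, y_from)
    dx = 0.5 * ax * t_rem ** 2 * 12.0
    dz = 0.5 * (az + G) * t_rem ** 2 * 12.0   # gravity removed (assumption)
    return dx, dz, t_rem
```

To use time rather than distance as the metric, one would skip the distance-to-time conversion and evaluate the same half-acceleration-times-time-squared terms directly at a fixed t for every pitcher.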

In addition, other elements could be added into this medley for attempting to model a hitter’s perception of a pitch as it approaches the plate. First, one could remove the drag from the movement, leaving it in the projection. Without running the projections, we can see how this would affect the results by looking at how the “movement” differs at 40 feet with and without drag. Pictured below is a subsample of the movement vectors at 40 feet for Darvish’s sliders based on the PITCHf/x definition, in green, and the movement without drag, in blue. The blue vectors are found based on Alan Nathan’s paper on the subject. The dashed red lines connect the same pitch for the different versions of movement. We can see that the movement without drag is larger in magnitude, and in the downward direction and to the right, meaning the projections would start higher and to the left. Comparing the movement vectors with and without drag, the average change in movement for the entire sample is 1.571 inches and the average change in angle between the pairs of vectors is 5.527 degrees. With drag left in the projection and out of the movement, the swing hexplots would likely take a more triangular shape with the angle between the vectors decreasing and shifting the data downward for the pitches outside the zone that were previously moving more laterally.

 photo Darvish_Slider_Movement.jpeg
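The two summary numbers quoted above (average change in movement and average change in angle) can be computed from the paired vectors along these lines. This sketch does not derive the drag-free movement itself, which follows Alan Nathan's paper; it takes both sets of vectors as given. It also assumes that "change in movement" means the length of the dashed line connecting each pair, which is our reading rather than something stated in the text.

```python
import math

def compare_movement(with_drag, without_drag):
    """Compare paired (x, z) movement vectors, both in inches.

    Returns (mean length of the difference vector in inches,
             mean unsigned angle between the paired vectors in degrees).
    """
    shifts, angles = [], []
    for (x1, z1), (x2, z2) in zip(with_drag, without_drag):
        # Length of the segment joining the two vector tips.
        shifts.append(math.hypot(x2 - x1, z2 - z1))
        # Unsigned angle between the vectors via atan2(cross, dot).
        dot = x1 * x2 + z1 * z2
        cross = x1 * z2 - z1 * x2
        angles.append(abs(math.degrees(math.atan2(cross, dot))))
    n = len(shifts)
    return sum(shifts) / n, sum(angles) / n
```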

One could also alter the time to the plate for the pitches. As it stands, this approach assumes that the hitters have perfect timing and track pitches using a simple extrapolation approach. If one were to assume that the remaining velocity in the y-direction (toward the plate) was perceived as constant for the pitches, the hitters would be expecting the pitches to arrive sooner than they actually do. This would lead to the projections appearing higher, since gravity would have less time to have an effect.
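To get a feel for the size of this effect, the sketch below compares the actual arrival time, with the ball decelerating in y, against the arrival time a hitter would expect if the current y-velocity stayed constant. It then converts the difference into how much less gravity drop the hitter would project. The function names and the 1.417-foot plate distance are assumptions, and only the gravity term is considered.

```python
import math

G = 32.174  # gravitational acceleration, ft/s^2

def arrival_times(y, vy, ay, y_plate=1.417):
    """Return (actual, perceived) times in seconds from distance y to the plate.

    vy is the current y-velocity (negative, toward the plate) and ay the
    constant y-acceleration (positive when drag slows the ball).
    """
    perceived = (y_plate - y) / vy            # constant-velocity assumption
    if abs(ay) < 1e-12:
        return perceived, perceived
    disc = vy * vy - 2.0 * ay * (y - y_plate)
    actual = (-vy - math.sqrt(disc)) / ay     # positive root of the quadratic
    return actual, perceived

def projection_rise_inches(y, vy, ay):
    """How much higher (inches) the projection sits under perceived timing."""
    actual, perceived = arrival_times(y, vy, ay)
    return 0.5 * G * (actual ** 2 - perceived ** 2) * 12.0
```

For any pitch that is slowing down, the perceived time is shorter than the actual time, so the returned rise is positive, matching the direction described above.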

A rather large assumption that we are making is that batters can decouple vertical movement from gravity. Even in cases where the vertical movement is small, this will have an effect on the projected pitch location. This may also serve as an explanation as to why the sliders swung at below the strike zone do not always have a strong vertical component of movement.

Next time, we will look at Darvish’s four-seam fastballs, followed by his cut fastballs, in a similar manner. As we will see, certain pitches excel at inducing swings outside the strike zone when thrown in the general direction of movement while others show little to no benefit at all. We can also break down the pitches swung at by the result (in play, foul, swing-and-miss) to gain further insight.

Strike Zone Analysis

This section explains the calculation and choice of model for the probability of a called strike used in the above analysis. There have been a lot of excellent articles analyzing the strike zone, such as by Matthew Carruth, Bill Petti, and Jon Roegele, among others, and this method is derivative of those previous works. Our goal is to create an explicit piecewise function that reasonably models the probability that a pitch will be called a strike, based on empirical data. However, rather than treat the data as zero-dimensional (no height, width, or length for each datum), we represent each pitch as a two-dimensional circle with a three-inch diameter. Then, over a sufficiently refined grid, we calculate the number of 2D pitches that intersected each point and were called strikes, divided by the number of 2D pitches that intersected that point and were taken (ball or strike). This gives the percentage of pitches that intersected each point that were called strikes, which provides an empirical estimate of the probability that a pitch passing through that point is called a strike. The advantage of taking this approach is that we do not impose any a priori structure on the data, which can happen when using methods such as binning or model fitting to the zero-D data. It also conforms with using a 2D strike zone to perform the analysis by representing the data fully in 2D. Note that since we are using all MLB data from 2013 to generate these plots, we have a large enough data set that we do not get the jumps or discontinuities in the strike zone that may occur for smaller data sets, such as for a single pitcher. As an example, the called-strike probability for LHB in 2013 looks like:

 photo SZ_Heat_LHB-1.jpeg

The colormap on the right gives the probability of a pitch at each location being called a strike, based on the data. The solid rectangle represents the textbook strike zone (with vertical bounds of 1.5 and 3.5 feet), and the two dashed lines are explained along with the model.
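The empirical calculation described above can be sketched in a few lines. This is a simplified illustration rather than the code used for the plots: the input layout (a list of plate locations in feet with a called-strike flag) and the function name are assumptions, and a real data set would call for a vectorized implementation.

```python
import math

BALL_RADIUS_FT = 1.5 / 12.0  # three-inch diameter circle, radius in feet

def empirical_strike_prob(taken, grid_x, grid_z, radius=BALL_RADIUS_FT):
    """Estimate the called-strike probability at each grid point.

    taken: iterable of (px, pz, is_strike) for pitches that were not swung at.
    Returns {(gx, gz): fraction called strikes}, or None where no pitch
    circle covers the grid point.
    """
    taken = list(taken)
    result = {}
    for gx in grid_x:
        for gz in grid_z:
            strikes = total = 0
            for px, pz, is_strike in taken:
                # The 2D pitch covers this point if its center is within a radius.
                if math.hypot(px - gx, pz - gz) <= radius:
                    total += 1
                    strikes += bool(is_strike)
            result[(gx, gz)] = strikes / total if total else None
    return result
```

Because every pitch contributes to all grid points under its circle, the estimate varies smoothly from point to point without any binning being imposed on the pitch centers.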

For the model, we assume a small region where the probability of a called strike is essentially 1, which, in the graph, is bounded by the long-dashed line. Far outside the strike zone, we will assume that the probability that a pitch is called a strike is essentially zero. In between, we need a way to model the transition between these two regions. To do this, we will adopt a general exponential decay model of the form exp(-a x^b), where a and b are parameters. In this case, we take x to be the minimum distance to the probability-1 region of the strike zone (long-dashed line). Since there is some flexibility in how we choose the probability-1 region and the subsequent parameters, we will do this less rigorously than could be done in order to keep things simple.

First, we examined slices of the empirical data in profile and, by experimenting with the probability-1 region bounds and the values of a and b, found that a value around 4 for b worked well at matching the curvature. A value of a equal to 4 was then found similarly via guess-and-check. Finally, the probability-1 region was adjusted to make the model match the data based on a contour plot for each (see below). For lefties, the probability-1 region is [-0.55,0.25] x [2.15,2.85] feet.
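Written out as code, the model is short. The sketch below uses the values quoted above (a = 4, b = 4, and the LHB probability-1 region) and assumes that distances are measured in feet; the function names are ours.

```python
import math

def distance_to_rect(px, pz, x_min, x_max, z_min, z_max):
    """Minimum distance (feet) from a point to a rectangle; 0 if inside it."""
    dx = max(x_min - px, 0.0, px - x_max)
    dz = max(z_min - pz, 0.0, pz - z_max)
    return math.hypot(dx, dz)

def strike_prob(px, pz, rect, a=4.0, b=4.0):
    """Called-strike probability exp(-a * x**b), x = distance to rect."""
    x = distance_to_rect(px, pz, *rect)
    return math.exp(-a * x ** b)

# Probability-1 region for LHB: (x_min, x_max, z_min, z_max) in feet.
LHB_RECT = (-0.55, 0.25, 2.15, 2.85)
```

Inside the probability-1 region the distance is zero, so the function returns exactly 1, and it decays smoothly toward zero as the pitch moves away from the region in any direction.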

 photo SZ_Contour_LHB.jpeg

Note that we do a decent job of matching the contours outside of the lower-right and upper-left regions, where there is some deviation. This can be adjusted for by changing the shape of the probability-1 area, but this increases the complexity of calculating the minimum distance. Plotting the model for the probability gives:

 photo SZ_Heat_LHB_Approx.jpeg

Here, the solid and long-dashed lines are as before, and the dotted line is the 50% called-strike contour from the model, which is used as the boundary of the strike zone in the above analysis. While the shape of the strike zone may seem unconventional, it is a natural approach for handling the zero-dimensional PITCHf/x data. For example, if we place a pitch on the edge of the rectangular textbook zone, a so-called borderline pitch, and track the path that the center would make as it moved around the rectangle, it would trace out a similar shape.
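Because the model depends only on the distance to the probability-1 region, the 50% contour is simply that rectangle pushed outward by a fixed distance, with rounded corners. A small sketch of the arithmetic, using a = 4, b = 4 and the LHB region from above (names are ours, distances assumed to be in feet):

```python
import math

def half_probability_distance(a=4.0, b=4.0):
    """Distance x (feet) at which exp(-a * x**b) equals 0.5."""
    return (math.log(2.0) / a) ** (1.0 / b)

x50 = half_probability_distance()

# LHB probability-1 region: (x_min, x_max, z_min, z_max) in feet.
lhb_region = (-0.55, 0.25, 2.15, 2.85)

# Outermost extent of the 50% contour along each axis.
contour_bounds = (lhb_region[0] - x50, lhb_region[1] + x50,
                  lhb_region[2] - x50, lhb_region[3] + x50)
```

With these parameters the offset works out to roughly 0.645 feet, which puts the bottom and top of the contour at about 1.5 and 3.5 feet.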

 photo SZAnimation.gif

For RHB, the heat map is much more balanced, left to right, making the fit much closer than could be achieved for LHB.

 photo SZ_Heat_RHB.jpeg

Again, the top and bottom of the 50% called-strike contour lie near 3.5 and 1.5 feet, respectively. Examining the contour map:

Here, the identified contours fit well all around. The called-strike probability, with the model applied, is:

 photo SZ_Heat_RHB_Approx.jpeg

In this case the probability-1 region is [-0.43,0.40] x [2.15,2.83] feet.

So, overall, the RHB called-strike probability model fits much better, especially in the corners, than the one for LHB. In order to properly fit the called-strike probability to such a model, one would first need to have a component of the algorithm that adjusts the probability-1 area, both by location and size, and possibly by shape. Then the parameters for the decay of the strike probability could be fit against the data. The probability-1 area could then be adjusted and fit again, to see if the overall fit is better. This might work similarly to a simulated annealing process. However, for our purposes, sacrificing the corners for LHB seems reasonable to maintain simplicity of method and calculations.
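As a rough illustration of the alternating procedure just described, the sketch below swaps in a plain greedy search rather than simulated annealing: it fits a and b over a small grid for a fixed rectangle, nudges each rectangle edge if that lowers the squared error, and repeats. It was not used for the results in this article, and the names, step size, and grids are all hypothetical.

```python
import math

def model_prob(px, pz, rect, a, b):
    """exp(-a * x**b), with x the distance from (px, pz) to the rectangle."""
    x_min, x_max, z_min, z_max = rect
    dx = max(x_min - px, 0.0, px - x_max)
    dz = max(z_min - pz, 0.0, pz - z_max)
    return math.exp(-a * math.hypot(dx, dz) ** b)

def sse(data, rect, a, b):
    """Sum of squared errors against (px, pz, empirical_probability) rows."""
    return sum((model_prob(px, pz, rect, a, b) - p) ** 2 for px, pz, p in data)

def fit_decay(data, rect, a_grid, b_grid):
    """Best (a, b) on the grid for a fixed probability-1 rectangle."""
    return min((sse(data, rect, a, b), a, b) for a in a_grid for b in b_grid)[1:]

def fit_alternating(data, rect, a_grid, b_grid, step=0.05, rounds=5):
    """Alternate between nudging the rectangle edges and refitting (a, b)."""
    rect = tuple(rect)
    a, b = fit_decay(data, rect, a_grid, b_grid)
    for _ in range(rounds):
        best = sse(data, rect, a, b)
        for i in range(4):                      # each edge of the rectangle
            for delta in (-step, step):
                trial = list(rect)
                trial[i] += delta
                if trial[0] < trial[1] and trial[2] < trial[3]:
                    err = sse(data, tuple(trial), a, b)
                    if err < best:              # keep only improving moves
                        best, rect = err, tuple(trial)
        a, b = fit_decay(data, rect, a_grid, b_grid)
    return rect, a, b
```

A greedy search like this can stall in a local minimum, which is where the random, occasionally uphill moves of simulated annealing would help, and allowing a non-rectangular probability-1 area would require a different distance calculation.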

In closing, if you made it this far, thank you for reading to the end.