Archive for April, 2017

Curse of the Giants Bullpen

First game of the season for the Giants, and the bullpen’s faltering in the 8th and 9th innings is terrifying. The fear comes from memories of the ghost of 2016. The addition of Mark Melancon, and the departure of the core of the Giants pen, seemed to be the remedy for expelling this ghost, but opening day seemed to tell a different tale.

The new setup man in the 8th inning, Derek Law, came in to relieve Madison Bumgarner, who took the Giants into the 8th with a 4-3 lead that he pretty well mustered up all alone. Law gave up back-to-back singles before a meeting was called at the mound. Law then gave up another single to Paul Goldschmidt, surrendering a run and the lead. Ty Blach was summoned from the pen for a lefty-lefty matchup against Jake Lamb, and he got him to ground into a double play. Bruce Bochy then went for his righty-righty matchup with Hunter Strickland against Yasmany Tomas, which ended in a ground out to get out of the inning with a tie ball game.

My argument is that Bochy’s uncertainty about how he is going to handle his pen is, perhaps, one of the reasons for this supposed curse. Before the season started, the underlying story was that the pen would be fixed by the certainty of roles, as Melancon was the sure closer and this definitive role was going to bring stability to the pen that was not there last season. However, the setup man in the 8th gets banged up for three hits in a row, and Bochy immediately cuts the cord in favor of his matchup ideals. These matchups end up working, and they get out of the inning relatively unscathed. However, it seems that this lack of trust in his relievers to get out of trouble may be one of the reasons the bullpen struggles when the game is tight in the late stages.

Let’s compare to the three other teams who had to pitch in tight situations in the closing stages that same day.

Their opponent, the D-backs:

J.J. Hoover comes in at the top of the 8th with his club down by one. With one out, he walks Buster Posey and allows Brandon Crawford to single. Torey Lovullo allows Hoover to get himself out of danger to end the 8th.

Fernando Rodney comes into the 9th with the game tied, and immediately gets hit for a triple. He gets a sac fly for his first out, but allows a run, to give the Giants a lead. He then allows a single, throws a wild pitch, walks Brandon Belt, throws a wild pitch, and walks Hunter Pence. Instead of pulling him after a mound visit, Lovullo allows Rodney to work out of his trouble, and Rodney gets a fly out and a ground out to end the inning.


The Cubs:

Bottom of the 8th, down by a run, Joe Maddon uses Pedro Strop. First hitter he sees, he walks, then a pop-up, and then he allows a two-run HR. He then walks his next batter, but finally works his way out of the inning with back-to-back ground outs. Maddon uses Mike Montgomery in the bottom of the 9th of a tied game. He allows a one-out double, and Maddon comes out to talk him through the inning. He intentionally walks Yadier Molina to set up a possible inning-ending double play. He gets a K but then walks Kolten Wong, and is eventually led to his loss by a line drive to left field by Randal Grichuk.


The Cardinals:

After doing a good job getting the final two outs of the 8th, Seung Hwan Oh was asked to close out the top of the ninth. He hits Ben Zobrist with a pitch, Ks Addison Russell, then is hit for a single and hit for a three-run HR, but then closes out the inning with a K and a pop-up.

You could argue here that the D-Backs and Cards just won because of their scoring output in the 9th, and that the Cubs had the same fate as the Giants. However, what I am trying to argue is that the short leash that Bochy demonstrated in the 8th is an outlier to the other three managers, and perhaps, may be an element that has been driving the curse of the bullpen.

Bochy’s tactics get really twisted as he allows Melancon the long leash to try and work his way out of danger in the 9th, presumably because Melancon is the undisputed closer and he had two outs in the inning. However, it seems like the stability of the bullpen unravels as soon as the short leash is initiated in the 8th.

If Bochy believes the curse was created from the instability of not having a definitive closer, then perhaps it is also the instability of definitive roles in the pen. If he believes that Law deserves the 8th inning setup role over Strickland, then he should stick to his guns and let Law pitch out of the 8th inning. (Still not sure how Matt Cain got the fifth spot over Blach.) If he lets up and wants to shuffle up the roles for the next game, then so be it, but the shifts from short leash to long leash, and from concrete roles to matchup roles, all seem to unbalance the pen.

Nothing is more evident of this than the series against the Cubs last October. Game 3, Bochy lets Sergio Romo finish up his work in the 9th, but not before Romo had given up two runs and allowed the Cubs to tie. The Giants would end up winning this game. Game 4, on the other hand…up 5-2 in the 9th, Bochy uses Law, who immediately allows a single and is pulled for Javier Lopez. Lopez walks Anthony Rizzo and is pulled for Romo. Romo allows a double and is pulled for Will Smith. Smith allows a single, and then gets the first out on Jason Heyward’s bunt. He is then pulled for Strickland, who allows a single, but then ends the inning with a double play. The Giants end their season with a monumental bullpen collapse in the 9th inning.

This short-leash, righty-righty/lefty-lefty matchup tactic that Bochy sometimes uses, and sometimes does not, seems to have a role in the haunting of this Giants pen. While last year he never had the luxury of that star closer, and definitely does not have the likes of a Clippard-Betances-Chapman bullpen, I think Bochy does fare better when he allows his bullpen to settle into roles with a margin of error. Moreover, the Giants have a great bullpen of Strickland-Law-Melancon and supporting cast. However, the bullpen probably fares better when the question mark of that order disappears and the setup men have the chance to play out their roles.

Hell, we are one game into the season and do not know if the bullpen is still cursed, but if it is, perhaps the curse is caused by the handling of the pen, and not the skill within it.

An Opening Day Overreaction: Jose Ramirez the MVP Candidate

Jose Ramirez broke out for the Indians last season. Long seen as just a placeholder for Francisco Lindor, Jose hit well enough all year to keep getting starts in a utility role, and eventually moved up to become the full-time third baseman for Cleveland. Ramirez derived the majority of his value from an elite contact rate and excellent base-running, swiping 22 bags and hitting for an average well over .300. All in all, Jose was worth nearly five wins above replacement. There has been a lot of speculation as to whether he can repeat his huge breakout last year, if we may have already seen his peak, or maybe, he’s just getting started and there are even greater things to come.

Looking at Jose’s numbers from last season, I can see three areas for improvement. First off is defense. According to defensive runs above average, Jose was only worth 0.5 throughout the 2016 season. Jose is a shortstop by training; however, he spent most of the first half of the season bouncing around positions in a utility role, before landing at third base full-time. Considering this and the fact that he has rated out as a plus defender in past seasons, I think it is safe to project an improvement here with a more consistent role.

Second is his walk rate. Jose walked in 7.1% of his plate appearances in 2016. To compare, in 2015, as well as in his Triple-A career, his walk rate hung right around 9%, so there is possibly some room for improvement there as well.

The final area for improvement is Jose’s home-run hitting. Despite recording 60 XBHs last year, Jose only left the yard 11 times. It is very difficult to put up an MVP-quality season with lower-end HR numbers. Since 2011, there have been 29 positional-player seasons worth 7+ WAR, and every single one of them included over 20 HRs.

So what will Jose look like in 2017? It’s hard to tell, unless of course you decide to overreact to this week’s opening game, in which case…


Jose batted four times in the opener, and he had one walk and one HR. THE TWO THINGS HE NEEDED TO GET BETTER AT!!! Now obviously this article is a bit tongue-in-cheek, and a sample size of one game means VERY little. The walk especially tells us just about nothing. You give me 4 PA in a major-league game and I might even luck into a walk. However, there is reason to take note of the home run.

Prior to 2017, Jose Ramirez had hit 19 home runs in the majors. His previous best exit velocity was 107.8 MPH on a HR. The longest HR of his career had traveled 437 feet. Jose’s HR in game number one left his bat at 109.3 MPH and traveled 447 feet.

So MAYBE this is a hint that Jose has added some power since last season. That would be a reason to get excited. If you take Jose’s 2016 numbers, then bump him to a 9% walk rate, 22 HRs, and plus defensive value, and even account for a few points of BABIP regression, he’s a 7-8 WAR player, and looks real similar on paper to Mookie Betts.

So, if we overreact to opening day, this would make Jose a legitimate star and MVP candidate. His season will be extremely exciting to follow, although in the end probably overshadowed by Madison Bumgarner’s race to 60 dingers.

Berrios and Beer

Beerrios! That’s a way better title, but we’ll stick with the original. So I’m spending my Saturday brewing a batch of beer and dealing with some pitchFx data. If everything goes well, you’re going to get some baseball info and some brewing highlights. But also Happy Opening Week! It’s the greatest time of the year.

All right, let’s deal with some baseball first. Jose Berrios had a pretty brutal 2016 with the big club — all in all, he started 14 games and rattled off a 3-7 record with an 8.02 ERA, an ugly 1.87 WHIP, and a not-top-of-the-rotation strikeout rate of 7.6 K/9. His FIP and xFIP were better but still not great at 6.20 & 5.64. His BABIP certainly didn’t help his numbers, sitting at 0.344, but that alone can’t explain how truly atrocious his numbers looked in his first taste of the big leagues. I’m going to use the pitchRx package to look at pitching data from 2016 and see if we can figure out what went wrong and how we can fix it.

All right, now on to the beer portion. Today I’m making my Deep Lake Dark Lager. Behind every great beer there is a great story. This story begins when I worked out at a remote research camp and fridge space was not reserved for amateur beer-making. A key process in lagering a beer is fermenting the beer at low temperatures, which is why I mentioned the fridge space. We got around lagering our beer in the fridge by putting the fermenting beer into a keg and dropping the keg into a lake to about 15 meters deep (~49 feet). At this depth the temperature was steady at about 5°C (41°F). A little tip for any newcomers to the brewing community: Your beer needs to maintain command, something Berrios couldn’t do. Zing!

If anyone wants to take a quick look at some gifs of Berrios’ sick curveball (and other pitches), check them out here:

I tried to find a comp for Berrios related to pitch velocity, and if we ignore his slider, Jacob deGrom comes out looking like a pretty good match. Here is how their pitch velocities line up using 2016 pitchFx data. I know it’s not a good idea to exclude one of deGrom’s best pitches, but I’m more interested in consistency between starts.

Velocity Comparison (MPH) – Berrios vs deGrom
Name            Four-Seam   Two-Seam   Curveball   Change-up
Jacob deGrom    93.4        93.3       80.4        85.5
Jose Berrios    93.4        93.2       81.0        84.7

Just eye-balling, they look pretty good. Let’s take a look at pitch velocity by start.

Just looking by eye, it’s hard to tell if you could consider one guy more consistent than the other. But obviously we might be able to give deGrom the benefit of the doubt here, since he was pitching with scar tissue or bone spurs in his elbow. Either way, he was pitching in discomfort. There is one thing that catches my attention, though — it’s those last 11 starts by Berrios, and you can see his change-up velocities start to sneak up from ~83.5 to 86 MPH. The unfortunate thing is that there is no concurrent increase in four-seam (FF) or two-seam (FT) velocity. Near the end of the season Berrios was trying to complement his fastballs with a change-up that had a really poor velocity difference. Let’s check that out in a bit more detail.

Okay, you give that plot a bit more thought. My timer just went off and I’ve got to go sparge the grains. I converted a five-gallon water cooler into a mash tun for steeping my brews, which works awesome, because it’s insulated so it holds the heat really well. The aspect I really love about this beer is that it has a really light lager taste, but it has a nice dark colour, which makes it a great spring beer. And getting this effect in your beer is really simple. For the 45 minutes where you are steeping your grains, only add the light grains, then just before you sparge, throw on your dark grains (in my case Carafa III). That way, as you sparge you get the colour from the dark grains and none of the taste. Yaaaay beer. Everything is all sparged and now I’ve got to bring the wort to a nice rolling boil.

All righty, let’s discuss that velocity difference. I’d say there is a similar trend in both his early-season call-up and his late-season starts. He starts out with a pretty decent velocity difference between his fastballs and his change-up, but with each start that difference gets smaller and smaller. That is especially true in the second half of the season, where he started out with a fantastic velocity difference. That combination should have led to some really effective pitches, but as we move into September and October those pitches start to look more and more similar and their effectiveness all but disappears.
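To put rough numbers on that shrinking gap, here is a minimal Python sketch (the analysis in this post was done in R; Python is used here only for illustration). The season averages come from the comparison table above. The early and late change-up readings are the approximate 83.5 and 86 MPH figures mentioned earlier, and holding the four-seamer at its 93.4 MPH season average is an assumption made for the example, not something measured start by start.

```python
# Season-average velocities (MPH), copied from the comparison table in the post.
velocities = {
    "Jacob deGrom": {"four_seam": 93.4, "two_seam": 93.3, "curveball": 80.4, "change_up": 85.5},
    "Jose Berrios": {"four_seam": 93.4, "two_seam": 93.2, "curveball": 81.0, "change_up": 84.7},
}


def changeup_gap(fastball_mph, changeup_mph):
    """Velocity separation between a fastball and a change-up, in MPH."""
    return round(fastball_mph - changeup_mph, 1)


# Season-long separation for each pitcher, using the four-seamer as the reference.
season_gaps = {
    name: changeup_gap(pitches["four_seam"], pitches["change_up"])
    for name, pitches in velocities.items()
}

# ASSUMPTION: Berrios's four-seamer stays at its 93.4 MPH season average while
# the change-up creeps from about 83.5 MPH to about 86 MPH late in the year.
berrios_fastball = velocities["Jose Berrios"]["four_seam"]
early_gap = changeup_gap(berrios_fastball, 83.5)
late_gap = changeup_gap(berrios_fastball, 86.0)

for name, gap in season_gaps.items():
    print(f"{name}: season-average gap {gap} MPH")
print(f"Berrios early gap: {early_gap} MPH, late gap: {late_gap} MPH")
```

Under those assumptions the separation drops by about 2.5 MPH over the stretch, which is the whole size of the change-up creep, since the fastball is not moving with it.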

Back to brewing: boiling achieved! I’ve got to let the liquid boil down for a couple hours, so through the magic of the internet, let’s fast-forward to the next step. And what a fantastic surprise, the battery in my scale is dead and of course it’s some weird specialized kind that I don’t have on hand. Well, luckily I wrote the weight on the bags when I packed the hops last fall, so I’m going to eyeball it and hop(e) for the best. At least now I can honestly say I can never reproduce this batch, but that’s part of the fun. So I figure I added about an ounce of hops; we’ll really never know. I’ve got to let that boil for another 40 minutes, then add the flavour hops and some Irish moss.

I think we can agree that this was not a season marked by consistency for Jose Berrios. But I was curious as to how his release point affected the velocity of each pitch. For all of the data presented here, I used the pitchRx package to download and store the 2016 pitchFx data. In the pitchFx data, you can pull out the release point for each pitch recorded throughout the season. Using this data, I created a generalized additive model using the bam() function for the R peeps out there, and within the bam function I modeled pitch velocity and pitch break separately using a Gamma family. I like to use the Gamma family because, in a not very sciencey description, it’s very flexible and fits a wide range of models. So first, a couple of notes: 1) There are two plots coming up; the first predicts pitch velocity and the second predicts pitch break length (movement). 2) Pay attention to the prediction window, the coloured box, for each pitch. And 3) These models were only run on Berrios.

And pitch break (break_length):

You can tell that velocity and break length change with different release points. I mean, there is a pretty complicated relationship with how his release point affected both the pitch break and the velocity, and I’m not really sure what the sweet spot actually is. His change-up velocity plot has a really nice faded red area sort of right in the middle of the prediction grid. This area represents roughly an 84.5 MPH change-up which would complement his 93 MPH fastball quite nicely, but unfortunately his arm slot seems to be drifting along an axis which we will get to in a second. Did you happen to notice how the coloured boxes moved slightly among pitches?

So remember how Berrios was apparently tipping his pitches this past year? Well, if not, check this out. So the way he was delivering the ball basically gave the batters a full view of what was coming. I mean, I know I don’t possess the ability to spot small things in deliveries and assess pitches. I watched gifs of Berrios throwing all of his pitches over and over many times and I can’t pick anything up. But I am sure that there are players out there who can pick up those minor details. So I’m thinking there may be more to this than just how he started his wind-up, and where he was releasing the ball was also giving batters a clue as to what was coming. Check out this plot showing how Berrios and deGrom released their pitches.

So you’re probably wondering what’s going on there. Each ellipsoid represents a different pitch, curveballs in blue, 2-seamers in orange or orange-red etc. Each ellipsoid contains 95% of the pitches thrown for each pitch type. Generally deGrom releases the ball about a foot over in comparison to Berrios, but that’s not what is important. What’s important is how each pitcher’s change-up overlaps with their respective fastballs. deGrom is remarkably consistent in releasing both types of fastballs and his change-up, and the ellipsoids are basically completely overlapping. Right away we can see that something is going on with how Berrios is releasing his change-up. It only overlaps with about half of his fastball release points, but his arm angle also seems to be drifting, and you can see the ellipsoid is stretched in one direction. So I’m guessing he’s not only tipping his pitches in his wind-up, but there is also some release-point trouble happening here that no doubt some hitters are able to pick up on.
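For anyone curious how an overlap like that could be measured, here is a rough Python sketch (again, not the R code behind the plots). It fits a two-dimensional 95% ellipse to fastball release points and counts how many change-up releases land inside it. Everything about the data is an assumption: the release points are randomly generated, the coordinates are treated as horizontal and vertical release position in feet, and the "steady" and "drifting" change-ups are invented to mimic the two patterns described above. The real plot may well use three dimensions.

```python
import math
import random

# For two dimensions, the squared Mahalanobis distance of normal data follows a
# chi-square distribution with 2 degrees of freedom, whose quantile has the
# closed form -2 * ln(1 - p). For p = 0.95 that is roughly 5.99.
CHI2_95_2DF = -2.0 * math.log(1.0 - 0.95)


def mean_and_cov(points):
    """Sample mean and 2x2 covariance (sxx, sxz, szz) of (x, z) points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    mz = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    szz = sum((p[1] - mz) ** 2 for p in points) / (n - 1)
    sxz = sum((p[0] - mx) * (p[1] - mz) for p in points) / (n - 1)
    return (mx, mz), (sxx, sxz, szz)


def inside_ellipse(point, center, cov, threshold=CHI2_95_2DF):
    """True if the point falls inside the 95% ellipse defined by center and cov."""
    sxx, sxz, szz = cov
    det = sxx * szz - sxz ** 2
    dx, dz = point[0] - center[0], point[1] - center[1]
    d2 = (szz * dx * dx - 2.0 * sxz * dx * dz + sxx * dz * dz) / det
    return d2 <= threshold


def share_inside(points, reference):
    """Fraction of points that land inside the 95% ellipse of the reference set."""
    center, cov = mean_and_cov(reference)
    return sum(inside_ellipse(p, center, cov) for p in points) / len(points)


# HYPOTHETICAL release points in feet; not real pitchFx data.
rng = random.Random(2016)
fastballs = [(rng.gauss(-2.0, 0.15), rng.gauss(5.8, 0.15)) for _ in range(500)]
steady_changeups = [(rng.gauss(-2.0, 0.15), rng.gauss(5.8, 0.15)) for _ in range(300)]
drifting_changeups = [(rng.gauss(-1.7, 0.30), rng.gauss(5.8, 0.15)) for _ in range(300)]

steady_share = share_inside(steady_changeups, fastballs)
drifting_share = share_inside(drifting_changeups, fastballs)
print(f"steady change-up inside fastball ellipse: {steady_share:.0%}")
print(f"drifting change-up inside fastball ellipse: {drifting_share:.0%}")
```

A share near 95% says the change-up comes out of the same window as the fastball; a much lower share is the numeric version of an ellipse that has slid off to one side.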

Final update on the beer: I added in the flavour hops and Irish moss with about five minutes left and took everything off the heat. Luckily, it’s still a bit cold here so I left the beer outside to cool for a couple hours to get it down to room temperature so I could pitch the yeast. And fast-forward a couple of days…I let the yeast start the fermentation process at room temperature for a couple of days, then moved it into a fridge. I’ll leave it there for about three weeks, transfer the beer to a keg, and then it’s basically ready to drink!

Thanks for sticking it out to the end; I hope you enjoyed “Beerrios.” This ended up having a lot more deGrom in it than initially planned, but I think it was a good comparison to include. I think we were able to successfully identify a couple serious flaws from Jose Berrios’ debut season, and hopefully he’ll be able to shake that off, work on fixing his mechanics, and take another shot at the majors in 2017. I have a feeling we are going to see him mid-April or early May, and I really hope we get to see what he can do over an entire season. If he can transfer just a fraction of his minor-league success to the majors, we will get to see a pretty dynamic young pitcher, and the Twins have been waiting a long time to get a pitcher of this caliber back into their rotation.

Shut the (Heck) Up About Sample Size

The analytics revolution in sports has led to profound changes in the way in which sports organizations think about their teams, players play the game, and fans consume the on-field product. Perhaps the best-known heuristic in sports analytics is sample size — the number of observations necessary to make a reliable conclusion about some phenomenon. Everyone has a buddy who loves to make sweeping generalizations about stud prospects, always hedging his bets when the debate heats up: “Well, we don’t have enough sample size, so we just don’t know yet.”

Unfortunately for your buddy, sample size doesn’t tell the whole story. A large sample is a nice thing to have when we’re conducting research in a sterile lab, but in real-life settings like sports teams, willing research participants certainly aren’t always in abundant supply. Regardless of the number of available data points, teams need to make decisions. Shrugging about a prospect’s performance, or a newly cobbled together pitching staff, is certainly not going to help the bottom line, either in terms of wins or dollar signs.

So the question becomes: How do organizations answer pressing questions when they either a) don’t have an adequate sample size, or b) haven’t collected any data? Fortunately, we can use research methods from social science to get a pretty damn good idea about something — even in the absence of the all-powerful sample size.

Qualitative Data
Let’s say you’re a baseball scout for the Yankees watching a young college prospect from the stands. You take copious notes about the player’s poise and physical stature, his hitting, fielding, and running abilities, and his arm strength. For instance, you might write things like, “good approach to hitting” and “lacks pure run/throw tool.”

All of these rich descriptions of this player are qualitative data. This observational data from one game of this college player is a sample size of 1, but you’ve got a helluva lot of data. You could look for themes that consistently emerge in your notes, creating an in-depth profile of the prospect; you could even standardize your observations on a scale from 20-80. Your notes help build a full story about the player’s profile, and the Yanks like the level of depth you bring to scouting.
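As a small illustration of that standardizing step, here is a Python sketch that converts a measured tool into a 20-80 grade. It leans on the way the scale is conventionally described (50 is average, every 10 points is about one standard deviation, grades move in steps of 5). The sprint speeds and the league mean and spread below are made-up numbers used only to show the arithmetic.

```python
def scouting_grade(value, league_mean, league_sd):
    """Map a measurement onto the 20-80 scale: 50 is average, 10 points per SD."""
    z_score = (value - league_mean) / league_sd
    raw = 50.0 + 10.0 * z_score
    clamped = max(20.0, min(80.0, raw))   # the scale stops at 20 and 80
    return int(round(clamped / 5.0)) * 5  # grades are usually given in steps of 5


# HYPOTHETICAL league baseline and prospect readings (sprint speed in ft/s).
LEAGUE_MEAN_SPEED = 27.0
LEAGUE_SD_SPEED = 1.5

for label, speed in [("average runner", 27.0), ("plus runner", 28.8), ("base clogger", 21.0)]:
    grade = scouting_grade(speed, LEAGUE_MEAN_SPEED, LEAGUE_SD_SPEED)
    print(f"{label}: {speed} ft/s -> grade {grade}")
```

The same function works for any tool you can put a number on, as long as you have some baseline to compare against; the qualitative notes still have to fill in what the stopwatch cannot.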

You’ve worked as a scout for a few years, and the Yankees decide to bring you into their analytics department. It’s the end of the 2011 season, and one of your top prospects, Jesus Montero, just raked (.328/.406/.590, in 69 PAs) in the final month of the season. The GM of the Yankees, Brian Cashman, knocks on your door and says that they’re considering trading him. What do you say?

You compile all of Montero’s quantitative stats from the last month of the season and the minors, as well as any qualitative scouting reports on him. Good job. You’ve mixed quantitative and qualitative data to provide a richer story given a small sample of only 69 PAs. You’ve also reached the holy grail of social science research, triangulation, by which you examined the phenomenon from a different angle and, bingo, arrived at the same conclusion that your preliminary performance metrics gave you. Montero is a bum. Trade him, Brian.

Resampling Techniques
It’s four years later and Cashman knocks on your door again (he’s polite, so he waits for you to say, “come in”). It’s early October and you’ve just lost to the Houston Astros in a one-game playoff. Cashman asks you about one of the September call-ups, Rob Refsnyder, who Cashman thinks is “pretty impressive.” You combine Refsnyder’s September stats (.302/.348/.512, in 46 PAs), minor league stats, and scouting reports, but the data don’t point to a consistent conclusion. You’re not satisfied.

A fancy statistical method that might help in this instance is called bootstrapping; it works by resampling Refsnyder’s small 46 PA sample over and over again, putting each result back into the pool every time you draw another one. The technique doesn’t create new information, but it lets you gauge how much his line could swing by chance using only the numbers that you already have. You can redo his sample of 46 PAs 1,000, 10,000, even 100,000 times, seeing each time how he would perform. Based on your bootstrapped estimates, you feel like Refsnyder’s numbers from last year are a bit inflated, but that he’d fit nicely as a future utility guy.
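Here is a minimal Python sketch of that resampling idea, applied to on-base percentage. The one assumption to flag: the post only gives the slash line, so the list of plate appearances below is reconstructed as 16 times on base in 46 PAs, which reproduces the quoted .348 OBP. The function names are made up for this example.

```python
import random

# ASSUMPTION: 16 times on base in 46 PAs, chosen to match the quoted .348 OBP.
# 1 = reached base, 0 = made an out.
plate_appearances = [1] * 16 + [0] * 30


def bootstrap_obp(pas, n_resamples=10_000, seed=0):
    """Resample the PAs with replacement and return the sorted OBP estimates."""
    rng = random.Random(seed)
    n = len(pas)
    estimates = []
    for _ in range(n_resamples):
        resample = rng.choices(pas, k=n)  # draw n PAs, putting each one back
        estimates.append(sum(resample) / n)
    estimates.sort()
    return estimates


def percentile_interval(sorted_estimates, level=0.95):
    """Simple percentile interval taken from the sorted bootstrap estimates."""
    tail = (1.0 - level) / 2.0
    low_index = int(tail * len(sorted_estimates))
    high_index = int((1.0 - tail) * len(sorted_estimates)) - 1
    return sorted_estimates[low_index], sorted_estimates[high_index]


observed_obp = sum(plate_appearances) / len(plate_appearances)
estimates = bootstrap_obp(plate_appearances)
bootstrap_mean = sum(estimates) / len(estimates)
low, high = percentile_interval(estimates)

print(f"observed OBP: {observed_obp:.3f}")
print(f"bootstrap mean: {bootstrap_mean:.3f}")
print(f"95% percentile interval: {low:.3f} to {high:.3f}")
```

The width of that interval is the useful part: with only 46 PAs, a wide range of true on-base skill is consistent with what he actually did, which is a more concrete statement than "we don't have enough sample size."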

Cashman, who’s still in your office, now wants to know about two pitching prospects who were also called up in the 2015 class: James Pazos (5 IP, 0 ER, 3 H, 3 BB, 5.4 K/9, 1.20 WHIP) and Caleb Cotham (9.2 IP, 7 ER, 14 H, 1 BB, 10.2 K/9, 1.56 WHIP). If the team can only keep one of these pitchers, who should we keep? Who is better?
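If you want to check rate stats like these by hand, the one trap is innings notation: "9.2 IP" means nine innings plus two outs, not 9.2 innings. A short Python sketch, using only the hits, walks, earned runs, and innings quoted above (the helper names are invented for the example):

```python
def innings_to_outs(ip_text):
    """Convert box-score innings like '9.2' (9 innings, 2 outs) into outs."""
    whole, _, partial = ip_text.partition(".")
    return int(whole) * 3 + int(partial or 0)


def whip(ip_text, hits, walks):
    """Walks plus hits per inning pitched."""
    innings = innings_to_outs(ip_text) / 3
    return (hits + walks) / innings


def era(ip_text, earned_runs):
    """Earned runs allowed per nine innings."""
    innings = innings_to_outs(ip_text) / 3
    return 9 * earned_runs / innings


# Lines as quoted in the post.
pazos_whip = whip("5", hits=3, walks=3)
cotham_whip = whip("9.2", hits=14, walks=1)
cotham_era = era("9.2", earned_runs=7)

print(f"Pazos WHIP: {pazos_whip:.2f}")
print(f"Cotham WHIP: {cotham_whip:.2f}, ERA: {cotham_era:.2f}")
```

Run that way, Pazos comes out at 1.20 and Cotham at roughly 1.55, within a hundredth of the figure quoted above. Treating 9.2 as a decimal would give a noticeably different answer.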

Normally you’d use a t-test to make comparisons between the two pitchers, but with such a small sample of innings for each guy, the conclusions wouldn’t be reliable. Instead, you decide to use a Mann-Whitney U test, which is basically the rank-based cousin of the t-test and doesn’t lean on normally distributed data. In fact, there’s a whole litany of statistical tests that are adept at handling small sample sizes: the Wilcoxon signed-rank test, Fisher’s exact test, Chi-square, Kendall’s tau, and McNemar’s test. You conclude that Pazos is slightly better, and that Cotham might be better suited for the bullpen. Cashman holds on to Pazos and deals Cotham to the Reds in the trade that brings over Aroldis Chapman to the Yankees. You pat yourself on the back.
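For a sense of what a Mann-Whitney U test does with tiny samples, here is a self-contained Python sketch. It computes U by counting, across every pairing of one value from each group, how often the first group's value is higher (ties count as half), then gets an exact two-sided p-value by trying every possible way of splitting the pooled values into two groups of the same sizes. The two samples are made up for illustration and are not real game logs for either pitcher.

```python
from itertools import combinations


def u_statistic(xs, ys):
    """Count of (x, y) pairs with x > y, counting ties as one half."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in xs for y in ys)


def exact_mann_whitney(xs, ys):
    """Return (U for xs, exact two-sided p-value) by enumerating every regrouping."""
    n1, n2 = len(xs), len(ys)
    pooled = list(xs) + list(ys)
    center = n1 * n2 / 2  # value U takes on average when the groups do not differ
    observed = u_statistic(xs, ys)
    observed_dev = abs(observed - center)
    extreme = 0
    total = 0
    for chosen in combinations(range(n1 + n2), n1):
        chosen_set = set(chosen)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in range(n1 + n2) if i not in chosen_set]
        total += 1
        # Count regroupings at least as lopsided as the one actually observed.
        if abs(u_statistic(group_a, group_b) - center) >= observed_dev - 1e-12:
            extreme += 1
    return observed, extreme / total


# HYPOTHETICAL per-appearance values (say, baserunners allowed per outing).
sample_a = [0, 1, 0, 2, 1]
sample_b = [3, 1, 4, 2, 2, 5]

u_value, p_value = exact_mann_whitney(sample_a, sample_b)
print(f"U = {u_value}, exact two-sided p = {p_value:.3f}")
```

Enumerating every regrouping only stays practical for small groups, which is exactly the situation here. With groups of five and six outings there are just 462 splits to check.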

Questions Need Answering
Having an adequate sample size brings confidence to many statistical conclusions, but it is certainly not a binary prerequisite for analyses. It’s easy for your buddy to watch his hindsight bias autocorrect for his previous wait-and-see approach, but organizations need to answer questions accurately. As amateur analysts and spectators, let’s change the lexicon by changing our methods.