Archive for July, 2014

Changes ZiPS Believes In

Mitchel Lichtman’s projection pieces on hitters and pitchers for the rest of the season were discussed quite a lot last month starting with this.  It is hard when you are rooting for a team, and consequently its players, not to buy in when someone is doing well or poorly.  So let’s look at the heartless projection system ZiPS to see if it is actually buying into some of the performances of 2014 so far.

To do this I pulled the 2014 pre-season wOBA projections and compared them to the ZiPS (RoS), rest of season, projections.  If you take the RoS wOBA minus what ZiPS was expecting prior to 2014 you should be able to see which players are now expected to hit significantly better or worse the rest of the way.  Here are the top/bottom-five players:
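As a rough sketch of that comparison, the snippet below subtracts a pre-season wOBA projection from a rest-of-season one and also expresses the change as a percentage. The player names and wOBA values are hypothetical placeholders, not actual ZiPS output, and the 5% cutoff is used only because it matches the threshold discussed later in this piece.

```python
def woba_change(preseason, ros):
    """Return (nominal change, percent change) of the RoS wOBA
    projection relative to the pre-season projection."""
    diff = ros - preseason
    return diff, 100.0 * diff / preseason

# Hypothetical projections for illustration only; not actual ZiPS output.
projections = {
    "Hitter A": (0.310, 0.335),  # (pre-season wOBA, RoS wOBA)
    "Hitter B": (0.340, 0.318),
    "Hitter C": (0.325, 0.327),
}

changes = {name: woba_change(pre, ros) for name, (pre, ros) in projections.items()}

# Sort from biggest improvement to biggest decline.
ranked = sorted(changes, key=lambda name: changes[name][0], reverse=True)

# Players whose RoS projection moved at least 5% in either direction.
risers = [n for n in ranked if changes[n][1] >= 5.0]
fallers = [n for n in ranked if changes[n][1] <= -5.0]

for name in ranked:
    diff, pct = changes[name]
    print(f"{name}: {diff:+.3f} wOBA ({pct:+.1f}%)")
```

Sorting on the nominal change and filtering on the percent change keeps the two views separate, which matters because the same nominal gain is a larger percentage for a hitter who started from a lower projection.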

[Figure: top five and bottom five players by change from pre-season to RoS ZiPS wOBA projection]

The bottom five, with the exception of Colvin, have been very disappointing and their respective teams would love even the RoS numbers at this point.  The projection still believes Brown can be an above average offensive player despite his putrid play to this point of 2014, but it is starting to look like Raburn’s age might be catching up to him and Gyorko’s rookie year might have been a mirage.  Schierholtz makes less sense, but he has been so bad that ZiPS can’t ignore it, and he was never a great player to begin with.

Other names of note that are projected to finish the year worse may not be surprising.  Raul Ibanez looks done with eyes and statistics, Jean Segura’s lack of plate discipline has really caught up to him, and Brian McCann may not be aging particularly well despite being a lefty with power in the Yankees’ home park.

There are a lot of players on the positive side, and you can see that the nominal and percent wOBA changes are larger for the improvement group too.  There are 31 players with RoS wOBA at least 5% above their pre-season projection while only 17 are projected to be 5% or more worse than expected.  Does this mean that ZiPS is actually an optimist?

The Padres believe in Seth Smith as well, having recently signed him to an extension.  He is a righty masher, and they only rarely let him face same-handed pitching.  Victor Martinez is 35 years old and decided to have a renaissance, and may end up with his best hitting season ever.  Baseball is weird.  I’m not sure what to make of Steve Pearce.  He has been around since 2007 without ever accumulating more than 200 PAs in a season, but this season he finally has and the Orioles are making out like bandits.  The other two are what you expect on such a list, young players taking a step forward.  JD Martinez was who I was thinking about when I started this.  I have seen him play several times recently, and he seems to put together a quality plate appearance every time up.  Mesoraco, like Martinez, is 26 and has had a huge power spike along with a lot more strikeouts, to the point where he seems like a different player altogether.

Two Cleveland Indians just missed the top five improvers: Michael Brantley and Lonnie Chisenhall seem to have finally taken a step forward too.  There were two notable Brewers as well.  ZiPS seems to have finally decided to believe in Carlos Gomez and Jonathan Lucroy.

Yes, believing in projections sometimes means we need to temper our enthusiasm when a player we like breaks out or be patient with someone slumping.  It can also be a good way to see when players are truly locking into higher levels of play.  For the older players here it is likely that they will come back to the pre-season projections again next year because Victor Martinez is probably not going to turn into a much better hitter year after year at this age, but for the younger guys we may be starting to see who is taking a step forward.

Using Double-A Stats to Predict Future Performance

Over the last couple of weeks, I’ve been looking into how a player’s stats, age, and prospect status can be used to predict whether he’ll ever play in the majors. I used a methodology that I named KATOH (after Yankees prospect Gosuke Katoh), which consists of running a probit regression analysis. In a nutshell, a probit regression tells us how a variety of inputs can predict the probability of an event that has two possible outcomes, such as whether or not a player will make it to the majors. While KATOH technically predicts the likelihood that a player will reach the majors, I’d argue it can also serve as a decent proxy for major league success. If something makes a player more likely to make the majors, there’s a good chance it also makes him more likely to succeed there.
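For readers who want to see the mechanics, here is a minimal sketch of how a probit model turns inputs into a probability: the inputs are combined into a single linear index, which is then passed through the standard normal CDF. The coefficient values and the example prospect below are invented for illustration and are not KATOH’s fitted values.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, the link function behind a probit model."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probit_probability(intercept, coefs, inputs):
    """P(event) = Phi(intercept + sum of coefficient * input)."""
    index = intercept + sum(coefs[k] * inputs[k] for k in coefs)
    return normal_cdf(index)

# Hypothetical coefficients, chosen only to show the mechanics.
intercept = 6.5
coefs = {"age": -0.30, "k_rate": -4.0, "iso": 5.0, "top100": 1.2}

# Hypothetical prospect: 21 years old, 18% strikeout rate, .150 ISO, top 100.
prospect = {"age": 21, "k_rate": 0.18, "iso": 0.150, "top100": 1}
older_prospect = dict(prospect, age=24)

p_young = probit_probability(intercept, coefs, prospect)
p_old = probit_probability(intercept, coefs, older_prospect)
print(f"age 21: {p_young:.1%}, age 24: {p_old:.1%}")
```

Because the normal CDF flattens out near 0 and 1, the same change in an input moves the probability a lot for a borderline player and very little for someone already near a sure thing.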

Things that were predictive for players in low-A and high-A included age, strikeout rate, ISO, BABIP, and whether or not he was deemed a top 100 prospect by Baseball America in the pre-season. However, a player’s walk rate was not significant in predicting a player’s ascension to the majors. Today, I’ll look into what KATOH has to say about players in double-A leagues. For those interested, here’s the R output based on all players with at least 400 plate appearances in a season in double-A from 1995-2010. Due to varying offensive environments in different years and leagues, each player’s stats were adjusted relative to his league’s average for that year.

[Figure: R output for the double-A model]

Unlike in the A-ball iterations of KATOH, a player’s double-A walk rate is predictive — albeit only slightly — of whether or not he’ll make it to the show. While walk rate is statistically significant, it still matters much less than the other stats: it takes 3 or 4 percentage points on a player’s walk rate to match what 1 percentage point of strikeout rate does to a player’s MLB probability.

This version is also different in that there are a couple of significant interaction terms, signified by the last two coefficients in the above output. The “I(Age^2)” term adds a little bit of nuance to how a player’s age can predict his future success, while the “ISO:BA.Top.100.Prospect” term basically says that if you’re a top 100 prospect, hitting for power is slightly less important than it would be otherwise. Hitting for power and making Baseball America’s top 100 list both make a player much more likely to make it to the majors, but if he does both, he’s a tad less likely to make it than his power output and prospect status would suggest independently. Put another way, a few top 100 prospects hit for power in double-A but never cracked the majors, such as Jason Stokes (.241 ISO), Nick Weglarz (.204 ISO) and Eric Duncan (.173 ISO). But virtually all of the low-power guys made it, including Elvis Andrus (.073 ISO), Luis Castillo (.076 ISO), and Carl Crawford (.078 ISO). For non-top 100 guys, many more punchless hitters topped out in double-A and triple-A.
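To make the interaction term concrete, the sketch below shows how a negative ISO-by-top-100 coefficient shrinks the weight that power carries for a top 100 prospect. The two coefficient values are hypothetical, picked only so the signs match the description above, and the effect is measured on the model’s linear index rather than directly on the probability.

```python
def iso_weight(b_iso, b_interaction, is_top100):
    """Change in the probit index per one unit of ISO.

    With an ISO:Top100 interaction, the weight on ISO is
    b_iso for ordinary prospects and b_iso + b_interaction
    for top 100 prospects.
    """
    return b_iso + b_interaction * (1 if is_top100 else 0)

# Hypothetical coefficients: power helps, but a bit less for top 100 guys.
B_ISO = 5.0
B_ISO_X_TOP100 = -2.0

weight_regular = iso_weight(B_ISO, B_ISO_X_TOP100, is_top100=False)
weight_top100 = iso_weight(B_ISO, B_ISO_X_TOP100, is_top100=True)

# Index gain from adding 50 points of ISO (0.050) in each group.
gain_regular = weight_regular * 0.050
gain_top100 = weight_top100 * 0.050
print(f"regular: +{gain_regular:.3f}, top 100: +{gain_top100:.3f}")
```

Power still helps both groups in this sketch; the interaction only means the top 100 prospect gets less extra credit for the same ISO.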

By clicking here, you can see what KATOH spits out for all current prospects who logged at least 250 PAs in double-A as of July 7th, as well as a few that fell short of the cutoff, most notably Joey Gallo, Kevin Plawecki, and Robert Refsnyder. Topping the list is Mookie Betts with a probability of 99.95%, and of course the prophecy was fulfilled when the Red Sox called up the 21-year-old last month. Here’s an excerpt of the top players from double-A this year:

Player             Organization  Age  MLB Probability
Mookie Betts       BOS           21   100%
Francisco Lindor   CLE           20   100%
Gary Sanchez       NYY           21   99%
Austin Hedges      SDP           21   99%
Alen Hanson        PIT           21   99%
Jorge Bonifacio    KCR           21   98%
Blake Swihart      BOS           22   98%
Kris Bryant        CHC           22   93%
Ketel Marte        SEA           20   91%
Rangel Ravelo      CHW           22   90%
Robert Refsnyder   NYY           23   86%
Jake Lamb          ARI           23   85%
Jake Hager         TBR           21   84%
Darnell Sweeney    LAD           23   83%
Joey Gallo         TEX           20   82%
Preston Tucker     HOU           23   81%
Scott Schebler     LAD           23   79%
Kevin Plawecki     NYM           23   79%
Cheslor Cuthbert   KCR           21   78%
Kyle Kubitza       ATL           23   77%
Michael Taylor     WSN           23   76%
Christian Walker   BAL           23   76%
Ryan Brett         TBR           22   75%

Keep an eye out for the next installment, which will dive into what KATOH says about hitters at the triple-A level.

Statistics courtesy of FanGraphs, Baseball-Reference, and The Baseball Cube; Pre-season prospect lists courtesy of Baseball America.

Do Rookie Hitters Decline in the Second Half?

Do rookies perform worse after the All-Star break?

I can’t claim this idea as my own; it was brought to my attention by Adam Aizer on the CBS Fantasy Baseball Podcast.

Unconvinced, I thought that it would be worth the effort to look into the validity of the statement.

From the perspective of an offensive player, rookies infrequently make enough of an impact to matter in the league sizes (i.e. 10-team and 12-team leagues) that ordinary fantasy baseball players join. In leagues of that size, a rookie hitter who is worth owning is either an elite prospect or a player who has performed beyond his true talent level. The former is rare; the latter is more common, and it would make sense for such a player to regress to his true talent level. The idea that rookie hitters decline throughout the year is just a misevaluation of the player’s true talent level.

To put it another way, it is the same logic that comes into play with a recent event: the Home Run Derby. Players that participate in the Home Run Derby are players that have had exceptional first halves, which are often beyond their true talent level. These players often perform worse in the second half than they did in the first half, not because they participated in the monotonous and dated event that the Home Run Derby has become, but because, just like the rookies who perform worse in the second half of the season than the first, they have regressed toward their true talent level. When the rookies regress, they simply regress to the point where they are not ownable.

The research looks at all player seasons between 1988 and 2013 where a batter was in their first season, had 250 plate appearances in the first half of the season, and had 250 plate appearances in the second half of the season.
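The sample selection can be sketched roughly as follows. The record layout, the reading of “250 plate appearances” as “at least 250,” and the use of wOBA as the performance measure are all assumptions made for this illustration, and the four player seasons are made up.

```python
def qualifies(season, min_pa=250, first_year=1988, last_year=2013):
    """Rookie season with enough plate appearances in both halves."""
    return (
        season["rookie"]
        and first_year <= season["year"] <= last_year
        and season["pa_first_half"] >= min_pa
        and season["pa_second_half"] >= min_pa
    )

def mean_second_half_change(seasons):
    """Average (second-half wOBA - first-half wOBA) over the sample."""
    sample = [s for s in seasons if qualifies(s)]
    if not sample:
        return None
    total = sum(s["woba_second_half"] - s["woba_first_half"] for s in sample)
    return total / len(sample)

# Made-up player seasons, for illustration only.
seasons = [
    {"year": 2001, "rookie": True, "pa_first_half": 300, "pa_second_half": 280,
     "woba_first_half": 0.330, "woba_second_half": 0.320},
    {"year": 2010, "rookie": True, "pa_first_half": 260, "pa_second_half": 255,
     "woba_first_half": 0.300, "woba_second_half": 0.315},
    {"year": 2012, "rookie": True, "pa_first_half": 310, "pa_second_half": 120,
     "woba_first_half": 0.350, "woba_second_half": 0.290},  # too few 2nd-half PA
    {"year": 2005, "rookie": False, "pa_first_half": 300, "pa_second_half": 300,
     "woba_first_half": 0.340, "woba_second_half": 0.310},  # not a rookie
]

change = mean_second_half_change(seasons)
print(f"mean second-half change: {change:+.4f}")
```

Requiring a full workload in both halves limits the sample to rookies who kept their jobs all year, which is worth keeping in mind when reading the result.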

[Figure: first-half versus second-half results for the rookie hitters in the sample]

The rookie second half decline and the post Home Run Derby slump intuitively make sense, but intuition does not always bear truth. Through cognitive ease we rationalize that “Swinging that hard for that long throws off your timing”; “A rookie is too young to be able to make it through the long hot summer.”

Because most fantasy leagues are small, the only reason the typical rookie was on our teams to begin with is that he played beyond his ability in the first half of the season. The rookie who is on our team right now, unless he is a reputable prospect, is probably a safe bet to decline. But as a whole, we can see that there is no decline in rookie performance based on first half/second half splits.

Our desire to perceive a decline is just our desire to hold onto our ability as talent evaluators. We know that Yangervis Solarte is a great player, and the only reason he hasn’t been able to sustain his performance is that he is a rookie who can’t play out the season: common baseball logic. In actuality, Solarte was not as good as some originally thought, and his true talent was never good enough to be in a 10- or 12-team league.


Rookie hitters, as a generalization, are not good enough to play in 10- or 12-team leagues, and, as a generalization, those that do play in 10-team leagues regress to their true talent level, which is not valuable enough to be ownable.

Devin Jordan is obsessed with statistical analysis, non-fiction literature, and electronic music. If you enjoyed reading him, follow him on Twitter @devinjjordan.

Mike Minor and All the Home Runs

Mike Minor just keeps giving up home runs. To be fair, he’s a fly ball pitcher and home runs will come with that. And actually, he’s given up the long ball a little more frequently than he should (10.5% HR/FB) throughout his career, so maybe this shouldn’t come as such a surprise.

His 1.51 HR/9 this season is 7th among pitchers who have thrown as many innings as Minor has (83.1). But he’s had some bad luck this year – .343 BABIP, 14.9% HR/FB – and he’s been stricken with a… different kind of offseason injury plus shoulder tendinitis in Spring Training, so it’s reasonable to think that’s where the issue starts and ends. But after personally seeing him give up four home runs in a rehab game against the Reds’ double-A squad, Pensacola, it feels like something may be wrong. So I’d like to examine this a little more, if that’s ok.
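As a quick aside on the arithmetic, innings written as “83.1” mean 83 and one-third innings, not 83.1 in decimal. The sketch below converts that notation and shows the home run total implied by a 1.51 HR/9; the figure of 14 home runs is not stated in this article and is only what the two quoted numbers work out to.

```python
def innings_to_float(ip_notation):
    """Convert baseball innings notation ('83.1' = 83 1/3) to a number."""
    whole, _, outs = str(ip_notation).partition(".")
    return int(whole) + int(outs or 0) / 3.0

def hr_per_9(home_runs, innings):
    """Home runs allowed per nine innings."""
    return 9.0 * home_runs / innings

ip = innings_to_float("83.1")            # 83 1/3 innings
implied_hr = round(1.51 * ip / 9.0)      # home runs implied by the quoted rate

print(f"{ip:.2f} IP, about {implied_hr} HR, "
      f"{hr_per_9(implied_hr, ip):.2f} HR/9")
```

Passing the innings as a string avoids any confusion between the “.1” outs notation and a decimal fraction.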

I imagine that if the problem is something more than just arm trouble or bad luck, it should show up in his numbers somewhere. So I’ll compare his PITCHf/x, pitch type, and heat map data from this season – a not-so-good one – and last season – a quite good one.

First, I just want to show again that he’s been much less lucky this season. It feels to me like there’s something more to it, but luck could be the problem.

[Figure: Mike Minor’s BABIP]

While that may be so, giving up more home runs could be the result of a change in the amount he’s throwing each of his pitches and the velocity of those pitches.

[Figure: Mike Minor’s pitch types and velocity]

So there’s actually been a small uptick in Minor’s velocity since last season, and he’s been throwing more sliders and fewer changeups. He’s been showing that same trend since his debut and seemed to find a happy medium last year. Those changes from 2013 to this year seem significant, and I think they might be playing a part in his production.

First, we’ll compare how his pitches have been moving and how effective they’ve been the last two years. Rather than show four more tables with a bunch of numbers, here’s a quick summary: 1) His changeup is moving less than it did last year, and it’s getting crushed. 2) His fastball and slider are both moving more than they did last year – but only by a little – and are getting crushed. So those things aren’t great. The BABIP on his changeup is the only one that isn’t outrageous; it’s .281 this year. Opponents’ BABIPs on his fastball and slider are .394 and .350, respectively, which are both pretty crazy. So those are two more points for just a ton of bad luck going Minor’s way, and perhaps some good signs pointing towards better luck in the near future. On to the next thing.

Maybe his issue has been locating the ball. His walk rate is up a little bit from last year, so it could be that he’s having trouble pitching where he did in 2013. I thought showing his heat maps might illustrate that, but, well…

[Figures: 2013 heat map and 2014 heat map]

They don’t. Not really, anyway. A lot of his pitches this year, like last year, are right around the middle of the plate, though they were spread out a little more last year. I’m not sure what exactly that means, but maybe he’s not locating quite as well this year.

From what I can gather, it seems like Mike Minor has seen several little changes. (A little higher release point turns into less movement on a pitch every now and then, which turns into everyone crushing your slider, etc.) And a lot of little changes can make a big difference – if things aren’t the same, they’ll be different, right?

Now for a little good news – though I hesitate to call it that. Minor’s historically been a “2nd half pitcher.” Hitters go from a .330 wOBA against him in the 1st half to a .300 after the break, and his FIP and xFIP see some drops as well. In addition, his xFIP is 3.61, which is actually a little better than it was last season. A turnaround doesn’t seem terribly far off for Minor. Cut out a little of that horribly bad luck, and Atlanta’s rotation gets better. Those things might not mean much at all, but maybe it can give Braves fans some hope.

Roster Doctor: Los Angeles Dodgers

With a payroll north of $200,000,000, you would expect the Los Angeles Dodgers to field a competitive team, and indeed they have. As we emerge from the All-Star break, they are neck and neck with the hated Giants, heading into a pennant chase that could be one for the ages. The Dodgers have four of the most watchable players in baseball (Kershaw, Greinke, Puig, and Ramirez) and a farm system with enough talent to supply reinforcements either directly or via trades. The team is not without needs, however. Like almost any team, the Dodgers have some bullpen depth issues, but just alleviated those somewhat by recalling Paco Rodriguez, a non-flamethrower who nevertheless generates a ton of Ks. Catching has been a riddle for manager Don Mattingly as well. He’s had to use four backstops, none of whom have amassed enough appearances to qualify for the batting title, and of whom only the stalwart but venerable A.J. Ellis has provided anything even approaching an offensive contribution. (Well, Miguel Olivo made an offensive contribution of a different kind.)

But the biggest problem has been Matt Kemp, who dug a Tunguskan-size crater in center field before Mattingly more or less permanently shunted him to left. Kemp has the worst WAR (-1.3) for any position player qualifying for the batting title except Domonic Brown. Kemp’s hitting about as well as last year’s (modest) effort, but his defense has gone from bad (-0.6 dWAR) to eye-watering (-2.5). Whether you’re new school (zone rating) or old school (range factor), you will find nothing to like in Kemp’s defensive metrics. The move to left has probably mitigated the defensive damage he’s doing, but mainly by reducing his opportunities to come within proximity of the ball. His range in left is almost as far below the league as his range in center, although he’s making fewer errors. Kemp’s agent thinks he can still play center, and so presumably do Matt and his mom. That about exhausts the list.

In one sense this is a simple problem that the Dodgers can solve without any outside help. They could bench Kemp immediately. Center field prospect Joc Pederson is murdilating the PCL’s beleaguered pitchers to the tune of a 1.045 OPS, and yes, that’s good even in the PCL. Pederson is third in the league in OPS, behind two guys who are at least five years older. To the extent Pederson would struggle against major league lefties, he could be platooned with righty Scott Van Slyke, with Andre Ethier sliding between center and left. This is a rare situation where a manager can (almost) unilaterally boost his team’s playoff chances with a single lineup change.

And yet … Kemp can still hit. His .752 OPS is third on the Dodgers among batting qualifiers, and while that’s over 80 points off his career number, it still represents useful offense. At this stage in his career, Kemp’s value would dramatically increase if he didn’t have to put on a glove. The question is how to allocate that increased value among the Dodgers and their potential trade suitors. There are four playoff-contending AL teams whose DHs are either injured, ineffective, or both:

New York Yankees (Carlos Beltran .698 OPS)

Kansas City Royals (Billy Butler .675)

Cleveland Spiders (Nick Swisher .641)

Seattle Mariners (Corey Hart .611)

Kemp would immediately boost any of these teams’ offenses. The Yankees could take much of Kemp’s anvil-like contract ($20 m/yr through 2019), but have few if any prospects to offer. The Royals and Mariners are in the opposite situation: good talent to trade but limited ability to absorb such a huge financial hit. Cleveland, sadly, can’t really employ either approach, and in any case hitting is not their main need.

Dodgers president Stan Kasten’s general strategy upon assuming command was to throw immense amounts of Guggenheim money at the major league roster first, and then reinforce the farm system to ensure a steady stream of cost-controlled reinforcements for the future. Part I of the plan is working well, and Part II is underway with Corey Seager, Julio Urias and Alex “Van Gogh” Guerrero headlining a good collection of upper level minor league talent (non-Pederson division). The Dodgers could go either way here: begin their slow march away from the payroll tax penalty by banishing Kemp to the Bronx, or recharge the lower reaches of their farm system with talent from either of the smaller market franchises who could be in on Kemp. They may not succeed in moving Kemp, but if they can it would provide at least a small edge in a pennant race that looks sure to go to the wire.

Bringing Bill James’ Famous Arbitration Case to 2014

“I helped prepare arbitration cases for George three straight years in the 1980’s… George had led the American League in errors the first year that we prepared a case for him. We were wondering what to do about that, so I drew up an exhibit entitled ‘What Was the Cost of George Bell’s Errors?’ The exhibit showed that while Bell had led the league in errors with 11, none of the errors had actually cost his team anything. Of the 11 errors, only about three led to unearned runs, all had occurred in games which Toronto had won anyway, and in those three games, Bell had driven in something like seven runs.”

Bill James, The New Bill James Historical Abstract


The case that Bill James made for George Bell in 1985, and later informed his readers about when he released his Historical Abstract, always fascinated me. As someone who is a big believer that fielding metrics have a long way to go (especially behind the plate), this arbitration case was my Zihuatanejo, that far away place that always gave me hope that errors were really as pointless a statistic as they seemed.

However, as Bill James points out in the rest of George Bell’s player ranking, the fact that nothing came of Bell’s errors in 1985 (his first arbitration year), as well as 1986 and 1987, when James used the same exhibit, was rather noteworthy. Although errors are definitely not the be all and end all of fielding statistics, one would have to imagine that some ill had to come of them, at some point, right?

With the All-Star break upon us, and sadly no real baseball for the last four days, the chance to finally look into how much errors actually cost the erring player’s team presented itself. At the halfway point, there were exactly 20 players who had committed 10 or more errors in 2014. Since there was time to kill without baseball on, I decided to pore over some box scores and figure out just how much each of those leading “error-men” had cost their teams. Using Baseball-Reference’s fielding game logs, it was easy to find the games in which each player had made his errors, and going through the play-by-play then made it (usually) straightforward to tell whether an error led to a run or not.

For this study, I created a chart with columns for all of the parts mentioned in Bill James’s arbitration case: total errors, unearned runs as a result of those errors, games that the team lost when that player committed an error, and RBI in those games that were lost. The final column (RBI in games lost) was tweaked a tiny bit due to the inclusion of one other column. The column added was one called “true losses.” This measures how many games the team lost by a margin equal to or smaller than the number of runs the player’s error cost the team. For example, if Pedro Alvarez made an error that cost his team three runs, and the Pirates lost 4-3, that would be a true loss. Or, if Derek Dietrich made an error that cost his team one run, and the Marlins lost 3-2, that would also be a true loss. Finally, if the game went to extra innings and was a loss, any error worth one run or more was counted as a true loss. Therefore, if Josh Donaldson committed an error which cost his team only one run and then the A’s lost 10-8, but that final came in extra innings, then that would still count as a true loss because the extra innings would have never occurred (hypothetically).
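The “true loss” rule can be written down as a small function. This is only a sketch of the definition as described above, with made-up parameter names; the three calls mirror the Alvarez, Dietrich, and Donaldson examples.

```python
def is_true_loss(runs_for, runs_against, runs_from_error, extra_innings=False):
    """Return True if a loss could hypothetically be pinned on the error.

    A loss counts as a "true loss" when the margin of defeat is no larger
    than the runs the error cost, or when the game was lost in extra
    innings and the error cost at least one run.
    """
    if runs_for >= runs_against:      # the team did not lose
        return False
    if runs_from_error < 1:           # the error cost nothing on the scoreboard
        return False
    if extra_innings:                 # without the error, no extra innings
        return True
    return (runs_against - runs_for) <= runs_from_error

# The three examples from the text.
alvarez = is_true_loss(3, 4, runs_from_error=3)                        # lost 4-3
dietrich = is_true_loss(2, 3, runs_from_error=1)                       # lost 3-2
donaldson = is_true_loss(8, 10, runs_from_error=1, extra_innings=True) # lost 10-8

print(alvarez, dietrich, donaldson)
```

Note that the same 10-8 loss with a one-run error would not be a true loss in a nine-inning game, which is why the extra-innings flag is handled separately.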

Now this is obviously not a foolproof study. There is no way to say for sure that the error committed for one run was any more the cause of the loss than the pitcher who gave up the home run the next inning. It is also starting to get into a bit of a messy “Butterfly Effect” situation, meaning that there is no way of knowing how the rest of the game (or our lives, bro) would be different if Jose Reyes hadn’t booted that grounder in the fifth inning.

However, it was a fun study to put together, and it can be revealing as to how little (or in poor Starlin Castro’s case, how much) errors truly change a game. Here’s the official chart:

What Was the Cost of Player X’s Errors?

Name                   Pos  Errors  UER from E  Team L’s  True L’s  RBI in True L’s
Pedro Alvarez          3B   20      11          11        4         4
Josh Donaldson         3B   15      6           5         1         0
Ian Desmond            SS   15      10          8         2         2
Asdrubal Cabrera       SS   14      12          9         1         0
Jose Reyes             SS   13      7           9         2         0
Brandon Crawford       SS   13      6           5         0         0
Lonnie Chisenhall      3B   13      6           5         0         0
Everth Cabrera         SS   13      7           6         0         0
Brad Miller            SS   13      7           5         1         0
Martin Prado           3B   12      13          8         2         2
Jonathan Villar        SS   12      14          8         0         0
David Wright           3B   11      5           4         1         0
Starlin Castro         SS   11      12          6         5         0
Jean Segura            SS   11      8           1         0         0
Elvis Andrus           SS   11      7           8         0         0
Yan Gomes              C    11      4           6         0         0
Chris Owings           SS   11      8           7         2         1
Derek Dietrich         2B   11      6           5         1         0
Jarrod Saltalamacchia  C    10      5           7         1         0
Hanley Ramirez         SS   10      7           7         1         0

Key: UER from E – unearned runs from errors; Team L’s – team losses; True L’s – true losses (described above); RBI in True L’s – how many RBIs the player had in said True Loss games


Let’s tackle this table column by column.

Well, I don’t think a historiography of each player’s name is necessary in today’s article, so let’s skip over to the position column. It is interesting to note how many left-side of the infield players there are atop the error leaderboard. There’s nobody from the outfield to be found (the “top” outfielder per errors is Sports Illustrated cover boy George Springer, with seven), and there are only three players that don’t hail from third base or shortstop as their main position. One branch off of this study that could be interesting would be to look at whether or not there was a correlation between a player’s position on the diamond and how frequently an error led to runs or “true losses.” My gut instinct would be to guess no, but maybe errors in the outfield are often for more bases, and therefore more likely to lead to a run – just a hypothesis.

Jumping over to the errors column, Alvarez’s 20 errors stood out, as the difference between his total and the second place total is the same as the difference between second place total and the bottom of our table. In fact, seeing that high total made me curious as to just how many errors it would take to get into the record books. Well, if you’re including the entire history of baseball, the answer is: like a bajillion. Obviously the game was entirely different, but it’s hard to imagine that Herman Long’s 122 errors in 1889 weren’t embarrassing even back then. The record for errors in a single season since 1952 is 44 by Robin Yount in 1975, and the record since 1980 is Jose Offerman with 42 in 1992. So while Alvarez’s 20 errors may be pacing the league by a good margin now, it’s fair to say he won’t be joining even the modern record books this season.

The next column looks at unearned runs derived from each player’s errors, and the variance is quite extreme. With a range from only four runs (it’s interesting to note that the catchers have the two lowest unearned runs tallies, maybe that positional study would provide some analysis after all) all the way up to 14, there doesn’t seem to be too close of a connection between the number of errors and the number of unearned runs. For instance, Josh Donaldson has committed three more errors than Jonathan Villar in 2014, but Villar’s errors have led to eight more runs. This brings up the question of whether unearned run prevention is simply luck, or whether some teams (and pitchers) respond better after an error is committed in the field.

The A’s are one of baseball’s best teams, and have an excellent pitching staff, so it isn’t too surprising that Donaldson’s unearned runs are among the lowest, especially in comparison to how many errors he has committed. On the other end of the spectrum are players like Villar and Castro who play on rebuilding teams, and it is unsurprising to see their names next to some of the highest unearned run totals. However, there is most certainly a lot to be said for luck playing a role in how many unearned runs come along after an error. For example, teammates Asdrubal Cabrera and Lonnie Chisenhall find themselves on opposite ends of the spectrum in terms of unearned runs after errors, a definite sign of the role random chance plays in unearned run prevention.

One other note on the extreme variance in unearned runs tied to errors: the variance could also come as the result of what kind of error was made. A bobbled ball that never even gets thrown across the infield does only one base of harm, whereas an overthrow (many of Alvarez’s errors) may lead to two bases of harm. One could also try to really dig deep into this data and see if younger, more inexperienced players were more likely to commit errors late in games, when the pressure was ratcheted up, and maybe those errors were more likely to be costly. However, with this study, the idea is simply to get a feel for another way of looking at errors, and the main point that remains here is that there is a lot of luck to whether a player’s error costs his team a run or not.

There isn’t a whole lot to be said about the team losses column, as committing an error does indeed swing the pendulum (or WPA chart) towards a loss, but so minimally that it wouldn’t even bother one of Poe’s victims. For instance, implying that Jean Segura (only one team loss in games he committed an error) timed his errors better than Elvis Andrus (eight team losses in games he committed an error) is really just saying that the Brewers are better than the Rangers; which they are, but that doesn’t reflect on the individual player at all. That comparison is especially interesting given that Andrus’ errors have actually led to fewer unearned runs than Segura’s.

The next column, the “true losses” column, is where the fallacy of the error as a statistic truly shows its colors. The only players who cost their teams more than two wins in the first half (with teams having played well over 90 games in 2014, so far) were the league leader, Alvarez, and the incredibly unlucky Starlin Castro. Castro’s case could be an entire article itself, and the poor timing of his errors is remarkable. The fact that the Cubs have only lost six games in which he has committed an error, and five of those can be considered “true losses” is very much a statistical anomaly. Consider that in this chart there are 124 team losses outside of Castro’s Cubs. Of those 124 losses, 19 were true losses, or just over 15 percent. In Castro’s case, over 83 percent of his team losses were true losses, such a far outlier it warrants special attention.

Even when including Castro’s remarkable true loss numbers, the percent of losses that could be considered, even hypothetically, the erring player’s fault is merely 18.5 percent, and that’s not even accounting for all the games that the teams still won in which one of the listed players committed an error. This is a good time to point out that this study obviously does not take into account any of the good, run-saving plays that these fielders make, and even still the total impact on a team is minimal. As seen in Pedro Alvarez’s row, he drove in plenty of runs in those games in which he cost his team, and with his strong range, some of those errors he made likely would have been singles, with the majority of third basemen failing to even get to the ball. Josh Donaldson and David Wright stand out as particularly strong cases of top-notch fielders who, because of their strong range, get to more groundballs, but get to them in difficult positions, thus increasing the likelihood of an error.
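The true-loss percentages quoted in the last two paragraphs can be re-derived from the Team L’s and True L’s columns of the chart. The snippet below is just that arithmetic, with the two columns copied into a dictionary.

```python
# (team losses, true losses) for each player, copied from the chart.
chart = {
    "Pedro Alvarez": (11, 4), "Josh Donaldson": (5, 1), "Ian Desmond": (8, 2),
    "Asdrubal Cabrera": (9, 1), "Jose Reyes": (9, 2), "Brandon Crawford": (5, 0),
    "Lonnie Chisenhall": (5, 0), "Everth Cabrera": (6, 0), "Brad Miller": (5, 1),
    "Martin Prado": (8, 2), "Jonathan Villar": (8, 0), "David Wright": (4, 1),
    "Starlin Castro": (6, 5), "Jean Segura": (1, 0), "Elvis Andrus": (8, 0),
    "Yan Gomes": (6, 0), "Chris Owings": (7, 2), "Derek Dietrich": (5, 1),
    "Jarrod Saltalamacchia": (7, 1), "Hanley Ramirez": (7, 1),
}

def true_loss_share(rows):
    """Return (team losses, true losses, true losses as a % of team losses)."""
    losses = sum(team for team, _ in rows)
    true = sum(t for _, t in rows)
    return losses, true, 100.0 * true / losses

everyone = true_loss_share(chart.values())
castro = true_loss_share([chart["Starlin Castro"]])
others = true_loss_share(
    [row for name, row in chart.items() if name != "Starlin Castro"]
)

for label, (losses, true, pct) in [("All 20", everyone),
                                   ("Castro", castro),
                                   ("Others", others)]:
    print(f"{label}: {true} of {losses} losses ({pct:.1f}%)")
```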

All of this being said, let’s not take too much away from the potential impact of an error. It is indeed a mistake, and can have a negative impact on the team in ways more than just the scoreboard. For instance, for every error made, that is an extra batter that the pitcher has to face, and therefore, more pitches on his final pitch count. If the bases were clear before the error, the pitcher has to pitch out of the stretch now, and the threat of a potential steal is in play. If a certain player is prone to errors, it may also lead to his pitcher not having confidence in his defense behind him, and therefore getting himself in trouble by trying to do too much on the mound. Other fielders may feel that they have to cheat in the commonly erring fielder’s direction if there is likely to be a mistake made, which can mess up a team’s defensive positioning. Finally, there’s the fact that for all of us here at FanGraphs who realize the harm in relying on errors too much as a statistic, there are still those in baseball who do rely on it, and committing enough errors in the field, may lead to a player riding the pine for a few days.

In the end, it’s fair to say that errors are one metric out of many. They have historically been overused, and hopefully the chart above has made it clear that frequently an error won’t really cost the team anything.

And if your error did cost your team, well, you’re probably Starlin Castro.

Using High-A Stats to Predict Future Performance

Last week, I looked into how a player’s low-A stats — along with his age and prospect status at the time — can be used to predict whether he’ll ever play in the majors. I used a methodology that I named KATOH (after Yankees prospect Gosuke Katoh), which consists of running a probit regression analysis. In a nutshell, a probit regression tells us how a variety of inputs can predict the probability of an event that has two possible outcomes — such as whether or not a player will make it to the majors. While KATOH technically predicts the likelihood that a player will reach the majors, I’d argue it can also serve as a decent proxy for major league success. If something makes a player more likely to make the majors, there’s a good chance it also makes him more likely to succeed there.
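To make the probit idea concrete, here is a minimal Python sketch of how such a model turns inputs into a probability. The coefficient and feature values below are made-up placeholders, not the fitted KATOH values (those appear only in the R output), and the function and variable names are hypothetical.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def probit_probability(intercept, coefficients, features):
    """Probit model: P(event) = Phi(intercept + sum of coefficient * input)."""
    z = intercept + sum(coefficients[name] * features[name] for name in coefficients)
    return normal_cdf(z)


# Placeholder numbers for illustration only; these are NOT the KATOH estimates.
example_coefficients = {"age": -0.3, "k_rate": -4.0, "iso": 5.0, "babip": 3.0, "top100": 1.0}
example_features = {"age": 20, "k_rate": 0.22, "iso": 0.180, "babip": 0.320, "top100": 1}

probability = probit_probability(4.0, example_coefficients, example_features)
print(f"Estimated chance of reaching the majors: {probability:.1%}")
```

The key property is that the weighted sum can be any real number, while the normal CDF squeezes it into a probability between 0 and 1.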

Things that were predictive for players in low-A included: age, strikeout rate, ISO, BABIP, and whether or not he was deemed a top 100 prospect by Baseball America in the pre-season. However, a player’s walk rate was not significant in predicting a player’s ascension to the majors. Today, I’ll analyze what KATOH has to say about players in class-A-advanced leagues. Here’s the R output based on all players with at least 400 plate appearances in a season in high-A from 1995-2009:

High-A Output

This looks very similar to what I found for low-A players: Walk rate isn’t significant, and everything else has very similar effects on the final probability. However, the coefficients from this model are all a tad bigger than those from the low-A version, implying that high-A stats might be a bit more telling of a player’s future. Intuitively, this makes sense: The closer a player is to the big leagues, the more his stats start to reflect his future potential.

By clicking here, you can see what KATOH spits out for all current prospects who logged at least 250 PAs in high-A as of July 7th. I also included a few notable players who fell short of the threshold, namely Joey Gallo (who checks in at a remarkable 99.8%), Peter O’Brien, and Jesse Winker. Here’s an excerpt of the top-ranking players:

Player Organization Age MLB Probability
Joey Gallo TEX 20 100%
Corey Seager LAD 20 99%
Carlos Correa HOU 19 99%
Albert Almora CHC 20 93%
Nick Williams TEX 20 93%
D.J. Peterson SEA 22 93%
Jesse Winker CIN 20 91%
Orlando Arcia MIL 19 88%
Jose Peraza ATL 20 87%
Colin Moran MIA 21 87%
Renato Nunez OAK 20 86%
Tyrone Taylor MIL 20 85%
Hunter Renfroe SDP 22 84%
Josh Bell PIT 21 84%
Raul Mondesi KCR 18 83%
Daniel Robertson OAK 20 83%
Jorge Polanco MIN 20 81%
Dilson Herrera NYM 20 77%
Breyvic Valera STL 21 77%
Peter O’Brien NYY 23 76%
Matt Olson OAK 20 75%
Jorge Alfaro TEX 21 75%
Patrick Leonard TBR 21 75%
Dalton Pompey TOR 21 73%
Billy McKinney OAK 19 73%
Teoscar Hernandez HOU 21 73%
Brandon Nimmo NYM 21 72%
Jose Rondon LAA 20 70%
Rio Ruiz HOU 20 70%
Brandon Drury ARI 21 70%

Next up will be double-A. Unlike A-ball, double-A tends to be a random mishmash of prospects and minor-league lifers, so it will be interesting to see how KATOH handles this wide array of players. And perhaps double-A is where a player’s walk rate finally starts to tell us something about his future success.

Statistics courtesy of Fangraphs, Baseball-Reference, and The Baseball Cube; Pre-season prospect lists courtesy of Baseball America.

Pitch Movement Benchmarks

There are many variables that influence the effectiveness of a pitch. Of these many variables, the way in which the pitch moves contributes to the overall story of a pitcher. And because we all want to know how and why a pitcher is successful, determining benchmarks for movement can be a useful measurement when evaluating pitchers. 

Using 2011-2013 data, with inclusion criteria of at least 50 innings pitched and the pitch being thrown at least 4% of the time, we determined the average horizontal movement, vertical movement, and overall movement (which we refer to using the Z-axis) of each pitch. Horizontal movement is affected by handedness, so to account for that, we used the absolute value of average movement. All of this doesn’t mean much yet, because there are so many factors that go into what makes a pitch effective. But we can at least look at it to get some idea of how a pitch moves relative to others throwing it.

Here they are:

FA% vFA FA-X (inches) FA-Y (inches) FA-Z (inches)
38.99% 91.48 4.82 7.73 9.49
FT% vFT FT-X (inches) FT-Y (inches) FT-Z (inches)
21.06% 91.31 8.50 6.76 11.04
FC% vFC FC-X (inches) FC-Y (inches) FC-Z (inches)
19.07% 88.44 1.27 5.47 5.83
SI% vSI SI-X (inches) SI-Y (inches) SI-Z (inches)
36.84% 90.23 8.57 4.91 10.31
FS% vFS FS-X (inches) FS-Y (inches) FS-Z (inches)
14.66% 84.09 5.31 2.82 6.43
SL% vSL SL-X (inches) SL-Y (inches) SL-Z (inches)
21.27% 83.21 2.67 0.44 3.64
CU% vCU CU-X (inches) CU-Y (inches) CU-Z (inches)
14.56% 76.89 5.01 -5.86 8.02
CH% vCH CH-X (inches) CH-Y (inches) CH-Z (inches)
12.82% 83.10 7.01 4.14 8.55

You may find it odd that some of these pitches appear to rise, judging by their vertical movement. However, these measurements do not account for gravity. If gravity were factored into the measurement, then yes, the slider (and all the rest of these pitches) would appear to drop many more inches.

Let’s see how Clayton Kershaw’s curveball matches up…

CU% (pfx) vCU (pfx) CU-X (pfx) CU-Y (pfx) CU-Z (pfx)
14.52 74.94 2.7 -8.58 8.99

Versus MLB average.

CU% (pfx) vCU (pfx) CU-X (pfx) CU-Y (pfx) CU-Z (pfx)
14.56% 76.89 5.01 -5.86 8.02

Kershaw’s curveball, although the horizontal movement is less than average, moves much more than league average vertically and more overall.
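The tables do not spell out how overall (Z) movement is derived. A reasonable assumption is that it is the straight-line combination of horizontal and vertical movement, which does reproduce Kershaw’s listed 8.99 inches. The sketch below shows that calculation; the function name is hypothetical. The league rows will not reproduce exactly this way, presumably because they average each pitcher’s overall movement rather than combining the averaged X and Y values.

```python
from math import hypot


def total_movement(horizontal_inches, vertical_inches):
    """Overall movement, assumed here to be sqrt(X^2 + Y^2)."""
    return hypot(horizontal_inches, vertical_inches)


# Kershaw's curveball from the table above: X = 2.7, Y = -8.58
kershaw_curve = total_movement(2.7, -8.58)
print(round(kershaw_curve, 2))
```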

Work in collaboration with Douglas Wills.

Wait, They’re Good Now?

In the 2008 season the Yankees started the year with two young pitching prospects in their rotation: Phil Hughes and Ian Kennedy. These two pitchers were expected to be the future of the Yankees rotation. That didn’t really go as planned. The two pitchers struggled, and they both earned demotions as they combined for an ERA of 7.44. Hughes and Kennedy were simply not ready for major-league action. They gave up too many walks, didn’t strike out enough guys, and didn’t keep the ball in the ballpark. That’s a recipe for disaster when it comes to trying to succeed as a pitcher at the major-league level.

Nonetheless, these two pitchers showed enough promise as prospects for the Yankees to actually wait on them. In fact, after their demotions, Hughes and Kennedy spent most of their 2008 season in the minors due to mediocrity and injuries. The Yankees were patient for a year with their young talent. However, only so much time can go by before you go from being a developing prospect to a struggling major leaguer. The Yankees quickly gave up on Kennedy and traded him to Arizona, where he showed decent success as a starter. In three seasons with Arizona, Kennedy compiled a WAR of 10.2.

The Yankees saw something in Phil Hughes. Hughes showed some promise in 2009 as a reliever, and then in 2010 as a starter who compiled a WAR of 2.5. However, there was the problem of Yankee Stadium not suiting Hughes’s skill set. Hughes was a fly-ball pitcher in a stadium that was known for being a hitter’s haven. Hughes always struggled as a Yankee when it came to keeping the ball in the park. The lowest home run rate that Hughes posted as a full season Yankee starter was 1.28 in 2010.

Both Kennedy and Hughes had some success over the years; one could even argue that Kennedy was one of the best pitchers in the league in 2011. However, for the most part their careers have been a mixed bag. But times have now changed. Kennedy is now with his third team, the Padres, and Hughes is with his second team, the Twins. After mediocre 2013 seasons, the two pitchers are actually performing well.

2014 Season K/9 BB/9 HR/9 ERA FIP xFIP WAR
Hughes 7.99 0.81 0.67 3.92 2.62 3.22 3.7
Kennedy 9.67 2.46 0.72 3.47 2.93 3.17 2.3

As of right now, Hughes is fourth in the league in FIP among qualified pitchers. The only pitchers who have been better are Jon Lester, Adam Wainwright, and Felix Hernandez. Hughes is third in the league in WAR, right behind Lester and Hernandez. For the first half of the season Hughes has pitched like an ace.

Hughes has had the second-best walk rate among qualified starters. Any walk rate below two is considered to be good, and Hughes’s rate right now is downright ridiculous. We can’t expect Hughes to stay this good at not walking people, but the ZiPS/Steamer projections have him finishing the year with a walk rate between 1.31 and 1.38. That’s a pretty good projection, considering Hughes has never had a walk rate lower than 2.16. Hughes has also improved his home run problem, as he isn’t letting an egregious number of baseballs leave the park. The main change in Hughes’s approach has been his implementation of the cutter. Between 2012 and 2013, Hughes had dropped his cutter. This year, he reintroduced the pitch, throwing it 23% of the time, and dropped his usage of a slider. The change has proven to be useful for Hughes, and he no longer needs to rely on his fastball.

Then there is Kennedy. Kennedy has turned himself into the ace of the Padres staff this year. The main difference in Kennedy is that he has actually gained velocity on his pitches. Throughout his career he has always been a soft tosser. For most of his career, Kennedy averaged 89-90 MPH on his fastball. In 2013 he was up to 90 MPH. This year he is averaging 92 MPH.

Not only does Kennedy’s fastball have more velocity, but he’s also throwing the pitch more often than he has in any season since 2009. He has thrown his fastball 48% of the time this year. The last time he threw it more than 40% was 2010.

While it may be good to have more velocity, it could also be a bit of a concern when it comes to Kennedy because his secondary offerings don’t appear to be very good. In fact, all of his pitches have negative wRAA values except for his fastball, which has a wRAA of 12.8. Most of Kennedy’s strikeouts have come off of his fastball. Having a good fastball is nice, but when Kennedy gets older and his velocity starts to decline, he’s going to have a hard time being successful if he doesn’t have good secondary offerings.

Overall, the changes for these pitchers seem to have worked. They’re succeeding in their own environments. While the Yankees never were able to see their prized prospects come to fruition, these two pitchers have found success away from New York. Learning to pitch at the major-league level involves a learning curve. Some pitchers dominate right away. Other pitchers struggle for their first couple of years, and then things somehow start to click for them. I’m not suggesting that Kennedy and Hughes have figured out pitching, nor are they the best pitchers in the majors. However, they have proved that they are at least average starters, or maybe even above-average major-league pitchers. Only time will tell.

The Luckiest and Unluckiest Pitchers According To Base Runs

On June 3rd, Marlins pitcher Henderson Alvarez threw an 88-pitch shutout against the Rays, scattering eight hits while not issuing a walk. On July 11th, Alvarez again gave up eight hits while not issuing a walk, but only made it five innings after surrendering six runs. While the circumstances surrounding these two starts aren’t completely the same, they do a good job of illustrating the phenomenon of cluster luck.

Cluster luck, originally discovered and coined by Joe Peta in his book Trading Bases, essentially tells us how lucky teams have been by measuring the difference between the number of runs a team would be expected to score based on its power (total bases) and base runners (hits/walks) and the number of runs it actually scored. In Alvarez’s July start above he was a victim of poor sequencing, allowing his hits in bunches rather than spreading them out over the course of his start. For a more complete (and easier to understand) definition and some real world examples check out this and this.

What I will be attempting to do in this article is figure out a way to accurately estimate how many runs a pitcher should have allowed, and subsequently what his run average should look like, and then pinpoint certain pitchers who have been lucky or unlucky so far this season. Basically I am trying to normalize a pitcher’s RA by adjusting for sequencing and cluster luck.

Fortunately for me the heavy lifting for part one has already been done thanks to David Smyth. His metric, Base Runs (BsR), was developed and popularized in the early 1990s and is an extraordinarily simple yet accurate way of estimating runs allowed using standard box score statistics. Base Runs for pitchers takes four inputs (innings pitched, hits, walks, and home runs), which are converted into four factors: A, B, C, and D. The final formula looks like A*B/(B+C)+D. For a lengthier piece on Base Runs, its properties, and its pros and cons, consult this and this.
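As a rough illustration of that formula, here is a Python sketch using one commonly published set of pitcher Base Runs factors. The article does not say which coefficients it used, so the A, B, C, and D definitions below are an assumption, and the function names are hypothetical.

```python
def innings_to_outs(innings_pitched):
    """Convert box-score innings (126.1 means 126 and 1/3) into outs."""
    whole_innings = int(innings_pitched)
    extra_outs = round((innings_pitched - whole_innings) * 10)
    return whole_innings * 3 + extra_outs


def base_runs_pitcher(innings_pitched, hits, walks, home_runs):
    """Estimate runs allowed as A*B/(B+C)+D (coefficients are an assumption)."""
    a = hits + walks - home_runs  # base runners
    b = (1.4 * (1.12 * hits + 4 * home_runs)  # advancement, with total bases
         - 0.6 * hits - 3 * home_runs + 0.1 * walks) * 1.1  # estimated from H and HR
    c = innings_to_outs(innings_pitched)  # outs
    d = home_runs  # runs that always score
    return a * b / (b + c) + d


def base_run_average(innings_pitched, hits, walks, home_runs):
    """Scale estimated runs allowed to a per-nine-innings rate (BsRA)."""
    outs = innings_to_outs(innings_pitched)
    return base_runs_pitcher(innings_pitched, hits, walks, home_runs) * 27 / outs


# A nine-inning, eight-hit, no-walk, no-homer line like Alvarez's June 3rd start
print(round(base_runs_pitcher(9.0, 8, 0, 0), 2))
```

Under these assumed coefficients, a line like the June 3rd shutout works out to just under two expected runs, which is the kind of gap between expected and actual runs that the rest of the article measures.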

I took these statistics, including run average, for every pitcher in the majors through July 12th and figured his expected runs allowed by Base Runs, then converted it to Base Run Average, or BsRA, and took the difference between BsRA and his actual RA. I also calculated each pitcher’s RA- and BsRA- by dividing his RA or BsRA by the league RA or BsRA and multiplying by 100 (for reference the league RA is 4.14 and the league BsRA is 4.19). By taking the difference between the two, (BsRA-)-(RA-), we can figure out the percentage of extra runs compared to league average the pitcher should have allowed.

In the tables below you’ll see I’ve given this stat the name Luck%. It is a poor name, admittedly, since we’re dealing with percentages and I’m sure the differences aren’t completely due to luck, but the name will have to do until I think of something better. For example, Max Scherzer’s RA- is 80.92 (RA of 3.35/league RA of 4.14), meaning he has allowed runs at around 81% of the league average, but his BsRA- is 88.62 (BsRA of 3.71/league BsRA of 4.19), meaning he should have allowed runs at around 89% of the league average. Subtracting the two gives a Luck% of 7.71 (the figures shown here are rounded), so Scherzer should have allowed about 7.7% more runs compared to league average than he actually has.
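The Luck% arithmetic can be sketched in a few lines of Python. The league figures come from the text above; the function names are hypothetical, and because the inputs here are the rounded RA and BsRA values from the tables, results can differ from the published numbers in the last decimal place.

```python
LEAGUE_RA = 4.14    # league run average given in the article
LEAGUE_BSRA = 4.19  # league Base Run Average given in the article


def index_to_league(value, league_value):
    """Express a rate as a percentage of league average (100 = average)."""
    return value / league_value * 100


def luck_pct(ra, bsra, league_ra=LEAGUE_RA, league_bsra=LEAGUE_BSRA):
    """Luck% = (BsRA-) - (RA-); positive means fewer runs allowed than expected."""
    return index_to_league(bsra, league_bsra) - index_to_league(ra, league_ra)


# Mark Buehrle's line from the first table: RA 2.92, BsRA 3.95
print(round(luck_pct(ra=2.92, bsra=3.95), 1))
```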

Whew. Now we can get to the names.

First the top ten qualified pitchers who have had their numbers most positively affected by cluster luck.

Name IP RA BsRA BsRA- RA- Luck%
Mark Buehrle 126.1 2.92 3.95 94.3 70.5 23.7
Wei-Yin Chen 104 4.24 5.19 123.8 102.4 21.4
Jason Vargas 125 3.38 4.23 101 81.6 19.4
Zack Greinke 118.2 3.11 3.91 93.4 75.1 18.2
Alfredo Simon 116.2 2.78 3.50 83.5 67.1 16.3
Josh Beckett 103.2 2.6 3.30 78.9 62.8 16.1
Masahiro Tanaka 129.1 2.71 3.41 81.5 65.5 16
Yordano Ventura 101.2 3.36 4.03 96.2 81.2 15
Chris Young 105.1 3.16 3.81 91 76.3 14.7
Henderson Alvarez 120 3.23 3.85 91.8 78 13.8

I like this list since it is very diverse. We have pitchers who have been pleasant surprises this season but who we all know aren’t really that good (Vargas and Simon), and older pitchers experiencing a late-career resurgence (Beckett and Buehrle). There are great pitchers (Greinke and Tanaka) and not-so-great pitchers (Chen), hard throwers (Alvarez) and soft tossers (Young), high-strikeout and low-strikeout pitchers, and so on. It’s good to see that not just one type of pitcher is affected, which gives me confidence that cluster luck does play a role in a pitcher’s numbers to such a degree even this late in the season.

Now on to the top ten pitchers who have had their numbers most negatively affected by cluster luck.

Name IP RA BsRA BsRA- RA- Luck%
Anibal Sanchez 94.2 3.52 2.44 58.2 85 -26.8
Matt Garza 124.1 4.42 3.37 80.4 106.8 -26.3
Justin Masterson 98 6.06 5.09 121.4 146.4 -25
Tyler Skaggs 91 4.65 3.78 90.2 112.3 -22.2
Charlie Morton 119.1 4.15 3.36 80.1 100.2 -20.1
Roenis Elias 112 4.94 4.33 103.2 119.3 -16.1
Jorge De La Rosa 102.2 4.91 4.32 103.2 118.6 -15.4
Edwin Jackson 105.1 6.07 5.53 132 146.6 -14.7
Jose Quintana 119.1 3.85 3.31 79.1 93 -13.9
Hiroki Kuroda 116.1 4.64 4.19 100 112.1 -12.1

This is a slightly less diverse list. Most of these guys are having disappointing seasons, but perhaps they haven’t been as bad as we think. Four of these guys have a worse-than-average RA but a better-than-average BsRA (or perfectly average in the case of Kuroda). Then there’s Anibal Sanchez, who might just be one of the most underrated pitchers in baseball, as his BsRA is seventh in all of baseball.

So what does Luck% end up telling us about a pitcher? We know that pitchers have little control over what happens after a ball is put in play, but what we’re doing here is figuring out which pitchers have been victimized by poor sequencing. Perhaps we can look at Luck% the same way we look at BABIP. If the measure is abnormally high compared to a pitcher’s career rate and the pitcher hasn’t made a substantial improvement in his mechanics or pitch repertoire perhaps some regression is in order.

So is Anibal Sanchez due for a spectacular second half? Maybe not. A myriad of factors could be influencing his low Luck%. We know that in general offense goes up when runners are on base, and Sanchez could be especially susceptible to allowing runs to score in bunches. He has a slow move to the plate, potentially allowing more runners to steal and get into scoring position. Perhaps his stuff is less effective from the stretch due to a breakdown in mechanics. Maybe he focuses too much attention on the runners on base and not enough on the one at the plate. I really don’t know.

I only have half a season of data on 100 or so pitchers so obviously more research is needed. One could find the correlation between Luck% and peripheral stats such as K% and BB%, or find year to year correlations for Luck% to find out how much variation is actually luck and how much is skill. I’d definitely be intrigued by those results and I’ll likely revisit these numbers when the season ends.

I’m still relatively new to performing this kind of analysis, so any constructive criticism would be greatly appreciated, as would a pointer if you’ve seen something like this done elsewhere on the internet. If you have suggestions for any improvements (especially the name) or further research, I’d love to hear them. If you think I majorly screwed up somehow, I’d love to hear about it too.