Archive for Research

A Baseball World Without Intentional Walks

There are at-bats. And their possible positive outcomes come down to three: hits, walks and hit-by-pitches. Hits can be separated into singles, doubles, triples and home runs. Hit-by-pitches are pretty much what they sound like. Walks, on the other hand, are bases on balls awarded by the pitcher to the batter, either unintentionally due to lack of control or intentionally, supposedly to prevent the hitter from inflicting more than single-valued damage by giving him first base for free.

The intentional base on balls has always been part of baseball, though it has only been tracked since 1955. From that point through 2016 (the last complete season with data available), a total of 73,272 IBB have been awarded to batters, an average of around 1,182 per season. Looking at the full picture, though, there have been more than 11 times as many BB as IBB over the same period. Obviously, hitters are not awarded first base for free unless they have gained a certain status, one in which pitchers “fear” the possibility of being punished by a bomb to the outfield that holds high value and could turn into runs for the opposing team.

Even so, IBB rates are at their lowest since 1955, thanks to strategic improvements and the study of the game, which have led to the conclusion that handing hitters free bases is more than probably not the best approach. But with more than a thousand instances per season on average, we have a big enough sample to have some fun with the numbers and imagine a baseball world in which the IBB had somehow been vetoed by MLB, and therefore never awarded to hitters, from 1955 on. What could this have meant for batters during this span? How much could it have impacted the hitting totals of some of the already-great hitters in baseball history? Let’s take a look at the data.

Counting from 1955, only five players have posted an IBB/PA rate above 2% in careers of at least 10,000 PA: Barry Bonds, Hank Aaron, Ken Griffey, Albert Pujols and David Ortiz. Those are some scary names to have staring at you from home plate while you play the role of the pitcher. If we lower the threshold to 1% IBB/PA, we end up with a group of 39 players, more than enough for some interesting testing. The first thing that jumps out, and that we could expect, is that only one of the 39 fell short of the 100-HR mark (Rod Carew, with 92) and that all of them surpassed 2,223 hits in their careers (for that matter, only 110 MLB players since 1955 have reached that mark, so players from our group make up around a third of them).

So, back to our group: the correlation between IBB and HR yields an R-value of 0.256, a modest but positive relationship. This suggests that power hitters have historically tended to be awarded more intentional bases on balls than any other type of batter. If no IBB had been allowed in baseball, we would only have hits, unintentional walks and hit-by-pitches left as possible positive plate-appearance outcomes. With a simple set of calculations, we can estimate how many extra hits, home runs, etc. each of our players could have ended his career with had he not been walked on purpose during his playing time. It is just a matter of finding the rates at which each hit singles, doubles, triples and homers per PA (subtracting IBB from the total number of PA) and then multiplying those rates by the IBB each was awarded in his career. This gives us a simple look at how much bigger those hitters’ numbers could have been, based on their pure hitting ability.
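As a rough sketch of that calculation (the career line below is hypothetical, not any real player's totals):

```python
def extra_hits(singles, doubles, triples, hr, pa, ibb):
    """Estimate extra hits of each type had a player's IBB been ordinary PAs.

    Rates are computed per non-IBB plate appearance, then multiplied by the
    number of IBB the player actually received.
    """
    non_ibb_pa = pa - ibb
    return {
        "1B": singles / non_ibb_pa * ibb,
        "2B": doubles / non_ibb_pa * ibb,
        "3B": triples / non_ibb_pa * ibb,
        "HR": hr / non_ibb_pa * ibb,
    }

# Hypothetical career: 2000 1B, 500 2B, 50 3B, 600 HR in 12,000 PA, 600 IBB
est = extra_hits(2000, 500, 50, 600, 12_000, 600)
print({k: round(v, 1) for k, v in est.items()})
```

This deliberately ignores that some IBB would have become unintentional walks or outs anyway; it simply redistributes each free pass according to the player's observed hit rates, as described above.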

The case of Barry Bonds is truly unique. The all-time home-run leader not only leads the IBB leaderboard with 688, but the gap between him and the second-ranked player (Albert Pujols, 302 IBB) is a staggering 386 IBB, more than double Pujols’ total. The difference between Pujols and third-ranked Hank Aaron is just 9 IBB, for comparison’s sake. To build a comprehensive list of the most-improved players in this alternative world, we can sort them by the number of expected extra hits (EEH, of any type) they would have gained had they not received a single intentional base on balls. The next table includes the 20 players with the most expected extra hits in this scenario.

Unsurprisingly, Bonds comes out first – and by a mile. Again, Barry doubles the EEH of second-ranked Pujols and would have finished his career with over 3,000 hits, at 3,104. That would make him eighth in hits among the players analyzed, while Pete Rose (not in the table above) would have gained 45 hits to surpass the 4,300-hit mark and finish at exactly 4,301.

Breaking the extra hits down by category, the outcome at the top is as expected, with Barry Bonds topping every simulation. Setting him aside, Hank Aaron would have hit the most extra singles with 49, followed by Pujols and Tony Gwynn with 48. As for doubles, Pujols would have gained an extra 18, and three players would have added 13 to their career totals. Triples are much less frequent, and only two players, Roberto Clemente and George Brett, would have batted for three extra triples. Finally, in the home-run category, Bonds would have hit an extra 44 homers, followed by Pujols and Aaron (an extra 13 each) and Ken Griffey.

Had all these numbers been real and the IBB been wiped from the face of the Earth, historical career leaderboards would not have changed much, at least at the top, but some records would look even more unbreakable than they do now. Someone would have to break the 4,300-hit barrier to surpass Pete Rose. Bonds’ new mark of 806 HR would be unimaginable for anyone to reach nowadays (Pujols, still active, would be almost 200 HR away entering his age-38 season next year).

It may not have been a critical change, but baseball would have been (and would be) way more fun to watch. Just looking at our starting 39 guys, we would have seen the ball hit 1,928 more times (out of the group’s 7,423 IBB, a 26% conversion rate), witnessed 300 more home runs being called, and annotated a couple of unthinkable numbers in MLB’s history books. Now just imagine how much baseball fun we’ve lost when I remind you that 73,272 intentional walks have been awarded over the past 61 seasons (yes, your calculation is correct: around 19,000 extra hits by our group’s measures).

Giving Players the Bonds Treatment

There is no higher compliment for a ballplayer than “The Bonds Treatment” — being intentionally walked with the bases empty, or even better, with the bases loaded. It’s called “The Bonds Treatment” because Barry Bonds recorded an astounding 41 IBBs with the bases empty, and is one of only two players ever to record a bases-loaded intentional walk. In other words, 28% of IBBs ever issued with the bases empty were given to Bonds — and 50% of IBBs with the bases loaded. Bonds was great, no denying that — but is there anyone out there today who is worthy of such treatment?

We can find out using a Run Expectancy matrix. An RE matrix is based on historical data, and it can tell you how many runs, on average, a team could expect to score in a given situation. A sample RE matrix, from Tom Tango’s site, is shown below.

RE Matrix

The chart works as follows — given a base situation (runners on the corners, bases empty, etc.) move down to the corresponding row, then move to the corresponding column and year to find out how many runs a team could expect to score from that situation. In 2015, with a runner on 3rd and 1 out, teams could expect to score .950 runs on average (or, RE is .950). If the batter at the plate struck out, the new RE would be .353.
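A minimal sketch of how such a lookup works, seeded with the two 2015 values quoted above (the bases-empty entry is a placeholder for illustration, not a published figure):

```python
# Run-expectancy lookup keyed by base-out state.
RE_2015 = {
    # (runner_on_1b, runner_on_2b, runner_on_3b, outs): expected runs
    (False, False, True, 1): 0.950,   # man on 3rd, 1 out (from the text)
    (False, False, True, 2): 0.353,   # man on 3rd, 2 outs (from the text)
    (False, False, False, 2): 0.100,  # bases empty, 2 outs (placeholder)
}

def run_expectancy(bases, outs):
    """Expected runs for the rest of the inning from this base-out state."""
    return RE_2015[(*bases, outs)]

# A strikeout with a runner on 3rd and 1 out keeps the bases the same
# but adds an out, so the change in RE is:
before = run_expectancy((False, False, True), 1)
after = run_expectancy((False, False, True), 2)
print(f"RE lost to the strikeout: {before - after:.3f}")
```

In practice the full 8-base-state by 3-out matrix would be populated from historical play-by-play data; this stub only carries the states needed for the example in the text.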

We can take this a step further. Sean Dolinar created a fantastic tool that allows us to (roughly) examine RE in terms of a batter’s skill. Having Mike Trout at the plate vastly improves your odds of scoring more than having Alcides Escobar, and the tool takes this into account. We can use this tool to look at who deserves the Bonds treatment in 2017 (or, to see if anyone deserves the Bonds treatment): defined as being walked with the bases empty, or the bases loaded.

First, we can look at a given player and their RE scores for having the bases empty or full. In this instance, we will use Michael Conforto, who batted leadoff for the Mets against the Texas Rangers on August 9. Conforto’s wOBA entering the game was .404, and the run environment for the league is 4.65 runs per game, so Conforto’s relevant run expectancy matrix looks like this:

Michael Conforto RE Matrix

Batting behind him was Jose Reyes, who, entering the game, had a wOBA of .283. Let’s assume that Conforto receives the Bonds Treatment, and is IBB’d in a given PA with bases empty or loaded. What would the run expectancy look like with Reyes up? In other words, what is Reyes’ run expectancy with a runner on first, or with the bases loaded after a run has been IBB’d in?

To do this, we can look at Reyes’ RE with a runner on first and with the bases loaded. Reyes’ RE with a man at 1B is indicative of what the RE would be like if Conforto had been given an intentional free pass. For a bases-loaded walk, we look at Reyes’ RE with the bases loaded, and then add a run onto it (to account for Conforto walking in a run).

Jose Reyes RE Matrix

Then, we can compare the corresponding cells of the matrices to see if the Texas Rangers would benefit any from walking Conforto. If RE with Conforto up and the bases empty is higher than RE with a runner on first and Reyes up, or RE with the bases loaded and Conforto up is higher than RE with Reyes up and a run already scored, then we can conclude that it makes sense to give Conforto that free pass.
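That comparison is simple enough to express directly. A sketch of both decision rules, using the RE figures quoted in the text for the bases-empty case (the bases-loaded helper adds the run forced in by the walk):

```python
def ibb_makes_sense_empty(re_batter_empty, re_next_runner_on_1st):
    """Bases empty: walk the batter if pitching to the next hitter with a
    runner on first yields a lower run expectancy."""
    return re_batter_empty > re_next_runner_on_1st

def ibb_makes_sense_loaded(re_batter_loaded, re_next_loaded):
    """Bases loaded: the walk forces in a run, so add 1 to the next hitter's
    bases-loaded RE before comparing."""
    return re_batter_loaded > re_next_loaded + 1.0

# Conforto up, bases empty, 2 out (.172) vs. Reyes up, man on 1st (.145):
print(ibb_makes_sense_empty(0.172, 0.145))  # True: the walk lowers RE
```

The same two comparisons, run over every lineup pair, are what the rest of this exercise boils down to.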

In this instance, we can see that if the Rangers were to face Conforto with the bases empty and two out, it would make more sense for them to IBB Conforto and pitch to Reyes than it would for them to pitch to Conforto, because RE with Conforto up (.172) is higher than RE with Reyes up and Conforto on (.145). As a result, Conforto is a candidate for the Bonds treatment in this lineup configuration, if the right situation arises.

Who else could be subjected to the Bonds treatment? It would take me a few months of work to run through every single individual lineup for every team to figure out who should have been pitched to and who should have gotten a free pass, so to simplify things, I looked at hitters with 400+ PA, looked at when they most frequently batted, who batted behind them most frequently, and whether or not they should have received the Bonds treatment based on who was on deck. While no lineup remains constant throughout the season, looking at these figures gave me a good idea of who regularly batted behind whom.

Three candidates emerged to be IBB’d with the bases empty every time, regardless of outs — Yasiel Puig, Jordy Mercer, and Orlando Arcia. These players usually bat in the eighth slot on NL teams, and right behind them is the pitchers’ slot — considering how historically weak pitchers are with the bat, it makes sense that RE tells us to walk them with the bases empty every single time.

The same could be said of almost anyone batting ahead of a pitcher — according to our model, given an average-hitting pitcher, any hitter with a wOBA over .243 should be IBB’d with the pitcher on deck (only one qualified hitter — Alcides Escobar — has a lower wOBA than .243). The three names above stuck out in the analysis because they were the only players with 400+ PA that had spent most of their PAs batting eighth.

So, an odd takeaway of this exercise is that in the NL, unless a pinch-hitter is looming on deck, the eighth hitter should almost always be intentionally walked with the bases empty, because it lowers the run expectancy. Weird!

The model also identified two hitters who deserved similar treatment to Michael Conforto in the above example (IBB with 2 out and no one on) — Buster Posey and Chase Headley.

Posey has batted with almost alarming regularity ahead of Brandon Crawford, who is running an abysmal .273 wOBA on the season. Headley is a little more curious — Headley is usually a weak hitter, but earlier in the season, Headley batted ahead of Austin Romine frequently, who was even worse than Crawford.

Headley technically isn’t that much of a candidate for the Bonds Treatment since Romine hasn’t batted behind him since June 30, but Crawford has backed up Posey as recently as August 3 — if he’s batted behind Posey again, the situation could very well arise where it becomes beneficial for teams to simply IBB Posey with two out and bases empty.

But ultimately, no one, aside from NL hitters in the eighth slot, emerges as a candidate to be IBB’d every time with the bases empty. And no one, regardless of the situation, deserves a bases-loaded intentional walk. Which raises the question — was it appropriate to give the man himself, Barry Bonds, the Bonds Treatment?

Bonds received an incredible 19 bases-empty IBBs in 2004 (more than doubling the record he set in 2002), so we’ll use 2004 Bonds and his .537 wOBA as the center of our analysis.

In 2004, Bonds batted almost exclusively fourth, and the two men who shared the bulk of playing time batting fifth behind him (Edgardo Alfonzo and Pedro Feliz) had almost identical wOBAs that season (.333 and .334, respectively) — so we’ll assume that the average hitter behind Bonds in 2004 posted a wOBA of .333. This yields RE matrices that look like this:

Barry Bonds RE Matrix compared to 5th Hitter, 2004

Bonds proves himself worthy not only of a bases-empty IBB with two out, but he just barely misses with a bases-loaded IBB. While no one ended up giving Bonds a bases-loaded IBB in 2004, they did give him one in 1998.

For perspective, Bonds was running a .434 wOBA in 1998, and Brent Mayne (who was on deck) was running a .324 wOBA — so this actually wasn’t a move that pushed RE or win probability in the right direction.

Win probability, Diamondbacks @ Giants, 5/28/1998
The final spike in WPA is Bonds’ IBB — it gave the Giants a better chance of winning. Ultimately, it was a bad idea that didn’t backfire in the Diamondbacks’ faces.

And of course, I would be remiss in not mentioning the other player to have ever received a bases-loaded IBB — Josh Hamilton.

With apologies to Hamilton, he wasn’t the right guy to get the Bonds treatment here, either — Hamilton ran a .384 wOBA in 2008, and Marlon Byrd, who was on deck, had a .369 wOBA, which means that an IBB in this instance was a really awful move. An awful move that, like Bonds’ IBB, was rewarded by Byrd striking out in the next AB.

Have there been other players deserving of bases-loaded IBBs? It’s possible, but the most likely candidates — Ted Williams and Babe Ruth — usually had good enough protection in the lineup. Of course, there are few hitters that could have protected Bonds from himself — hence why it’s almost a good idea to IBB him with the bases loaded.

Home Runs and Temperature: Can We Test a Simple Physical Relationship With Historical Data?

Unlike most home-run-related articles written this year, this one has nothing to do with the recent home run surge, juiced balls, or the fly-ball revolution. Instead, this one’s about the influence of temperature on home-run rates.

Now, if you’re thinking here comes another readily disproven theory about home runs and global warming (a la Tim McCarver in 2012), don’t worry – that’s not where I’m going with this. Alan Nathan nicely settled the issue by demonstrating that temperature can’t nearly account for the large changes in home-run rates throughout MLB history in his 2012 Baseball Prospectus piece.

In this article, I want to revisit Nathan’s conclusion because it presents a potentially testable hypothesis given a large enough data set. If you haven’t read his article or thought about the relationship between temperature and home runs, it comes down to simple physics. Warmer air is less dense. The drag force on a moving baseball is proportional to air density. Therefore (all else being equal), a well-hit ball headed for the stands will experience less drag in warmer air and thus have a greater chance of clearing the fence. Nathan took HitTracker and HITf/x data for all 2009 and 2010 home runs and, using a model, estimated how far they would have gone if the air temperature were 72.7°F rather than the actual game-time temperature. From the difference between estimated 72.7°F distances and actual distances, Nathan found a linear relationship between game-time temperature and distance. (No surprise, given that there’s a linear dependence of drag on air density and a linear dependence of air density on temperature.) Based on his model, he suggests that a warming of 1°F leads to a 0.6% increase in home runs.
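The direction of the effect follows from the ideal gas law: at fixed pressure, air density is inversely proportional to absolute temperature. A quick sketch of the magnitude (this only shows the density scaling; translating density into home-run probability is what Nathan's trajectory model does):

```python
def fahrenheit_to_kelvin(t_f):
    return (t_f - 32) * 5 / 9 + 273.15

def air_density_ratio(t_f, t_ref_f=72.7):
    """Air density at t_f relative to density at the reference temperature,
    holding pressure fixed (ideal gas law: density ~ 1 / absolute temp)."""
    return fahrenheit_to_kelvin(t_ref_f) / fahrenheit_to_kelvin(t_f)

# Warming 10 F above Nathan's 72.7 F reference thins the air by about 1.8%,
# which is the kind of small drag reduction that nudges fly-ball distances.
print(f"{(1 - air_density_ratio(82.7)) * 100:.2f}% less dense")
```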

This should in principle be a testable hypothesis based on historical data: that the sensitivity of home runs per game to game-time temperature is roughly 0.6% per °F. The issue, of course, is that the temperature dependence of home-run rates is a tiny signal drowned out by much bigger controls on home-run production [e.g. changes in batting approach, pitching approach, PED usage, juiced balls (maybe?), field dimensions, park elevation, etc.]. To try to actually find this hypothesized temperature sensitivity we’ll need to (1) look at a massive number of realizations (i.e. we need a really long record), and (2) control for as many of these variables as possible. With that in mind, here’s the best approach I could come up with.

I used data (from Retrosheet) to find game-time temperature and home runs per game for every game played from 1952 to 2016. I excluded games for which game-time temperature was unavailable (not a big issue after 1995 but there are some big gaps before) and games played in domed stadiums where the temperature was constant (e.g. every game played at the Astrodome was listed as 72°F). I was left with 72,594 games, which I hoped was a big enough sample size. I then performed two exercises with the data, one qualitatively and one quantitatively informative. Let’s start with the qualitative one.

In this exercise, I crudely controlled for park effects by converting the whole data set from raw game-time temperatures (T) and home runs per game (HR) to what I’ll call T* and HR*: differences from the long-term median T and HR values at each ballpark over the whole record. Formally, for any game, T* = T – Tmed,park and HR* = HR – HRmed,park, where Tmed,park and HRmed,park are the median temperature and HR/game, respectively, at a given ballpark over the whole data set. A positive value of HR* for a given game means that more home runs were hit than in a typical game at that ballpark. A positive value of T* means that the game was warmer than average for that ballpark. Next, I defined “warm” games as those for which T* > 0 and “cold” games as those for which T* < 0. I then generated three probability distributions of HR*: for 1) all games, 2) warm games and 3) cold games. Here’s what those look like:
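A sketch of the anomaly computation on a toy data set (park names and numbers are made up for illustration):

```python
from collections import defaultdict
from statistics import median

# Each record: (ballpark, game-time temperature in F, HR in that game)
games = [
    ("Wrigley", 55, 1), ("Wrigley", 75, 3), ("Wrigley", 85, 2),
    ("Coors", 60, 3), ("Coors", 80, 5),
]

# Long-term medians per park
by_park = defaultdict(list)
for park, t, hr in games:
    by_park[park].append((t, hr))
t_med = {p: median(t for t, _ in g) for p, g in by_park.items()}
hr_med = {p: median(hr for _, hr in g) for p, g in by_park.items()}

# Anomalies: T* = T - Tmed(park), HR* = HR - HRmed(park)
anomalies = [(t - t_med[p], hr - hr_med[p]) for p, t, hr in games]

# Split HR* by warm (T* > 0) vs. cold (T* < 0) games
warm = [hr_star for t_star, hr_star in anomalies if t_star > 0]
cold = [hr_star for t_star, hr_star in anomalies if t_star < 0]
print(warm, cold)
```

With the real 72,594-game data set, the two HR* samples feed the warm and cold probability distributions shown in the figure.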

The tiny shifts of the warm-game distribution toward more home runs and of the cold-game distribution toward fewer suggest that the influence of temperature on home runs is indeed detectable. It’s encouraging, but only useful in a qualitative sense. That is, we can’t test for Nathan’s 0.6% HR increase per °F based on this exercise. So, I tried a second, more quantitative approach.

The idea behind this second exercise was to look at the sensitivity of home runs per game to game-time temperature over a single season at a single ballpark, then repeat this for every season (since 1952) at every ballpark and average all the regression coefficients (sensitivities). My thinking was that by only looking at one season at a time, significant changes in the game were unlikely to unfold (i.e. it’s possible but doubtful that there could be a sudden mid-season shift in PED usage, hitting approach, etc.) but changes in temperature would be large (from cold April night games to warm July and August matinees). In other words, this seemed like the best way to isolate the signal of interest (temperature) from all other major variables affecting home run production.

Let’s call a single season of games at a single ballpark a “ballpark-season.” I included only ballpark-seasons for which there were at least 30 games with both temperature and home run data, leading to a total of 930 ballpark-seasons. Here’s what the regression coefficients for these ballpark-seasons look like, with units of % change in HR (per game) per °F:
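A sketch of the per-ballpark-season regression: an ordinary least-squares slope of HR/game on temperature, normalized by the season-mean HR/game to express it as % per °F (toy numbers, not real Retrosheet data):

```python
def hr_temp_sensitivity(temps, hrs):
    """OLS slope of HR/game vs. temperature for one ballpark-season,
    returned as % change in HR per degree F (slope / mean HR per game)."""
    n = len(temps)
    mt, mh = sum(temps) / n, sum(hrs) / n
    cov = sum((t - mt) * (h - mh) for t, h in zip(temps, hrs))
    var = sum((t - mt) ** 2 for t in temps)
    slope = cov / var          # HR per game per degree F
    return 100 * slope / mh    # percent per degree F

# Toy ballpark-season: HR/game creeping up with game-time temperature
temps = [50, 60, 70, 80, 90]
hrs = [1.8, 1.9, 2.0, 2.1, 2.2]
print(f"{hr_temp_sensitivity(temps, hrs):.2f}% per degree F")
```

Repeating this for each of the 930 qualifying ballpark-seasons and averaging the results gives the mean sensitivity discussed below.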

A few things are worth noting right away. First, there’s quite a bit of scatter, but 75.1% of these 930 values are positive, suggesting that in the vast majority of ballpark-seasons, higher home-run rates were associated with warmer game-time temperatures as expected. Second, unlike a time series of HR/game over the past 65 years, there’s no trend in these regression coefficients over time. That’s reasonably good evidence that we’ve controlled for major changes in the game at least to some extent, since the (linear) temperature dependence of home-run production should not have changed over time even though temperature itself has gradually increased (in the U.S.) by 1-2 °F since the early ‘50s. (Third, and not particularly important here, I’m not sure why so few game-time temperatures were recorded in the mid ‘80s Retrosheet data.)

Now, with these 930 realizations, we can calculate the mean sensitivity of HR/game to temperature, which comes out to 0.76% per °F. [Note that the scatter is large and the distribution doesn’t look very Gaussian (see below), but rather more peaked, Dirac-delta-like (1 std dev ~ 1.66%, but the middle 33% clustered within ~0.4% of the mean).]

Nonetheless, the mean value is remarkably similar to Alan Nathan’s 0.6% per °F.

Although the data are pretty noisy, the fact that the mean is consistent with Nathan’s physical-model-based result is somewhat satisfying. Now, just for fun, let’s crudely estimate how much of the league-wide trend in home runs can be explained by temperature. We’ll assume that the temperature change across all MLB ballparks uniformly follows the mean U.S. temperature change from 1952-2016 using NOAA data. In the top panel below, I’ve plotted total MLB-wide home runs per complete season (30 teams, 162 games), upscaling totals from 154-game seasons (before 1961 in the AL, 1962 in the NL), strike-shortened seasons, and years with fewer than 30 teams accordingly. In blue is the expected MLB-wide HR total if the only influence on home runs were temperature, assuming the true sensitivity to be 0.6% per °F. No surprise: the temperature effect pales in comparison to everything else. Shown in the bottom plot is the estimated difference, due to temperature alone, between each season’s MLB-wide home-run total and the 1952 value of 3,079 (again, after scaling to account for differences in the number of games and teams). You can think of this plot as telling you how many of the total home runs hit in a season wouldn’t have made it over the fence had air temperatures remained constant at 1952 levels.
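A rough version of that bottom-panel arithmetic, assuming the 0.6% per °F sensitivity; the season total and warming used in the example are illustrative stand-ins, not the exact NOAA or HR figures behind the plot:

```python
SENSITIVITY = 0.006  # fractional HR change per degree F (Nathan's estimate)

def temperature_driven_hr(hr_total, delta_t_f):
    """Home runs in a season's total attributable to warming of delta_t_f
    degrees F relative to the 1952 baseline.

    With a linear sensitivity, the warmer season's total is inflated by a
    factor (1 + SENSITIVITY * delta_t_f) over its 1952-temperature baseline,
    so the excess is hr_total * (1 - 1 / (1 + SENSITIVITY * delta_t_f)).
    """
    return hr_total * (1 - 1 / (1 + SENSITIVITY * delta_t_f))

# e.g. a ~5,600-HR season played ~1.8 F warmer than 1952
print(round(temperature_driven_hr(5600, 1.8)))
```

Numbers in this ballpark reproduce the order of magnitude of the ~59-homer estimate for 2016 mentioned below.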

While these anomalies comprise a tiny fraction of the thousands of home runs hit per year, one could make the case (with considerable uncertainty admitted) that as many as 59 extra temperature-driven home runs were hit in 2016 (or about two per team!).

Altuve Is Defying the Evolution of Baseball

In 1912, the organization now known as the International Association of Athletics Federations recognised the first men’s 100-metre record in Olympic athletics. Donald Lippincott, on July 6, 1912, became the first man to hold an official record in the discipline, with a time of 10.2 seconds from start to finish. He measured 5’10’’ and 159 lbs. It wasn’t until 1968 – 56 years later – that a man officially broke the 10-second barrier in the 100 metres: Jim Hines did it at 6’0’’ and 179 lbs. Now fast-forward to 2009 and look up a name: Usain Bolt. There is no one faster on Earth. The Jamaican set the 100-metre world record (9.58 seconds) in Berlin at a size of 6’5’’ and 207 lbs. It is not hard to see the evolution of athletes’ bodies here. We, as human beings, are becoming taller and stronger, physically superior each year. At least some of us.

While we can’t directly compare MLB players with Olympic athletes and the demands of track and field, the evolution of sportsmen has run parallel to some extent in both fields. Look at this season’s sensation, Aaron Judge. He’s huge. He’s a specimen of his own, truly unique in his size and power. Basically, he’s what we may call the evolution of the baseball player made real. Given that we have height and weight data for MLB players from 1871 to 2017, we can plot the evolution of both over the past 146 years. Here are the results.

Unsurprising, if anything. As we could expect, small baseball players populated the majors during the 19th century and the first third of the 20th, then dwindled to a minimum: no more than three active players of 67 inches or less at any point in the past 61 years. On the contrary, players taller than 78 inches started to appear prominently in the ’60s and ’70s, reaching their peak in 2011 with 72 such players spread over MLB rosters. A similar story can be told about the weight of ballplayers, who tended to be lighter in the early days of the game than from the ’70s on, with heavier players becoming predominant around the mid-to-late ’90s.

But even with as clear a trend as this, there are always outliers. And in this concrete case of player size, Jose Altuve is defying the rules of evolution by no small margin. At 5’6’’, the Venezuelan is the shortest active MLB player, and he started down his path to the majors by signing with Houston for a laughable $15,000 international bonus after earlier being rejected by the Astros for being too short. That happened in 2007, and by 2011 Jose Altuve was already playing in the MLB, finishing his rookie season with 0.7 bWAR (good for fifth-best among rookies aged 21 or younger, tied with RoY Mike Trout). By his second season, Altuve had made the All-Star Game, become a staple at Houston’s second-base position and posted 1.4 bWAR. From that point on he has had seasons valued at 1.0, 6.1, 4.5, 7.6 and 6.2 bWAR. The next table includes the players of height 5’6’’ or smaller with 20+ bWAR over their first seven major-league seasons that MLB has seen since 1871.

Look at the debut seasons of all those players. Of the eight that made the list, two are from the 19th century and five played from 1908 to 1941. That is, the closest such “small” player to Jose Altuve with 20+ bWAR over his first seven seasons debuted more than 75 years ago – and Altuve has yet to finish the 2017 season, which will probably enlarge his bWAR total.

Focusing on the 2017 season, a total of 1,105 position players and pitchers have generated offensive statistical lines and accrued bWAR values. Here’s how they are distributed in terms of height and bWAR.

It is not hard to see that the average MLB player stands around 72 inches (6’0’’), with most falling between 69 and 76. There are way-taller (Chris Young, Alex Meyer, Dellin Betances) and way-shorter (Tony Kemp, Alexi Amarista) outliers, and if we add bWAR to the equation, then there is Jose Altuve. Yes, Altuve is the blue dot in the chart, at the bottom right. Not only is he the shortest player in the league, but he’s also the most valuable at this point (6.2 bWAR as of Sunday, August 6), and by a good margin over his closest rivals: Andrelton Simmons (5.7), Paul Goldschmidt (5.5), Aaron Judge (5.1), and Mookie Betts and Anthony Rendon (both 5.0).

Not content with that, Altuve leads the league in hits (151, with just an 11.9 K% – 16th-best among qualified hitters), batting average (.365), OPS+ (176) and total bases (238). He has improved in virtually every statistical category this season, played in his fourth consecutive All-Star Game, led the AL MVP race, and is on pace to collect his fourth Silver Slugger award at second base. Even with all that, the likes of Judge and Trout are finishing the year strongly, and there are no guarantees that Jose becomes the first Venezuelan to win the MVP since Miguel Cabrera did it in 2012.

All in all, looking at how his top rivals stack up in terms of size and production, their numbers could be somewhat expected. What Altuve is doing at his size, though, not so much. We have been told that we’re living in the era of the strikeout and of the home-run resurrection, but Jose is determined to turn back the clock and make us all appreciate the wonders of small ballplayers roaming major-league fields. Appreciate it while you can, because what he’s doing is truly unique in the history of the sport and its evolutionary expectations – though it doesn’t seem like anything will stop Jose “Gigante” Altuve any time soon.

The Kia Tigers Are Doing Everything Right — Except on the Weekends

The Kia Tigers are doing a lot of things right. At 64-34-1, they are in first place in the Korean Baseball Organization, with a comfortable five-game lead over the second-place NC Dinos. As a team, they are slashing a cumulative .306 / .375 / .479, and are first or second among teams in the KBO in virtually every offensive category.


                 H    2B   3B   HR    R   RBI   BB    AVG    OBP    SLG   wRC+
2017 Kia      1092   213   24  120  658   620  356  0.306  0.375  0.479  116.9
league rank      1     1    2    3    1     1    2      1      1      1      2

But the emphasis on offensive firepower has not come at the expense of pitching; while Kia’s hurlers are not dominating the league the way their hitters are, their pitching staff ranks first in the KBO in WAR (15.8), and has above-average marks in ERA+ (105) and FIP+ (105.6). This is a solid, well-rounded team.

However, Kia has had one major flaw throughout the season: They play significantly worse on the weekends.

The KBO schedule is set up such that each team plays two three-game series per week, one from Tuesday to Thursday, and one from Friday to Sunday. Throughout the 2017 season, Kia players, both pitchers and batters, have performed significantly worse on the weekends. The effect is most noticeable on the hitting side, with a precipitous drop in performance in games that happen in the second, Friday to Sunday, series of the week.

The table below shows the batting splits for the top-10 Kia hitters (by plate appearances), as well as the team as a whole, and clearly shows the distinction between the mid-week and weekend series. From Tuesday to Thursday, Kia hits like, well, Kia. But from Friday to Sunday, Kia’s cumulative batting line is comparable to that of the Lotte Giants and Samsung Lions, who are in seventh and eighth place, respectively.

Kia Tigers 2017 time-of-week batting splits, sorted by △OPS
pos   hitter            weekday AVG  weekday OPS  weekend AVG  weekend OPS    △AVG    △OPS
LF    Choi Hyoung-woo         0.440        1.373        0.290        0.883  -0.150  -0.490
SS    Kim Seon-bin            0.475        1.135        0.284        0.701  -0.191  -0.434
1B    Kim Ju-chan             0.361        0.986        0.192        0.555  -0.169  -0.431
3B    Lee Beom-ho             0.308        0.979        0.250        0.781  -0.058  -0.198
CF    Roger Bernadina         0.341        1.004        0.301        0.865  -0.040  -0.139
2B    An Chi-hong             0.333        0.953        0.317        0.822  -0.016  -0.131
DH    Na Ji-hwan              0.327        0.925        0.284        0.954  -0.043   0.029
1B    Seo Dong-wook           0.286        0.778        0.311        0.863   0.025   0.085
RF    Lee Myeong-gi           0.303        0.797        0.370        0.884   0.067   0.087
team  Kia Tigers              0.335        0.935        0.273        0.768  -0.062  -0.167

This stark difference in team performance has borne out in the team’s record. In Tuesday-to-Thursday games, Kia is 41-9, an .820 winning percentage – a 118-win pace over a full 144-game season. For comparison, the KBO single-season wins record is 93, set by the 2016 Doosan Bears, and the 90-win mark has been eclipsed only one other time, when the now-defunct Hyundai Unicorns won 91 in 2000.

In Friday-to-Sunday games, however, Kia is 23-25-1, a .469 winning percentage, or a 68-win pace. If Kia had a .469 winning percentage this season, they would slot in at eighth in the standings between, guess who, Lotte and Samsung.
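The pace arithmetic is simple enough to check directly. Here is a minimal sketch, assuming the standard 144-game KBO season and counting the tie as a game played but not a win:

```python
# Minimal sketch of the pace math above; assumes a 144-game KBO season
# and counts the tie as a game played but not a win.
def winning_pct(wins, losses, ties=0):
    return wins / (wins + losses + ties)

def full_season_pace(pct, season_games=144):
    return round(pct * season_games)

weekday = winning_pct(41, 9)       # .820
weekend = winning_pct(23, 25, 1)   # ~.469
print(full_season_pace(weekday))   # 118
print(full_season_pace(weekend))   # 68
```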

There are no clear reasons for this drop-off. Kia’s schedule has been fairly balanced between the weekday and weekend series, and they have faced good and bad teams alike. Other teams have some variation between weekday and weekend, but there is no league-wide trend toward weaker weekends, and especially no performance gaps as severe as Kia’s.

However, as Kia is still well in control of the 2017 KBO standings, and performing well overall, this weekend drop-off stands as more of a curiosity than an actual problem. Perhaps it actually makes the team even scarier; despite running roughshod over the rest of the league, the Kia Tigers still have room to improve.

Understanding Roger Bernadina’s KBO Rebirth

A lot of things have clicked for the Kia Tigers this season, chief among them being their offense’s record production. Kia’s fearsome lineup features three of the Korean Baseball Organization’s top-10 hitters by batting average, and five of the top-20 hitters by wRC+, and is a driving force behind the team’s domination of the standings, currently sitting in a comfortable 1st place at 64-34-1, five games up on the second-place NC Dinos.

A major force behind the dominance of the Kia offense has been the unexpected emergence of their new center fielder Roger Bernadina, in his first season in the KBO. Just a season ago, Bernadina was toiling in the minor leagues, playing with the Las Vegas 51s, the New York Mets’ Triple-A affiliate.

The difference between the old Bernadina, a failed prospect who played seven partial seasons in Major League Baseball, mostly with the Washington Nationals, and the current Bernadina, who hits leadoff for the Kia Tigers’ offensive juggernaut, is stark.

Roger Bernadina career stats, 2008-2017

league   years     G     AVG     OBP     SLG   wRC+   WAR
MLB      2008-14   548   0.236   0.307   0.354    81   1.2
KBO      2017       95   0.320   0.383   0.551   135   3.9

In less than a fifth of the games played, Bernadina has already accumulated over three times his MLB WAR and hit over half as many home runs (19 to 28). By wRC+ he has been the 16th most productive player in the KBO this season, and by WAR, he has been the 6th best position player in the league. On Thursday night he hit for the cycle, becoming only the third foreign player to do so in the KBO. Quite a jump for someone who was a career 81 wRC+ hitter in the MLB.

This of course raises the question: What’s changed? In less than a season, how has Roger Bernadina improved this much?

It isn’t plate discipline; Bernadina is actually walking slightly less (7.7 percent in the KBO versus 8.2 percent in the MLB) and swinging more (50.3 percent vs 42.1 percent). His strikeouts are down from 21.3 percent in the MLB to 17.4 percent in the KBO, but that change may be more a function of the leagues themselves (the MLB’s higher overall K% means Bernadina’s mark is about league average in both leagues) than any adjustment Bernadina himself has made.

Bernadina also still profiles as the same type of hitter, hitting a majority of his batted balls on the ground, with a moderate preference to pull. He never displayed particularly drastic platoon splits, hitting roughly the same against lefties and righties, and this tendency is also unchanged. Though his batted-ball characteristics would have made him a reasonable shift candidate, shifts were almost never employed against him in the MLB, so his increased numbers in the KBO are also not the result of the KBO’s relative lack of defensive shifts.

The biggest difference is the change in Bernadina’s batting average on balls in play. His current KBO BABIP is .353, a drastic increase from his career MLB BABIP of .288.
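For reference, BABIP comes from the standard formula below. The stat line in the example is hypothetical, purely to show the calculation, and is not Bernadina’s actual totals:

```python
# Standard BABIP formula: (H - HR) / (AB - K - HR + SF).
# The numbers below are a hypothetical stat line, not Bernadina's actual totals.
def babip(h, hr, ab, k, sf):
    return (h - hr) / (ab - k - hr + sf)

# Hypothetical season: 128 hits, 19 HR, 400 AB, 70 K, 3 SF.
print(round(babip(h=128, hr=19, ab=400, k=70, sf=3), 3))  # 0.347
```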

On one hand, Bernadina profiles as the type of hitter that might naturally run a higher BABIP. He runs well, having rated as a positive baserunner and base-stealer both in the MLB (59 steals, 83% success rate, 8.9 BsR) and in the KBO (21 steals, 81% success), and the fact that he is primarily a ground-ball hitter should give him ample opportunity to leg out infield hits and run a higher BABIP.

However, his track record suggests this is not the case. BABIP is a statistic that takes a long time to stabilize, and as such his career average is more indicative of his true talent than his 2017 outlier mark. With no other changes in batted-ball profile or batting approach, Bernadina’s increased BABIP, and by extension his increased offensive production, is more likely the result of fortunate circumstances and luck than of any real change in skill.

That being said, simply acknowledging that Bernadina has been lucky this season does not diminish his performance. Regardless of whether he is performing to his expected outcomes or not, he has been a productive member at the top of the Kia Tigers’ lineup and, perhaps even more interestingly, has hit better as the season has progressed.

Stealing Bases and Splitting the Rewards

The contextual revolution (don’t really know if that’s a thing, but it sounds official) has emerged in MLB over the past few years, attempting to control for more situational effects than current sabermetric-driven baseball stats do. These models build upon Bill James’s work, Tom Tango’s all-important linear weights, and similar metrics that account for league, park, and positional production.

Baseball Prospectus (BP) writers developed baseball statistics that further quantify performance using mixed models. You can find a good introduction to mixed models in this article written by Jonathan Judge, Harry Pavlidis and Dan Brooks of BP, but if you are familiar with linear or logistic regression, a mixed model attempts to estimate the average performance over the course of the season (the fixed linear model) and uses the residuals (or error) to simultaneously quantify the contributions of “random” participants in any given play. Now, why do I say random? It isn’t so much that these participants are random, but that the baseball players are always changing and the number of “random” interactions they have throughout a season is endless, while the effect of an 0-2 count on run production stays relatively consistent, or fixed, throughout a whole season.

Some existing baseball stats based on mixed models include:

  1. Called Strikes Above Average (CSAA) — defensive statistic that measures catcher framing skills controlling for the batter, pitcher, catcher, and umpire
  2. Swipe Rate Above Average (SRAA) — base running metric that attempts to quantify base stealing ability for batters, and stolen base prevention for pitchers and catchers
  3. Take Off Rate Above Average (TRAA) — player specific effects on base stealing attempts
  4. cFIP — a new version of Fielding Independent Pitching (FIP) taking into account many aspects of a plate appearance. Read more about it here.

By the title, you can probably guess this article is about stolen bases, and you are correct. Specifically, I will be discussing Swipe Rate Above Average, or SRAA for short. SRAA is derived from a mixed model that attempts to account for the inning, the stadium, the quality of the pitcher, and the pitcher, catcher, and lead runner involved. SRAA is directly derived from a player’s random effect and is a single number, generally ranging from -10% to 10%, describing the additional probability a player contributes to a successful steal. For example, Mike Trout had a 4% SRAA in 2016: given the average stolen-base situation, Trout was 4% more likely to successfully steal than the average baserunner in 2016.
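Since SRAA is expressed as an additive probability above average, reading it is simple arithmetic. In this sketch, the 0.72 baseline is a hypothetical league-average success rate for the average stolen-base situation, not a published figure:

```python
# SRAA is additive on probability. The 0.72 baseline here is a hypothetical
# league-average stolen-base success rate, used for illustration only.
def adjusted_sb_probability(baseline, sraa):
    return baseline + sraa

# A Trout-like runner with +4% SRAA in an average situation:
print(round(adjusted_sb_probability(0.72, 0.04), 2))  # 0.76
```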

While SRAA accounts for pitcher skill using cFIP (See above link for more information), the quality of a pitcher can’t necessarily control for all variation in a pitcher’s pitch sequence or the occasional mistake in the dirt. Pitches in the dirt, pitch-outs, off-speed, and fastballs are treated equally in SRAA. Consequently, SRAA values may be lacking for runners that disproportionately get thrown out on pitch-outs or for catchers that consistently block balls in the dirt while still throwing out the runner.

Let’s explore some evidence of these effects before we include them in the pitch-adjusted (pSRAA) model. I started by subsetting Retrosheet play-by-play data from the 2016 season to only stolen-base attempts by lead runners. For example, events with a steal of second base with a man on third were not included. I only included situations where a pitch preceded a stolen-base attempt. I supplemented the play-by-play data with PITCHf/x data, which tracks the trajectory of every pitch in the MLB. I aligned the pitch data with each stolen base with minimal missing connections between the two data sets. Only three stolen bases did not have PITCHf/x data, since there technically wasn’t a pitch that occurred (e.g., steal of third, then steal of home on a passed ball). An additional eight did not have valid trajectory readings in PITCHf/x. I ended up with 2,809 total attempts. Excluding some of these stolen bases means, for those who are familiar with SRAA, that my SRAA numbers will not match up directly with BP’s numbers.

I first examined pitch speed and its effects on stolen-base percentage. It’s no surprise that, in 2016, runners succeeded more often on slower pitches.

Notice a slightly higher success rate for pitch speeds that fall above 95 mph. This phenomenon is not unique to 2016, and Jeff Sullivan hypothesized that good base-stealers are the ones stealing against fireballers. Indeed, while only 8% of stolen bases occur during a pitch that is 95 mph or higher, speedsters Billy Hamilton and Starling Marte attempted over 12% of their stolen bases in these situations. These situations tend to arise later (about one inning later on average) in closer games (stealing team is only .39 runs ahead rather than .46 runs ahead on average), meaning base-stealers ought to be more certain of success before attempting to steal.

In addition to pitch speed, we also have access to pitch location data through PITCHf/x. As you can see in the figure below, SB probability varies more drastically by location, which is therefore the more meaningful of the two pitch metrics. The results mirror what I would expect. The high SB probability along the right side of the plate for left-handed hitters reflects the fact that most catchers (if not all) are right-handed, making it hard to throw around a left-handed hitter. Similarly, catchers have more success with right-handed hitters and pitches closer to their throwing shoulder. And finally, the most obvious of all: it’s hard to throw a runner out when the ball hits the ground.

I also included the PITCHf/x pitch descriptions since they help improve the model slightly. Some descriptions occurred only a few times, so I combined them into larger categories:

  • Dirt: Ball in Dirt, Swinging Strike (Blocked)
  • Pitch-out: Pitch-out, Swinging Pitch-out
  • Strike/Ball: Ball, Called Strike
  • Swinging Strike: Foul Tip, Missed Bunt, Swinging Strike

Below is a table detailing the SB success rates in each of the four groups. Dirt and Pitch-out are the most extreme categories, with “normal” pitches falling in between. Something that jumped out at me was the lower success rate on swinging strikes, as I would have expected a swing to distract the catcher. Two explanations I can come up with are: 1) catchers tend to hold no-swing pitches a split second longer to get the call from the ump, or 2) swinging strikes often occur during hit-and-run plays, where the runners tend to be less skilled at stealing bases.
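A table like this is just a group-by over the attempt log. The rows below are fabricated stand-ins for the Retrosheet/PITCHf/x merge described earlier, purely to show the tallying:

```python
# Sketch of tallying SB success rates by pitch category. The attempt log
# below is fabricated sample data, not the real 2016 merge.
from collections import defaultdict

attempts = [
    ("Dirt", True), ("Dirt", True), ("Dirt", False),
    ("Pitch-out", True), ("Pitch-out", False), ("Pitch-out", False),
    ("Strike/Ball", True), ("Strike/Ball", True), ("Strike/Ball", False),
    ("Swinging Strike", True), ("Swinging Strike", False),
]

tally = defaultdict(lambda: [0, 0])  # category -> [successes, attempts]
for category, success in attempts:
    tally[category][0] += int(success)
    tally[category][1] += 1

for category, (sb, n) in sorted(tally.items()):
    print(f"{category}: {sb}/{n} = {sb / n:.3f}")
```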

Controlling for the lead runner’s base is the last addition I made to the original SRAA model. Adding this effect improved the model (the AIC, to be specific), indicating that runners stealing third were on average more likely to be successful than runners attempting to steal second, and especially home. A likely explanation is that runners stealing third need to be more confident in their ability to steal in the current situation, and have a right-handed hitter obstructing the catcher’s throw about 65% of the time.

So now that we have this new metric, pSRAA, let’s take a look at how it deviates from SRAA. As you can see in the figure below, the distributions of the two metrics are fairly similar.

pSRAA has a slightly tighter distribution for pitchers and runners, meaning pSRAA has absorbed some of the expected SB probability into these new variables and pushed pitcher and runner SB skills closer to the mean. This most likely occurs because the variables we are controlling for are largely out of these players’ control and are not rectifiable or exploitable. By that, I mean pitchers can’t control whether the one pitch they throw in the dirt happens to coincide with a runner taking off, but catchers can use this event to prove their skill. While a pitcher “loses control” of the SB situation when the ball is released, a catcher can make a brilliant play, saving a potential wild pitch and converting it into an out. Thus, we see wider variation in pSRAA for catchers, as pSRAA identifies both the increasingly elite talent and the replacement players who struggle to nab runners on pitch-outs.

Examining how players’ metrics improved or worsened after controlling for these additional effects reveals some drastic changes, but mostly small adjustments. The figure below illustrates the change from the old metric to the new metric. The closer a player is to the dotted line (pSRAA = SRAA), the less that player deviated from the original SRAA measure. If a player ends up above this line, it means that pSRAA is higher than SRAA, so when controlling for pitches, pSRAA attributes more success (for runners — less success for pitchers and catchers) to their ability rather than luck.

How does this new pSRAA model help us as baseball fans or analysts? pSRAA can identify where SRAA was under- or overvaluing players’ skills. For example, SRAA undervalues catcher Chris Iannetta at a 0.86% SRAA, when pSRAA pegs him at a whopping -4.19% (negative is good for catchers)! In other words, Iannetta jumps from the 43rd percentile of catchers to the 70th percentile!

To give you an idea of the kind of adjustments pSRAA makes, here is a sample stolen-base attempt against Iannetta (video has no sound for those of you who are watching at work; for sound go to 1:51:40 here), specifically a SB attempt that the model predicts will happen 85.5% of the time. Actually, it is more like 88.4% if you account for the runner, Lorenzo Cain, the 15th-fastest baseball player according to Statcast’s speed measure.

Now let’s just freeze that frame. The ball is almost on the ground, and not to mention, only thrown at 80 mph, giving Cain almost an extra tenth of a second to get to second base. Regardless, Iannetta guns him out with an impeccable throw.

Not only can we use pSRAA to uncover insights such as above, but we can also abuse pSRAA to easily find awesome plays like this top 5 play. J.T. Realmuto, known for his unbelievable pop time, throws out Ben Revere on this gem of a play. The pSRAA model gives Realmuto a 10% chance of throwing out Ben Revere, but Realmuto pops up in a staggering 1.78 seconds (via Statcast) and throws a perfect 85mph toss to second.

Or this scenario, which had a 92% stolen-base probability. A.J. Pierzynski picks a throw off the ground, then navigates around Brandon Phillips to beat Suarez by a mile.

And finally, here is an example of a successful stolen base the model predicts will happen 15% of the time — not a surprise when you see where the pitch is thrown (actually 43% when you account for the speedy Rajai Davis and the well-below-average Kurt Suzuki).

pSRAA does well for these purposes, but may not illustrate the total value a player adds to his team’s success. A runner with a high pSRAA value but only a couple of stolen-base attempts hasn’t added much value to his team, since he didn’t utilize his skill often enough. We can leverage pSRAA and stolen base (SB)/caught stealing (CS) run values to come up with a more useful metric, which I have aptly named Pitch Adjusted Swipe Rate Runs Above Average (pSRrAA) — a mouthful, I know. I based pSRrAA upon linear-weights metrics like FanGraphs’ Weighted Stolen Base Runs (wSB).

The term linear weights, often used in the world of baseball statistics, refers to the average run value of a certain action and its effect on run scoring over the course of an inning. For example, say there is a man on first base with no outs. The average number of runs scored in an inning in 2016 starting from this exact situation is 0.8744 runs. If the runner gets caught stealing, the situation becomes nobody on and one out, and the run expectancy drops to 0.2737. Thus, the value of this specific play was about -0.6 runs. Examining these situations over the course of the whole season leaves us with average run values that we can assign to SB and CS. Combining the run values for SB (runSB = .2 runs) and CS (runCS = -.41 runs) produced by FanGraphs for the 2016 season, we can use pSRAA to attribute the run values more accurately:

pSRrAA = pSRAA x (runSB - runCS) x Attempts

This method for calculating pSRrAA works because of the following:
  1. pSRAA already determines the probability a certain player adds to a SB above average.
  2. If a player adds 10% probability to a SB, he contributes runSB 10% more than the average player and runCS 10% less.
  3. pSRAA x (runSB - runCS) quantifies the average attempt value, so we simply multiply by attempts to get a full run value over the course of the season.
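Both pieces of arithmetic can be checked with a short sketch. The +5% runner with 30 attempts below is hypothetical, chosen just to exercise the formula:

```python
# Run value of the example caught stealing above, plus the pSRrAA formula
# applied to a hypothetical runner (+5% pSRAA over 30 attempts).
def play_run_value(re_before, re_after, runs_scored=0):
    return re_after - re_before + runs_scored

# Man on first, no outs (0.8744) -> bases empty, one out (0.2737):
print(round(play_run_value(0.8744, 0.2737), 2))  # -0.6

def psrraa_runs(psraa, attempts, run_sb=0.2, run_cs=-0.41):
    return psraa * (run_sb - run_cs) * attempts

print(round(psrraa_runs(0.05, 30), 3))  # 0.915
```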

Of course, as I alluded to at the beginning, pSRAA doesn’t account for all types of stolen bases, only those with pitches involved. Consequently, pSRrAA doesn’t capture the total value runners and pitchers contribute to their teams, because attempts in which no pitch was involved are excluded. Finally, to take a look at the top-10 and bottom-10 performers for each position according to pSRrAA, see my original article here. And as always, you can find the code associated with pSRAA/pSRrAA and the analysis on my GitHub page here. Check out my new Facebook page to stay up to date on new articles.

A previous version of this article was published at

Newcomers Find Their Way at Home

The Boston Red Sox have been closely linked with highly touted prospects during the past months and even years. A quick look at the Top 100 Prospects rankings from 2015 to 2017 turns up two names fairly consistently: infielders Yoan Moncada and Rafael Devers. The former entered the 2015 rankings as “the best teenage prospect to come out of Cuba since Jorge Soler in 2011” and signed with Boston for $31.5 million, smashing the previous record set by the Reds’ signing of Aroldis Chapman for $16.25 million. While Devers’ price ($1.5 million) was nowhere near Moncada’s, he was also praised as “the best left-handed bat on the 2013 international market.”

Multiple names from the 2015 class of prospects have already seen significant major-league playing time (Byron Buxton, Corey Seager, Joey Gallo and Aaron Judge), and the time has come for Moncada and Devers to start writing their full-time MLB stories. Boston opted to trade Moncada to the White Sox for Chris Sale during the past offseason while keeping Devers in town. As things have turned out, the two have debuted practically in parallel this season for their franchises, having been called up for quite different reasons. In the midst of a complete rebuild, Chicago will count on Moncada to take over the third-base position from now on. Boston, on the other hand, wanted to improve their infield a hair and seem to have opted for Devers as an in-house solution to their woes.

As of the writing of this article, Tuesday, July 25 (better known as National Rafael Devers Day, given his major-league debut with the Red Sox), Moncada will have the chance to play in as many as 65 games and Devers in 60. They will probably not reach those numbers — at least not Devers, knowing Boston’s contender status and probable use of platoon hitters during the rest of the season. Another fact of interest is that Yoan Moncada is 22 years old and Rafael Devers is just 20. Those numbers will serve as a baseline for the rest of this article, which will focus on how call-ups perform in their debut seasons, both at home and away.

Prospects make huge jumps going from the minors to the majors: they change cities and clubhouses, meet new teammates, and much more. Still, you would guess that, after settling in, they’d produce more at home than away from it. To find out whether this actually holds true, I ran a set of queries. I’ll be looking at rookie-season splits from 2000 to 2017 in which the players debuting were between 20 and 22 years of age (like Moncada and Devers). A total of 87 players within those parameters have seen major-league action during the selected time span, so we’ll be working with 174 home/away splits to learn whether rookies aged 20-22 have historically played better at home, as we might expect, or on the road.

First of all, I looked at “playing time” stats, that is: games, games started and plate appearances. As much as we could expect players to perform better at home than away over their first few games, we could also expect teams to “protect” their rookies and deploy them more frequently at home than on the road. As it turns out, though, the home and away splits are virtually identical in all three categories. First myth debunked.

Moving on to what really matters, production, we can see how well players have hit in their own ballparks compared to other venues, and whether or not there are big differences in this aspect.

Subtle differences start to appear between the games played at home and those played away in terms of runs scored and hitting. There are no big differences between the splits, surely, but it seems that home performances have edged out away ones by a hair during the past 17 years on average. The biggest difference in any of the studied statistics comes in the doubles and home-run categories, at 0.3 points each in favor of the home split.

Another interesting set of statistics are those related to base-stealing. Logically, players would be expected to feel more comfortable, confident and willing to steal bases at home rather than in other parks. Again, that preconception turns out to be wrong. Among the 87 players studied, the average number of steal attempts was higher away than at home, and the success rate was even five points higher when stealing in other ballparks rather than in their own.

Finally, we must turn our attention to the game of percentages and look at the slash line of the analyzed players in terms of BA, OBP and SLG. On top of that, I included the average tOPS+ and sOPS+ values. The former of those last two is meant to represent the player’s OPS in the split relative to that player’s total OPS during the full season (not accounting for the home/away split), with a value greater than 100 indicating that he did better than usual in the split. The second one is the OPS in the split relative to the league’s split OPS (again, a value greater than 100 indicates the player did better than the league in this split).

And here is where our home/away splits, once and for all, truly separate themselves. Not one, not two, not three, but every percentage value posted at home by the average 20-to-22-year-old rookie from 2000 to 2017 has been better than the number registered far from it, and not by a little. The difference is 15 points in BA, 23 in OBP, 23 in SLG, 13 in tOPS+ and 4 in sOPS+. That yields an average difference of 20.3 points in the slash line and 8.5 in the OPS+ metrics, which is huge. It is interesting to see how the average rookie performance is below league-average level (under 100 sOPS+) both at home and away, but how that same average rookie was able to put up much better numbers at home (106 tOPS+) than away (93 tOPS+).

Just in case the rest of the data didn’t make it clear (and it actually didn’t), this leaves no room for doubt. After all, rookies probably prefer to play at home, sweet home.

But now that we know that newcomers no older than 22 when they play their first major-league games tend to perform better at home, it is worth exploring, just out of curiosity, some of the unique cases that have occurred during the past 17 seasons among the 87 players of our study. We have been looking at the average rookie in the past few paragraphs, but as expected, each case is unique in itself and would make for a complete study on its own. Next is a table containing the rookies with a 45+ point differential in tOPS+ (minimum 60 games played), so we can measure how different their production was at home and on the road. Players are ordered by the absolute difference, with negative values meaning their production away was better than that at their home ballpark.

As it turns out, only 16 of 72 players had differences of 45+ points in tOPS+ between their games at home and those played away. Of those 16, though, seven were better far from their team’s stadium, something not really expected, much less in the case of Stanton and his minus-94 differential.

Just for fun, let’s look at Giancarlo’s case, whose split numbers are radically different despite his having played almost the same number of games home and away during his rookie season. In 180 PA at home he collected 29 hits, including 7 home runs, for a BA/OBP/OPS line of .182/.272/.599 and 52 total bases. In 216 PA away he collected 64 hits with 15 home runs, posting a .320/.370/1.020 line and getting 130 total bases. What could be seen as a terrible entry year looking just at the production at home (league-relative sOPS+ of 60) turns into a monster season considering what Stanton was able to do outside of Miami (183 sOPS+). Something similar happened to Jay Bruce, Logan Morrison and, more recently, Miguel Sano, only in the opposite venues.

As a final note, it can also be seen that only six of the 16 players in the table above had a big differential while debuting prior to 2010. The other 10 made their debuts from 2010 on, which could mean that the trend is for rookies to show much more variable production between venues than the average historical newcomer.

We still don’t know how Moncada and Devers will perform during the rest of the season, but if that last supposition holds true, then White Sox and Red Sox fans can just hope for their players to at least do more damage at home than away, so they get to watch their jewels break out in front of their own eyes instead of in ballparks around the nation.

A Surprisingly Close 18-4 Game

On July 19, 2017, the Colorado Rockies beat the San Diego Padres by a score of 18-4. Padres starter Clayton Richard left the game after 3 2/3 innings, having given up 14 hits and with his team down 11-0. After the game, Richard took responsibility for his rough outing, but also pointed out that the Rockies may have benefited from some luck. “It just seemed like mis-hit balls found the right spots,” said Richard. Let’s see if Richard is right; let’s try to eliminate the effects of luck and see how this game should have turned out.

Because the score of the game affects how teams play, I am only going to predict what the score should have been after four innings, at which point the Rockies had a 12-0 lead. In lopsided games, teams often rest their everyday players (as the Padres did with Wil Myers) and don’t bring in their top relievers (Kevin Quackenbush, who gave up six runs, relieved Richard with two outs in the 4th), so it would be unfair to use what happened after the 4th inning to estimate what the score of the game should have been.

I looked at Baseball Savant’s hit probability and expected wOBA (xwOBA) of every plate appearance in the first four innings of the game. These stats only consider a batted ball’s exit velocity and launch angle. Although I will generally refer to the difference between xwOBA and wOBA as luck, keep in mind that defensive positioning and defensive ability are also factors that can affect this difference (the Rockies are, in fact, an above-average defensive team, while the Padres are one of the worst in the National League). In the first four innings, the Padres had 16 hitters come up to the plate, and they averaged a .254 xwOBA, compared to an actual wOBA of .281, for a difference of .027 per hitter. I gave Manuel Margot’s first-inning plate appearance, in which he walked but was later picked off, an xwOBA and wOBA of 0. Meanwhile, the Rockies’ 29 hitters averaged an xwOBA of .420 and a wOBA of .664, for a difference of .244 per hitter. Two things are immediately clear. First, the Rockies certainly out-hit the Padres in the first four innings of the game. Second, as Richard noted, the Rockies’ hitters benefited from a lot of luck.

First, I will calculate the number of runs each team would have had through four innings if their wOBA was exactly their xwOBA (this estimate will be a little low for both teams, as xwOBA does not take into account that the game was played at Coors Field). To do this, I will find their weighted runs above average (wRAA), and then add that to four times the average number of runs per inning in the National League.


wRAA = ((wOBA – league wOBA) / wOBA scale) x PA

league wOBA = .320

wOBA scale = 1.25


When calculating wRAA, we run into a problem: we can’t use the actual number of PAs each team had because this number depends on the number of baserunners they had, which should change when we convert wOBA to xwOBA.  To come up with an expected number of baserunners, I added the hit probability of all balls put in play and added 1.000 for each walk and hit-by-pitch (with the exception of Margot’s 1st-inning walk). Strikeouts, as you might expect, were worth 0 points. The Padres had 3.24 expected baserunners (.203 xOBP) while the Rockies had 11.70 (.404 xOBP). With a .203 OBP, it would take roughly 15 hitters to get through four innings (15 x .203 = 3.045 baserunners; 15 hitters – 3 baserunners = 12 outs). With a .404 OBP, it would take roughly 20 hitters to get through four innings (20 x .404 = 8.08 baserunners, 20 hitters – 8 baserunners = 12 outs). Therefore, we use 15 PAs for the Padres and 20 PAs for the Rockies (notice that reducing the number of hitters doesn’t ignore what happened to the Padres’ last hitter or the Rockies’ last nine, as I use the average xwOBA of all the hitters that came up and simply apply that to a smaller sample).
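The hitter counts can also be derived directly rather than by trial: with on-base probability p, each PA produces an out with probability (1 - p), so recording 12 outs takes about 12 / (1 - p) plate appearances.

```python
# Approximate PAs needed to record a given number of outs at on-base
# probability obp: each PA makes an out with probability (1 - obp).
def pa_for_outs(obp, outs=12):
    return outs / (1 - obp)

print(round(pa_for_outs(0.203)))  # 15 (Padres)
print(round(pa_for_outs(0.404)))  # 20 (Rockies)
```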

The Padres’ expected wRAA through four innings is then -.79 while that of the Rockies is 1.60. The National League averages .5533 runs per inning, which comes out to 2.21 runs per four innings. Add each team’s wRAA to this number and a reasonable score of this game through four innings would be 1.42 to 3.81 in favor of the Rockies. It is still the Rockies’ lead, but nowhere near the 12-run difference that actually took place.
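The expected score follows mechanically from the wRAA formula above; a quick check of the arithmetic:

```python
# Expected runs through four innings: wRAA from each team's xwOBA, plus the
# NL average of 0.5533 runs per inning over four innings (~2.21 runs).
def wraa(woba, pa, league_woba=0.320, woba_scale=1.25):
    return (woba - league_woba) / woba_scale * pa

NL_RUNS_PER_4_INNINGS = 4 * 0.5533  # ~2.21

padres = wraa(0.254, 15) + NL_RUNS_PER_4_INNINGS
rockies = wraa(0.420, 20) + NL_RUNS_PER_4_INNINGS
print(round(padres, 2), round(rockies, 2))  # 1.42 3.81
```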

Of course, we know that luck and defense do exist. Let’s say that in one of the oddest trades in MLB history, the Padres and the Rockies decided to swap their luck and their defenses before the game. I will add to the Padres’ xwOBA the difference between the Rockies’ xwOBA and wOBA and vice versa (I will call this new number “swapped wOBA”). I will do the same with the teams’ xOBP and OBP to determine the number of hitters that would have come up through four innings in this scenario.  Here’s a chart summarizing all the numbers:


                 Padres   Rockies
xwOBA             0.254     0.420
wOBA              0.281     0.664
wOBA - xwOBA      0.027     0.244
swapped wOBA      0.498     0.447
xOBP              0.203     0.404
OBP               0.250     0.586
OBP - xOBP        0.047     0.182
swapped OBP       0.385     0.451
PA                   19        22


Using the same process as before, we use the teams’ swapped wOBA to calculate their wRAA through four innings and add 2.21 to each. With the Rockies’ luck, the Padres would have been expected to score 4.92 runs (2.71 wRAA + 2.21) through four innings. Meanwhile, with the Padres’ luck, the Rockies would have been expected to score 4.45 runs (2.24 wRAA + 2.21). Not only was the game not as lopsided as it appeared, but with the teams’ luck and defense swapped, the Padres would have held the lead (if you round to the nearest whole number) through four innings. That is a swing of 13 runs (from a 12-run Rockies lead to a one-run Padres lead) due solely to luck and defense!
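The swap itself is simple arithmetic: each team keeps its expected numbers but inherits the other team’s overperformance (wOBA − xwOBA, OBP − xOBP). A sketch using the table’s values:

```python
# Each team's "luck" is its actual-minus-expected gap; swapping luck means
# adding the other team's gap to your own expected number.
pad_luck = 0.281 - 0.254  # Padres wOBA - xwOBA
col_luck = 0.664 - 0.420  # Rockies wOBA - xwOBA

pad_swapped_woba = 0.254 + col_luck  # 0.498
col_swapped_woba = 0.420 + pad_luck  # 0.447

pad_swapped_obp = 0.203 + (0.586 - 0.404)  # 0.385
col_swapped_obp = 0.404 + (0.250 - 0.203)  # 0.451

# Baseline: NL average of .5533 runs per inning, over four innings
baseline_runs = 4 * 0.5533  # ~2.21 runs
```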

Now, there is a slight issue with the calculation I performed above. I took data from only 16 Padres hitters and then applied it to 19, assuming the extra three performed at the same level as the first 16. To fix this, we can look instead at the Padres’ expected run value for only the first 16 hitters. We end up with a wRAA of 2.28. Using their swapped OBP of .385, roughly six hitters would have reached base, meaning that these 16 hitters would have come up in 3 1/3 innings. So through only 3 1/3 innings, the Padres would have had basically the same wRAA as the Rockies would have had through four. This is amazing. If only the Padres were given the luck that the Rockies received on this day, they would have at least been tied through four innings, a far cry from the 12-run deficit they unfortunately had to face.
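The 16-hitter correction can be checked the same way: with a .385 swapped OBP, 16 hitters produce about six baserunners and ten outs, which is 3 1/3 innings. A quick sketch:

```python
hitters = 16
swapped_obp = 0.385

on_base = hitters * swapped_obp   # ~6.2, call it 6 baserunners
outs = hitters - round(on_base)   # 10 outs recorded
innings = outs / 3                # 3 1/3 innings
```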

What Went Wrong With Chihiro Kaneko

In the 2014 offseason, many free agents changed teams, and some even changed leagues. Hiroki Kuroda went back to Japan to pitch for his hometown team, the Hiroshima Toyo Carp, while the Yankees got an upgrade (when healthy) in Masahiro Tanaka on a seven-year, $155-million deal (plus a $20-million posting fee for the right to negotiate with him), which he can opt out of after this season.

There was a second pitcher that offseason who was almost as good as Tanaka: Chihiro Kaneko, who had worse stuff but excellent command. He also carried some injury concerns: a 2011 injury cost him a few starts, and in 2012 he made only nine starts, though he threw 63 1/3 innings in them. Heading into the 2014 offseason, he was coming off two excellent seasons with ERAs right around 2: in 2013 he threw 223 1/3 innings with 200 strikeouts and 58 walks (2.01 ERA), and in 2014 he threw 191 innings with 199 strikeouts and only 42 walks (1.98 ERA). That performance generated interest from big-league teams and earned him an appearance in Bradley Woodrum’s article as a pitcher of note who might come over. He ultimately re-signed with the Orix Buffaloes on a four-year deal.

The injury bug bit him again in 2015: he made just 16 starts and threw 93 innings, with a lower strikeout rate (7.6 K/9) than in 2013 and 2014 and a 3.19 ERA. He was mostly healthy in 2016, but his strikeout rate kept declining (6.9 K/9) while his walk rate rose (3.3 BB/9), and he posted a 3.83 ERA in 162 innings. This year his strikeout rate has fallen further (5.7 K/9) while his walk rate remains elevated (3.0 BB/9), though his ERA has improved slightly to 3.57 in 116 innings.

What has caused this drastic downturn in performance? Age explains some of it, but not the increased walk rate or the severe drop in strikeouts. Most of the decline is likely due to the injuries he sustained in 2015, and given that he hasn’t gotten better since, it seems he has been pitching through an injury that is sapping his effectiveness. He went from being roughly as good as Alex Cobb was in 2014 (assuming the average NPB hitter is slightly better than Triple-A quality) to performing like Ervin Santana this year.

He was a great pitcher with some downside, much like Jered Weaver once was, though Kaneko hasn’t declined nearly that far. Weaver, another pitcher who collapsed this quickly, is now too diminished to hold an MLB roster spot unless medical help fixes his hip and/or shoulder; he hasn’t rebounded, and he has been even worse than last year, when he was the second-worst qualified pitcher by ERA. Weaver appears virtually unfixable. Kaneko’s issues, I think, can be fixed, and if they are, he could be an interesting buy-low opportunity.

After the 2014 season, if I were Dayton Moore (indulging some armchair-GM thinking), I would’ve signed him to a three-year, $30-million deal loaded with incentives, which could’ve raised its value to $51 million if all were reached. I think he would’ve done quite well; we might not have this article at all. But I digress, as what-ifs are all around us. (Look at Yordano Ventura, who died far too young with so much untapped potential left.)

He looks like a potential project for the Pirates if he can show signs of improvement in his performance and peripheral stats. The Pirates and Ray Searage could definitely turn Kaneko into something of value, like they did with A.J. Burnett, Edinson Volquez, J.A. Happ, Ivan Nova, Juan Nicasio, Joel Hanrahan, Mark Melancon, Tony Watson and more. There’s a good amount of upside in trying this: a rebuilt Kaneko could eventually bring back prospects who can help the team in the future.

Here is a link to his player page so you can see it for yourself, draw your own conclusions about him, and judge what he can do to remedy his decline.

I don’t own any stats used; all stats are from either FanGraphs or the NPB website linked above.