Prospect Watch: 5 Future All-Stars No One Is Talking About

I chose to stick with hitters in this article, because pitching prospects are extremely difficult to predict, and I think the pitchers who do get the hype are typically deserving. However, I do see a trend of some unnoticed hitting prospects turning out great careers in the majors. Let’s get right to it.

1. Travis Demeritte – 2B – ATL

In 2016, Demeritte went from the Rangers’ to the Braves’ system and spent the entire year in high-A ball, where he dominated at the plate. A 2B with power like Cano, good speed and the ability to get on base is such a rarity.

In my opinion, Demeritte has the highest chance of being a perennial All-Star out of these five prospects. The middle infield in Atlanta has an extremely bright future. I’m predicting that Demeritte will make his splash in 2018, and make his first ASG appearance by 2020 (age 25). Let’s look at his numbers from a season ago:


Name Age G AB PA H 2B 3B HR BB SO SB CS BB% K% OPS ISO wOBA wRC+
Travis Demeritte 21 145 547 635 145 33 13 32 78 200 20 4 12.3% 31.5% 0.905 0.283 0.393 139

Let’s compare these to the four All-Star 2B in 2016 and Brian Dozier.

Name G AB PA H 2B 3B HR BB SO SB CS BB% K% OPS ISO wOBA wRC+
Jose Altuve 161 640 717 216 42 5 24 60 70 30 10 8.4% 9.8% 0.928 0.194 0.391 150
Robinson Cano 161 655 715 195 33 2 39 47 100 0 1 6.6% 14.0% 0.882 0.235 0.370 138
Brian Dozier 155 615 691 165 35 5 42 61 138 18 2 8.8% 20.0% 0.886 0.278 0.370 132
Dustin Pedroia 154 633 698 201 36 1 15 61 73 7 4 8.7% 10.5% 0.825 0.131 0.358 120
Ian Kinsler 153 618 679 178 29 4 28 45 115 14 6 6.6% 16.9% 0.831 0.196 0.356 123

Some things to keep in mind as we compare these players: Demeritte was playing in A+ ball, but he did play an average of 12 fewer games than these major-leaguers. As you can see, it’s basically a two-man race (other than Dozier’s 42 HRs) between Altuve and Demeritte here. While we cannot expect these A+ ball numbers to translate directly against ML pitching, Demeritte definitely deserves more attention in top-prospect lists. While he’s not quite as speedy as Altuve, he has more power, and he walks at a far higher rate. The one glaring weakness is the K numbers for Demeritte. However, some of the top players in the league K at very high rates. As long as the OPS stays high, it doesn’t really matter how a guy makes outs anymore.
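For anyone who wants to check the rate stats against the counting stats, here is a minimal sketch. It assumes the stat lines above read G, AB, PA, H, 2B, 3B, HR, BB, SO (my reading of the columns, which the lines themselves don't label) and uses the standard definitions: BB% and K% per plate appearance, ISO as extra bases per at-bat.

```python
# Demeritte's 2016 counting stats as listed above (column meanings are assumed).
demeritte = {"G": 145, "AB": 547, "PA": 635, "2B": 33, "3B": 13, "HR": 32, "BB": 78, "SO": 200}


def rate_stats(line):
    """Return (BB%, K%, ISO) as fractions from a dict of counting stats."""
    bb_pct = line["BB"] / line["PA"]
    k_pct = line["SO"] / line["PA"]
    # ISO counts extra bases only: 1 per double, 2 per triple, 3 per homer.
    iso = (line["2B"] + 2 * line["3B"] + 3 * line["HR"]) / line["AB"]
    return bb_pct, k_pct, iso


bb_pct, k_pct, iso = rate_stats(demeritte)
print(f"BB% {bb_pct:.1%}  K% {k_pct:.1%}  ISO {iso:.3f}")

# Games played by the five major-league second basemen in the comparison.
mlb_games = [161, 161, 155, 154, 153]
games_gap = sum(mlb_games) / len(mlb_games) - demeritte["G"]
print(f"Average games gap: {games_gap:.1f}")
```

The same function works on any of the other lines in this article if you swap in their numbers.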

I should note that 2016 was a breakout year for Demeritte; in years past he didn’t quite live up to his potential, and also served an 80-game PED suspension. These could be the main reasons why he hasn’t garnered much attention yet. He still has to prove himself to most. However, I’m sold. I’d pencil him in for the majority of the 2020s’ ASGs right now.


2. Ramon Laureano – OF – HOU

Laureano has all the tools: he can play any OF spot well, he has speed and pop, and he gets on base. Houston’s farm has taken a bit of a hit due to some trades in the last two years, but that’s because they knew they had guys like Laureano who don’t have super high trade value, but have a chance to be great ML players like the guys they traded. Let’s look at Laureano’s 2016 numbers.

Name Age G AB PA H 2B 3B HR BB SO SB CS BB% K% OPS ISO wOBA wRC+
Ramon Laureano 21 128 461 555 146 32 9 15 73 128 48 15 13.2% 23.1% 0.943 0.206 0.418 159

The numbers speak for themselves. This is the making of a star; where is the hype? I know it’s not a huge sample size, and we don’t have much to go off from the previous year either, but in A+ and AA last year he put up those phenomenal numbers you see above.

If those aren’t All-Star numbers, then I don’t know what are. Laureano’s ability to play all three OF spots will keep him in the lineup every day and help his chances of making it to the ASG. When he does get the call-up, if his numbers stay relatively close to this, there’s no way he doesn’t make three to four All-Star Games. As of now, he’s more of a speed threat, but as he develops, the speed/power combo will even out and he will be an Andrew McCutchen-type player. Keep tabs on this guy.


3. Christin Stewart – OF – DET

While researching Stewart, I couldn’t find an article more recent than September of 2015. There’s no one talking about him…why? As we know, Detroit is aging and looking to deal top players. So, I’m assuming we will be seeing a lot of opportunities for young guys to step up and prove themselves. Detroit’s system isn’t super deep, but that could change anytime if they do decide to move some key pieces. Regardless, I see Stewart as the prospect to watch moving forward; he has the tools to be an All-Star. Let’s check out his numbers from 2016.

Name Age G AB PA H 2B 3B HR BB SO SB CS BB% K% OPS ISO wOBA wRC+
Christin Stewart 22 147 514 622 132 29 2 31 93 154 4 2 15.0% 24.8% 0.883 0.245 0.407 156

The power is impressive, and by this chart he looks even a bit better than the two previous guys I mentioned. However, with the K numbers pretty high up there, and not a whole lot of speed, Stewart is a player that could fall into slumps. Oftentimes, adjusting to the majors can be challenging, and some top prospects never quite figure it out. While Stewart’s MiLB numbers are pretty insane, his slump potential makes him a pretty risky pick here. However, I do believe that if he does indeed figure it out, he will make it to a few ASGs and serve as an everyday player in this league for a decade. HRs and BBs get it done. Keep an eye on Stewart.


4. Jason Martin – OF – HOU

Another Houston OF prospect…another future All-Star? I think so. The future is certainly bright over at Minute Maid Park: Altuve is a cornerstone, Correa is a centerpiece, Springer is a baller, and they have prospects for days. If they can just figure out how to pitch, they could be a WS contender for the next eight years.

Why Martin, though? Let’s check out his 2016 numbers from high-A ball.

Name Age G AB PA H 2B 3B HR BB SO SB CS BB% K% OPS ISO wOBA wRC+
Jason Martin 20 121 431 502 114 25 7 23 63 112 22 12 12.5% 22.3% 0.874 0.251 0.382 131

Impressive, to say the least. At just 20 years old, he pumped out 23 homers in 121 games. He walks about once every eight plate appearances, and he also grabbed 22 bags on the season. The ability to walk and run (lol) will typically keep guys out of major slumps. While Martin is not a highly-touted prospect at this point, I think he will be a household name by 2022. I expect him to get the call-up in 2019 and play a significant role during a pennant race that year. In 2020, he will burst onto the scene and prove his worth to this franchise.

With Houston’s current build, this might be a guy we see dealt if they are trying to add talent at the deadline this year. That doesn’t change my prediction, however. I see Martin suiting up for the ASG a few times throughout his career. Stay posted.


5. Tom Murphy – C – COL

You can’t keep putting Yadier Molina in there every year. And with Buster Posey most likely making that change to 1B full-time within three years, Jonathan Lucroy getting dealt to the AL, Kyle Schwarber playing OF, etc., pathways for guys like Tommy Murphy open up. Making the All-Star Game as a C is not saying as much as other positions, in my opinion. A decent hot streak in the first half will inflate your hitting numbers. For example, Derek Norris in 2014. It may seem like he was the best catcher in the league at the halfway point, but, as usual, it evened out by season’s end.

With that being said, Murphy has proven he has pop, and playing in Colorado is a huge advantage for him. While I don’t think he will be a Hall-of-Fame catcher, I do think he’s flying under the radar right now and will probably open some eyes in 2017. I’d say he makes two appearances in the ASG before 2022. However, once he gets up near 30 and he’s no longer playing in Colorado, I think he will have trouble keeping a job.

I have him on the list, first of all, because he meets the criteria, and also because I think people should pay attention to him, and lastly because he’s ML-ready, unlike the rest of these guys. Trevor Story didn’t have a whole lot of hype; most people didn’t expect him to make the team out of spring, but with the Jose Reyes situation, the kid got a shot and as we all know, he ran with it. I’m not saying Murphy will make a cannonball-esque splash like Story, but I think he will turn some heads and maybe even get some ASG votes this year. Anything can happen, especially in Colorado. Keep tabs on him.

Honorable Mentions

Dylan Cozens – OF – PHI

There’s not a lot of buzz surrounding Cozens, which is surprising to me, because usually when we see 40 HR in 134 games, we really perk up. In his age-22 season, he played all 134 games at the AA level for the Phillies affiliate, Reading Fightin’ Phils, a place where most Phillies prospects prosper. The reason why Cozens doesn’t quite make the cut here is because of the words, “future All-Star.” He is one of those lefties that mash in the right ballpark and against RHP, but usually career platoon hitters, even if they are highly effective, don’t make the ASG.

Rhys Hoskins – 1B – PHI

Hoskins is another AA player in the Phillies system. He probably has a little bit more of a well-rounded hitting ability than does Cozens, but he’s a 1B, and that’s an overloaded position. You have to be incredible to crack that ASG squad, and I just don’t think Hoskins will ever be quite at that level. I do believe he will pan out to be an everyday guy for a good amount of time in this league. He has really good power and he gets on base, two things that will keep you in the lineup more often than not.

Bobby Bradley – 1B – CLE

Bradley is another guy I would keep an eye on; I’m just not sold on him yet. He has a lot of raw power, but a really high K rate in the low levels of the minors. Also, he’s a 1B, so once again, really hard to make the ASG at that position.

James Paxton Is Going to Win the 2017 AL Cy Young

Mariners starter James Paxton is going to win the 2017 American League Cy Young award. You heard it here first.

In baseball, there is no better time of year to have bold, lofty, and irrational expectations than in spring training. But there are numbers to back up this claim, even though he is a 28-year-old who has never made more than 20 starts in a major-league season.

Here is why this is going to happen.

Paxton has always pitched at the level of a top-of-the-rotation starter

There has never been a question about his talent. Paxton debuted in September of 2013, and took the league by storm immediately, posting a 1.50 ERA over 24 innings in four starts. In 2014, his ERA was 3.04 in 74 innings. His worst season, 2015, still featured a decent 3.90 ERA in 13 starts. Not ace-like numbers, but numbers that would put him in the top two or three of most rotations in baseball.

Paxton’s ERA was similar in 2016 (3.79) to his 2015 number, but he made dramatic improvements.

Utilizing a new arm slot taught to him by Tacoma pitching coach Lance Painter, his average fastball velocity rose from 94.2 in 2015 to 96.8 in 2016 — an almost unprecedented gain for a starter. Paxton gained newfound command with his new arm slot, walking just 1.8 batters per nine innings, one walk fewer than his already-good career mark of 2.8.

Digging a little deeper into advanced stats, Paxton’s numbers are similar to the game’s elite. Looking at the FIP of pitchers who threw at least 250 innings from 2013-2016 (the four seasons Paxton has spent time in the majors), Paxton’s 3.32 is 25th in the league. Teammate Felix Hernandez is No. 22 with a 3.27 FIP. The chart below shows where Paxton stands among other left-handed starters.

Paxton’s FIP over the past four seasons is eighth-best among major-league left-handers, and third-best among just the southpaws currently in the American League. That’s consistency.

Looking at 2016, Paxton’s 2.80 FIP ranked fourth-lowest in all of baseball among pitchers with at least 120 innings, and first in the American League. The next-closest American League pitcher, Corey Kluber, had a 3.26 FIP.

When Paxton is on the hill, he’s as good as just about anyone in the league. And his best numbers have come in his most recent season.

At 28, Paxton might still have room to improve. Paxton improved dramatically in 2016 in three major areas that he was already good at — strikeouts, limiting walks, and preventing home runs. In any case, Paxton’s ability to be a top-tier starter is obvious.

About that injured elephant in the room

It’s hard not to notice that Paxton has by far the fewest innings pitched among elite left-handers. It’s true, Paxton hasn’t been able to stay on the field. But his injury history doesn’t reveal the types of injuries one would expect to be recurring or career-derailing.

Paxton has been on the disabled list three times in his career, for a strained left oblique and shoulder inflammation in 2014, a strained tendon in his left middle finger in 2015, and for a sore pitching elbow after getting hit with a line drive in 2016. He also had a start pushed back a day due to a torn fingernail.

This paints a picture of bad luck as much as being chronically injury-prone. Paxton has had trouble staying on the field, but it hasn’t been one faulty limb or ligament that just won’t get right. Perhaps he’ll suffer another major injury in 2017, but his injury history alone doesn’t include enough evidence to see it as an inevitability.

The 2017 AL Cy Young field isn’t that intimidating

Clayton Kershaw doesn’t pitch in the American League, so why can’t Paxton reach the summit of the junior circuit? The competition all have their own flaws.

2016 Cy Young winner, Boston’s Rick Porcello, is coming off the best season of his career by far. Not to mention, his teammates and fellow Cy Young contenders David Price and Chris Sale will take turns stealing the spotlight from one another.

It also remains to be seen how Sale adjusts to the right-handed-hitting haven of Fenway Park; Price saw his surface numbers suffer after moving into that hitters’ paradise, with his ERA ballooning to 3.99.

Among other contenders, Detroit’s Justin Verlander will be turning 34 and is coming off of his best season since 2013. It’s probably more likely that his current ability falls somewhere in between his very good 2014-15 and his Cy Young-caliber 2016.

The most credible threat to Paxton is Cleveland’s Corey Kluber, and he’s now on the wrong side of 30. Kluber also benefited from an above-average defense in 2016, while Paxton had one of the league’s worst defensive teams playing behind him.

As it stands, a thin field, as well as three top contenders’ home ballparks playing against them, gives a healthy Paxton as good a chance as anyone.

Don’t forget the new outfield defense

Despite his outstanding FIP, Paxton’s ERA was a good-not-great 3.79, and his record was just 6-7. Certainly not Cy Young numbers.

But with a much-improved defense behind him, shaving a run off of his ERA isn’t unrealistic, and would likely increase his win and innings totals.

In 2016, the Mariners outfield defense was atrocious. Nori Aoki took the scenic route to every fly ball. Seth Smith and Nelson Cruz turned in defensive efforts that would be hard to call average in a slow-pitch softball league.

In The Fielding Bible’s defensive runs saved (DRS) stat, the Mariners’ 2016 outfield had a -27 DRS, making them better than just the Twins, Tigers, and Orioles.

Jarrod Dyson (+19 DRS), Mitch Haniger (+1) and a healthy Leonys Martin (-2) could help turn one of the worst outfields in baseball in 2016 into one of the very best. Paxton will certainly be one of many pitchers benefiting from a greater number of fly balls being turned into outs.

It’s also worth noting that the infield’s three worst gloves — Adam Lind (-2), Dae-Ho Lee (-3) and Ketel Marte (-2) — will be wearing different uniforms in 2017.

With the Mariners upgrading so many spots on defense, Paxton’s ERA should drop significantly. The difference between a 3.80 ERA and 2.80 ERA over 200 innings is 22 runs. If the defense saves him anywhere close to that many runs, the additional wins would certainly follow.
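The 22-run figure is just ERA arithmetic: ERA is earned runs per nine innings, so a one-run ERA gap over 200 innings works out to 200/9 runs. A quick sketch (the function name is mine, purely for illustration):

```python
def runs_allowed(era, innings):
    """Earned runs implied by an ERA over a given number of innings."""
    return era * innings / 9


innings = 200
# Gap between a 3.80 ERA and a 2.80 ERA over the same workload.
runs_saved = runs_allowed(3.80, innings) - runs_allowed(2.80, innings)
print(round(runs_saved, 1))
```

Change `innings` to see how much the gap shrinks if he falls short of a 200-inning season.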

Okay, so how does this make him a Cy Young contender?

Everything is in place for Paxton to take his rightful place in the upper echelon of major-league starters. He has the talent, and now a defense behind him that will help him cash in on his nearly limitless potential.

What he needs more than anything is a little good luck with the injury bug. Considering his luck over the past few years, he seems due for that. If that happens, American League hitters will certainly notice.

Paxton is one of the league’s five or 10 best pitchers. Pairing his ability with what should be one of the league’s best defenses should improve his record and lower his ERA enough to put him in a peer group with elite guys like Chris Sale, Corey Kluber, and Madison Bumgarner.

(I didn’t mention Clayton Kershaw because he is, of course, peerless.)

James Paxton will be your 2017 American League Cy Young award winner. See you at the award ceremony — or the loony bin.

Brett Miller does the agate page for the print edition of the Seattle Times. He is also a proud Washington State University alum, and good at drinking beer and taking criticism. Complain about this article directly to him at

xFantasy, Part IV: “Projecting” Breakouts and Busts in 2017

Back in December, I introduced “xFantasy” through a series of entries here at the FanGraphs Community blog. At its inception, xFantasy was a system based on xStats that integrated hitters’ xAVG, xOBP, and xISO in order to predict expected fantasy production (HR, R, RBI, SB, AVG). The underlying models are put together into an embedded “Triple Slash Converter” in Part 2. Part 3 compares the predictive value of xFantasy (and therefore xStats) vs. Steamer and historic stats, ultimately finding that for players under 26, xStats are indeed MORE predictive than Steamer!

To quote myself from the first piece, Andrew Perpetua over at the main blog has developed a great set of data using his binning strategy, which has been explained and updated this offseason, including some additional work since then to include park factors and weather factors. He produces xBABIP, xBACON, and xOBA numbers based on Statcast’s exit velocity/launch angle data, along with the resulting ‘expected’ versions of the typical slash-line stats, xAVG/xOBP/xSLG. Recently, Andrew has published a set of “2017 estimates” that takes the past two years of Statcast data and weights them appropriately to come up with the best estimate for a player’s xStats moving forward. After a bit of back and forth on Twitter with Andrew discussing how exactly these numbers get weighted, I think they are looking really good. I’m now adopting these numbers as the basis for xFantasy from this point on.

There are a few key takeaways from xFantasy so far that will tell us where to go next:

  1. xFantasy is not *truly* a projection. We don’t have minor-league data. We don’t have data from before 2015. At this point, xFantasy for 2017 is a weighted average of player performance from 2015-2016, so keep in mind that things like injuries or down years might have tanked a player’s xStats.
  2. More data is always better than less data. Steamer projections do a better job with established players than xFantasy does, likely due to having more info about past performance.
  3. Players under 26 have short track records, and xFantasy beats Steamer in projecting them going forward! For young players, or players that have undergone some significant, recent transformation at the MLB level, xFantasy could give us better info than traditional projections.

So what’s it mean? At this time, I will echo Andrew’s repeated recommendations that you should *not* use xFantasy as your projection system of choice in 2017. On average, Steamer will do better (at least for now…I think 2017 could be the year where we finally have enough Statcast data to put up a challenge). But xFantasy could be very useful in helping you to identify players (on a case-by-case basis) with short track records that might deserve a bump up or down from the projections spit out by the traditional systems.

For now, I’ve identified 10 (five up, five down) hitters aged 26 and under heading into 2017 that might deserve a second look based on xFantasy. Included below is each player’s xFantasy line and Steamer-projected 2017 line, both scaled to 600 PA, along with the 5×5 $ values, and at the far right, the difference between the two.

While the Billy Butler/Danny Valencia debacle was definitely the most interesting thing going on with the A’s late in 2016, Ryon Healy was a pretty good story himself. He came seemingly out of nowhere to hit over .300 with 13 HR in 283 second-half PAs, playing his way into a spot as the everyday 3B and likely No. 3 hitter for the 2017 A’s. xStats says you should believe it, with a .324 xAVG and 30 xHR. Steamer hasn’t bought into the average/power yet, but the relatively low ~20% K rate looks real.
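As a point of reference for the 600 PA scaling, here is a minimal sketch that treats it as a simple proportion. That is my assumption about how the scaling works, and the function name is hypothetical. It uses only Healy's raw second-half pace quoted above, not his Statcast-based xHR.

```python
def scale_to_pa(count, pa, target_pa=600):
    """Prorate a counting stat from `pa` plate appearances to `target_pa`."""
    return count * target_pa / pa


# Healy's raw second-half pace: 13 HR in 283 PA, prorated to 600 PA.
healy_hr_600 = scale_to_pa(13, 283)
print(round(healy_hr_600, 1))
```

The raw pace lands a bit under the 30 xHR figure, so xStats likes his contact quality slightly more than his actual results.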

Trevor Story was the best player in baseball for a couple of weeks this past year, and it seems to me that the late-season injury has made people forget that. xFantasy didn’t forget, though, and even with the huge K-rate, is seeing a .281 xAVG with 39 HR and 12 SB. Based on this line, I’m slotting Story comfortably into the same tier of SS’s as Correa, Seager, and Lindor for 2017. Downgrade in weekly H2H leagues where the away games can kill him a bit.

Gary Sanchez and Trea Turner have been well covered by Andrew here and here. I’ll just add that even though both are expected to regress from their lofty 2016 performances, xFantasy backs up the idea that they’ll both still be among the best players in baseball. Steamer is missing the boat on both guys.

I personally had a love/hate relationship with Tyler Naquin in 2016, who bounced on and off my roster in the “Beat Paul Sporer” NFBC league and always seemed to hit well when he was on the wire, and never when he was on my team. He’s been a trendy topic this offseason among people still using “Sabermetrics 1.0” to point at his BABIP and say he’ll be terrible in ’17. Statcast says he actually hit well enough to earn a .370 BABIP! Combine that with what seems to be a developing power profile and something like 15 SBs and you’ll have a nice little player for your fantasy squad. Just hope Cleveland plays him!

On the downside, we have quite a few players that have been trendy ‘sleeper’ picks in the lead-up to 2017 drafts so far. Javier Baez, even if he manages to find playing time in a crowded Cubs infield, just hasn’t hit the ball well enough to overcome the poor plate discipline. Mitch Haniger hit .229 in limited time (123 PA) but Statcast says he hit even worse than that — let’s hope it’s just a sample-size thing, because a .213 xAVG won’t cut it if you’re only getting 20 HR from him.

Yasiel Puig has been in the major leagues longer than many of these guys, so at this point maybe we should just believe Steamer, but I figured it would be worth including him here because it’s an interesting case to study. He hit .255 and .263 in 2015 and 2016 respectively, and that wasn’t bad luck according to Statcast, with a .249 xAVG in that time. Steamer still buys a bounceback to his pre-2015 ways with a .284 projection. I’m actually leaning toward Steamer here, because I believe that Puig’s stats have been heavily influenced by his various leg injuries over the past two years. Maybe I should see repeated injuries and use that to project future injuries, but in this case I’m going to give a 26-year-old the benefit of the doubt and say that a healthy Puig should match this Steamer projection in 2017.

Two more 24-year-olds close us out:  Max Kepler was very, very good in July and very, very bad after that, en route to an xFantasy line that doesn’t believe in the power, and *does* believe in the very poor BABIP and AVG. Staying away from that garbage pile, and moving on to another…A.J. Reed! He was supposed to be the chosen one last year, and instead he gave us his best 2014 Melvin Upton impression…without the speed. His playing-time picture is even more unclear than Baez’s, and even if he plays, Statcast tells me he has some work to do.

And finally, for an honorable mention of a player that’s new on the scene, but too old to qualify, I have to bring up Ryan Schimpf:


Next time…

I closed out Part 3 by promising xFantasy for pitchers was coming, and it is! Using a model based on scFIP, xOBA, and xBACON, xFantasy for pitchers v1.0 now exists. There’s still work to be done in order to determine how useful it actually is, though!

As I said last time, it’s been fun doing this exploration of rudimentary projections using xFantasy and xStats. Hopefully others find it interesting; hit me up in the comments and let me know anything you might have noticed, or if you have any suggestions.

The Least Interesting Player of 2016

Baseball is great! We all love baseball. That’s why we’re here. We love everything about it, but we especially love the players who stick out. You know, the ones who’ve done something we’ve never seen before, or the ones that make us think, “Wow, I didn’t know that could happen.” It’s fun to look at players who are especially good — or, let’s face it, especially bad — at some aspect of this game. They’re the most interesting part of this game we love.

But not everyone can be interesting. Some players are just plain uninteresting! Like this guy.

OMG taking a pitch? That’s boring. You’re boring everybody. Quit boring everyone!

You caught a routine fly ball? YAWN! Wake me when something interesting happens.

But it’s hopeless; nothing interesting will ever happen with Stephen Piscotty. I’m sure the two GIFs above have convinced you that he was the least interesting player in baseball last year. But, on the off-chance that you have some lingering doubts, we can quantify it. I’ve made a custom leaderboard of various statistics for all qualified batters in 2016. For each of these statistics, I computed the z-score and the square of the z-score. In this way, we can boil down how interesting each player was to one number — the sum of the squared z-scores. The idea is that if a player was interesting in even one of these statistics, they’d have a high number there. Here are the results:
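If you want to see the mechanics, here is a minimal sketch of the interest score. The four stat lines are made up for illustration and are not real 2016 data, and I use the population standard deviation here as an assumption; the sample version would work the same way.

```python
from statistics import mean, pstdev

# Hypothetical stat lines for illustration only; not real 2016 data.
players = {
    "Player A": {"AVG": 0.255, "ISO": 0.160, "BB%": 8.0},
    "Player B": {"AVG": 0.310, "ISO": 0.250, "BB%": 12.0},
    "Player C": {"AVG": 0.230, "ISO": 0.090, "BB%": 5.0},
    "Player D": {"AVG": 0.265, "ISO": 0.165, "BB%": 8.5},
}


def interest_scores(players):
    """Sum of squared z-scores per player across every statistic."""
    stats = next(iter(players.values())).keys()
    scores = {name: 0.0 for name in players}
    for stat in stats:
        values = [line[stat] for line in players.values()]
        mu, sigma = mean(values), pstdev(values)
        for name, line in players.items():
            z = (line[stat] - mu) / sigma  # distance from average, in SDs
            scores[name] += z ** 2
    return scores


scores = interest_scores(players)
least_interesting = min(scores, key=scores.get)
for name, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name}: {score:.3f}")
```

Squaring means a player gets no credit for being above average in one stat and below in another; any distance from the mean, in either direction, adds to the score.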

Click through for an interactive version

I don’t need to tell you who the guy on the far right is. On the flip side, though, there are two data points on the left that stick out. The slightly higher of the two is Marcell Ozuna, with an interest score of 1.627. The one on the very far left is Stephen Piscotty, with an interest score of 0.997. That’s right — if you sum the squares of his z-scores, you don’t even get to 1! This is as boring and average as baseball players get.

Where the real fun begins, though, is when you start making scatter plots of these statistics against each other. I’ve made an interactive version where you can play around with making these yourself, but here are a few highlights:



ISO vs. wRC+

Pretty boring, right? But wait, there’s more! Let’s investigate a little further what went into his interest score. Remember how we summed his squared z-scores and got a value below 1? Well, let’s look at the individual components that went into that sum.

The Most Boring Table Ever
Statistic Squared z-score
LD% 0.108
GB% 0.002
PA 0.296
G 0.220
OPS 0.001
BB% 0.057
SLG 4.888e-05
WAR 0.007
BABIP 0.141
K% 0.103
IFFB% 0.0004
ISO 5.313e-05
FB% 0.007
wOBA 0.022
AVG 1.69e-29
wRC+ 0.025
OBP 0.006

Yes, you’re reading that right — where he stood out the most was in games played and plate appearances. Yay, we got to see that much more boring! Also, I think it is especially apt that his AVG was EXACTLY league average.
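You can check the total yourself by adding up the table. A short sketch using the rounded values exactly as printed above:

```python
# Squared z-scores copied from the table above (already rounded there).
squared_z = {
    "LD%": 0.108, "GB%": 0.002, "PA": 0.296, "G": 0.220, "OPS": 0.001,
    "BB%": 0.057, "SLG": 4.888e-05, "WAR": 0.007, "BABIP": 0.141,
    "K%": 0.103, "IFFB%": 0.0004, "ISO": 5.313e-05, "FB%": 0.007,
    "wOBA": 0.022, "AVG": 1.69e-29, "wRC+": 0.025, "OBP": 0.006,
}

interest_score = sum(squared_z.values())
# The two statistics where he was furthest from average.
top_two = sorted(squared_z, key=squared_z.get, reverse=True)[:2]
print(f"{interest_score:.3f}", top_two)
```

The rounded components come out just under 1, within rounding of the 0.997 reported earlier, and PA and G account for more than half of it.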

All right, time to step back and be serious for a second. As Brian Kenny is always reminding us, there is great value in being a league-average hitter. Piscotty was worth 2.8 WAR last year, just his second year in the league. He’s already a very valuable contributor to a very good team. Maybe it’s time we started noticing guys who do everything just as well as everyone else, and value their contributions too?

(Nah, I’m going to go back and pore over Barry Bonds’s early-2000s stats for the next few hours.)

All the code used to generate the data and visualizations for this post can be found on my GitHub.

Which MLB Hitters Have Gotten Off the Ground?

Following up on excellent recent pieces by Travis Sawchik and Jeff Sullivan, I had a hypothesis: if there is truly a swing-path revolution underway in MLB, perhaps the best hitters by wOBA and wRC+ showed a more marked FB%+LD% (Air%) tendency in 2015-2016 than in years past. If not them, then perhaps there is a trend among the middle and/or lower classes of hitters?

The hypothesis was wrong, but the investigation still gave some interesting context to the 2016 power spike and the profiles of recent successful/unsuccessful MLB hitters in general.

Here’s a plot of the average FB%+LD% (Air%) for each year, 2009-2016, for all qualifying MLB hitters per FanGraphs leaderboards, divided into three roughly even buckets of 40-50 players by wRC+ (<100wRC+ left, 100-120wRC+ center, >120 wRC+ right):

Here’s a plot of the average FB%+LD% (Air%) for each year, 2009-2016, for all qualifying MLB hitters per FanGraphs leaderboards, divided into three roughly even buckets of 40-50 players by wOBA ( <.320 left, .320-.350 center, >.350 right):
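For anyone who wants to reproduce the bucketing, here is a minimal sketch of the wRC+ version. The rows are made-up placeholders rather than real leaderboard data, and the function names are mine; only the bucket edges come from the description above.

```python
from collections import defaultdict

# Hypothetical rows for illustration: (season, wRC+, FB%, LD%). Not real data.
hitters = [
    (2015, 92, 33.0, 20.0),
    (2015, 110, 35.0, 21.0),
    (2015, 135, 38.0, 22.0),
    (2016, 95, 34.0, 20.0),
    (2016, 118, 36.0, 20.5),
    (2016, 140, 39.0, 21.0),
    (2016, 125, 37.0, 22.0),
]


def wrc_bucket(wrc_plus):
    """Assign a hitter to one of the three wRC+ classes."""
    if wrc_plus < 100:
        return "<100"
    if wrc_plus <= 120:
        return "100-120"
    return ">120"


def average_air_pct(rows):
    """Average Air% (FB% + LD%) for each (season, bucket) pair."""
    totals = defaultdict(lambda: [0.0, 0])  # running [sum, count]
    for season, wrc_plus, fb_pct, ld_pct in rows:
        key = (season, wrc_bucket(wrc_plus))
        totals[key][0] += fb_pct + ld_pct
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


air = average_air_pct(hitters)
for (season, bucket), value in sorted(air.items()):
    print(season, bucket, f"{value:.1f}")
```

Swapping `wrc_bucket` for a function with the .320 and .350 wOBA cutoffs gives the second plot's grouping.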

The consistency of these numbers is remarkable. The writing has been on the wall for some time with regard to the benefits of hitting it in the air.

Perhaps plenty of hitters are (and always have been) trying to hit it in the air more often and are either failing to make the change stick, or not finding success quickly enough to stick with the change / stay in the league?

We aren’t seeing either across-the-board or player-class-specific changes that stand out beyond random variation by this method (yet).

There could be an equilibrium point here where given the best pools of pitching and hitting talent available (regardless of how they arrived at said status), the outcomes will be pretty similar at a macro level, save for major fundamental changes to how the game is played.

This does not mean that individual players cannot aspire to find more optimal approaches. Surely there have always been hitters finding success via these means, and only recently have we begun focusing on batted-ball data and on these traits of the transformations.

Preach on, Josh Donaldson: Ground balls? They call those outs up here.

Hardball Retrospective – What Might Have Been – The “Original” 1993 Angels

In “Hardball Retrospective: Evaluating Scouting and Development Outcomes for the Modern-Era Franchises”, I placed every ballplayer in the modern era (from 1901-present) on their original team. I calculated revised standings for every season based entirely on the performance of each team’s “original” players. I discuss every team’s “original” players and seasons at length along with organizational performance with respect to the Amateur Draft (or First-Year Player Draft), amateur free agent signings and other methods of player acquisition.  Season standings, WAR and Win Shares totals for the “original” teams are compared against the “actual” team results to assess each franchise’s scouting, development and general management skills.

Expanding on my research for the book, the following series of articles will reveal the teams with the biggest single-season difference in the WAR and Win Shares for the “Original” vs. “Actual” rosters for every Major League organization. “Hardball Retrospective” is available in digital format on Amazon, Barnes and Noble, GooglePlay, iTunes and KoboBooks. The paperback edition is available on Amazon, Barnes and Noble and CreateSpace. Supplemental Statistics, Charts and Graphs along with a discussion forum are offered at

Don Daglow (Intellivision World Series Major League Baseball, Earl Weaver Baseball, Tony LaRussa Baseball) contributed the foreword for Hardball Retrospective. The foreword and preview of my book are accessible here.


OWAR – Wins Above Replacement for players on “original” teams

OWS – Win Shares for players on “original” teams

OPW% – Pythagorean Won-Loss record for the “original” teams

AWAR – Wins Above Replacement for players on “actual” teams

AWS – Win Shares for players on “actual” teams

APW% – Pythagorean Won-Loss record for the “actual” teams


The 1993 California Angels 

OWAR: 39.3     OWS: 277     OPW%: .533     (86-76)

AWAR: 27.8      AWS: 212     APW%: .438     (71-91)

WARdiff: 11.5                        WSdiff: 65  

The “Original” 1993 Angels placed runner-up to the Rangers for the division title, yet the ball club held a fifteen-game advantage over the “Actual” Halos. Tim Salmon garnered 1993 AL Rookie of the Year honors with a .283 BA, 31 dingers, 95 ribbies and 93 runs. Devon White collected his fifth Gold Glove Award and posted career-bests with 42 doubles and 116 runs scored. “Devo” successfully swiped 34 bags in 38 attempts. Dante Bichette provided a .310 BA while clubbing 43 two-base hits and launching 21 moon-shots. Wally Joyner aka “Wally World” contributed 36 doubles along with a .292 BA. Chad Curtis tallied 94 runs and pilfered 48 bases in his sophomore season. Brian Harper (.304/12/73), Mark T. McLemore (.284/4/72) and Paul Sorrento (.257/18/65) augmented the Angels’ attack.

Wally Joyner ranked thirty-seventh among first basemen according to “The New Bill James Historical Baseball Abstract” top 100 player rankings. “Original” Angels registered in the “NBJHBA” top 100 ratings include Dickie Thon (57th-SS), Tim Salmon (72nd-RF), Devon White (81st-CF), Tom Brunansky (85th-RF), Dante Bichette (90th-RF) and Brian Harper (99th-C). Furthermore, the list includes Gary Gaetti (34th-3B) and Chili Davis (64th-RF) from the “Actual” Angels ’93 roster.

Original 1993 Angels                 Actual 1993 Angels

Player          POS    OWAR   OWS    Player          POS  AWAR   AWS
Chad Curtis     LF/CF  2.16   16.51  Luis Polonia    LF   -0.17  10
Devon White     CF     4.47   21.28  Chad Curtis     CF   2.16   16.51
Tim Salmon      RF     4.36   24.61  Tim Salmon      RF   4.36   24.61
Dante Bichette  DH/RF  1.71   19.35  Chili Davis     DH   0.33   11.91
Wally Joyner    1B     3.14   18.09  J. T. Snow      1B   0.66   10.09
Mark McLemore   2B/RF  2.19   13.37  Damion Easley   2B   1.15   8.38
Gary Disarcina  SS     -1.15  5.73   Rene Gonzales   3B   0.29   7.04
Damion Easley   3B/2B  1.15   8.38   Gary Disarcina  SS   -1.15  5.73
Brian Harper    C      1.27   15.66  Greg Myers      C    0.59   4.26
Paul Sorrento   1B     1.03   13.23  Torey Lovullo   2B   0.39   7.35
Erik Pappas     C      1      8.23   Stan Javier     LF   1.17   7.1
Dickie Thon     SS     0.02   4.88   Eduardo Perez   3B   -0.21  3.25
Eduardo Perez   3B     -0.21  3.25   Rod Correia     SS   -0.15  2.84
Dick Schofield  SS     -0.15  2.43   Chris Turner    C    0.6    2.25
Ruben Amaro     CF     0.44   2.29   Kelly Gruber    3B   0.2    2.19
Chris Turner    C      0.6    2.25   Kurt Stillwell  2B   -0.19  1.33
Tom Brunansky   RF     -0.6   1.56   Ron Tingley     C    -0.47  1.24
Doug Jennings   1B     0.17   1.46   John Orton      C    0.05   1.03
John Orton      C      0.05   1.03   Jim Edmonds     RF   -0.13  0.78
J. R. Phillips  1B     0.17   0.87   Ty Van Burkleo  1B   -0.03  0.5
Jim Edmonds     RF     -0.13  0.78   Jim Walewander  SS   0.04   0.41
Larry Gonzales  C      0.06   0.24   Larry Gonzales  C    0.06   0.24
Jeff Manto      3B     -0.23  0.09   Gary Gaetti     3B   -0.39  0.12
Gus Polidor     3B     -0.04  0.02   Jerome Walton   DH   -0.03  0.06

Chuck Finley (16-14, 3.15) whiffed 187 batsmen and paced the Junior Circuit in complete games with 13. The Halos compensated for a pedestrian rotation with a stellar bullpen consisting of Bryan Harvey (1.70, 45 SV), Roberto Hernandez (2.29, 38 SV) and Alan Mills (5-4, 3.23). Mark Langston (16-11, 3.20) topped the “Actuals” in strikeouts (196) and innings pitched (256.1) while earning his fourth All-Star invitation.

Original 1993 Angels                  Actual 1993 Angels

Player             POS  OWAR   OWS    Player           POS  AWAR   AWS
Chuck Finley       SP   4.9    18.94  Mark Langston    SP   6.16   20.37
Jim Abbott         SP   1.34   9.75   Chuck Finley     SP   4.9    18.94
Frank Tanana       SP   1.03   7.07   Scott Sanderson  SP   0.65   5.75
Phil Leftwich      SP   1.5    5.13   Phil Leftwich    SP   1.5    5.13
Kirk McCaskill     SP   -0.43  2.35   Joe Magrane      SP   0.26   2.58
Bryan Harvey       RP   3.46   17.47  Joe Grahe        RP   0.86   7.28
Roberto Hernandez  RP   2.49   15.5   Steve Frey       RP   0.67   6.92
Alan Mills         RP   1.45   9.45   Mike Butcher     RP   0.33   4.35
Joe Grahe          RP   0.86   7.28   Gene Nelson      RP   0.32   4.31
Mike Fetters       RP   0.25   4.25   Ken Patterson    RP   0.19   2.92
Hilly Hathaway     SP   0.04   2.15   Hilly Hathaway   SP   0.04   2.15
Scott Lewis        SP   0.3    1.61   Scott Lewis      SP   0.3    1.61
Mike Witt          SP   -0.13  1.23   Brian Anderson   SP   0.17   0.63
Brian Anderson     SP   0.17   0.63   Darryl Scott     RP   -0.22  0.42
Mike Cook          RP   0.08   0.47   Chuck Crim       RP   -0.27  0.4
Darryl Scott       RP   -0.22  0.42   John Farrell     SP   -1.65  0
Marcus Moore       RP   -0.56  0.36   Mark Holzemer    SP   -0.83  0
Mark Holzemer      SP   -0.83  0      Doug Linton      RP   -0.81  0
Dennis Rasmussen   SP   -0.62  0      Jerry Nielsen    RP   -0.61  0
Paul Swingle       RP   -0.37  0      Russ Springer    SP   -1.03  0
                                      Paul Swingle     RP   -0.37  0
                                      Julio Valera     SP   -1.13  0

Notable Transactions

Devon White 

December 2, 1990: Traded by the California Angels with Willie Fraser and Marcus Moore to the Toronto Blue Jays for a player to be named later, Junior Felix and Luis Sojo. The Toronto Blue Jays sent Ken Rivers (minors) (December 4, 1990) to the California Angels to complete the trade. 

Dante Bichette

March 14, 1991: Traded by the California Angels to the Milwaukee Brewers for Dave Parker.

November 17, 1992: Traded by the Milwaukee Brewers to the Colorado Rockies for Kevin Reimer.

Wally Joyner

October 28, 1991: Granted Free Agency.

December 9, 1991: Signed as a Free Agent with the Kansas City Royals. 

Bryan Harvey

November 17, 1992: Drafted by the Florida Marlins from the California Angels as the 20th pick in the 1992 expansion draft.

Brian Harper 

December 11, 1981: Traded by the California Angels to the Pittsburgh Pirates for Tim Foli.

December 12, 1984: Traded by the Pittsburgh Pirates with John Tudor to the St. Louis Cardinals for Steve Barnard (minors) and George Hendrick.

April 1, 1986: Released by the St. Louis Cardinals.

April 25, 1986: Signed as a Free Agent with the Detroit Tigers.

March 23, 1987: Released by the Detroit Tigers.

May 12, 1987: Purchased by the Oakland Athletics from San Jose (California).

October 12, 1987: Released by the Oakland Athletics.

January 4, 1988: Signed as a Free Agent with the Minnesota Twins.

November 4, 1991: Granted Free Agency.

December 19, 1991: Signed as a Free Agent with the Minnesota Twins. 

Mark T. McLemore 

August 17, 1990: The California Angels sent Mark McLemore to the Cleveland Indians to complete an earlier deal made on September 6, 1989. September 6, 1989: The California Angels sent a player to be named later to the Cleveland Indians for Ron Tingley.

December 13, 1990: Released by the Cleveland Indians.

March 6, 1991: Signed as a Free Agent with the Houston Astros.

June 25, 1991: Released by the Houston Astros.

July 5, 1991: Signed as a Free Agent with the Baltimore Orioles.

October 15, 1991: Granted Free Agency.

February 5, 1992: Signed as a Free Agent with the Baltimore Orioles.

December 19, 1992: Released by the Baltimore Orioles.

January 6, 1993: Signed as a Free Agent with the Baltimore Orioles.

Honorable Mention

The 2001 Anaheim Angels 

OWAR: 37.4     OWS: 267     OPW%: .467     (76-86)

AWAR: 31.1      AWS: 225     APW%: .463     (75-87)

WARdiff: 6.3                        WSdiff: 42  

The “Original” and “Actual” 2001 Angels finished in the American League West basement. Perennial Gold Glove center fielder Jim Edmonds socked 38 doubles and 30 long balls. “Jimmy Baseball” supplied a .304 BA with 95 runs scored and 110 ribbies. Mark T. McLemore batted .286 and nabbed 39 bags in 46 attempts. Troy Glaus crushed 41 circuit clouts and 38 two-baggers as he topped the century mark in runs and RBI. Garret Anderson rapped 194 base knocks including 39 doubles and 28 round-trippers while establishing a personal-best with 123 RBI.  Jarrod Washburn delivered 11 victories with an ERA of 3.77. Troy Percival (1.65, 39 SV) made his fourth appearance in the Mid-Summer Classic and furnished a 0.988 WHIP with more than 11 strikeouts per 9 innings pitched. Glaus, Anderson, Washburn and Percival appear on the “Original” and “Actual” Angels rosters in 2001.

On Deck

What Might Have Been – The “Original” 1999 White Sox

References and Resources

Baseball America – Executive Database


James, Bill. The New Bill James Historical Baseball Abstract. New York, NY.: The Free Press, 2001. Print.

James, Bill, with Jim Henzler. Win Shares. Morton Grove, Ill.: STATS, 2002. Print.

Retrosheet – Transactions Database

The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at “”.

Seamheads – Baseball Gauge

Sean Lahman Baseball Archive

Quantifying Bullpen Roles: The 2016 Season

Author’s Note: This is the second of a two-part article, both of which are intended to stand on their own. The first introduces terminology and a mathematical framework used to derive statistics; the second uses these new ideas to draw conclusions which are hopefully intriguing to the reader. If you need it as a reference, you can refer back to the first article (here).

Below, I’ll use some metrics – average and weighted-average Euclidian distance between relievers – to look at the 2016 season. Ideally, we’d like to be able to associate a covariate with these metrics. That is, we’d like to be able to say “bullpens with lower weighted-average distances are (blank),” where we fill in the blank with some common-sense concept or truism about the way we know the game to work. Short of that though, maybe we can just get an understanding of why the bullpens at either extreme have found themselves there.

So, without further ado, here are the bullpens of all 30 teams as sorted by weighted average Euclidian distance in 2016.

2016 WAED Leaders

How can we interpret this? There’s no real obvious trend here: there are “good” and “bad” bullpens on both ends of the table, along with “good” and “bad” teams. At the extremes are good case studies, though: A subpar Phillies bullpen on a subpar Phillies team, a solid Orioles bullpen on a solid Orioles team, and of course, the Cubs. What can we learn from looking at them in more detail?

The 2016 Phillies Bullpen: An Ode to Brett Oberholtzer

Most people reading this know how the Phillies season went last year. They were supposed to be bad. Then, briefly, they appeared to be good. People did what they could to explain why the Phillies appeared to be good, including looking at their overachieving bullpen. As it turns out, the Phillies were bad after all. Baseball is fun.


The Phillies being bad explains part of what you see above. They tended to employ a lot of guys in the middle innings when they were already behind in the game. That’s a product of circumstance, and not an indictment of those guys. Elvis Araujo, Severino Gonzalez and Colton Murray weren’t great pitchers, and it’s sort of odd to have three of those guys rotating into your bullpen at various points in the season. Then again, the Phillies were bad, and those three guys were young, and they could afford to give young guys longer runs than a competing team could have.

There are those three guys, and then there’s Brett Oberholtzer, a slightly older, more experienced pitcher, whose MLB time before 2016 was mostly as a starter. He can be considered the quintessential mop-up guy in 2016. He’s way over there to the left – in fact, he had the lowest average score differential when entering the game out of any relief pitcher in 2016. Here’s what his inning-score matrix looked like:


This doesn’t even do Brett Oberholtzer justice, though. Here’s a histogram of score differential by appearance that puts it into context.


Oberholtzer made 26 appearances for the Phillies in 2016, and most of them were in garbage time. Then, there was the one appearance where the Phillies actually led when he came into the game. It was the 10th inning, and most of the Phillies bullpen had already been spent. Pete Mackanin had little choice but to bring Oberholtzer in to protect a one-run lead in the 10th. Which he did, earning a save. Brett Oberholtzer has no “regular” mode, no “normal” days. Baseball is wonderful. Baseball is weird.

Getting back to the Phillies bullpen as a whole: It’s not so atypical outside of Oberholtzer and an abundance of negative-score pitchers. Jeanmar Gomez was used in a fairly typical “closer” role, with Hector Neris and Edubray Ramos in higher-leverage setup roles. This all seems to comport with how we think of modern bullpens.

The 2016 Orioles: A Well-Oiled Machine

The Orioles had a very effective bullpen by most measures in 2016. Certainly, it helps to have Zach Britton churning out ground ball after ground ball, but overall the group was very effective, registering a league-leading 10.22 WPA for the season (with second place not being particularly close). Their 53 “meltdowns” were also the fewest in the league. This was a playoff team, largely because of their bullpen. That is to say, this is a very different team than the 2016 Phillies.

That said, there are some similarities here.


The general shape is the same, although the Orioles were giving their bullpen a lead more often than the Phillies. One striking similarity is the presence of a “mop-up” guy, in this case, Vance Worley. Worley logged an impressive 64.2 innings in just 31 relief appearances. He was also never given the ball with a lead of less than six (!).


Worley soaked up a lot of innings for the O’s, and he did so in a rather effective way, ending with an ERA of 3.53 – a number which, while partially luck-driven, probably doesn’t suffer from quite as much inherited-runner variance as the average reliever. He created his own messes, and was allowed to clean them up, because Buck Showalter mostly thought the game was over anyway. The overall structure of a bullpen may be related, by necessity, to the depth that the starting rotation can get on a regular basis.

One item of interest here: The unweighted average distance is actually higher in the O’s bullpen than in the Phillies bullpen. When weighting by inverse variance, the Phillies show an even larger average distance, while the average distance narrows for the Orioles. This speaks to more rigid roles, particularly for the setup guys. Darren O’Day was very seldom called upon when the team was behind (four out of 34 appearances, none when trailing by more than three runs), whereas Hector Neris was used a bit more fluidly (18 out of 79 appearances, five appearances when trailing by five or more runs). There may again be a team effect at work here: Maybe the Phillies found themselves needing to get Neris work more often during long losing streaks, and were set on throwing him on a certain day regardless of score.

The 2016 Cubs: An Embarrassment of Riches

If you’ve been under a rock or are currently time traveling, this may shock you: The Cubs were really good last year. They even won the World Series! The Cubs!

OK, with that out of the way, this graph is going to look quite different than the previous two.


Did the Cubs ever not have a lead going into the seventh inning? Well, yes, I assure you that they did. Multiple times, in fact! However, they didn’t do it often enough to give anyone in their bullpen a “mop-up” role, or anything that resembles one. Look at that graph! The Cubs had Aroldis Chapman and Hector Rondon, and then they had seven other guys hanging out in the O’Day / Neris / Brad Brach neighborhood of the graph. What’s going on here?

There’s another thing that’s different about the Cubs which can help explain this. A lot of members of their bullpen have very high variances by score. Whereas O’Day, Neris and Brach have score variances in the single digits, many of the Cubs relievers have score variances north of 10. Take another look at the score variances in the Phillies and Orioles bullpens. Double-digit numbers are typically reserved for long men, mop-up guys, and lower-leverage relievers. Here’s Justin Grimm, who represents this pretty well:


Maybe this was a conscious decision by Joe Maddon, matching up in high-leverage situations with different arms. Maybe this was simply a necessary decision to keep everyone fresh in the face of repeated high-leverage situations: If you have late-game leads for five or six consecutive games, the same three arms can’t be used in all of them. It’s not as if Justin Grimm was used a lot in these situations, and no one would refer to him as a “high-leverage reliever.” He did have a dozen or so appearances in the high-leverage areas of the graph, though, and that’s not nothing.

You can chalk this up to the Cubs being really, really good in 2016, and likely, there’s some merit to that. But it also probably doesn’t tell the whole story. Out of 279 relievers with 20 or more appearances in 2016, only 18 of them had an average inning of 7 or later, an average score differential of 1 or more, and a score variance of 10 or more. Five of those 18 were on the Cubs. The Nationals, Rangers, Red Sox and Dodgers – all good teams in their own right, if not quite as dominant as the Cubs – had one such player each. The Indians had none.
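For readers who want to run the same kind of screen on their own data, here is a minimal R sketch of the filter described above. The data frame, its column names and its values are all made up for illustration; they are not the data set behind the 279-reliever count.

```r
# Made-up reliever summaries; column names are illustrative assumptions
relievers <- data.frame(
  name      = c("A", "B", "C", "D"),
  apps      = c(54, 31, 68, 18),        # relief appearances
  avgInning = c(7.6, 6.1, 8.9, 7.8),    # average inning when entering
  avgScore  = c(1.4, -3.2, 1.1, 2.0),   # average score differential when entering
  scoreVar  = c(11.3, 14.0, 3.2, 12.5)  # variance of that score differential
)

# Late innings, usually with a lead, but with a wide spread of game situations
lateLeadFlexible <- subset(
  relievers,
  apps >= 20 & avgInning >= 7 & avgScore >= 1 & scoreVar >= 10
)
print(lateLeadFlexible)
```

Counting the rows of the result per team would give the kind of tally quoted above.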

It’s safe to say that Joe Maddon managed his bullpen differently than any of these teams in 2016. It’s also hard to argue with the results.

Quantifying Bullpen Roles: The Math

Author’s Note: This is the first of a two-part article, both parts of which are intended to stand on their own. The first introduces terminology and a mathematical framework used to derive statistics; the second uses these new ideas to draw conclusions which are hopefully intriguing to the reader. If you’re not into math, you can skip to the second article (here) and refer back to this one as needed.

Recently, I wrote about the inning-score matrix, and how we could refine the concept to put a finer point on when and how certain relief pitchers are used. Statistical oddities and outliers are always fun topics of conversation, and certainly, appearance data can give us that.

But can it give us more than that? I don’t care so much that Will Smith was used differently after he was traded or that Brett Oberholtzer was the closest thing to a true mop-up man in the game last year – OK, actually, those things are really interesting too – so much as I care to define how managers are employing bullpens. This may not even give rise to why managers are doing what they’re doing; it’s difficult to attribute intent when looking at numbers abstracted away from the human elements of the game. However, the decision to bring a specific relief pitcher into the game is a conscious one by the manager, largely influenced by game situation. To that end, appearance data can also be aggregated by team — and, if what we care about is the managerial decisions that give rise to bullpen roles, we should really be focused at the team level.

To gain insight into, and ultimately quantify, how bullpens are constructed, we need to define a few concepts. As we go through, I’ll do my best to explain the concept that we’re trying to quantify in baseball terms, before diving into the nuts and bolts of how I’m quantifying them.

Concept 1: Center of gravity

Your personal center of gravity is probably around your belly button – it’s the point at which half of your mass is above, half is below, half is left, half is right.

In addition to their physical centers of gravity (which they work so hard on, Bartolo Colon notwithstanding), relief pitchers have another “center of gravity”: the one at the center of their inning-score matrix. The inning-score matrix has two dimensions (score differential on the X-axis, inning on the Y-axis), and each appearance can be plotted in these two dimensions.

If we treat all appearances equally, a reliever’s center of gravity can be defined as the average inning and score when entering the game. This tells us a great deal about how the pitcher is being used on its own. For example, without looking at the names, you can probably guess which of these guys was a high-leverage reliever in 2016 and which was a mop-up guy.

Player A: Vance Worley; Player B: Zach Britton

The center of gravity is a snapshot of a player’s role. It doesn’t tell you everything – you can’t pick out a lefty specialist, for example, or a guy whose game situations changed drastically over the course of a season. In fact, in the latter case, a player’s center of gravity for an entire season may actually be misleading. Still, it’s the most information you can get about the player’s usage in a couple numbers. We’ll think of it as where the player “lives” in the inning-score matrix.
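As a rough illustration, the center of gravity is just a pair of means. The sketch below uses a made-up appearance log, and it assumes score differential is measured as the reliever's team score minus the opponent's score, so a lead is positive.

```r
# Made-up appearance log for one reliever: inning and score differential
# at the moment he entered each game (positive score = his team leads)
appearances <- data.frame(
  inning = c(9, 9, 8, 9, 9),
  score  = c(1, 2, 0, 3, 1)
)

# Center of gravity: average score (x-axis) and average inning (y-axis)
cog <- c(x = mean(appearances$score), y = mean(appearances$inning))

# Spread around that center, a measure of how varied the usage was
spread <- c(x = var(appearances$score), y = var(appearances$inning))

print(cog)
print(spread)
```

A ninth-inning, small-lead profile like this one would sit in the upper right of the inning-score matrix.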

Concept 2: Euclidian distance

If you’re not a math person, ignore the word “Euclidian.” This is just “distance” in the way you think about it in everyday life. If I have two points in space, a straight line between them has a distance, and in layman’s terms, we’d say that the size of that distance constitutes “how close” or “how far apart” the two points are. Mathematically, for two points with coordinates (x_i, y_i) and (x_j, y_j), the Euclidian distance between them can be calculated as:

$$d_{ij} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}$$

A bullpen lives in the two-dimensional space that we used to define center of gravity: For every appearance a member of the bullpen makes, there is an inning (y), and there is a score (x). In this space, each member of the bullpen has a center of gravity. As such, we can say the two pitchers in our earlier example were far apart, but that these two are close together:

Player A: Shane Greene; Player B: Justin Wilson

In fact, you can start to look at entire bullpens graphically, in order to form an image of how the bullpen is constructed. Our “twins” from above are easy to pick out when we do this:


Nice to look at, and the trend makes intuitive sense: guys who pitch later in games are generally also trusted with leads. But how can we use it to compare bullpens? We need metrics to quantify what we’re seeing above, to describe how similar or dissimilar the roles are in a bullpen. Then we can compare that to other bullpens and give context to how a team is managing their pen relative to the rest of the league.
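Before getting to those metrics, here is a minimal R sketch of the distance itself. The two centers of gravity are hypothetical values chosen to keep the arithmetic easy, not numbers for any real pitcher.

```r
# Hypothetical centers of gravity: x = average score differential, y = average inning
closer <- c(x = 1,  y = 9)
mopUp  <- c(x = -3, y = 6)

# Straight-line (Euclidian) distance between two points
euclid <- function(a, b) sqrt(sum((a - b)^2))

d <- euclid(closer, mopUp)
print(d)  # 5: the points differ by 4 runs and 3 innings
```

The same function works for any pair of relievers once their centers of gravity are known.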

Concept 3: Average Euclidian distance

The simplest thing one could do would be to sum the distances of the lines connecting each player’s center of gravity. This has the disadvantage of being biased: Bullpens which have more qualifying players will have more dots to connect and, therefore, more total distance.


Naturally, we can calculate an average of these distances instead. This requires us to know how many unique distances there are between distinct pairs of relievers. We can deduce this logically: From the first of n relievers, there are (n – 1) lines, connecting that reliever to all the others. From the second reliever, we’ve already drawn the line to the first reliever, so we can draw (n – 2) more lines, connecting him to the remaining relievers … and so forth. Thus, for n relievers in a bullpen, there are (n – 1) + (n – 2) + … + 2 + 1 distances between them, and we can calculate the average Euclidian distance as:

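$$\mathrm{AED} = \frac{\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} d_{ij}}{n(n-1)/2}$$

where d_{ij} is the Euclidian distance between the centers of gravity of relievers i and j.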

This looks intimidating, but the numerator is really just the sum of all the distances of all the lines that we drew. The denominator is the number of lines that we drew. Voila: an average!
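In R this is close to a one-liner, because the built-in dist() function returns each pairwise distance exactly once. The sketch below uses a hypothetical four-man bullpen; the centers of gravity are invented for illustration.

```r
# Hypothetical centers of gravity for a four-man bullpen
cog <- data.frame(
  score  = c(1, -3, 1, 0),
  inning = c(9, 6, 8, 7),
  row.names = c("Closer", "MopUp", "Setup", "Middle")
)

n      <- nrow(cog)
nPairs <- n * (n - 1) / 2   # (n - 1) + (n - 2) + ... + 1 lines to draw

d <- dist(cog)              # Euclidian by default; one entry per distinct pair

aed <- sum(d) / nPairs      # sum of the lines divided by the number of lines
print(aed)
```

With four relievers there are six lines, which matches the counting argument above.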

Concept 4: Weighted-average Euclidian distance

You may be tiring of all this talk about Euclidian distance. It’s important, though, to take this one step further. To use the average distance between all members of the bullpen as a basis of comparison is to make the assumption that all relievers are created equal – that, if you’re a fan of the Indians, you care about the distance between Kyle Crockett and Dan Otero as much as you do about the distance between Bryan Shaw and Cody Allen. You probably don’t, and that makes sense – the former duo isn’t nearly as important to the makeup of the Indians’ bullpen as the latter. We should, therefore, be emphasizing certain relievers and the distances associated with them.

How do we characterize certain members of a bullpen as important, numerically? We could weight them by, say, the average Leverage Index at the time they entered the game; players who are trusted in critical situations are surely more important, right? The issue with this idea is that leverage is highly correlated with the inning and score – in fact, it’s derived from them. Weighting by Leverage Index would tell us that players in a certain area of the graph are more important to team success. This is intuitive and not very interesting.

What do we want to measure? It might be interesting to know how rigid or fluid a team’s bullpen is; that is, do they have a “seventh-inning guy” or a “mop-up guy” who is consistently called on in certain situations? In this case, we want to give more weight to relievers who have lower variance by game situation when entering the game. If the manager gives someone a highly-specific role by inning and score, that reliever is important insofar as the structure of the bullpen is concerned. That may not translate to how important they are with respect to the outcome of games, but presumably, that reliever has a fixed role because they have a skillset that in some way lends itself to their residence in a certain part of the graph.

Fortunately, inverse-variance weighting is an established mathematical concept. The idea is that players with lower variance by inning and score should be weighted more heavily. In short, this works in three steps:

  1. For each pair of players, divide the Euclidian distance between them by the sum of score and inning variances associated with their centers of gravity;
  2. For each pair of players, divide 1 by that very same sum of score and inning variances;
  3. Divide the sum of results of (1) by the sum of results of (2).

Mathematically, this looks like this:

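$$\mathrm{WAED} = \frac{\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} d_{ij} / s_{ij}}{\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} 1 / s_{ij}}$$

where d_{ij} is the Euclidian distance between relievers i and j, and s_{ij} is the sum of the score and inning variances associated with their centers of gravity.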
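The three steps translate fairly directly into R. This is only a sketch: the bullpen below is hypothetical, and it assumes that the “sum of score and inning variances” for a pair means adding up both players’ score variance and inning variance.

```r
# Hypothetical per-reliever summaries: center of gravity plus variances
pen <- data.frame(
  score     = c(1, -3, 1, 0),
  inning    = c(9, 6, 8, 7),
  scoreVar  = c(2, 12, 4, 9),
  inningVar = c(0.2, 3, 0.5, 2),
  row.names = c("Closer", "MopUp", "Setup", "Middle")
)

idx <- combn(nrow(pen), 2)   # every distinct pair (i, j), one pair per column

# Euclidian distance between each pair's centers of gravity
d <- apply(idx, 2, function(p)
  sqrt((pen$score[p[1]] - pen$score[p[2]])^2 +
       (pen$inning[p[1]] - pen$inning[p[2]])^2))

# Assumed pooling: both players' score and inning variances added together
totalVar <- pen$scoreVar + pen$inningVar
s <- apply(idx, 2, function(p) totalVar[p[1]] + totalVar[p[2]])

step1 <- d / s                    # step 1: distance over the summed variances
step2 <- 1 / s                    # step 2: one over the same sum
waed  <- sum(step1) / sum(step2)  # step 3: ratio of the two totals
print(waed)
```

Pairs of low-variance relievers get the largest weights, so the distance between two rigid roles counts for more than the distance between two fluid ones.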

Portrait of a Modern Bullpen

If you’re still with me, you may be wondering what the use of all this is. Let’s summarize what we’ve done so far:

  • The average Euclidian distance between members of the bullpen tells us how clustered or spread out that bullpen is as a whole.
  • Using a weighted average refines that metric in order to emphasize members of the bullpen that have well-defined, rigid roles – usually a closer and a setup man or two, but sometimes a surprise as well.

We can summarize a bullpen with these metrics and a plot of all members of a bullpen (as represented by their centers of gravity). Here’s how the 2016 Marlins bullpen looks in a snapshot. The 2016 Marlins have been chosen because they were a very average bullpen in terms of performance as well as structure, on a very average team overall. I couldn’t find anything at all that stood out about them.


We can use this framework to compare bullpens going forward: Which teams have very large distances between relievers? Which are more clustered? Which are oriented differently? We can compare not only bullpens within a single season, but also how bullpen structures have changed over time across the league. We can explore whether the structure of a bullpen is consistent from year to year on a single team, or if certain managers have ways of managing their bullpens which consistently show up in the data associated with their teams. There are a lot of exciting possible applications.

And of course, we can point out statistical oddities along the way. Why wouldn’t we?

Basic Machine Learning With R (Part 2)

(For part 1 of this series, click here)

Last time, we learned how to run a machine-learning algorithm in just a few lines of R code. But how can we apply that to actual baseball data? Well, first we have to get some baseball data. There are lots of great places to get some — Bill Petti’s post I linked to last time has some great resources — but heck, we’re on FanGraphs, so let’s get the data from here.

You probably know this, but it took forever for me to learn it — you can make custom leaderboards here at FanGraphs and export them to CSV. This is an amazing resource for machine learning, because the data is nice and clean, and in a very user-friendly format. So we’ll do that to run our model, which today will be to try to predict pitcher WAR from the other counting stats. I’m going to use this custom leaderboard (if you’ve never made a custom leaderboard before, play around there a bit to see how you can customize things). If you click on “Export Data” on that page you can download the CSV that we’ll be using for the rest of this post.


Let’s load this data into R. Just like last time, all the code presented here is on my GitHub. Reading CSVs is super easy — assuming you named your file “leaderboard.csv”, it’s just this:

pitcherData <- read.csv('leaderboard.csv',fileEncoding = "UTF-8-BOM")

Normally you wouldn’t need the “fileEncoding” bit, but FanGraphs CSVs start with a byte-order mark (BOM), which would otherwise get mangled into the name of the first column. You may also need to use the full path to the file if your working directory is not where the file is.

Let’s take a look at our data. Remember the “head” function we used last time? Let’s change it up and use the “str” function this time.

> str(pitcherData)
'data.frame':	594 obs. of  16 variables:
 $ Season  : int  2015 2015 2014 2013 2015 2016 2014 2014 2013 2014 ...
 $ Name    : Factor w/ 231 levels "A.J. Burnett",..: 230 94 47 ...
 $ Team    : Factor w/ 31 levels "- - -","Angels",..: 11 9 11  ...
 $ W       : int  19 22 21 16 16 16 15 12 12 20 ...
 $ L       : int  3 6 3 9 7 8 6 4 6 9 ...
 $ G       : int  32 33 27 33 33 31 34 26 28 34 ...
 $ GS      : int  32 33 27 33 33 30 34 26 28 34 ...
 $ IP      : num  222 229 198 236 232 ...
 $ H       : int  148 150 139 164 163 142 170 129 111 169 ...
 $ R       : int  43 52 42 55 62 53 68 48 47 69 ...
 $ ER      : int  41 45 39 48 55 45 56 42 42 61 ...
 $ HR      : int  14 10 9 11 15 15 16 13 10 22 ...
 $ BB      : int  40 48 31 52 42 44 46 39 58 65 ...
 $ SO      : int  200 236 239 232 301 170 248 208 187 242 ...
 $ WAR     : num  5.8 7.3 7.6 7.1 8.6 4.5 6.1 5.2 4.1 4.6 ...
 $ playerid: int  1943 4153 2036 2036 2036 12049 4772 10603 ...

Sometimes the CSV needs cleaning up, but this one is not so bad. Other than “Name” and “Team”, everything shows as a numeric data type, which isn’t always the case. For completeness, I want to mention that if a column that was actually numeric showed up as a factor variable (this happens A LOT), you would convert it in the following way:

pitcherData$WAR <- as.numeric(as.character(pitcherData$WAR))
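The detour through as.character matters. Here is a small sketch of what goes wrong without it, using a made-up three-value column rather than the real leaderboard.

```r
# Hypothetical column that was read in as a factor instead of a number
war <- factor(c("5.8", "7.3", "4.5"))

# as.numeric() on a factor returns the internal level codes, not the values
codes <- as.numeric(war)

# Converting to character first recovers the actual numbers
values <- as.numeric(as.character(war))

print(codes)
print(values)
```

The first result is a set of small integers that look plausible at a glance, which is why this mistake is easy to miss.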

Now, which of these potential features should we use to build our model? One quick way to explore good possibilities is by running a correlation analysis:

cor(subset(pitcherData, select=-c(Season,Name,Team,playerid)))

Note that in this line, we’ve removed the columns that are either non-numeric or are totally uninteresting to us. The “WAR” column in the result is the one we’re after — it looks like this:

W    0.50990268
L   -0.36354081
G    0.09764845
GS   0.20699173
IP   0.59004342
H   -0.06260448
R   -0.48937468
ER  -0.50046647
HR  -0.47068461
BB  -0.24500566
SO   0.74995296
WAR  1.00000000
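If you would rather pull that column out and rank it than read it off the full matrix, something like the following works. The data frame here is a tiny made-up stand-in with a few of the same column names, so the numbers it prints are not the ones shown above.

```r
# Small made-up data frame standing in for the leaderboard columns
toy <- data.frame(
  IP  = c(222, 229, 198, 236, 232, 180),
  SO  = c(200, 236, 239, 232, 301, 150),
  BB  = c(40, 48, 31, 52, 42, 60),
  WAR = c(5.8, 7.3, 7.6, 7.1, 8.6, 3.0)
)

# Keep only the WAR column of the correlation matrix, then drop WAR itself
warCor <- cor(toy)[, "WAR"]
warCor <- warCor[names(warCor) != "WAR"]

# Rank features by the strength (absolute value) of their correlation with WAR
ranked <- warCor[order(abs(warCor), decreasing = TRUE)]
print(ranked)
```

Sorting on the absolute value keeps strongly negative features, such as walks, near the top alongside the strongly positive ones.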

Let’s take a first crack at this prediction with the columns that show the most correlation (both positive and negative): Wins, Losses, Innings Pitched, Earned Runs, Home Runs, Walks, and Strikeouts.

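library(caret)  # provides createDataPartition() and train()

goodColumns <- c('W','L','IP','ER','HR','BB','SO','WAR')
# Hold out 30% of the rows for testing; the split is stratified on WAR
inTrain <- createDataPartition(pitcherData$WAR,p=0.7,list=FALSE)
training <- pitcherData[inTrain,goodColumns]
testing <- pitcherData[-inTrain,goodColumns]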

You should recognize this setup from what we did last time. The only difference here is that we’re choosing which columns to keep; with the iris data set we didn’t need to do that. Now we are ready to run our model, but which algorithm do we choose? Lots of ink has been spilled about which is the best model to use in any given scenario, but most of that discussion is wasted. As far as I’m concerned, there are only two things you need to weigh:

  1. how *interpretable* you want the model to be
  2. how *accurate* you want the model to be

If you want interpretability, you probably want linear regression (for regression problems) and decision trees or logistic regression (for classification problems). If you don’t care about other people being able to make heads or tails out of your results, but you want something that is likely to work well, my two favorite algorithms are boosting and random forests (these two can do both regression and classification). Rule of thumb: start with the interpretable ones. If they work okay, then there may be no need to go to something fancy. In our case, there already is a black-box algorithm for computing pitcher WAR, so we don’t really need another one. Let’s try for interpretability.

We’re also going to add one other wrinkle: cross-validation. I won’t say too much about it here except that the “trainControl” stuff repeatedly holds out part of the training data, which generally gives you a more trustworthy read on how the model will do on data it hasn’t seen. If you’re interested, please do read about it on Wikipedia.

method = 'lm' # linear regression
ctrl <- trainControl(method = 'repeatedcv',number = 10, repeats = 10)
modelFit <- train(WAR ~ ., method=method, data=training, trControl=ctrl)
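For the curious, here is roughly what a single round of 10-fold cross-validation does, written out by hand in base R. This is a conceptual sketch, not caret's actual implementation. It uses the built-in mtcars data (predicting mpg from wt and hp) as a stand-in, and 'repeatedcv' with repeats = 10 would repeat the whole loop ten times with different random folds.

```r
set.seed(42)  # make the fold assignment reproducible
k <- 10

# Randomly assign each row to one of k folds
folds <- sample(rep(1:k, length.out = nrow(mtcars)))

rmse <- numeric(k)
for (i in 1:k) {
  trainRows <- mtcars[folds != i, ]
  heldOut   <- mtcars[folds == i, ]
  fit  <- lm(mpg ~ wt + hp, data = trainRows)  # fit on the other k-1 folds
  pred <- predict(fit, newdata = heldOut)      # predict the held-out fold
  rmse[i] <- sqrt(mean((heldOut$mpg - pred)^2))
}

# Average held-out error: an estimate of out-of-sample performance
cvError <- mean(rmse)
print(cvError)
```

Every row gets predicted exactly once by a model that never saw it, so the averaged error is a fairer estimate than the error on the training data itself.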

Did it work? Was it any good? One nice quick way to tell is to look at the summary.

> summary(modelFit)

lm(formula = .outcome ~ ., data = dat)

     Min       1Q   Median       3Q      Max 
-1.38711 -0.30398  0.01603  0.31073  1.34957 

              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.6927921  0.2735966  -2.532  0.01171 *  
W            0.0166766  0.0101921   1.636  0.10256    
L           -0.0336223  0.0113979  -2.950  0.00336 ** 
IP           0.0211533  0.0017859  11.845  < 2e-16 ***
ER           0.0047654  0.0026371   1.807  0.07149 .  
HR          -0.1260508  0.0048609 -25.931  < 2e-16 ***
BB          -0.0363923  0.0017416 -20.896  < 2e-16 ***
SO           0.0239269  0.0008243  29.027  < 2e-16 ***
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4728 on 410 degrees of freedom
Multiple R-squared:  0.9113,	Adjusted R-squared:  0.9097 
F-statistic: 601.5 on 7 and 410 DF,  p-value: < 2.2e-16

Whoa, that’s actually really good. The adjusted R-squared is over 0.9, which is fantastic. We also get something else nice out of this, which is the significance of each variable, helpfully indicated by a 0-3 star system. We have four variables that were three-stars; what would happen if we built our model with just those features? It would certainly be simpler; let’s see if it’s anywhere near as good.

> model2 <- train(WAR ~ IP + HR + BB + SO, method=method, data=training, trControl=ctrl)
> summary(model2)

lm(formula = .outcome ~ ., data = dat)

     Min       1Q   Median       3Q      Max 
-1.32227 -0.27779 -0.00839  0.30686  1.35129 

              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.8074825  0.2696911  -2.994  0.00292 ** 
IP           0.0228243  0.0015400  14.821  < 2e-16 ***
HR          -0.1253022  0.0039635 -31.614  < 2e-16 ***
BB          -0.0366801  0.0015888 -23.086  < 2e-16 ***
SO           0.0241239  0.0007626  31.633  < 2e-16 ***
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4829 on 413 degrees of freedom
Multiple R-squared:  0.9067,	Adjusted R-squared:  0.9058 
F-statistic:  1004 on 4 and 413 DF,  p-value: < 2.2e-16

Awesome! The results still look really good. But of course, we need to be concerned about overfitting, so we can’t be 100% sure this is a decent model until we evaluate it on our test set. Let’s do that now:

# Apply to test set
predicted2 <- predict(model2,newdata=testing)
# R-squared
cor(testing$WAR,predicted2)^2 # 0.9108492
# Plot the predicted values vs. actuals

[Figure: predicted vs. actual WAR on the test set]

Fantastic! This is as good as we could have expected from this, and now we have an interpretable version of pitcher WAR, specifically,

WAR = -0.8 + 0.02 * IP - 0.13 * HR - 0.04 * BB + 0.02 * SO
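As a quick sanity check, the rounded formula can be wrapped in a small function and applied to a stat line. The function name and the season below are hypothetical, made up purely to show the arithmetic; with the rounded coefficients the result will differ slightly from what model2 itself would predict.

```r
# Rounded coefficients copied from the formula above (SO = strikeouts)
estimateWAR <- function(ip, hr, bb, so) {
  -0.8 + 0.02 * ip - 0.13 * hr - 0.04 * bb + 0.02 * so
}

# Hypothetical season: 200 IP, 20 HR, 50 BB, 200 SO
# -0.8 + 4.0 - 2.6 - 2.0 + 4.0 = 2.6
exampleWAR <- estimateWAR(ip = 200, hr = 20, bb = 50, so = 200)
print(exampleWAR)
```

Because the function is just arithmetic on its arguments, it also works on whole columns at once, e.g. estimateWAR(testing$IP, testing$HR, testing$BB, testing$SO).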

Most of the time, machine learning does not come out as nice as it has in this post and the last one, so don’t expect miracles every time out. But you can occasionally get some really cool results if you know what you’re doing, and at this point, you kind of do! I have a few ideas about what to write about for part 3 (likely the final part), but if there’s something you really would like to know how to do, hit me up in the comments.

dSCORE: Pitcher Evaluation by Stuff

Confession: fantasy baseball is life.

Second confession: the chance that I actually turn out to be a sabermetrician is <1%.

That being said, driven purely by competition and a need to have a leg up on the established vets in a 20-team, hyper-deep fantasy league, I had an idea to see if I could build a set of formulas that attempted to quantify a pitcher’s “true-talent level” by the performance of each pitch in his arsenal. Along with one of my buddies in the league who happens to be (much) better at numbers than yours truly, dSCORE was born.

dSCORE (“Dominance Score”) is designed as a luck-independent analysis (similar to FIP) that shows whether a pitcher might be overperforming or underperforming based on the quality of the pitches he throws. It analyzes each pitch at a pitcher’s disposal using outcome metrics (K-BB%, Hard/Soft%, contact metrics, swinging strikes, weighted pitch values), with each metric weighted by importance to success. For relievers, missing bats, limiting hard contact, and one to two premium pitches are better indicators of success; starting pitchers with a better overall arsenal plus contact and baserunner management tend to have more success. We designed dSCORE to identify possible high-leverage relievers or closers early, and to strip out as much luck as possible so we can view a pitcher from as pure a talent point of view as possible.

We’ve finalized our evaluations of MLB relievers, so I’ll be going over those below. I’ll post our findings on starting pitchers as soon as we finish up that part, but you’ll be able to see the work in progress in this Google Sheets link, which also shows the finalized rankings for relievers.

Top Performing RP by Arsenal, 2016
Rank Name Team dSCORE
1 Aroldis Chapman Yankees 87
2 Andrew Miller Indians 86
3 Edwin Diaz Mariners 82
4 Carl Edwards Jr. Cubs 78
5 Dellin Betances Yankees 63
6 Ken Giles Astros 63
7 Zach Britton Orioles 61
8 Danny Duffy Royals 61
9 Kenley Jansen Dodgers 61
10 Seung Hwan Oh Cardinals 58
11 Luis Avilan Dodgers 57
12 Kelvin Herrera Royals 57
13 Pedro Strop Cubs 57
14 Grant Dayton Dodgers 52
15 Kyle Barraclough Marlins 50
16 Hector Neris Phillies 49
17 Christopher Devenski Astros 48
18 Boone Logan White Sox 46
19 Matt Bush Rangers 46
20 Luke Gregerson Astros 45
21 Roberto Osuna Blue Jays 44
22 Shawn Kelley Mariners 44
22 Alex Colome Rays 44
24 Bruce Rondon Tigers 43
25 Nate Jones White Sox 43

Any reliever list that’s headed up by Chapman and Miller should be on the right track. Danny Duffy shows up, even though he spent most of the summer in the starting rotation. I guess that shows just how good he was even in a starting role!

We had built the alpha version of this algorithm right as guys like Edwin Diaz and Carl Edwards Jr. were starting to get national helium as breakout talents. Even in our alpha version, they made the top 10, which was about as much of a proof-of-concept as could be asked for. Other possible impact guys identified include Grant Dayton (#14), Matt Bush (#19), Josh Smoker (#26), Dario Alvarez (#28), Michael Feliz (#29) and Pedro Baez (#30).

Since I led with the results, here’s how we got them. For relievers, we took these stats:

Set 1: K-BB%

Set 2: Hard%, Soft%

Set 3: Contact%, O-Contact%, Z-Contact%, SwStk%

Set 4: vPitch

Set 5: wPitch

Set 6: Pitch-X and Pitch-Z

(where “Pitch” includes FA, FT, SL, CU, CH, FS for all of the above)

…and threw them in a weighting blender. I’ve already touched on the fact that relievers operate on a different set of ideal success indicators than starters, so for relievers we settled on weights of 25% for Set 1, 10% for Set 2, 25% for Set 3, 10% for Set 4, 20% for Set 5 and 10% for Set 6. Sum up the final weighted values, and you get each pitcher’s dSCORE. Before we weighted each arsenal, though, we compared each metric to the league mean, and gave it a numerical value based on how it stacked up to that mean. The higher the value, the better that pitch performed.
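To make the weighting step concrete, here is a minimal sketch of the final weighted sum in R. The weights are the reliever weights given above. The per-set values are hypothetical numbers invented for illustration, since the exact scale each metric gets after being compared to the league mean isn’t spelled out here, and the variable names are my own.

```r
# Reliever weights for Sets 1-6, as given in the text
weights <- c(set1 = 0.25, set2 = 0.10, set3 = 0.25,
             set4 = 0.10, set5 = 0.20, set6 = 0.10)

# The weights should account for 100% of the score
stopifnot(isTRUE(all.equal(sum(weights), 1)))

# Hypothetical per-set values for one pitcher (made up for illustration;
# the real scale of these values is an assumption)
setValues <- c(set1 = 90, set2 = 60, set3 = 80,
               set4 = 70, set5 = 85, set6 = 50)

# Weighted sum: 22.5 + 6 + 20 + 7 + 17 + 5 = 77.5
dscore <- sum(weights * setValues[names(weights)])
print(dscore)
```

Indexing setValues by names(weights) keeps each weight lined up with its own set even if the two vectors are defined in a different order.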

What the algorithm rolls out is an interesting, somewhat top-heavy curve that would be nice to paste in here if I could get media to upload, but I seem to be rather poor at life, so that didn’t happen. BUT it’s on the Sum tab in the link above. Adjusting the weightings obviously skews the results and therefore introduces a touch of bias, but it also has some interesting side effects when searching for players who are heavily affected by certain outcomes (e.g. someone who misses bats but whose overall package is iffy). One last oddity/weakness we noticed was that pitchers with multiple plus-to-elite pitches got a boost in our rating system. The reason that could be an issue is that guys like Kenley Jansen, who rely on a single dominant pitch, can get buried more than they deserve.