Are We Putting Too Much Stock In Matchups?

No matter your league’s settings for starting pitchers, you’re forced to decide whether to start or bench pitchers each time their turn in the rotation comes up. If you’re in a league with an innings or games started cap, you have to try to maximize the limited opportunities you have to start a pitcher. If you play in a league with no such limitations, the natural friction between the counting categories and the ratio categories forces you to make similar decisions. Sure, you can start Jake Odorizzi and take advantage of the tenth-best strikeout rate among qualified starters. But his 4.23 ERA and 1.31 WHIP count, too.

The number one factor I consider when making such a decision is who the pitcher is facing that day. The strength of the other team’s offense against pitchers of my starter’s handedness usually determines whether I start or bench a pitcher I’m on the fence about. Matchup is probably the biggest factor for most fantasy owners when it’s not a must-start pitcher. I asked Twitter where the cutoff is for must-start guys. I got answers ranging from only Felix and Kershaw all the way to the top 25-30 starters.

Top 30 was sort of the number I had in my head. I recently wrote a fantasy football piece in which I examined how matchups affected fantasy production for the top 25 wide receivers last year. I just tested the correlation between a player’s weekly production and the strength of the opposing defense measured by pass defense DVOA from Football Outsiders. Some interesting results came from that exercise, but one thing that wasn’t a surprise was the lack of correlation between production and matchup for the top receivers. To amass enough points to finish the season among the best, a receiver has to accumulate points each week regardless of the opponent. Likewise, I thought the top starting pitchers would be matchup-proof, with the correlation growing stronger as we moved away from the elite guys.

Turns out that’s not the case. Well, the second part isn’t. The top pitchers didn’t have much of a correlation at all, as expected, but neither did many of the pitchers further down the list. I was so sure my hypothesis would hold that I’m questioning whether the methodology for testing it was correct. What I did was take each pitcher’s individual game scores and compare them to how far above or below league average the opponent was against pitchers of the same handedness. For example, here’s a graph showing the relationship between John Lackey’s game scores and the strength of his opponents.

[Figure: Lackey Correlation (John Lackey’s game scores plotted against the strength of his opponents)]
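For anyone who wants to rerun this kind of per-pitcher test, here is a minimal sketch in Python. It assumes you already have, for each start, the opponent's offense relative to league average against that handedness and the pitcher's game score. The function name and the sample numbers are made up for illustration and are not the data behind the graph.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)

# Hypothetical starts for one pitcher (made-up numbers):
# opponent offense relative to league average (positive = better than average)
# paired with the pitcher's game score in that start.
opponent_strength = [-12, -8, -3, 0, 4, 9, 15]
game_scores = [61, 48, 70, 55, 52, 66, 44]

r = pearson_r(opponent_strength, game_scores)
r_squared = r ** 2  # always between 0 and 1; the sign of r carries the direction
```

With a single predictor, r-squared is just the square of r, so it cannot be negative. If you want to show whether tougher opponents go with lower game scores, report the sign of r alongside it.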

Lackey is an interesting example for two reasons. The first reason is anecdotal. I own Lackey in my big money league, and I’ve had him active for 135 of his 169.1 innings. In those 135 innings he’s had a 4.33 ERA and 1.33 WHIP. For the year he has a 3.77 ERA and 1.25 WHIP. I can assure you decisions to start or sit him were based purely on matchup. That clearly hasn’t worked out for me.

But that’s just one guy, right? Well, no. That brings me to the second reason. I tested the correlation between matchup and production for the top 50 starters on ESPN’s player rater. Again, I was expecting the relationship between the two to get stronger after the top 30 or so starters. But Lackey ranks 47th on the player rater. And as you can see below, Lackey is far from an outlier among pitchers outside the top 30.

PR Rank  Player              r-squared
1        Felix Hernandez     -0.0072
2        Clayton Kershaw     -0.0022
3        Johnny Cueto        -0.0166
4        David Price          0.0732
5        Corey Kluber         0.0828
6        Chris Sale           0.0123
7        Adam Wainwright     -0.0025
8        Garrett Richards     0.0273
9        Jon Lester           0.0046
10       Max Scherzer        -0.0035
11       Masahiro Tanaka      0.1358
12       Julio Teheran        0.0742
13       Hisashi Iwakuma     -0.0201
14       Madison Bumgarner    0.0011
15       Zack Greinke         0.0683
16       Tyson Ross           0.0778
17       Tanner Roark        -0.0042
18       Scott Kazmir         0.0005
19       Rick Porcello       -0.0145
20       Jake Arrieta         0.1818
21       Sonny Gray           0.0001
22       Doug Fister          0.1924
23       Lance Lynn           0.0549
24       Cole Hamels          0.0051
25       Jordan Zimmermann    0.0755
26       Stephen Strasburg    0.0192
27       Phil Hughes          0.0023
28       Yu Darvish           0.0165
29       Danny Duffy          0.0971
30       Hyun-Jin Ryu         0.1653
31       Jeff Samardzija      0.0053
32       Alfredo Simon        0.128
33       Chris Young         -0.0108
34       Kyle Lohse           0.0178
35       Dallas Keuchel       0.1184
36       Henderson Alvarez   -0.0213
37       Bartolo Colon        0.0071
38       James Shields        0.0153
39       Jered Weaver         0.0389
40       Ervin Santana        0.1597
41       Alex Cobb            0.2246
42       Tim Hudson           0.0146
43       Chris Tillman       -0.0001
44       Alex Wood           -0.0953
45       Matt Shoemaker      -0.009
46       Wily Peralta         0.0013
47       John Lackey          0.0005
48       Collin McHugh        0.0125
49       Mike Leake           0.0239
50       Chris Archer         0.0611

According to that, my hypothesis could not have been more wrong. I guess I could have gone past the top 50 to see if the relationship strengthened at some point, but given that both the Twitter responses and my own practices considered top 30 to be about the cutoff for must-start pitchers, I didn’t think I needed to go further to prove that matchup may not be as decisive as I thought it was. That, and I doubt the relationship would have gotten any stronger.
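One quick way to check whether the relationship strengthens outside the top 30 is to average the table's values for each group. The sketch below copies the r-squared column from the table above and compares the mean magnitude for ranks 1-30 against ranks 31-50. It treats the printed minus signs as direction markers and uses only the magnitudes, which is an assumption about how the table was built. The variable and function names are made up for illustration.

```python
# r-squared values copied from the table above, in player-rater order (1-50).
# Signs are kept as printed; only the magnitude is used below (an assumption).
R_SQUARED = [
    -0.0072, -0.0022, -0.0166, 0.0732, 0.0828, 0.0123, -0.0025, 0.0273, 0.0046, -0.0035,
    0.1358, 0.0742, -0.0201, 0.0011, 0.0683, 0.0778, -0.0042, 0.0005, -0.0145, 0.1818,
    0.0001, 0.1924, 0.0549, 0.0051, 0.0755, 0.0192, 0.0023, 0.0165, 0.0971, 0.1653,
    0.0053, 0.128, -0.0108, 0.0178, 0.1184, -0.0213, 0.0071, 0.0153, 0.0389, 0.1597,
    0.2246, 0.0146, -0.0001, -0.0953, -0.009, 0.0013, 0.0005, 0.0125, 0.0239, 0.0611,
]

def mean_abs(values):
    """Average magnitude of a list of values."""
    return sum(abs(v) for v in values) / len(values)

top_30 = mean_abs(R_SQUARED[:30])   # player-rater ranks 1-30
next_20 = mean_abs(R_SQUARED[30:])  # player-rater ranks 31-50
strongest = max(abs(v) for v in R_SQUARED)
```

With the values as printed, both group averages land at roughly 0.048, and the single largest value is 0.2246. That is consistent with the relationship not getting stronger past the top 30.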

Again, because it seems so intuitive that lesser pitchers will perform better against lesser opponents and vice-versa, I’m questioning my methodology. Is game score the right measure of production? I guess I could have used game-by-game point totals from a daily fantasy scoring system, but given that game score is just a points system that rewards pitchers similar to a daily scoring system, I doubt that would have changed the results all that much. Or maybe there is something else I’m completely overlooking.
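For reference, here is a sketch of game score as a points system, using the original Bill James formula. The article does not say which version of game score was used, so treat this as an assumption. The function and parameter names are made up for illustration.

```python
def game_score(outs, strikeouts, hits, earned_runs, unearned_runs, walks):
    """Bill James game score for a single start (original version, assumed here)."""
    innings_completed = outs // 3
    score = 50                                   # every start begins at 50
    score += outs                                # +1 per out recorded
    score += 2 * max(0, innings_completed - 4)   # +2 per inning completed after the fourth
    score += strikeouts                          # +1 per strikeout
    score -= 2 * hits                            # -2 per hit allowed
    score -= 4 * earned_runs                     # -4 per earned run
    score -= 2 * unearned_runs                   # -2 per unearned run
    score -= walks                               # -1 per walk
    return score

# Hypothetical line: 7 IP (21 outs), 8 K, 5 H, 2 ER, 0 unearned, 2 BB
example = game_score(outs=21, strikeouts=8, hits=5, earned_runs=2,
                     unearned_runs=0, walks=2)  # 50 + 21 + 6 + 8 - 10 - 8 - 0 - 2 = 65
```

Like most daily fantasy scoring, it credits innings and strikeouts and penalizes hits, walks and runs, which is why swapping one for the other seems unlikely to change the picture much.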

But assuming I haven’t missed something here, matchup may not be as important as I/we think it is. Maybe it’s a better practice to just ask ourselves if the pitcher is any good. The definition of good is a moving target based on replacement level in your league, but that may be the only question you need to ask. In my big money league there is no innings or start cap. The 12 teams in the league carry about 10-11 starters on average. Fear of getting destroyed in the ratio categories is the only reason to leave a guy on the bench because the races for strikeouts and wins are so fierce. So in a league with about 130 starting pitchers owned, why am I ever sitting Lackey, even if he’s facing the best offense in the league? He’s so far above replacement level that matchup should go right out the window.

You can find more of Brett's work on or follow him on Twitter @TheRealTAL.

33 Responses to “Are We Putting Too Much Stock In Matchups?”

  1. Belle of the League says:

    Strange coincidence…I was doing the same research this morning on my roster and was also looking at Lackey who I also own.
    We are a custom points league. No emphasis on wins or losses. QS’s, K’s, and innings drive the points, with subtractions, obviously, for hits, walks, and runs allowed. Our league’s pitcher ranking is slightly different, but not significantly.
    We are also a deep keeper league where we are scraping the bottom of the free agent barrel when you need an injury replacement.
    I arrived at the identical conclusion.

  2. Bill says:

    Thank you for this very interesting analysis. I’m in a 10-team H2H, and have for the most part been starting/sitting pitchers based on the matchup with the opposing offense. My outcomes have not been any better than I would have gotten by throwing darts at names tacked up on a wall. I feel a little better knowing that I’m not the only one.

    You don’t mention whether you are using YTD statistics for team offense. Assuming you did, it would be interesting to know whether there exists a higher correlation if you were to use only the most recent 4-6 weeks.

    • mymaus says:

      Excellent point! In fact, if it were possible, I’d like to see the same study done but instead of using the team statistics, use the stats of the actual lineup that the pitcher faced that day. Team personnel changes during the season (I’m not sure but I think the Cubs are a much better hitting team now than they were in April and May). Lineups can even change a lot day to day (mid week and Sunday day games especially). If this new study starts to show correlation among more highly ranked pitchers I guess you’d have to wait for the lineup to come out before starting your pitcher.

      • BB says:

        I totally agree it would be interesting to see if it is more useful to base decisions on opponents’ recent performance. If it is too hard to query the stats of the actual lineup, maybe you could approximate it by using the last 30 (or 45 or 60) days against pitchers of that handedness.

        Also, FWIW, I also assumed the Cubs must be hitting better with these new callups, but I happened to check their last 30 days/post-break stats the other day, and although they were slightly better, they didn’t really seem meaningfully different from the year-long stats.

  3. Matt says:

    It would be interesting to see a similar study with hitters. Although most people use the handedness of the hitter/pitcher to decide who to start, I often use the quality of the pitcher as a tie breaker of sorts. It would be interesting to see how well some of the top hitters perform against the top pitchers compared to the mid-tier pitchers.

  4. mymaus says:

    I may be completely wrong (it’s been 50 years since my last math/statistics course) but after a BRIEF amount of thinking the only way your assertion could be correct is if the pitchers ranked 51+ made up for the top 50’s uncorrelated performance by being extremely correlated. After all, the teams who mash opposite handed pitching must be hitting somebody on a consistent basis or their good hitting wouldn’t be correlated with facing an opposite handed pitcher. They have to get those stats against somebody.

    Maybe the top 30 line that many of us approximately use needs to be adjusted to top 60 or 70! It would be interesting to find that out by studying all the pitchers.

    My brain hurts.

    • Brett Talley says:

      I could definitely keep going, but it takes quite a bit of time to do all these one-by-one. I stopped at 50 because I had to get some sleep last night. And things didn’t seem to be trending in that direction. Maybe I’ll dig a bit deeper at some point and see if anything turns up.

  5. Matt says:

    Personally, with as much as we know about DIPS theory, this result isn’t surprising in the least. We know that pitchers only have a certain amount of control over their results, and that earned run values are not reliable at all in the short term. So, why should we be able to accurately predict when a pitcher will have a great start? However, maybe there’s a better way to look at performance other than game score? From a fantasy perspective, I guess game score works well since it encompasses the results fantasy is based on, but maybe you would see the kind of correlation you were expecting if you looked at strikeout or walk rates instead of game score.

    • dude says:

      Doesn’t DIPS say that pitchers only have limited control over balls in play? K/BB/HRs are more predictive/under their control and they are often affected more by pitcher vs batter handedness.

      • Matt says:

        Yes, but aren’t fantasy baseball results entirely dependent on those balls in play? If the only stats your league counts are K% and BB% then fine, but when you use matchups to try to take advantage of those things, the ERA and WHIP randomness counts just as much as the K totals.

  6. Ryan says:

    I’m willing to bet that if you broke this down into fantasy value by category, you’d start to see correlation. Maybe ERA fluctuates more wildly, leading to high gamescore variance, but wouldn’t you think that K’s, WHIP, and W’s would show some relationship to matchup?

  7. zaneman89 says:

    Sorry if I missed this, but are you looking at park factors at all in this? Such as the neutralized wRC+ to judge the offense and then adjusting based on park? Maybe this would provide better correlation.

  8. blargg says:

    Technical, and probably won’t change the results much, but how are you getting negative R^2’s? Are you using the adjusted R^2? As far as I can tell there’s no need to do that, since you’re just using one X (strength of opponent).

  9. Brett Talley says:

    You’re right. I wasn’t getting those. Just manually changed those to show some were an inverse relationship.

  10. Patrick says:

    There is no mention of win probability. That factors into a good match up for fantasy baseball.

    • BB says:

      Do you have a source you use for win probability? Or is this something you calculate yourself? I am curious because I take this into account but in a really ad hoc way. For example, in terms of wins, in a league with weekly lineups, am I better off starting Fiers with one start or Odorizzi with two starts? If Odorizzi has a 1 in 3 chance of a win and Fiers has 1 in 2, then I think the answer is (barely) Odorizzi. But I don’t have any reliable way to calculate those win probabilities, so it sort of becomes a garbage in, garbage out scenario if I am just making a semi-educated guess about the odds of a guy winning a particular game…
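The Fiers versus Odorizzi arithmetic in this comment can be worked through directly. This sketch takes the commenter's guessed odds at face value (1 in 2 for a single start, 1 in 3 for each of two starts) and assumes the two starts are independent. The function names are made up for illustration.

```python
from fractions import Fraction

def expected_wins(p_win, starts):
    """Expected number of wins over a given number of starts."""
    return p_win * starts

def prob_at_least_one_win(p_win, starts):
    """Chance of at least one win, assuming starts are independent."""
    return 1 - (1 - p_win) ** starts

fiers_p = Fraction(1, 2)     # guessed odds for the one-start pitcher
odorizzi_p = Fraction(1, 3)  # guessed odds per start for the two-start pitcher

fiers_expected = expected_wins(fiers_p, 1)                       # 1/2
odorizzi_expected = expected_wins(odorizzi_p, 2)                 # 2/3
odorizzi_any_win = prob_at_least_one_win(odorizzi_p, 2)          # 1 - (2/3)**2 = 5/9
```

Under those guesses the two-start pitcher comes out ahead on expected wins (2/3 against 1/2) and only narrowly on the chance of at least one win (5/9 against 1/2). The output is only as good as the probabilities fed in.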

  11. stonepie says:

    your league sounds like mine, so in an effort to counter the race, i’ve decided to go with a couple of aces next year coupled with elite relievers. i’m in a h2h league with an IP minimum of 25 innings, so 2-3 starters plus an army of relievers should allow me to dominate era/whip/saves and then deploy a legion of platoon hitters.

  12. buddyglass says:

    You might check the strength of opposition vs. the pitcher’s pitching hand. Some teams have pretty extreme splits. The Dodgers, for example, have the 6th best OPS against RHP but the 22nd best OPS against LHP. The Blue Jays are 2nd in OPS against RHP but 24th in OPS against LHP. The Cubs are the reverse; they’re 5th in OPS against LHP but 28th against RHP.

  13. Another Angle says:

    I’m in a roto league that uses all players – there is no bench. And I stream based off of matchup – but it’s very extreme. For example, I stream waiver wire types strictly against bottom 5 offenses. Does the theory hold up if you use a more extreme method to pick your matchups? If you cherry picked Lackey’s 30 innings against the worst of the worst, how did he do? Hopefully my question makes sense. End of a long and horrible workday…

    • BB says:

      This would be interesting to know. In both directions. There might be guys you always start, guys you never start, guys you stream only against bottom-5 opponents, and then guys you sit only against top-5 opponents…

  14. Mike says:

    The plot for Lackey makes me curious if a simple linear regression is the right tool for the question. The dispersal makes it seem that the range of variability might be the interesting thing to focus on. The question would then be: does the range of possible game score increase with the strength of the opponent? It could be that the uncertainty increases with stronger opponents (i.e., could do really well or really poorly), and decreases with weaker opponents (i.e., typically do well against weaker teams).
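One rough way to look at the spread question raised here is to split a pitcher's starts by opponent strength and compare the standard deviation of game scores in each group. This is only a sketch: the cutoff at league average, the function name, and the sample starts are all made up for illustration.

```python
from statistics import mean, stdev

def spread_by_tier(starts, cutoff=0.0):
    """Split (opponent_strength, game_score) pairs at cutoff and summarize each group."""
    weak = [score for strength, score in starts if strength < cutoff]
    strong = [score for strength, score in starts if strength >= cutoff]

    def summarize(scores):
        # A sample standard deviation needs at least two starts.
        if len(scores) < 2:
            return None
        return {"n": len(scores), "mean": mean(scores), "stdev": stdev(scores)}

    return {"weak": summarize(weak), "strong": summarize(strong)}

# Hypothetical starts: (opponent strength vs. league average, game score)
sample_starts = [(-10, 60), (-5, 62), (-2, 58), (3, 30), (8, 80), (12, 55)]
summary = spread_by_tier(sample_starts)
```

In this made-up sample the average game score barely moves between groups while the spread is far wider against the stronger opponents. A straight-line fit would miss that pattern.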

  15. Only Padres Fan Ever says:

    Fresh article, Brett. Thanks for the work!

  16. NBH says:

    Great read. Quintana, Leake, Kennedy, J Chavez and W Peralta all have ERAs in the 3s but their ERAs when I used them (because of good matchups) are in the 4s and 5s. I saw no correlation between opponent and outcome.

  17. matt says:

    Very interesting article. However, when I consider matchups I look at more than just how an offense performs against the handedness of my starter. I also consider the offense’s performance over the past few days (they may be slumping or hot), the park factor, and whether it’s an interleague game. I feel like these are important factors in matchups.

    Also, I was wondering if you used the offense’s end of season wRC or if you used their wRC at the time of the matchup? It would make more sense to use wRC at the time of the matchup because that’s what a fantasy player would look at when making the decision to start or bench.

    • BB says:

      I haven’t seen stats on interleague games. Is it known/statistically clear that pitchers tend to do better in that scenario? Is it just AL pitchers in NL parks (no DH), or across the board? It wouldn’t be that surprising if it was across the board, since the hitters haven’t seen the pitchers as often. But it also wouldn’t be surprising if NL pitchers actually did worse in AL parks due to the DH… would appreciate if you could point to data on this. thanks!

  18. ralph says:

    As mentioned above, the top 5 or so offenses scare me, and the bottom 5 or so offenses have me licking my chops.

    That said, I’ve often wondered how legit making decisions on even extreme matchups is, especially when you’re talking about pitchers who already are pretty good.

    What I come back to is that pitchers with some frequency say they had their stuff working today (or didn’t), and I wonder if that’s actually so important that it overwhelms matchups. This would seem to support that hypothesis. Unfortunately, we have no way to know in advance whether a pitcher will bring his stuff with him that day.

  19. joey baseball says:

    Something that might be interesting to consider is the Vegas line for the game in which a pitcher is starting. This might help to calculate the likelihood of that starting pitcher winning a game against the opposing starting pitcher (versus just measuring how good at hitting the team he is facing is).

    Sure, wins are much harder to predict, but in most fantasy settings, winning games is still important. I also imagine a betting line will have other things baked into it (i.e. how hot an opposing team is as suggested above).

  20. David says:

    Very interesting. It’s entirely likely I am thinking about this wrong, but if you pull back from these individual trees to look at the forest I get confused (maybe it’s vertigo from pulling back too fast).
    I think in the aggregate these numbers must match up.
    Given wRC+ is accurately measuring runs created above league average (extra runs),
    and given game scores accurately measure pitcher performance in terms of runs prevented (reduced runs),
    then in the AGGREGATE, total game scores must correlate negatively to the wRC+ of the opponent.
    Which means: one of the measures is not working properly; the measures do not work well together; your sample is not big enough; or above-average hitting teams really mash relief pitchers. Also possible: none of the above, and I missed an obvious forest fire.

  21. BB says:

    Even if overall performance is not predictable, I wonder if it is different for strikeouts? I am in a situation where I am now streaming mainly for Ks and Wins (my ratios are terrible and I am unlikely to move up or down in those categories). I tend to look for matchups against the high K teams (Braves, Astros, Marlins) and consider sitting even pretty good strikeout pitchers against the extreme low K teams (Royals, Cards). My sense is that this seems to work, at least for this one category. (Although obviously there are limits — I would assume that Scherzer against the Royals still has a higher K expectancy than Joe Kelly against the Astros…) I haven’t really studied it, though.

    Do you think it is possible/likely that Ks are more predictably variable than the other categories?
