Reviewing the Preseason Standings Projections

The FanGraphs staff made its obligatory preseason picks before the season (naturally), and I think it’s safe to say that none of us have psychic powers. My picks of the Angels and Blue Jays to win their divisions are not looking so hot right now. In my defense, I was just blindly going along with what our preseason WAR estimates told me. OK, not the greatest defense, but I figured Steamer + ZiPS + FG-created depth charts could produce better guesses than I could on my own. Especially with the roster changes that have happened lately, I thought it would be a good time to revisit our projections. The Angels came out as the series victors against the Blue Jays in their recent four-game Battle of the Disappointments, but both teams are still far below the expectations put on them. Still, it’s worth asking: could they actually be good teams that have just been unlucky?

Most teams have played somewhere around 110 games this season. That leaves plenty of room for unpredictability. If you flipped a coin 110 times, you’d expect to get about 55 heads, right? Well, the binomial distribution says there’s only about a 49.5% chance of the heads total being within even three of that (somewhere between 52 and 58 times). MLB teams are pretty different from coins — they’re a lot more expensive — but I think you can apply the same principle to them. The above calculation for the coin assumes the “true” rate of heads is 50%. What would we see if we were to presume our projections’ estimated preseason win totals are actually representative of the “true” win rates for each team? The following table will show you:

Team | Expected Win Rate | Actual Win Rate | Chance of Fewer Wins | Chance of Exact Wins | Chance of More Wins | Chance of Being Within 3 of Actual Wins
Angels | 0.556 | 0.464 | 2.12% | 1.18% | 96.70% | 9.80%
Astros | 0.395 | 0.327 | 5.90% | 2.74% | 91.37% | 20.60%
Athletics | 0.506 | 0.577 | 91.77% | 2.53% | 5.70% | 19.09%
Blue Jays | 0.525 | 0.459 | 7.07% | 2.94% | 89.99% | 21.78%
Braves | 0.543 | 0.598 | 85.87% | 3.85% | 10.28% | 27.50%
Brewers | 0.494 | 0.423 | 5.70% | 2.53% | 91.77% | 19.09%
Cardinals | 0.519 | 0.591 | 92.31% | 2.41% | 5.27% | 18.34%
Cubs | 0.481 | 0.441 | 17.39% | 5.32% | 77.29% | 36.41%
Diamondbacks | 0.519 | 0.505 | 34.78% | 7.23% | 57.98% | 47.52%
Dodgers | 0.525 | 0.555 | 70.20% | 6.27% | 23.53% | 41.97%
Giants | 0.519 | 0.445 | 5.16% | 2.36% | 92.48% | 17.95%
Indians | 0.463 | 0.559 | 97.27% | 1.00% | 1.73% | 8.45%
Mariners | 0.457 | 0.468 | 56.13% | 7.34% | 36.52% | 48.15%
Marlins | 0.420 | 0.391 | 23.99% | 6.44% | 69.58% | 42.93%
Mets | 0.457 | 0.450 | 40.31% | 7.58% | 52.11% | 49.44%
Nationals | 0.543 | 0.486 | 9.79% | 3.69% | 86.52% | 26.53%
Orioles | 0.463 | 0.545 | 94.92% | 1.69% | 3.39% | 13.40%
Padres | 0.469 | 0.464 | 42.25% | 7.50% | 50.25% | 49.03%
Phillies | 0.512 | 0.450 | 8.08% | 3.24% | 88.68% | 23.67%
Pirates | 0.494 | 0.604 | 98.69% | 0.52% | 0.79% | 4.81%
Rangers | 0.549 | 0.554 | 49.64% | 7.54% | 42.82% | 49.22%
Rays | 0.525 | 0.595 | 91.65% | 2.57% | 5.78% | 19.36%
Red Sox | 0.519 | 0.602 | 95.37% | 1.57% | 3.06% | 12.57%
Reds | 0.531 | 0.545 | 57.75% | 7.24% | 35.01% | 47.52%
Rockies | 0.494 | 0.460 | 20.92% | 5.81% | 73.27% | 39.30%
Royals | 0.494 | 0.519 | 66.17% | 6.72% | 27.11% | 44.59%
Tigers | 0.580 | 0.587 | 51.75% | 7.66% | 40.59% | 49.91%
Twins | 0.432 | 0.444 | 56.59% | 7.45% | 35.95% | 48.79%
White Sox | 0.494 | 0.367 | 0.29% | 0.23% | 99.49% | 2.32%
Yankees | 0.525 | 0.518 | 40.77% | 7.52% | 51.71% | 49.17%

Now, I’m not saying the preseason projections are great estimates of the “true” win rates of these teams, but I’m not saying they aren’t, either. The correlation between the projected and actual win rates is 0.570, which is pretty decent, considering the inherent unpredictability.  I don’t think you can rule out the validity of any of these estimates, though.
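
The correlation figure can be roughly re-derived from the two win-rate columns of the table. The sketch below is an illustration only: it uses the rounded rates as printed, and the `pearson` helper is written out by hand for this example rather than taken from the spreadsheet, so expect a value close to, but not necessarily identical to, the one quoted above.

```python
from math import sqrt

# (expected win rate, actual win rate), copied from the table in this article.
rates = {
    "Angels": (0.556, 0.464), "Astros": (0.395, 0.327),
    "Athletics": (0.506, 0.577), "Blue Jays": (0.525, 0.459),
    "Braves": (0.543, 0.598), "Brewers": (0.494, 0.423),
    "Cardinals": (0.519, 0.591), "Cubs": (0.481, 0.441),
    "Diamondbacks": (0.519, 0.505), "Dodgers": (0.525, 0.555),
    "Giants": (0.519, 0.445), "Indians": (0.463, 0.559),
    "Mariners": (0.457, 0.468), "Marlins": (0.420, 0.391),
    "Mets": (0.457, 0.450), "Nationals": (0.543, 0.486),
    "Orioles": (0.463, 0.545), "Padres": (0.469, 0.464),
    "Phillies": (0.512, 0.450), "Pirates": (0.494, 0.604),
    "Rangers": (0.549, 0.554), "Rays": (0.525, 0.595),
    "Red Sox": (0.519, 0.602), "Reds": (0.531, 0.545),
    "Rockies": (0.494, 0.460), "Royals": (0.494, 0.519),
    "Tigers": (0.580, 0.587), "Twins": (0.432, 0.444),
    "White Sox": (0.494, 0.367), "Yankees": (0.525, 0.518),
}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

expected = [pair[0] for pair in rates.values()]
actual = [pair[1] for pair in rates.values()]
r = pearson(expected, actual)
print(f"correlation between projected and actual win rates: {r:.3f}")
```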

What we see is that even a colossal disappointment such as the Angels could legitimately be a great team that just hasn’t had things go its way. They’ve only won 51 out of 110 games; a legit 0.556 team would be expected to win somewhere between 48 and 54 games out of 110 somewhere around 9.8% of the time, says the binomial distribution (not getting into complicated things like uneven schedules, of course). It had to happen to some teams, you’d think.

The Angels aren’t even the biggest disappointment in the majors this year — that would be the White Sox, who instead of being mediocre, have been horrendous.

On the flip side, we have the Pirates, who are the biggest surprise. Dave Cameron points out they aren’t regressing; however, as you can see, there’s still that small chance that they’re actually an overachieving, average team. In the AL, the Indians and Orioles have far exceeded expectations.

But back to the losers: pretty much nobody was expecting the White Sox to win the division, so let’s talk about the Angels. What has gone wrong with this assortment of highly-paid ne’er-do-wells? Did I mention I’m actually an Angels fan?

For this anticipatory postmortem, let’s take a look at the WAR by position for the Angels:

Position | Preseason Estimate | Current | Extrapolated | Extrap. - Preseason | RoS | Current + RoS | C+RoS - Preseason
C | 3.7 | 2.0 | 2.9 | -0.8 | 1.1 | 3.1 | -0.6
1B | 4.8 | 1.4 | 2.1 | -2.7 | 0.8 | 2.2 | -2.6
2B | 3.0 | 2.4 | 3.5 | 0.5 | 1.1 | 3.5 | 0.5
SS | 3.1 | 0.8 | 1.2 | -1.9 | 0.9 | 1.7 | -1.4
3B | 2.6 | 0.2 | 0.3 | -2.3 | 0.3 | 0.5 | -2.1
LF | 5.4 | 2.4 | 3.5 | -1.9 | 1.7 | 4.1 | -1.3
CF | 5.0 | 6.5 | 9.5 | 4.5 | 1.7 | 8.2 | 3.2
RF | 4.1 | 0.9 | 1.3 | -2.8 | 0.9 | 1.8 | -2.3
DH | 1.9 | 0.7 | 1.1 | -0.8 | 0.2 | 0.9 | -1.0
SP | 12.7 | 6.2 | 9.1 | -3.6 | 3.1 | 9.3 | -3.4
RP | 2.6 | 1.2 | 1.8 | -0.8 | -0.1 | 1.1 | -1.5
Total WAR | 48.9 | 24.7 | 36.4 | -12.5 | 11.7 | 36.4 | -12.5

The “extrapolated” WARs are the current WARs multiplied by 162/110, to project them out to a full season. The RoS WARs are rest-of-season projections according to ZiPS and Steamer.
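
The two ways of estimating a full-season WAR in the table boil down to simple arithmetic, sketched below. The function names are made up for this example, and `games_played=110` is an assumption: it is the games total quoted earlier for the Angels, and it is the value that reproduces the table's 36.4 team total from the current 24.7.

```python
SEASON_GAMES = 162

def extrapolate_war(current_war, games_played, season_games=SEASON_GAMES):
    """Scale WAR earned so far up to a full season at the same pace."""
    return current_war * season_games / games_played

def current_plus_ros(current_war, ros_war):
    """Alternative estimate: WAR banked so far plus the rest-of-season projection."""
    return current_war + ros_war

# Team totals from the table: 24.7 WAR so far, 11.7 projected rest-of-season.
pace_total = extrapolate_war(24.7, games_played=110)  # assumed games played
blend_total = current_plus_ros(24.7, 11.7)
preseason_total = 48.9

print(f"extrapolated: {pace_total:.1f} ({pace_total - preseason_total:+.1f} vs. preseason)")
print(f"current + RoS: {blend_total:.1f} ({blend_total - preseason_total:+.1f} vs. preseason)")
```

The first method assumes the team keeps playing exactly as it has; the second lets the projection systems regress the remaining games toward what they expected.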

So, what can we say about this? Well, first of all, Mike Trout is a huge bright spot. A WAR of 8.2 to 9.5 out of one position is phenomenal. LF has suffered relative to the preseason projections, however, because Trout has played a lot more CF than we expected, thanks to Peter Bourjos’ extended stays on the DL.

Howie Kendrick has not been a disappointment, being on pace to beat out his expected WAR by 0.3. Congrats on not being a failure, Howard.

Hardly anything else has gone well for the team. C.J. Wilson, Jered Weaver and Jason Vargas have gotten good results, but the latter two have missed significant time due to injuries. Tommy Hanson missed most of the season due to a family tragedy, but hasn’t put up good numbers while active. Joe Blanton has been a home run-allowing machine. Offseason acquisition and intended closer Ryan Madson hasn’t pitched at all. We all know about Albert Pujols’ and Josh Hamilton’s underachieving this season, but, really, most of the team has performed worse than expected. Is that the result of bad luck or bad expectations? Who can be sure?

I’ll leave you with my spreadsheet. It’s currently outdated by a few days, but you can see how things have changed a little bit since then. You can change the win range I used (3) in the box on the right, or download the whole thing and mess around with it, update it, etc.

Steve is a robot created for the purpose of writing about baseball statistics. One day, he may become self-aware, and...attempt to make money or something?

14 Responses to “Reviewing the Preseason Standings Projections”

  1. Roger says:

    “They’ve only won 51 out of 110 games; a legit 0.556 team would be expected to win somewhere between 48 and 54 games out of 110 somewhere around 9.8% of the time…”

    I absolutely don’t know why you decided to present it in this way. Why not go with the standard statistical convention, which is to report the %chance that a .556 team would win less than or equal to 51 games? Then it’s comparable to standard hypothesis testing.

    • Steven Ellingson says:

      Roger, it’s a confidence interval, which is, to me at least, a much better way of presenting the data than a one-tailed hypothesis test.

    • Steven Ellingson says:

      Never mind, I read that wrong. You’re right, that is an odd way of presenting the data. A confidence interval would be centered around the expected value (61), not the actual value (51). I’ve never seen anyone present data in this way before.

    • kdm628496 says:

      i agree, i’d like to see which preseason hypotheses we can reject with 95% confidence.

      • Konoldo says:

        Using today’s numbers and using Chi-square tests of independence (no continuity correction) on 2×2 tables for Expected Wins and Losses by Actual Wins and Losses, none of the current preseason expectations can be rejected with a 95% level of confidence. However the White Sox are awfully close with a p-value of 0.0555 (Expected Record:54-55, Actual Record: 40-69).

        • Awesome, thank you.

        • I just performed my own chi-square tests on the rounded off expected and current wins and losses and here’s what I came up with:
          Angels 0.055
          Astros 0.171
          Athletics 0.129
          Blue Jays 0.183
          Braves 0.255
          Brewers 0.129
          Cardinals 0.127
          Cubs 0.447
          Diamondbacks 0.704
          Dodgers 0.567
          Giants 0.127
          Indians 0.036
          Mariners 0.849
          Marlins 0.562
          Mets 0.848
          Nationals 0.253
          Orioles 0.088
          Padres 0.850
          Phillies 0.184
          Pirates 0.023
          Rangers 1.000
          Rays 0.128
          Red Sox 0.090
          Reds 0.705
          Rockies 0.452
          Royals 0.564
          Tigers 0.846
          Twins 0.846
          White Sox 0.007
          Yankees 0.849

          There’s a discrepancy between mine and Konoldo’s findings on the White Sox… can anybody weigh in on this? I haven’t done any hypothesis testing in way too long…

          By the way, everybody, the values listed above [are supposed to] relate to the chances that the expected records are no different from the actual records. So, the Angels, for example, just barely make the typical 95% confidence cut (1 – 0.055 = 94.5%), and therefore we can’t quite say that the Angels aren’t really a 0.556 team with 95% confidence (only with 94.5% confidence…).

        • Hank says:

          you should read the new book by nate silver, the signal and the noise. confidence levels and rejection thresholds are bunk. don’t talk about almost crossing an arbitrarily chosen p=.05, talk about being more or less likely. it’s highly likely the white sox will not hit their preseason expectations.

        • Konoldo says:

          Steve – After doing a little research, I replicated your results in Excel. The difference between our results is that you used the chi-square goodness-of-fit test, directly testing against the expected win rate. I used 2×2 contingency tables to test whether there was a difference between the two groups (actual vs. expected). The null hypothesis assumes both actual and expected come from the same distribution and calculates an expected 2×2 table. It compares the original 2×2 with the expected 2×2 table. Sorry if this isn’t making much sense.

          Basically, the way you performed the test is actually better than what I did. It directly tests the hypothesis in question.

          Hank – I agree that crossing some arbitrary chosen level of p is silly in many cases (including this one). But p-values themselves give you information on the probability of being correct (or incorrect). I would say, (Using Steve’s correct p-values) there is 99.3% probability they will not hit their preseason expectations.
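
The two chi-square variants compared in this thread can be sketched in a few lines, using the White Sox records quoted above (expected 54-55, actual 40-69). This is only an illustration of the two approaches, not either commenter's actual spreadsheet: the function names are made up, no continuity correction is applied, and the p-value helper assumes one degree of freedom, which holds for a win/loss split.

```python
from math import erfc, sqrt

def chi2_pvalue_df1(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return erfc(sqrt(stat / 2))

def goodness_of_fit(actual, expected):
    """Test actual wins/losses directly against the projected wins/losses."""
    stat = sum((a - e) ** 2 / e for a, e in zip(actual, expected))
    return chi2_pvalue_df1(stat)

def independence_2x2(row_a, row_b):
    """Treat both records as samples and test whether they share one win rate."""
    total = sum(row_a) + sum(row_b)
    col_totals = [row_a[0] + row_b[0], row_a[1] + row_b[1]]
    stat = 0.0
    for row in (row_a, row_b):
        row_total = sum(row)
        for observed, col_total in zip(row, col_totals):
            fitted = row_total * col_total / total
            stat += (observed - fitted) ** 2 / fitted
    return chi2_pvalue_df1(stat)

expected_record = (54, 55)  # projected wins, losses
actual_record = (40, 69)    # observed wins, losses

p_fit = goodness_of_fit(actual_record, expected_record)
p_independence = independence_2x2(expected_record, actual_record)
print(f"goodness of fit:   p = {p_fit:.4f}")
print(f"2x2 independence:  p = {p_independence:.4f}")
```

The independence test treats the projection as if it were a second noisy sample of 109 games, which is why it is less willing to call the difference significant.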

        • Konoldo says:

          Sorry, correction on my last statement to Hank. Steve said it better in his comment. There is a 99.3% chance that the White Sox are not a .494 team. There is an even higher chance that they will not meet their preseason win-loss record, given that we are 110 games into the season and they would need to go 39-13 in their last 52 games to reach a .494 winning percentage. You could probably do some kind of simulation to estimate the probability of that occurring.

        • Phew, thanks Konoldo.

          Hank, yeah, I’m with you and Nate Silver on that one (haven’t picked up his book yet, though). It doesn’t make sense to view a p-value of .049 much differently than a 0.051. The 0.05 threshold is indeed arbitrary.

          Anyway, if you use the 95% confidence level as your guide, then you can say there’s about a 5% chance you’ll incorrectly say that a team isn’t legitimately on the level of their preseason expectations. Drawn out over 30 teams, that would mean there’s about a 78.5% chance you’ll come to at least one incorrect conclusion in the bunch.

          Like Konoldo says, this isn’t about how likely they are to eventually reach their expectations this season, if that’s unclear. The White Sox have practically no chance of going 39-13 the rest of the way. If their true win rate is 0.494, they’d have a 0.015% chance of doing that; if their true rate is their current 0.367, their chance is 0.000002% (according to the binomial distribution). By the way, Konoldo, I made a simulator last week that’s based on the binomial distribution. I’ll hopefully be sharing that with you all in the near future.
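
Both calculations in this comment are short enough to sketch. The code below is an illustration under stated assumptions, not the simulator mentioned above: the function names are made up, the 30 tests are treated as independent (they are not quite, since teams play each other), and the White Sox figure is computed as the chance of at least 39 wins in 52 games, which may or may not be the exact definition used for the percentages quoted here.

```python
from math import comb

def chance_of_any_false_alarm(alpha, tests):
    """Chance that at least one of `tests` independent true nulls is rejected."""
    return 1 - (1 - alpha) ** tests

def chance_of_at_least(wins, games, p):
    """Binomial chance of winning `wins` or more of `games` at win rate p."""
    return sum(comb(games, k) * p**k * (1 - p) ** (games - k)
               for k in range(wins, games + 1))

false_alarm = chance_of_any_false_alarm(0.05, 30)
print(f"at least one wrong rejection among 30 teams: {false_alarm:.1%}")

# White Sox needing 39 or more wins in their last 52 games.
as_projected = chance_of_at_least(39, 52, 0.494)  # preseason win rate
as_played = chance_of_at_least(39, 52, 0.367)     # win rate so far
print(f"if truly a .494 team: {as_projected:.4%}")
print(f"if truly a .367 team: {as_played:.7%}")
```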

    • Yeah, I thought this was an interesting take on it… maybe not. I thought the chance that a win total would be somewhere in the current vicinity would be a relatable concept.

      What you’re asking for is just the result of adding “Chance of Fewer Wins” and “Chance of Exact Wins,” though.

  2. RA says:

    The question I have is how projection systems consider both PAs and Performance. Are they set up so the mean projection for both is the spit out stat line?

    For example, take two simple elements of the Jose Reyes Steamer projection:
    657 ABs
    .342 wOBA

    So, if I’m correct in how a projection system is set up (and please please, correct me if I’m wrong)–this indicates that the probability of Reyes exceeding .342 wOBA is 49ish% and the probability of missing .342 is 49ish% with a small chance he hits it exactly.

    Likewise, it’s also saying that there is a 49ish% chance Reyes hits 657 ABs and a 49ish% chance he misses.

    So to find the offense part of Reyes total production we’d multiply his production times his playing time.

    Doing so, we get 4 outcomes with this method, each of equal likelihood:
    >.342, >657 ABs
    <.342, >657 ABs
    >.342, <657 ABs
    <.342, <657 ABs

    Now, this would be ok if all players were the same. But veterans get PA benefits more than rookies do in projection systems, right? Except, I suppose, in the Bill James projection system? But we see again and again that younger players–even especially gifted younger players–don't improve linearly. Their performance is, simply, more highly variable.

    Plus, are injuries functionally random? Based on history, it seems more likely that Reyes plays 120 or 160 games–think of this probability distribution as almost slightly bimodal. For someone like Prince Fielder, it will likely be a tighter, more normal distribution.

    That's why (at least to me) without knowing what the 25th or 75th percentiles for a performance measure (wOBA) AND PAs…these projections aren't really providing a good picture. Which is generally what happens when we just consider the means and not both the means and variances of a data set, right?

    • kdm628496 says:

      i think you have your means and medians confused. the 50th percentile is the median, not the mean. i can’t speak to how projection systems arrive at their numbers, but i can make you more confused by putting forth the possibility that they actually show the mode, or most likely outcome, after some sort of monte carlo simulation.
