Weighting Past Results: Hitters

We all know by now that we should look at more than one year of player data when we evaluate players. Looking at the past three years is the most common way to do this, and it makes sense why: three years is a reasonable time frame to try and increase your sample size while not reaching back so far that you’re evaluating an essentially different player.

 The advice for looking at previous years of player data, however, usually comes with a caveat. “Weigh them”, they’ll say. And then you’ll hear some semi-arbitrary numbers such as “20%, 30%, 50%”, or something in that range. Well, buckle up, because we’re about to get a little less arbitrary.

 Some limitations: The point of this study isn’t to replace projection systems—we’re not trying to project declines/improvements here. We’re simply trying to understand how past data tends to translate into future data.

 The methodology is pretty simple. We’re going to take three years of player data (I’m going to use wRC+ since it’s league-adjusted etc., and I’m only trying to measure offensive production), and then weight the years so that we can get an expected 4th year wRC+. We’re then going to compare our expected wRC+ against the actual wRC+*. The closer the expected to our actual, the better the weights.
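To make the procedure concrete, here is a minimal sketch of the calculation. It assumes that "average inaccuracy" means the mean absolute gap between expected and actual wRC+, and that the weights run from the oldest season to the most recent one. The function names and the sample numbers are made up for illustration.

```python
from statistics import mean

def expected_wrc(past, weights):
    """Weighted average of three past wRC+ values (oldest season first)."""
    total = sum(weights)  # normalize so weights need not sum to exactly 1
    return sum(w * x for w, x in zip(weights, past)) / total

def average_inaccuracy(spans, weights):
    """Mean absolute gap between expected and actual year-4 wRC+ (assumed metric)."""
    return mean(abs(expected_wrc(past, weights) - actual) for past, actual in spans)

# Hypothetical spans, not real players: ((year1, year2, year3), year4)
spans = [
    ((110, 120, 130), 125),
    ((95, 105, 100), 90),
    ((140, 120, 130), 135),
]

print(average_inaccuracy(spans, (1, 1, 1)))        # equal weights
print(average_inaccuracy(spans, (0.2, 0.3, 0.5)))  # recency-weighted
```

A lower number means the weights did a better job of matching the fourth year.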

 *Note: I am using four-year spans of player data from 2008-2013, limited to players that had at least 400 PA in each of four consecutive years. This should help throw out outliers and give more consistent results. Our initial sample size is 244, which is good enough to give meaningful results.
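The sampling rule can be sketched roughly as follows. This is an illustration only: the row layout (player, year, PA, wRC+) is hypothetical, and it assumes that overlapping spans for the same player (say 2008-2011 and 2009-2012) each count as a separate sample, which the note above does not spell out.

```python
from collections import defaultdict

MIN_PA = 400  # playing-time cutoff from the note above

def four_year_spans(rows, min_pa=MIN_PA):
    """Return (player, first_year, [wRC+ x 4]) for each run of four
    consecutive seasons in which the player reached min_pa."""
    by_player = defaultdict(dict)
    for player, year, pa, wrc in rows:
        if pa >= min_pa:
            by_player[player][year] = wrc
    spans = []
    for player, seasons in by_player.items():
        for start in sorted(seasons):
            years = range(start, start + 4)
            if all(y in seasons for y in years):
                spans.append((player, start, [seasons[y] for y in years]))
    return spans

# Hypothetical rows: (player, year, PA, wRC+)
rows = [
    ("Player A", 2008, 610, 112), ("Player A", 2009, 590, 118),
    ("Player A", 2010, 640, 105), ("Player A", 2011, 600, 121),
    ("Player A", 2012, 580, 109),
    ("Player B", 2008, 450, 98), ("Player B", 2009, 320, 91),
    ("Player B", 2010, 500, 102), ("Player B", 2011, 520, 99),
]

for span in four_year_spans(rows):
    print(span)
```

Player B never makes the sample here because the 320 PA season breaks up every possible four-year run.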

 I’ll start with the “dumb” case. Let’s just weigh all of the years equally, so that each year counts for 33.3% of our expected outcome.

 Expected vs. Actual wRC+, unweighted

Weight1   Weight2   Weight3   Average Inaccuracy
33.3%     33.3%     33.3%     16.55

 Okay, so on average we’re missing the actual wRC+ by roughly 16.5 points. Since each point of wRC+ is one percentage point relative to league average, that’s about 16.5 percentage points of error when extrapolating the past into the future with no weights. Now let’s be a little smarter about it and try some different weights out.

 Expected vs. Actual wRC+, various weights

Weight1   Weight2   Weight3   Average Inaccuracy
20%       30%       50%       16.73
25%       30%       45%       16.64
30%       30%       40%       16.58
15%       40%       45%       16.62
0%        50%       50%       16.94
0%        0%        100%      20.15

Huh! It seems that no matter what we do, “intelligently weighting” each year never actually increases our accuracy. If you’re just trying to extrapolate several past years of wRC+ data to predict a fourth year, your best bet is to take a plain, unweighted average of the past wRC+ data. Now, the differences are small (for example, our weights of [.3, .3, .4] were only .03 different in accuracy from the unweighted total, which is statistically insignificant), but the point remains: weighting data from past years simply does not increase your accuracy. Pretty counter-intuitive.
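Rather than trying weights by hand, you could also sweep a whole grid of them. The sketch below is one way to do that, not the method used for the tables in this post: it assumes mean absolute error as the accuracy measure, a 5% step size, and weights ordered oldest season first. The toy spans are invented so that the most recent year is a perfect predictor.

```python
from itertools import product
from statistics import mean

def average_inaccuracy(spans, weights):
    """Mean absolute gap between the weighted average of three past
    wRC+ values and the actual year-4 wRC+ (assumed metric)."""
    total = sum(weights)
    return mean(
        abs(sum(w * x for w, x in zip(weights, past)) / total - actual)
        for past, actual in spans
    )

def weight_grid(step=5):
    """Yield every (w1, w2, w3) in whole percents that sums to 100."""
    for w1, w2 in product(range(0, 101, step), repeat=2):
        if w1 + w2 <= 100:
            yield (w1, w2, 100 - w1 - w2)

def best_weights(spans, step=5):
    """Return the grid point with the lowest average inaccuracy."""
    return min(weight_grid(step), key=lambda w: average_inaccuracy(spans, w))

# Hypothetical spans where year 4 simply repeats year 3
toy_spans = [((100, 110, 120), 120), ((90, 95, 130), 130)]
print(best_weights(toy_spans))
```

Keep in mind that picking the best point on a grid from the same sample you score it on will flatter the winner a little, so tiny differences between neighboring weights shouldn't be read as meaningful.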

Let’s dive a little deeper now: is there any situation in which weighting a player’s past does help? We’ll test this by limiting our ages. For example: are players that are younger than 30 better served by weighting their most recent years heavily? This would make sense, since younger players are most likely to experience a true-talent change. (Sample size: 106)
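Limiting by age is just a filter applied before scoring the weights. Here is a small sketch under a few assumptions: age is taken in the season being predicted (the post doesn't say which season's age is used), accuracy is mean absolute error, and the spans are invented. The second cutoff, older than 32, is included only to show that the same filter works for any age band.

```python
from statistics import mean

def average_inaccuracy(spans, weights):
    """Mean absolute gap between weighted past wRC+ and actual year-4 wRC+."""
    total = sum(weights)
    errors = []
    for span in spans:
        expected = sum(w * x for w, x in zip(weights, span["past"])) / total
        errors.append(abs(expected - span["actual"]))
    return mean(errors)

# Hypothetical spans; "age" is assumed to be the age in the predicted season
spans = [
    {"age": 26, "past": (100, 110, 120), "actual": 118},
    {"age": 28, "past": (120, 115, 125), "actual": 120},
    {"age": 33, "past": (130, 120, 110), "actual": 105},
    {"age": 35, "past": (115, 105, 95), "actual": 100},
]

young = [s for s in spans if s["age"] < 30]
old = [s for s in spans if s["age"] > 32]

for label, group in (("under 30", young), ("over 32", old)):
    for weights in ((1, 1, 1), (0, 1, 1)):
        print(label, weights, round(average_inaccuracy(group, weights), 2))
```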

 Expected vs. Actual wRC+, players younger than 30

Weight1   Weight2   Weight3   Average Inaccuracy
33.3%     33.3%     33.3%     16.17
20%       30%       50%       16.37
25%       30%       45%       16.29
30%       30%       40%       16.26
15%       40%       45%       16.20
0%        50%       50%       16.50
0%        0%        100%      20.16

Ok, so that didn’t work either. Even with young players, using unweighted totals is the best way to go. What about old players? Surely with aging players the recent years would most represent a player’s decline. Let’s find out (Sample size: 63).

 Expected vs. Actual wRC+, players older than 32

Weight1   Weight2   Weight3   Average Inaccuracy
33.3%     33.3%     33.3%     16.52
16%       30%       50%       16.18
25%       30%       45%       16.27
30%       30%       40%       16.37
15%       40%       45%       16.00
0%        50%       50%       15.77
0%        55%       45%       15.84
0%        45%       55%       15.77
0%        0%        100%      18.46

Hey, we found something! With aging players you should weight a player’s last two seasons equally, and you should not even worry about three seasons ago! Again, notice that the difference is small (you’ll be about 0.75 points of wRC+ closer by doing this than by using unweighted totals: 15.77 vs. 16.52). And as with any stat, you should always think about why you’re coming to the conclusion that you’re coming to. You might want to weight some players more aggressively than others, especially if they’re older.

In the end, it just really doesn’t matter that much. You should, however, generally use equal weights, since differences in wRC+ are pretty much always the result of random fluctuation and very rarely the result of actual talent change. That’s what the data shows. So next time you hear someone say “weigh their past three years 3/4/5” (or similar), you can snicker a little. Because you know better.





Brandon Reppert is a computer "scientist" who finds talking about himself in the third-person peculiar.


12 Responses to “Weighting Past Results: Hitters”

  1. J. Cross says:

    I replicated this with seasons post 1960 to get a larger dataset (4020 predicted seasons)

    unweighted by PA:

    system, RMSE
    even weights, 19.75
    5:3:2, 19.72
    1:1:0, 20.38
    5:4:3, 19.65

    weighted by PA:

    system, RMSE
    even weights, 19.73
    5:3:2, 19.69
    1:1:0, 20.43
    5:4:3, 19.63

    so, in this dataset 5:3:2 has a slight edge over even weights and 5:4:3 does a bit better than that.


    • Brandon says:

      Awesome! The larger sample size is definitely a good thing.

      The point remains, though, that weighting years doesn’t dramatically improve your predictions. An increase of (roughly) .1% accuracy is pretty darn small, even though it’s significant with that sample size. The benefits of weighting seem to be mostly cancelled out by the fact that you’re increasing the noise of single-season data.


      • J. Cross says:

        Agreed. Using absolute error instead of RMSE, the even weighing edges out 5/3/2 although 5/4/3 is still a touch ahead.


  2. Garbanzo says:

    Only using players with 400+ PA in all 4 years basically guarantees your pool is all at-least-decent players playing at-least-decently, and among that group of players, most of the fluctuation is evidently just noise. You’re basically cutting off all the players who imploded in year 4 and were kept from 400 PA because they sucked (or got seriously hurt, which is less interesting), and those are the guys it would be most interesting to test on.


    • Brandon says:

      Definitely, and this is something I knew when I was preparing the numbers. The numbers presented above should be considered more strongly when evaluating players that are similar to the players I sampled (players with roughly 400 PA several years in a row). I should have stated that more explicitly in my sampling note.

      It would be interesting to do a follow-up study on ONLY players that had large fluctuations in production/playing time, to see the extent of predictive value to recent fluctuations for those types of players. I’ve got some other pieces I want to get to first, though.



  4. samyoung says:

    Good stuff! I’m curious also about how much to weight the current season versus the start-of-season prior. (Obviously for “mid-season” use.)


    • Brandon Firstname says:

      This would be interesting indeed. I imagine the answer is to weight the current season’s plate appearances very, very slightly more than past seasons.

      A better way to diagnose current seasons though is probably to just look at peripheral stats that stabilize much faster than wRC+ to determine true talent change. There’s just so much noise in wRC+ due to it being affected by the results of batted balls that it takes a pretty big sample size to start becoming reliable.



  6. samyoung says:

    Dude, this is really cool! Good work, Brandon.


  7. Jonathan Judge says:

    Nice job.

