## Estimating True Talent in Past Years

I often want an estimate of a player’s true talent in a past year. Projection systems focus on predicting future performance from past results, but what I wanted was the best estimate of a player’s expected performance in a given year, based on his results in that year and the surrounding years.

I wanted to find suitable weights to assign to performance in the given year and in the years immediately before and after, along with the right amount of regression to the mean. But I kept running into the same mental block: how do I assign a weight to the given year’s performance, when that is exactly what I am trying to “predict”?

A suggestion from J.Cross, and help from Tangotiger, on the blog for The Book, was to try splitting the data from the year in question in half, and using one half to “predict” the other half. I could then build a multiple regression model, using the data from the surrounding years, and the league average, that would give proper weights to each year, as well as the necessary amount of regression to the league average.

I split the data in half by taking each player’s performance in games on odd- and even-numbered dates, and then used the even-numbered dates to predict performance on the odd-numbered dates. (The original suggestion was to use odd- and even-numbered innings, but for many players this doesn’t split the data in half. For instance, players who hit in the top few slots of the lineup could end up with over 60% of their plate appearances in odd-numbered innings.)
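The split described above can be sketched in a few lines. This is only an illustration: it assumes “even-numbered date” means an even day of the month, and the record layout and the sample plate appearances are hypothetical, not Retrosheet’s actual format.

```python
from datetime import date

def split_by_date_parity(plate_appearances):
    """Split (game_date, outcome) records into even- and odd-dated halves.

    Assumption: parity is taken from the day of the month.
    """
    even, odd = [], []
    for game_date, outcome in plate_appearances:
        (even if game_date.day % 2 == 0 else odd).append(outcome)
    return even, odd

# Hypothetical records, for illustration only.
pas = [
    (date(1951, 4, 17), "single"),
    (date(1951, 4, 18), "walk"),
    (date(1951, 4, 18), "out"),
    (date(1951, 4, 19), "out"),
]
even_half, odd_half = split_by_date_parity(pas)
print(len(even_half), len(odd_half))
```

Splitting by date keeps every plate appearance from a game in the same half, which is why lineup position does not skew the split the way odd and even innings can.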

### Details of the Model

First I will describe details for a model for wOBA; models for On-Base Percentage and Slugging Percentage were similar, and helped to confirm the general framework. The initial model uses the year immediately prior to, and the year immediately after, the year of interest; that is, a three-year window. For the previous and succeeding years, I also used only plate appearances on even-numbered dates, so that the wOBA in each year was estimated using similar numbers of plate appearances. Using Retrosheet’s data, my population was all three-year windows of players’ plate appearances from 1950 to 2010 (so the earliest three-year window was 1950-52, where the year of interest would be 1951). A player needed to have plate appearances in all three years of the window to make it into the model; this gave me 25,038 observations for the model.

For example, the first row in the regression model is Cal Abrams in 1951. His wOBA in odd-numbered dates in 1951 was .382 in 92 plate appearances. His wOBA in even-numbered dates was .315 in 1950, .388 in 1951, and .269 in 1952 (in 27, 94, and 53 plate appearances respectively). League wOBA in 1951 was .329. Each row was weighted in the model by the total number of plate appearances for that row; i.e., 266 plate appearances for Cal Abrams (1951).

This model produced the following estimation equation:

0.151 x (wOBA in Year-1) + 0.336 x (wOBA in Year) + 0.200 x (wOBA in Year+1) + 0.321 x (League wOBA)

All terms were significant, with standard errors of approximately 0.006 for all four terms. The average number of PA used in finding the wOBA for each term was 140 for Year-1, 147 for Year, and 140 for Year+1.
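For readers who want to try a fit of this kind, here is a minimal sketch of a PA-weighted least-squares regression. It assumes the model has no intercept (the equation above shows none), solves the normal equations directly, and runs on synthetic, noise-free data generated from the published coefficients rather than on the Retrosheet data. The function names are my own; in practice a statistics package would do this job.

```python
import random

def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_weighted_model(rows, targets, pa_weights):
    """Weighted least squares without an intercept: solve (X'WX) b = X'Wy."""
    k = len(rows[0])
    xtwx = [[sum(w * r[i] * r[j] for r, w in zip(rows, pa_weights))
             for j in range(k)] for i in range(k)]
    xtwy = [sum(w * r[i] * y for r, y, w in zip(rows, targets, pa_weights))
            for i in range(k)]
    return solve_linear_system(xtwx, xtwy)

# Synthetic data, NOT real player seasons. Columns stand in for
# (Year-1 wOBA, Year wOBA, Year+1 wOBA, league wOBA) on even dates;
# the target stands in for wOBA on odd dates in the year of interest.
rng = random.Random(0)
true_coefs = [0.151, 0.336, 0.200, 0.321]
rows = [[rng.uniform(0.25, 0.40) for _ in range(4)] for _ in range(200)]
targets = [sum(c * x for c, x in zip(true_coefs, row)) for row in rows]
pa_weights = [rng.randint(50, 600) for _ in rows]  # total PA per row

fitted = fit_weighted_model(rows, targets, pa_weights)
print([round(c, 3) for c in fitted])
```

Because the synthetic targets contain no noise, the fit should simply recover the coefficients it was generated from; real data would also give standard errors, which this sketch does not compute.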

It is customary to convert these regression coefficients to weights that can be applied to each year’s performance, by dividing each coefficient by the 0.336 for (wOBA in Year). Doing this gives factors of 0.45 for Year-1, 1 for Year, and 0.60 for Year+1. The r for the model was 0.60, so using the method suggested by Tangotiger, this implies that we need to add 200 PA of league-average wOBA for the regression to the mean component. 3-year models for OBP and SLG produced similar factors.
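The arithmetic in this step can be checked with a short sketch. The conversion to weights follows the text directly. The regression amount is my reading of the method, not something the post spells out: I assume it is the weighted average PA multiplied by (1 − r) / r.

```python
# Coefficients and average PA as reported in the text.
coefs = {"year_minus_1": 0.151, "year": 0.336, "year_plus_1": 0.200}
avg_pa = {"year_minus_1": 140, "year": 147, "year_plus_1": 140}
r = 0.60

# Divide every coefficient by the one for the year of interest.
weights = {k: v / coefs["year"] for k, v in coefs.items()}

# Assumed formula: PA of league average to add = effective PA * (1 - r) / r.
effective_pa = sum(weights[k] * avg_pa[k] for k in coefs)
regression_pa = effective_pa * (1 - r) / r

print({k: round(w, 2) for k, w in weights.items()})
print(round(effective_pa), round(regression_pa))
```

Under that assumption the effective sample is a little under 300 PA and the regression amount comes out a little under 200 PA, which is consistent with the round figure of 200 used here.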

I repeated this model with 5-year windows, so each observation consisted of performance from Year-2, Year-1, Year, Year+1, and Year+2, which left about 16,600 observations. I did the same for the models for OBP and SLG.

### Conclusions and Example

Taking all of these models as a whole, they suggest round weighting factors of .5^n for Year-n, and .6^n for Year+n, with 200 PA of league-average performance added for regression to the mean. Since it may offend some of our sensibilities to have different factors for Year-n and Year+n, and to make things easier, we can use 0.55^n for both Year-n and Year+n. (I could not come up with an explanation for the fact that Year+n has a greater weight than Year-n, but the phenomenon persisted in every regression and subset I tried. Since I like fractions I will probably use 5/9 in the future.)
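To see how little the simplification changes, the three weighting schemes mentioned above can be tabulated side by side. The function name and its parameters are hypothetical, chosen only for this illustration.

```python
from fractions import Fraction

def year_weight(offset, past_base, future_base):
    """Weight for a season `offset` years from the year of interest (0 = that year)."""
    base = past_base if offset < 0 else future_base
    return base ** abs(offset)

for offset in (-2, -1, 0, 1, 2):
    asym = year_weight(offset, 0.5, 0.6)
    sym = year_weight(offset, 0.55, 0.55)
    frac = year_weight(offset, Fraction(5, 9), Fraction(5, 9))
    print(f"Year{offset:+d}: asymmetric={asym:.3f} "
          f"symmetric={sym:.3f} five-ninths={float(frac):.3f}")
```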

Returning to the Cal Abrams (1951) example, we would estimate his true talent On-Base Percentage in 1951 as

.5 x (18 times on-base in 1950) + (78 times on-base in 1951) + .6 x (67 times on-base in 1952) + (200 x .336 league-average OBP)

divided by

.5 x (53 PA in 1950) + (186 PA in 1951) + .6 x (189 PA in 1952) + (200 league-average PA)

which gives a final estimate of .370.
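The same calculation can be written as a small function, which makes it easy to repeat for other players. This is a sketch using the asymmetric .5 and .6 factors and the 200 PA of regression from the example; the function name and the input layout are my own.

```python
def estimate_true_talent(seasons, league_rate, regression_pa=200.0,
                         past_base=0.5, future_base=0.6):
    """Weighted rate estimate regressed toward the league average.

    seasons: list of (year_offset, successes, plate_appearances),
             where offset 0 is the year of interest.
    """
    numerator = regression_pa * league_rate
    denominator = regression_pa
    for offset, successes, pa in seasons:
        base = past_base if offset < 0 else future_base
        weight = base ** abs(offset)
        numerator += weight * successes
        denominator += weight * pa
    return numerator / denominator

# Cal Abrams, 1950-52: (offset, times on base, PA), figures from the text.
abrams = [(-1, 18, 53), (0, 78, 186), (1, 67, 189)]
obp = estimate_true_talent(abrams, league_rate=0.336)
print(f"{obp:.3f}")
```

The numerator works out to 194.4 and the denominator to 525.9, so the result rounds to the .370 given above.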

A more accurate true talent estimate could be found by applying an age adjustment to the performance from the preceding and succeeding years. For most players this will not change much, since for young players, the adjustment for the preceding years will be positive and that for succeeding years will be negative, and vice versa for older players. But for certain players, such as those at age 27 (where the age adjustments on both sides would need to be positive) and those at the beginning or end of their careers (where data is not available before or after the year in question), the age adjustments could be more important.


Great stuff, thanks for doing all that work.

Yes, very nice work!

I would like to see this for something like WAR per plate appearance.

Regarding the weights for later years being greater than weights for prior years:

I could come up with a few reasons. Survivor bias is a likely culprit to investigate. Aging, of course, is another. Injuries are another possible issue. I’m still trying to wrap my head around how injuries would affect this, but I think they might.

Excellent. I’ve always wanted to have something like a Diamond Mind replay of the true talent level — say, if I estimated the true talent of everyone in 1986 with 1985, 1986 and 1987 data and some MLEs, how do the standings look? How many games would the Mets have won on average if I simulated 100 seasons of this?

DSMok1, I thought about those, but I didn’t check anything like that yet. I couldn’t understand how aging or survivor bias could affect the results in this way. It’s not that the year+1 seasons are *better* than the year-1 seasons, it’s that they are more predictive of performance in year n.

Charles, that would be great, and something similar is what started me thinking about the whole deal. Not Diamond Mind in particular, but what if simulations used some estimate of true talent instead of actual statistics. One complication, which would affect how interesting of a game it would be, is that regressing to the league mean is not exactly right. Probably it would be better to regress to a positional average. Otherwise you would get stuff like: catchers with no playing time would be better than ones that actually played, because the ones with no playing time would be regressed to the league average, which is higher than a normal catcher. Unless you could use MLEs as you say, although that’s not easily available for most years.

It’s strange, it doesn’t seem like this section has “threaded” replies like the normal Fangraphs articles.

You’d want to regress to a playing time-based mean. One thing I’ve been piddling with is how to do that. Players who get 600 PAs collectively hit vastly better than players who get 5 PAs, and the effect is too consistent to be selective sampling, since players who get 25 PAs outhit players with 5 PAs. The real work is figuring out how much predictive value there is in that, and what the true talent level is of those players getting X PAs. Non-pitchers with fewer than 10 PAs (I used 1980s AL data) hit like pitchers, but how well do they hit the next year?