Projection vs Projection

March 30, 2009

It’s almost opening day, and it seems like everyone is talking about projections.

When considering a projection, there are really two questions to be answered – what is the player’s “True Talent Level” right now, and how will he perform next year? Between now and the end of next year, his talent level may well change, as he’s a year older and might recover from or succumb to injuries. Even then, there’s still the random variance of a single-season performance. In this article I’d like to explore how some of the major projection systems perform when projecting different subgroups of players.

I tested the following projections: PECOTA (2006-2009), ZiPS (2006-2009), CHONE (2007-2009), and my own Oliver (2006-2009).

By wOBA

The first test was to group each year’s projections to the nearest .010 of wOBA, and then see how each group of players actually performed. There were 468 players who had projections from all four systems and had at least 350 plate appearances in the major leagues in the following season. As 2009 is yet to be played, and CHONE is not available for 2006, these comparisons of a projection against the following season cover the 2007 and 2008 seasons. All four projections were tested on the same 468 players. The observed results were unadjusted major league stats, so that the results of the test would not be influenced by which park factors or MLE formulas I chose to normalize stats.

To read the results, take the .380 row as an example. Of the players CHONE projected for a wOBA between .375 and .385 (averaging .380), 25 had 350 or more PAs in MLB in the following season, and those 25 players had an average wOBA of .363, so at that level CHONE was .017 high. Oliver was .008 high on 21 projections, PECOTA .027 high on 26, and ZiPS .014 high on 26. The last line of the table shows the root mean square error (weighted by number of players). Oliver had the lowest rms error at .006, followed by CHONE at .011 and PECOTA and ZiPS at .012 each.

wOBA	CHONE		Oliver		PECOTA		ZiPS
Obs	Players	Error	Players	Error	Players	Error	Players	Error
0.250	0	0.000	0	0.000	0	0.000	1	-0.067
0.260	0	0.000	0	0.000	1	-0.041	1	-0.018
0.270	2	-0.057	1	0.001	3	-0.013	1	-0.043
0.280	2	-0.018	4	-0.036	2	-0.045	4	-0.022
0.290	8	-0.033	9	-0.017	11	-0.030	13	-0.020
0.300	14	-0.005	23	-0.010	20	-0.013	20	-0.012
0.310	29	-0.006	33	-0.002	31	-0.007	19	0.003
0.320	44	-0.005	53	-0.005	37	0.002	51	0.000
0.330	74	0.004	81	-0.002	58	0.003	56	0.000
0.340	91	0.000	87	-0.003	66	0.004	66	0.002
0.350	57	0.004	68	0.001	80	-0.004	74	0.001
0.360	50	0.009	48	-0.003	56	0.011	55	0.012
0.370	34	0.011	21	-0.004	33	0.012	36	0.012
0.380	25	0.017	21	0.008	26	0.027	26	0.014
0.390	9	0.003	10	-0.002	17	0.014	19	0.020
0.400	13	0.019	5	0.020	15	0.019	7	0.017
0.410	7	0.017	2	0.011	5	0.017	4	0.019
0.420	4	0.037	1	-0.049	5	0.027	6	0.029
0.430	2	0.047	1	0.001	1	-0.035	5	0.041
0.440	2	-0.009	0	0.000	1	0.018	3	0.023
0.450	1	0.025	0	0.000	0	0.000	1	0.026
rms	468	0.011	468	0.006	468	0.012	468	0.012
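
As a rough illustration of this kind of test, here is a minimal Python sketch. The function names and the two sample players are hypothetical, and taking the error as the bucket center minus the mean observed wOBA is an assumption based on the .380 example above, not a description of the exact code used for the article. The Oliver rows are copied from the table (empty buckets omitted).

```python
import math
from collections import defaultdict


def bucket_errors(pairs, width=0.010):
    """Group (projected, observed) wOBA pairs by the projection rounded to
    the nearest `width`. Returns {bucket: (n_players, error)}, where error
    is the bucket center minus the mean observed wOBA (an assumption), so a
    positive value means the projection was high."""
    groups = defaultdict(list)
    for projected, observed in pairs:
        bucket = round(round(projected / width) * width, 3)
        groups[bucket].append(observed)
    return {
        bucket: (len(obs), round(bucket - sum(obs) / len(obs), 3))
        for bucket, obs in sorted(groups.items())
    }


def weighted_rms(rows):
    """Root mean square of per-bucket errors, weighted by number of players."""
    total = sum(n for n, _ in rows)
    return math.sqrt(sum(n * err ** 2 for n, err in rows) / total)


# (players, error) for each Oliver bucket in the table above, .270 to .430.
OLIVER_ROWS = [
    (1, 0.001), (4, -0.036), (9, -0.017), (23, -0.010), (33, -0.002),
    (53, -0.005), (81, -0.002), (87, -0.003), (68, 0.001), (48, -0.003),
    (21, -0.004), (21, 0.008), (10, -0.002), (5, 0.020), (2, 0.011),
    (1, -0.049), (1, 0.001),
]

# Two made-up players projected near .380, purely to show the bucketing.
sample = [(0.376, 0.360), (0.384, 0.366)]
print(bucket_errors(sample))
print(round(weighted_rms(OLIVER_ROWS), 3))
```

Run on the Oliver column, the player-weighted rms rounds to the .006 shown in the last line of the table, which suggests the weighting is read correctly here.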

By Age

The same 468 players and the same rules, but now the players are grouped by age. The combined rms error is about the same for all, at .007 for Oliver and .008 for the other three. CHONE and ZiPS are a few points of wOBA high for most ages. Oliver underprojects younger (pre-peak) players by .005-.010 of wOBA, and overprojects older players by about the same amount. PECOTA is the opposite, being a little high for the younger players and a little low for the older ones. Oliver shows the smallest overall bias at -.002, but because its error correlates with age, Oliver also shows the highest r2 (squared correlation between age and error) at .206, computed for ages 21-35, which have 12 or more players each.

Age	Players	PA	CHONE	Oliver	PECOTA	ZiPS
19	1	411	-0.004	-0.007	0.016	-0.022
20	3	1485	0.017	0.022	0.026	0.014
21	12	6587	0.003	-0.002	0.005	0.006
22	23	12205	0.002	-0.006	0.011	0.004
23	38	21423	0.001	-0.009	0.001	0.001
24	36	20677	-0.002	-0.009	0.002	0.002
25	37	20538	0.000	-0.006	0.001	0.002
26	39	21891	0.011	0.005	0.014	0.016
27	44	23580	0.003	-0.004	0.007	0.007
28	35	19038	-0.010	-0.011	-0.008	-0.005
29	34	17434	0.010	0.001	0.006	0.008
30	32	18491	0.007	-0.006	0.003	0.004
31	37	19013	0.020	0.008	0.011	0.015
32	24	13975	0.002	-0.004	-0.004	0.000
33	18	9702	0.003	0.004	-0.001	0.003
34	17	8545	0.005	0.004	0.012	0.014
35	13	7063	-0.001	-0.003	-0.003	-0.002
36	7	3714	0.000	0.001	-0.004	0.001
37	5	2295	0.007	0.010	-0.011	0.005
38	5	2580	-0.010	0.009	0.008	0.000
39	6	2699	0.009	0.008	0.023	0.007
40	1	548	0.026	0.031	0.037	0.036
41	1	434	0.016	-0.061	-0.005	0.011
rms	468	254328	0.008	0.007	0.008	0.008
bias	468	254328	0.004	-0.002	0.004	0.005
r2	468	254328	0.031	0.206	0.037	0.033
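
The three summary rows can be recomputed from the rounded per-age values in the table. The sketch below does this for the Oliver column. It assumes player-weighted bias and rms, and an unweighted Pearson correlation between age and error over ages 21-35; those choices are inferred from the table, not stated in the article, and the function names are hypothetical.

```python
import math

# (age, players, Oliver error) copied from the table above.
OLIVER_BY_AGE = [
    (19, 1, -0.007), (20, 3, 0.022), (21, 12, -0.002), (22, 23, -0.006),
    (23, 38, -0.009), (24, 36, -0.009), (25, 37, -0.006), (26, 39, 0.005),
    (27, 44, -0.004), (28, 35, -0.011), (29, 34, 0.001), (30, 32, -0.006),
    (31, 37, 0.008), (32, 24, -0.004), (33, 18, 0.004), (34, 17, 0.004),
    (35, 13, -0.003), (36, 7, 0.001), (37, 5, 0.010), (38, 5, 0.009),
    (39, 6, 0.008), (40, 1, 0.031), (41, 1, -0.061),
]


def weighted_bias(rows):
    """Player-weighted mean error; the sign shows the direction of the miss."""
    total = sum(n for _, n, _ in rows)
    return sum(n * err for _, n, err in rows) / total


def weighted_rms(rows):
    """Player-weighted root mean square error."""
    total = sum(n for _, n, _ in rows)
    return math.sqrt(sum(n * err ** 2 for _, n, err in rows) / total)


def r_squared(xs, ys):
    """Square of the (unweighted) Pearson correlation between xs and ys."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


# Restrict the correlation to ages 21-35, as the article does.
core = [(age, err) for age, _, err in OLIVER_BY_AGE if 21 <= age <= 35]
ages = [age for age, _ in core]
errors = [err for _, err in core]

print(round(weighted_bias(OLIVER_BY_AGE), 3))
print(round(weighted_rms(OLIVER_BY_AGE), 3))
print(round(r_squared(ages, errors), 3))
```

With these assumptions the Oliver column comes out at a bias of -.002, an rms of .007, and an r2 of .206, matching the bottom three rows. A system can have a small overall bias and still have errors that trend with age, which is what the r2 row is picking up.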

In the final part of this series, I’ll look at how minor league performances are evaluated.

19 Comments

Paul Scott

15 years ago

Just a few suggestions:

I think two other “projection” systems should always be added to these “projection surveys.” The first is just a mean of each player’s last three years. This would serve as a baseline test. Second, Marcels should be included, given its simplicity. My feeling on this one is that if you can’t do better than Marcels by a reasonable degree then you really need to evaluate whether it is worth your time to do the projection.

Finally, though I understand this is much harder, I like the look you are taking on your age correlation. Since all of these systems are very likely to be close in overall accuracy, the most interesting and meaningful factor is bias. You should consider breaking out more categories – a recent (if dense) study (linked to from and discussed on The Book Blog) indicated PECOTA likely has a bias overvaluing speed. I’d like to see a lot more done in evaluating biases of projection systems as this sort of thing could lead to an understanding of some real effect in baseball.
