The Fans’ Playing Time Projections
Tango has already covered the optimism in the FAN projections a couple of times (and why being so optimistic might not be that bad), but I wanted to look at it from another angle. I noticed that the fans project much more playing time than other projection systems. This is particularly evident at the top. CHONE projects Jimmy Rollins to have the most PAs at 682; the fans have 12 players with more PAs than that (including three with over 700). And while CHONE has only three players with over 650 PAs, the fans have 53. Note: I understand that fans project number of games played and batting order, and the system extrapolates the number of PAs from those. I am using PAs because that is what is displayed on the projection page and makes for the easiest comparison.
How reasonable is it to project any given player will get 700 PAs? Obviously, every year some players get 700 PAs, but can we identify them beforehand? There will be more than three players who get over 650 PAs in 2010, but before the season starts can we really pick 50 players more likely than not to get 650 PAs?
I am going to assume that most people use past performance (number of games or PAs) to project how many a player will get in 2010. To see how well this works I put together three groups of players. For group one I found players who had three consecutive years averaging between 675 and 725 PAs, then looked at how many PAs they got in the next year. For group two I did the same thing for players who had three years averaging between 625 and 675 PAs, and for group three, 575 to 625 PAs. Here are the number of PAs each of these three groups got in the next year. Along the x-axis is the number of plate appearances, and along the y-axis the fraction of the group that got that many or more PAs. Each curve decreases monotonically, since a player who got over 101 PAs must also have got over 100.

The first thing to note is that the groups of players with more PAs in the previous three years had more in the given year. That is, a greater fraction of group one had 400 or more PAs than group two, and a greater fraction of players in group two had 400 or more PAs than group three. This trend holds for almost any number of PAs. This should not be surprising. Players in group one are probably better, healthier and hit higher in the order, on average, than those in groups two or three. So it seems perfectly valid to use past PAs as a predictor for number of PAs in the future.
But that doesn’t mean that you just use the past average as your prediction. The horizontal line at p=0.5 shows the median number of PAs for each group. These values are ticked off on the x-axis. They are 667 for group one, 630 for group two and 557 for group three. Although players in all three groups still got a large number of PAs, it was fewer than they had averaged in the past three years: they regressed to the mean. There is nothing special about three years (I just chose it to get players with a history of playing in lots of games); you would see the same drop-off if you chose players who averaged a large number of PAs in the two previous years or just the last year.
Next I highlight the fraction of each group that gets 700 or more PAs. That is the vertical line. The tick marks on the y-axis show the intersection points: 24% for group one, 12% for group two and 1.5% for group three. So less than a quarter of players who averaged about 700 PAs for three years got 700 PAs the next year.
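As a rough sketch of how a curve like this is read, the Python below computes the "fraction with at least x PAs" for a small sample. It is not the code behind the chart. The sample is only the 17 players listed in the comments as the 2009 members of group one, so it is one season's slice rather than the full pool back to 2000, and the names `PA_2009` and `fraction_at_least` are made up for illustration.

```python
from statistics import median

# 2009 PA totals for the 17 players listed in the comments as the 2009
# members of group one (675-725 PA average over 2006-2008).
# This is one season's slice, not the full pool used for the chart.
PA_2009 = [717, 681, 681, 668, 590, 618, 725, 593, 399,
           703, 632, 716, 687, 652, 708, 565, 667]


def fraction_at_least(pa_totals, threshold):
    """Share of players with at least `threshold` PAs: one point on the curve."""
    return sum(pa >= threshold for pa in pa_totals) / len(pa_totals)


# Evaluate the curve every 50 PAs; it can only fall as the threshold rises.
curve = [(x, fraction_at_least(PA_2009, x)) for x in range(0, 801, 50)]

# The vertical line at 700 PAs and the horizontal line at p = 0.5.
share_700 = fraction_at_least(PA_2009, 700)
median_pa = median(PA_2009)

print(f"share with 700+ PAs: {share_700:.0%}")
print(f"median PAs: {median_pa}")
```

For this one-season slice, 5 of the 17 players reached 700 PAs and the median is 668, in the same neighborhood as the 24% and 667 reported for the whole of group one.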
Part of this is aging. In a given year, a player is older than he was three years ago, duh, and probably more likely to be injured and have fewer PAs. But part of it is that getting 700 PAs is part skill, being a good, healthy player who bats at the top of an order, and part is luck, not having a fluke injury. CHONE knows that there is a chance that any player has a crash in his playing time even with a long history of over-700-PAs, over-150-game seasons and a starting job leading-off. So it rarely projects over 650 PAs. Just because someone has played in over 150 games for a number of years we cannot expect him to play over 150 games in the next.



Awesome post, Dave. I really liked it.
A small part of this is also having a superior offense – a team with a better offense will have players who make fewer outs and thus allow their team to hit more.
I hope the FANS projections are not modified in any way. I prefer to do the adjusting myself. I want to know what playing time the FANS projected.
I would say this:
1. I and/or David will adjust as best as we think it should be
2. We’ll create a dump of the raw data file (minus any user-specific information), so the reader can do anything he wants with the data
This keeps everyone happy, right? And, at the end of the season, we can see how we SHOULD have done the adjustments that minimized the errors, so we don’t have to make this an annual discussion.
Cool?
Sounds good to me. Thanks for saving off the raw non-adjusted data.
vr, Xei
Here’s the problem. Since 2006 there have never been fewer than 14 players who reached 700 PA. That is not out of line with the analysis presented here: there is a 25% probability that a player who made that plateau (or came close) for 3 years will make it again. Every year more than 30 players fall into the 675+ PA group, so we will have 6-8 repeat performances.
That said, if CHONE has a max projection PA of 684 then it is understating the potential PA for the group. Whether or not we can predict WHO will make 700 PA is one question, but several people WILL make 700 PA every year and the projections should account for that reality just as much as injury risk and age regression.
Projection PA should be set as if they were a Vegas Over/Under line imo.
vr, Xei
I disagree (with thumble not with Xei who snuck in between). Just because we know that some players will reach 700+ PAs does not mean that we should predict that any one specific player will.
The point of projections is not to guess how many PAs (or HRs, RBIs, etc.) the league leader will have, but to guess as best we can each player’s performance.
“The point of projections is not to guess how many PAs (or HRs, RBIs, etc.) the league leader will have, but to guess as best we can each player’s performance.”
Wouldn’t that still shed a poor light on CHONE? At the very least, one would assume that 25% of last year’s 700PA group would be there again. How can CHONE claim to have the “best guess” for all players when exactly zero reach that benchmark?
I’d think the distribution of 500/600/700 pa players should be relatively consistent, and I’d rather see wrong ‘guesses’ at who reaches 700 versus no guesses at all.
I’m much more interested in projections than predictions though, so perhaps I’m on the wrong side of this…
Yeah we know that some players in the group will get over 700 PAs, but that does not mean that we can expect one specific player to. So projecting 700 PAs for any one player means we will almost always give him too many PAs and get his projection wrong by a large amount.
If what you want out of a system is to minimize the average difference between each player’s projection and performance, then doing this will hurt the projection system. It will cause you to miss badly because you cannot predict beforehand which players will get 700 PAs and which players, like Jose Reyes, will get just 150 PAs after four straight 700+ PA seasons. I think that more people are interested in a system that tries to predict each player as closely as possible than in a system that guesses the correct number of >700 PA players even if it gets the wrong ones.
Jimbo (and thimble), I think both of you should read this article by Tango about this issue:
http://www.hardballtimes.com/main/article/forecasting-2006/
“The highest forecasted RBIs were 112 (Tejada), 110 (Pujols), and 108 (Ortiz). What is this, the 1980s? If you had wanted me to only forecast RBIs, and not tell you who would do it, I would have said 150. Why would I give a number like that? Because from 2001 to 2004, the four highest RBI totals were 160, 150, 146, 145. It would therefore be reasonable to think that the league leader will be around 150. The league leader in 2005 had 148 RBI. So, I would have been pretty close, as an over/under.
But, how sure could I have been that it would be Ortiz? You could come up with a reasonable list of 15 or 20 players that would lead the league in RBI. But, that’s not what we are trying to figure out. We are trying to come up with reasonable over/unders, numbers that you could find equal reasons where the player will over-perform and under-perform. Injuries, as we know with Bonds, can devastate any forecast.”
Right, and the projection has a confidence range associated with it. If the projection system has everyone under 685 PA then the ranges used for the projections are very large (probably true) or the top percentile probability of this particular range is smaller than the demonstrated results.
In other words, the article states that a player like Rollins may have a 25% probability to achieve 700 PA, but the projection of 684 would mean either a range of 644-724 PA (guessing here, but it looks reasonable) or a probability that is lower than 25%. In the first case, you can argue that Rollins IS projected to make 700 PA provided he is injury free, but the second case will consistently under-project the performance.
Zach,
Thanks for linking that article. It looks like Tango made this same point in a clearer manner and four years ago.
Excellent. It would be fun to break it down by usual batting order spot; (1-2, 3-4, 5-6, 7-8) might be a logical way. You are going to get a disproportionate number of 675-725ers who bat 1-2 and are likely to bat 1-2 again.
For most productive players, predicting plate appearances is equivalent to predicting injuries. It is essentially a futile exercise, but the best predictor is still probably past injury history and age-injury correlation in recent history, as well as team depth and projected line-up spot.
I completely disagree with this article (not that it’s bad, this is just my opinion), and this is why I am against Chone being used to project individual players. It is completely pointless. It is designed to predict group performance, not individual player performance. What is the point of having a projection system that docks everyone’s performance/playing time because a certain number of players in a group will perform below expectations due to injury/ineffectiveness and a certain number will perform above expectations?
For example, Ichiro Suzuki had 700+ PA’s 9 years in a row, and had 678 last year. CHONE’s projection of 648 is simply asinine. It is of no value whatsoever. The FANS’ projection seems about right, at 710 PA’s. That is still less than the rest of his career, and CHONE is projecting him to basically fall off a cliff. We can’t just look at what groups of players have done and say that Ichiro is unlikely to reach 700 PA’s, we have to look at Ichiro as an individual.
Rollins is another case. His career has been injury-free other than a slightly shortened 2008. 2008 was the only time he has ever dipped below 689 PA’s, and only the second time he has dipped below 700. Yet you are praising Chone for seeing 2008 and projecting 680 PA’s and calling out the fans for projecting 700? First of all that seems like nit-picking to me, and second of all, when Rollins has reached 700 in 7 of his 9 seasons, why is it so unreasonable to predict that he does it next season?
I am much more interested in the Fans’ projections, because they actually attempt to project the individual player.
Also, you take a group of players who averaged 675-725 PA’s and make the minimum threshold for the next year 700 PA. That seems like a little bit of tweaking to me. What group of players did you use during which years? I would be curious to see.
To project the expected outcome for a player to over 700 PA, you would have to be almost certain that his chance of injury, even a minor injury, is almost zero, which is pretty much never the case. Even if a player hasn’t been injured or has only had minor injuries over several years, there is still always a chance the player will miss time in the future.
In baseball history, there have been 7 players who have had 5 straight 700 PA seasons (Lou Brock, Dave Cash, Juan Pierre, Cal Ripken, Pete Rose, Ichiro, and Miguel Tejada). What would have been “asinine” PA projections for these players? Surely anything much below 700, since these are the most consistent track records in the history of the game. Even more consistent than Rollins, for whom, even with time missed in 2008, 680 PA seems unreasonable. So these must be guys that you would have absolutely known would have given you 700+ PAs the next year.
Here are their actual PA totals in the year following their 5-year runs:
675 Brock
203 Cash
406 Pierre
689 Ripken
770 Rose
752 Ichiro
568 Tejada
……………….
580 Average
Only 2 of them ended up with 700 PAs in their 6th year. Similarly, these 7 players represent the only 7 of the 23 players to have 4 straight 700 PA seasons to repeat the feat in a fifth year. Those 23 were the only ones of the 41 players to get to 3 straight 700 PA seasons to repeat for a fourth year, and those 41 were the only ones of the 120 players to have 2 straight 700 PA seasons and then repeat for a third year.
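The counts above can be turned into year-over-year repeat rates. The following Python is only an illustrative sketch built from the numbers quoted in this comment; the names `STREAK_COUNTS`, `NEXT_YEAR_PA` and `repeat_rates` are invented for the example, and a "700 PA season" is assumed to mean 700 or more.

```python
# Number of players with N straight 700-PA seasons, as quoted in the comment.
STREAK_COUNTS = {2: 120, 3: 41, 4: 23, 5: 7}

# PA totals in the year after a five-year streak, from the table above.
NEXT_YEAR_PA = {
    "Brock": 675, "Cash": 203, "Pierre": 406, "Ripken": 689,
    "Rose": 770, "Ichiro": 752, "Tejada": 568,
}


def repeat_rates(counts):
    """Share of players with an N-year streak who extended it to N+1 years."""
    return {n: counts[n + 1] / counts[n] for n in sorted(counts) if n + 1 in counts}


rates = repeat_rates(STREAK_COUNTS)

# The sixth-year rate comes from the table (assumption: 700 or more counts).
sixth_year_repeats = sum(pa >= 700 for pa in NEXT_YEAR_PA.values())
rates[5] = sixth_year_repeats / STREAK_COUNTS[5]

mean_next_year = sum(NEXT_YEAR_PA.values()) / len(NEXT_YEAR_PA)

for streak, rate in sorted(rates.items()):
    print(f"{streak} straight 700-PA seasons, repeated the next year: {rate:.0%}")
print(f"mean PAs the year after a 5-year streak: {mean_next_year:.0f}")
```

With these counts the repeat rate is roughly 34% after two straight seasons, 56% after three, 30% after four and 29% after five, so even the longest streaks repeated less than half the time in all but one case.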
So the problem is, how do you tell that Jimmy Rollins and Ichiro are the ones for whom sub-700 PA projections are asinine and that Brock and Cash and Pierre and Ripken and Tejada are not those kind of players? Or were they all in the same category, and it is just that most players’ actual PA figures end up being asinine?
R M,
Yeah 700 PAs was just an example that it is very hard to reach such a high number of PAs.
The player pools were pulled from all years back to 2000. The representatives from the 675 to 725 group for 2009, meaning they averaged that number of PAs in years 2006-2008, were Brian Roberts (717 PAs in 2009), Adrian Gonzalez (681), Miguel Cabrera (681), Dan Uggla (668), Justin Morneau (590), David Wright (618), Jimmy Rollins (725), Michael Young (593), Garrett Atkins (399), Ryan Howard (703), Jeff Francoeur (632), Derek Jeter (716), Chase Utley (687), Hanley Ramirez (652), Orlando Cabrera (708), Raul Ibanez (565) and Bobby Abreu (667).
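To make the "average miss" argument concrete for this 2009 group, here is a small Python sketch comparing two flat projections: 700 PAs for everyone versus the group's own median. It is an illustration only. The function name `mean_abs_error` is made up, and the median is taken from the same 17 results it is scored against, so it flatters the lower line (a median always minimizes in-sample absolute error) and says nothing about how CHONE or the fans actually projected these players.

```python
from statistics import median

# 2009 PA totals for the 17 players listed above.
PA_2009 = [717, 681, 681, 668, 590, 618, 725, 593, 399,
           703, 632, 716, 687, 652, 708, 565, 667]


def mean_abs_error(actual, projection):
    """Average absolute miss when every player gets the same projection."""
    return sum(abs(pa - projection) for pa in actual) / len(actual)


# Flat 700 for everyone versus the in-sample median (an over/under style line).
over_under = median(PA_2009)
mae_700 = mean_abs_error(PA_2009, 700)
mae_median = mean_abs_error(PA_2009, over_under)

print(f"projecting 700 for all: average miss of {mae_700:.1f} PAs")
print(f"projecting {over_under} for all: average miss of {mae_median:.1f} PAs")
```

On these 17 players the flat 700 projection misses by about 61 PAs on average and the median line by about 53, even though five of them did reach 700.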