Two separate issues. IMO the most salient point is that predictions after a serious injury are inherently hard to make accurately. No way around it. The modeling solution is to use error bands, as someone suggested, but they are hard to do well, and in this case doing them right would take research, and a lot of it.

But if your goal is not to model the uncertainty, but rather to just make a best estimate of performance, IMO you need to start with the same model that is used for normal predictions, with perhaps some regression to the mean added, but still mainly relying on prior performance & other normal considerations such as age. Now, I could be wrong about that, but as a starting assumption in the absence of evidence it seems like a safe, conservative one. And I am pretty certain that whatever system we come up with, prior performance is going to have to at least enter into it. What we REALLY need here is research. Absent that, we are spinning our wheels.

Now, IF some regression to the mean is included, the question is which stats we regress. For hitters I’d be inclined NOT to regress BB% to the mean (Brad is right on that), and IMO probably not K% either. BABIP and power are areas where we should expect at least a chance of decline (though that probably depends on the nature of the injury). But even there, regression to the mean may not be the best method. Take a light-hitting SS: your method would end up predicting that the injury increases his power. Counterintuitive, to say the least.
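The light-hitting-SS objection is easy to see numerically. This is a minimal sketch, assuming the common shrinkage form (n·observed + k·league)/(n + k), where n is the player's PA and k is a stabilization constant; k, the ISO figures, and the function name are all hypothetical, not taken from the method being criticized.

```python
def regress_to_mean(observed: float, n: float, league_mean: float, k: float) -> float:
    """Shrink an observed rate toward the league mean.

    n is the sample size behind `observed` (e.g. plate appearances); k is a
    hypothetical stabilization constant, i.e. how many league-average PA of
    "ballast" are mixed in. The result always lies between the two rates.
    """
    return (n * observed + k * league_mean) / (n + k)


# Hypothetical light-hitting shortstop: post-injury ISO of .090 over 150 PA,
# league-average ISO of .150, and an assumed k of 300 PA.
post_injury_iso = regress_to_mean(observed=0.090, n=150, league_mean=0.150, k=300)
print(f"Regressed ISO: {post_injury_iso:.3f}")  # above his observed .090
```

Because the estimate always lands between the observed rate and the league mean, any below-average hitter gets pulled upward, which is the counterintuitive "injury adds power" result.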

Finally, IMO the real question is this: at what point (in terms of sample size) can we say that post-injury stats are reliable indicators of either (a) a real decline in ability or (b) no decline in ability? IMO in the Utley case, his performance to date is at least some evidence that he is the same hitter he was. But how good is that evidence? The sample is still a bit small to be close to certain. Setting statistics aside, though, IMO the real risk in Utley’s case is a recurrence of the injury, and no amount of statistical analysis can quantify that risk. Barring a recurrence, I think he’ll be fine.
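One rough way to see how slowly small samples firm up is the binomial standard error of a rate stat, sqrt(p(1 − p)/n). This sketch assumes each PA is an independent trial with a fixed true rate (a simplification) and uses a hypothetical .370 OBP; it is an illustration, not a tested reliability model.

```python
import math


def rate_std_error(p: float, n: int) -> float:
    """Binomial standard error of a rate stat such as OBP observed over n PA.

    Assumes each PA is an independent trial with a fixed true rate p, which
    is a simplification (it ignores park, opponents, and changes in skill).
    """
    return math.sqrt(p * (1 - p) / n)


TRUE_OBP = 0.370  # hypothetical pre-injury OBP, for illustration only

for pa in (100, 200, 400, 600):
    # About 95% of samples of this size fall within roughly 2 SE of the true
    # rate (normal approximation).
    margin = 2 * rate_std_error(TRUE_OBP, pa)
    print(f"{pa:>4} PA: observed OBP within about +/-{margin:.3f} of {TRUE_OBP:.3f}")
```

Under these assumptions the band is roughly ±.10 at 100 PA and still about ±.05 at 400 PA, so a few hundred post-injury PA can only rule out a fairly large change in ability.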

(1) The method simply assumes a skill-altering injury. At this point there is no evidence of that in this case.

(2) Even if there is a skill-altering injury, why should we expect the league mean to be a better estimate of the player’s ability than his prior ability? Even players with ability-altering injuries generally retain some, often most, of their prior abilities.

But big picture – even if you want to make the dubious decision of simply throwing prior performance out the window – it seems to me that you would be left in a situation where the correct response to a small sample size should be agnosticism – more data needed, period. Even there, there is no particular reason to regress to the mean. Sure, it would be a better estimator than the actual stats. But not by much, and still not a very good estimator. Small sample size is small sample size, and if you don’t have a reliable starting point (in this case, prior performance), then you can’t make a silk purse out of a sow’s ear.

As an addendum, I recognize the value of including regression to the mean as a small component of a more complex system that includes prior performance. But what’s being done here is another matter – not merely improperly ignoring prior performance, but combining a small sample with speculation (that the player is league average) to yield … a garbage stat.
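For contrast, here is a minimal sketch of regression to the mean as a small component of a system that still leans on prior performance. It assumes a simple PA-weighted average of three sources; the `Sample` type, `prior_discount`, `ballast_pa`, and all rates are hypothetical choices for illustration, not any published projection system.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    rate: float  # observed rate, e.g. walks per PA
    pa: float    # plate appearances behind that rate


def blended_estimate(prior: Sample, current: Sample, league_rate: float,
                     prior_discount: float = 0.5, ballast_pa: float = 200.0) -> float:
    """Weighted average of prior performance, the post-injury sample, and league average.

    Each source is weighted by its (effective) PA. prior_discount (down-weights
    older data) and ballast_pa (how many PA of league average to mix in) are
    hypothetical tuning knobs, not values from any published system.
    """
    w_prior = prior_discount * prior.pa
    w_current = current.pa
    w_league = ballast_pa
    total = w_prior + w_current + w_league
    return (w_prior * prior.rate + w_current * current.rate
            + w_league * league_rate) / total


# Hypothetical hitter: .110 walk rate over 1800 prior PA, .080 over 120
# post-injury PA, league average .085.
estimate = blended_estimate(Sample(0.110, 1800), Sample(0.080, 120), league_rate=0.085)
print(f"Blended walk rate: {estimate:.3f}")
```

The result stays close to the prior .110: the small post-injury sample and the league average nudge it, but prior performance still carries most of the weight.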

If I am overly passionate, it is because I think this kind of exercise undercuts the credibility of what statistically inclined baseball analysis is trying to accomplish. It plays into the unfair stereotype of nerds playing with numbers that aren’t grounded in the real world.

I ran three different calculations with the tool:

1. I ran his 2010 Talent vs Actual scan using only his MiLB combined averages as his regression references. For the “(Player) Season” data I used his 2009 numbers.

2. I ran his 2011 Talent vs Actual scan using only his MiLB combined averages as his regression references. For the “(Player) Season” data I used his 2010 numbers. This method assumes that 2010 was an off-year for him, and that he should perform more like his minor league numbers since he has one more year of growth.

3. I ran his 2011 Talent vs Actual scan using the total combination of his MiLB and MLB averages as his regression references. For the “(Player) Season” data I used his 2010 numbers. The MiLB numbers are the actual numbers he produced; I did not use an equivalency calculator. This method takes into account that he has almost 930 PA in the majors and that this may be the true Gordon Beckham we are seeing.
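The difference between Scans #2 and #3 comes down to which averages serve as the regression reference. The tool's internals aren't described here, so this is only a sketch, assuming the combined reference is formed by pooling counts across levels (which weights each level by its PA) rather than averaging the two rates. The function name and all walk totals are hypothetical, not Beckham's actual lines.

```python
from typing import Iterable, Tuple


def pooled_rate(lines: Iterable[Tuple[int, int]]) -> float:
    """Combine (events, PA) stat lines into one rate by summing counts first.

    Summing counts weights each level by its PA, so a large MLB sample
    dominates a short minor-league stint.
    """
    total_events = 0
    total_pa = 0
    for events, pa in lines:
        total_events += events
        total_pa += pa
    return total_events / total_pa


# Hypothetical walk totals, NOT Beckham's actual lines.
milb_walks = (20, 250)   # (walks, PA)
mlb_walks = (65, 930)

print(f"MiLB-only reference:  {pooled_rate([milb_walks]):.3f}")
print(f"MiLB + MLB reference: {pooled_rate([milb_walks, mlb_walks]):.3f}")
```

With roughly 930 MLB PA in the pool, the combined reference sits much closer to the major-league rate than to the minor-league one.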

Here are my results (I apologize for poor formatting):

Scan #1:

Stat          2010 Talent   2010 Stats
xBABIP        0.263         0.297
xHR           15.2          9
xAVG          0.241         0.252
xBB%          8.9%          7.4%
xK%           16.2%         20.7%
OBP           0.362         0.317
SLG           0.491         0.378
ISO           0.193         0.126
xWalks        44.4          37
xStrikeouts   80.9          92
xNonHRHits    94.5          103

Scan #2:

Stat          2011 Talent   2011 Stats
xBABIP        0.271         0.283
xHR           5.6           6
xAVG          0.237         0.237
xBB%          7.4%          5.8%
xK%           17.6%         24.2%
OBP           0.358         0.303
SLG           0.475         0.358
ISO           0.175         0.121
xWalks        17.8          14
xStrikeouts   42.5          52
xNonHRHits    47.5          45

Scan #3:

Stat          2011 Talent   2011 Stats
xBABIP        0.271         0.283
xHR           5.3           6
xAVG          0.233         0.237
xBB%          7.8%          5.8%
xK%           18.9%         24.2%
OBP           0.333         0.303
SLG           0.419         0.358
ISO           0.153         0.121
xWalks        18.7          14
xStrikeouts   45.5          52
xNonHRHits    46.5          45


It is pretty clear that Scan #3 provides the closest-to-actual results. However, this might just be a product of the player I picked: Beckham has been trending steadily downward in production. I would be interested to see what happens with a player who has a down year sandwiched between two good years, or a top prospect who recovers from a sophomore slump in his third season.
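The "closest to actual" judgment can be checked stat by stat. This sketch copies the 2011 Scan #2 and Scan #3 figures from the tables above (BB% and K% converted to decimals) and counts which scan has the smaller absolute error on each line; counting wins per stat is an assumed scoring rule, chosen because the stats are on different scales and a single averaged error would mix them.

```python
import math

# Actual 2011 stats and talent estimates, copied from Scans #2 and #3 above.
actual = {"xBABIP": 0.283, "xHR": 6, "xAVG": 0.237, "xBB%": 0.058, "xK%": 0.242,
          "OBP": 0.303, "SLG": 0.358, "ISO": 0.121, "xWalks": 14,
          "xStrikeouts": 52, "xNonHRHits": 45}
scans = {
    "Scan #2": {"xBABIP": 0.271, "xHR": 5.6, "xAVG": 0.237, "xBB%": 0.074,
                "xK%": 0.176, "OBP": 0.358, "SLG": 0.475, "ISO": 0.175,
                "xWalks": 17.8, "xStrikeouts": 42.5, "xNonHRHits": 47.5},
    "Scan #3": {"xBABIP": 0.271, "xHR": 5.3, "xAVG": 0.233, "xBB%": 0.078,
                "xK%": 0.189, "OBP": 0.333, "SLG": 0.419, "ISO": 0.153,
                "xWalks": 18.7, "xStrikeouts": 45.5, "xNonHRHits": 46.5},
}


def count_wins(actual, scans, a="Scan #2", b="Scan #3"):
    """Count, per stat, which scan's estimate is closer to the actual value."""
    wins = {a: 0, b: 0, "tie": 0}
    for stat, value in actual.items():
        err_a = abs(scans[a][stat] - value)
        err_b = abs(scans[b][stat] - value)
        if math.isclose(err_a, err_b, abs_tol=1e-9):
            wins["tie"] += 1
        elif err_a < err_b:
            wins[a] += 1
        else:
            wins[b] += 1
    return wins


wins = count_wins(actual, scans)
print(wins)
```

On these numbers Scan #3 wins 6 of the 11 lines (K%, OBP, SLG, ISO, strikeouts, non-HR hits), Scan #2 wins 4 (HR, AVG, BB%, walks), and BABIP is a tie, so the edge for Scan #3 comes mostly from OBP, SLG, and ISO.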

That said, I see no justification for his plate discipline becoming remarkably worse, as it does in this model. His knee has little to do with his batting eye, so regressing his walk rate toward league average makes little sense. The “new Utley” swinging more often is just as likely as him swinging less often to make up for declining skills. I also think we should regress his strikeout rate to some midpoint between his established level and league average, especially since his swinging-strike rate is at a career low, and as I understand it, that stat stabilizes quickly.

Closing thoughts: it’s a good idea, but turning it into a useful calculator will be an enormous challenge because each injury case is highly individualized. Utley’s power should be regressed to league average, but his walk rate should not, and it’s hard to even pick a “best” spot for his strikeout rate. A one-size-fits-all approach that regresses everything to league average leaves too much value on the table.
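One way a calculator could avoid the one-size-fits-all problem is to give each stat its own regression target. This is a minimal sketch, assuming a per-stat weight between 0 (keep the player's established rate) and 1 (move fully to league average); the weights, player rates, and league rates are all hypothetical placeholders, not Utley's actual numbers.

```python
# Hypothetical weights: 0 keeps the player's own rate, 1 moves fully to league average.
REGRESSION_WEIGHTS = {
    "BB%": 0.0,  # a knee injury shouldn't change the batting eye
    "K%": 0.5,   # a midpoint between established level and league average
    "ISO": 0.6,  # power regressed most heavily
}


def regression_targets(player: dict, league: dict, weights: dict) -> dict:
    """Per-stat target = (1 - w) * player rate + w * league rate."""
    return {stat: (1 - w) * player[stat] + w * league[stat]
            for stat, w in weights.items()}


player = {"BB%": 0.110, "K%": 0.140, "ISO": 0.200}  # hypothetical career rates
league = {"BB%": 0.085, "K%": 0.180, "ISO": 0.150}  # hypothetical league averages

targets = regression_targets(player, league, REGRESSION_WEIGHTS)
for stat, value in targets.items():
    print(f"{stat}: {value:.3f}")
```

The resulting per-stat targets could then stand in for the league averages the calculator regresses toward, with the weights tuned case by case depending on the injury.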

I really do like the idea, though, despite the critique. Even if the calculator proves too difficult to perfect, I’m glad to see this type of thinking.

I replaced all of the league averages with Utley’s career numbers. I then replaced the number of CURRENT PA with the number of REMAINING PA for Utley. To calculate this, I multiplied the number of games remaining by 4.2 PA per game. In the best-case scenario, I assumed he would get 90 more games (378 PA). Here are the results:

xBABIP        0.262
xHR           13.5
xAVG          0.250
xBB%          11.76%
xK%           14.01%
OBP           0.376
SLG           0.473
ISO           0.196
xWalks        44.5
xStrikeouts   52.9
xNonHRHits    70.0

So this would be Utley’s expected performance for the remainder of the season.

Here is one more scan of the data, this time using 70 games (294 PA):

xBABIP        0.264
xHR           10.7
xAVG          0.251
xBB%          11.59%
xK%           14.29%
OBP           0.376
SLG           0.479
ISO           0.199
xWalks        34.1
xStrikeouts   42.0
xNonHRHits    54.7
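The PA arithmetic behind both Utley scans is simple enough to check. This sketch uses the 4.2 PA-per-game figure and the projected BB% and K% rates from the two scans, and assumes expected counts are just rate × remaining PA; the function names are hypothetical.

```python
PA_PER_GAME = 4.2  # per-game figure used for both scans


def remaining_pa(games_left: int, pa_per_game: float = PA_PER_GAME) -> int:
    """Remaining plate appearances, rounded to a whole number."""
    return round(games_left * pa_per_game)


def expected_count(rate: float, pa: float) -> float:
    """Expected event count (walks, strikeouts) given a per-PA rate."""
    return rate * pa


# (games remaining, projected xBB%, projected xK%) from the two scans.
for games, bb_rate, k_rate in ((90, 0.1176, 0.1401), (70, 0.1159, 0.1429)):
    pa = remaining_pa(games)
    print(f"{games} games -> {pa} PA, "
          f"xWalks {expected_count(bb_rate, pa):.1f}, "
          f"xStrikeouts {expected_count(k_rate, pa):.1f}")
```

This reproduces 378 and 294 PA, both xWalks figures (44.5 and 34.1), and the 70-game xStrikeouts (42.0). The 90-game strikeout count comes out at 53.0 rather than 52.9, presumably a rounding difference in how the tool reports the rate.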

Lastly, if you want a tool that can predict the talent of a player with no MLB history, maybe there is a way to incorporate a MiLB stat equivalency calculator into those league averages. I will explore this further.

It’s a good idea; it just needs some tweaking.

I really expect better from Fangraphs. Wow, the level of badness of this is just astonishing.
