Maybe the error bars on the projections are bigger than the amount of difference between the run estimators in these cases

How can you talk about error without knowing the margin of error for a specific statistic? This is my problem with many stats; I know they have flaws but don’t know the magnitude of those flaws.

2 runs difference could be huge if the margin is .2 but tiny if the margin is 10.

Comment by Anon — December 12, 2011 @ 3:32 pm

In this case, the standard error is relatively easy to calculate. Under linear weights, each plate appearance is a multinomial event, and I presume plate appearances are independent and identically distributed. After n events, we weight each event and take the sum. The variance of a weighted sum of independent terms is a weighted sum of the variances, where the weights are squared.

For example, let us assume a batter has only two possible outcomes: get on base or produce an out. The former is good for +0.5 runs, the latter for -0.25 runs. Now pretend a player gets on base 0.350 of the time. Over 500 plate appearances, we expect this player to accrue

175*.5 – 325*.25 = 6.25 runs.
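A minimal sketch of that expectation in Python, assuming the two-outcome example above. The function name `expected_runs` is just an illustrative choice.

```python
def expected_runs(n, probs, weights):
    """Expected linear-weights run total over n plate appearances."""
    return n * sum(p * w for p, w in zip(probs, weights))

# Two-outcome example: on base (+0.5 runs) or out (-0.25 runs).
probs = [0.350, 0.650]
weights = [0.5, -0.25]

total = expected_runs(500, probs, weights)
print(round(total, 2))
```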

The variance of each outcome count is n*p*(1-p). Weighting each count's variance by its squared run value gives

175*.650*.25 + 325*.350*.0625 = 35.5

Take the square root to get a standard error of 5.96.
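The squared-weights rule can be sketched as follows. This is only an illustration with a hypothetical helper name, and it treats the on-base count and the out count as if they were independent, which is an assumption of the rule rather than something the data guarantees.

```python
import math

def naive_variance(n, probs, weights):
    """Sum of w_i^2 * Var(count_i), treating outcome counts as independent.

    Each count is binomial, so Var(count_i) = n * p_i * (1 - p_i).
    """
    return sum(w * w * n * p * (1 - p) for p, w in zip(probs, weights))

# Same two-outcome example: on base (+0.5 runs) or out (-0.25 runs).
naive_var = naive_variance(500, [0.350, 0.650], [0.5, -0.25])
naive_se = math.sqrt(naive_var)
print(naive_var, naive_se)
```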

Note the variance will increase as n increases, because we are finding the variance of a sum rather than a mean. Furthermore, power hitters will have considerably more variance, as their more probable outcomes are associated with large weights. A small change in HRs will move the run total much more than a small change in singles, leading to more variance.
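To illustrate the sum-versus-mean point, here is a small sketch assuming iid plate appearances and the same two-outcome example. The helper names are hypothetical. For iid terms the variance of the total is n times the per-plate-appearance variance, while the variance of the average is that variance divided by n.

```python
def per_pa_variance(probs, weights):
    """Variance of the run value of a single plate appearance: E[X^2] - E[X]^2."""
    mean = sum(p * w for p, w in zip(probs, weights))
    second_moment = sum(p * w * w for p, w in zip(probs, weights))
    return second_moment - mean * mean

def var_of_sum(n, v):
    """Variance of the run total over n iid plate appearances."""
    return n * v

def var_of_mean(n, v):
    """Variance of the per-plate-appearance average over n iid plate appearances."""
    return v / n

v = per_pa_variance([0.350, 0.650], [0.5, -0.25])
for n in (100, 500, 2000):
    # The total's variance grows with n; the average's variance shrinks.
    print(n, var_of_sum(n, v), var_of_mean(n, v))
```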

Comment by guesswork — December 12, 2011 @ 4:38 pm

The multinomial actually has a non-trivial variance-covariance (VC) matrix, so you have to take that into account when calculating the variance of a projection of the vector of outcomes onto the reals (such as wOBA).

Comment by Barkey Walker — December 13, 2011 @ 12:38 pm

Good point. That actually simplifies things a lot now that I think about it. The variance of the multinomial is n(diag(p) - p p^T), where diag(p) is the diagonal matrix with the probabilities on its diagonal, p is a column vector of probabilities, and ^T just means transpose. Then let w be our column vector of weights, so the variance of linear weights would be

w^T (n(diag(p) - p p^T)) w

For my example, that comes out as 63.98, so our standard error is about 8.00.
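A minimal sketch of that quadratic form in plain Python, reading the multinomial covariance as n(diag(p) - p p^T), where diag(p) puts the probabilities on the diagonal. The function names are illustrative and not taken from any library.

```python
import math

def multinomial_cov(n, probs):
    """Covariance matrix n * (diag(p) - p p^T) of the multinomial outcome counts."""
    k = len(probs)
    return [
        [n * ((probs[i] if i == j else 0.0) - probs[i] * probs[j]) for j in range(k)]
        for i in range(k)
    ]

def quadratic_form(weights, cov):
    """Compute w^T * cov * w for a list of weights and a nested-list matrix."""
    k = len(weights)
    return sum(
        weights[i] * cov[i][j] * weights[j] for i in range(k) for j in range(k)
    )

# Two-outcome example: on base (+0.5 runs) or out (-0.25 runs), 500 PA.
cov = multinomial_cov(500, [0.350, 0.650])
lw_var = quadratic_form([0.5, -0.25], cov)
lw_se = math.sqrt(lw_var)
print(lw_var, lw_se)
```

The same two functions work unchanged for a longer outcome vector (singles, doubles, home runs, and so on), provided the probabilities sum to 1.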

Thanks!

Comment by guesswork — December 13, 2011 @ 1:39 pm

guesswork, you used a binomial example where the VC matrix is rank 1 (i.e. you don’t need a matrix).
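To make the rank-1 point concrete: with only two outcomes, n(diag(p) - p p^T) is n*p*(1-p) times [[1, -1], [-1, 1]], so the quadratic form collapses to the scalar n*p*(1-p)*(w_on - w_out)^2. A short sketch, with hypothetical names, that checks the scalar against the explicit 2x2 matrix:

```python
def two_outcome_variance(n, p, w_on, w_out):
    """Scalar form of the variance: n * p * (1 - p) * (w_on - w_out)^2."""
    return n * p * (1 - p) * (w_on - w_out) ** 2

def two_outcome_variance_matrix(n, p, w_on, w_out):
    """Same quantity via the explicit 2x2 covariance matrix of the counts."""
    c = n * p * (1 - p)
    cov = [[c, -c], [-c, c]]  # rank 1: the second row is -1 times the first
    w = [w_on, w_out]
    return sum(w[i] * cov[i][j] * w[j] for i in range(2) for j in range(2))

scalar_var = two_outcome_variance(500, 0.350, 0.5, -0.25)
matrix_var = two_outcome_variance_matrix(500, 0.350, 0.5, -0.25)
print(scalar_var, matrix_var)
```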

Comment by Barkey Walker — December 14, 2011 @ 2:30 am