I agree that this analysis treats every count as “equal” with respect to the probability of each event, when in real life, as you suggest, it wouldn’t be. The only extra context I had thought about adding was F-Strike%, but I realized it covers all non-ball events, so it doesn’t really fit this model (except perhaps for the four-pitch walk, since in that case the first pitch must be a ball).

Looking at actual walks by count would be an interesting test. I’m still interested in taking this further; I’m just not sure which direction to go.

That said, I think this is still useful when you have a very small sample of data (since plate discipline numbers stabilize more quickly than K% and BB%), just not so much for predicting whether a full-season player’s BB% or K% will improve or decline next year.

I wonder whether foul% changes in two-strike counts. A hitter who is protecting the plate may foul off more pitches, so a player’s foul percentage with fewer than two strikes would be lower than his foul percentage while he’s fighting off pitches on the way to a 9-pitch walk.

It would be interesting to look at an actual player’s share of 4-, 5-, 6-, …, 9-pitch walks to see whether it lines up with the theoretical probabilities you calculated.

A couple of months ago I tried something similar on the expected BB% side of things, though I approached it slightly differently than you did. First, I calculated these five rates:

1. Ball%: (1-Zone%) * (1 – O-Swing%)

2. CallStr%: (Zone%) * (1 – Z-Swing%)

3. SwStr%: SwStr%

4. InPlay%: (HBP + GB + FB + LD) / Pitches

5. Foul%: 1 – Ball% – CallStr% – SwStr% – InPlay%
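The five rates map directly onto a small function. What follows is a minimal sketch, not the original calculation: it assumes every input is a fraction (0.45 rather than 45%) and that the in-play count (HBP + GB + FB + LD) and total pitches are passed in as raw season totals. The function name, the argument names, and the example inputs are all hypothetical.

```python
def pitch_outcome_rates(zone, o_swing, z_swing, swstr, in_play_events, pitches):
    """Per-pitch outcome rates from plate discipline stats (all as fractions).

    Hypothetical helper. in_play_events is HBP + GB + FB + LD; pitches is
    the total number of pitches seen.
    """
    ball = (1 - zone) * (1 - o_swing)        # out-of-zone pitches that are taken
    called_strike = zone * (1 - z_swing)     # in-zone pitches that are taken
    in_play = in_play_events / pitches       # PA-ending contact (plus HBP) per pitch
    foul = 1 - ball - called_strike - swstr - in_play  # whatever is left over
    return {
        "ball": ball,
        "called_strike": called_strike,
        "swstr": swstr,
        "in_play": in_play,
        "foul": foul,
    }


# Hypothetical inputs, for illustration only.
rates = pitch_outcome_rates(zone=0.45, o_swing=0.30, z_swing=0.65,
                            swstr=0.09, in_play_events=450, pitches=2500)
```

Because Foul% is computed as a remainder, it absorbs any error in the other four estimates. One example is a taken pitch outside the zone that is called a strike anyway.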

Then I calculated the probability of drawing a walk in exactly 4 pitches, 5, 6, and so on:

Walk4%: (Ball%)^4

Walk5%: (4 choose 1) * (CallStr% + SwStr% + Foul%) * (Ball%)^4

Walk6%: (5 choose 2) * (CallStr% + SwStr% + Foul%)^2 * (Ball%)^4

Walk7%: (5 choose 2) * (CallStr% + SwStr% + Foul%)^2 *(Foul%) * (Ball%)^4

Walk8%: (5 choose 2) * (CallStr% + SwStr% + Foul%)^2 *(Foul%)^2 * (Ball%)^4

Walk9%: (5 choose 2) * (CallStr% + SwStr% + Foul%)^2 *(Foul%)^3 * (Ball%)^4

The contributions of walks longer than 9 pitches were negligible.
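The walk-length calculation can be sketched in Python. This sketch assumes the per-pitch rates are fractions and, as in the post above, that every count behaves alike, except that once there are two strikes only a ball or a foul keeps the plate appearance going. The function names are hypothetical, and the sketch treats xBB% as the sum of the exact-length probabilities.

One deliberate difference from the formulas above: the sketch counts pitch orderings as C(n-1, 3). Ball four is the last pitch, and the other three balls can fall anywhere in the first n-1 pitches. The first two non-ball pitches count as strikes, and any later ones must be two-strike fouls. This reproduces the 4-, 5-, and 6-pitch formulas exactly. For 7-, 8-, and 9-pitch walks, however, it gives coefficients of 20, 35, and 56 instead of (5 choose 2) = 10. The (5 choose 2) coefficient appears to count only sequences in which the extra fouls come at a full 3-2 count. Treat the sketch as an alternative count, not a reproduction of the calculation behind the R-squared figures.

```python
from math import comb


def walk_probabilities(ball, called_strike, swstr, foul, max_pitches=9):
    """Return {n: P(PA ends in a walk on exactly pitch n)} for n = 4..max_pitches.

    Hypothetical helper; all rates are per-pitch fractions and every count
    is treated alike.
    """
    strike = called_strike + swstr + foul  # any non-ball pitch before two strikes
    probs = {}
    for n in range(4, max_pitches + 1):
        extra = n - 4                # non-ball pitches before ball four
        strikes = min(extra, 2)      # the first two of them count as strikes
        fouls = extra - strikes      # any further ones must be two-strike fouls
        # Choosing where the three non-final balls fall among the first n - 1
        # pitches fixes the order of the non-ball pitches, so C(n - 1, 3) orderings.
        orderings = comb(n - 1, 3)
        probs[n] = orderings * strike**strikes * foul**fouls * ball**4
    return probs


def expected_bb_rate(ball, called_strike, swstr, foul, max_pitches=9):
    """xBB% as the sum of the exact-length walk probabilities up to max_pitches."""
    return sum(walk_probabilities(ball, called_strike, swstr, foul, max_pitches).values())


# Hypothetical per-pitch rates, for illustration only.
xbb = expected_bb_rate(ball=0.385, called_strike=0.1575, swstr=0.09, foul=0.1875)
```

Raising `max_pitches` (to 15, say) is a quick way to check how much the walks longer than nine pitches add for a given set of rates.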

I didn’t go back as far as you did, but correlating my xBB% with actual BB% from the same year gave these R-squared values:

2012: 0.721

2011: 0.733

2010: 0.748

2009: 0.799

This was for all players with at least 168 PA in a season, since I believe I read that this is where BB% stabilizes.

I also looked at it from a prediction standpoint, but overall it performed about the same as using the prior year’s BB% as a predictor: a little better in some years, a little worse in others.

Does this make sense to you? I believe your formula also uses other underlying metrics from year X to produce an expected BB% for that same year X, correct?
