Wilin Rosario: Estimating BB and K Using Plate Discipline

In September, teams are allowed to expand their rosters, and in 2011 the Rockies did so by calling up Wilin Rosario. Rosario showed a bit of pop but had trouble making contact, so going into 2012 there were questions about his ability to avoid strikeouts. Using even a small sample of a hitter's swing and contact rates, a better estimate of his walk and strikeout rates can be made.

The Rockies began the 2012 season with Ramon Hernandez as their #1 catcher and Wilin Rosario slated as the backup, even though Rosario was a highly touted prospect (ranked #49 in 2011 and #87 in 2012). The main reason the Rockies didn't have faith in Rosario was his plate discipline. In the minors, his BB% ranged between 4.5% and 8.7% and his K% between 19.2% and 29.9%. In 57 MLB plate appearances, his BB% was 3.5% and his K% was 35.1%. These numbers gave people reservations about his ability to stick in the majors.

In the 2011 FG+ fantasy preview, Paul Swydan wrote the following on Rosario:

Swinging at every pitch thrown to you is only a good strategy for a hitter if you have enough bat control to hit or foul off nearly every pitch thrown to you (see Guerrero, Vladimir). Wilin Rosario is not this type of hitter, and his acceptable plate discipline in the low minors has steadily worsened as he has moved up the Rockies’ organizational ladder.  ….. Rosario still needs to fine tune his game — particularly his plate discipline — and is unlikely to contribute to your team no matter where he starts the season.

Instead of using BB% and K% directly, a player's K% and BB% can be estimated from his swing and contact rates. To do this, I created a formula (see Appendix) using the O-Swing%, Z-Swing%, O-Contact% and Z-Contact% plate discipline values.

Plugging Rosario's 2011 plate discipline numbers into the spreadsheet gives an estimated 2011 K% of 22% and BB% of 6%. While the estimated BB% is fairly close to his actual value (4%), the estimated K% is 13 percentage points below his actual 35%.

Despite the questions surrounding his plate discipline, his K% in 2012 ended up at 23%, within 1 percentage point of his 2011 estimated K%. With reasonable plate discipline, he was able to put up a decent season (1.8 WAR in 426 PA). Using a second method to calculate Rosario's K% and BB% gives a better picture of his true talent level.

Rookies like Rosario are called up and get only a small number of plate appearances. A player's walk and strikeout rates can be estimated from his plate discipline numbers, and the estimate can help determine whether his talent level is significantly different from what his stats suggest.

I wanted a formula to estimate a player's K% and BB% using the plate discipline values available at FanGraphs. The resulting formula is not a prediction (it contains no regression to the mean), nor is it a stat that stabilizes quickly.

I took every player season with more than 200 PA from 2002 (the first year plate discipline numbers are available at FanGraphs) through 2012. Running a linear regression on more than 3,500 seasons produced the following two formulas:

BB% (BB/PA)
BB% = -0.228 × O-Swing% - 0.139 × Z-Swing% - 0.030 × O-Contact% - 0.257 × Z-Contact% + 0.437
R-squared = 0.45

K% (K/PA)
K% = 0.248 × O-Swing% - 0.345 × Z-Swing% - 0.153 × O-Contact% - 0.837 × Z-Contact% + 1.169
R-squared = 0.79
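As a rough illustration, the two formulas can be applied in a few lines of Python. This is a sketch, not the spreadsheet's actual logic: it assumes the plate discipline inputs are decimal fractions (0.30 rather than 30%), which the intercepts of 0.437 and 1.169 suggest, and the function names and the sample plate discipline line are hypothetical.

```python
"""Apply the BB% and K% regression formulas to one season's plate discipline rates.

Assumption: all rates are decimal fractions (0.30 means 30%), as the
intercepts imply. Function names are hypothetical.
"""

# Coefficients copied from the two formulas above; keys name the FanGraphs stats.
BB_FORMULA = {
    "intercept": 0.437,
    "O-Swing%": -0.228,
    "Z-Swing%": -0.139,
    "O-Contact%": -0.030,
    "Z-Contact%": -0.257,
}
K_FORMULA = {
    "intercept": 1.169,
    "O-Swing%": 0.248,
    "Z-Swing%": -0.345,
    "O-Contact%": -0.153,
    "Z-Contact%": -0.837,
}


def apply_formula(formula, rates):
    """Return intercept + sum(coefficient * rate) for one linear formula."""
    total = formula["intercept"]
    for stat, coef in formula.items():
        if stat != "intercept":
            total += coef * rates[stat]
    return total


def estimate_bb_k(o_swing, z_swing, o_contact, z_contact):
    """Return (estimated BB%, estimated K%) as fractions of PA."""
    rates = {
        "O-Swing%": o_swing,
        "Z-Swing%": z_swing,
        "O-Contact%": o_contact,
        "Z-Contact%": z_contact,
    }
    return apply_formula(BB_FORMULA, rates), apply_formula(K_FORMULA, rates)


def actual_bb_k(pa, bb, so):
    """Return (actual BB%, actual K%) as fractions of PA: BB/PA and K/PA."""
    return bb / pa, so / pa


if __name__ == "__main__":
    # Hypothetical plate discipline line, for illustration only.
    est_bb, est_k = estimate_bb_k(0.30, 0.65, 0.67, 0.88)
    print(f"Hypothetical estimate: {est_bb:.1%} BB, {est_k:.1%} K")
    # Rosario's 2011 MLB line: 2 BB and 20 K in 57 PA.
    act_bb, act_k = actual_bb_k(57, 2, 20)
    print(f"Rosario 2011 actual:   {act_bb:.1%} BB, {act_k:.1%} K")
```

The actual rates are simply BB/PA and K/PA, so Rosario's 2011 line of 2 walks and 20 strikeouts in 57 PA reproduces the 3.5% and 35.1% cited earlier; producing his estimate would require his 2011 O-Swing%, Z-Swing%, O-Contact% and Z-Contact% from his FanGraphs page.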

To save people some time, I have uploaded a spreadsheet to Google Docs that does the calculations automatically.

To use the sheet:

1. Download the spreadsheet using the “Download As” feature under File.

2. Go to the player's page at FanGraphs, minimize the minor league data, go to the Standard stats section and copy all the data going back to 2002.

3. Go to the downloaded spreadsheet and paste the data so that its upper-left corner is in the left yellow box.

4. Go back to the player's FanGraphs page and copy the (non-PITCHf/x) Plate Discipline values.

5. Go back to the downloaded spreadsheet and paste the data so that its upper-left corner is in the right yellow box.

6. Once the data has been added, the spreadsheet calculates the player's actual and estimated K% and BB%.


Jeff writes for FanGraphs, The Hardball Times and Royals Review, as well as his own website, Baseball Heat Maps with his brother Darrell. In tandem with Bill Petti, he won the 2013 SABR Analytics Research Award for Contemporary Analysis. Follow him on Twitter @jeffwzimmerman.

10 Responses to “Wilin Rosario: Estimating BB and K Using Plate Discipline”

  1. Yo says:

    The paragraph immediately after the Swydan excerpt needs to be cleaned up a bit. Solid article, though!


  2. payroll says:

    Interesting. There has been some limited grumbling in these parts about Mauer’s spike in strikeouts this year, so I pulled his data into your report. According to your model, he should probably have struck out even more, and walked less than he did. Estimated 15.5% k rate to 10.2% walk rate. Actual, 13.7 to 12.5.


  3. Jon Roegele says:

    Interesting work, Jeff.

    A couple of months ago I tried to do something similar on the expected BB% side of things. I looked at it slightly differently than you. First I calculated these five things:

    1. Ball%: (1-Zone%) * (1 – O-Swing%)
    2. CallStr%: (Zone%) * (1 – Z-Swing%)
    3. SwStr%: SwStr%
    4. InPlay%: (HBP + GB + FB + LD) / Pitches
    5. Foul%: 1 – Ball% – CallStr% – SwStr% – InPlay%

    Then I looked at the probability of drawing a walk on 4 pitches, 5, 6, etc.

    Walk4%: (Ball%)^4
    Walk5%: (4 choose 1) * (CallStr% + SwStr% + Foul%) * (Ball%)^4
    Walk6%: (5 choose 2) * (CallStr% + SwStr% + Foul%)^2 * (Ball%)^4
    Walk7%: (5 choose 2) * (CallStr% + SwStr% + Foul%)^2 *(Foul%) * (Ball%)^4
    Walk8%: (5 choose 2) * (CallStr% + SwStr% + Foul%)^2 *(Foul%)^2 * (Ball%)^4
    Walk9%: (5 choose 2) * (CallStr% + SwStr% + Foul%)^2 *(Foul%)^3 * (Ball%)^4

    Contributions beyond 9 pitches added were negligible.

    Didn’t go back as far as you, but as far as my xBB% correlation to actual BB% from the same year, I got R-squared values of:

    2012: 0.721
    2011: 0.733
    2010: 0.748
    2009: 0.799

    This was for all players with at least 168 PA per season, since I think that’s where I read BB% stabilizes.

    Looked at it from a prediction standpoint, but in the general sense it was pretty much identical to using prior year BB% as a predictor. Some years a little better, some years a little worse.

    Does this make sense to you? I believe your formula is also using other underlying metrics from year X to give an expected BB% from that same year X, correct?
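The walk-probability approach in this comment can be sketched in Python. This is an illustrative sketch, not the commenter's own code: it assumes all rates are decimal fractions, treats every pitch as independent of the count, and follows the comment's formulas, in which each longer walk adds one more two-strike foul. The function names and sample inputs are hypothetical.

```python
"""Expected walk rate from per-pitch outcome rates (the walk-probability model above).

Assumptions: rates are decimal fractions; every pitch is independent of the
count; HBP is folded into in-play events as in the comment. Names are hypothetical.
"""
from math import comb


def pitch_outcome_rates(zone, o_swing, z_swing, swstr, in_play):
    """Return (ball, called_strike, swinging_strike, foul) rates per pitch."""
    ball = (1 - zone) * (1 - o_swing)        # taken pitches outside the zone
    called_strike = zone * (1 - z_swing)     # taken pitches inside the zone
    foul = 1 - ball - called_strike - swstr - in_play  # whatever is left over
    if foul < 0:
        raise ValueError("inputs imply a negative foul rate")
    return ball, called_strike, swstr, foul


def walk_probabilities(ball, called_strike, swstr, foul, max_pitches=9):
    """Probability that a PA ends in a walk on exactly n pitches, n = 4..max_pitches."""
    non_ball = called_strike + swstr + foul
    probs = {4: ball ** 4}
    if max_pitches >= 5:
        # One strike among the first four pitches, then ball four.
        probs[5] = comb(4, 1) * non_ball * ball ** 4
    for n in range(6, max_pitches + 1):
        # Reach 3-2 in five pitches (C(5,2) orderings), foul off n-6 pitches
        # with two strikes, then take ball four.
        probs[n] = comb(5, 2) * non_ball ** 2 * foul ** (n - 6) * ball ** 4
    return probs


def expected_bb_pct(zone, o_swing, z_swing, swstr, in_play, max_pitches=9):
    """Sum the walk probabilities to get an expected BB% (as a fraction of PA)."""
    rates = pitch_outcome_rates(zone, o_swing, z_swing, swstr, in_play)
    return sum(walk_probabilities(*rates, max_pitches=max_pitches).values())


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    xbb = expected_bb_pct(zone=0.45, o_swing=0.30, z_swing=0.65, swstr=0.09, in_play=0.18)
    print(f"xBB%: {xbb:.1%}")
```

For inputs like these, raising `max_pitches` past 9 changes the total only slightly, consistent with the comment's note that walks longer than nine pitches add little.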


    • Matthias says:

      I like what you did with the walk probabilities.

      I wonder if the foul% changes when it’s a two-strike count. Hitter trying to protect may increase the probability of a foul. So an overall player’s foul percentage with less than 2 strikes would be less than his foul percentage as he’s fighting off pitches to get to a 9-pitch walk.

      It would be interesting to look at an actual player’s % of 4, 5, 6, …9-pitch walks to see if it lines up with the theoretical probability you calculated.


      • Jon Roegele says:

        Thanks Matthias,

        I agree that in this analysis every count is taken as being “equal” with respect to the chances of events occurring, when in real life it wouldn’t be, as you suggest. The only context I had thought about adding was F-Strike%, but realized this covers all non-ball events, so it doesn’t really work in this model (other than the four pitch walk, I suppose, since the first pitch must be a ball in this case).

        Looking at actual walks by count would be an interesting test. I’m still interested in taking this further, just wondering which way to take it.


  4. dcs says:

    Why not also include Zone%, which has a very significant effect on BB (and K)?


  5. rockymountainhigh says:

    You left out the part about him being a god that walks among men.


  6. slash12 says:

    I also took a stab at this equation a few years back, but ultimately I just didn't find it to be a very good predictor of future success. People who beat the model (had better K% or BB% than they should have) seemed to consistently continue to do so, indicating to me that there was something more they were doing that the equation wasn't accounting for.

    That said… I think this is still useful for cases where you have a very small sample of data (since plate discipline numbers stabilize more quickly than K% and BB%), just not so much for predicting whether a full-season player is going to improve or decline his BB% or K% next year.


  7. Spit Ball says:

    Very cool stuff. A definitive step in the right direction of greater understanding.
