# Controlling the strike zone and batting average

Peripheral statistics are brought up a good deal in the discussion of pitchers. Strikeouts, walks and home runs, as well as other statistics, can be converted into rate stats that give a decent picture of a pitcher’s underlying performance. Peripheral statistics are not unique to pitchers though; they are also useful when evaluating hitters.

A hitter’s walk-to-strikeout ratio, or some derivative equation using those components, is often used to evaluate a hitter’s **approach**, or how well he “controls the strike zone.”

This idea comes up in the discussion of prospects. A prospect can have great results, but if he is striking out way more than he’s walking, that usually becomes a concern.

Will Middlebrooks, a Boston Red Sox third base prospect, was called up to the big leagues in early May after veteran third baseman Kevin Youkilis sustained an injury. At the time of Middlebrooks’ call-up, Kevin Goldstein wrote this for Baseball Prospectus about Middlebrooks’ ability to compete right away:

> The biggest challenge for Middlebrooks will be his approach. He sees far too many pitches as hittable and can expand his strike zone at times, which is a trait big league pitchers will surely exploit. The power should play immediately, but he could struggle in the batting average category.

In Middlebrooks’ first three months in the bigs (May through July), he hit for a very high batting average (.301), despite a dreadful walk-to-strikeout ratio (0.16; the major league average is 0.40). His batting average on balls in play was also very high (.357).

Given Goldstein’s profile and Middlebrooks’ peripherals, his batting average results hardly seemed sustainable.

In 10 games in August, Middlebrooks hit at a .194 clip with a .190 BABIP, but we were unable to see if this small-sample downward trend would continue, since he broke his wrist and would finish the season on the disabled list.

Middlebrooks found surprising success, but it will be interesting to see if that success will continue with the same approach during the 2013 season.

This idea not only applies to prospects, but can in some cases be relevant when discussing hitters with a ton of major league experience.

This offseason, the top free agent hitter is outfielder Josh Hamilton, who swings as much as anyone else in baseball. In August, I wrote about how Hamilton’s plate discipline may have been the reason for his inconsistent play in 2012. I’ve heard other analysts bring up Hamilton’s plate discipline as a cause for concern in giving him a long-term deal. However, it seems to some that Hamilton may be one of the few examples of a hitter succeeding with little regard for taking pitches and controlling the strike zone. Most players cannot sustain the success that Hamilton has had with this type of approach.

If approach seems to not matter for Hamilton, how much does it matter?

Here’s what I mean by approach. A big league hitter has to go to the plate with a plan in mind. For Hamilton it may be to swing at anything near the zone, while for Youkilis it may be to take as many pitches as possible in order to draw a walk. What I’m using to define approach for this piece is simple: If a hitter has a solid idea of what he wants to do to succeed at the plate, we should see him striking out only about as often as he walks, instead of four times as much.

The variability in BABIP is a major factor in why we tend to see a fairly weak correlation in year-to-year batting average. I decided to find out whether subtracting strikeouts from walks, as a simple defining tool for a hitter’s approach or ability to control the strike zone, does a better job of predicting batting average in the next season than batting average itself.

### The study

I took a sample of hitters (n=889) who had at least 350 plate appearances in Year X and at least 350 plate appearances in Year X+1, for the years 2007 to 2012.

First, I ran a simple linear regression for the correlation between batting average in Year X and batting average in Year X+1.

I found an r-squared of 18.1 percent for this sample, which means that batting average in one year explains 18.1 percent of the variation in batting average in the subsequent season. This correlation is fairly strong, and about where I expected it to be.
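For readers who want to try this kind of year-to-year check themselves, here is a minimal sketch of a simple linear regression and its r-squared in plain Python. The function name `simple_regression` and the five pairs of batting averages are hypothetical placeholders, not the data or the tool used in this study.

```python
def simple_regression(x, y):
    """Fit y = intercept + slope * x by least squares.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # With a single predictor, r-squared is the squared correlation coefficient.
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical batting averages for five hitters (not real data).
ba_year_x = [0.301, 0.265, 0.280, 0.250, 0.312]
ba_year_x1 = [0.275, 0.270, 0.290, 0.255, 0.285]

slope, intercept, r2 = simple_regression(ba_year_x, ba_year_x1)
print(f"slope={slope:.3f} intercept={intercept:.3f} r-squared={r2:.3f}")
```

With the real sample, `ba_year_x` and `ba_year_x1` would each hold 889 values, one per qualifying hitter-season pair.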

Then I ran the same linear regression, but this time I used walks minus strikeouts, divided by plate appearances, or (BB-K)/PA, as the predictor of future batting average. The correlation wasn’t nearly as strong: A batter’s quasi-approach explained only 9.52 percent of the variance in future batting average.
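The walks-minus-strikeouts rate is simple enough to write as a one-line function. The sketch below takes the difference first and then divides by plate appearances; the function name and the season line are made up for illustration.

```python
def bb_minus_k_rate(walks, strikeouts, plate_appearances):
    """Return (BB - K) / PA.

    The result is negative when a hitter strikes out more than he walks.
    """
    return (walks - strikeouts) / plate_appearances


# Hypothetical season line (not a real player): 50 BB, 100 K, 600 PA.
rate = bb_minus_k_rate(50, 100, 600)
print(round(rate, 4))
```

A hitter who walks exactly as often as he strikes out scores zero on this measure, which lines up with the definition of a sound approach used in this piece.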

Using a multiple regression as opposed to a simple linear regression usually yields stronger, more accurate results. Thus, I first tried to improve the BB-K predictor by running a multiple regression with two predictors (BB/PA and K/PA); this helped re-weight the (BB-K)/PA formula into a stronger predictor, as the r-squared improved to 11.53 percent.
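To make the re-weighting idea concrete, here is a rough sketch of a two-predictor least-squares fit in plain Python. It solves the normal equations directly, which is fine for a handful of predictors but is not necessarily how a statistics package would do it. The helper names `fit_ols` and `r_squared` and all of the numbers are hypothetical.

```python
def fit_ols(rows, y):
    """Least-squares fit of y on the columns of rows, plus an intercept.

    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination.
    Returns the coefficients as [intercept, b1, b2, ...].
    """
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * yi for x, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting: move the largest remaining entry into place.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def r_squared(rows, y, coefs):
    """Share of the variance in y explained by the fitted model."""
    preds = [coefs[0] + sum(c * v for c, v in zip(coefs[1:], r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - p) ** 2 for yi, p in zip(y, preds))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical (BB/PA, K/PA) rates in Year X and batting averages in Year X+1.
rates = [(0.08, 0.20), (0.10, 0.15), (0.05, 0.25), (0.12, 0.18), (0.07, 0.12)]
next_ba = [0.262, 0.281, 0.240, 0.270, 0.295]

coefs = fit_ols(rates, next_ba)
print(coefs, r_squared(rates, next_ba, coefs))
```

Fitting separate coefficients for BB/PA and K/PA lets the data choose the weights, whereas (BB-K)/PA forces them to be equal and opposite. That is why the two-predictor model can never fit the same sample worse.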

That number still wasn’t nearly as strong as simply using batting average, but I figured there was a possibility that combining approach and batting average in a multiple regression would improve the results.

I found that when walks, strikeouts and batting average are combined as three separate predictors, the overall r-squared improved from 18.1 percent (batting average alone) to 21.4 percent.

Interestingly, walk percentage was not a significant predictor once batting average was included in the multiple regression equation; thus, walks were factored out and strikeouts and batting average were tested by themselves, which resulted in an r-squared of 21.3 percent, essentially unchanged.

Finally, I decided to see if a regressed version of BABIP would weed out some of the variability in batting average in the predictor season and improve the model.

To go about doing this, I took each batter’s BABIP in year X and used just 25 percent of that number, while regressing the other 75 percent back to the league average, because it takes around two to two and a half seasons for BABIP to stabilize for hitters. I then combined this regressed BABIP with strikeouts and batting average.
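The regression-to-the-mean step described above is just a weighted average. Here is a minimal sketch, assuming a single league-average BABIP; the function name is hypothetical and the .300 league figure is a round number chosen for illustration, not the value used in the study.

```python
def regress_babip(observed, league_average, weight=0.25):
    """Shrink an observed BABIP toward the league average.

    weight is the share of the observed value that is kept (25 percent here);
    the remaining share is replaced by the league average.
    """
    return weight * observed + (1.0 - weight) * league_average


# Middlebrooks' .357 BABIP comes from this article; the .300 league average
# is an assumed round number for illustration only.
print(round(regress_babip(0.357, 0.300), 3))
```

In this example the arithmetic is 0.25 × .357 + 0.75 × .300, or roughly .314, so most of the gap between the hitter and the league disappears.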

All three predictors were significant, but the BABIP-batting average model caused a slight decrease in overall r-squared (20.8 percent).

Thus, the most predictive model that I found, given the few statistics tested, was simply a combination of strikeout percentage and batting average.

### Issues with the study

This study was nowhere near perfect, and I actually have a few issues with my own test.

First, I’m not sure that using walk percentage, or even walks minus strikeouts, to predict batting average is the best idea.

A plate appearance that results in a strikeout is considered an at-bat; thus, it goes in the denominator of batting average, while a plate appearance that results in a walk does not.
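A quick sketch shows why this matters. It uses a simplified definition in which plate appearances are just at-bats plus walks (ignoring hit-by-pitches, sacrifices and so on), and the season line is hypothetical.

```python
def batting_average(hits, at_bats):
    """Hits per at-bat; walks never enter this calculation."""
    return hits / at_bats


def hits_per_pa(hits, at_bats, walks):
    """Hits per plate appearance.

    Simplifying assumption: PA = AB + BB (no HBP, sacrifices, etc.).
    """
    return hits / (at_bats + walks)


# Hypothetical hitter: 150 hits in 500 at-bats.
print(batting_average(150, 500))      # baseline
print(batting_average(150, 510))      # ten extra strikeouts add ten at-bats
print(batting_average(150, 500))      # ten extra walks add no at-bats
print(hits_per_pa(150, 500, 10))      # but walks do count against hits per PA
```

Ten extra strikeouts pull the batting average down, while ten extra walks leave it untouched. With hits per plate appearance, both outcomes lower the rate equally.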

I think it’s possible that if I had tested the results against hits per plate appearance instead of hits per at-bat, walks may have become significant. I’m not sure that hits per PA is the best thing to test, though. A hitter who does a good job of controlling the strike zone—for instance Ian Kinsler—tends to not have a great batting average (or hits per PA), but more than makes up for that by having a high on-base percentage.

The minimum plate appearance threshold that I set also could have caused problems. My criteria required a hitter to have at least 350 plate appearances in consecutive seasons. In the abstract, it makes sense that if a batter is allowed to have more than 700 plate appearances across two big league seasons, he probably has the ability (or his individual approach is good enough) to hit at the major league level.

This factor brought me to a possible idea for a follow-up article. As I mentioned earlier, Middlebrooks’ approach led scouts to be concerned about his batting average in the bigs. It’s possible that if a similar test were done for just rookies, walks and strikeouts might become more predictive.

It makes sense to me that it’s more important to project how well a rookie’s approach will translate into success than how well a veteran’s approach will, given that we’ve already seen the veteran hit in the bigs.

Strikeouts are rising in baseball. The number of hitters who walk more than they strike out is dwindling. The number of qualified hitters who walk more than they strike out has not broken double digits since 2009 (just four in 2012).

There’s a possibility that a batter’s ability to not strike out is becoming more important, in terms of hitting, than walks, given this rise. That, of course, is simply a postulation, but there’s a chance it is having some effect on these results and is something to consider.

If a conclusion to this piece is necessary, I’d say that these results indicate a hitter’s ability to either make contact or not chase pitches (i.e., not strike out) is more important than the ability to take a walk. This makes some intuitive sense, as more contact and fewer strikeouts should lead to more hits, and walks should be more highly correlated with on-base percentage.

However, I’m surprised that walks weren’t a more important factor in predicting future batting average.

One final thing to consider: We all should know by now that projecting weighted on-base average (wOBA) or on-base percentage is more valuable than simply looking at batting average.

**References & Resources**

All data comes courtesy of FanGraphs


I would recommend holding off on using regressions for baseball analysis until you take Econometrics. An r-squared of 18% also means that 82% (aka a majority) of the variation in batting average has nothing to do with previous year’s batting average. It would also be helpful to know the coefficients, standard deviations, and confidence intervals for your regressions.

I agree that you don’t want to look only at BA. There may well be a significant correlation between BB% and ISO.