Park effects and batted ball types

One of the best things about Gameday is the ability to download the data and do your own research and analysis. The possibilities are almost limitless, and Hardball Times readers will be familiar with the many uses, including pitching, hitting and fielding analysis. One of my favorite uses is to create a fielding independent measure of pitch quality.

By now, you’ve probably seen Graham MacAree’s tRA at Fangraphs or StatCorner. It’s based on batted ball types, from ground ball to fly ball. It’s a great tool to use, among many other quality measures of pitching, including xFIP.

I like to use something similar to tRA, which I’ve been calling rv100E. That refers to “expected” run value per 100 pitches thrown, as in runs allowed, or saved, above average. Linear weights are used, based on hit type (single to home run) or out, as well as balls and strikes. The run expectancies are count adjusted, allowing for measurement of a particular pitch, or even a particular location—or both.
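As a rough illustration of the "per 100 pitches" scaling, here is a minimal Python sketch. The function name `rv100` and the sample run values are hypothetical, and the sign convention (positive means runs allowed above average) is an assumption; the actual count-adjusted run values behind rv100E are not listed in this article.

```python
def rv100(pitch_run_values):
    """Run value per 100 pitches: total run value / pitches thrown * 100."""
    if not pitch_run_values:
        raise ValueError("need at least one pitch")
    return 100.0 * sum(pitch_run_values) / len(pitch_run_values)


# Made-up run values for five pitches (assumed: positive = runs allowed).
sample = [0.04, -0.05, -0.29, 0.49, -0.05]
print(round(rv100(sample), 2))
```

The same function works on any subset of pitches, such as one pitch type or one location, as long as each pitch carries its own run value.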

Wait a second. I said rv100E is based on batted ball type, but the linear weights are based on hit type. Hit types are probabilistic, and are distributed based on batted ball type.

A table will explain it better. Each batted ball type has its own range of outcomes, each occurring with a particular frequency. Zero, one or more outs can also be recorded on a play, even when the batter reaches safely, and those are counted, too. Here are the rates at which each batted ball type turned into each event type, along with the linear weight (LW) associated with each event. Values are for 2009 only.

             Home Run  Single  Double  Triple  Out
Line Drive   .022      .524    .174    .015    .224
Ground Ball  .000      .219    .018    .001    .693
Fly Ball     .119      .057    .082    .013    .597
Pop Up       .000      .013    .014    .000    .975
LW           1.468     .489    .768    1.052   -.289
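To make the link between batted ball type and run value concrete, here is a minimal Python sketch that multiplies each row of the 2009 table by the linear weights and sums the result. The rates and weights are copied from the table; the variable and function names are my own, and this is one plausible reading of how the expected value is assembled, not necessarily the exact rv100E calculation.

```python
# Outcome rates per batted ball type, in the order HR, 1B, 2B, 3B, Out,
# copied from the 2009 table.
RATES = {
    "Line Drive":  [0.022, 0.524, 0.174, 0.015, 0.224],
    "Ground Ball": [0.000, 0.219, 0.018, 0.001, 0.693],
    "Fly Ball":    [0.119, 0.057, 0.082, 0.013, 0.597],
    "Pop Up":      [0.000, 0.013, 0.014, 0.000, 0.975],
}
# Linear weights in the same order (the LW row of the table).
LW = [1.468, 0.489, 0.768, 1.052, -0.289]


def expected_run_value(rates, weights):
    """Sum of (outcome rate * linear weight) over all event types."""
    return sum(r * w for r, w in zip(rates, weights))


EXPECTED = {bb: expected_run_value(r, LW) for bb, r in RATES.items()}
for bb, value in EXPECTED.items():
    print(f"{bb:<12}{value:+.3f}")
```

With these inputs a line drive comes out at roughly +0.37 runs, a fly ball at about +0.11, a ground ball at about -0.08 and a pop-up at about -0.26.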

This is a nice start, but the data provided by Gameday, as far as batted ball type, are entered by the stringers in the press box—free-lancers hired for this purpose. And not all of them classify hits the same way.

Dealing with park effects in layers

Consider this problem. A line drive in Petco Park may be worth more than one in Wrigley Field. Or, more likely, it may have a different range of outcomes. I can envision more line drive home runs in Wrigley, but fewer triples. At the same time, I’d expect a different range of outcomes with fly balls. More outs and doubles in Petco, more home runs in Wrigley. In that event, I need to apply different weights based on the park.

Or do I? Do I really care if a pitcher gave up a line drive in Citi Field rather than Safeco? Either that pitch was hit hard, or it wasn’t, and I want a park neutral value assigned. So, super duper, I don’t need to worry about park effects. I treat all fly balls as equals, assign the league average home run rate (indirectly shown above) and move on.

Or do I? Don’t forget about the stringers. There’s a “park effect” I care about. Let me show you what I mean. Here are the ratios for fly balls to line drives, by park, in 2009.

home FB:LD
ana 2.37
ari 1.78
atl 2.03
bal 1.62
bos 2.08
cha 1.81
chn 1.90
cin 1.31
cle 1.47
col 1.25
det 1.20
flo 1.28
hou 1.99
kca 1.65
lan 1.63
mil 1.70
min 2.10
nya 1.31
nyn 1.29
oak 1.36
phi 1.52
pit 1.68
sdn 1.52
sea 1.25
sfn 1.50
sln 1.14
tba 1.16
tex 1.28
tor 1.51
was 1.24

So, do the Mariners hit a lot of line drives, or do the stringers like to tag hits as line drives? Or should we blame their pitchers?

One way to tease out the team itself from the stringers is to apply the park correction methodology and find the “park effect” on batted ball classification.
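One common way to build such a factor is to compare how often a batted ball type is recorded in a team's home games with how often it is recorded in that same team's road games, counting both teams' batters in each sample. The Python sketch below assumes that home/road approach, which may differ from the exact method used for the numbers in this article; the function name and the counts are made up for illustration.

```python
def classification_factor(home_type, home_total, road_type, road_total):
    """Ratio of a batted ball type's share at home to its share on the road.

    A value above 1 means the type is recorded more often in the home park
    than in the same team's road games.
    """
    if min(home_total, road_total) <= 0 or road_type <= 0:
        raise ValueError("totals and road count must be positive")
    home_rate = home_type / home_total
    road_rate = road_type / road_total
    return home_rate / road_rate


# Hypothetical counts: 900 line drives among 4,400 batted balls at home,
# 800 among 4,400 on the road (both teams' batters in each sample).
factor = classification_factor(900, 4400, 800, 4400)
print(round(factor, 3))
```

Because the same hitters and pitchers appear in both samples, the ratio mostly isolates what changes with the venue: the park itself and the person scoring the game.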

Stringer effect

In reality, it isn’t just line drives and flies we have to worry about. On a base hit, when does a liner become a grounder? How likely is a home run to be a line drive or a fly ball?

This table keeps line drives and fly balls separate from their home run counterparts, while the above did not.

ana 0.98 0.75 0.23 1.21 1.20 1.02
ari 1.01 0.88 0.42 1.09 1.12 0.93
atl 0.98 0.83 0.25 1.19 0.87 1.00
bal 1.00 1.00 0.47 1.03 1.23 0.87
bos 0.97 0.86 0.00 1.12 1.07 1.10
cha 0.96 0.91 0.56 1.09 1.34 1.07
chn 0.99 0.92 0.43 1.06 1.22 0.99
cin 0.98 1.16 1.00 0.93 1.14 0.93
cle 1.07 0.97 0.52 0.97 0.69 0.99
col 1.01 1.18 0.96 0.92 0.93 0.80
det 0.98 1.12 4.02 0.92 0.74 1.10
flo 1.01 1.14 2.21 0.90 0.89 0.99
hou 1.01 0.85 0.42 1.07 1.10 1.05
kca 1.04 0.97 0.22 1.06 0.84 0.76
lan 1.04 0.86 0.66 1.06 0.87 1.03
mil 0.99 0.92 1.52 1.06 1.00 1.04
min 1.00 0.81 0.30 1.12 1.13 1.08
nya 1.04 1.05 2.06 0.87 1.23 0.96
nyn 0.97 1.00 0.99 0.98 1.23 1.18
oak 1.00 1.10 1.24 0.92 0.98 1.03
phi 1.02 1.06 0.78 0.93 1.17 0.94
pit 1.01 0.95 0.48 0.99 1.15 1.04
sdn 1.00 1.05 0.47 1.00 0.79 1.00
sea 0.99 1.20 1.68 0.89 0.84 1.03
sfn 0.99 0.99 1.13 1.00 0.92 1.14
sln 1.08 1.13 1.76 0.84 0.66 0.87
tba 0.95 1.19 2.44 0.88 0.78 1.26
tex 0.96 1.26 1.45 0.95 1.02 0.78
tor 1.02 0.93 1.24 0.98 1.19 1.03
was 0.96 1.08 1.11 1.00 0.82 1.13

Dizzy yet? I am. If a number above is less than one, it appears the stringer in that park is less likely than average to classify a batted ball as that type. The home run numbers carry a caveat, since a real park effect is mixed in with the stringer effect.

Next steps

Now that I’ve crunched some numbers, there are a few things left to do. First, expose this to public scrutiny to flush out issues with my methodology. At the same time, crowdsource the application of this information. Based on a given park, and a stringer’s classification, how would you distribute the linear weights for hits and outs? In other words, how should rv100E work?

References & Resources
Gameday data from MLBAM
Linear weights calculated using Tom Tango’s tool
All math errors and other brain cramps by the author, but he’ll blame the editor

Colin Wyers

What we’re really after here is whether the ball is being hit differently in these parks or simply being scored differently. I think this could be answered with Hit F/X pretty easily, although I don’t know if the data we have from the conference is large enough to draw any definite conclusions for out-of-sample data.


Here is an easy way to find out…watch the game live.

Of course none of this could ever be settled until you standardize what a line drive is, what a fly ball is, what a pop-up is and what a grounder is.  And is that possible?  There is always going to be subjectivity in that.