Expanding on Four Factors: Fun(?) With Math

I feel like my Four Factors series, which has covered a few interesting hitters, has gone over well here. I felt it could be more than a simple tool or rule of thumb, however, and I believe I’ve taken the first step toward making it so. The following is more of a reference post than an analytical post, and as the title suggests, there will be some math involved. What I put forth here also suggests a possible fifth (or sixth) factor, albeit one (or two) that appears to be much less important than the other four.

For the Four Factors to be more than an at-a-glance hitter evaluation tool, they have to mean something in terms of runs. For hitting, that means wOBA. A bridge between the Four Factors and wOBA would let us measure how much Ryan Howard’s drop in strikeouts would matter if his power returns to its previous form, or how much better my boy George Kottaras would be with a respectable BABIP instead of his current .195 mark.

Of course, BB% and K% give us walks and strikeouts as raw totals or, even better, as per-plate-appearance rates. Figuring out the other four events (1B, 2B, 3B, and HR) would give us a working estimate of wOBA. The other two factors make this possible: POWH (XB/H), the version of Isolated Power that counts extra bases per hit rather than per at-bat, and BABIP ((H-HR)/BIP). These four relatively isolated hitting skills can account for almost all of a player’s overall line.

The tricky part is estimating home runs. Of our four stats, only two deal with contact, and one of those, BABIP, covers only balls in play, which excludes home runs. But we can still solve for home runs, as follows.

Start from POWH*BABIP = (XB/H)*((H-HR)/BIP), where BIP = PA-BB-SO-HR, the number of balls in play. Now it’s time for some “fancy algebra,” which means a lot of steps. Note that I set PA = 1 throughout, so every quantity below is a per-plate-appearance rate rather than a raw total.
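As a minimal sketch of these definitions, the Python function below turns a raw season line into the four factors on a per-PA basis. The `SeasonLine` class and its field names are hypothetical, and like the post it ignores HBP, sacrifices, and intentional walks, so BIP is simply PA-BB-SO-HR.

```python
from dataclasses import dataclass


@dataclass
class SeasonLine:
    """Hypothetical raw counting stats for one hitter."""
    pa: int
    bb: int
    so: int
    singles: int
    doubles: int
    triples: int
    hr: int


def four_factors(line: SeasonLine) -> dict:
    """Return BB and SO per PA, plus POWH = XB/H and BABIP = (H-HR)/BIP.

    Assumes BIP = PA - BB - SO - HR, as in the post (no HBP/SF/SH).
    """
    h = line.singles + line.doubles + line.triples + line.hr
    xb = line.doubles + 2 * line.triples + 3 * line.hr  # bases beyond first on hits
    bip = line.pa - line.bb - line.so - line.hr
    return {
        "BB": line.bb / line.pa,
        "SO": line.so / line.pa,
        "POWH": xb / h,
        "BABIP": (h - line.hr) / bip,
    }


# A made-up line, not a real player: 600 PA, 60 BB, 120 SO, 90/30/3/20 hits.
example = SeasonLine(pa=600, bb=60, so=120, singles=90, doubles=30, triples=3, hr=20)
print(four_factors(example))
```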

Expanding:
POWH*BABIP = (XB/H)*(H/BIP)-(XB/H)(HR/BIP)

Cancelling:
POWH*BABIP = (XB/BIP)-(XB/H)(HR/BIP)

Simplifying:
POWH*BABIP = (XB/BIP)-(POWH)(HR/BIP)

Expanding the XB term:
POWH*BABIP = (2B+2*3B+3*HR)/BIP - (POWH)(HR/BIP)

Combining like terms:
POWH*BABIP = (2B+2*3B+(3-POWH)*HR)/BIP

Expanding the BIP term:
POWH*BABIP = (2B+2*3B+(3-POWH)*HR)/(1-BB-SO-HR)

Multiply both sides by expanded BIP term:
POWH*BABIP*(1-BB-SO-HR) = (2B+2*3B+(3-POWH)*HR)

Distribute on left side (note that (1-BB-SO) is the percentage of PAs ending in contact):
POWH*BABIP*(1-BB-SO)-POWH*BABIP*HR = 2B+2*3B+(3-POWH)*HR

Add POWH*BABIP*HR to both sides:
POWH*BABIP*(1-BB-SO) = 2B + 2*3B + (3-POWH)*HR + POWH*BABIP*HR
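Every step so far is exact algebra, so this line can be checked against any set of stats. The sketch below uses Python’s `fractions` module so the comparison is exact rather than subject to floating-point rounding; the counts in the example are hypothetical.

```python
from fractions import Fraction


def check_identity(pa, n_bb, n_so, n_1b, n_2b, n_3b, n_hr):
    """Return both sides of
    POWH*BABIP*(1-BB-SO) = 2B + 2*3B + (3-POWH)*HR + POWH*BABIP*HR
    with every event converted to an exact per-PA rate."""
    bb, so = Fraction(n_bb, pa), Fraction(n_so, pa)
    b1, b2, b3, hr = (Fraction(n, pa) for n in (n_1b, n_2b, n_3b, n_hr))

    h = b1 + b2 + b3 + hr
    powh = (b2 + 2 * b3 + 3 * hr) / h
    babip = (h - hr) / (1 - bb - so - hr)  # BIP per PA = 1 - BB - SO - HR

    lhs = powh * babip * (1 - bb - so)
    rhs = b2 + 2 * b3 + (3 - powh) * hr + powh * babip * hr
    return lhs, rhs


lhs, rhs = check_identity(600, 60, 120, 90, 30, 3, 20)
print(lhs == rhs)  # prints True
```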

Now we’re stuck: we have one equation with three unknowns (2B, 3B, HR). But two more leaguewide ratios let us get around this: the leaguewide rates of 2B/HR (call this “X”) and 3B/HR (call this “Y”). Substituting 2B = X*HR and 3B = Y*HR reduces the equation to a solvable form. The cost is that we assume every hitter has the league’s mix of extra-base hits, which causes some issues at the margins. These issues, however, tend to be relatively small. Our equation now reads as follows:

POWH*BABIP*(1-BB-SO) = X*HR + 2*Y*HR + (3-POWH)*HR + POWH*BABIP*HR

Factor out HR:
POWH*BABIP*(1-BB-SO) = HR*(X+2Y+(3-POWH)+POWH*BABIP)

Divide both sides by the bracketed term:
HR = (POWH*BABIP*(1-BB-SO))/(X+2Y+(3-POWH)+POWH*BABIP)
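Here is the HR formula as a small Python function, a sketch for experimenting. X and Y must be supplied by the caller (the post uses leaguewide 2B/HR and 3B/HR). As a sanity check, plugging in a hitter’s own 2B/HR and 3B/HR should recover his actual HR rate exactly, since no league assumption is then being made; the example hitter is hypothetical.

```python
def estimate_hr(bb, so, powh, babip, x, y):
    """HR per PA from the four factors.

    bb, so: walks and strikeouts per PA
    powh:   XB/H
    babip:  (H-HR)/BIP
    x, y:   assumed 2B/HR and 3B/HR ratios (leaguewide in the post)
    """
    contact = 1 - bb - so  # share of PAs ending in contact
    return powh * babip * contact / (x + 2 * y + (3 - powh) + powh * babip)


# Hypothetical hitter: 600 PA, 60 BB, 120 SO, 90 1B, 30 2B, 3 3B, 20 HR.
bb, so, powh, babip = 60 / 600, 120 / 600, 96 / 143, 123 / 400

# With his own ratios (30/20 and 3/20) the formula gives back 20/600 HR per PA.
print(estimate_hr(bb, so, powh, babip, x=30 / 20, y=3 / 20))
```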

With HR figured out, the rest of the events fall into place. Since X = 2B/HR and Y = 3B/HR,
2B = HR*X and
3B = HR*Y
Finally, 1B = BABIP*(1-BB-SO-HR)-2B-3B, i.e., the in-play hits left over after doubles and triples.
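Putting the pieces together, a minimal sketch of the full breakdown from the four factors to per-PA rates of each hit type might look like this. The ratios X and Y are inputs; the example below feeds in a hypothetical hitter’s own ratios, so the estimate should reproduce his actual counts.

```python
def estimate_events(bb, so, powh, babip, x, y):
    """Estimate 1B, 2B, 3B, HR per PA from the four factors.

    x, y are assumed 2B/HR and 3B/HR ratios (leaguewide in the post).
    """
    contact = 1 - bb - so
    hr = powh * babip * contact / (x + 2 * y + (3 - powh) + powh * babip)
    doubles = x * hr
    triples = y * hr
    singles = babip * (contact - hr) - doubles - triples  # remaining in-play hits
    return {"1B": singles, "2B": doubles, "3B": triples, "HR": hr}


# Hypothetical hitter (600 PA: 60 BB, 120 SO, 90 1B, 30 2B, 3 3B, 20 HR)
# evaluated with his own ratios X = 30/20 and Y = 3/20.
events = estimate_events(0.1, 0.2, 96 / 143, 123 / 400, x=1.5, y=0.15)
print({k: round(v * 600, 6) for k, v in events.items()})
```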

There it is. For 2009 hitters with at least 50 plate appearances, this formula’s wOBA estimates are off by .0001 on average, with a standard deviation of .010. If the errors are roughly normal, that means about 68% of players are projected within 10 points of their actual wOBA simply using the league-wide XBH distribution, and about 95% within 20 points. What that tells me is that there really is a fifth factor (or fifth and sixth?): XBH distribution. Adding it would make the estimate exact, naturally, since at that point we would simply be computing wOBA from the player’s full line. Still, XBH distribution appears to be far less important than the other four factors.
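To turn the estimated events into wOBA and measure the error against a hitter’s actual line, something like the sketch below would work. The linear weights (.72 BB, .90 1B, 1.24 2B, 1.56 3B, 1.95 HR) are an assumption chosen to be consistent with the coefficients in Eric’s comment below, not necessarily the spreadsheet’s exact values, and HBP is ignored. The hitter and the ratios X = 1.6, Y = .15 are illustrative.

```python
from statistics import mean, stdev

# Assumed linear weights per PA (HBP ignored); substitute your own.
WEIGHTS = {"BB": 0.72, "1B": 0.90, "2B": 1.24, "3B": 1.56, "HR": 1.95}


def estimate_woba(bb, so, powh, babip, x, y):
    """wOBA per PA rebuilt from the four factors and assumed 2B/HR, 3B/HR ratios."""
    contact = 1 - bb - so
    hr = powh * babip * contact / (x + 2 * y + (3 - powh) + powh * babip)
    doubles, triples = x * hr, y * hr
    singles = babip * (contact - hr) - doubles - triples
    events = {"BB": bb, "1B": singles, "2B": doubles, "3B": triples, "HR": hr}
    return sum(WEIGHTS[k] * v for k, v in events.items())


def actual_woba(pa, bb, singles, doubles, triples, hr):
    """wOBA per PA from raw counts, with the same assumed weights."""
    events = {"BB": bb, "1B": singles, "2B": doubles, "3B": triples, "HR": hr}
    return sum(WEIGHTS[k] * v for k, v in events.items()) / pa


def summarize(errors):
    """Mean and sample standard deviation of (estimate - actual) across hitters."""
    return mean(errors), stdev(errors)


# Hypothetical hitter: 600 PA, 60 BB, 120 SO, 90 1B, 30 2B, 3 3B, 20 HR.
est = estimate_woba(0.1, 0.2, 96 / 143, 123 / 400, x=1.6, y=0.15)
act = actual_woba(600, 60, 90, 30, 3, 20)
print(f"estimated {est:.4f}, actual {act:.4f}, error {est - act:+.4f}")
```

With one error per hitter collected in a list (at least two hitters), `summarize(errors)` returns the kind of mean and standard deviation quoted above.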

What are the uses of this? Ideally, as above, it isolates these factors so we can describe how a change in any one of them would affect a player’s production at the plate. Perhaps there are more uses; perhaps its use is limited. Here’s a spreadsheet with the 2009 players and their full Four Factors lines. Feel free to play around with it, and please let me know of any suggestions in the comments.

Later this afternoon, we’ll take a look at some examples of how I think this could be useful.






Jack Moore's work can be seen at VICE Sports and anywhere else you're willing to pay him to write. Buy his e-book.


Nick Fleder
6 years 1 month ago

I’m curious… what do you think of George Kottaras’s projections for next year?

Seems like the guy is all or nothing, but as you say, a .195 BABIP, among other things, suggests that luck isn’t on his side…

Craig
6 years 1 month ago

I’m very excited for the afternoon part, and not just because I love algebra. I’ve been hoping someone would take a look at something similar to this.

Eric
6 years 1 month ago

Using the same wOBA weights as the spreadsheet, I obtain the following expression for wOBA:

wOBA = .72*BB + .9*BABIP*(1-BB-SO) + HR*(1.95+.34*X+.66*Y-.9*BABIP)

where HR is a function of POWH, BABIP, BB, and SO:

HR = [POWH*BABIP*(1-BB-SO)] / [X+2Y+3+(BABIP-1)*POWH]
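This closed form can be checked against an event-by-event sum. In the sketch below the weights (.72, .90, 1.24, 1.56, 1.95) are assumptions implied by the coefficients above (.34 = 1.24 - .90 and .66 = 1.56 - .90), and the inputs describe hypothetical hitters.

```python
def hr_rate(bb, so, powh, babip, x, y):
    """HR = POWH*BABIP*(1-BB-SO) / (X + 2Y + 3 + (BABIP-1)*POWH)."""
    return powh * babip * (1 - bb - so) / (x + 2 * y + 3 + (babip - 1) * powh)


def woba_closed_form(bb, so, powh, babip, x, y):
    """.72*BB + .9*BABIP*(1-BB-SO) + HR*(1.95 + .34X + .66Y - .9*BABIP)."""
    hr = hr_rate(bb, so, powh, babip, x, y)
    return (0.72 * bb + 0.9 * babip * (1 - bb - so)
            + hr * (1.95 + 0.34 * x + 0.66 * y - 0.9 * babip))


def woba_event_sum(bb, so, powh, babip, x, y):
    """Same quantity, summing weighted 1B, 2B, 3B, HR explicitly."""
    hr = hr_rate(bb, so, powh, babip, x, y)
    doubles, triples = x * hr, y * hr
    singles = babip * (1 - bb - so - hr) - doubles - triples
    return 0.72 * bb + 0.90 * singles + 1.24 * doubles + 1.56 * triples + 1.95 * hr


args = dict(bb=0.1, so=0.2, powh=0.67, babip=0.31, x=1.6, y=0.15)
print(woba_closed_form(**args), woba_event_sum(**args))  # equal up to float rounding
```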

We can therefore take the partial derivative of wOBA with respect to any of the four variables, holding the other three fixed; for instance, with respect to POWH:

dwOBA/dPOWH = dHR/dPOWH * (1.95+.34*X+.66*Y-.9*BABIP)

and

dHR/dPOWH = [(X+2Y+3+(BABIP-1)*POWH)*BABIP*(1-BB-SO) - POWH*BABIP*(1-BB-SO)*(BABIP-1)] / [X+2Y+3+(BABIP-1)*POWH]^2
= [(X+2Y+3)*BABIP*(1-BB-SO)] / [X+2Y+3+(BABIP-1)*POWH]^2

Using 1.6 for X and .15 for Y, we get:

dHR/dPOWH = [4.9*BABIP*(1-BB-SO)] / [4.9+(BABIP-1)*POWH]^2

and

dwOBA/dPOWH = [(2.593 - .9*BABIP)*4.9*BABIP*(1-BB-SO)] / [4.9+(BABIP-1)*POWH]^2

Plugging in the last line of the Google spreadsheet (playerid 1624), we obtain a value of .108 for dwOBA/dPOWH, and a numerical check agrees to the number of digits displayed.

Put simply, a raw change of .100 in POWH for this hitter will result in a raw change of about .011 in wOBA.
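A quick way to double-check a derivative like this is to compare it with a central finite difference. The sketch below does that for dwOBA/dPOWH using X = 1.6 and Y = .15 from above; the constant 1.95 + .34X + .66Y is computed rather than hard-coded, and the hitter’s inputs are made up.

```python
X, Y = 1.6, 0.15
K = X + 2 * Y + 3                 # 4.9 for these X, Y
C = 1.95 + 0.34 * X + 0.66 * Y    # HR coefficient before the -.9*BABIP term


def woba(bb, so, powh, babip):
    """Closed-form wOBA estimate with the comment's weights and X, Y."""
    hr = powh * babip * (1 - bb - so) / (K + (babip - 1) * powh)
    return 0.72 * bb + 0.9 * babip * (1 - bb - so) + hr * (C - 0.9 * babip)


def dwoba_dpowh(bb, so, powh, babip):
    """Analytic derivative: (C - .9B)*K*B*(1-BB-SO) / (K + (B-1)*POWH)^2."""
    return ((C - 0.9 * babip) * K * babip * (1 - bb - so)
            / (K + (babip - 1) * powh) ** 2)


def numeric_dpowh(bb, so, powh, babip, h=1e-6):
    """Central finite difference in POWH, other three factors held fixed."""
    return (woba(bb, so, powh + h, babip) - woba(bb, so, powh - h, babip)) / (2 * h)


hitter = (0.1, 0.2, 0.67, 0.31)  # hypothetical BB, SO, POWH, BABIP
print(dwoba_dpowh(*hitter), numeric_dpowh(*hitter))
```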


We can also pull out how the other three factors interact with POWH here: an increase in POWH, with the other three held constant, will have a different effect on wOBA depending on what those constant values are.

The least complicated are BB and SO. Because dwOBA/dPOWH is proportional to (1-BB-SO), a higher base rate of BB (or SO) necessarily means that the player will see less impact on wOBA from an increase in POWH. This makes sense, since POWH can only help on a batted ball.

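BABIP is significantly more complicated. Writing B for BABIP, in the numerator we have:

(1-BB-SO)*(UB - VB^2)

where U = (1.95+.34*X+.66*Y)*(X+2Y+3) and V = .9*(X+2Y+3) are positive numbers. Because U/V = (1.95+.34*X+.66*Y)/.9 is at least 1.95/.9, about 2.17, we have U > 2V, so the slope of UB - VB^2, which is U - 2VB, stays positive for every B between 0 and 1. A higher B therefore means a higher numerator. In the denominator, we have (B-1), and again because B can never be greater than 1, a higher B means subtracting less of POWH (a positive number), which means a higher denominator. A higher numerator and a higher denominator mean that no general conclusion can be drawn regarding the resulting fraction; it depends on the relative sizes of X, Y, POWH, and B itself. The (1-BB-SO) factor only scales the result and cannot change its direction.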


Taken together, it is probably best to do any further analysis numerically, as the derivatives are quite unwieldy.
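Following that suggestion, a minimal numerical approach is to take central differences of the estimated wOBA with respect to each factor while holding the others fixed. This is a sketch: the weights and the X = 1.6, Y = .15 ratios come from this comment, and the hitter is hypothetical.

```python
X, Y = 1.6, 0.15  # league 2B/HR and 3B/HR used in the comment


def woba(factors):
    """Estimated wOBA per PA from a dict of the four factors."""
    bb, so, powh, babip = (factors[k] for k in ("BB", "SO", "POWH", "BABIP"))
    contact = 1 - bb - so
    hr = powh * babip * contact / (X + 2 * Y + 3 + (babip - 1) * powh)
    return (0.72 * bb + 0.9 * babip * contact
            + hr * (1.95 + 0.34 * X + 0.66 * Y - 0.9 * babip))


def partials(factors, h=1e-6):
    """Central-difference partial derivative of wOBA for each factor."""
    out = {}
    for name in factors:
        up, down = dict(factors), dict(factors)
        up[name] += h
        down[name] -= h
        out[name] = (woba(up) - woba(down)) / (2 * h)
    return out


hitter = {"BB": 0.10, "SO": 0.20, "POWH": 0.67, "BABIP": 0.31}  # hypothetical
for name, slope in partials(hitter).items():
    # slope * 0.010 approximates the wOBA change from a 10-point rise in that factor
    print(f"{name:5s} {slope:+.3f}")
```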
