I feel like my Four Factors series, which has covered a few interesting hitters, has gone over well here. I felt that it could be more, however, and I believe I’ve taken the first step toward making it more than a simple tool or rule of thumb. The following is more of a reference post than an analytical post, and as the title suggests, there will be some math involved. What I put forth here also suggests a possible fifth (or sixth) factor, albeit one (or two) that appears to be much less important than the other four.
For the Four Factors to be useful as more than an at-a-glance hitter evaluation tool, they would have to mean something in terms of runs. For hitting, that means wOBA. A bridge between the Four Factors and wOBA would mean that we could find how meaningful it is that Ryan Howard has seen his strikeouts drop if his power returns to previous form, or how much better my boy George Kottaras would be with a respectable BABIP instead of his current .195 mark.
Of course, BB% and K% can give us raw (or, even better, per plate appearance) numbers for walks and strikeouts. Working out the other four events (1B, 2B, 3B, and HR) would give us a working estimate of wOBA. This can be done using the other two factors: POWH (XB/H), the version of Isolated Power that only looks at slugging percentage on hits, and BABIP ((H-HR)/BIP). These four relatively isolated hitting skills can almost completely account for a player’s overall line.
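As a rough illustration of those definitions, here is a minimal Python sketch that computes the Four Factors from a raw batting line. The `BattingLine` container, the `four_factors` function, and the sample numbers are all hypothetical. The sketch also assumes K% is measured per plate appearance and that every plate appearance ends in a walk, a strikeout, or contact, so BIP = PA-BB-SO-HR and BABIP = (H-HR)/BIP.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """Raw counting stats for one hitter (hypothetical container)."""
    pa: int
    bb: int
    so: int
    singles: int
    doubles: int
    triples: int
    hr: int


def four_factors(line: BattingLine) -> dict:
    """Return BB%, K%, POWH and BABIP for a batting line.

    Simplifying assumption: HBP, sacrifices, etc. are ignored, so
    balls in play are PA - BB - SO - HR.
    """
    hits = line.singles + line.doubles + line.triples + line.hr
    # Extra bases: one per double, two per triple, three per home run.
    extra_bases = line.doubles + 2 * line.triples + 3 * line.hr
    bip = line.pa - line.bb - line.so - line.hr
    return {
        "BB%": line.bb / line.pa,
        "K%": line.so / line.pa,
        "POWH": extra_bases / hits,
        "BABIP": (hits - line.hr) / bip,
    }


# Made-up batting line, for illustration only.
example = BattingLine(pa=600, bb=60, so=120, singles=90, doubles=30, triples=3, hr=24)
factors = four_factors(example)
for name, value in factors.items():
    print(f"{name}: {value:.3f}")
```

A real implementation would likely need to decide how to treat hit-by-pitches and sacrifice flies, which this sketch leaves out.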
The tricky part comes with estimating home runs, because of the four stats we have, only two of them deal with contact, and of those, one deals with balls in play, excluding home runs. But we can solve for home runs as follows. Follow the jump for the math.
POWH*BABIP = (XB/H)*((H-HR)/BIP) where BIP = PA-BB-SO-HR, or balls in play. Now, it’s time to use some “fancy algebra,” which means a lot of steps. Note that I use 1 in place of PA, since I prefer per-plate-appearance rates to raw totals; from here on, BB, SO, 2B, 3B, and HR are all rates per plate appearance.
POWH*BABIP = (XB/H)*(H/BIP)-(XB/H)(HR/BIP)
POWH*BABIP = (XB/BIP)-(XB/H)(HR/BIP)
POWH*BABIP = (XB/BIP)-(POWH)(HR/BIP)
Expanding the XB term:
POWH*BABIP = (2B+2*3B+3*HR)/BIP-(POWH)(HR/BIP)
Combining like terms:
POWH*BABIP = (2B+2*3B+(3-POWH)*HR)/BIP
Expanding the BIP term:
POWH*BABIP = (2B+2*3B+(3-POWH)*HR)/(1-BB-SO-HR)
Multiply both sides by expanded BIP term:
POWH*BABIP*(1-BB-SO-HR) = (2B+2*3B+(3-POWH)*HR)
Distribute on left side (note that (1-BB-SO) is the percentage of PAs ending in contact):
POWH*BABIP*(1-BB-SO)-POWH*BABIP*HR = (2B+2*3B+(3-POWH)*HR)
Add POWH*BABIP*HR to both sides:
POWH*BABIP*(1-BB-SO) = 2B + 2*3B + (3-POWH)*HR + POWH*BABIP*HR
Now we’re stuck: we have one equation with three unknowns (2B, 3B, HR). We can get around this with two more leaguewide factors, the leaguewide rates of 2B/HR (call this “X”) and 3B/HR (call this “Y”), which reduce the equation to a solvable form. The cost is that we assume the same XBH distribution for every hitter, which causes some issues at the margins. These issues, however, tend to be relatively small. Our equation now reads as follows:
POWH*BABIP*(1-BB-SO) = X*HR + 2*Y*HR + (3-POWH)*HR + POWH*BABIP*HR
Factor out HR:
POWH*BABIP*(1-BB-SO) = HR*(X+2Y+(3-POWH)+POWH*BABIP)
Divide off right hand side:
HR = (POWH*BABIP*(1-BB-SO))/(X+2Y+(3-POWH)+POWH*BABIP)
With HR figured out, the rest of the events fall into place. Since X = 2B/HR and Y = 3B/HR,
2B = HR*X and
3B = HR*Y
Finally, 1B = BABIP*(1-BB-SO-HR)-2B-3B. This is the rest of the in-play hits after 2B and 3B.
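The formulas above can be sketched in Python as follows. The function name `estimate_events` and the sample hitter are hypothetical; the arithmetic is a direct transcription of the HR, 2B, 3B, and 1B expressions. As a sanity check, the example feeds in a made-up hitter's own 2B/HR and 3B/HR ratios in place of the league-wide X and Y, in which case the algebra should recover his per-plate-appearance line exactly. With true league-wide ratios the result is only an approximation.

```python
def estimate_events(bb, so, powh, babip, x, y):
    """Estimate per-PA rates of 1B, 2B, 3B and HR from the Four Factors.

    bb, so : walk and strikeout rates per plate appearance
    powh   : extra bases per hit (XB/H)
    babip  : (H - HR) / BIP
    x, y   : 2B/HR and 3B/HR ratios (league-wide in the post)
    """
    contact = 1.0 - bb - so  # share of PAs ending in contact
    hr = powh * babip * contact / (x + 2 * y + (3 - powh) + powh * babip)
    doubles = hr * x
    triples = hr * y
    # In-play hits are BABIP * BIP; singles are what is left
    # after removing doubles and triples.
    singles = babip * (contact - hr) - doubles - triples
    return {"1B": singles, "2B": doubles, "3B": triples, "HR": hr}


# Made-up per-PA line, for illustration only.
true = {"BB": 0.10, "SO": 0.20, "1B": 0.15, "2B": 0.05, "3B": 0.005, "HR": 0.04}

hits = true["1B"] + true["2B"] + true["3B"] + true["HR"]
extra_bases = true["2B"] + 2 * true["3B"] + 3 * true["HR"]
bip = 1.0 - true["BB"] - true["SO"] - true["HR"]
powh = extra_bases / hits
babip = (hits - true["HR"]) / bip

# Use the hitter's own ratios instead of league-wide X and Y,
# so the round trip should be exact up to floating-point error.
estimated = estimate_events(
    true["BB"], true["SO"], powh, babip,
    x=true["2B"] / true["HR"], y=true["3B"] / true["HR"],
)
for event, rate in estimated.items():
    print(f"{event}: estimated {rate:.4f}, actual {true[event]:.4f}")
```

Swapping in league-wide values for `x` and `y` is what introduces the per-hitter error discussed next.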
There it is. For the 2009 hitters with at least 50 plate appearances, this formula is on average within .0001 of wOBA, with a standard deviation of .010. If the errors are roughly normally distributed, that means about 68% of players are projected within 10 wOBA points simply using league-wide XBH distribution, and about 95% are projected within 20 points. What that tells me is that there really is a 5th factor (or 5th and 6th?), XBH distribution, which would be enough to ensure total accuracy. Naturally so, as at that point we would simply be calculating wOBA from the hitter’s overall line. XBH distribution seems to be far less important than the other four factors.
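For anyone who wants to turn the estimated events into a wOBA figure, here is a minimal sketch. The weights below are an assumption for illustration only (approximately the generic linear weights often quoted for wOBA); actual wOBA weights are recalculated each season and also cover events, such as hit-by-pitches, that this post's model ignores. The function name `woba_from_rates` and the sample rates are hypothetical.

```python
# Illustrative linear weights only (assumption): substitute the
# weights for the season you are actually working with.
ASSUMED_WEIGHTS = {"BB": 0.72, "1B": 0.90, "2B": 1.24, "3B": 1.56, "HR": 1.95}


def woba_from_rates(rates, weights=ASSUMED_WEIGHTS):
    """Weighted sum of per-PA event rates.

    Assumes rates are already per plate appearance, so no further
    division is needed. Events missing from `rates` count as zero.
    """
    return sum(weights[event] * rates.get(event, 0.0) for event in weights)


# Made-up per-PA rates, for illustration only.
rates = {"BB": 0.10, "1B": 0.15, "2B": 0.05, "3B": 0.005, "HR": 0.04}
estimate = woba_from_rates(rates)
print(f"estimated wOBA: {estimate:.3f}")
```

Because the weights are passed in as a parameter, the same function can be reused with whichever season's weights are appropriate.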
What are the uses of this? Ideally, as above, it’s to isolate these factors and to describe how a change in one of these factors could impact a player’s production at the plate. Perhaps there are more; perhaps this is limited in use. Here’s a spreadsheet with the players from 2009 and their full Four Factors lines – feel free to play around with it, and please let me know of any suggestions in the comments.
Later this afternoon, we’ll take a look at some examples of how I think this could be useful.