## Game Score (and Crowd-Sourcing)

Bill James created a metric called Game Score that looks at a pitcher’s standard pitching line, to come up with an overall score, centered at 50, with most scores in the 0 to 100 range. I am trying to take that concept, deconstruct Game Score, and reconstruct it by forcing it to follow some set rules.

The scale of Game Score will mimic win percentage. So, a Game Score of 50 means you will win 50% of the time. A Game Score of 70 means you will win 70% of the time, and so on.

We’re not going to be total maniacs about it and force a Game Score to stop at 99 or 1. Think of the relationship between Game Score and win percentage as a useful guideline rather than a hard constraint. This matters especially because the relationship between wins and runs is not linear, so it’s going to be impossible to create a linear metric that works at the extremes.

We also know that ten marginal runs equals one marginal win. This means that one marginal run equals 0.10 marginal wins. In terms of the Game Score scale, one marginal run is 10 Game Score points. Keep this in mind as you read the various versions of Game Score.

In addition, we’re going to set the starting point of Game Score to 40, rather than following the Bill James lead of starting at 50. The idea here is to think in terms of replacement level, and if you pitch to one batter and are out of the game, we’d hardly call that an “average” game. Indeed, I would even consider starting the Game Score at 35 or even 30. For the moment, we’ll start at 40, and let’s see where this takes us.

**Version 1: Runs**

This version focuses only on runs allowed. The basic equation to solve is this:

Game Score = a * IP + b * R + 40

We’ve already established that one run is 10 Game Score points, so that forces b = -10. To solve for “a”, we need the average innings per start (around 6.1 in 2011) and the average runs allowed per start (around 2.9 these days). By definition, Game Score equals 50 for this average start. Therefore, solving for “a” gives us 6.4. I’m not too crazy about the decimal. So, we have:

Game Score = (6.4 * IP) – (10 * R) + 40

A nine-inning shutout gives us a Game Score of 98. On the flip-side, to get a Game Score close to 0 would mean 3 innings and 6 runs allowed (Game Score of minus 1), or 5 innings and 7 runs allowed (Game Score of 2).
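If you want to check that arithmetic yourself, here is a minimal Python sketch. The function names (`solve_ip_coefficient`, `version1`) are illustrative only; the averages (6.1 IP, 2.9 R) and the starting point of 40 are the ones quoted above.

```python
BASE = 40        # starting point used in this article
TARGET = 50      # an average start should come out at 50
RUN_VALUE = 10   # one marginal run = 10 Game Score points

def solve_ip_coefficient(avg_ip, avg_core, base=BASE, target=TARGET):
    """Solve target = a * avg_ip + avg_core + base for a."""
    return (target - base - avg_core) / avg_ip

def version1(ip, runs, a=6.4):
    """Runs-only Game Score: a * IP - 10 * R + 40."""
    return a * ip - RUN_VALUE * runs + BASE

a = solve_ip_coefficient(avg_ip=6.1, avg_core=-RUN_VALUE * 2.9)
print(round(a, 1))             # 6.4
print(round(version1(9, 0)))   # 98
print(round(version1(3, 6)))   # -1
print(round(version1(5, 7)))   # 2
```

The same `solve_ip_coefficient` helper works for any of the versions that follow, as long as you feed it the average value of that version's non-IP part.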

If you believe that pitching is only about innings and runs, then stop here! This is the Game Score for you. Everything else I am about to say is going to be useless to you. But, if you are interested in different facets of pitching, then read on.

**Version 2: Strikeouts and walks**

This version concerns itself with the two resulting events that are highly linked to the pitcher himself. While contacted balls can be influenced by the park or the fielders, the strikeout and the walk stabilize very quickly. If there are only two stats to know about a pitcher, it’s these two. In addition, we know that the difference between the two is what correlates to runs the most. Finally, it takes about three walks to add one run (and 10 Game Score points), so each walk is worth about 3 points.

When I say “walk”, I’m really excluding intentional walks and including hit batters. It makes most sense to consider it that way than not, so that’s what I’m doing. So, we need to solve for:

Game Score = a * IP + b * (SO – BB) + 40

We’ve established that b = 3. Now we just need to solve for “a”. An average start has a strikeout minus walk differential of 2.5. Therefore, solving the equation gives us a = 0.4. Again, not a huge fan of the decimal, but, we’ll have to live with it. So we have:

Game Score = 0.4 * IP + 3 * (SO – BB) + 40

A Clemens game (20 K, 0 walks, 9 IP) would give us a Game Score of 104. Again, while we’d like to not necessarily exceed the 100 scale (since we were hoping to imply that to mean 100% winning), we’re not going to go crazy with that rule. On the flip side, a 0 strikeout, 14 walk, 5 inning game would be a Game Score of 0. We don’t see games like this because a pitcher will get pulled very fast. A pitcher can get runs to pile up because of bad breaks, but he will never be allowed to pile up walks.
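A small Python sketch of Version 2, with the walk adjustment spelled out. The helper names and the `ibb`/`hbp` keyword arguments are assumptions made for illustration; the formula and the two example games are the ones above.

```python
def adjusted_walks(bb, ibb=0, hbp=0):
    """'Walks' as counted in this article: BB - IBB + HBP."""
    return bb - ibb + hbp

def version2(ip, so, bb, ibb=0, hbp=0):
    """Strikeout/walk Game Score: 0.4 * IP + 3 * (SO - BB) + 40."""
    return 0.4 * ip + 3 * (so - adjusted_walks(bb, ibb, hbp)) + 40

print(round(version2(ip=9, so=20, bb=0)))   # 104, the Clemens game
print(round(version2(ip=5, so=0, bb=14)))   # 0
```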

**Version 3: FIP**

FIP is the shortcut version of DIPS, a sabermetric breakthrough credited mostly to Voros McCracken, and possibly inspired by Bill James’ DER work. Since the core of FIP is already scaled to runs, it’s a straightforward calculation:

FIPcore = 2 * SO – 3 * BB – 13 * HR

Game Score = a * IP + FIPcore + 40

An average FIPcore in a start is -5.2. So, solving for “a” gives us 2.5. Therefore, we have:

Game Score = 2.5 * IP + FIPcore + 40

A shoutout to Kevin Harlow is in order here.

Again applying a Clemens game here (with 0 HR), we get a Game Score of 103. On the flip side, a 3 HR, 5 BB, 0 K 5-inning start gives us a Game Score of minus 2.
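Here is the same calculation as a short Python sketch (function names are illustrative). The values are printed unrounded on purpose: Python's built-in `round` rounds halves to the nearest even number, so 102.5 would show as 102, whereas the figures quoted above round the half away from zero.

```python
def fip_core(so, bb, hr):
    """FIPcore = 2 * SO - 3 * BB - 13 * HR (BB means BB - IBB + HBP)."""
    return 2 * so - 3 * bb - 13 * hr

def version3(ip, so, bb, hr):
    """FIP-based Game Score: 2.5 * IP + FIPcore + 40."""
    return 2.5 * ip + fip_core(so, bb, hr) + 40

print(version3(ip=9, so=20, bb=0, hr=0))   # 102.5
print(version3(ip=5, so=0, bb=5, hr=3))    # -1.5
```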

**Version 4: Linear Weights**

Linear Weights, or component runs, is the inspiration here. Our focus is only on the inputs (hits, walks, home runs), while ignoring any sequencing. If you scatter the hits and walks, you can get a shutout, but if you bunch up the same number of hits and walks, you might get a 4-run inning. So, in this version, we’re explicitly ignoring the runs (the output) and only considering the components (the input).

As with the other metrics, since Linear Weights is already grounded in runs, the conversion to Game Score points is easy enough. A “hit” means not only singles but all extra-base hits, home runs included. This is what the core looks like:

LWTScore = -(3*BB + 5*H + 8*HR)

So, a HR gets minus 13 (5 as a hit, plus 8 more), while all other hits get minus 5.

Game Score = a * IP + LWTScore + 40

A standard start has a LWTScore of -41. Solving for “a” gives us 8.4. Our fourth version is:

Game Score = 8.4 * IP + LWTScore + 40

A perfect game gives us a whopping Game Score of 116. To get a Game Score close to 100, you’d need to allow only one HR (Game Score of 103), or allow only 3 hits (Game Score of 101), or allow 5 walks (Game Score of 101). This version, more than the others, shows where the system breaks down. This is because there are many more components being considered. In the other cases, it basically presumed “average” results for the unknown parameters. In this case, since we considered the most important parameters, the linear approach breaks at the extreme. But, as noted, we’re going to live with it. (Or not. Your choice.)

To get a Game Score close to 0 while pitching 5 innings, you need to allow 3 HR, 5 walks, and 5 other hits (Game Score of 3).
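The Version 4 examples can be reproduced with a few lines of Python. This is a sketch with made-up function names; note that `h` counts every hit, so the 3 HR plus 5 other hits in the last game are entered as `h=8, hr=3`.

```python
def lwts_core(bb, h, hr):
    """LWTScore = -(3 * BB + 5 * H + 8 * HR); H counts all hits, HR included."""
    return -(3 * bb + 5 * h + 8 * hr)

def version4(ip, bb, h, hr):
    """Linear Weights Game Score: 8.4 * IP + LWTScore + 40."""
    return 8.4 * ip + lwts_core(bb, h, hr) + 40

games = {
    "perfect game": dict(ip=9, bb=0, h=0, hr=0),
    "one HR, nothing else": dict(ip=9, bb=0, h=1, hr=1),
    "three hits": dict(ip=9, bb=0, h=3, hr=0),
    "five walks": dict(ip=9, bb=5, h=0, hr=0),
    "5 IP, 3 HR, 5 BB, 5 other hits": dict(ip=5, bb=5, h=8, hr=3),
}
for label, line in games.items():
    print(f"{label}: {version4(**line):.1f}")
```

The five lines come out at 115.6, 102.6, 100.6, 100.6 and 3.0, which round to the figures given above.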

**Version X: Bringing them together**

When you look at Bill James’ Game Score, he includes hits and walks and strikeouts and runs. In none of the above four versions do I have all of these together. This is because I was splitting up the components in a more focused manner. If all you cared about was runs, then you get that. If all you cared about was hits and walks, then you get that. Bill basically (and implicitly) amalgamates these four versions into one.

Therefore, how best to do that? That’s where YOU come in. I don’t know. I can decide to weight the Runs version at 40% and the Linear Weights version at 30% and the FIP version at 20% and the Strikeout/Walk version at 10%. Or, I can come up with a different scheme. What is the best one? Well, it’s the one that best describes what we want: how well did the pitcher pitch?
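Mechanically, any weighting scheme is just a weighted average of the four scores. A minimal Python sketch, using the 40/30/20/10 split floated above; the `blend` function and the four input scores are hypothetical, invented purely to show the calculation.

```python
def blend(scores, weights):
    """Weighted average of the per-version Game Scores."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same versions")
    total = sum(weights.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"weights must sum to 1, got {total}")
    return sum(weights[name] * scores[name] for name in scores)

# The 40/30/20/10 example; swap in whatever scheme you prefer.
weights = {"runs": 0.40, "lwts": 0.30, "fip": 0.20, "k_bb": 0.10}

# Hypothetical scores for one start, not taken from any real game.
scores = {"runs": 60, "lwts": 55, "fip": 70, "k_bb": 65}

print(round(blend(scores, weights), 1))   # 61.0
```

The hard part is not the code, it is choosing the weights.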

Do we count all shutouts the same, regardless if it was a perfect game, or if you had 10 hits scattered throughout? Maybe you do, if all you care about is runs. And you wouldn’t if you think these two games are different. But, how different are they?

What if you had 10 hits + walks scattered throughout the game for no runs, and in another game the very same pitcher bunched them up for 4 runs? Are those two games identical? Well, if you don’t care about sequencing, then, yes, they are the same. If sequencing counts for how well a pitcher pitched, then, no they are not the same.
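To put numbers on that question, here is a Python sketch comparing three nine-inning starts under Version 1 and Version 4. The specific lines are assumptions chosen for illustration: the ten baserunners are entered as ten non-HR hits and no walks, which is only one of many ways to split "hits + walks".

```python
def version1(ip, runs):
    """Runs-only Game Score."""
    return 6.4 * ip - 10 * runs + 40

def version4(ip, bb, h, hr):
    """Linear Weights Game Score (sequencing ignored)."""
    return 8.4 * ip - (3 * bb + 5 * h + 8 * hr) + 40

starts = {
    "perfect game":       dict(ip=9, runs=0, bb=0, h=0, hr=0),
    "10 hits, scattered": dict(ip=9, runs=0, bb=0, h=10, hr=0),
    "10 hits, bunched":   dict(ip=9, runs=4, bb=0, h=10, hr=0),
}
for label, s in starts.items():
    v1 = version1(s["ip"], s["runs"])
    v4 = version4(s["ip"], s["bb"], s["h"], s["hr"])
    print(f"{label:20s} V1={v1:5.1f}  V4={v4:5.1f}")
```

Version 1 sees the two shutouts as identical (97.6 each) and the bunched game as 40 points worse (57.6). Version 4 sees the two ten-hit games as identical (65.6 each) and the perfect game as 50 points better (115.6). Your weights decide which of those views wins.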

**Crowd-sourcing**

So, that’s where the exercise lies: what is the balancing point? How do you weight each of the 4 versions so that it tells the fair story? In order to answer that question, what you the reader need to do is… watch baseball games. You have to look at “weird” games (basically those where the Game Scores of the four versions are all over the place for a particular start), and compare them to each other. Francisco Liriano threw a gem that Twins fans overwhelmingly agreed was a better-pitched game than his no-hitter. So, that’s what you have to do.

I know, I know, you need to work. But, if I give you my answer on how to weight the four components, someone is going to say “no way”. That’s because that person is going to have his own balancing scheme. And that’s why we need a consensus.

You can still take advantage of the fact that the four Game Score numbers are going to be calculated. But it would be helpful to have a consensus view as to how those four numbers can be collapsed into a “final” version as well.

David Appelman took the initiative here to run a quick poll a few days ago, comparing the no-hit games of Verlander, Santana, and Liriano. The Verlander and Santana games were in the same ballpark, with the Verlander game slightly ahead, and Liriano’s nowhere in sight.

So, David or I, or perhaps you the reader, will be on top of this, and we’ll try to figure out the answer. I’m guessing that the upcoming playoffs will provide some great examples (like Doc and Lincecum last year). The advantage here is that we are all going to watch those games, so it becomes a great crowd-sourcing event. My plan is to have the Version X of the Game Score decided by the end of the World Series.

**Recap**

Version 1 = 40 + (6.4 * IP) – (10 * R)

Version 2 = 40 + (0.4 * IP) + 3 * (SO – BB)

Version 3 = 40 + (2.5 * IP) + FIPcore

Version 4 = 40 + (8.4 * IP) + LWTScore


None of the formulas you listed use HBP. What gives?

From the article:

When I say “walk”, I’m really excluding intentional walks and including hit batters. It makes most sense to consider it that way than not, so that’s what I’m doing.

***

So, whenever you see “BB”, it really means “BB-IBB+HBP”. It’s ugly to show that, hence the reason I just show BB.

Oh, missed that line. Very reasonable, thanks.

What is the end goal here? Are we simply trying to find a way to compare individual pitching performances, or is this a broader attempt to improve the way that we measure a pitcher’s value? Ideally, would Average Game Score be a better measure of WAR than FIP?

Forgive me if these are obvious or elementary questions; I didn’t really know anything about Game Score before reading this article.

Putting it into WAR could be one possible outcome, though not necessarily the goal.

The opening paragraphs make me feel like I’ve jumped into Part II of a series but missed Part I. Did you leave out an opening paragraph or 2?

Yeah, originally I had intended this for my blog, so it would be part of a continuing series for the regulars there.

Let me add a paragraph so as to not shock the newcomers.

Thanks.

I love this. I’d say something like 40% Runs + 60% FIP would be a really fun game score to kick around. I don’t love the LWTS gs because it treats 1/2/3 base hits equally. I’m sure you did it to maintain simplicity, but I’m not sure it makes it more useful. And the straight K/BB system seems to overlap with FIP too much, but I do like removing HR from the equation (partially). Here are a couple options I would vote for:

Runs 40% – FIP 60% – Baseline

Runs 40% – FIP 30% – K/BB 30% (if you want HR to have less effect)

Runs 33% – FIP and/or K/BB 33% – LWTS 33% (if you want BABIP to have more effect)

Ok, so I have no idea what I would really vote for. But my gut says RA should be around a 1/3, with FIP based stuff making up the rest of it. This is cool. Wonder what the crowd thinks.

Yeah, the LWTScore is disappointing. I’d like to see a game score based on the same linear weights as wOBA.

wOBA is Linear Weights.

The difference is that wOBA uses plate appearances as its “opportunity space”, while Linear Weights uses outs (or IP).

In my case, since IP is the opportunity space, then Linear Weights gives me the correct weights.

Yirm is saying that the implementation of the LWTS is disappointing, and that he would want to see each event given its proper weighting – which is the point I made in my post as well. As it is, it accomplishes nearly the same thing, and you did it for simplicity’s sake, totally understood. But it would be nice.

To clarify, all we’re talking about is doubles and triples.

Oh, I see.

Right, I was trying to simplify it by not including those events, for two reasons, neither of which may be good enough: (a) to keep it simple like James’ original Game Score, to which I added HR and removed ER, (b) lack of historical data pre-1950.

I’m not sure that setting the values to “5” for hits, then a bonus of “3” for doubles or triples, is necessarily required (and subsequently increasing the IP coefficient to balance it out).

However, I’m not against it, nor am I against the idea of one version used if the 2B+3B data is available, and another version when that data is not available.

Oh interesting, didn’t realize we were missing 2B/3B data for pitchers pre 1950. Is that across the board, or just spotty retrosheet data that hasn’t been totally transcribed? I’d vote for calculating the 2B/3B values when we have it, and using straight H when we don’t… since I’m not the one doing the work :)

Perhaps what I’ll do is put up a poll first, to gauge the “blink” opinion of the readers here. Then, when the playoffs roll around, we can talk more specifics of actual games.

1) If one goal is to have a comparable scale to likelihood of winning, then the formulas necessarily change (however imperceptibly) with changes in scoring environments, correct?

2) If another goal is to find the right combination of the 4 score formulas, then shouldn’t actual games provide history by which to run some sort of regression analysis on win likelihood (Assuming the vagaries of run support wash out over large samples)?

1. The standard linear weights formula leaves all the change in weights to the out. While not necessarily the best, or the right, way to do it, there’s a strong appeal to only needing to worry about changing one parameter, rather than all of them. Do we really want to worry about the weight for the Runs to be 9.5 or 10.2 each year, or can we just go with 10, and let the coefficient for the IP float?

Given that these are quick shortcuts anyway, I think there’s a strong appeal to just changing the IP weight to force the average to “50”.

2. If you do that, 100% of the weight is going to be on runs (Version 1). This is because the winning team is based 100% on runs scored and runs allowed.

If you already have runs allowed, what is the number of walks, hits, HR, and strikeouts going to tell you? Well, nothing at all. You already have runs.

Well, it might tell you A LITTLE. Because the starter is only going to pitch 5-7 innings, the number of hits, walks, HR a starter gives up might be a good indicator as to what the reliever is going to give up (because that tells you more about the opposing hitting team than the number of runs scored).

However, others have run the regression in the past, and virtually 100% of the weight would go to runs allowed.

Your last sentence makes sense, since all of the information that ACTUALLY matters wrt the winner of the game is stored in RA.

However, if all of the weight goes to RA, doesn’t that just mean that the problem is degenerate? In which case you can form a correlation between RA and the other three variables and base your weights off of those.

I’d propose that 1-R^2 in the latter correlation is some evidence that “sequencing skill” exists, in some form.

I added a “recap” section at the bottom, just so that all the equations are there together.

If you give double the weight to Version 4 (linear weights), and the other 3 are given about the same weight, we get this equation:

Game Score = 40 + (5 * IP) + (1 * SO) – 2 * (R + BB + H) – (5 * HR)

A perfect game with 20 K gives you a Game Score of 105.
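For anyone who wants to see where coefficients like these come from, here is a Python sketch that blends the per-version coefficients from the Recap. It assumes weights of exactly 20/20/20/40, which is only a guess at the "about the same weight" description; the table layout and function name are illustrative.

```python
# Per-stat coefficients of each version, read off the Recap formulas.
VERSIONS = {
    "runs": {"IP": 6.4, "R": -10},
    "k_bb": {"IP": 0.4, "SO": 3, "BB": -3},
    "fip":  {"IP": 2.5, "SO": 2, "BB": -3, "HR": -13},
    "lwts": {"IP": 8.4, "BB": -3, "H": -5, "HR": -8},
}

def combine(weights):
    """Weighted sum of the per-version coefficients for each stat."""
    combined = {}
    for name, weight in weights.items():
        for stat, coef in VERSIONS[name].items():
            combined[stat] = combined.get(stat, 0.0) + weight * coef
    return combined

# Assumed split: Version 4 doubled, the other three equal.
weights = {"runs": 0.2, "k_bb": 0.2, "fip": 0.2, "lwts": 0.4}
for stat, coef in combine(weights).items():
    print(f"{stat}: {coef:+.2f}")
```

Under that assumption the unrounded values are roughly IP +5.22, R -2.00, SO +1.00, BB -2.40, HR -5.80 and H -2.00, so the whole-number equation above involves some rounding, most visibly on the home run.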

Those weights are a bit reminiscent of the James weights. The K is a match. The hit is a match.

He has 1 for walk, whereas I have 2. I have 5 for HR, whereas it was excluded in his. I think both of these are improvements.

He has 4 for ER and 2 for UER, whereas I have 2 for R. So, obviously he weights runs a lot more than I do.

He starts at 50 whereas I start at 40. I think this is an improvement too.

He gives bonus points for pitching in innings 5 and later, whereas I don’t. I’m agnostic on this one.

If you look at the Bill James weights, and try to reverse-engineer them into the 4 versions I have, he gives these implied weights:

40% Version 1 (Runs)

30% Version 2 (K/BB)

0% Version 3 (FIP)

30% Version 4 (Linear Weights, sans HR)

I get a perfect match to his R, K, H components. I get an implied BB value of -2 not -1 like in his original one.

So, I think there’s a definite mistake in his Game Score for the walk. I can’t get the value of the BB to be -1, while also keeping the value of the K and H as he has them.
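The reverse-engineering can be checked with a short Python sketch. It applies the 40/30/0/30 weights to the Recap coefficients, with Version 4 stripped of its HR term, and compares the result to the James values as quoted in this thread (K 1, hit 2, walk 1, earned run 4). The names are illustrative.

```python
# Recap coefficients; Version 4 is used without its HR term here.
COEFS = {
    "runs":       {"IP": 6.4, "R": -10},
    "k_bb":       {"IP": 0.4, "SO": 3, "BB": -3},
    "lwts_no_hr": {"IP": 8.4, "BB": -3, "H": -5},
}
WEIGHTS = {"runs": 0.40, "k_bb": 0.30, "lwts_no_hr": 0.30}  # FIP gets 0%

# Bill James values as quoted in this thread (ER weight used for R).
JAMES = {"SO": 1, "H": -2, "BB": -1, "R": -4}

def implied_coefficients(weights=WEIGHTS, coefs=COEFS):
    """Per-stat coefficients implied by blending the versions."""
    out = {}
    for name, w in weights.items():
        for stat, c in coefs[name].items():
            out[stat] = out.get(stat, 0.0) + w * c
    return out

implied = implied_coefficients()
for stat, james_value in JAMES.items():
    print(f"{stat}: implied {implied[stat]:+.1f}, James {james_value:+d}")
```

The implied walk value comes out at about -1.8, nearly double the -1 in the original, while the strikeout (+0.9), hit (-1.5) and run (-4.0) values land on or next to the James numbers once rounded.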

Nonetheless, there’s nothing wrong with his weights. Certainly reasonable. Presumably, if he had decided to include HR, some of the weight for Version 2 and Version 4 would go to Version 3.

In that case, the HR-included Bill James Game Score would imply the following:

Game Score = 40 + (5 * IP) + (1 * SO) – (1 * H) – (2 * BB) – 4 * (R + HR)

I think there is something unappealing in weighting the walk more than the hit. I know why it comes out that way, especially if you like FIP. It just seems… weird.

I like giving a little credit for going past the 5th inning. That has value to the team. (Well come to think of it – does it? Anecdotal/logically I would think it should, but can we prove it? First, you would have to be pitching better than your bullpen, obviously, and you’d have to show that your bullpen would otherwise not be able to pitch as much during the next couple days if they were to relieve you… and that your bullpen is better than your other starters. Seems messy, maybe leave it out then?)

Yeah, it’s a tough one. You can make the case that being pulled after 5, before you get tired, is actually a good thing, that with a 6-deep or 7-deep bullpen, that any of those guys is better than your non-ace starter.

Yeah, exactly. It’s been a while, but isn’t there a part in The Book where you show that SPs become measurably worse the third time through the order?

Right, third time through the order is terrible.

If we do the weighting of 60/40 FIP Score and Runs, which I think gets to the essence of great games (dominance plus run prevention), with some rounding simplification, the formula would come out:

GS = 4*IP – 4*R + 1*K – 2*BB – 8*HR + 40

The Clemens game is a nice 96. The average game, I believe, remains close to 50 (4*6ish – 4*3ish + 4 – 4 – 5 + 40 = 47, so it’s close).

Looking at John Lackey’s first game of the season, which was 3 2/3 innings, 2 walks, 3 Ks, 2 HRs, and 9 runs, we’d get a Game Score of 4*3.66 – 4*9 + 3 – 4 – 16 + 40 = 2

That feels pretty good to me.
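A quick Python sketch of that 60/40 formula, for checking other starts. The function name is made up, and the Clemens line assumes no runs and no home runs allowed, which is what the quoted 96 implies.

```python
def gs_60_40(ip, r, k, bb, hr):
    """Rounded 60% FIP / 40% Runs blend: 4*IP - 4*R + K - 2*BB - 8*HR + 40."""
    return 4 * ip - 4 * r + k - 2 * bb - 8 * hr + 40

clemens = gs_60_40(ip=9, r=0, k=20, bb=0, hr=0)
lackey = gs_60_40(ip=3 + 2 / 3, r=9, k=3, bb=2, hr=2)

print(clemens)            # 96
print(round(lackey, 1))   # 1.7
```

Using the exact 3 2/3 innings instead of 3.66 barely moves the Lackey start; it still rounds to 2.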

In your case, you’d have to decide if it matters if someone gives up 0 singles+doubles+triples or 10 (without the corresponding change in runs). That is, does his “scatter-ability” matter or not.

You are saying “not”

Shouldn’t the runs component take care of the “scatterability” and XBH components to a sufficient extent? That is, to say “without the corresponding change in runs” begs the question a bit it seems.

Also, I would have thought that not including any hits would be to argue that “scatterability” does matter, because you’re solely looking at runs, and not at hits that didn’t turn into runs. (This is more a question than a statement because I entirely acknowledge my inferior thinking on these issues).

You are mostly right. I don’t think I explained it well.

Basically, it’s hard to give up 10 hits one game, and 0 hits another game, and have the same number of runs allowed.

The question is if these two games should count the same (for you) or not. Do you want “scatter-ability” to be a real ability? Do you believe in it? If so, then giving up 10 hits and 0 runs is the same as giving up 0 hits and 0 runs.

I’d say that I mildly believe in scatterability, and think that sometimes pitchers lose command at basically random times, so they will have games with decent peripherals and lots of runs.

Even so, for Game Score, my gut is that it should be partly descriptive of skill (the FIP component) and partly descriptive of what happened (the run component). Insofar as scatterability matters, and when it matters, I think it gets picked up in the runs component.

In a game where a pitcher strikes out 10, walks 1, and gives up between 3 and 7 hits over 8 innings, while allowing 1 run, I don’t think I’d remember that game any differently given the number of hits. If the pitcher strikes out 10, walks one, gives up 8 hits, and 5 runs, with no home runs, I’d say “god damn if he hadn’t just collapsed in the 5th, or had Jeter gotten to that ball, he would have had a great game.”

Excellent. Not that I necessarily agree with you, but you’ve done exactly what I’ve asked, and that is, to think through, and then decide for yourself what you want.

Isn’t doing a BB/SO version and a FIP version redundant?

Regardless, I do like the version you have in your 4:44pm post.

Not necessarily redundant. It’s almost like you are saying: “should I count the HR or not? I’m not sure, so I’ll go 50/50 on it”. And so, you take half of Version 2 and half of Version 3.

The poll of the no-hitters made me think of the EloRater on bref comparing all major league hitters based, basically, on general consensus. I am wondering if the same thing could be done for pitched games; give people a series of one-on-one forced-choice comparisons (with all of the components included in the 4 models, and excluding own offensive team performance), and eventually build up a ranked list, from best to worst. I think it would have to be fit to a normal curve, since most pitching performances would give you a 35-65% chance of winning the game. I think it would then be fairly simple to create a formula that weighs each of the 4 models (or breaks them up into their individual components). Or not.

I would prefer that people do that based on actual games they’ve seen.

Otherwise, we’re asking them to interpret numbers. If that’s the case, then simply giving me a weight of the 4 versions is sufficient, and a lot easier to code and much faster to get a result.

Do you (we) have WPA data? Basically it seems like we are trying to find a linear approximation from stats to WPA.

You’re trying to do it by “feel” which is also legitimate, but I would want to at least see the coefficients from a regression of IP, H, R, ER, BB, HR, and K on WPA.

Apart from that, I would probably vote for something pretty close to Bill James’ weights as you calculated them above: Version 1, 2, and 4 weighted about equally, with a little extra on Version 1.

For starting pitchers, WPA is entirely runs and innings driven, and so, that would be Version 1.

Ah, you’re right, I never thought about it that way. I was thinking that every Hit or BB a pitcher gives up lowers the Win% a bit…but then I guess every inning he pitches removes that.

Right, except for when he leaves the game in the middle of an inning.

I like a 50/50 combination of “Strikeouts and Walks” and “Linear Weights”. It flies in the face of DIPS a bit (to the extent that it explicitly excludes your FIP-based game score) but when we are talking about dominance in a single game I will go against the DIPS grain and say that the results of contact matter.

As for the other two scores, I think this 50/50 combination allows you to exclude them. The “Strikeouts and Walks” score captures 2 of the 3 inputs to FIP anyway (though not in the same weights), and the “Linear Weights” version captures the third. So it has more than a bit of the “flavor” of FIP contained within it, rendering an FIP-based score somewhat redundant. And in my opinion the runs-based score should be somewhat redundant with the linear weights score, except in the case of truly fluky low-run games that are not “dominant” in the way I would think of it (e.g. 10+ singles and some walks scattered over 9 innings allowing no runs).

I like a version that goes:

40% FIP

35% Runs

25% LWTS

I think that you should include HR because HRs represent a mistake by the pitcher, and a great game ideally has fewer mistakes. This makes the K-BB version redundant. A nine-inning shutout is a dominant game to me, even if it involves few Ks and more scattering of hits. I would give LWTS more weight if it included 2Bs and 3Bs, because giving up gap hits is less impressive than giving up bloop singles.

Also, how does 3 BBs = 1 R?

Go to my site, and read “How are Runs Really Created”.

tangotiger.net

The only reason not to give full weight to the HR is that the HR is park-dependent to some extent, not to mention that a lot of the HR is dependent on the hitter.

This is why Version 2 (sans HR) correlates better than Version 3 (FIP with HR) with next year’s data (or out of sample data).

That is, rather than isolating the pitcher’s performance, we’re including things outside his control (to some extent).

Hence, the argument you can make to give most weight to FIP, but then some weight to Version 2 (without HR).

The correct answer about the weightings is–who cares?

I’ve never seen proof that the original “game score” concept means anything and see no reason to think that this new version does either.

One thing (and certainly not the only thing) that’s interesting about pitcher game scores is that, if they are a good reflection of pitcher skill and measure the chance he’s given his team to win the game, they could help talk about the value of consistency or inconsistency in a rational way. The FG article about Ubaldo Jimenez and accusations of inconsistency used the Bill James Game Score, which is probably OK for talking about a pitcher’s consistency. But one that was related to a pitcher’s contribution to team success could also help us determine how much consistency matters.

If I had a database of run-based game scores going back many years (because run-based game scores are tied closely to a team’s chances of winning and over a long career you’d expect a pitcher’s distribution of scores to converge), the first pitcher I’d look at would be Blyleven. Because one of the charges leveled against his performance in the HoF debates was that he was unusually inconsistent, and that his distribution of excellent and mediocre starts was the cause of his poor win-loss record relative to his aggregate stats. That’s an interesting and plausible argument, and one that we don’t quite have the tools to evaluate. For this purpose we’d want a game score that sacrificed simplicity for accuracy. The real formula for W% (assuming average offense and replacement-level relievers, or something like that) from IP and RA is certainly not linear, and the game score I’d look for would reflect that.

Are you dead set on weighing the three? Why not use the FIP and linear weights scores to predict runs, regress them both towards the runs actually allowed, and weigh them equally? That way a guy is rewarded for giving up fewer runs but you’re still looking mostly at how well he pitched, with results just being a benefit or a detriment. It’ll also benefit guys who go longer implicitly because there isn’t as much regression, but it’ll still keep poorly pitched complete games in context.

You can get rid of those pesky decimals in V1 and V2 by simply changing the constant term from 40 to 30, which you said wouldn’t be a problem. I like these better, anyway:

V1: 8*IP -10*R +30

V2: 2*IP + 3*(SO-BB) +30

Right, I don’t mind changing the starting point to 35, maybe even 30. I’d have to do it the same for all of them though.

It’s more a question of what we want the starting point to be. The replacement level I use for a starting pitcher is .380, meaning I’d have to start it at 38. You can make a decent case for anything in the 35-40 range. Even in the 30-40 range.

Note also that the lower you make the starting point, the larger the IP multiplier is going to be to balance it out. And so, at the top end, you run the risk of really going past the 100 level.

30 also works very well as a base for V4:

10*IP + LWTScore +30

The best you can do with V3 is:

4*IP + FIPcore + 30, which gives an average of 49.2. Close enough?
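These base-30 variants can be checked against the averages quoted in the article (6.1 IP, 2.9 R, a 2.5 strikeout-minus-walk differential, FIPcore of -5.2, LWTScore of -41). A Python sketch, with illustrative names:

```python
AVG_IP = 6.1  # average innings per start quoted in the article

# Average non-IP part of each version in a typical start.
AVG_CORE = {
    "V1": -10 * 2.9,   # 2.9 runs at 10 points each
    "V2": 3 * 2.5,     # strikeout-minus-walk differential of 2.5
    "V3": -5.2,        # average FIPcore
    "V4": -41.0,       # average LWTScore
}

# Whole-number IP coefficients proposed for a starting point of 30.
PROPOSED_A = {"V1": 8, "V2": 2, "V3": 4, "V4": 10}

def average_score(version, base=30):
    """Game Score of an average start under the proposed coefficient."""
    return PROPOSED_A[version] * AVG_IP + AVG_CORE[version] + base

def exact_coefficient(version, base=30, target=50):
    """IP coefficient that puts the average start exactly on target."""
    return (target - base - AVG_CORE[version]) / AVG_IP

for v in PROPOSED_A:
    print(f"{v}: average {average_score(v):.1f}, exact a = {exact_coefficient(v):.2f}")
```

The averages come out at 49.8, 49.7, 49.2 and 50.0, with exact coefficients of about 8.03, 2.05, 4.13 and 10.00, so V3 is the only one where the whole number is a noticeable compromise.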

Have you considered changing V2 to (2*SO-3*BB) to make it consistent with FIP? Then you could eliminate V3 altogether since HR are already included in V4.

The idea is that you can make a legitimate case not to have HR at all. So, that option has to remain. In your case, the HR is always tied to the BB.

I wonder if it wouldn’t make sense to start with formulas that don’t contain any common terms, and then figure out the weighting (to avoid double-counting). In the four formulas above, home runs and K’s appear in two, and walks in three.

So, maybe one could come up with a K-only formula (and use it along with the runs-based and the linear weights-based one), or a hits-only formula (and use it along with the runs and FIP equations).

Actually, that was my point. I’d rather leave the HR’s out altogether. But I can’t because V2 overweights the importance of SO’s relative to BB’s so much that I’m forced to go with V3 instead. It’s the lesser of 2 evils. If the V2 formula was changed I’d shift all the weight from V3 to V2. Then, if I still feel the need to give HR’s some weight, I can bring them back in through V4 along with other hits.

Each Version PROPERLY weights each component. There is no overweighting within a version.

As for not wanting to give HR any weight: if a pitcher gives up six HR, are you prepared to ignore that fact altogether? And that you would rather say that someone with 0 walks and 6 HR had a better game than someone with 6 walks and 0 HR (all other things equal)?

This is the point of this exercise.