Summing up Game Score

I’m on a bit of a pitcher evaluation kick at the moment. Just a couple of days ago, I wrote about crowdsourcing balls in play at Beyond the Box Score.

More importantly, two weeks ago I had an idea: instead of measuring starting pitching performances on an inning or plate appearance basis, why don’t we evaluate them on a game-by-game basis? Since (team) wins are the end goal of a pitcher, and since each game is basically independent, we could evaluate an entire season simply by evaluating each start, and summing them up.

So how do we evaluate a single start? Traditionally, we have used pitcher wins. Then, those who wanted to ignore the effect of the pitcher’s team offense thought of the Quality Start. But do we really want to say that a six-inning, three-run start (4.50 ERA) is quality? No is the answer. No we don’t.

There wasn’t a great way to evaluate a single start, so Bill James, doing what Bill James does best, created something called Game Score. Here’s the formula for Game Score:

Game Score = Outs + 2*(innings completed after the fourth) + strikeouts – 2*hits – 4*earned runs – 2*unearned runs – walks + 50
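
To make the formula concrete, here is a minimal Python sketch of a single-start calculation and a season sum. The function and argument names are illustrative assumptions, not anything from James, and the two starts are invented lines. Note that the inning bonus counts only completed innings, so whole-number inputs always produce a whole-number score.

```python
def james_game_score(outs, strikeouts, hits, earned_runs, unearned_runs, walks):
    """Bill James Game Score for one start, following the formula above."""
    # Only innings fully completed after the fourth earn the bonus, so a
    # partial inning (one or two outs) adds outs but no bonus points.
    innings_completed = outs // 3
    bonus_innings = max(0, innings_completed - 4)
    return (50 + outs + 2 * bonus_innings + strikeouts
            - 2 * hits - 4 * earned_runs - 2 * unearned_runs - walks)


def season_total(starts):
    """Sum Game Score over a list of starts (each a dict of box-score counts)."""
    return sum(james_game_score(**start) for start in starts)


# Two hypothetical starts, invented purely for illustration.
starts = [
    dict(outs=21, strikeouts=8, hits=5, earned_runs=2, unearned_runs=0, walks=2),
    dict(outs=20, strikeouts=4, hits=7, earned_runs=3, unearned_runs=1, walks=3),
]
print([james_game_score(**s) for s in starts], season_total(starts))
```

Summing per-start scores this way is all that "total Game Score" means in the leader boards that follow.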

It was a pretty good start, but far from perfect. Weighting earned runs twice as strongly as unearned runs seems arbitrary, as does counting only innings after the fourth. I won’t get into the specifics of what’s wrong with this Game Score, because it doesn’t really matter for my purposes. But, because it will be a good reference, I’ll show you the leader board for the sum of each pitcher’s Game Score for each start in the 2012 season:

Num  Name                GS
1    Clayton Kershaw     2089.33
2    Justin Verlander    2072.66
3    R.A. Dickey         2057.33
4    Felix Hernandez     1969.66
5    Matt Cain           1947.66
6    Zack Greinke        1917.66
7    David Price         1914.99
8    Gio Gonzalez        1912.66
9    Johnny Cueto        1907
10   James Shields       1893.33
11   Kyle Lohse          1885
12   Mat Latos           1870
13   Jake Peavy          1862.99
14   Cole Hamels         1859.66
15   Hiroki Kuroda       1858.33
16   Madison Bumgarner   1837.66
17   Yovani Gallardo     1828.66
18   Jordan Zimmermann   1797
19   C.J. Wilson         1796.33
20   Jason Vargas        1787.66

Looks like it passes the sniff test to me. Let’s move on.

A couple years ago, Tom Tango introduced a few alternatives to James’ Game Score, each one based on a different method of evaluating pitchers. Let’s summarize them.

Runs

The first new version of Game Score cares only about runs allowed. It’s essentially the Game Score version of RA9. Here’s the formula (again, as formulated by Tango):

Game Score = 6.4*IP – 10*R + 40
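
As a rough sketch of how this version behaves, the formula can be written as a small Python function. The name `runs_game_score` and the choice to pass outs and convert them to innings in thirds are assumptions made for illustration.

```python
def runs_game_score(outs, runs):
    """Runs-only Game Score: 6.4*IP - 10*R + 40, with IP taken as outs / 3."""
    innings = outs / 3  # assumption: partial innings count as thirds
    return 6.4 * innings - 10 * runs + 40


# Six innings with one run allowed, then six innings with three runs allowed.
print(round(runs_game_score(outs=18, runs=1), 1))
print(round(runs_game_score(outs=18, runs=3), 1))
```

The second line is the minimum Quality Start, which comes out at 48.4 on this scale.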

And the 2012 leader boards for total Game Score:

Num  Name                Runs GS
1    Clayton Kershaw     2077.06
2    R.A. Dickey         2049.06
3    Justin Verlander    2035.33
4    Johnny Cueto        1978.8
5    Felix Hernandez     1964.8
6    David Price         1960.39
7    Matt Cain           1953.73
8    Kyle Lohse          1940.4
9    Zack Greinke        1878.93
10   Hiroki Kuroda       1865.86
11   Gio Gonzalez        1865.73
12   Jordan Zimmermann   1842.26
13   Matt Harrison       1825.33
14   Cole Hamels         1818.13
15   Jake Peavy          1801.59
16   Mat Latos           1789.73
17   Jason Vargas        1780.93
18   Jered Weaver        1777.46
19   Yovani Gallardo     1765.6
20   Cliff Lee           1760.4

Strikeouts and walks

Here we have the other end of the spectrum; instead of considering only runs allowed, this version is going to be based only on strikeouts and walks, and nothing else. It’s basically the Game Score version of kwERA.

Game Score = 0.4*IP + 3*(SO–BB) + 40
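
A minimal sketch of this version, again with an assumed function name and with outs converted to innings in thirds. The two example starts are hypothetical and differ only in strikeouts and walks.

```python
def kbb_game_score(outs, strikeouts, walks):
    """Strikeout/walk Game Score: 0.4*IP + 3*(SO - BB) + 40."""
    innings = outs / 3  # assumption: partial innings count as thirds
    return 0.4 * innings + 3 * (strikeouts - walks) + 40


# Two invented seven-inning starts: 9 K / 1 BB versus 3 K / 3 BB.
print(round(kbb_game_score(outs=21, strikeouts=9, walks=1), 1))
print(round(kbb_game_score(outs=21, strikeouts=3, walks=3), 1))
```

The first start scores 66.8 and the second 42.8. Innings pitched barely move this version, so nearly all of the gap comes from the strikeout-minus-walk term.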

And the leader boards:

NumNameKBB GS
1Justin Verlander1958.33
2R.A. Dickey1947.06
3Clayton Kershaw1924.06
4Felix Hernandez1913.8
5James Shields1912.06
6Zack Greinke1882.93
7Max Scherzer1874.06
8Cole Hamels1827.13
9Cliff Lee1821.4
10Ian Kennedy1811.33
11Madison Bumgarner1807.33
12Jake Peavy1805.59
13Matt Cain1796.73
14Mat Latos1793.73
15Johnny Cueto1784.8
16Yovani Gallardo1779.6
17David Price1768.39
18Adam Wainwright1764.46
19Hiroki Kuroda1761.86
20Gio Gonzalez1761.73

FIP

See the previous version, but add home runs, and you have the FIP version. There’s really not too much else to say. As always, Tango’s formula:

Game Score = 2.5*IP + 2*SO – 3*BB – 13*HR + 40
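
Here is a short sketch of the FIP version under the same assumptions as before (an illustrative function name, outs converted to innings in thirds). The example start is hypothetical and is scored with and without a home run.

```python
def fip_game_score(outs, strikeouts, walks, home_runs):
    """FIP-style Game Score: 2.5*IP + 2*SO - 3*BB - 13*HR + 40."""
    innings = outs / 3  # assumption: partial innings count as thirds
    return 2.5 * innings + 2 * strikeouts - 3 * walks - 13 * home_runs + 40


# An invented six-inning start with 6 K and 2 BB, with and without a home run.
print(fip_game_score(outs=18, strikeouts=6, walks=2, home_runs=1))
print(fip_game_score(outs=18, strikeouts=6, walks=2, home_runs=0))
```

A single home run costs 13 points here, a little more than the cost of four walks.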

Leader board:

Num  Name                FIP GS
1    Felix Hernandez     1996
2    Justin Verlander    1972.83
3    Clayton Kershaw     1965.16
4    R.A. Dickey         1906.66
5    Zack Greinke        1894.83
6    Johnny Cueto        1875.5
7    Gio Gonzalez        1856.33
8    James Shields       1842.16
9    Adam Wainwright     1802.66
10   David Price         1798.49
11   Matt Cain           1791.33
12   Kyle Lohse          1775.5
13   Madison Bumgarner   1754.83
14   Cole Hamels         1751.33
15   Max Scherzer        1738.16
16   Hiroki Kuroda       1731.16
17   Mat Latos           1723.33
18   Jake Peavy          1720.49
19   Cliff Lee           1719.5
20   Jordan Zimmermann   1718.16

Linear weights

Last one! This time, we’re going to use a simplified version of linear weights, looking only at walks, hits and home runs.

Game Score = 8.4*IP – 3*BB – 5*H – 8*HR + 40
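
A sketch of the linear weights version follows. The function name is illustrative, and it assumes that H counts all hits, home runs included, so a home run is charged both the hit and the home run penalty. The formula as written does not spell that out.

```python
def lwts_game_score(outs, walks, hits, home_runs):
    """Linear-weights Game Score: 8.4*IP - 3*BB - 5*H - 8*HR + 40."""
    innings = outs / 3  # assumption: partial innings count as thirds
    # Assumption: `hits` includes home runs, which then cost 5 + 8 = 13 points.
    return 8.4 * innings - 3 * walks - 5 * hits - 8 * home_runs + 40


# Two invented complete games: four singles and no walks, then no hits and four walks.
print(round(lwts_game_score(outs=27, walks=0, hits=4, home_runs=0), 1))
print(round(lwts_game_score(outs=27, walks=4, hits=0, home_runs=0), 1))
```

The four-hit game scores 95.6 and the four-walk game 103.6, since this version treats a hit as costlier than a walk.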

Leader board:

Num  Name                LWTS GS
1    Clayton Kershaw     2080.39
2    Justin Verlander    2035.99
3    R.A. Dickey         1984.39
4    Felix Hernandez     1943.8
5    Matt Cain           1919.39
6    Gio Gonzalez        1918.4
7    Kyle Lohse          1869.4
8    Johnny Cueto        1865.8
9    David Price         1848.39
10   Zack Greinke        1837.59
11   James Shields       1824.39
12   Mat Latos           1818.39
13   Jake Peavy          1804.59
14   Madison Bumgarner   1801.99
15   Hiroki Kuroda       1793.19
16   Cole Hamels         1759.8
17   Jered Weaver        1754.79
18   C.J. Wilson         1735.6
19   Jordan Zimmermann   1726.6
20   Adam Wainwright     1701.8

Average

Now, it’s almost certain that none of these versions of Game Score is perfect on its own. However, as Tango said in his article a few years ago, we can assign weights to each one depending on our goals or preferences. Unfortunately, I’m not yet sure how to choose those weights. Maybe that will be a project for a future article. For now, I’m going to give you the simple, equally weighted average of the five season totals above: James’ original and the four newer versions.
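
One way to sketch the weighting idea is a small function that takes a pitcher's season totals and an optional set of weights. Everything here is an assumption for illustration: the function name, the dictionary keys, and especially the custom weights, which are arbitrary. The totals are Clayton Kershaw's from the tables above. Equal weights over all five totals (James' original included) give 2027.2, the figure shown for him in the table that follows.

```python
def combined_game_score(totals, weights=None):
    """Weighted average of several Game Score totals for one pitcher."""
    if weights is None:
        weights = {name: 1.0 for name in totals}  # equal weights by default
    weight_sum = sum(weights[name] for name in totals)
    if weight_sum <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(totals[name] * weights[name] for name in totals) / weight_sum


# Clayton Kershaw's 2012 season totals, copied from the tables above.
kershaw = {"james": 2089.33, "runs": 2077.06, "kbb": 1924.06,
           "fip": 1965.16, "lwts": 2080.39}

print(round(combined_game_score(kershaw), 1))

# Hypothetical weighting that leans on runs allowed and drops James' version.
custom = {"james": 0.0, "runs": 0.4, "kbb": 0.2, "fip": 0.2, "lwts": 0.2}
print(round(combined_game_score(kershaw, custom), 1))
```

Dividing by the sum of the weights means they do not need to add up to one, which makes it easy to try different emphases.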

Num  Name                Avg GS
1    Clayton Kershaw     2027.2
2    Justin Verlander    2015.028
3    R.A. Dickey         1988.9
4    Felix Hernandez     1957.612
5    Zack Greinke        1882.388
6    Johnny Cueto        1882.38
7    Matt Cain           1881.768
8    Gio Gonzalez        1862.97
9    David Price         1858.13
10   James Shields       1845.8
11   Kyle Lohse          1838.54
12   Cole Hamels         1803.21
13   Hiroki Kuroda       1802.08
14   Jake Peavy          1799.05
15   Mat Latos           1799.036
16   Madison Bumgarner   1789.028
17   Jordan Zimmermann   1755.656
18   Cliff Lee           1744.14
19   Yovani Gallardo     1741.292
20   Max Scherzer        1732.132

This list looks good, but it is far from a perfect way to evaluate pitchers. It doesn’t take into account park or league factors, which are incredibly important. However, if you’re looking for a different way to evaluate pitchers that takes many different factors into account, this is something to consider.

Conclusion

There you have it. For your reference, here’s a Google Docs spreadsheet of all the versions of Game Score for every pitcher who made at least one start in 2012.

Before I go, because I didn’t do a whole lot of actual analysis, here are some of my ideas at the moment for where to go next with these data:

Include park and league factors
Combine these versions of Game Score with varying weights
Convert Game Score to wins
Look at total Game Score over a career
Probably much, much more. Stay tuned!

Thanks again to Tom Tango for the inspiration and, honestly, most of the real analysis. Also thanks to James Gentile for the Retrosheet help.


Matt is the founder of SaberSim, a daily sports projections and analytics company. Follow him on Twitter @MattR_Hunter and @SaberSim, or email him here and tell him all the things he should do to make the site better.
MrMan
Someone help me with something that continually perplexes me. I see on Fangraphs and sites like this a lot of really good, insightful analysis and well-thought-out approaches to mining numbers. But I also see a lot of poorly presented information. For example, in the very simple tables used in this post, it’s difficult for readers to quickly grasp the meaning simply due to the poor formatting of the numbers. Currently the numbers are presented in a centered format with two decimals. First, the two decimals are meaningless when you’re dealing with numbers in the thousands; there’s no meaningful…
Matt Hunter

Thanks for the feedback MrMan. You’re completely correct, and I should have taken more care in presenting these tables well. I’ll keep your suggestions in mind for future tables, graphs, and charts.

Jim

If you take off the 50 free points per start then you can subtract 1650 from both Kershaw and Verlander and probably others.  This takes care of the comma problem for those who don’t read numbers well.

I guess another reason for the 50 points is if a pitcher starts, throws one pitch that is not hit, and then departs, his game score is 50. Wow!

MrMan
Matt: First, that’s a positive attitude. Most people don’t respond to any criticism with “you’re completely correct”. Second, you’re obviously smarter than I when it comes to baseball analytics, but if you ever do want some help in presenting data, it’s basically what I do for a living and I would be happy to help.

@Jim: I’m with you in not really understanding the granting of the 50 points per start. This accounts for as much as 80% to 90% of the overall score. Seems like a generous number for simply getting up on the mound and is too heavy compared to…
Tangotiger
I can’t stand decimals either, in situations like this. If it has no meaningful difference, it shouldn’t be shown. And this applies to ERA (should be to one decimal place), OBP, SLG (to two decimal places), and so on. At least for those, they can rely on inertia/tradition for their obstinance. It’s why Bill James shows 107 runs created, and not 107.2.

***

My Game Score starts at 40, not 50, and someone recommended I start at 30 (and make each of the points be earned to get to the average of 50). Starting at 50 is problematic, precisely for…
Carl
Tom Tango/Tangotiger (I assume same person): I had missed how similar my proposal was to your #1 proposal. Nice catch. I do think my proposal would take that and tweak it by:
1) not giving 10 marginal runs as a win. This is important, I believe, to adjust for different scoring periods (1960s vs 1920s vs steroid era vs deadball era)
2) requiring a minimum of 5 innings (hidden in looking at wins for IP/ER allowed)
3) using ER instead of R to avoid differences in official scorers
4) not adding 40.
Are you up to the challenge of creating such…
Tangotiger
Yes, I would tweak it by era. The 10 runs per win is good enough though. Most eras will have it between 9 and 11, and, really, does it matter if the Game Score will show 64 instead of 67? Anyway, I agree that the “10” should be flexible.

***

“using ER instead of R to avoid differences in official scorers”

?? You are confused here. R has NO interpretation. ER is subject to official scorer interpretation. So, to avoid differences, you want R. You are arguing against yourself here.

***

And the “40” or “30” or whatever to add,…
Carl
Tom Tango, 1) You are 100% right that I got myself confused with ER/R, and while I had proposed ER above, I should not have. Using R is the superior measuring stick. 2) I don’t care for the 10, as while it really doesn’t matter for individual games (as you say, 64 vs 67 doesn’t really matter that much), I want to avoid the situation where the amount of the loss affects an adjusted W/L record. For example, a SP who gives up 8 runs in 1/3 of an inning will get the loss 99.99% of the time. One who gives…
Carl

PS: Tom Tango, loved the original article and remembered it as soon as I clicked on the link above.

Zachary
I know you didn’t mean too much by averaging the different metrics, so this is not any sort of criticism but just observations if you actually decide to attempt an improved game score. You said that it is almost certain none of the versions on their own are perfect, but that assigning weights depending on goals/preferences to the metrics (like the 25/25/25/25 split in the simple average provided) could potentially produce a better game score figure. In my opinion almost any sort of weighting method across multiple versions will actually produce a worse result for evaluation for a few reasons. …
Tangotiger
“If this was preferable I would think that would be the weight used for FIP from the outset.”

I guess you are not familiar with “predictive FIP”, elsewhere at Hardball Times, because that’s pretty much what it comes down to.

***

The purpose of the different Game Scores is that each has a different view as to how to evaluate a pitcher’s performance. And by laying it out there, each person is therefore free to decide how they want to view that performance… but forces that person to be consistent. Is a 4-hit, 0-walk game the same as a 0-hit…
Zachary
I absolutely agree that each Game Score version is a different but valid view as to how to evaluate a pitcher’s performance. There most likely isn’t a perfect Game Score method, and if there is, or at least a best method, I couldn’t begin to guess as to what it would be composed of. I also was completely aware that this article was not meant to be an answer, just the opening of the discussion on the issue, and I had no intention to claim any one individual method was without merit. Also I acknowledge I am in no ways…
Zachary
As an aside, in response to the comment on “predictive FIP”, I am not entirely sure I follow what you are trying to say. From what I think I understand it is that you are saying pFIP calculation is similar, at least loosely, to how I was looking at the weighting of “SO’s and BB’s” method and the “FIP” method. If that is so, then maybe using the pFIP to calculate the Game Score may be of interest. What I was saying in the highlighted line was that we have a new formula for a game score that uses HR,…
Jim
I came up with a totally different one years ago, which I will throw out here. I’m not in love with this one either, but it does reward excellence and not mediocrity.

Jim’s Modified game scores
1. Must pitch at least 7 innings.
2. Add one point for each batter retired after 21.
3. Subtract (or add) the difference between:
   Strikeouts and walks.
   Hits and four.
   Earned runs and modified quality start
   Average pitches per inning and eleven

My modified quality start is a straight ERA of 3.00 or less. This can result in a negative…
Carl

Carl’s Game Score:

Take the prior year’s winning percentage for each IP/ER combo and assign that percentage to each start with the corresponding IP/ER. Each individual game will then be a percentage of a win earned, and the sum of a pitcher’s individual games will be his total adjusted wins. Subtract the adjusted wins from starts to get Adjusted Losses.

By using entire starts, this eliminates the situation where a starter is dominant one night and terrible the next, yet his .500 record looks unfair due to lots of K’s, low walks, etc. It also eliminates the problem of knuckleballers (and other pitchers such as Hudson) who outperform their FIP.

No ma'am we're musicians

I guess what bothers me about such efforts is the lumping of all walks into the ‘bad’ bin. I’ve seen situations where the walks did turn out bad for the defense, but I tend to remember more times when the walk of a hot hitter led to the scoreless end of the inning. Conversely, I’ve seen hits by a slow runner clog up the bases enough so that instead of being a blowout inning, a couple runs are picked up.

Some sort of different approach is needed, where the results of an inning factor back into the events.

bucdaddy

I can’t stand decimals either, in situations like this.
—-
Heh, I guess it’s just a glitch in the way the clocks in clock sports operate, but it amuses/irks me that the timers in, say, basketball start running decimals in the last minute, and that announcers feel obligated to include them. “Time out, Lakers, with 57.6 seconds to play …” I mean, there DOES come a point when the decimals matter to actual strategy, but that point is with 3.8 or 5.1 seconds to go, and not much sooner.

Dave Cornutt
I seem to recall that when James first published the Game Score formula and his original observations, he stated that he regarded it as a toy: fun to play with, but with questionable value as to enlightenment. However, it’s a good point that GS is an attempt to do a better version of what Quality Start (which goes back to the dawn of sabermetrics) does. (Hey, does anyone remember Runs Created…) The main quibble I have with the way this is presented is that, like all counting stats, you need to have some idea of context in order to really understand…
MrMan
@Jim: Not sure if you’re serious or not. But if you added all the money to the right of the decimal in my paychecks last year, you’d come up with $12.45. Now, $12.45 isn’t worthless; you can have a decent meal with it or purchase a good online game via Xbox Live or even make it halfway to a BPiA Boomstick. But in terms of using it to evaluate my overall compensation…it would be meaningless. And realize that’s the aggregate of all my paychecks…for any one of the paychecks the amount, at most, would be $0.99. The point is that…
Tangotiger

Jim: we’re arguing for rounding, not truncating. 

I can’t believe you would use the plot device of Superman III to make your case.  (And I can’t believe they made a worse Superman movie after that one.)

Carl
Okay guys, I’m intrigued enough to sacrifice a few weekends to get this done. Can someone please direct me to where I can get a download (preferably in either xls or csv format) of a list of all 2010-2012 games reflecting the last name/first name of the starter, the IP by that starter in the game, the runs allowed by each starter in the game, and whether the team (not the pitcher) won or lost the game? I’m envisioning a large file with 14580 rows of data and 5 columns to start the analysis. Thank you to all who can help…
Tangotiger

Carl: you can get it at Baseball-Reference.com’s Play Index.

But you are going to find what can be explained by PythagenPat. If league average is 4.3 runs per 9 IP, and if you have a pitcher who gave up, say, 1 run in 6 IP, this is what you have:
Team Runs scored = 4.3
Team Runs allowed = 1 + 4.3/9*3 = 2.43

And that will give you a win% of 72.5%.

My game score for that is:
Game Score = 6.4*6 – 10*1 + 40 = 68.4.

Which seems close enough for a crude measure.
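
The PythagenPat arithmetic in this comment can be reproduced with a short sketch. The function name and the default exponent power of 0.28 are assumptions for illustration (commonly cited values are close to 0.28 to 0.29), not necessarily the exact constants used in the comment.

```python
def pythagenpat_win_pct(runs_scored, runs_allowed, power=0.28):
    """Expected win% from per-game run rates using a PythagenPat exponent."""
    # The exponent grows with the total run environment (runs per game).
    exponent = (runs_scored + runs_allowed) ** power
    rs, ra = runs_scored ** exponent, runs_allowed ** exponent
    return rs / (rs + ra)


league_runs_per_9 = 4.3
# Starter allows 1 run in 6 IP; an average bullpen is assumed for the last 3 innings.
team_runs_allowed = 1 + league_runs_per_9 / 9 * 3
win_pct = pythagenpat_win_pct(league_runs_per_9, team_runs_allowed)
print(f"{win_pct:.3f}")
```

With the assumed 0.28 power this comes out close to the 72.5% quoted above; a slightly different power shifts the result by a fraction of a percentage point.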

Jim
I’m answering two posts directed to me at once. Okay Tango, after rounding what have you got? Truncation! And there would have been no clear-cut triple crown winner last year because both Trout and Cabrera would have batted 33 or .33. No, if you have your employer send to my bank account everything to the right of the decimal in your paycheck and I get 999 other people to do the same, that will add a good chunk to my disposable income. Figuring the average amount at 50 cents, times 1000 deposits per week, keeps me in Dale’s, I…
Tangotiger

If my paycheck is $100.60, I’d get $101. If someone else’s paycheck is $100.40, he’d get $100. Either way, our company is paying out $201. That’s why rounding works, because given a large enough number of employees, things will work out.

***

Carl, I just wrote this up:

http://tangotiger.com/index.php/site/article/tangos-lab-deconstructing-game-score

Jim

For those of you who don’t like decimals, would you send me everything to the right of the decimal on your paycheck? If it’s not important to you, I will be able to use it. You guys are scared of numbers, methinks.

Tangotiger
The purpose of FIP is to be “descriptive”. The formula can’t change. It’s its raison d’etre. The purpose of pFIP is, by definition, to be “predictive”. Predictive is based on historical data, and that means to look backwards and figure out what the pitcher was really responsible for, and the rest was just random variation. (A descriptive stat still counts that random variation, while a predictive one removes it.) By averaging the two game scores, we are reducing that random variation (somewhat).

***

And you can actually make a stronger case that you don’t like the averaging by comparing the…
Rolling Fingers

After admittedly scrolling rather quickly through all the discussion about decimal places, we are still left to wonder why there are any fractions in the first table. The Bill James formula always yields an integer. The sum of a column of integers should always be an integer.

Carl
Tom Tango, thank you for both the suggestion to use Baseball-Reference.com (will subscribe tomorrow), as well as the other link. The use of Pythag to get winning percentage is cool. After just a quick look though, I think the small differences (i.e., 3 Game Score points in the example above) break down at the extremes. For example, a team that allows 8 runs in 9 innings, to me, is unlikely to win 22% of the time. Conversely, if the next game that same team allows only 2 runs in 9 innings, are they going to win only 78% of the time? …
Tangotiger

Right, as I noted in the comments on my blog, I made a mistake.  A team that AVERAGES allowing 2 runs will win 78% of the time, but if they allow EXACTLY 2 runs, the calculation will be different.

Tangotiger

As for the Bill James Game Score, since the decimals are all the .33 and .66 variety, it’s clear that Matt gave out points for partial innings, which is not correct.

Matt Hunter

Apologies, I must have had an error in my formula somehow.

Thanks for all the fantastic feedback, everyone. Definitely lots of material to think about and improve on for the future.

Hardwood

I do love watching people fail at statistics. Makes for quite the Friday night.

bucdaddy

IIRC, sending fractions of a cent into your own account was a plot point in “Office Space.” It worked out to be a little more than the penny-ante (literally) thieves thought it would.
