FanGraphs Baseball


  1. None of the formulas you listed use HBP. What gives?

    Comment by Yirmiyahu — July 29, 2011 @ 2:48 pm

  2. From the article:

    When I say “walk”, I’m really excluding intentional walks and including hit batters. It makes more sense to consider it that way than not, so that’s what I’m doing.


    So, whenever you see “BB”, it really means “BB-IBB+HBP”. It’s ugly to show that, hence the reason I just show BB.
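As a minimal sketch of that convention (the function and argument names here are illustrative, not from the article), the "BB" term used in the formulas could be computed as:

```python
def adjusted_walks(bb: int, ibb: int, hbp: int) -> int:
    """Return the 'BB' term as described above: BB - IBB + HBP.

    Intentional walks are removed and hit batters are added.
    """
    return bb - ibb + hbp


# Hypothetical line: 4 walks (1 intentional) and 2 hit batters.
print(adjusted_walks(bb=4, ibb=1, hbp=2))  # 5
```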

    Comment by tangotiger — July 29, 2011 @ 2:54 pm

  3. Oh, missed that line. Very reasonable, thanks.

    Comment by Yirmiyahu — July 29, 2011 @ 3:04 pm

  4. What is the end goal here? Are we simply trying to find a way to compare individual pitching performances, or is this a broader attempt to improve the way that we measure a pitcher’s value? Ideally, would Average Game Score be a better measure of WAR than FIP?

    Forgive me if these are obvious or elementary questions; I didn’t really know anything about Game Score before reading this article.

    Comment by strongbad56 — July 29, 2011 @ 3:21 pm

  5. The opening paragraphs make me feel like I’ve jumped into Part II of a series but missed Part I. Did you leave out an opening paragraph or 2?

    Comment by Hizouse — July 29, 2011 @ 3:27 pm

  6. Putting it into WAR could be one possible outcome, though not necessarily the goal.

    Comment by tangotiger — July 29, 2011 @ 3:34 pm

  7. Yeah, originally I had intended this for my blog, so it would be part of a continuing series for the regulars there.

    Let me add a paragraph so as to not shock the newcomers.

    Comment by tangotiger — July 29, 2011 @ 3:34 pm

  8. I love this. I’d say something like 40% Runs + 60% FIP would be a really fun game score to kick around. I don’t love the LWTS gs because it treats 1/2/3 base hits equally. I’m sure you did it to maintain simplicity, but I’m not sure it makes it more useful. And the straight K/BB system seems to overlap with FIP too much, but I do like removing HR from the equation (partially). Here are a couple options I would vote for:

    Runs 40% – FIP 60% – Baseline
    Runs 40% – FIP 30% – K/BB 30% (if you want HR to have less effect)
    Runs 33% – FIP and/or K/BB 33% – LWTS 33% (if you want BABIP to have more effect)

    Ok, so I have no idea what I would really vote for. But my gut says RA should be around a 1/3, with FIP based stuff making up the rest of it. This is cool. Wonder what the crowd thinks.

    Comment by Telo — July 29, 2011 @ 3:36 pm

  9. Perhaps what I’ll do is put up a poll first, to gauge the “blink” opinion of the readers here. Then, when the playoffs roll around, we can talk more specifics of actual games.

    Comment by tangotiger — July 29, 2011 @ 3:40 pm

  10. 1) If one goal is to have a comparable scale to likelihood of winning, then the formulas necessarily change (however imperceptibly) with changes in scoring environments, correct?

    2) If another goal is to find the right combination of the 4 score formulas, then shouldn’t actual games provide history by which to run some sort of regression analysis on win likelihood (Assuming the vagaries of run support wash out over large samples)?

    Comment by Cory — July 29, 2011 @ 3:54 pm

  11. 1. The standard linear weights formula leaves all the change in weights to the out. While not necessarily the best, or the right, way to do it, there’s a strong appeal to only needing to worry about changing one parameter, rather than all of them. Do we really want to worry about the weight for the Runs to be 9.5 or 10.2 each year, or can we just go with 10, and let the coefficient for the IP float?

    Given that these are quick shortcuts anyway, I think there’s a strong appeal to just changing the IP weight to force the average to “50”.
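One way to picture "letting the IP coefficient float" is to solve for the single IP weight that puts the average start at 50 while every other weight stays fixed. The sketch below assumes a runs-only score of the form base + w * IP - 10 * R; the league averages (6 IP and 3 runs per start) are placeholder values for illustration, not figures from the article.

```python
def ip_weight_for_average(base, avg_ip, avg_penalty, target=50.0):
    """Solve base + w * avg_ip - avg_penalty = target for the IP weight w.

    avg_penalty is the average number of points lost to the fixed-weight
    terms (for a runs-only score, 10 * average runs allowed).
    """
    return (target - base + avg_penalty) / avg_ip


# Assumed averages: 6 IP and 3 runs per start, runs weight fixed at 10.
w40 = ip_weight_for_average(base=40, avg_ip=6.0, avg_penalty=10 * 3.0)
w30 = ip_weight_for_average(base=30, avg_ip=6.0, avg_penalty=10 * 3.0)
print(round(w40, 2), round(w30, 2))  # 6.67 8.33
```

Only the IP weight changes when the run environment or the starting constant changes; the weight on runs stays at 10.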

    2. If you do that, 100% of the weight is going to be on runs (Version 1). This is because the winning team is based 100% on runs scored and runs allowed.

    If you already have runs allowed, what is the number of walks, hits, HR, and strikeouts going to tell you? Well, nothing at all. You already have runs.

    Well, it might tell you A LITTLE. Because the starter is only going to pitch 5-7 innings, the number of hits, walks, HR a starter gives up might be a good indicator as to what the reliever is going to give up (because that tells you more about the opposing hitting team than the number of runs scored).

    However, others have run the regression in the past, and virtually 100% of the weight would go to runs allowed.

    Comment by tangotiger — July 29, 2011 @ 4:01 pm

  12. Yeah, the LWTScore is disappointing. I’d like to see a game score based on the same linear weights as wOBA.

    Comment by Yirmiyahu — July 29, 2011 @ 4:09 pm

  13. wOBA is Linear Weights.

    The difference is that wOBA uses plate appearances as its “opportunity space”, while Linear Weights uses outs (or IP).

    In my case, since IP is the opportunity space, then Linear Weights gives me the correct weights.

    Comment by tangotiger — July 29, 2011 @ 4:14 pm

  14. I added a “recap” section at the bottom, just so that all the equations are there together.

    Comment by tangotiger — July 29, 2011 @ 4:14 pm

  15. Yirm is saying that the implementation of the LWTS is disappointing, and that he would want to see each event given its proper weighting – which is the point I made in my post as well. As it is, it accomplishes nearly the same thing, and you did it for simplicity’s sake, totally understood. But it would be nice.

    Comment by Telo — July 29, 2011 @ 4:19 pm

  16. To clarify, all we’re talking about is doubles and triples.

    Comment by Telo — July 29, 2011 @ 4:22 pm

  17. Oh, I see.

    Right, I was trying to simplify it by not including those events, for two reasons, neither of which may be good enough: (a) to keep it simple like James’ original Game Score, to which I added HR and removed ER, (b) lack of historical data pre-1950.

    I’m not sure that setting the values to “5” for hits, then a bonus of “3” for doubles or triples is necessarily required (and subsequently increasing the IP coefficient to balance it out).

    However, I’m not against it, nor am I against the idea of one version used if the 2B+3B data is available, and another version when that data is not available.

    Comment by tangotiger — July 29, 2011 @ 4:30 pm

  18. If you give double the weight to Version 4 (linear weights), and the other 3 are given about the same weight, we get this equation:

    Game Score
    = 40
    + 5 * IP
    + 1 * SO
    – 2 * (R + BB + H)
    – 5 * HR

    A perfect game with 20 K gives you a Game Score of 105.
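The equation above is easy to check by hand or in code. A minimal sketch follows; the function name is illustrative, and "bb" is assumed to follow the article's convention of BB - IBB + HBP.

```python
def blended_game_score(ip, so, r, bb, h, hr):
    """Game Score = 40 + 5*IP + 1*SO - 2*(R + BB + H) - 5*HR."""
    return 40 + 5 * ip + 1 * so - 2 * (r + bb + h) - 5 * hr


# Perfect game with 20 strikeouts: 9 IP, nothing allowed.
print(blended_game_score(ip=9, so=20, r=0, bb=0, h=0, hr=0))  # 105
```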

    Those weights are a bit reminiscent of the James weights. The K is a match. The hit is a match.

    He has 1 for walk, whereas I have 2. I have 5 for HR, whereas it was excluded in his. I think both of these are improvements.

    He has 4 for ER and 2 for UER, whereas I have 2 for R. So, obviously he considers runs a lot more than I do.

    He starts at 50 whereas I start at 40. I think this is an improvement too.

    He gives bonus points for pitching in innings 5 and later, whereas I don’t. I’m agnostic on this one.

    Comment by tangotiger — July 29, 2011 @ 4:44 pm

  19. Oh interesting, didn’t realize we were missing 2B/3B data for pitchers pre 1950. Is that across the board, or just spotty retrosheet data that hasn’t been totally transcribed? I’d vote for calculating the 2B/3B values when we have it, and using straight H when we don’t… since I’m not the one doing the work :)

    Comment by Telo — July 29, 2011 @ 4:51 pm

  20. If you look at the Bill James weights, and try to reverse-engineer them into the 4 versions I have, he gives these implied weights:

    40% Version 1 (Runs)
    30% Version 2 (K/BB)
    0% Version 3 (FIP)
    30% Version 4 (Linear Weights, sans HR)

    I get a perfect match to his R, K, H components. I get an implied BB value of -2 not -1 like in his original one.

    So, I think there’s a definite mistake in his Game Score for the walk. I can’t get the value of the BB to be -1, while also keeping the value of the K and H as he has them.

    Nonetheless, there’s nothing wrong with his weights. Certainly reasonable. Presumably, if he had decided to include HR, some of the weight for Version 2 and Version 4 would go to Version 3.

    In that case, the HR-included Bill James Game Score would imply the following:
    Game Score
    = 40
    + 5 * IP
    + 1 * SO
    – 1 * H
    – 2 * BB
    – 4 * (R + HR)

    I think there is something unappealing in weighting the walk more than the hit. I know why it comes out that, especially if you like FIP. It just seems… weird.
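To see the walk-versus-hit oddity concretely, here is a small sketch of the HR-included equation above. The stat line is made up for illustration, and the function name is not from the article.

```python
def james_style_with_hr(ip, so, h, bb, r, hr):
    """Game Score = 40 + 5*IP + 1*SO - 1*H - 2*BB - 4*(R + HR)."""
    return 40 + 5 * ip + so - h - 2 * bb - 4 * (r + hr)


# Hypothetical start: 7 IP, 8 SO, 5 H, 2 BB, 2 R, 1 HR.
base = james_style_with_hr(ip=7, so=8, h=5, bb=2, r=2, hr=1)
extra_hit = james_style_with_hr(ip=7, so=8, h=6, bb=2, r=2, hr=1)
extra_walk = james_style_with_hr(ip=7, so=8, h=5, bb=3, r=2, hr=1)

# Score, cost of one more hit, cost of one more walk.
print(base, base - extra_hit, base - extra_walk)  # 62 1 2
```

With runs held fixed, one more walk costs twice as much as one more hit.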

    Comment by tangotiger — July 29, 2011 @ 4:56 pm

  21. I like giving a little credit for going past the 5th inning. That has value to the team. (Well, come to think of it, does it? Anecdotally and logically I would think it should, but can we prove it? First, you would have to be pitching better than your bullpen, obviously, and you’d have to show that your bullpen would otherwise not be able to pitch as much during the next couple days if they were to relieve you… and that your bullpen is better than your other starters. Seems messy, maybe leave it out then?)

    Comment by Telo — July 29, 2011 @ 4:56 pm

  22. If we do the weighting of 60/40 FIP Score and Runs, which I think gets to the essence of great games (dominance plus run prevention), with some rounding simplification, the formula would come out:


    The Clemens game is a nice 96. The average game, I believe, stays near 50 (40+4*6ish-4*3ish+4-4-5=47, so it’s close).

    Looking at John Lackey’s first game of the season, which was 3 2/3 innings, 2 walks, 3 Ks, 2 HRs, and 9 runs, we’d get a game score of 4*3.66-4*9+3-4-16+40=2.

    That feels pretty good to me.
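The arithmetic in this comment pins down a set of coefficients: 4 per inning, -4 per run, +1 per strikeout, -2 per walk, -8 per home run, on a base of 40. The sketch below uses those inferred values, so treat them as a reading of the worked numbers rather than a stated formula; the function name is illustrative.

```python
def fip_runs_score(ip, r, so, bb, hr):
    """Coefficients inferred from the Lackey arithmetic in the comment:
    40 + 4*IP - 4*R + 1*SO - 2*BB - 8*HR.
    """
    return 40 + 4 * ip - 4 * r + so - 2 * bb - 8 * hr


# Lackey line from the comment: 3 2/3 IP, 9 R, 3 SO, 2 BB, 2 HR.
lackey = fip_runs_score(ip=3 + 2 / 3, r=9, so=3, bb=2, hr=2)
print(round(lackey))  # 2

# A nine-inning, 20-strikeout line with no runs, walks or home runs.
print(fip_runs_score(ip=9, r=0, so=20, bb=0, hr=0))  # 96
```

The second line reproduces the 96 quoted for the Clemens game, which is some reassurance that the inferred coefficients are the intended ones.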

    Comment by Joel W — July 29, 2011 @ 4:58 pm

  23. In your case, you’d have to decide if it matters if someone gives up 0 singles+doubles+triples or 10 (without the corresponding change in runs). That is, does his “scatter-ability” matter or not.

    You are saying “not”

    Comment by tangotiger — July 29, 2011 @ 5:05 pm

  24. Yeah, it’s a tough one. You can make the case that being pulled after 5, before you get tired, is actually a good thing, that with a 6-deep or 7-deep bullpen, that any of those guys is better than your non-ace starter.

    Comment by tangotiger — July 29, 2011 @ 5:06 pm

  25. Isn’t doing a BB/SO version and a FIP version redundant?

    Regardless, I do like the version you have in your 4:44pm post.

    Comment by Charles Saeger — July 29, 2011 @ 5:09 pm

  26. Shouldn’t the runs component take care of the “scatterability” and XBH components to a sufficient extent? That is, to say “without the corresponding change in runs” begs the question a bit it seems.

    Also, I would have thought that not including any hits would be to argue that “scatterability” does matter, because you’re solely looking at runs, and not at hits that didn’t turn into runs. (This is more a question than a statement because I entirely acknowledge my inferior thinking on these issues).

    Comment by Joel W — July 29, 2011 @ 5:12 pm

  27. Not necessarily redundant. It’s almost like you are saying: “should I count the HR or not? I’m not sure, so I’ll go 50/50 on it”. And so, you take half of Version 2 and half of Version 3.

    Comment by tangotiger — July 29, 2011 @ 5:13 pm

  28. Yeah, exactly. It’s been a while, but isn’t there a part in The Book where you show that the 3rd time through the order SPs become measurably worse?

    Comment by Telo — July 29, 2011 @ 5:14 pm

  29. You are mostly right. I don’t think I explained it well.

    Basically, it’s hard to give up 10 hits one game, and 0 hits another game, and have the same number of runs allowed.

    The question is if these two games should count the same (for you) or not. Do you want “scatter-ability” to be a real ability? Do you believe in it? If so, then giving up 10 hits and 0 runs is the same as giving up 0 hits and 0 runs.

    Comment by tangotiger — July 29, 2011 @ 5:16 pm

  30. The poll of the no-hitters made me think of the EloRater on bref comparing all major league hitters based, basically, on general consensus. I am wondering if the same thing could be done for pitched games: give people a series of one-on-one forced-choice comparisons (with all of the components included in the 4 models, and excluding own offensive team performance), and eventually build up a ranked list, from best to worst. I think it would have to be fit to a normal curve, since most pitching performances would give you a 35-65% chance of winning the game. I think it would then be fairly simple to create a formula that weighs each of the 4 models (or breaks them up into their individual components). Or not.

    Comment by Jared — July 29, 2011 @ 5:17 pm

  31. Do you (we) have WPA data? Basically it seems like we are trying to find a linear approximation from stats to WPA.

    You’re trying to do it by “feel” which is also legitimate, but I would want to at least see the coefficients from a regression of IP, H, R, ER, BB, HR, and K on WPA.

    Comment by Matt Crawford — July 29, 2011 @ 5:19 pm

  32. Apart from that, I would probably vote for something pretty close to Bill James’ weights as you calculated them above: Version 1, 2, and 4 weighted about equally, with a little extra on Version 1.

    Comment by Matt Crawford — July 29, 2011 @ 5:23 pm

  33. I’d say that I mildly believe in scatterability, and think that sometimes pitchers lose command at basically random times, so they will have games with decent peripherals and lots of runs.

    Even so, for Game Score, my gut is that it’s in some measure descriptive of skill (the FIP component) and in some measure descriptive of what happened (the run component). Insofar as scatterability matters, and when it matters, I think it gets picked up in the runs component.

    In a game where a pitcher strikes out 10, walks 1, and gives up between 3 and 7 hits over 8 innings, while allowing 1 run, I don’t think I’d remember that game any differently given the number of hits. If the pitcher strikes out 10, walks one, gives up 8 hits, and 5 runs, with no home runs, I’d say “god damn if he hadn’t just collapsed in the 5th, or had Jeter gotten to that ball, he would have had a great game.”

    Comment by Joel W — July 29, 2011 @ 5:27 pm

  34. I like a 50/50 combination of “Strikeouts and Walks” and “Linear Weights”. It flies in the face of DIPS a bit (to the extent that it explicitly excludes your FIP-based game score) but when we are talking about dominance in a single game I will go against the DIPS grain and say that the results of contact matter.

    As for the other two scores, I think this 50/50 combination allows you to exclude them. The “Strikeouts and Walks” score captures 2 of the 3 inputs to FIP anyway (though not in the same weights), and the “Linear Weights” version captures the third. So it has more than a bit of the “flavor” of FIP contained within it, rendering an FIP-based score somewhat redundant. And in my opinion the runs-based score should be somewhat redundant with the linear weights score, except in the case of truly fluky low-run games that are not “dominant” in the way I would think of it (e.g. 10+ singles and some walks scattered over 9 innings allowing no runs).

    Comment by mcbrown — July 29, 2011 @ 5:40 pm

  35. Right, third time through the order is terrible.

    Comment by tangotiger — July 29, 2011 @ 6:20 pm

  36. Excellent. Not that I necessarily agree with you, but you’ve done exactly what I’ve asked, and that is, to think through, and then decide for yourself what you want.

    Comment by tangotiger — July 29, 2011 @ 6:55 pm

  37. I would prefer that people do that based on actual games they’ve seen.

    Otherwise, we’re asking them to interpret numbers. If that’s the case, then simply giving me a weight of the 4 versions is sufficient, and a lot easier to code and much faster to get a result.

    Comment by tangotiger — July 29, 2011 @ 6:57 pm

  38. For starting pitchers, WPA is entirely runs and innings driven, and so, that would be Version 1.

    Comment by tangotiger — July 29, 2011 @ 6:59 pm

  39. I like a version that goes:

    40% FIP
    35% Runs
    25% LWTS

    I think that you should include HR because HRs represent a mistake by the pitcher, and a great game ideally has fewer mistakes. This makes the K-BB version redundant. A nine-inning shutout is a dominant game to me, even if it involves few Ks and more scattering of hits. I would give LWTS more weight if it included 2Bs and 3Bs, because giving up gap hits is less impressive than giving up bloop singles.

    * Also, how does 3 BBs = 1 R?

    Comment by William — July 29, 2011 @ 7:32 pm

  40. Go to my site, and read “How are Runs Really Created”.

    Comment by tangotiger — July 29, 2011 @ 7:34 pm

  41. The only reason not to give full weight to the HR is that the HR is park-dependent to some extent, not to mention that a lot of the HR is dependent on the hitter.

    This is why Version 2 (sans HR) correlates better than Version 3 (FIP with HR) with next year’s data (or out of sample data).

    That is, rather than isolating the pitcher’s performance, we’re including things outside his control (to some extent).

    Hence, the argument you can make to give most weight to FIP, but then some weight to Version 2 (without HR).

    Comment by tangotiger — July 29, 2011 @ 7:42 pm

  42. Ah, you’re right, I never thought about it that way. I was thinking that every Hit or BB a pitcher gives up lowers the Win% a bit…but then I guess every inning he pitches removes that.

    Comment by Matt Crawford — July 29, 2011 @ 7:46 pm

  43. The correct answer about the weightings is–who cares?
    I’ve never seen proof that the original “game score” concept means anything and see no reason to think that this new version does either.

    Comment by GiantHusker — July 29, 2011 @ 7:50 pm

  44. Are you dead set on weighing the three? Why not use the FIP and linear weights score to predict runs, regress them both towards the runs actually allowed and weigh them equally? That way a guy is rewarded for giving up fewer runs but you’re still looking mostly at how well he pitched, with results just being a benefit or a detriment. It’ll also implicitly benefit guys who go longer because there isn’t as much regression, but it’ll still keep poorly pitched complete games in context.

    Comment by Deadpool — July 29, 2011 @ 8:28 pm

  45. One thing (and certainly not the only thing) that’s interesting about pitcher game scores is that, if they are a good reflection of pitcher skill and measure the chance he’s given his team to win the game, they could help talk about the value of consistency or inconsistency in a rational way. The FG article about Ubaldo Jimenez and accusations of inconsistency used the Bill James Game Score, which is probably OK for talking about a pitcher’s consistency. But one that was related to a pitcher’s contribution to team success could also help us determine how much consistency matters.

    If I had a database of run-based game scores going back many years (because run-based game scores are tied closely to a team’s chances of winning and over a long career you’d expect a pitcher’s distribution of scores to converge), the first pitcher I’d look at would be Blyleven, because one of the charges leveled against his performance in the HoF debates was that he was unusually inconsistent, and that his distribution of excellent and mediocre starts was the cause of his poor win-loss record relative to his aggregate stats. That’s an interesting and plausible argument, and one that we don’t quite have the tools to evaluate. For this purpose we’d want a game score that sacrificed simplicity for accuracy: the real function for W% (assuming average offense and replacement-level relievers, or something like that) from IP and RA is certainly not linear, and the game score I’d look for would reflect that.

    Comment by Al Dimond — July 29, 2011 @ 8:34 pm

  46. You can get rid of those pesky decimals in V1 and V2 by simply changing the constant term from 40 to 30, which you said wouldn’t be a problem. I like these better, anyway:

    V1: 8*IP -10*R +30
    V2: 2*IP + 3*(SO-BB) +30
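Written out as code, the two base-30 versions above look like this. The function names and the sample stat lines are illustrative only.

```python
def v1_runs(ip, r):
    """V1 with a base of 30: 8*IP - 10*R + 30."""
    return 8 * ip - 10 * r + 30


def v2_k_bb(ip, so, bb):
    """V2 with a base of 30: 2*IP + 3*(SO - BB) + 30."""
    return 2 * ip + 3 * (so - bb) + 30


# Hypothetical ordinary start: 6 IP, 3 R, 5 SO, 2 BB.
print(v1_runs(ip=6, r=3), v2_k_bb(ip=6, so=5, bb=2))  # 48 51

# A nine-inning shutout under V1.
print(v1_runs(ip=9, r=0))  # 102
```

With whole-number weights, a nine-inning shutout already lands just above 100 under V1.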

    Comment by James M. — July 29, 2011 @ 8:35 pm

  47. Right, except for when he leaves the game in the middle of an inning.

    Comment by tangotiger — July 29, 2011 @ 9:04 pm

  48. Thanks.

    Comment by Hizouse — July 29, 2011 @ 9:30 pm

  49. Right, I don’t mind changing the starting point to 35, maybe even 30. I’d have to do it the same for all of them though.

    It’s more a question of what we want the starting point to be. The replacement level I use for a starting pitcher is .380, meaning I’d have to start it at 38. You can make a decent case for anything in the 35-40 range. Even in the 30-40 range.

    Comment by tangotiger — July 29, 2011 @ 9:46 pm

  50. Note also that the lower you make the starting point, the larger the IP multiplier is going to be to balance it out. And so, at the top end, you run the risk of really going past the 100 level.

    Comment by tangotiger — July 29, 2011 @ 9:47 pm

  51. Your last sentence makes sense, since all of the information that ACTUALLY matters wrt the winner of the game is stored in RA.

    However, if all of the weight goes to RA, doesn’t that just mean that the problem is degenerate? In which case you can form a correlation between RA and the other three variables and base your weights off of those.

    I’d propose that 1-R^2 in the latter correlation is some evidence that “sequencing skill” exists, in some form.

    Comment by cd — July 30, 2011 @ 12:18 am

  52. I wonder if it wouldn’t make sense to start with formulas that don’t contain any common terms, and then figure out the weighting (to avoid double-counting). In the four formulas above, home runs and Ks appear in two, and walks in three.

    So, maybe one could come up with a K-only formula (and use it along with the runs-based and the linear weights-based one), or a hits-only formula (and use it along with the runs and FIP equations).

    Comment by Craig Tyle — July 30, 2011 @ 11:50 am

  53. 30 also works very well as a base for V4:

    10*IP + LWTScore +30

    The best you can do with V3 is:

    4*IP + FIPScore + 30, which gives an average of 49.2. Close enough?

    Have you considered changing V2 to (2*SO-3*BB) to make it consistent with FIP? Then you could eliminate V3 altogether since HR are already included in V4.

    Comment by James M. — July 30, 2011 @ 4:26 pm

  54. The idea is that you can make a legitimate case not to have HR at all. So, that option has to remain. In your case, the HR is always tied to the BB.

    Comment by tangotiger — July 30, 2011 @ 4:37 pm

  55. Actually, that was my point. I’d rather leave the HR’s out altogether. But I can’t because V2 overweights the importance of SO’s relative to BB’s so much that I’m forced to go with V3 instead. It’s the lesser of 2 evils. If the V2 formula was changed I’d shift all the weight from V3 to V2. Then, if I still feel the need to give HR’s some weight, I can bring them back in through V4 along with other hits.

    Comment by James M. — August 2, 2011 @ 8:22 pm

  56. Each Version PROPERLY weights each component. There is no overweighting within a version.

    As for not wanting to give HR any weight: if a pitcher gives up six HR, are you prepared to ignore that fact altogether? And that you would rather say that someone with 0 walks and 6 HR had a better game than someone with 6 walks and 0 HR (all other things equal)?

    This is the point of this exercise.

    Comment by tangotiger — August 2, 2011 @ 9:47 pm
