FanGraphs Baseball

Comments

  1. The logistic formula graph looks like a saddle that got cut off just past the saddle point (in the higher green band).

    The difficulty in extrapolating which graph is better based on its extrema (i.e. where batter or pitcher K% approaches 100) is that we don’t have any tool by which to evaluate it besides intuition. Even during Michael Jordan’s lone year of pro ball (in AA, no less), he only had a K% of 26% or so.

    Comment by rusty — June 14, 2013 @ 12:59 pm

  2. Steve: can you tell me what your formulas would give you for:

    a. 10% K v 10% K, in league of 17.5% K?

    b. 90% K v 90% K, in league of 82.5% K?

    Comment by tangotiger — June 14, 2013 @ 1:49 pm

  3. I wrote about this being an example of an ecological fallacy yesterday. I take that back. It is interesting and obviously took a lot of effort! Congrats!

    I was merely pointing out that the analysis may not tell us very much about Adam Dunn, because it appears to be based on grouped data and seems to assume that the outcome of a PA is independent of all others. I suppose my concern is that your model may be an overly simplistic representation of reality and thus may not account for some theoretically relevant variables. In particular, it seems as though you are modeling matchups using historical K% alone, as if those were the only variables that determine matchup outcomes.

    An alternative modeling strategy would be to control for unobserved differences between pitchers and hitters by including fixed or random effects. Of course, it could get awfully complicated, since it would probably make sense to account for game, team, and year effects as well.

    Just my .02.

    Comment by Ecological Fallacy (not really, I guess) — June 14, 2013 @ 2:08 pm

  4. Sure, Tango. A) 5.5%; B) 94.5%

    The spreadsheet app near the top of the article is interactive, for whoever is interested in figuring out more of these — you can input the numbers directly into it on this page.

    Comment by Steve Staude. — June 14, 2013 @ 2:12 pm
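
For anyone who can’t load the interactive spreadsheet, here is a minimal sketch of the odds ratio calculation behind those two answers (the later comments confirm 5.5%/94.5% are the odds ratio numbers); the Python framing and function name are illustrative, not from the article:

```python
def odds_ratio_k(batter_k, pitcher_k, league_k):
    """Odds ratio (log5) estimate of the strikeout probability for one matchup."""
    odds = ((batter_k / (1 - batter_k))
            * (pitcher_k / (1 - pitcher_k))
            / (league_k / (1 - league_k)))
    return odds / (1 + odds)

print(odds_ratio_k(0.10, 0.10, 0.175))  # ~0.055 -> Tango's case A
print(odds_ratio_k(0.90, 0.90, 0.825))  # ~0.945 -> Tango's case B
```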

  5. Ahh, OK I kind of see that.

    Only one solution to the problem of not enough super-high-K% data points — put me in the majors! For science!

    Comment by Steve Staude. — June 14, 2013 @ 2:20 pm

  6. Yeah, this is the simplest way I could have done it — it’s just a baseline. Some batters may have a particular weakness to a curve, or to a certain type of delivery, that throws them off of the baseline. But I don’t think we can really judge that until we have that baseline.

    You have to admit — this system does at least explain/predict a huge share of what’s going on, right?

    Comment by Steve Staude. — June 14, 2013 @ 2:26 pm

  7. Oh, thanks, by the way!

    Comment by Steve Staude. — June 14, 2013 @ 2:40 pm

  8. It’s blocked at the office. Those numbers are for the odds ratio method.

    What do you get using your other methods?

    Comment by tangotiger — June 14, 2013 @ 3:10 pm

  9. Haha, I appreciate the response.

    That said, since we’re really only looking at that left-most quarter or so of the shape, the two share the same type of convexity, so the analytic story you’re telling shouldn’t change that much. I’m using as high-water marks Kimbrel’s 50.1% K last year, and Adam Dunn’s 35.7% K from 2011…

    Comment by rusty — June 14, 2013 @ 3:23 pm

  10. Whoops, I assumed that’s what you were interested in, and misread. OK, now listing Formula 1, 2, and Logistic:

    A) 5.35%, 5.94%, 5.55%
    B) 98.25%, 96.38%, 94.45%

    Formulas 1 and 2 won’t take a different league K% as an input, though (I know they should, ideally).

    Comment by Steve Staude. — June 14, 2013 @ 3:33 pm

  11. A+B must equal 100%.

    Because all I did for B was say “not strike out”. So, if you have a 5% chance to strike out, you have a 95% chance to not strike out.

    Comment by tangotiger — June 14, 2013 @ 3:40 pm
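
A quick numerical check of this point, reusing odds_ratio_k from the sketch above: flip every rate to its complement and the prediction flips to its complement, so the two answers must sum to 100% under the odds ratio method.

```python
a = odds_ratio_k(0.10, 0.10, 0.175)   # chance of a strikeout
b = odds_ratio_k(0.90, 0.90, 0.825)   # the same matchup, phrased as "not strike out"
print(a + b)                          # 1.0 (up to floating-point rounding)
```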

  12. Ah, true. Another mark in favor of the logistic and/or odds ratio.

    Comment by Steve Staude. — June 14, 2013 @ 3:48 pm

  13. On my site, I also suggest that maybe you should limit it to 2007 and earlier, because K rates jumped starting in 2008 or 2009. The league average is nothing close to constant.

    Comment by tangotiger — June 14, 2013 @ 3:57 pm

  14. The other thing that is wrong with the logit: the three coefficients must add up to 1. In your case, it’s +.92, +.92, -.77.

    So, a 17.5% hitter v 17.5% pitcher in a 17.5% league will NOT give you 17.5%!

    I mean, if you want to argue for +.92, +.92, -.84, we can have that discussion.

    Otherwise, you are overfitting.

    Comment by tangotiger — June 14, 2013 @ 4:00 pm
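
One way to see the constraint, assuming the article’s logistic fit has the no-intercept form logit(K%) = b1*logit(batter K%) + b2*logit(pitcher K%) + b3*logit(league K%): when the coefficients sum to 1, an average batter facing an average pitcher in an average league comes back as exactly the league average. A minimal sketch, with the coefficient values quoted in the comments:

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def inv_logit(x):
    return 1 / (1 + math.exp(-x))

def logistic_k(batter_k, pitcher_k, league_k, b):
    """Assumed no-intercept logistic form: a weighted sum of the three logits."""
    x = b[0] * logit(batter_k) + b[1] * logit(pitcher_k) + b[2] * logit(league_k)
    return inv_logit(x)

# Coefficients summing to 1: a 17.5% batter vs. a 17.5% pitcher in a 17.5% league
# returns exactly 17.5%. The fitted (+.92, +.92, -.77) set, which sums to 1.07, does not.
print(logistic_k(0.175, 0.175, 0.175, (0.92, 0.92, -0.84)))  # 0.175
print(logistic_k(0.175, 0.175, 0.175, (0.92, 0.92, -0.77)))  # ~0.16, not 0.175
```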

  15. Ooh, good point. I’ll set that constraint and see what happens.

    I’ll give your 2007 suggestion a shot, but it’ll take some doing…

    Comment by Steve Staude. — June 14, 2013 @ 4:23 pm

  16. First off, I really enjoyed these articles. I think you’re right about using projected K%’s instead of historical K%’s. Off hand, my best guess as to why you’d get coefficients of .92 and .92 for the logistic regression instead of 1 and 1 is that the equation is actually serving to regress the observed odds ratios to the “true” odds ratios. Steamer doesn’t (yet) have projected splits but I think we should start working on them.

    Comment by J. Cross — June 14, 2013 @ 4:49 pm

  17. OK, with that constraint, the coefficients go to 0.91, 0.94, and -0.85.

    The 100+ PA line (over the ’02-’12 set) now has a slope of 0.935 (going through the origin) and R^2 of 0.958, vs. the 0.906 and 0.956 for odds ratio.

    At 1000+ PA, it’s a 0.932 slope and 0.987 R^2, vs. 0.919 and 0.984 for odds ratio.

    So that constraint definitely brings the logistic function a lot more in line with odds ratio.

    Can you think of any way to adjust for the shifting league K%? My thought was that it would all average out, but I don’t know…

    Comment by Steve Staude. — June 14, 2013 @ 5:04 pm

  18. Are you reporting MAE/RMSE on the same sample used to derive the F1/F2 formulas?

    Comment by Colin Wyers — June 14, 2013 @ 5:05 pm

  19. Thanks! Yes, please do — those splits will be awesome for more applications than just this. I think the conclusions here could be much stronger when taking aging (and whatever else you do) into account. What else do you take into account for K%, if that’s not top-secret?

    Comment by Steve Staude. — June 14, 2013 @ 5:17 pm

  20. Not on the 2013 data, but yes on the ’02-’12 (and also for the logistic). I know it’s not the best way to do it, but I did use 1.5 million data points and simple (rounded, even) formulas. I don’t think they’re picking up much random error; systematic error, maybe.

    Comment by Steve Staude. — June 14, 2013 @ 5:30 pm

  21. The easiest way is to limit your data to 2002-07. Is there a reason you can’t do that?

    Comment by Tangotiger — June 14, 2013 @ 6:41 pm

  22. No, I can (eventually). I was just curious about what kind of math would be involved in fixing the problem for this data set.

    Comment by Steve Staude. — June 14, 2013 @ 7:06 pm

  23. I am way late to this party. I just saw it mentioned a while ago on the weekly SABR notes email. So, great work. This is lots of fun.

    Comment by Cyril Morong — June 14, 2013 @ 7:12 pm

  24. About a month ago, Darvish faced the Astros. I posted the following on my local SABR chapter’s email list:

    “He has struck out 40.4% of the batters he has faced. The league average is 20.2%. Astro batters have struck out 26.7% of their PAs.”

    Then I predicted about 49% (which is what I think the Odds Ratio gives here). But he only struck out 25.8%.

    Comment by Cyril Morong — June 14, 2013 @ 7:15 pm

  25. You won’t like the answer: Odds Ratio Method!

    Comment by Tangotiger — June 14, 2013 @ 7:32 pm

  26. Interesting. Given they faced Yu 27 times that game, my binomial distribution calculator above thinks there’d be at least a 3% chance of the observed K% being that far off of the mark (looks like 8 of 27 K’ed, so 29.6%?), even if all those numbers are perfect estimates of the “true” rates, and even if the formula is perfect.

    If you use Yu’s (say that 3 times fast) career K% of 29.2%, and the Astros’ 2012-2013 23.5% against RHP, then the formulas all guess around 30-34%.

    Comment by Steve Staude. — June 14, 2013 @ 7:46 pm
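
A minimal sketch of the kind of binomial check described here, assuming the calculator is an ordinary binomial tail probability; the ~49% prediction and the 8-of-27 figures are taken from the comments above:

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Chance of 8 or fewer strikeouts in 27 PAs if the true per-PA rate really were 49%:
print(binom_cdf(8, 27, 0.49))  # roughly 0.03, in line with the "at least 3%" above
```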

  27. Haha, might I then need an odds ratio to adjust the odds ratio that’s adjusting the odds ratio?

    OK, so how might that work? Something like:
    (2002-2006 Odds)*(2007-2011 Odds) / (2002-2011 Odds) ?
    (then converted back into a probability)

    Comment by Steve Staude. — June 14, 2013 @ 7:57 pm

  28. Have you tried this with other components, like walk rate or groundball rate? Also, how do these compare to the venerable old log5 formula?

    Comment by Dan Rosenheck — June 14, 2013 @ 8:17 pm

  29. Do I understand correctly that the formula in this link is the applicable log5 formula? http://www.baseballthinkfactory.org/btf/scholars/levitt/articles/batter_pitcher_matchup.htm

    If so, then it’s actually the same as the odds ratio formula here.

    Haven’t tried BB% and GB% yet (though I have the data ready to go), but that’s the plan.

    Comment by Steve Staude. — June 14, 2013 @ 8:24 pm

  30. Thanks. That gets closer to what actually happened.

    Comment by Cyril Morong — June 14, 2013 @ 8:53 pm

  31. “Do I understand correctly that the formula in this link is the applicable log5 formula?”

    I think it is, but I’m not totally sure.

    Comment by Cyril Morong — June 14, 2013 @ 8:54 pm

  32. log5 is odds ratio

    Comment by Tangotiger — June 14, 2013 @ 11:42 pm
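
Assuming the formula at the Levitt link is the league-adjusted log5 form, a small sketch showing it is algebraically the same as the odds ratio method (divide its numerator and denominator by (1-b)(1-p)/(1-L) to get the odds form); the Darvish/Astros figures from the earlier comment make a convenient test case:

```python
def log5_k(b, p, lg):
    """League-adjusted log5 matchup form, written directly in probabilities."""
    num = b * p / lg
    return num / (num + (1 - b) * (1 - p) / (1 - lg))

def odds_ratio_k(b, p, lg):
    """The same estimate via the odds ratio."""
    odds = (b / (1 - b)) * (p / (1 - p)) / (lg / (1 - lg))
    return odds / (1 + odds)

# Darvish (40.4% K) vs. the Astros (26.7% K) in a 20.2% league: both give ~0.49.
print(log5_k(0.404, 0.267, 0.202))
print(odds_ratio_k(0.404, 0.267, 0.202))
```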

  33. OK, thanks guys.

    Comment by Steve Staude. — June 15, 2013 @ 3:16 am

  34. I’m not sure I’d go with an odds ratio approach. If it were me, I’d explore using a skill comparison similar to Elo that predicts winning %. Then I’d use a modifier to both the pitcher’s and the batter’s skills based on handedness, home park, etc. Then run Elo or whatever skill-comparison system you want to predict the outcome percentages.

    Comment by Nivra — June 15, 2013 @ 6:20 pm
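
For readers who haven’t seen Elo-style ratings, a minimal sketch of the standard expected-score and update rule; the 1500 starting rating, 400 scale, and K = 20 are conventional chess-style defaults rather than anything specified here, and how a PA outcome maps to a "win" is left open:

```python
def elo_expected(rating_a, rating_b):
    """Expected score for A against B under the standard Elo curve."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))

def elo_update(rating, expected, actual, k=20):
    """Nudge a rating toward the observed result."""
    return rating + k * (actual - expected)

# Everyone starts equal; ratings drift apart as matchups accumulate.
pitcher, batter = 1500, 1500
exp = elo_expected(pitcher, batter)     # 0.5 at equal ratings
pitcher = elo_update(pitcher, exp, 1)   # pitcher "won" the PA (e.g., a strikeout)
batter = elo_update(batter, 1 - exp, 0)
print(pitcher, batter)                  # 1510.0, 1490.0
```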

  35. oh…. this is the project you were referencing in the reliability thread a few weeks back…. interesting.

    A couple of brief points: first, I’m not really sure why you are including the league-wide strikeout rate here. Well, I understand why, but I think you’re wrong to do it, as the league-wide strikeout rate really shouldn’t have anything to do with your prediction for an individual at bat, unless you are trying to bias yourself in the correct direction. Basically, including it is a version of the gambler’s fallacy, in a way. But that’s a minor point.

    More importantly (and nice work again, by the way), I would like to persuade you, in the future (and present, if possible), to use positive predictive power, hit rate, and negative predictive power instead of mean absolute error in reporting your findings. I, and presumably other informed readers, would find it quite valuable to know in which direction each formula made its errors. To be honest, information like that is usually how I determine the relative utility of predictive models. I also have a hunch that if you break down the distributions of the errors a bit more, you may find what is “wrong” with the odds ratio method.

    Comment by Kevin — June 17, 2013 @ 1:05 am

  36. Would that involve starting everybody out with an equal rating, and raising their rating based on the ratings of who they beat (and dropping based on who they lose to)? If so, sounds good but kind of tough (at least for me).

    Comment by Steve Staude. — June 17, 2013 @ 4:45 am

  37. Heh, I can’t win on the league rates thing — I leave it out, and MGL and Tango criticize that…

    Anyway, Formulas 1 & 2 don’t directly include league K% the way the Odds Ratio and the Logistic function do, but they indirectly include approximately 2002-2012’s league K% through their weighting. It’s important, I think, in light of the fact that the historical K% of each player (which I’m using as an approximation of their “true” rate) has a lot to do with who they faced in the past — and league K% is a proxy for that.

    That reminds me of: http://en.wikipedia.org/wiki/False_positive_paradox by the way.

    Positive and negative predictive power — how could I apply those here? From what I’m seeing, it looks like they’re used for yes-or-no predictions. I’m not making any yes-or-no predictions on strikeouts — only giving a probability.

    Not sure what you mean about the direction of the errors — I thought the slopes of the regression lines on the scatterplots, along with the average errors (as opposed to the mean absolute errors) were indicators of that.

    Comment by Steve Staude. — June 17, 2013 @ 5:12 am

  38. I would think this is more an example of the ecological fallacy than the false positive paradox. The false positive paradox would be relevant if you were trying to predict what the strikeout rate would be for a group of individuals. But you are not. You are trying to predict the outcome of a single PA, so including league averages is an example of the ecological fallacy, IMO at least. I think reasonable minds may differ on this point, but for me at least, this is a clear example of confusing the correlations of individuals and groups.

    Regarding PPP/NPP, it might not be something you can use here explicitly, but as you have described it thus far, your goal is to create essentially a probability matrix of outcomes for each at bat. Which means you will have a probable outcome (along with other, less probable outcomes) for each PA. Although we may never have a K% above maybe 30% for a given at bat, there are likely to be situations in which a K is the most probable outcome. Also, since there are a finite number of outcomes to a PA, computing PPP/NPP will be possible (it’s not strictly binary).

    Regardless, the basic idea here is that by comparing the amount of error variance above and below the regression lines to the total amount of variance above and below the lines (which is what PPP/NPP does, in a way), I think you can essentially accomplish what you are trying to do with the league K% without explicitly adding it into your formula. Or something like that, at least. I wanted to see it in table form because… well… I like tables better than graphs.

    In any case, the reason I think this matters is that, since strikeouts are the low-probability event in this experiment, I would give more weight to the formula that over-predicts strikeouts than to the formula which under-predicts them.

    BTW, is this moving towards a measure of managerial effectiveness (manager WAR if you will)?

    Comment by Kevin — June 18, 2013 @ 12:18 am

  39. Well, I think the example Tango trapped me into earlier is a good point in favor of the need for a league average:
    If you say the league K% is 20%, and you’re talking about a batter who Ks 20% of the time, and a pitcher who Ks 20% of batters, you’d probably expect that pitcher to strike out that batter about 20% of the time, right? That’s what all the formulas say, more or less. How about a batter and pitcher who both have a K% of 80? With the league K% of 20, all the formulas say something in the 90% range will be the outcome. But if you adjust the league K% up to 80%, then odds ratio instead says the batter will K 80% of the time against that pitcher.

    The trick is, the chance of not striking out is 1 minus the chance of striking out; that means, if the formula tells you that a strikeout will happen 20% of the time in a certain circumstance, then it ought to also tell you that it won’t happen 80% of the time in that same circumstance. It seems to me that taking the overall average into account is the only way you can do that.

    I’m still not getting it regarding PPP/NPP — everything I’m seeing shows it dealing with True Positives, False Positives, etc. I don’t think, even if a K is the likeliest outcome of a particular PA, that there could be enough confidence in that assessment to label it a “positive.”

    I noticed that the following form of the PPP formula is actually similar to the odds ratio method: (sensitivity x prevalence) / (sensitivity x prevalence + (1 - specificity) x (1 - prevalence)), as prevalence is analogous to league average, and both factor in the (1-…) of the terms. Both are based on Bayes’ Theorem, I bet.

    Hm, not a bad idea regarding managerial effectiveness. Not easy, though — there are a lot of variables to consider, some of which we are not always privy to (e.g. health status of bench players or relievers).

    Comment by Steve Staude. — June 18, 2013 @ 5:32 am
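
Plugging these cases into the odds_ratio_k sketch from earlier in the thread bears this out:

```python
print(odds_ratio_k(0.20, 0.20, 0.20))  # 0.20 -- average vs. average in an average league
print(odds_ratio_k(0.80, 0.80, 0.20))  # ~0.985 -- the "90% range" answer
print(odds_ratio_k(0.80, 0.80, 0.80))  # 0.80 exactly, once the league rate matches
```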

  40. Have you taken a stab at other stats yet?

    Comment by Steve — January 17, 2014 @ 2:24 pm

  41. Hi there, my fellow Steve. Thanks for asking. You know, I was working on batted ball stats when I got distracted by my THT Annual project, which took a more in-depth look at this sort of stuff (specifically, whether some hitters’ K% against certain types of pitchers were legitimately better or worse than the odds ratio would expect). There were a couple of other big projects around then which haven’t been released yet, btw.

    Point is, it’s definitely on my to-do list, though I’ve been busy lately with other stuff.

    Comment by Steve Staude. — January 17, 2014 @ 3:30 pm

  42. Steve, I’m actually also looking at batted ball stats. My goal is to quantify why GB(FB) pitchers have an edge vs. GB(FB) hitters. So, I’m very interested in your work.

    I really enjoyed your work in THT Annual by the way.

    Comment by Steve — March 24, 2014 @ 10:36 pm
