First Basemen “Scoops” – The Value of Handling Errant Throws at First Base
Most of you are familiar with Ultimate Zone Rating (UZR), a metric for evaluating fielding using detailed play-by-play data. For first basemen, like all other fielders, it measures the number of ground balls that are fielded and turned into outs as compared to an average-fielding first baseman, given the same number, type, and location of ground balls, as well as the same number of outs, the base runner configuration, and batter handedness (plus an adjustment for the parks and the ground/fly tendency of the pitchers). The difference between these two values is a fielder’s UZR. It is usually expressed as a number of runs saved or cost compared to an average player at that position over a specified period of time (usually defensive games). You will often see it as a rate stat, generally per 150 games.
What UZR does not measure for first basemen, because the requisite data is not readily available, are the theoretical runs saved or cost by virtue of a first baseman’s skill at successfully catching errant throws or throws in the dirt. For lack of a better word, I call these “scoops,” even though they necessarily include poor throws (e.g., high or off-line) that are not in the dirt.
Fortunately, there is a way to estimate this skill, using a relatively simple method “invented” by Tom Tango, called “without and with you”, or WOWY. The WOWY methodology was explained by Tango in an excellent article in the 2008 Hardball Times as well as in various online forums, including his (and my) blog, www.insidethebook.com.
Basically, it goes like this: Figure out what happens when a particular player is on the field (generally something that explicitly involves that player, or at least is affected by that player, such as the number of ground balls a particular SS – say Derek Jeter – fields and turns into outs). Then figure out the same thing when that player is not on the field, but all other relevant variables (in this example, mostly the pitcher, park, and opponent batters) remain constant. The difference between the two rates (per whatever you want) should reflect the difference in skill, or at least in performance (skill is usually performance regressed toward some mean, the amount of regression being a function of the sample size of the performance) of whatever you are measuring, between the player in question and the average player when the player in question is not on the field (but in the data set).
In other words, to use the Jeter example, if Derek fields 4 balls per 9 innings and all other SS (let’s say that our sample of “other SS” is large and unbiased enough to consider them league average) field 4.5 balls per 9 innings, given the exact same pool of pitchers, batters, and parks, then we can safely say (more or less – within the bounds of sample error) that Jeter is .5 balls per 9 innings worse than the average SS. As I said, it is simple and brilliant. The results are extremely telling if we can get large enough samples of Jeter and lots of other SS that are “matched” with Jeter based upon parks, pitchers, batters, etc.
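To make the arithmetic concrete, here is a minimal sketch of the basic “with and without” comparison in Python. The play and inning totals are hypothetical, chosen only to reproduce the 4 and 4.5 balls per 9 innings in the example; the function names are mine and are not part of any published WOWY code.

```python
def per_nine(plays, innings):
    """Rate of plays made per 9 innings."""
    return 9.0 * plays / innings


def wowy_difference(with_plays, with_innings, without_plays, without_innings):
    """Rate 'with' the player minus rate 'without' him, per 9 innings.

    Assumes both samples share the same pool of pitchers, batters, and parks.
    """
    return per_nine(with_plays, with_innings) - per_nine(without_plays, without_innings)


# Hypothetical totals: 400 plays in 900 innings with Jeter at SS,
# 450 plays in 900 innings with all other SS behind the same pitchers.
jeter_vs_others = wowy_difference(400, 900, 450, 900)
print(jeter_vs_others)  # negative means fewer plays than the comparison pool
```

A negative result means the player turned fewer balls into outs than the matched pool did.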
The methodology is a little more complicated than that, but hopefully you get the general idea. Anyway, the same thing can be done with first basemen in order to estimate their “scooping” performance and ability. What I did was look at every ground ball that was thrown from each infielder to each first baseman. I put that ground ball into one of two buckets: One, when there was no error on the throw, or two, when there was an error on the throw. The assumption is that when a throwing error is made, there is an errant throw (no duh) and the first baseman is not able to somehow coax that bad throw into an out by scooping it out of the dirt, jumping in the air, catching it while off the bag and still making the play, etc. Obviously most of the time when a throwing error occurs, there is nothing that any first baseman can do about it. However, in the long run, we can assume that a certain fixed percentage of bad throws from each infield position will always result in an error while another fixed percentage of those bad throws have a chance to be “saved” by a skilled (or tall) first baseman.
We can also assume that when an error is not made, that sometimes a bad throw occurs and the first baseman “saves the day.” Again, most outs and non-errors occur on easily catchable throws, but a certain fixed percentage of them will occur on bad throws that are skillfully and successfully handled by the first baseman.
Given these assumptions (which are true, by the way), if we look at all throws by a particular player at a particular position to a certain first baseman and then compare the results (error or no error) to when throws are made from the same infielder at the same position to all other first basemen, the difference can be attributed to the “scooping skill” of that particular first baseman.
An example:
Say that all infielders threw to Todd Helton 1000 times in 2007. And say that those exact same infielders threw to some other first baseman (and we are going to assume that all of these “other” first basemen are average, collectively) around 1000 times also (it does not really matter how many throws went to these other first basemen, although we would like it to be a lot). Now let’s say that 20 throwing errors were made on those throws to Helton and only 15 were made to all the other first basemen. We can safely say (with some uncertainty of course, due to sample error) that Helton was 5 plays per 1000 throws (about a full season actually) better than the other pool of first basemen, who we are assuming are average. So Todd is 5 plays, or around 4 runs, per season, above average at “scoops.”
In actuality, what I did was match up every player at every infield position (including pitcher and catcher) with every first baseman. Then, for each specific first baseman, I did a “with and without” for each fielder and weighted that difference by the minimum of two numbers: the throws that fielder made to the first baseman in question and the throws the same fielder made to all other first basemen.
For example, let’s say that over the sample time period, Edgar Renteria threw 300 balls to Albert Pujols (by the way, I used all data from 2000 to 2008) and he made 6 throwing errors. Now let’s say that Renteria also threw 800 balls to other first basemen (on the Cardinals or any other team he played for) and made 20 errors. That is 2.0 errors per 100 throws to Albert versus 2.5 per 100 to everyone else, so the difference in error rate is .5 errors per 100 throws in favor of Albert. Thus we give him credit so far for .5 errors per 100, weighted by 300 throws (the lesser of 300 and 800). We do that for every fielder who threw to Albert at least 20 times and 20 times to other first basemen, and then we add everything up to get a weighted average. This weighted average represents a first baseman’s “scooping” skill as compared to an average first baseman, or at least as compared to the average first baseman in that player’s “matched pool” of first basemen.
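The weighting scheme can be sketched in a few lines of Python. This is only an illustration of the steps described above, not the code actually used: the `Pair` record, the function name, and the second (small-sample) fielder are made up, while the Renteria numbers and the 20-throw cutoff come from the example.

```python
from dataclasses import dataclass


@dataclass
class Pair:
    """One fielder's throws to the first baseman in question and to all others."""
    throws_with: int
    errors_with: int
    throws_without: int
    errors_without: int


MIN_THROWS = 20  # cutoff on each side of the pair, as in the text


def scoop_rating(pairs):
    """Return (errors saved per 100 throws, total matched throws)."""
    weighted_sum = 0.0
    matched = 0
    for p in pairs:
        if p.throws_with < MIN_THROWS or p.throws_without < MIN_THROWS:
            continue  # too few throws on one side to use this fielder
        rate_with = p.errors_with / p.throws_with
        rate_without = p.errors_without / p.throws_without
        weight = min(p.throws_with, p.throws_without)
        # Positive when fewer errors occur on throws to this first baseman.
        weighted_sum += (rate_without - rate_with) * weight
        matched += weight
    if matched == 0:
        return 0.0, 0
    return 100.0 * weighted_sum / matched, matched


pairs = [
    Pair(300, 6, 800, 20),  # the Renteria-to-Pujols example
    Pair(15, 1, 400, 8),    # hypothetical fielder, dropped: under 20 throws
]
rating, matched_throws = scoop_rating(pairs)
print(round(rating, 3), matched_throws)
```

With only the Renteria pair counted, the result is the same .5 errors per 100 throws on 300 matched throws worked out above.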
That last part of the last sentence is the primary flaw in the methodology. Since I am only comparing each first baseman to all other first basemen who had the same fielders throwing to them, it is possible, and in some cases inevitable, for one first baseman to be compared to a pool of primarily good or bad first basemen, since each one will tend to be compared with the others on his own team. For example, Helton will tend to be compared with whoever else manned first base for the Rockies when he was not on the field. If those few backup first basemen happened to be particularly bad at “scooping” balls, then Helton will be overrated by this methodology. In fact, because of this flaw, and because regular first basemen tend to be better at “scooping” than backup first basemen, any regular will tend to be overrated (because they are often compared to a backup) and any backup will tend to be underrated (because they are often compared to a regular). Keep in mind, though, that because some of Helton’s infielder teammates played on other teams, he is not always going to be compared to another Rockies first baseman. Anyway, for now, it looks like we are going to have to live with this flaw or weakness in the system. The correct way to handle the data, of course, is to adjust each first baseman’s rating by that of his “others” by doing an iterative process. In the future I may do this, but for now, we are stuck with version 1.0.
Here are the results:
Again, I used play-by-play data from 2000 to 2008.
Interestingly, and not unexpectedly, there were significant across-the-board differences in the scooping ability of the average tall and short, lefty and righty, and regular and backup first baseman.
Tall (>6’1”) RH: .6 runs per 1000 matched throws, or around a full season.
Tall LH: 1.2 runs
Short RH: -.8 runs
Short LH: .6 runs
Less than 300 matched throws: -1.5 runs
300 to 1000 throws: .4 runs
More than 1000 throws: .6 runs
Players with at least 1000 total matched throws (for each fielder, the smaller of the two throw counts in the pair is added to the total):
Best per 1000 throws
Berkman +4 runs, 2928 matched throws
Choi +4, 1361
Conine +3, 3100
Conor Jackson +3, 1790
Loney +3, 1679
Mientkiewicz +3, 4344
D Ward +3, 1119
Olerud +2.5, 4481
Sexson +2.5, 6471
Tony Clark +2.5, 2880
Dan Johnson +2.5, 1774
Overbay +2.5, 4834
Worst
Karros -4, 2986
Mo Vaughn -4, 1644
Galarraga -3, 1842
Stairs -3, 1122
Bagwell -2.5, 4931
Casey -2.5, 6444
Julio Franco -2.5, 2010
Swisher -2.5, 1134
Thome -2.5, 4476
When we regress these numbers in order to turn them into “true talent” scooping ability, we get slightly smaller values, depending on the number of matched throws (sample size) of course. In fact, it appears that the spread in talent between the best and worst “scoopers” at first base is on the order of 2-3 runs, plus or minus (a 4-6 run spread). So before you start opining about how your favorite first baseman is so great defensively because he “saves so many errors,” consider that scooping ability is probably worth less than ¼ of total defensive ability or value at first base. Fielding grounders is at least 75% of the package and “scooping” is the rest. But every little bit helps.
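For readers unfamiliar with regression toward the mean, here is a rough sketch of one common form of it, shrinking an observed rate by n / (n + k). The constant `HYPOTHETICAL_K` is a placeholder I made up for illustration; the actual amount of regression behind the 2-3 run estimate is not given here, so the output should not be read as a real true-talent figure.

```python
HYPOTHETICAL_K = 1000  # placeholder regression constant, NOT taken from the data


def regress_to_mean(observed, n, k=HYPOTHETICAL_K, mean=0.0):
    """Shrink an observed rate toward the mean.

    observed: runs per 1000 matched throws
    n: number of matched throws behind that rate
    k: number of throws at which half the observed difference is kept
    """
    return mean + (observed - mean) * n / (n + k)


# The same observed +4 keeps more of its value with a larger sample.
print(regress_to_mean(4.0, 2928))
print(regress_to_mean(4.0, 1361))
```

Whatever the right constant is, the pattern is the same: the fewer matched throws a player has, the closer his estimate moves to average.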

MGL – What conversion rate did you use for converting plus minus plays made to plus minus runs?
You’ve got an error in your example: “Now let’s say that 20 throwing errors were made on those throws to Helton and only 15 were made to all the other first basemen. We can safely say (with some uncertainty of course, due to sample error) that Helton was 5 plays per 1000 throws (about a full season actually) better than the other pool of first basemen, who we are assuming are average.”
If there were 20 errors on throws to Helton and only 15 errors on throws to other 1B, then Helton is below-average at scooping, not better than the other players.
MGL – Also, what database did you use to determine who was the intended receiver on error throws? One of the very few details of a play that Retrosheet lacks is what base was being thrown to when a throwing error was made.
This is helpful, MGL. The leaders and the trailers also easily pass the smell test, which should make this information more readily accepted.
You might try renaming it “with and without you” for the puerile reference to the U2 song. Keep the masses happy with marketing…
There’s a good Beatles fan for ya, Mr. Mike Green. It’s “With Or Without You” but I’ll forgive him because I know he has Beatles on the brain.
Do you mean “Within You Without You” :)
Mike Green was making a U2 reference, as he said!
A further query: catching errors seem to be omitted here, but catching-errors-on-a-throw are not included in UZR, are they?
I realise that not all catching errors are on bad throws, but a significant number of them are: the catching/throwing error decision lies at least partially in the scorer’s whim, which would be nice to take out.
Probably could also run into the mindset of the scorer being biased because the first baseman has a good reputation as a fielder.
Is this a fair grading system, when two players who are being judged are primarily being judged against one another? D. Johnson and N. Swisher, for instance, spent most of their scooping opportunities on the A’s, sharing mostly the same infielders. If one player is an above-average scooper, he would negatively affect the other’s scooping rating to an unfair degree.
Nick – MGL spends a fair amount of time discussing this shortcoming of the WOWY method in the article.
It is called With Or Without You.
For V1.0 this is a hell of a lot better than what I could come up with. Would it be possible to include the All-Star games and the WBC to spice up the data any?
I simply MUST know how JT Snow measures on this…
Seconded. It was like the #1 excuse for why the Giants should have kept Snow for as long as they did. “He scoops throws so well!”
Click on my name, and look at post #2 for career leaders.
Oops… looks like Rally’s site is kaput at this moment.
Yes, UZR includes all errors, including receiving throws. Errors in UZR are treated separately from “range” and then everything is combined at the end. A lot of the PBP or semi-PBP systems treat errors the same as a missed ball. They are not! Scorer judgment and bias aside, an error is much worse than a missed ball. A missed ball is a ball that may or may not have been playable, even though the UZR program thinks it should have been caught some percentage of the time. An error is a ball which, according to the scorer at least, should have been caught 98% (or whatever the average fielding % at that position is) of the time. If we treat an error as a missed ball, it might be a ball that, according to the various parameters and buckets, UZR thinks is fielded 20% or 50% or 70% of the time (or whatever).
Anyway, the data set I used was STATS, which includes who receives every ball.
And yes, Nick, as Peter indicated, I explained that you are right in that each player is actually compared to whoever happened to receive a lot of throws from the same fielders, which tends to be their teammates. I suspect that even when I adjust for that, the numbers won’t change that much. If it is not too much trouble, I’ll do that soon.
This was a quickie synopsis of some work I was doing the other night. Hope that the results were helpful. I think that the illuminating part was the magnitude of the difference between the best and worst, which this kind of methodology should pick up (and keep in mind that sample results always, or at least usually, pick up a larger spread than the “true” spread, just as if you looked at the difference between the best and worst BA in any one or even two-year period).
Some time ago, people were guessing that “scooping” was worth a few runs a year, and that seems to be about right. Maybe a run or two more since some scoops save hits rather than errors.
Thanks for the comments. I always like to “vet” these methodologies to get ideas for making them better.
I can post the entire results on Google Docs if anyone would like.
“Anyway, the data set I used was STATS, which includes who receives every ball.”
Thanks for this info. It is very cool that STATS does this. I wish that that information could become a part of Retrosheet.
It still would be interesting to me to see how you go from plus minus plays made to run values. An example would be great. Thanks.
If I have understood the process correctly, there may be an underestimation of the value of scooping. On a ground ball in the hole or down the third base line followed by a throw in the dirt, but on line and barely in time, the most common scoring results are “out” if the first baseman makes the scoop and “infield single” if he does not. I understand that these plays are effectively not counted because of the absence of “error” as a likely scoring possibility.
I would guess also that the height advantage might be lessened if these plays were taken into account.
A possibility already specifically acknowledged in MGL’s reply above.
Yup, as Peter said, I think I mentioned that one can assume that the scoop values are a little low due to some bad throws that are scored as hits and not errors. I don’t think it will make much of a difference though. Maybe another run or so for the best or worst.
Peter, an error is worth around .5 runs and an out around .27. I used .8 runs per “difference between an out and an error,” the same as you would use for a regular defensive system that tracked missed balls versus outs. I suppose that a throw to first suggests that no other runners are on base unless there are 2 outs, so maybe the value of an error on a bad throw to first is closer to .4 runs. So maybe .7 runs is more accurate. No big deal though.
And whoever mentioned that my Helton example was backwards, you are correct of course. Thanks for pointing it out.
“I suppose that a throw to first suggests that no other runners are on base unless there are 2 outs,”
I don’t know why you would suppose that. Infielders hold runners on 2nd and/or 3rd all the time and throw to first with less than 2 outs. They also throw to first when runners are in motion and there is no chance for a play elsewhere. But you are correct that the run value for an error on the throw is closer to .7. Not for the reason you gave, but because an error on the throw can either put the batter on or not and can also either advance the batter beyond first or not. The run values for these four buckets vary from .35 to .75 and average .445. That added to the average run value of an infield out ground ball of .258 gives .703. As you said, no big deal, your conclusions are still the same. However, dividing into 4 buckets might result in a slightly greater spread between the very worst and very best first basemen, as the very best may be able to prevent the batter and other runners from advancing more often than the very worst first basemen.
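The arithmetic in this exchange can be laid out as a short Python sketch. The constants are the figures quoted in these comments (.445, .258, and the rounded .8 used in the article); the function and variable names are made up for illustration.

```python
AVG_ERROR_RUN_VALUE = 0.445  # average over the four throwing-error buckets
AVG_OUT_RUN_VALUE = 0.258    # average run value of an infield ground-ball out
ARTICLE_SWING = 0.8          # rounded value used for the published numbers


def swing_run_value(error_value, out_value):
    """Runs separating an out from an error: the error's cost plus the lost out."""
    return error_value + out_value


def plays_to_runs(plays_saved, swing):
    """Convert plus/minus plays into plus/minus runs."""
    return plays_saved * swing


swing = swing_run_value(AVG_ERROR_RUN_VALUE, AVG_OUT_RUN_VALUE)
print(round(swing, 3))
# Five plays saved, valued at the bucketed swing and at the rounded .8.
print(round(plays_to_runs(5, swing), 2), plays_to_runs(5, ARTICLE_SWING))
```

For five plays, the gap between the two conversions is about half a run, which is why the choice of .7 or .8 hardly changes the conclusions.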
I am new to all this so I hope I am not asking too basic a question, but doesn’t STATS keep a record of actual scoops? Couldn’t the actual scoops be used in place of this formula?
First of all, why is everyone saying this measures scooping ability, when it really just measures % of outs made vs. opportunities? Richie Sexson rates well based strictly on his reach. The fact that his fielding percentage is lower than most would tell you that he’s probably worse at scooping low throws as well. These stats are interesting, but mostly it tells us two things we already knew: 1) lefties have an advantage at 1b because their “reaching” foot is naturally closer to the bag and glove hand is naturally closer to the rest of the field, while righties have to pivot 180 degrees to get in proper stretch position, and 2) tall guys make an easier target. Assuming Sexson and Snow both have arm lengths in proportion to their height, Sexson’s reach gives him a target radius somewhere between 6″ and 1′ longer than Snow’s, which would give him a huge advantage in the number of square feet he can cover while keeping his foot on the bag. I don’t know if anyone actually keeps a stat of throws in the dirt that are actually scooped by a 1B, but I would bet that the percentage of successful scoops more closely mirrors the players’ fielding percentages.
One more thing in defense of JT Snow: It was his overall defense, not just his scooping ability, that made him valuable. He was a great fielder, and a great thrower. A lot of 1Bs wouldn’t even try to start a double play, or throw home on a close play, but Snow excelled at throwing players out at the other bases. Of all the 1Bs I’ve ever seen, most are either just a little better or a little worse than average. Snow was the only one that ever really stood out to me. And I’m not a huge Snow fan. I would have traded him in a second for an average fielding 1B who’d hit 30-40 HRs a year. But he was a great fielder.
Peter, as I said, a throw to first with less than 2 outs SUGGESTS that there are no other runners on base. Unless I am on crack or completely lost my mind, I certainly did not say or think that a throw to first with less than 2 outs ALWAYS means that there are no other runners on base.
If I told you that “there was a throw to a base” or I told you that “there was a throw to first base” (with less than two outs), which bucket would you think had more men on base, on the average?
That being said, if it is closer to .7 runs per “swing” (error or out), that’s fine by me. I don’t think it will make any (other than de minimis) difference at all, keeping track of what “bucket” the error is in, to be honest.
Ben, from the article:
“However, in the long run, we can assume that a certain fixed percentage of bad throws from each infield position will always result in an error while another fixed percentage of those bad throws have a chance to be “saved” by a skilled (OR TALL) first baseman.”
“What UZR does not measure for first basemen, because the requisite data is not readily available, are the theoretical runs saved or cost by virtue of a first baseman’s skill at successfully catching ERRANT THROWS OR THROWS IN THE DIRT. FOR LACK OF A BETTER WORD, I CALL THESE “SCOOPS”, EVEN THOUGH THEY NECESSARILY INCLUDE POOR THROWS THAT ARE NOT IN THE DIRT.”
You said:
“These stats are interesting, but mostly it tells us two things we already knew…”
You could have saved me a few hours if you told us exactly how much this information “that we already knew” was worth. 1 run a year? 10 runs?
You also could have saved me some time if you told me which players, independent of their height and handedness, were particularly good or bad at saving errors. There are tall lefties in the data who were not good and short righties who were good. But apparently you knew that already.
You are right in that a few of the things that UZR or “scoops” does not capture is a first baseman’s skill at starting the DP (although I do have a metric which looks at that, which is debuting on Fangraphs soon – “turning the DP by infielders”) and throwing to other bases, like home plate or perhaps to second or third on a bunt. But you probably know that too. Can you share with us all of the players who are good or bad at that (other than J.T. Snow) and exactly how many runs that kind of skill is worth?
;)
Ben Marhsall made a comment about JT Snow excelling at throwing runners out at other bases. Are there stats that show the percentage of outs/safes on fielder’s choices? Are some 1B (or IF) more aggressive in trying to start a double play or getting the lead runner? How successful are they? How many runs are saved or cost based on their decisions?