The Next Step

Let me start with theory. I wonder why prospect lists run in order of career potential. In my view, prospects are valuable because they provide Major League Baseball’s best bargain. Find a player ready to contribute from Year One to Year Seven, and the return on investment is ridiculous. In three seasons, Tim Lincecum has been worth roughly $84 million to the Giants. If you didn’t know, he has not been paid that much. However, in a few short years, Lincecum will enter free agency, and he will no longer be a bargain. Teams will bid for his services, and he will be paid what the market determines he is worth.

In my eyes, prospect lists should attempt to determine a ranking based on the value players will provide while they are under organizational control (the first six to seven years). If we follow prospects because they are a bargain, we should only care about their performance when they represent a bargain. Right? Consider yesterday’s posterboy, Garry Templeton, who, in a retro prospect list, probably wouldn’t rank very highly. But why not? Templeton was well above the average shortstop with the Cardinals, and was the centerpiece of a trade that netted the Cardinals Ozzie Smith. Templeton provided insane value to the Cardinals.

In fact, in his first seven seasons, Garry Templeton produced 20.5 WAR. Ozzie Smith, who peaked in Years 7-12 of his career, produced just 17.7 WAR in his first seven years. Now, readers, I ask you: why would Smith be considered the better prospect in hindsight? Particularly in today’s environment, when loyalty doesn’t exist with free agents.

************

As I’ve transitioned back into covering minor league baseball, I have begun to see the direction I want my analysis to take: it’s outlined above, and it exists in the FanGraphs defining stat, WAR. I want to attempt to see prospects in the light that the organizations might: who is overvalued relative to the contributions he is likely to provide and thus makes a nice trade chip, and who should teams be making way for? What value might a prospect provide our team? Eric Hosmer and Pedro Alvarez are right next to each other in Keith Law’s rankings; if each is the player scouts think he could become, what does that look like in terms of WAR? (An article for another day.)

This is long-winded, as I so often am, but I’m trying to create a dialogue about what a sabermetric approach to covering prospects can be. And I want your help! It’s no longer about ignoring scouting reports and restricting yourself to MLEs (was it ever?), but about finding the proper routes to evaluating players more accurately — based on development (like yesterday), based on nuance (the sinker series), and based on modern statistical analysis.

Today, I’m going to take a stab at the last of these. After the jump, we’ll walk through creating a set of expectations on what the Cubs should anticipate from Starlin Castro (sorry, he’s on the brain).

Note: This will be an assumption that Castro is an everyday player for the Chicago Cubs. Prospect analysis is about balancing the potential for stardom with the potential for bust in your analysis. This is purposely avoiding the bust/bench player route that Castro, and nearly every prospect, could potentially reach.

Starlin Castro, the everyday player, hits in one of three spots in a Major League lineup: second, seventh or eighth. Modern Major League Baseball leaves the possibility of Castro leading off as a longshot, given walk rates that aren’t acceptable. So, I began by averaging together the plate appearances those spots in the lineup had for the Cubs in 2009: 685. Then, I decided to give Castro seven off days for the season. This will be an analysis based on Castro’s production given 655 plate appearances.

Next, you want to create a set of raw counting statistics that will add up to 655 PAs. This means quickly deciding on three different Castro skillsets:

Contact: Castro is fantastic in this regard, and whiffed in just about 10.5% of his plate appearances last season. The first number that came to my head was 75 strikeouts, which works out to about 11.5% of 655 plate appearances. That represents a small increase due to tougher pitchers, so I think it’s fair.

Patience: Alomar was one of the only players from yesterday’s study to embrace the base on balls, and the Cubs don’t preach patience enough for me to believe Castro will walk much, certainly not until he’s a little older. I have him bookmarked for 40 walks, two of which are intentional.

Power: This is the most important, and with Castro, toughest to project. Jim Callis yesterday suggested he could hit 20 home runs. I don’t see it. If it does happen, like Jose Lopez, it will almost surely be later in his career. If we’re thinking about some six-year average, I’ll use 10 home runs. I’ve added 33 doubles and 7 triples. From there, it’s filling in the blanks.

75 K. 40 BB (2 IBB). 33 2B. 7 3B. 10 HR. 5 HBP. 8 SH. 5 SF. 1 CI. 3 RBOE.

Adding this up, we can say this represents 596 at-bats. I then assigned Castro a .310 BABIP (Jose Reyes’ career average), and did a little algebra, which determined that this meant he clubbed 170 hits in this mock season, 120 going for singles. I’m projecting Castro at .285/.333/.414, and I admit, this feels right.
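For anyone who wants to retrace the "little algebra," here is a minimal sketch in Python. It assumes the standard definitions of at-bats and BABIP, (H - HR) / (AB - K - HR + SF), which the counting line above appears to follow; the variable names are only illustrative.

```python
# Counting stats from the mock season (655 plate appearances).
PA = 655
K, BB = 75, 40
DOUBLES, TRIPLES, HR = 33, 7, 10
HBP, SH, SF, CI = 5, 8, 5, 1
BABIP = 0.310  # assigned value (Jose Reyes' career average, per the text)

# At-bats exclude walks, hit-by-pitches, sacrifices and catcher's interference.
# Reaching on an error still counts as an at-bat, so RBOE is not subtracted.
ab = PA - BB - HBP - SH - SF - CI

# Assumed standard definition: BABIP = (H - HR) / (AB - K - HR + SF).
# Solve for hits, rounding to a whole number of hits on balls in play.
balls_in_play = ab - K - HR + SF
hits = round(BABIP * balls_in_play) + HR
singles = hits - DOUBLES - TRIPLES - HR

# Slash line: AVG / OBP / SLG.
avg = hits / ab
obp = (hits + BB + HBP) / (ab + BB + HBP + SF)
total_bases = singles + 2 * DOUBLES + 3 * TRIPLES + 4 * HR
slg = total_bases / ab

print(ab, hits, singles)
print(f"{avg:.3f}/{obp:.3f}/{slg:.3f}")
```

Under those assumptions the sketch lands on the same 596 at-bats, 170 hits and 120 singles, and the same slash line, as the figures quoted above.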

Next, I calculated wOBA, using this handy guide. Plug in the numbers, and out comes a .327 wOBA. Rather than take the long route to turn this into wRAA (which would mean guessing a league OBP and creating a scale), I did something easier. If Reed Johnson posted a 0.2 wRAA in 186 plate appearances, with the same wOBA, then changing the plate appearances to 655 puts Castro’s projected wRAA at 0.7. As you know, our RAR formula here is a simple addition of four numbers: wRAA, UZR, and adjustments for replacement level and position. We can do the replacement pretty simply: 655 plate appearances works out to 22.8 runs using our formula.
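The Reed Johnson shortcut is just a proportional rescaling: at a fixed wOBA, wRAA grows linearly with plate appearances. A rough sketch of that step follows; the 22.8-run replacement figure is taken as stated rather than recomputed, since the formula behind it is not spelled out here.

```python
# Reed Johnson's line at the same wOBA, as quoted in the text.
johnson_wraa = 0.2
johnson_pa = 186

castro_pa = 655

# At a fixed wOBA, wRAA scales in proportion to plate appearances.
castro_wraa = johnson_wraa * castro_pa / johnson_pa

# Replacement-level adjustment for 655 PA, taken from the text as-is.
replacement_runs = 22.8

# Offensive half of RAR: batting runs plus the replacement adjustment.
offense_rar = round(castro_wraa, 1) + replacement_runs

print(round(castro_wraa, 1))
print(round(offense_rar, 1))
```

The rescaled wRAA comes out a hair above 0.7, so rounding to one decimal before adding matches the way the running total is built.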

Halfway home, and I currently have Castro’s RAR at 23.5. But, as reader Rob G. noted in the comments yesterday: “Castro’s defense at an important defensive position is much of the reason behind the hype.” Indeed. This was why I restricted myself to middle infielders in yesterday’s comparisons, but if we are set to calculate his WAR, we need a decent guess at UZR. So, here I turn to Kevin Goldstein, who wrote two pointed comments concerning Castro when he ranked the shortstop second in the Cubs system. The Good: “His defensive fundamentals are outstanding for both his level and his age, with smooth actions, soft hands, a quick transfer, and a plus arm.” And, The Bad: “Several scouts noted below-average running times to first base, and his range is affected by it, possibly leading to a move to second base down the road.”

This made me think of Elvis Andrus, one of our comparisons from yesterday. What did Goldstein say about his defense, which stood +10.7 as a rookie? “He has outstanding shortstop action, a plus arm, and exceptional range,” Goldstein wrote a year ago. So, Castro and Andrus share the outstanding “actions,” and the plus arm, while Castro comes out ahead in hands, and Andrus definitely in the range column. Now, I’d be cynical to note that Andrus’ plus defensive season entirely consisted of his RngR, but it’s true. I would say that Castro sounds like a zero to +5 defender at shortstop, and a +5 to +10 defender at second, to lean conservatively and stress the importance of range in Ultimate Zone Rating.

I do not think we would be doing Castro a disservice by saying that his positional adjustment and defensive rating were the equivalent of 10 runs above replacement. Consider that Andrus’ position adjustment was only 6.4 in 145 games, so if Castro sticks at shortstop and has a similar adjustment, I’m calling him a true +3.6 defender. That is a compliment.

If you’re still with me (God bless you), then it’s time to add everything up: 33.5 Runs Above Replacement. I’ll leave it to you to determine if this is a conservative estimate or an aggressive one, but I think it’s a nice median expectation. As a Cubs fan, I think this would be great news if it were sustainable — if Castro was a 3 WAR player always. But I’m guessing that is one sentiment that would not be greeted kindly at Clark and Addison.
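To make the final addition easy to check or re-run with your own guesses, here is a small sketch of the bookkeeping. The inputs are the numbers used above; the 10 runs per win conversion is an assumption on my part (a common rule of thumb), not something derived in this post.

```python
# Components of Runs Above Replacement, as used in the walkthrough.
wraa = 0.7                     # batting runs above average
replacement = 22.8             # replacement adjustment for 655 PA
fielding_plus_position = 10.0  # UZR and positional adjustment combined

# Andrus' positional adjustment in 145 games, used as the shortstop stand-in.
shortstop_adjustment = 6.4

# Assumption: roughly 10 runs per win, a rule of thumb rather than a fitted value.
RUNS_PER_WIN = 10.0

# Fielding runs left over once the positional credit is removed.
implied_uzr = fielding_plus_position - shortstop_adjustment

rar = wraa + replacement + fielding_plus_position
war = rar / RUNS_PER_WIN

print(round(implied_uzr, 1), round(rar, 1), round(war, 2))
```

Swapping in a different fielding guess or a different playing-time assumption only means changing the inputs at the top.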




18 Responses to “The Next Step”

  1. Mike Green says:

    It isn’t really a median projection. Castro was 19 years old, and had only 111 at-bats in double A. There is probably a 30% chance that he gets no more than a cup of coffee in the major leagues.

    What I would suggest is that you try to get a weighted average using something like this:

    0-1 WAR 30%
    1-2 WAR 20%
    2-3 WAR 25%
    3-4 WAR 15%
    4-5 WAR 5%
    5-6 WAR 4%
    Utley 1%

    • Bryan Smith says:

      From the article:

      Note: This will be an assumption that Castro is an everyday player for the Chicago Cubs. Prospect analysis is about balancing the potential for stardom with the potential for bust in your analysis. This is purposely avoiding the bust/bench player route that Castro, and nearly every prospect, could potentially reach.

      ************

      So, we agree. I’m saying it is a median projection given that he’s given time as an everyday player.

      • Mike Green says:

        Fair enough. I still think that what you want is a weighted average. The reason being that some players have a greater chance of being regular players than others, and if you use “median performance as a regular”, you’re not getting a true picture of value (which is what WAR tries to measure). This is particularly so for pitchers at different stages of development.

      • Bryan Smith says:

        Oh, no question you want the weighted average. But a “bust percentage” is a really tough and arbitrary thing to come up with. I’d love ideas on how it can be achieved, but it’s a tough thing to throw into analysis.

  2. Tom says:

    I think you’re overlooking something in this article: value produced in years 1-6 depends an awful lot on when the player is called up. You can’t predict when a team will call up the player and start reaping the rewards of their bargain. Clearly the Braves would get more value out of Jason Heyward if they waited until he was closer to his peak. All you can do is say which prospects you think will have the best peaks… it’s up to the organization to decide if they want to wait for them to develop fully.

    • Bryan Smith says:

      But, the teams know when they are going to call a player up, right?

      And by that decision, they are determining the value they get from their player. If Heyward is their best option this year, but he’s not really ready, they have a decision to make. They can either play him this year, at $320,000, rather than play a guy like Matt Diaz, or pay someone like Johnny Damon or Jermaine Dye. But by doing so, they are recognizing that he will be a free agent a year earlier — 27 instead of 28.

      They are trading Jason Heyward 2010-2016 for Heyward 2011-2017, and they are doing it consciously. Is 2010 at $320K more valuable than 2017 at whatever price his last arbitration payment is ($12M?)?

      You say “all we can do is say which prospects you think will have the best peaks.” Why? Why can’t I forecast what a player’s development schedule likely is (given the team he plays for, their needs, etc.) and determine what that makes his likely performance while under team control?

      Isn’t that more useful? Am I crazy?

      • 198d says:

        “But, the teams know when they are going to call a player up, right?”

        Is that always true, though? As a Jays fan, I know that the plethora of pitching that was used last season (with a handful of kids jumping directly from AA) was not intentional. Injuries can really throw a wrench in the “prospect development” plan…

        Granted, I will concede that in typical cases, injury call-ups tend to be marginal/middling/aging “prospects” already on the 40-man and not the potential studs, but I’m not convinced that all call-ups are always pre-planned.

    • The A Team says:

      The tricky part about that is that we need to consider the organization’s revenue curve too. A team that is competitive now should be incentivized to bring up its prospects now whenever they represent a marginal gain, while a team that is set to lose often should be playing games with its players’ service clocks (Orioles) in order to get the largest cost-controlled peak.

  3. The A Team says:

    This is something I have been mentally adjusting for for quite some time now. I think you need to factor in a lot more risk into your general Starlin Castro model. Perhaps I’m guilty of laziness, but I generally derive my risk factors from scout reports and average injury rates (and like I said, I mentally adjust to make up for my mediocre math skills, my results are certainly very biased). Would generating several baskets of skill sets and finding average collapse/attrition/improve/breakout rates be useful here? You could then try to stuff prospects into each basket and get some ‘close enough’ estimates. Or am I just suggesting PECOTA for prospect value?

    I wish you luck in formalizing a methodology and will happily jump on the bandwagon if it’s up to snuff.

    • drchstrpunk says:

      Baseball America did something like this on their Top 100 list last year, where next to many prospects they stated the biggest risk and a comparison for a guy who was a bust.

    • Bryan Smith says:

      This is interesting, and it sounds like people want me to work on it. Can we form a number, based on something prudent, that accurately represents a “bust percentage” or, as A Team called it, a “risk factor”? I don’t know if there’s anything there, or really how to approach it, but I’ll see what smarter people than I think and get back to you.

      • 198d says:

        Isn’t there a somewhat standard reference by one of our SABR overlords which studied value vs. draft slot for prospects? One might assume that the research done here, especially regarding “risk factor,” may already be completed. (or, at the very least, this may be a good starting point) Apologies that I can’t recall the article, but I’m certain someone here would know…

      • Bryan Smith says:

        198d: There have been a couple, but I think you are probably talking about Victor Wang’s great work at Hardball Times. While he did some great heavy-lifting, I’m not sure it has much utility in regards to individual prospects.

        It might have some value with players that were just drafted, but I’d hate to lump everybody in the same group. Some guys drop due to reasons outside of their control, and the draft is never in order of talent. It’s fine to do a large study — it’s going to work out with a big sample size — but to evaluate at the micro level with that work would be irresponsible.

  4. Nny says:

    I’ve been doing something similar with Marlins’ prospects. It really helps put things more in perspective from a positional point, and the ol’ adage of 2b/3b being underrated and 1b’s being overrated very very much carries over (i.e. Even though there is a large gap in offense, Matt Dominguez and Logan Morrison have around the same median and ceiling WAR).

    I haven’t thought about what you thought about though, in regards to maximizing a player’s value to a team by when you call him up. I’d assume the major change would be that older prospects with the same median/ceiling WAR would be valued more than younger prospects, since you’d be controlling the older prospect’s prime years.

    Even if a younger, high ceiling prospect has the same “cost controlled median WAR” as an older, lower ceiling prospect, you still go with the younger one because he might go Miguel Cabrera/Hanley Ramirez. And in cases of something like keeping down Heyward, there’s the possibility that that then stalls his development.

  5. statzombie says:

    I need to try to understand this. You say you are “creating a set of expectations on what the Cubs should anticipate from Starlin Castro,” but when? Is this what the Cubs should expect from Castro in 2010 if he were starting for them, or in 2013?

    Regardless of our difference of opinion on Castro’s capabilities for this year, I do not think I understand the direction of this article. Want to branch out from MLEs and use scouting reports? And modern statistical analysis? And then you proceed to throw out random numbers based essentially on your opinion and plug them into some formulas (which I can do with any forecasting system). As for scouting reports, you take a couple of comments and come up with unscientific numbers. This is not modern statistical analysis.

    Want to use expert knowledge (scouts) and statistics? There’s a branch of modern statistics literally used for exactly this: Bayesian statistics. If you’re not familiar, its popularity is fairly recent, and you need a computer to actually do interesting Bayesian analysis (and not Excel).

    • Bryan Smith says:

      Thanks for the advice. I’ll check out the Bayesian stuff, and see if it’s over my head (I have a guess already). The important part of this post is above the fold, as I say, the theoretical things I’m talking about. I did hope that smarter people would chime in and come up with better methods.

      I’m certainly not claiming modern statistical analysis, but I also don’t think I’m using random numbers. I managed to combine scouting reports with a set of comparisons and, yes, MLEs, to help determine what Castro’s potential rates might be in K%, BB% and XBH% (and BABIP). The raw numbers were built from that, using round numbers when possible. I hardly contend this is the right way, but I just wanted to show a really rough draft for what I suggest in the beginning: that I think there is an untapped area available to better value prospects.

      To answer your first question, down the road my hope is that we can come up with accurate WAR intervals for a player’s mean performance over his team-controlled seasons.

  6. Barry Reed says:

    I too am confused… On one hand, the goal appears to be determining a systematic way to value prospects over the course of their “controlled” years… However, the bulk of the discussion comes across as a different approach to MLEs… I’m definitely not an expert, but it seems that a lot of work has already gone into the MLE part of this problem, so the focus of something like this should be more on how that gets incorporated into a total “controlled” value…

    For my tastes, I’d like to see something along the lines of the first post, where there’s a distribution of outcomes projected for each controlled season, which would result in a distribution of controlled value (not just a single number and not just a range)… Of course, working out all the necessary probabilities is not trivial, but using another idea posted above (grouping prospects into a few categories/buckets) might allow for a large enough sample size to make some reasonable estimates…

    • Bryan Smith says:

      Barry — Thanks for the comment. Constructive criticism is what I was hoping for so we can flesh out this idea together. I’m going to do some behind the scenes work on exactly what you mentioned, and really polish this up. The Castro thing was meant only to be quick-and-dirty, and show that we really have some room to do different stuff in the field. I think you agree.
