## Building Fantasy Player Valuations?

I’d like to solicit the help of our community in building a useful fantasy player valuations guide. When we have the parameters set, I’ll code it and put it up on FanGraphs.

There are a couple of goals here:

1. Building a useful and easy-to-use fantasy player valuation guide.

2. Full transparency in how all the rankings work.

I’ve dabbled in this a bit, so I will first give a starting point:

– The worth of SB, HR, and all other categories, in Fantasy Baseball

– How the Price Guide Works, Part I (Standard Scores)

– How to Value Players for Rotisserie Baseball by Art McGee (Second Edition)

Broken into steps, the general process seems to be:

1. Find the standard deviations and averages for each category for the appropriate player pool. (12 teams, 5×5 for example)

2. Using whatever projections, create z-scores for each player for each category.

3. Add up the individual category z-scores for each player.

4. Set position eligibility for each player.

5. For each position, take the number of roster-eligible players, set the worst player in that group to 0, and shift the rest of the players at that position by the same amount.

6. Based on the full pool of money to be spent and sum of position adjusted z-scores, assign dollar values to each player. (If you’d like to manually assign a split to hitting/pitching, do this here).
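
To make the six steps concrete, here is a minimal Python sketch of the process. The players, categories, roster depth, and budget are all hypothetical placeholders, and using the population standard deviation over the whole listed pool is an assumption (choosing that pool is exactly the open question in step 1).

```python
from statistics import mean, pstdev

# Hypothetical projections: (name, position, {category: projected total}).
PLAYERS = [
    ("A", "SS", {"HR": 30, "SB": 20}),
    ("B", "SS", {"HR": 15, "SB": 35}),
    ("C", "SS", {"HR": 10, "SB": 5}),
    ("D", "1B", {"HR": 40, "SB": 2}),
    ("E", "1B", {"HR": 25, "SB": 5}),
    ("F", "1B", {"HR": 20, "SB": 0}),
]
STARTERS_PER_POSITION = 2  # assumed roster depth at each position
BUDGET = 100               # assumed total dollars to distribute


def total_z_scores(players):
    """Steps 1-3: per-category z-scores, summed for each player."""
    categories = list(players[0][2].keys())
    totals = {name: 0.0 for name, _, _ in players}
    for cat in categories:
        values = [stats[cat] for _, _, stats in players]
        mu, sigma = mean(values), pstdev(values)
        for name, _, stats in players:
            totals[name] += (stats[cat] - mu) / sigma
    return totals


def position_adjusted(players, totals, depth):
    """Steps 4-5: within each position, shift scores so the last starter is 0."""
    adjusted = {}
    for pos in {p for _, p, _ in players}:
        group = sorted((totals[n] for n, p, _ in players if p == pos), reverse=True)
        baseline = group[depth - 1]
        for name, p, _ in players:
            if p == pos:
                adjusted[name] = totals[name] - baseline
    return adjusted


def dollar_values(adjusted, budget):
    """Step 6: split the budget in proportion to positive adjusted scores."""
    positive = {n: v for n, v in adjusted.items() if v > 0}
    pool = sum(positive.values())
    return {n: budget * v / pool for n, v in positive.items()}


totals = total_z_scores(PLAYERS)
adjusted = position_adjusted(PLAYERS, totals, STARTERS_PER_POSITION)
dollars = dollar_values(adjusted, BUDGET)
```

This sketch gives the last starter at each position $0 and leaves out refinements such as a $1 minimum bid or a manual hitting/pitching split.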

Most of these are pretty straightforward, but step 1 seems to be where there is the largest amount of disagreement.

How do you choose the player pool for averages and standard deviations? Do you use last year’s stats? Do you use projected stats? Do you use iterations? Do you use empirical data from similar fantasy leagues?

Then there’s the question of where ADP or Average Auction Value comes in, and how to use that data to try and further tailor your picks.

What about points based leagues?

How does removing a player from the draft pool impact the rankings and how do you handle that?

Feel free to tackle all these questions and point out additional issues in the comments. I’ll of course continue to participate, with plenty of additional questions and comments of my own, I’m sure.


For #1, I prefer to use at least the last 3 seasons to rule out any wild flukes in a category where someone ran away with steals or saves, etc. If the league is a long-standing one, such as Tout Wars, I’ll go back a good five+ seasons.

So you would use actual fantasy league data and go back 3 seasons (or more), and you would do this for your own league as opposed to all the leagues? I guess for first year leagues you could just use an average over all the leagues or something?

I’ll just say automating that is difficult unless your league manager has an open API (like Yahoo). From a coding point of view, this is something we may have to compromise on (or we could give the option to enter your own averages and standard deviations).

David,

Sorry, I should have noticed that. I’m glad you’re aware of his work. If you’re planning something bigger and better than what he’s done, you’re going to make a lot of people happy.

Oops, this was meant to go below, concerning Mays Copeland’s “Price Guide.”

My own leagues are easier to pull. However, working for a company that runs leagues also allows me the luxury of pulling a variety of similar leagues and checking to see how far off my own local leagues are.

First year leagues are too crazy to pull data from, especially if they are auction leagues because dollar values tend to be crazy in first year auctions.

I don’t recommend a straightforward average of Z-scores for each category. Clearly, some fantasy categories are more predictive than other categories; high Z-scores in those categories deserve more weight than high Z-scores in non-predictive categories.

This

I’ve been trying to figure out how to account for this for a while now, but I don’t have a very strong stats background. Maybe run a regression correlating projections to actual results for the last few years? But I have no idea what one would do with those R-squared values…

I agree.

Perhaps you’re already aware of Mays Copeland’s work at http://www.lastplayerpicked.com. If not, you should take a look at his site and see what he did to create his valuation system:

http://www.lastplayerpicked.com/how-the-price-guide-works-part-i-standard-scores/

Yep, I link to that twice in the article. Definitely aware of his system and have read through his How the Price Guide Works series a number of times.

I’m going to be watching this with great interest… I’ve done this for my League a few times, and had some luck. One issue is that each league spends differently in an auction. My League seems to spend with a two-humped distribution. Lots of guys at the high end, not much in the middle, and lots of cheap fillers at the end. I have years’ worth of data and I use the Auction results and run regressions against that year’s projection set.

I have never been able to crack the “inflation” issue for my draft tool.

Draft inflation can be tracked relatively effectively in a simple Excel spreadsheet. To do so, you need to create a list with player values for every player who may be purchased in your league ($0 values are acceptable here, you just need an entry for everyone you think has a good shot to be taken). This does not necessarily require you to come up with your own prices for folks, as using an expert list or an average of them can be quite useful and a major time saver.

You then need to track the price each player actually goes for. In a third column, set up this formula: =B3-C3. This is the difference between the player’s value and what he actually went for. You must then create sum functions to track the total salary and the difference between salary and value. Then create a cell that uses this formula: =(TOTAL LEAGUE SALARY-SALARY OF PLAYERS DRAFTED)/(TOTAL LEAGUE SALARY-VALUE OF PLAYERS DRAFTED), where value can be tracked by adding salary and excess value. You then need a final column where your original values are multiplied by the inflation factor.

Voila! Inflation calculator.
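
The same spreadsheet logic can be sketched in a few lines of Python. This is only an illustration of the formula above: the 12-team, $260-per-team budget and the drafted players are hypothetical, and it assumes your pre-draft values add up to the total league salary.

```python
def inflation_factor(total_league_salary, drafted):
    """drafted: list of (pre-draft value, price actually paid) pairs."""
    salary_spent = sum(price for _, price in drafted)
    value_drafted = sum(value for value, _ in drafted)
    # Money left in the room divided by value left in the pool.
    return (total_league_salary - salary_spent) / (total_league_salary - value_drafted)


# Hypothetical league: 12 teams at $260 each.
TOTAL_SALARY = 12 * 260
# Hypothetical picks so far: (value, price paid).
DRAFTED = [(40, 45), (30, 36), (20, 19)]

factor = inflation_factor(TOTAL_SALARY, DRAFTED)
# Re-price a remaining player whose original value was $25.
adjusted_value = 25 * factor
```

Because this room has overpaid so far ($100 spent on $90 of value), the factor comes out slightly below 1, so remaining players should be a little cheaper than their list values.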

I created my own roto 5×5 ranking system based pretty closely on the last player picked formula. I mostly used projections but then changed them based on previous results, so I would use an average of different projections along with a weighted average of the past 3 years.

I calculated a standard score for each stat, for example (Runs-RunsonAverage)/StDevRuns. In order to do hits, I did H-(AB*AverageAVG) =xH and then (xH-xHaverage)/xHStDev. I added up all of these scores for H, R, RBI, HR and SB and got a total. Then, I ranked each position and set the lowest starter (for example, 12th SS) as a 0 and added whatever that player had to all of the other players in that position. I did this same thing with pitchers.
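
Here is a rough Python sketch of the hits adjustment described above. The hitters are made up, and taking AverageAVG as total hits divided by total at-bats for the pool (rather than a simple mean of each player’s AVG) is an assumption.

```python
from statistics import mean, stdev

# Hypothetical hitters: (name, hits, at-bats).
HITTERS = [("A", 180, 600), ("B", 150, 550), ("C", 120, 500), ("D", 60, 200)]


def extra_hits(hitters):
    """xH = H - AB * AverageAVG, with AverageAVG taken over the whole pool."""
    pool_avg = sum(h for _, h, _ in hitters) / sum(ab for _, _, ab in hitters)
    return {name: h - ab * pool_avg for name, h, ab in hitters}


def z_scores(values):
    """(x - mean) / standard deviation for every entry in a dict."""
    nums = list(values.values())
    mu, sigma = mean(nums), stdev(nums)
    return {k: (v - mu) / sigma for k, v in values.items()}


xh = extra_hits(HITTERS)
xh_z = z_scores(xh)
```

Hitters A and D both bat .300 here, but A gets the larger xH because he does it over three times as many at-bats, which is the point of converting AVG into a counting stat.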

As far as multiple position eligibility goes, I wasn’t exactly sure how to mathematically make a player more valuable with multiple position eligibility, so I just ranked the player in every available position and only used his highest spot in my rankings.

In order to take into account ADP, I simply matched up the ADP with my rank and subtracted. If I had a player ranked 3 rounds ahead of the ADP, he was marked as a sleeper candidate.

I also projected a position for each player. The top 12 1B were projected as 1B and then any other 1B who were in the top 246 position players were set as either UTIL or BENCH players. The top 12 OF were OF1, next 12 were OF2, etc. This gave me an idea of where I was weak on the team.

My top 10 list last year for this system was:

1. Albert Pujols

2. Hanley Ramirez

3. Joe Mauer

4. Chase Utley

5. Ryan Howard

6. Miguel Cabrera

7. Alex Rodriguez

8. David Wright

9. Ryan Braun

10. Tim Lincecum

Of course, a ranking system can only be as good as the data points that go into it, so we should definitely get that set first.

My thought for positional eligibility is to count eligibility for each position as a stat category, where eligible players get a 1 and ineligible players a 0. You could even assign fractional values for players who are expected to gain eligibility at an additional position. Then the z-scores for ability to play SS, etc., will show how scarce that ability is, and multiposition players can gain in points/dollars. I have no idea if that would work without disrupting the rest of the system; it’s just a thought.
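
A quick Python sketch of that idea, using invented eligibility flags for a 10-player pool. It only shows how a 0/1 eligibility column turns into z-scores; how to weight those scores against the real stat categories is left open, as in the comment.

```python
from statistics import mean, pstdev


def eligibility_z(flags):
    """z-scores for a 0/1 eligibility column (1 = eligible at the position)."""
    mu, sigma = mean(flags), pstdev(flags)
    return [(f - mu) / sigma for f in flags]


# Hypothetical pool of 10 players: 2 are SS-eligible, 5 are OF-eligible.
ss_flags = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
of_flags = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

ss_z = eligibility_z(ss_flags)
of_z = eligibility_z(of_flags)
```

With these numbers an SS-eligible player scores about 2.0 on the SS column while an OF-eligible player scores about 1.0 on the OF column, so the scarcer eligibility earns the bigger bump.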

“6. Based on the full pool of money to be spent and sum of position adjusted z-scores, assign dollar values to each player. (If you’d like to manually assign a split to hitting/pitching, do this here).”

Is it possible to allow a range of characters beyond 3 in this field? I happen to be in a league where we use major league salaries, so the avg starting payroll for each team is the MLB team avg (last year was ~$97,000,000).

I’ve actually tried building my own valuation system for the first time this year. Instead of using category averages and standard deviations, however, my plan is to use value over replacement for each statistic to calculate the z-scores for each player for each category. This is really effective at taking positional scarcity into account, since you are calculating the actual value above replacement.

For example, let’s say I’m getting my value for Brian McCann, who I’ve currently got projected at 66 runs, 23 HR, 91 RBI, 5 SB, and a .278 avg over 520 at-bats. My replacement catcher is projected at 49 runs, 5 HR, 45 RBI, 0 SB, and a .265 avg. McCann’s over-replacement stats are then 17 runs, 18 HR, 46 RBI, 5 SB, and 6.76 avg (calculated by difference in avg multiplied by 520 at-bats). I can then multiply these stats by the z-score for each, giving me an accurate value for McCann compared to other players.

Z-scores are calculated by taking the over-replacement stats for the entire pool of starting players (this would vary based on league size and starting positions), and dividing that number by the amount of money you expect the league to spend on each statistic.

For the player stats I’m using an average of projections that are readily available on sites like FanGraphs, but the basic methodology would remain the same no matter what player projections are used.

One additional note about the z-scores:

When I say I take the over-replacement stats for the entire pool of starting players, what I mean is that I calculate the over-replacement stats for each player like I did for McCann above. I then sum these stats in each category to give me the total amount of replacement stats available in the league, and this is the number that I divide by the amount of money I expect to be spent on that stat.
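
The McCann example can be reproduced with a short Python sketch. The projections are the ones quoted above; the league-wide totals and per-category spending are hypothetical. One assumption to flag: the sketch prices each category as dollars per over-replacement unit (money divided by pool total), so that multiplying a player’s units by the rate yields dollars.

```python
# Projections quoted in the comment above.
MCCANN = {"R": 66, "HR": 23, "RBI": 91, "SB": 5, "AVG": 0.278}
REPLACEMENT_C = {"R": 49, "HR": 5, "RBI": 45, "SB": 0, "AVG": 0.265}
AT_BATS = 520


def over_replacement(player, replacement, at_bats):
    """Counting stats: plain difference. AVG: difference scaled by at-bats."""
    units = {}
    for cat in player:
        diff = player[cat] - replacement[cat]
        units[cat] = diff * at_bats if cat == "AVG" else diff
    return units


def dollars_per_unit(pool_total, dollars_for_category):
    """Assumed pricing rule: category money spread over all units in the pool."""
    return dollars_for_category / pool_total


def player_value(units, rates):
    return sum(units[cat] * rates[cat] for cat in units)


mccann_units = over_replacement(MCCANN, REPLACEMENT_C, AT_BATS)

# Hypothetical league-wide over-replacement totals and equal spending per category.
POOL_TOTALS = {"R": 1500, "HR": 600, "RBI": 1500, "SB": 400, "AVG": 150}
SPEND = {cat: 300 for cat in POOL_TOTALS}
rates = {cat: dollars_per_unit(POOL_TOTALS[cat], SPEND[cat]) for cat in POOL_TOTALS}
mccann_dollars = player_value(mccann_units, rates)
```

The over-replacement units match the figures in the comment (17 R, 18 HR, 46 RBI, 5 SB, and about 6.76 hits from AVG); the dollar figure depends entirely on the made-up totals.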

The two downsides to this methodology that I’ve noticed are:

a) You have to know which players you want to include in your player pool before any values can be assigned. This is a fairly common problem though, and once you get to the last players selected in your league they are likely to be fairly similar.

b) You have to have a good idea of how much money will be spent on each stat above replacement in your league. This is a much trickier issue in my mind, but in some ways it might be beneficial to NOT have this information. Hypothetically, in a roto league each stat should have the same value, since no one category is worth more than the others. If you build your valuations model to reflect this, you may find that certain stats are routinely undervalued in your league, giving you an advantage at finding value for under market prices.

I see a couple of issues with dollar value calculators. The first is the lack of risk/reward input. The second is a problem calculating value off the replacement level of the position the way that it is done.

1. Risk/Reward: The best input to dollar value calculators is the next year’s projections, whichever you have the most confidence in. These projections may produce the same value for Justin Upton and Nelson Cruz. However, even though their average projections are equal, I believe that Justin Upton’s upside is much greater. Therefore, he should be given a bump in his value. Good risk/reward plays win fantasy titles and dollar calculators miss these values. I also do not think that these calculations do a good job with injury risk/rewards.

2. Replacement level: This adjustment typically is based off the fact that the last player bought at a position is a $1 player. The dollar values are then linearly calculated based off this replacement level player for each position. Since Catchers, Pitchers and Middle Infielders are thin, this gives all of the players at these positions a bump. However, in auctions there are many players that go for $1 at the end. A greater percentage of these players are at these weaker positions. In a 12-team AL/NL-only auction there may be 10 $1 catchers. I am not sure if the replacement level in this case should be the best $1 catcher or the average of the ten, but I know it is not the worst $1 catcher. Same with pitching, there are probably around 25-30 $1 pitchers picked. This has a large effect on the dollar value calculations.


RWS,

You are absolutely right about point #1. You have to view dollar values as more of a guide than gospel, especially in situations like the one you described.

As for point #2, I really don’t see that as a problem for using dollar value calculations, but a reason for using them. Instead of trying to determine which players will go for $1, I use the highest ranked player available at each position outside of my player pool for my replacement players. The idea is that this player should be the person you would first pick up if one of your players got hurt. The value of that player is $0, so every player above him at his position must have some positive value.

As to the problem of 10 catchers going for $1 in a 12-team league, I view that more as an opportunity to get additional value in a draft than a problem. If I know for sure that every catcher after the first 2 will go for only $1, and my valuations tell me that the 3rd ranked catcher is worth $5, then I’ll try to make sure I get him for $1, giving me an additional $4 in value I wouldn’t have otherwise gotten.

I think this is why so many calculators spit out $ values for the high end catchers and pitchers that are too high. All of the $1 catchers are free talent; however, there is real skill in playing the end game correctly to get the right one.

Give the points leagues some love :D My wOBA/FIP based league was a huge success last year. Here’s the settings: http://www.fantasybythebook.com/fantasy-baseball-league-settings/

We had a 16 team league, and in 2011 we’re moving forward with many things. But everyone seemed to enjoy the points and more “fangraphs friendly” valuation of talent.

I’m obviously biased. But I can dream that people start picking it up!

What do your rosters look like?

Here’s the winning team:

PLAYER, TEAM POS

Carlos Ruiz, Phi C

Derrek Lee, Atl 1B DTD

Dan Uggla, Atl 2B

Jose Bautista, Tor 3B, LF, RF DTD

Derek Jeter, NYY SS

Matt Holliday, StL LF

Colby Rasmus, StL CF

Shin-Soo Choo, Cle RF, LF

Nick Swisher, NYY 1B, RF, DH

Wilson Betemit, KC 1B, 3B, DH

Jed Lowrie, Bos SS, 2B

Ben Zobrist, TB 2B, 1B, CF, RF

Kevin Youkilis, Bos 3B, 1B DTD

PLAYER, TEAM POS

Matt Garza, TB SP

Mike Pelfrey, NYM SP

Jason Vargas, Sea SP, RP

Doug Fister, Sea SP

Brian Duensing, Min SP, RP

Luke Gregerson, SD RP

Matt Thornton, CWS RP

Kyle McClellan, StL RP

Sean Marshall, ChC SP, RP

Drew Storen, Was RP

Huston Street, Col RP DTD

Phil Hughes, NYY SP, RP

R.A. Dickey, NYM RP, SP

Ricky Nolasco, Fla SP DTD

Here’s my team that finished dead even…

PLAYER, TEAM POS

Matt Wieters, Bal C

Luke Scott, Bal LF, 1B, DH

Placido Polanco, Phi 2B, 3B DTD

Jorge Cantu, Tex 3B, 1B

Jose Reyes, NYM SS

Carlos Quentin, CWS LF, RF, DH

Chris Denorfia, SD CF, LF, RF

Andre Ethier, LAD RF

Raul Ibanez, Phi LF

J.D. Drew, Bos RF

Jhonny Peralta, Det SS, 3B

Shane Victorino, Phi CF

Austin Kearns, NYY RF, LF

Chipper Jones, Atl 3B DTD

PLAYER, TEAM POS

Felix Hernandez, Sea SP

CC Sabathia, NYY SP DTD

Jered Weaver, LAA SP

Kevin Millwood, Bal SP

Brandon Morrow, Tor SP, RP

Billy Wagner, Atl RP

Ryan Franklin, StL RP

Jon Rauch, Min RP

Ryan Madson, Phi RP

Hong-Chih Kuo, LAD RP

Franklin Morales, Col RP

Ben Sheets*, Oak SP DL60

Luke Hochevar, KC SP

Jeff Francis, Col SP

The winning team was carried by Uggla, Bautista, Choo, Holliday, and good enough pitching. In fact the team wasn’t all that good at all. But it was full of above average players at every spot, and managed to get some luck out of it. Almost like the Giants. My team had a killer rotation. However, my hitters were all average or below average. In fact after Chipper went down, my team went down. :(

@JWAY what’s the fun in a wOBA/FIP league if you lose the ability to leverage it against people who use AVG and ERA???

One write-up Cameron Snapp and I did a while ago is available at:

http://aggpro.blogspot.com/2010/04/fantasy-valuation.html

It is z-score based and similar to Rob’s.

The system also takes ADP into consideration to try and maximize global team value during a snake style draft.

Here is a question that I think is interesting to many of us who don’t have the stats background that a lot of these commenters have (rock on): if FanGraphs created a Fantasy Player Valuation System, would they charge for it? If so, how much?

Anyone have thoughts, especially any of the FanGraphs guys?

What I always thought would be useful is a tool to use on draft day where you can enter each player selected by each team, and have it tally up what each team is projected to do on the season, as well as the likely total amount of each stat left in the pool as a whole. That way you can see if there’s a run on SB, or Sv or whatever, and adjust your draft accordingly.

We can talk about draft positioning all day, but when people are in drafts they often make less-than-optimal choices because they aren’t doing a very good job of tracking what the competition already has. For instance, if everyone is making a run on SB early in the draft, you can decide to punt SB and focus on players who give the most value in the other categories, and get an advantage there.

I always wanted to write a little GUI and back end database for this, but never had the time to do it in the past.
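
Short of a full GUI and database, the core bookkeeping might look something like this Python sketch. The class name, team labels, and projections are all hypothetical; it just tallies projected category totals per team and what is left in the pool after each pick.

```python
from collections import defaultdict


class DraftTracker:
    """Tracks projected team totals and the stats still left in the pool."""

    def __init__(self, projections):
        # projections: {player name: {category: projected total}}
        self.available = dict(projections)
        self.teams = defaultdict(lambda: defaultdict(float))

    def pick(self, team, player):
        """Move a player from the pool onto a team and add his projections."""
        stats = self.available.pop(player)
        for cat, value in stats.items():
            self.teams[team][cat] += value

    def remaining(self, category):
        """Total of one category across all undrafted players."""
        return sum(stats[category] for stats in self.available.values())


# Hypothetical three-player pool.
tracker = DraftTracker({
    "Speedster": {"SB": 40, "HR": 10},
    "Slugger": {"SB": 5, "HR": 35},
    "Balanced": {"SB": 20, "HR": 20},
})
tracker.pick("Team 1", "Speedster")
sb_left = tracker.remaining("SB")
```

Watching `remaining("SB")` fall faster than the other categories would be one way to spot a run on steals as it happens.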

For the last 4 years or so I’ve had an Excel sheet I use live in the draft which helps me make informed choices. I’ve tried mock drafts blind without it (just in case) and I’m much better in the mid rounds having it. In the early and late rounds I seem to do OK regardless.

Early rounds you’re pretty much limited in choices to good players, and there isn’t enough of an indication of which way people are drafting (everyone is just grabbing the best players they can). Late rounds you are filling out your roster with players who have a high upside and/or provide security for potential risks you picked earlier.

The mid rounds are when a rush on SP or closers can screw you if you catch it too late, so I would think it would help most there. With the info from the early rounds, I think you could start making better choices starting around round 4 or so. What type of sheet do you use/do you mind sharing one from a past draft and explaining it?

The sheet has become more and more complex over the years and it’s hard to explain. If anybody has ever looked at the xBABIP calculator, it’s a lot like that. Here are the main inputs and outputs if you’re interested.

In:

– Projected player stats for all the relevant categories

– Player valuations based on the scoring system for my league

– Position eligibility for the upcoming year

– ADP and earliest pick for each player

Out:

– Projected player stats for all the teams

– Projection totals for all the teams by category

– Filled positions for all the teams

– Top remaining players at each position by valuation

None of this probably sounds all that exciting but when you put it all together it really helps.

The one thing I’ve really put any time into is the valuations. I used to base it on z-scores but now I use an absolute contribution to category totals for the year. I more or less know now what the means and standard deviations are for the categories in my league based on recent years, and aim for totals that are 1.5 or so z-scores above the mean. The sheet recomputes the contribution each player would make towards my totals each time a player is picked, and reduces the weight of the contribution in categories I’m strong in compared to the ones where I’m weak.

After the first year I used this to solve my problem of working in positional scarcity. I have the sheet compute the average contribution for the rest of the players available at each position that are likely to get picked, based on the slots that teams have already filled. So each valuation is really like a valuation-above-replacement. On top of that I do a normalization to 100, and the top guys at scarce positions just pop out, as the position becomes scarce. The team totals include these “replacement” players for positions they haven’t filled, so differences in draft strategy still lead to reasonable comparisons among the teams.

My last thing about the valuation is I don’t weight the categories evenly in the valuation score. Part of this is to account for the different category means since I don’t use z-scores, but another part is to boost the categories with low volatility and cut the ones with high volatility. I still end up with SB and 3B but these vary so widely from week to week it’s tough to win with speed. BB for the hitters and SO for the pitchers are my top weighted categories.

Works for me — I’ve done very well since I’ve had it. The rounds it helps me most are 6-12 or so. My league overvalues pitching IMO so I usually don’t take a pitcher until 6-8. This thing helps me find enough SP2 and SP3 to make up for having no ace. Once I’m done with the draft I keep the thing up to date for the first few weeks of the season and use it to evaluate trades.

David,

I think using some modified version of year N-1 makes most sense, but it would be really nice if there was a Quick Calculator using whatever formula you settle on, wherein I can type in a player who, over 200 AB, gets 65 hits, 10 HR, 2 SB, 30 RBI, 25 R and find out how much value such a player would have. I think THAT is where the true cool feature could be found, rather than in just here is what Bruce will be worth according to X-Y-Z. No matter what system you settle on, the relative weights should be similar, so it is crucial that people then be able to plug-and-play with their own individual player numbers.

VORP is key to any fantasy baseball draft analysis.

After reading up on z-scores, can someone explain to me why this will work for a non-normally distributed population? Baseball talent and fantasy value clearly don’t distribute normally. (There will be dozens of near-replacement level guys for every Albert Pujols….)

“That is true, but when you consider the number of opportunities each player gets, the total effective talent distribution is rather typical”.

http://www.tangotiger.net/talent.html

I’ve been working on some simulations for “head-to-head roto” where the league is decided by head-to-head match-ups. It’s my contention that the standard z-score method doesn’t completely work for the variability encountered in the short scoring sessions. Essentially I used ‘real data’ to simulate weekly HR/SB, etc. to see what a 65 SB guy gives you each week vs. a 40 SB guy when SB categories in H2H are much closer than end-of-season category spreads. It ends up being more complex, and of course you can always use ‘expected values’ as the z-score method would be doing, but I think there’s more to it.

That said, combining the z-scores with a z-value-over-replacement has worked relatively well in general. Calculating replacement, though, often depends highly on the depth of the rosters in your league. This is especially true for daily lineups without play time limits (valuable bench players can be useful). I think the real key is deciding on this level.

I’ve also toyed with making the value increases non-linear (to get to philosofool’s question above).

Pitchers are a little tricky, especially when considering rate stats and play time weighting for these and the ability to add and drop relievers at will for ERA/WHIP help.

Couple of thoughts (14 team league, 6×6 roto (we have batter K’s and pitchers L’s), 1 c/1b/2b/ss/3b, 5 OFs, 2 utility, 5 SP, 2 RP)

For the appropriate player pool I usually use any player with a projection in all the projection resources I use (all the ones on your site); I haven’t seen much benefit to limiting the player pool. I also use a weighting system of 33% prior year and 66% projections (I’m not entirely sure why).
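
As a small illustration of that 33/66 weighting, here is a Python sketch. The stat lines are invented, and dividing by the sum of the weights (since 33% and 66% do not quite add to 100%) is an assumption about how the blend is normalized.

```python
def blend(prior_year, projection, w_prior=0.33, w_proj=0.66):
    """Weighted average of last year's stats and this year's projection."""
    total_weight = w_prior + w_proj
    return {
        cat: (w_prior * prior_year[cat] + w_proj * projection[cat]) / total_weight
        for cat in projection
    }


# Hypothetical player: 30 HR last year, projected for 24 this year.
blended = blend({"HR": 30, "SB": 12}, {"HR": 24, "SB": 9})
```

With these inputs the blended line lands a third of the way from the projection toward last year’s totals (26 HR and 10 SB).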

I also use a position scarcity factor (based on average of the starting positional lineup spots) instead of using 0 since we have bench/utility players and just because a player is the xth best at a position doesn’t mean he has the same value as a different player that is the xth best at their position.

Readjusting the rankings based on players taken is something I have tried in the past but it seems to break down later in the draft (think about steals and a player projected to steal 20 but with no other value when very few, if any, steal players are left). I end up utilizing a “best player available” theme but also pay attention to my lineup construction and I have the highs in each category listed above my projected roster totals so I don’t overdraft a category. Basically it’s beyond my capabilities to have a truly automated system (which is why you need to create this monster).

I don’t use ADP in the actual rankings but have it next to the ranking so I can gauge whether I can wait. If I used it I’d use it as an additional category that would basically inflate or deflate a player’s score based on their pick position.

I use an indexed score but will probably experiment with the z-score.

Lastly, I use a weighting system per category to account for the large fluctuations in some categories year to year (so pitchers K’s are more heavily weighted than W’s and L’s) and to take into account the historical biases people in my leagues have. I will sometimes tweak this if I’m going heavy in a category early on.

This would be really cool if you guys build it, David.

I’d like to see some separate calculations for head-to-head leagues, like Millsy is describing just above.

The right way to calculate these is to find the standard deviation in the weekly totals, and thus the standard deviation in the difference between my weekly total and my opponent’s weekly total. (The whole point is you want this difference to be positive, as that means a win for you.)

I.e., if my weekly totals for HR are 5, 8, 3; and my opponent’s weekly totals are 6, 4, 4; the differences are -1, 4, -1 for a standard deviation of the differences of about 3.

That SD becomes your divisor for the category. So a 15-HR hitter is worth 5 Wins, and then you just have to subtract off replacement value to find his overall value.
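
The arithmetic in this example can be checked with a few lines of Python. It uses the weekly totals quoted above; using the sample standard deviation is an assumption, chosen because it is the version that comes out at “about 3”.

```python
from statistics import stdev

my_weekly_hr = [5, 8, 3]
opp_weekly_hr = [6, 4, 4]

# Week-by-week margin: positive means I won the category that week.
diffs = [mine - theirs for mine, theirs in zip(my_weekly_hr, opp_weekly_hr)]

# Sample standard deviation of the margins, roughly 2.9.
sd = stdev(diffs)

# Divide a hitter's HR total by the rounded SD, as in the comment.
hr_value = 15 / round(sd)
```

Rounding the SD to 3 gives the 5 “Wins” for a 15-HR hitter; replacement value would still need to be subtracted afterwards.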

So that’s the theoretically correct way to do it (I believe) but of course you don’t have all the data up front. So I can do it for my league based on past data of average weekly differences for each category. I don’t know how any of that can be made more universal, though.

Hey Matt, I don’t see how a standard deviation of 3 HR difference between the winning and losing team makes a 15 HR guy worth 5 wins. Can you explain a little more?

I described all that stuff in my Excel sheet before but never mentioned I’m also in H2H leagues, not roto. I use projected season totals instead of trying to simulate H2H because given enough iterations, the simulation should converge based on the totals anyway. As I understand it this is how Pythagorean winning percentage works: the more you outscore your opponent on average, the higher your expected winning percentage.

My thinking is that if you know the mean and standard deviation for each category, then if your totals for the year are 1 z-score above the mean for each category you’d expect to win nearly every category every week. If the team category totals for the year have roughly a normal distribution then you’d expect to win 5 of every 6 weeks.
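
One possible reading of the “5 of every 6 weeks” figure is the share of a normal distribution that falls below one standard deviation above the mean. That interpretation is an assumption, but it is easy to check in Python:

```python
from statistics import NormalDist

# Probability that a standard normal value is below +1 standard deviation.
p_below_plus_one_sd = NormalDist().cdf(1.0)

# The "5 of every 6" rate from the comment, for comparison.
five_of_six = 5 / 6
```

The two numbers are roughly 0.84 and 0.83, so the rule of thumb is close to what the normal curve gives under that reading.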

The distribution is probably not normal, projections are just projections, and things like MLB matchups, injuries, and your H2H schedule lead to a lot of volatility in the weekly data. It doesn’t make this type of analysis invalid, it just means when you look at a few weeks’ results over the season it won’t necessarily represent the whole.

If all you have is a static valuation for each player, that only helps you so much. The valuation should change as the market changes: as players are picked in the draft, or later, as free agents are picked up and dropped, players go on the DL, etc. So I think Tango’s units above replacement method only works to rank players before the draft. Even if the market didn’t change during the draft you’d still have to account for the change in marginal value of a HR, RBI, etc. as you accumulate higher or lower (projected) numbers of them during the draft.

I created a valuation model last year, which coincidentally was very similar to what was laid out on lastplayerpicked (so I used iterations to get my player pool). What was very interesting was that I was able to pull mock auction values from Yahoo and run a systems optimization during my draft that gave me the best remaining team I could draft with my remaining money. If you let it maximize Z-Scores (instead of over replacement level positional Z-Scores), then it will automatically account for positional scarcity in the optimization.
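
For a sense of what such an optimization does, here is a toy brute-force version in Python. It is not the commenter’s actual tool: the player pool, prices, and z-scores are invented, and a real draft would need a proper solver rather than trying every combination.

```python
from itertools import combinations

# Hypothetical remaining players: (name, position, expected price, total z-score).
POOL = [
    ("A", "OF", 30, 4.0),
    ("B", "OF", 22, 3.2),
    ("C", "C", 15, 2.5),
    ("D", "C", 8, 1.1),
    ("E", "OF", 3, 0.6),
]


def best_roster(pool, open_slots, budget):
    """Try every way to fill the open slots; keep the affordable one with the most z."""
    best, best_score = None, float("-inf")
    needed = sorted(open_slots)
    for combo in combinations(pool, len(open_slots)):
        if sorted(pos for _, pos, _, _ in combo) != needed:
            continue  # does not fill exactly the open positions
        cost = sum(price for _, _, price, _ in combo)
        score = sum(z for _, _, _, z in combo)
        if cost <= budget and score > best_score:
            best, best_score = combo, score
    return best


roster = best_roster(POOL, ["C", "OF"], budget=40)
roster_names = [name for name, _, _, _ in roster]
```

With $40 and one C and one OF slot open, the search passes on the top outfielder and pairs the second outfielder with the better catcher, because that combination carries the most total z-score. The positional trade-off falls out of the slot constraint, with no separate scarcity adjustment.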

I’m very excited to see this develop. I have been using Mays’ LPP model for 3 seasons now, with great success.

here is some food for thought:

1) Playing time projections are super important to get right. the community playing time projections, which were incorporated into LPP very late in the game last year, produced the best valuations I think. the earlier you can start to get these incorporated, the better

2) a system which averages out projections also seemed to work the best. going with something like CHONE by itself can produce wacky results for players that CHONE loves but nobody else does. a blended projections using zips/marcel/tht/whatever is preferable, especially for the superstars. these are the guys you can’t go wrong.

3) how will this be different from LPP? by the iterations and standard scores?

btw, Tango just linked to this

any update on this? still happening?