[tahoe-dev] Selecting optimal FEC parameters (also, idea for new peer selection policy)

Zooko Wilcox-O'Hearn zooko at zooko.com
Wed Aug 12 15:49:37 UTC 2009

Interesting ideas, Shawn.  I'll just respond to a couple of details,  
below, plus I'll say that I think Kevan should proceed with his plan  
to implement "servers of happiness" now, as it is a small enough  
change that he can finish it before the fall semester starts.  :-)

In the long run, I can see how some people (possibly including me)  
would like the sort of sophisticated, heuristical, statistical  
approach that you envision, but I can also see how some people  
(possibly including me) would want a dumber, more predictable set of  
rules, such as "Make sure there are always at least K+1 shares in
each of these Q co-los", and that's the entire ruleset.
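The "dumber" rule is simple enough to check mechanically. Below is a minimal Python sketch. It assumes a hypothetical layout in which each share number maps to exactly one server and each server belongs to exactly one co-lo. Names like `colo_rule_satisfied` are illustrative and are not Tahoe APIs:

```python
from collections import Counter

def colo_rule_satisfied(placement, server_colo, colos, k):
    """Return True if every co-lo in `colos` holds at least k+1 shares.

    placement:   hypothetical dict, share number -> server id
    server_colo: hypothetical dict, server id -> co-lo name
    colos:       the Q co-los the rule applies to
    """
    # Each share is placed on exactly one server, so this counts
    # distinct shares per co-lo.
    per_colo = Counter(server_colo[server] for server in placement.values())
    # A co-lo with no shares at all is reported as 0 by Counter.
    return all(per_colo[colo] >= k + 1 for colo in colos)

server_colo = {"s1": "A", "s2": "A", "s3": "B", "s4": "B"}
placement = {0: "s1", 1: "s2", 2: "s1", 3: "s3", 4: "s4", 5: "s3"}
print(colo_rule_satisfied(placement, server_colo, ["A", "B"], k=2))
```

Each co-lo holds three shares, so the rule holds for K=2 but would fail for K=3. The rule is predictable precisely because it has only one knob per co-lo.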

On Wednesday, 2009-08-12, at 9:01, Shawn Willden wrote:

> 1.  Though I've previously indicated that it's a bad idea to keep a  
> share locally when uploading for backup, I've reconsidered this  
> notion.  A local share is useless for ONE purpose of backups --  
> disaster recovery -- but it improves performance of retrievals for  
> many other purposes of backups.

Ah, good point!

> 2.  Retrieval performance is maximized when shares are retrieved
> from as many servers as possible at once (assuming all are roughly equally
> responsive).  This means that K should be set to the size of the  
> grid, and M adjusted based on reliability requirements.  This is  
> the reverse of what I have been thinking.

I went through a similar reversal:

http://allmydata.org/pipermail/tahoe-dev/2009-April/001554.html #  
using the volunteer grid (dogfood tasting report, continued)
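Point 2 can be checked with a toy model. This sketch rests on several assumptions. All servers are identical with equal upload bandwidth. The client's own link is never the bottleneck. Each share is exactly `file_size / k` bytes. Requests are spread as evenly as possible, and each server sends its shares one after another. All names are hypothetical:

```python
import math

def download_time(file_size, k, n_servers, server_bw):
    """Idealized seconds to fetch k shares of file_size/k bytes each.

    Assumes every server uploads at server_bw bytes/sec, the client is never
    the bottleneck, and the k shares are requested as evenly as possible
    from min(k, n_servers) servers, each sending its shares serially.
    """
    share_size = file_size / k
    # Number of shares the busiest server has to send.
    busiest = math.ceil(k / min(k, n_servers))
    return busiest * share_size / server_bw

# A 100 MB file on a 10-server grid, 1 MB/s per server.
for k in (3, 5, 10, 13, 20):
    print(k, round(download_time(100e6, k, 10, 1e6), 2))
```

Under these assumptions the time falls as k grows toward the grid size and bottoms out at k = n_servers (10 here). k = 13 is slower because three servers must each send two shares.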

> 3.  M larger than the grid means that each server will receive  
> multiple shares.  A reasonable first assumption is that all shares  
> on a given server will survive or fail together, so the effective  
> reliability of a file is a function not of how many shares must be  
> lost for it to disappear, but how many servers.

Yes, thus the motivation for #778.
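Point 3 assumes that all the shares on a server live or die together. Under that assumption, the useful question is how many servers can vanish. The sketch below assumes shares are dealt round-robin onto servers and that the fullest servers fail first, which is the worst case. The helper names are hypothetical:

```python
def round_robin_counts(m, n):
    """Shares per server when m shares are dealt round-robin onto n servers."""
    return [m // n + (1 if i < m % n else 0) for i in range(n)]

def servers_tolerated(k, m, n):
    """Worst-case number of whole-server failures a k-of-m file survives.

    Assumes all shares on a server are lost together and that the fullest
    servers are the ones that fail.
    """
    counts = sorted(round_robin_counts(m, n), reverse=True)
    remaining = m
    lost = 0
    for c in counts:
        # Stop before losing a server would leave fewer than k shares.
        if remaining - c < k:
            break
        remaining -= c
        lost += 1
    return lost

print(servers_tolerated(3, 10, 4))    # share-level slack is 7
print(servers_tolerated(10, 30, 10))  # K = grid size, 3 shares per server
```

Take k=3, m=10 on 4 servers. The file can lose 7 shares but only 2 servers, because two of the servers hold 3 shares each. With k=10, m=30 on 10 servers, any 6 servers can fail.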

By the way, I wonder if #678 would interest you.  If we had #678,  
then your strategy could delay making up its mind about the best  
value of M until later, during a repair process, possibly after the  
set of servers and their known qualities have changed.  It should be
relatively easy to make up your mind about a good value for K -- for  
example, maybe just K = number-of-servers for starters?

#678 is not a ticket that can be fixed before the fall semester  
starts, though.  Hopefully it will be fixed by the next generation of  
capabilities (http://allmydata.org/trac/tahoe/wiki/NewCapDesign ).
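With #678, a repairer could keep K fixed and choose M based on whatever servers it sees at repair time. The sketch below illustrates a hypothetical policy of that kind. It picks the smallest M whose round-robin placement survives the loss of any `failures` servers. The function name, the round-robin placement, and the `max_m` cap are all assumptions for illustration:

```python
def min_m_for_tolerance(k, n_servers, failures, max_m=256):
    """Smallest m (k fixed) whose round-robin placement on n_servers
    survives the loss of any `failures` servers; None if no m <= max_m works.

    Hypothetical repair-time policy, not an existing Tahoe feature.
    """
    if failures >= n_servers:
        return None  # losing every server can never leave k shares
    for m in range(k, max_m + 1):
        counts = sorted((m // n_servers + (1 if i < m % n_servers else 0)
                         for i in range(n_servers)), reverse=True)
        # Worst case: the `failures` fullest servers disappear.
        if m - sum(counts[:failures]) >= k:
            return m
    return None

print(min_m_for_tolerance(10, 10, 3))  # grid size at upload time
print(min_m_for_tolerance(10, 13, 3))  # grid has grown by repair time
```

With K = 10 on a 10-server grid, tolerating 3 server failures requires M = 16. If the grid has grown to 13 servers by repair time, M = 13 is enough.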



tickets mentioned in this mail:
http://allmydata.org/trac/tahoe/ticket/778 # "shares of happiness" is  
the wrong measure; "servers of happiness" is better
http://allmydata.org/trac/tahoe/ticket/678 # converge same file, same  
K, different M
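The two measures in #778 diverge when shares pile up on a few servers. One way to formalize "servers of happiness" (an assumption here, not quoted from the ticket) is as the size of a maximum matching between servers and the distinct shares they hold. The sketch below computes that matching with Kuhn's augmenting-path algorithm. `placement` is a hypothetical dict from server id to the set of share numbers that server holds:

```python
def shares_of_happiness(placement):
    """Old-style measure: number of distinct shares placed anywhere."""
    return len({share for shares in placement.values() for share in shares})

def servers_of_happiness(placement):
    """Size of a maximum matching between servers and distinct shares.

    A server counts only if it can be paired with a share that no other
    counted server is paired with.
    """
    match = {}  # share number -> server currently matched to it

    def try_assign(server, seen):
        # One augmenting-path search (Kuhn's algorithm) starting at `server`.
        for share in placement[server]:
            if share in seen:
                continue
            seen.add(share)
            if share not in match or try_assign(match[share], seen):
                match[share] = server
                return True
        return False

    return sum(1 for server in placement if try_assign(server, set()))

placement = {"A": {0, 1, 2, 3}, "B": {0}, "C": {0}}
print(shares_of_happiness(placement), servers_of_happiness(placement))
```

In this example four distinct shares are placed, but only two servers count toward happiness. B and C each hold nothing except copies of share 0.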
