[tahoe-dev] server selection

Zooko O'Whielacronx zooko at zooko.com
Tue Apr 21 19:42:48 UTC 2009


One thing I learned from attending CodeCon is that there are a lot of  
people who have ideas about how to use Tahoe, and each of them wants  
its server selection to follow this or that specific policy.

Here are my notes, which I just entered onto a wiki page.  Please  
reply to the list and then update the wiki page to reflect the new  

Different users of Tahoe have different desires for "Which servers  
should I upload which shares to?".
  * allmydata.com wants to upload to a random selection, evenly  
distributed among servers which are not full. This is,  
unsurprisingly, what Tahoe v1.4 currently does.
  * Brian has mentioned that an allmydata.com-style deployment might  
prefer to have the servers with more remaining capacity receiving  
more shares, thus "filling up faster" than the servers with less  
remaining capacity.
  * Kevin Reid wants, at least for one of his use cases, to specify  
several servers each of which is guaranteed to get at least K shares  
of each file, in addition to potentially other servers also getting  
shares.
  * Shawn Willden wants, likewise, to specify a server (e.g. his  
mom's PC) which is guaranteed to get at least K shares of certain  
files (the family pictures and movies files).
  * Some people -- I'm sorry I forget who -- have said they want to  
upload at least K shares to the K fastest servers.
  * Jake Appelbaum has said that he wants to specify a set of servers  
which collectively are guaranteed to have at least K shares -- he  
intends to use this to specify the ones that are running as Tor  
hidden services and thus are extra attack-resistant but also extra  
slow-and-expensive to reach.
  * Several people -- again I'm sorry I've forgotten specific  
attribution -- want to identify which servers live in which cluster  
or co-lo or geographical area, and then to distribute shares evenly  
across clusters/colos/geographical-areas instead of evenly across  
servers.

As I, Zooko, have emphasized a few times, we really should not try to  
write a super-clever algorithm into Tahoe which satisfies all of  
these people, plus all the other crazy people that will be using  
Tahoe for crazy things in the future. Instead, we need some sort of  
configuration language or plugin system so that each crazy person can  
customize their own crazy server selection policy. I don't know the  
best way to implement this yet -- a domain specific language?  
Implement the above-mentioned list of seven policies into Tahoe and  
have an option to choose which of the seven you want for this upload?  
My current favorite approach is: you give me a Python function. When  
the time comes to upload a file, I'll call that function and then use  
whichever servers it said to use.
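
To make the "you give me a Python function" idea concrete, here is a
minimal sketch. Everything in it is an assumption for illustration,
not Tahoe's actual API: the selector signature (servers, k, n), the
server records with "id" and "free_bytes" keys, and the names
random_nonfull, pin_k_shares and upload are all hypothetical. It shows
two of the policies from the list above written as interchangeable
functions: random placement on non-full servers, and "at least K
shares on one named server".

```python
import random

# Hypothetical interface, NOT Tahoe's real API: a selector receives the
# known servers plus the encoding parameters k (shares needed to recover
# the file) and n (total shares), and returns {share_number: server_id}.


def random_nonfull(servers, k, n, rng=random):
    """Spread n shares over randomly ordered servers that are not full."""
    candidates = [s["id"] for s in servers if s["free_bytes"] > 0]
    if not candidates:
        raise ValueError("no server has free space")
    rng.shuffle(candidates)
    # Wrap around when there are fewer candidate servers than shares.
    return {i: candidates[i % len(candidates)] for i in range(n)}


def pin_k_shares(pinned_id, fallback=random_nonfull):
    """Build a selector that puts shares 0..k-1 on one named server."""
    def selector(servers, k, n, rng=random):
        if pinned_id not in {s["id"] for s in servers}:
            raise ValueError("pinned server %r is not known" % pinned_id)
        others = [s for s in servers if s["id"] != pinned_id]
        # Place everything with the fallback policy first (skipped when
        # every share goes to the pinned server), then overwrite the
        # first k entries so the pinned server alone can recover the file.
        placement = fallback(others, k, n, rng) if n > k else {}
        for i in range(k):
            placement[i] = pinned_id
        return placement
    return selector


def upload(k, n, servers, selector=random_nonfull):
    """Stand-in for the uploader: just ask the selector where shares go."""
    return selector(servers, k, n)
```

With something like this, Shawn's case would be
upload(3, 10, servers, selector=pin_k_shares("moms-pc")), and the
uploader itself never needs to know which policy is in force. A real
version would also have to cope with servers that refuse shares or
disappear mid-upload, which this sketch ignores.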




