[tahoe-dev] [tahoe-lafs] #872: Adjust the probability of selecting a node according to its storage capacity

tahoe-lafs trac at allmydata.org
Wed Dec 30 07:01:53 UTC 2009


#872: Adjust the probability of selecting a node according to its storage
capacity
------------------------------------------------------------+---------------
 Reporter:  davidsarah                                      |           Owner:           
     Type:  enhancement                                     |          Status:  new      
 Priority:  major                                           |       Milestone:  undecided
Component:  code-peerselection                              |         Version:  1.5.0    
 Keywords:  performance scalability preservation bandwidth  |   Launchpad_bug:           
------------------------------------------------------------+---------------

Comment(by davidsarah):

 Replying to [comment:10 davidsarah]:
 > Besides, I have an idea about how to do better, but I'll have to think
 about it some more.
 The idea worked out.

 The comment:3 algorithm is equivalent to picking the minimum hash value
 out of ''e'' independent hash values for each server. We can get the same
 result by taking a single hash, and transforming it so that it follows the
 same distribution as the minimum of ''e'' hashes would have done.

 Let X''e'' be the distribution given by the minimum of ''e'' independent
 uniform distributions U1..''e'', each on [0, 1). The
 [http://en.wikipedia.org/wiki/Cumulative_distribution_function cumulative
 distribution function] of X''e'' is given by:
 ||F_X''e''(x)|| = P(X''e'' <= x)                            ||
 ||           || = P(min(U1, U2, ... U''e'') <= x)           ||
 ||           || = 1 - P(U1 > x) P(U2 > x) ... P(U''e'' > x) ||
 ||           || = 1 - (1-x)^''e''^                          ||

 Then we can use [http://en.wikipedia.org/wiki/Inverse_transform_sampling
 inverse transform sampling] to generate samples from X''e''. For that we
 need the inverse of F_X''e'' which is
 ||(F_X''e'')^-1^(y) = 1 - (1-y)^(1/''e'')^||

 So if we let y be the peer id hash for a given server scaled to the range
 [0, 1), and ''e'' be its capacity estimate, then sorting according to 1 -
 (1-y)^(1/''e'')^ will give the same distribution of permutations that we
 would have got by the comment:3 algorithm.

 Plotting (F_X''e'')^-1^ for various ''e''
 [http://fooplot.com/index.php?&type0=0&type1=0&type2=0&type3=0&type4=0&y0=1-(1-x)^(1/1)&y1=1-(1-x)^(1/2)&y2=1-(1-x)^(1/3)&y3=1-(1-x)^(1/4)&y4=1-(1-x)^(1/5)&xmin=-0.35&xmax=1.35&ymin=0&ymax=1
 gives results] that are intuitively reasonable in order for this to work:
 increasing ''e'' biases the transformed hash toward lower values that are
 more likely to be near the start of the list (but for any ''e'', there is
 still some chance that the server will not be picked).

-- 
Ticket URL: <http://allmydata.org/trac/tahoe/ticket/872#comment:11>
tahoe-lafs <http://allmydata.org>
secure decentralized file storage grid


More information about the tahoe-dev mailing list