[tahoe-dev] [tahoe-lafs] #778: "shares of happiness" is the wrong measure; "servers of happiness" is better

tahoe-lafs trac at allmydata.org
Mon Jan 25 16:51:03 UTC 2010


#778: "shares of happiness" is the wrong measure; "servers of happiness" is
better
---------------------------------------+------------------------------------
 Reporter:  zooko                      |           Owner:  zooko   
     Type:  defect                     |          Status:  assigned
 Priority:  critical                   |       Milestone:  1.6.0   
Component:  code-peerselection         |         Version:  1.4.1   
 Keywords:  reliability review-needed  |   Launchpad_bug:          
---------------------------------------+------------------------------------

Comment(by zooko):

 Okay, a few quick notes:

 I'm getting closer!  For one thing, there is less code now, and it is
 starting to make more sense to me.  But I didn't manage to really
 understand the whole thing on this bus ride.

 I got stuck on this line for a while:
 {{{
     # This is a list of every peer that has a share in our layout.
     servers = reduce(lambda x, y: x + y, [list(x) for x in
                                           existing_shares.values()], [])
 }}}

 In general, when reading Python, I find it easier to understand
 "imperative style", in which variables get initialized and mutated, than
 "functional style", in which values get computed as compound functions of
 their inputs.  After I figured out what the line above does, I saw that
 the later code uses {{{count()}}} on {{{servers}}}, and so I think the
 intended semantics of {{{servers}}} is a list in which each serverid
 appears once for each unique share that the server holds.
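
 For reference, here is a minimal imperative sketch of what I believe that
 line computes (assuming {{{existing_shares}}} maps each shareid to the
 set of serverids that hold it):
 {{{
 # Imperative sketch (not the code under review): build a list in which
 # each serverid appears once for each unique share that server holds.
 servers = []
 for serverids in existing_shares.values():
     for serverid in serverids:
         servers.append(serverid)
 }}}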

 I would find it easier to understand if {{{servers}}} were a map from
 serverid to the number of unique shares that that server holds (or will
 hold according to the current plan as represented by {{{used_peers}}}).
 Oh, in fact, how about calling {{{servers =
 shares_by_server(existing_shares)}}} (after {{{existing_shares}}} has been
 updated to include the information from {{{used_peers}}}), and then using
 {{{len(servers[serverid])}}} to get the number of shares?
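
 Something like this sketch, say (the body of {{{shares_by_server()}}}
 here is just my guess at what such a helper would do: invert the map):
 {{{
 # Hypothetical helper: invert a map of shareid -> set(serverids)
 # into a map of serverid -> set(shareids).
 def shares_by_server(sharemap):
     servermap = {}
     for shareid, serverids in sharemap.items():
         for serverid in serverids:
             servermap.setdefault(serverid, set()).add(shareid)
     return servermap

 servers = shares_by_server(existing_shares)
 # len(servers[serverid]) is then the number of unique shares that
 # serverid holds (or is planned to hold).
 }}}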

 Oh, this {{{existing_shares}}} has the wrong name now that it includes
 information about planned shares from the {{{used_peers}}} argument.
 Maybe just {{{shares}}}.  :-)

 Okay, then I ran out of time and didn't read the new changes to upload.py.

 I think it would help if the algorithm for calculating servers-of-
 happiness were written out in the docs.  By that I mean the algorithm
 described imperatively, as a sequence of calculations and mutations.  Let
 me try to explain the algorithm myself as a way to get you started on
 documenting it:

 The algorithm for calculating servers-of-happiness starts with a map from
 servers to the shares that each server holds.  More than one server can
 map to a single share, and a single server can map to more than one
 share.

 The algorithm looks for a share that has more than one server mapping to
 it and then excludes all but one of those links (from server to share).
 It iterates this until there are no more cases of a share with more than
 one server mapping to it; at that point it is done, and the happiness
 value is the number of servers still mapping to a share.

 Now, to finish fully defining the algorithm, we have to explain how it
 chooses which link to retain when it finds multiple links pointing to a
 share.
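
 To make the discussion concrete, here is a naive sketch of the algorithm
 as I understand it (not the code under review; I represent the links as a
 map from shareid to the set of serverids holding it, the transpose of the
 map described above, and I break ties arbitrarily):
 {{{
 # Naive sketch: resolve every share that has more than one server
 # mapping to it, then count the servers that still hold a link.
 def servers_of_happiness(sharemap):
     links = dict((shareid, set(serverids))
                  for shareid, serverids in sharemap.items())
     for shareid in links:
         if len(links[shareid]) > 1:
             # The unspecified part: which link to retain.  Here we
             # keep an arbitrary one and exclude the rest.
             keep = links[shareid].pop()
             links[shareid] = set([keep])
     # Happiness: the number of distinct servers still linked to a share.
     happy = set()
     for serverids in links.values():
         happy.update(serverids)
     return len(happy)
 }}}
 Because the tie-breaking here is arbitrary, this sketch can return
 different values for the same input, which is exactly the puzzle below.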

 By the way, last night a servers-of-happiness puzzle occurred to me.
 Suppose server A maps to shares 0 and 1, server B maps to shares 1 and 2,
 and server C maps to share 2.  Now if the servers-of-happiness algorithm
 described above first looks at share 1, it has to decide whether to
 exclude the link from server A to share 1 or the link from server B to
 share 1.  If it chooses to exclude the link from server B to share 1,
 then it is going to end up returning a servers-of-happiness value of 2.
 But if it chooses to exclude the link from server A to share 1 (and then
 keeps the link from server C to share 2), it is going to end up returning
 a servers-of-happiness value of 3.  But I don't think the algorithm can
 figure that out just by examining share 2.  We should write a unit test
 of this case and (a) see whether the current algorithm passes that test
 and (b) if it does, figure out why it does and see if we can write
 another unit test that it won't pass. :-)
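
 Here is a rough sketch of that unit test, using the naive
 {{{servers_of_happiness()}}} sketch above (the correct answer is 3, via
 A -> share 0, B -> share 1, C -> share 2):
 {{{
 # Puzzle from above: A holds shares 0 and 1, B holds 1 and 2, C holds 2.
 sharemap = {0: set(["A"]),
             1: set(["A", "B"]),
             2: set(["B", "C"])}
 # A greedy algorithm that keeps A's link to share 1 can reach at most 2,
 # so the current code may or may not pass this:
 assert servers_of_happiness(sharemap) == 3
 }}}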

-- 
Ticket URL: <http://allmydata.org/trac/tahoe/ticket/778#comment:156>
tahoe-lafs <http://allmydata.org>
secure decentralized file storage grid


More information about the tahoe-dev mailing list