[tahoe-dev] [tahoe-lafs] #778: "shares of happiness" is the wrong measure; "servers of happiness" is better

tahoe-lafs trac at allmydata.org
Sat Oct 10 15:49:31 UTC 2009


#778: "shares of happiness" is the wrong measure; "servers of happiness" is
better
--------------------------------+-------------------------------------------
 Reporter:  zooko               |           Owner:  kevan
     Type:  defect              |          Status:  new  
 Priority:  critical            |       Milestone:  1.5.1
Component:  code-peerselection  |         Version:  1.4.1
 Keywords:  reliability         |   Launchpad_bug:       
--------------------------------+-------------------------------------------

Comment(by zooko):

 Kevan:

 It's great to see that you've studied the code so carefully.  This gives
 me a nice warm fuzzy feeling that more eyeballs have looked at it.
 Anyway, I'm very sorry that it has been two weeks since it became my
 turn to reply on this ticket and I still haven't done so.  I've been
 really busy.

 I guess the key fact you've shown, which I hadn't appreciated, is that
 the variable {{{_servers_with_shares}}} holds only servers that
 contribute a ''new'' share, i.e. one that isn't already held by one of
 the servers in that set.  Perhaps it should be renamed to something like
 {{{_servers_with_unique_shares}}}.  (If you do that, please use the
 {{{darcs replace}}} command to rename it.)

 Now I can think of one more issue.  You've pretty much convinced me that
 this way of counting {{{_servers_with_shares}}} can't overcount unique
 shares which are available on separate servers, but it could undercount.
 For example, suppose {{{s_1: f_1}}}, {{{s_2: f_2}}}, {{{s_3: f_3}}},
 {{{s_4: f_1, f_2, f_3}}}.  Then if {{{s_4}}} is counted first it will
 prevent {{{s_1}}}, {{{s_2}}}, and {{{s_3}}} from being counted because
 they don't have any new shares, so the final value of
 {{{_servers_with_shares}}} will be 1.  On the other hand if {{{s_1}}},
 then {{{s_2}}}, then {{{s_3}}} are counted first the final value will be
 3.  If this is right, then it means that sometimes an upload could be
 reported as failing (because the uploader happened to talk to {{{s_4}}}
 first) when it should have been reported as succeeding.
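 To make the order dependence concrete, here is a minimal Python sketch
 of the greedy counting described above (this is not the actual uploader
 code; the function name and the {{{placements}}} dict are just
 illustrative):

{{{
# Minimal sketch of the greedy "servers with unique shares" count
# described above -- not the actual Tahoe-LAFS uploader code.

def count_servers_with_unique_shares(server_to_shares, order):
    """Greedily count servers that contribute at least one share not
    already seen on an earlier-counted server."""
    seen_shares = set()
    counted_servers = set()
    for server in order:
        new_shares = server_to_shares[server] - seen_shares
        if new_shares:
            counted_servers.add(server)
            seen_shares |= new_shares
    return len(counted_servers)

placements = {
    "s_1": {"f_1"},
    "s_2": {"f_2"},
    "s_3": {"f_3"},
    "s_4": {"f_1", "f_2", "f_3"},
}

print(count_servers_with_unique_shares(placements, ["s_4", "s_1", "s_2", "s_3"]))  # -> 1
print(count_servers_with_unique_shares(placements, ["s_1", "s_2", "s_3", "s_4"]))  # -> 3
}}}

 Same placements, but the count comes out as either 1 or 3 depending
 purely on which server is examined first.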

 What do you think?  It might be worth committing your patch as-is,
 meaning that trunk would then potentially suffer from uploads spuriously
 failing when they shouldn't (though it would never suffer from uploads
 spuriously succeeding when they shouldn't), and then starting on a
 separate patch to avoid that problem.  Or perhaps we should keep this
 patch out of trunk even longer and think about that issue first.
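
 If we do go the separate-patch route, one possible way to make the count
 order-independent (just a sketch of one approach, not something this
 ticket has settled on) would be to compute a maximum matching between
 servers and shares, i.e. the largest number of servers that can each be
 credited with a distinct share:

{{{
# Sketch of an order-independent count via maximum bipartite matching
# (augmenting paths).  Again, this is illustrative, not an actual patch.

def max_server_share_matching(server_to_shares):
    """Return the largest number of servers that can each be credited
    with a distinct share.  Unlike the greedy count, the result does not
    depend on the order in which servers are examined."""
    share_to_server = {}  # current matching: share -> server

    def try_assign(server, visited):
        for share in server_to_shares[server]:
            if share in visited:
                continue
            visited.add(share)
            owner = share_to_server.get(share)
            # Take an unmatched share, or re-route its current owner to
            # some other share it holds.
            if owner is None or try_assign(owner, visited):
                share_to_server[share] = server
                return True
        return False

    return sum(1 for server in server_to_shares if try_assign(server, set()))

# With the placements from the example above this returns 3 no matter
# which server happens to be contacted first.
}}}

 That way an upload would be judged by what is actually recoverable
 rather than by which server happened to answer first.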

 Regards,

 Zooko

-- 
Ticket URL: <http://allmydata.org/trac/tahoe/ticket/778#comment:53>
tahoe-lafs <http://allmydata.org>
secure decentralized file storage grid

