[tahoe-dev] [tahoe-lafs] #778: "shares of happiness" is the wrong measure; "servers of happiness" is better

tahoe-lafs trac at allmydata.org
Sat Oct 10 22:23:54 UTC 2009


#778: "shares of happiness" is the wrong measure; "servers of happiness" is
better
--------------------------------+-------------------------------------------
 Reporter:  zooko               |           Owner:  kevan
     Type:  defect              |          Status:  new  
 Priority:  critical            |       Milestone:  1.5.1
Component:  code-peerselection  |         Version:  1.4.1
 Keywords:  reliability         |   Launchpad_bug:       
--------------------------------+-------------------------------------------

Comment(by kevan):

 Hm. That scenario would be a problem, and I don't really see an obvious
 solution to it.

 We could alter the logic at
 [source:src/allmydata/immutable/upload.py@4045#L225] so that it doesn't
 simply give up when it finds that there are no homeless shares but not
 enough distinct servers with shares to consider the upload a success.
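
 Here is a minimal sketch of that success check. It assumes a hypothetical
 {{{allocations}}} mapping from server id to the set of share numbers placed
 on that server. This is a simplified model, not the actual data structure in
 upload.py. The sketch shows that "no homeless shares" and "enough distinct
 servers" are different conditions:

```python
from typing import Dict, Set

# Hypothetical model: server id -> share numbers allocated to that server.
# This is not Tahoe's real data structure, just an illustration.
Allocations = Dict[str, Set[int]]


def count_servers_with_shares(allocations: Allocations) -> int:
    """Count the distinct servers that hold at least one share."""
    return sum(1 for shares in allocations.values() if shares)


def upload_is_happy(allocations: Allocations, servers_of_happiness: int) -> bool:
    """Succeed only if enough *distinct* servers hold shares,
    not merely when every share has been placed somewhere."""
    return count_servers_with_shares(allocations) >= servers_of_happiness


# Every share has a home (no homeless shares), but only two servers hold any.
layout = {"A": {0, 1, 2, 3}, "B": {4, 5}, "C": set()}
print(upload_is_happy(layout, servers_of_happiness=3))  # False
```

 In this layout every share is placed, yet {{{servers_of_happiness = 3}}}
 is not met because only A and B hold shares.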

 We could, for example, figure out how many more servers need to have
 shares on them for the upload to work ({{{n = servers_of_happiness -
 servers_with_shares}}}). We could then unallocate {{{n}}} shares from
 servers that have more than one share allocated, stick them back in
 {{{self.homeless_shares}}}, and then let the selection process continue as
 normal. We'd need a way to prevent it from looping, though -- maybe it
 should only do this if there are uncontacted peers. Would we want to
 remove shares from servers that happen to already have them if we're not
 counting them in the upload? If so, is there a way to do that?

 Does that idea make sense?
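
 The redistribution step above might be sketched as follows. The function
 name {{{reclaim_shares_for_happiness}}}, the {{{allocations}}} mapping, and
 the {{{uncontacted_peers}}} list are all hypothetical. {{{homeless_shares}}}
 stands in for {{{self.homeless_shares}}> as a plain list. Which shares get
 reclaimed is an arbitrary choice here: the highest-numbered share on each
 crowded server. The early return implements the "only if there are
 uncontacted peers" loop guard:

```python
from typing import Dict, List, Set

# Hypothetical model: server id -> share numbers allocated to that server.
Allocations = Dict[str, Set[int]]


def reclaim_shares_for_happiness(
    allocations: Allocations,
    homeless_shares: List[int],
    servers_of_happiness: int,
    uncontacted_peers: List[str],
) -> int:
    """Move up to n shares off servers holding more than one share and back
    into homeless_shares, where n = servers_of_happiness - servers_with_shares.
    Returns how many shares were reclaimed."""
    servers_with_shares = sum(1 for shares in allocations.values() if shares)
    n = servers_of_happiness - servers_with_shares
    # Loop guard: only redistribute if there is somewhere new to put shares.
    if n <= 0 or not uncontacted_peers:
        return 0
    reclaimed = 0
    for shares in allocations.values():
        # Leave at least one share on each server so it still counts.
        while len(shares) > 1 and reclaimed < n:
            share = max(shares)  # arbitrary but deterministic choice
            shares.remove(share)
            homeless_shares.append(share)
            reclaimed += 1
        if reclaimed == n:
            break
    return reclaimed


layout = {"A": {0, 1, 2, 3}, "B": {4, 5}, "C": set()}
homeless: List[int] = []
moved = reclaim_shares_for_happiness(layout, homeless, 3, ["D"])
print(moved, homeless)  # 1 [3]
```

 Because each server keeps at least one share, reclaiming never lowers the
 number of servers with shares. For the count to actually rise, though, the
 normal selection process would presumably still need to place the
 reclaimed shares on the uncontacted peers rather than back on A or B.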


 Regarding holding up this patch versus committing now and making it a
 separate issue:

   * We'd probably want to write tests for this behavior. Do the test tools
 in Tahoe include a way to configure a grid so that it looks like the one
 in your example? I spent a while looking for such tools last weekend,
 when I was trying to implement a test for your first example, but
 couldn't find any. If they don't exist, we'd probably need to write them.
   * We'd probably want to turn the idea in the paragraph above into a
 better-defined algorithm (assuming everyone agrees with it).

 I have school and work to keep me busy, so I could probably dedicate an
 afternoon or two a week to this issue. I'm happy to do that, and I'd like
 to finish it, but if we waited for that work to be done, it would probably
 be a while before we committed a fix (unless someone with more time on
 their hands wanted to take over). That's one argument for making it a
 separate issue. On the other hand, it'd be nice to eliminate edge cases
 before committing. I'm not sure which way I lean.

-- 
Ticket URL: <http://allmydata.org/trac/tahoe/ticket/778#comment:55>
tahoe-lafs <http://allmydata.org>
secure decentralized file storage grid

