[tahoe-dev] [tahoe-lafs] #778: "shares of happiness" is the wrong measure; "servers of happiness" is better

tahoe-lafs trac at allmydata.org
Thu Sep 10 10:20:09 UTC 2009


#778: "shares of happiness" is the wrong measure; "servers of happiness" is
better
--------------------------------+-------------------------------------------
 Reporter:  zooko               |           Owner:  kevan   
     Type:  defect              |          Status:  assigned
 Priority:  critical            |       Milestone:  1.5.1   
Component:  code-peerselection  |         Version:  1.4.1   
 Keywords:  reliability         |   Launchpad_bug:          
--------------------------------+-------------------------------------------

Comment(by kevan):

 (maybe writing up the problem in detail will help me to think of a
 solution)

 If I understand the code, Tahoe-LAFS (in
 [source:src/allmydata/immutable/checker.py@4045#L287 checker.py]) defines
 an immutable file node as healthy if all of the shares originally placed
 onto the grid ({{{m}}}) are still available (for some definition of
 available, depending on the verify flag), unhealthy if fewer than {{{m}}}
 but at least {{{k}}} shares are still available, and unrecoverable if
 fewer than {{{k}}} shares are still available.
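
 Read as a decision rule, that classification might look like the
 following. This is a minimal sketch, not the code in checker.py:
 {{{classify_health}}} and its string results are hypothetical names, and it
 assumes that exactly {{{k}}} available shares still counts as recoverable,
 since {{{k}}} shares are enough to reconstruct the file.

```python
def classify_health(available, k, m):
    """Classify an immutable file by its number of available shares.

    available: shares currently found on the grid
    k:         shares needed to reconstruct the file
    m:         shares originally placed on the grid
    """
    if available >= m:
        return "healthy"
    if available >= k:
        # Still recoverable, but a candidate for repair.
        return "unhealthy"
    return "unrecoverable"
```

 With {{{k}}} = 3 and {{{m}}} = 10, a file with 5 shares left would be
 classified as unhealthy and a file with 2 shares left as unrecoverable.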

 In [source:src/allmydata/interfaces.py@4045#L1628], ICheckable defines the
 method {{{check_and_repair}}}, which tells the receiving object to check
 itself and, if necessary, attempt a repair. This interface and method are
 implemented by [source:src/allmydata/immutable/filenode.py@4045#L181
 FileNode], which represents an immutable file node on the Tahoe-LAFS grid.

 The check and repair process proceeds something like this (again, if I
 understand the logic):
   1. A Checker is instantiated and started on the verifycap of the
      FileNode.
   2. If the results of the Checker indicate that the FileNode is in need
      of repair, and that the FileNode can be repaired, a Repairer is
      instantiated and started.
   3. The results of the repair operation are reported back to the caller.

 (I know there's a bit of hand waving in there, but hopefully I got the
 gist of it)
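
 The three steps above can be sketched as follows. Everything here is a
 hypothetical stand-in: {{{CheckResults}}}, the {{{check}}} and {{{repair}}}
 callables, and the synchronous control flow are assumptions made to keep
 the example short, and do not reflect the actual FileNode code.

```python
class CheckResults:
    """Hypothetical summary of what a checker found."""

    def __init__(self, healthy, recoverable):
        self.healthy = healthy
        self.recoverable = recoverable


def check_and_repair(check, repair):
    """Run check(); run repair() only if the file is damaged but recoverable.

    check:  callable returning a CheckResults      (step 1)
    repair: callable returning a repair outcome    (step 2)
    Returns (check results, repair attempted?, repair outcome) (step 3).
    """
    results = check()
    attempted = False
    outcome = None
    if not results.healthy and results.recoverable:
        attempted = True
        outcome = repair()
    return results, attempted, outcome
```

 A healthy file and an unrecoverable file both skip the repair step; only
 the damaged-but-recoverable case starts a repair.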

 The repairer ([source:src/allmydata/immutable/repairer.py@4045#L14]) is
 pretty simple: it downloads the content associated with the FileNode in
 the normal way using a DownUpConnector as a target, and then uploads the
 DownUpConnector in the normal way. Since
 [source:src/allmydata/immutable/repairer.py@4045#L87 DownUpConnector]
 implements [source:src/allmydata/interfaces.py@4045#L1400
 IEncryptedUploadable], it is responsible for providing the encoding and
 uploading operations with encoding parameters, including
 {{{servers_of_happiness}}}.

 The problem that this long-winded comment is getting to is here:
 [source:src/allmydata/immutable/download.py@4048#L869]. The
 CiphertextDownloader sets the {{{happy}}} encoding parameter of its target
 to be {{{k}}}. Since {{{k}}} can be bigger than
 {{{servers_of_happiness}}}, this isn't good: the target ends up with a
 {{{happy}}} value other than the one the user configured. In most cases,
 the accompanying comment is right; in the case of a file repair, it isn't,
 because the encoding parameters stored in the DownUpConnector are used by
 the encoding + upload process.
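
 To make the failure mode concrete, here is a toy model of it. The class
 and method names ({{{FakeConnector}}}, {{{set_encoding_params}}},
 {{{get_encoding_params}}}, {{{download_into}}}) and the numbers are
 invented for illustration; only the behavior of setting {{{happy}}} to
 {{{k}}} comes from the description above.

```python
class FakeConnector:
    """Hypothetical stand-in for DownUpConnector.

    It is both the download target and the source for the repair upload.
    """

    def __init__(self):
        self.params = None

    def set_encoding_params(self, k, happy, n):
        # Called by the downloader (hypothetical method name).
        self.params = {"k": k, "happy": happy, "n": n}

    def get_encoding_params(self):
        # Later consumed by the encode + upload half of the repair.
        return self.params


def download_into(target, k, n):
    # Mirrors the behavior described above: happy is hard-coded to k.
    target.set_encoding_params(k=k, happy=k, n=n)


configured_happy = 7  # example value for what the user asked for
connector = FakeConnector()
download_into(connector, k=3, n=10)
repair_params = connector.get_encoding_params()
# The repair upload now sees happy == 3; the configured value never arrives.
```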

 I think that the CiphertextDownloader should ideally be following the
 user's configured {{{happy}}} value instead of setting it to something
 else. Where I'm stuck is in figuring out a way to tell it what {{{happy}}}
 should be. Some ideas:
   * Parse the configuration file: this is straightforward, but ugly,
     because it duplicates the configuration file parsing code in a part of
     the program that otherwise has nothing to do with parsing the
     configuration file.
   * Pass it as a parameter to the Repairer, which then passes it as a
     parameter to CiphertextDownloader, which then uses it: but I don't see
     where in FileNode I'd get {{{happy}}}.
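
 The second idea could be sketched roughly like this. It is only an
 illustration of threading the value through: {{{SketchRepairer}}},
 {{{SketchDownloader}}} and their methods are hypothetical, and the sketch
 assumes the caller already knows {{{happy}}}, which is exactly the part
 that is still unresolved.

```python
class SketchDownloader:
    """Hypothetical downloader that accepts an optional happy value."""

    def __init__(self, k, n, happy=None):
        self.k = k
        self.n = n
        # Fall back to k only when no value is supplied (plain download).
        self.happy = k if happy is None else happy

    def target_params(self):
        # Parameters that would be handed to the download target.
        return (self.k, self.happy, self.n)


class SketchRepairer:
    """Hypothetical repairer that forwards happy to its downloader."""

    def __init__(self, k, n, happy):
        self.downloader = SketchDownloader(k, n, happy=happy)

    def upload_params(self):
        # The re-upload uses whatever the downloader stored.
        return self.downloader.target_params()
```

 With this shape, a plain download keeps today's behavior
 ({{{happy}}} = {{{k}}}), while a repair carries the configured value
 through to the upload.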

 In any case, that's basically the one stumbling block that I'm aware of in
 this ticket.

-- 
Ticket URL: <http://allmydata.org/trac/tahoe/ticket/778#comment:45>
tahoe-lafs <http://allmydata.org>
secure decentralized file storage grid

