[tahoe-dev] [tahoe-lafs] #778: "shares of happiness" is the wrong measure; "servers of happiness" is better

tahoe-lafs trac at allmydata.org
Wed Oct 28 01:54:12 UTC 2009

#778: "shares of happiness" is the wrong measure; "servers of happiness" is better
 Reporter:  zooko               |           Owner:  kevan
     Type:  defect              |          Status:  new  
 Priority:  critical            |       Milestone:  1.6.0
Component:  code-peerselection  |         Version:  1.4.1
 Keywords:  reliability         |   Launchpad_bug:       

Comment(by kevan):

 I was thinking about this the other day, and got to wondering about how
 the Encoder handles preexisting shares in the event of some servers
 failing during an upload.

 (note that the following example is in terms of the existing
 {{{shares_of_happiness}}} behavior -- it is easier to link to that code
 than to my patches)

 As an example, we first look at
 [source:src/allmydata/immutable/upload.py@4045#L711 start_encrypted] in
 CHKUploader. This method creates and runs a Tahoe2PeerSelector to
 distribute shares of an IEncryptedUploadable across the grid. The results
 of this are handled in [source:src/allmydata/immutable/upload.py@4045#L753
 set_shareholders]. Note that the PeerTracker instances in {{{use_peers}}}
 are sent to the Encoder instance, while the peerids in {{{already_peers}}}
 are only used in the upload results. In any case, after invoking
 {{{set_shareholders}}} on the Encoder, the CHKUploader starts the upload.
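
 A toy model may make this hand-off easier to follow. It is a sketch under
 stated assumptions, not Tahoe's code: {{{PeerSelectionResult}}},
 {{{ToyEncoder}}} and {{{toy_set_shareholders}}} are hypothetical stand-ins,
 with plain shnum-to-peerid dicts in place of the real PeerTracker objects.
 It only shows that the preexisting shares end up in the upload results and
 never reach the Encoder.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PeerSelectionResult:
    """Hypothetical stand-in for what the peer selector hands back."""
    # shnum -> peerid that will be sent this share (role of use_peers)
    to_upload: Dict[int, str] = field(default_factory=dict)
    # shnum -> peerid that already holds this share (role of already_peers)
    preexisting: Dict[int, str] = field(default_factory=dict)


@dataclass
class ToyEncoder:
    """Hypothetical encoder: it only learns about shares it must send."""
    shareholders: Dict[int, str] = field(default_factory=dict)

    def set_shareholders(self, shareholders: Dict[int, str]) -> None:
        self.shareholders = dict(shareholders)


def toy_set_shareholders(result: PeerSelectionResult,
                         encoder: ToyEncoder) -> Dict[str, int]:
    """Mimic the split: new shares go to the encoder, old ones only to results."""
    encoder.set_shareholders(result.to_upload)
    # Preexisting shares are only recorded for reporting purposes.
    return {"preexisting_shares": len(result.preexisting)}
```

 For example, with shares 0-7 already on the grid and shares 8 and 9 still to
 send, the toy encoder is told about exactly two shareholders.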

 The part of the Encoding process that concerns me is
 [source:src/allmydata/immutable/encode.py@4045#L489 _remove_shareholder].
 This method is called when there is an error sending data to one of the
 shareholders. If a shareholder is lost, the Encoder will check to make
 sure that {{{shares_of_happiness}}} is still met even with the lost server
 -- if not, it will abort the upload. The problem with this check is that
 the Encoder, from what I can tell, has no way of knowing about the shares
 that already exist on the grid, and thus can't take them into account when
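 making this check. So, if (say) 8 shares for my storage index already exist
 on the grid, {{{shares_of_happiness = 7}}}, and the Encoder has only two
 shares to actually send, then if one (or both) of those transfers fails, my
 upload will fail when it shouldn't: losing a transfer leaves the Encoder
 with fewer than 7 shareholders, even though at least 8 shares are already
 on the grid.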
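
 The check can be sketched in the same simplified dict model. Both functions
 are hypothetical: the first mirrors the behavior described above, counting
 only the Encoder's own shareholders after a loss, while the second is one
 possible alternative that also credits shares already on the grid. Neither
 is the actual {{{_remove_shareholder}}} code.

```python
from typing import Dict, Set


def happy_after_loss_encoder_view(shareholders: Dict[int, str],
                                  lost_peer: str,
                                  shares_of_happiness: int) -> bool:
    """Count only the shares the encoder itself is sending."""
    remaining = {shnum for shnum, peer in shareholders.items()
                 if peer != lost_peer}
    return len(remaining) >= shares_of_happiness


def happy_after_loss_with_preexisting(shareholders: Dict[int, str],
                                      preexisting: Set[int],
                                      lost_peer: str,
                                      shares_of_happiness: int) -> bool:
    """Also credit distinct shares already on the grid (hypothetical fix)."""
    remaining = {shnum for shnum, peer in shareholders.items()
                 if peer != lost_peer}
    # Union, so a share that is both preexisting and being sent counts once.
    return len(remaining | preexisting) >= shares_of_happiness
```

 With shares 8 and 9 being sent, shares 0-7 already present,
 {{{shares_of_happiness = 7}}} and one failed transfer, the first check
 reports failure (1 remaining share) while the second reports success
 (9 distinct shares).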

 Does it seem like I'm off-base there? If not, then it certainly seems like
 my implementation of {{{servers_of_happiness}}} would fall victim to
 pretty much the same issue. Is there an obvious way to fix that?
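
 The same blind spot carries over to a server-based measure. The sketch below
 uses a deliberately simplified definition, assumed for illustration only:
 the number of distinct, non-lost servers holding at least one share.
 {{{distinct_servers}}} and the server names are hypothetical; this is not
 the {{{servers_of_happiness}}} implementation in the patches.

```python
from typing import Dict, Iterable, Set


def distinct_servers(placements: Iterable[Dict[int, Set[str]]],
                     lost: Set[str]) -> int:
    """Count servers, minus any lost ones, that hold at least one share.

    Each placement maps a share number to the set of servers holding it.
    """
    servers: Set[str] = set()
    for placement in placements:
        for holders in placement.values():
            servers |= holders
    return len(servers - lost)


# Shares the encoder is sending, keyed by share number.
sending = {8: {"server-i"}, 9: {"server-j"}}
# Shares already on the grid: shares 0..7, one per server.
preexisting = {n: {"server-%d" % n} for n in range(8)}

encoder_view = distinct_servers([sending], lost={"server-i"})            # 1
grid_view = distinct_servers([sending, preexisting], lost={"server-i"})  # 9
```

 Counting only what the Encoder sends yields 1 server after the failure,
 while counting the grid as a whole yields 9, so an Encoder that never sees
 the preexisting placements would abort just as the share-based check does.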

 (this comment should have a unit test or two written for it, so that what
 I'm saying is demonstrated/more easily understood, but I need to leave for
 class now)

Ticket URL: <http://allmydata.org/trac/tahoe/ticket/778#comment:66>
tahoe-lafs <http://allmydata.org>
secure decentralized file storage grid
