[tahoe-dev] Help uploading when file exists but needs repair

David-Sarah Hopwood david-sarah at jacaranda.org
Thu Dec 2 02:40:54 UTC 2010


On 2010-12-02 00:37, Brian Warner wrote:
> On 11/30/10 10:58 PM, Kyle Markley wrote:
[...]

>> allmydata.interfaces.UploadUnhappinessError: shares could be placed on
>> only 3 server(s) such that any 2 of them have enough shares to recover
>> the file, but we were asked to place shares on at least 4 such
>> servers. (placed all 4 shares, want to place shares on at least 4
>> servers such that any 2 of them have enough shares to recover the
>> file, sent 4 queries to 4 peers, 4 queries placed some shares, 0
>> placed none (of which 0 placed none due to the server being full and 0
>> placed none due to an error))
> 
>> So it appears it's failing to upload the .login file.  The specific
>> error message doesn't make sense to me -- if all 4 queries placed some
>> shares, and 0 queries placed none, then why hasn't the file become healthy?
> 
> There are two confusing things going on here. The first is that I think
> (but I'd have to check the code to be sure) the "4 queries placed some
> shares" message is including any "I already have a share" responses. The
> second is that the UploadUnhappinessError criteria is more strict than
> simply getting all four shares into the grid: it wants the arrangement
> of those shares to meet the "servers-of-happiness" criteria. The "at
> least 4 such servers" means s-o-h (aka tahoe.cfg's misnamed
> "shares.happy") is equal to 4.
> 
> Uploading consists of two phases: share placement, then share upload. If
> the proposed arrangement that comes out of the placement phase does not
> meet the s-o-h criteria, the upload stops before any shares are placed.
> 
> The share-placement algorithm is usually expecting the
> file-doesn't-exist-in-grid-yet case. It sends "please accept share X
> (and by the way do you have any other shares?)" messages to each server
> in permuted order, all in parallel (I think), with shnums chosen to get
> exactly one share per server if everything goes well (i.e. each server
> accepts the share offered it, and no preexisting shares were found).
> 
> I'm suspecting that something in the share-placement algorithm is
> getting stuck: the particular placement of preexisting shares and the
> order in which the queries are being sent/received is causing the
> placement algorithm to terminate, but which doesn't result in an
> arrangement that will pass the s-o-h test.
> 
> David-Sarah, you know more than I do about s-o-h and the new placement
> algorithm.. could you take a look? Given the serverids and SI described
> here, I think the permuted order should have been (xxaj,juwm,vjqc,47cs),
> but I'd like to confirm that (maybe with a flog trace), because I can't
> make that order fit with the other evidence.

I can't look at this specific case right now, but the current placement
algorithm is known to fall short in several cases, which are covered by
the following tests in allmydata.test.test_upload.EncodingParameters:

  test_problem_layout_comment_187
  test_problem_layout_ticket_1124
  test_problem_layout_ticket_1128

The exception message above is in fact identical to that in both #1124
and #1128. The latter is a duplicate of
<http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1130>.
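
To illustrate how an upload can place all 4 shares and still fail the
servers-of-happiness test, here is a minimal sketch. It assumes the usual
reading of s-o-h as the size of a maximum matching between servers and
the distinct shares they hold. The function name happiness_of_layout, the
server names A-D, and both layouts are hypothetical; this is not the
Tahoe-LAFS implementation and not the layout from Kyle's grid.

```python
def happiness_of_layout(sharemap):
    """Return the size of a maximum matching between servers and shares.

    sharemap maps a (hypothetical) server id to the set of share numbers
    that server holds. Each server may be matched to at most one share,
    and each share to at most one server.
    """
    share_owner = {}  # share number -> server currently matched to it

    def try_assign(server, visited):
        # Augmenting-path step: give `server` a share, re-homing the
        # current owner of that share if it can take a different one.
        for shnum in sorted(sharemap[server]):
            if shnum in visited:
                continue
            visited.add(shnum)
            owner = share_owner.get(shnum)
            if owner is None or try_assign(owner, visited):
                share_owner[shnum] = server
                return True
        return False

    return sum(1 for server in sharemap if try_assign(server, set()))


# All 4 shares (0..3) are present and all 4 servers hold something,
# but A and B both hold only share 0, so they cannot both count.
clumped = {"A": {0}, "B": {0}, "C": {1}, "D": {2, 3}}

# One distinct share per server: the arrangement the uploader aims for.
spread = {"A": {0}, "B": {1}, "C": {2}, "D": {3}}

happy = 4  # the "shares.happy" threshold from the error message
print(happiness_of_layout(clumped), happiness_of_layout(clumped) >= happy)
print(happiness_of_layout(spread), happiness_of_layout(spread) >= happy)
```

Under these assumptions the clumped layout scores 3 and the spread layout
scores 4, so only the second would satisfy a threshold of 4, even though
both contain every share.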

I think we reached a consensus on how to fix this in
<http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1212>, starting at comment:14
(kevan had previously suggested a similar algorithm in
<http://tahoe-lafs.org/trac/tahoe-lafs/ticket/778#comment:194>).

-- 
David-Sarah Hopwood  ⚥  http://davidsarah.livejournal.com
