[tahoe-dev] uploading unhappiness error messages

David-Sarah Hopwood david-sarah at jacaranda.org
Wed Jan 5 03:11:08 UTC 2011


On 2011-01-04 16:30, Zooko O'Whielacronx wrote:
> I've heard from several people that they don't like the complex,
> detailed error messages that you get if you try to upload and the
> uploader can't achieve happiness.
> 
> So these people naturally suggest reducing the amount of information
> in the error message to make it nicer to read.
> 
> However, please understand that this error message started out being
> nice and simple like that, but during the development of #778, I
> repeatedly encountered cases where an upload could fail and the error
> message could leave the user really mystified as to why it didn't
> work. So I repeatedly asked Kevan (the architect of #778) to add more
> information into the error message to clarify that case. At the end of
> this process, the error message had a fairly complete description of
> what went wrong in it -- how many servers failed to accept new shares
> and why, what level of happiness was achieved, etc.
> 
> Now, it may be possible to make the error message clearer, more nicely
> formatted, or even shorter without regressing to the earlier
> situation, where a user could be utterly baffled and unable to figure
> out why it didn't work. Referring to an explanation outside of the
> text of the error message might help.
> 
> But if you do so please be careful not to undo the work that Kevan and
> I did to avoid cases where the user could be stuck with no way to
> understand what happened. You can find almost all of that process
> recorded in the giant ticket #778.
> 
> I suspect that the problem is inherently complex -- that there isn't a
> concise way to completely explain failures of servers-of-happiness.
> 
> This would lead us to wonder if the servers-of-happiness design itself
> is unnecessarily complex. (Brian has expressed dissatisfaction about
> servers-of-happiness a few times.) However, I haven't yet seen a
> simpler design with similarly good safety properties, so my current
> hypothesis is that erasure-coding your file onto multiple servers just
> has inherently complex failure modes, and there is no way to implement
> it much simpler than servers-of-happiness, and no way to show error
> messages to the user much simpler than the current error messages.
> 
> Prove me wrong! :-)

Our current share placement algorithm can cause failures even if
the number of servers that place shares correctly would have been
sufficient had we attempted a better placement. *If* we fix this
problem, then I think we could remove some information from the
message without loss.

What I suggest in that case is something like:

"We started with A servers holding unique shares for this file, and
 need at least B more to reach the happiness threshold (C), but only
 D servers placed a share successfully. Of E connected servers, F
 reported an error and G were out of space."

(where A+B = C, and D+F+G = E)

In the case A = 0 (a new upload), we can reword this as:

"We started with no servers holding shares for this file, and need at
 least C holding unique shares to reach the happiness threshold, but
 only D servers placed a share successfully. Of E connected servers,
 F reported an error and G were out of space."
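
To make the relationship between the two wordings concrete, here is a minimal Python sketch of how they could be generated from the same counts. The function name and its parameters are hypothetical and are not part of the existing Tahoe-LAFS code; the sketch assumes only the constraints stated above (A+B = C and D+F+G = E).

```python
def format_unhappiness_message(a, c, d, e, f, g):
    """Build the proposed message from the counts A, C, D, E, F and G.

    Hypothetical helper for illustration only; not an existing Tahoe-LAFS API.
    """
    b = c - a  # B is derived, since A + B = C
    if b <= 0:
        raise ValueError("happiness threshold already met; nothing to report")
    if d + f + g != e:
        raise ValueError("expected D + F + G == E")
    if a == 0:
        # New upload: no server held a share before we started.
        start = ("We started with no servers holding shares for this file, "
                 "and need at least %d holding unique shares to reach the "
                 "happiness threshold" % c)
    else:
        start = ("We started with %d servers holding unique shares for this "
                 "file, and need at least %d more to reach the happiness "
                 "threshold (%d)" % (a, b, c))
    return ("%s, but only %d servers placed a share successfully. "
            "Of %d connected servers, %d reported an error and %d were "
            "out of space." % (start, d, e, f, g))
```

For example, format_unhappiness_message(2, 7, 3, 6, 1, 2) yields the first wording with B = 5, and passing a = 0 yields the second. The sketch does not attempt singular/plural agreement ("1 servers").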

Note that for these messages to make sense and give sufficient information,
we must try to place shares only on servers that are not in the
maximum matching used to compute the initial happiness A. Also, D must
be less than B (otherwise we have a bug in the share placement). That
means that switching to these messages depends on implementing the
better share placement algorithm in
<http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1212#comment:14>.
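
As a rough illustration of the first condition, the sketch below computes A as the size of a maximum matching between servers and the shares they already hold, then lists the connected servers outside that matching as the only placement candidates. It uses a plain augmenting-path search and is not the algorithm from the ticket; the function names and the sharemap layout (server id -> set of share numbers) are assumptions made for the example.

```python
def maximum_matching(sharemap):
    """Return a maximum matching as a dict: share number -> server id.

    sharemap maps each server id to the set of share numbers it holds.
    Illustrative augmenting-path search, not the Tahoe-LAFS code.
    """
    match = {}  # share number -> server currently matched to it

    def try_assign(server, seen):
        # Try to give this server a share, re-assigning others if needed.
        for share in sorted(sharemap[server]):
            if share in seen:
                continue
            seen.add(share)
            if share not in match or try_assign(match[share], seen):
                match[share] = server
                return True
        return False

    for server in sorted(sharemap):
        try_assign(server, set())
    return match


def placement_candidates(sharemap, connected):
    """Return (A, servers eligible to receive new shares)."""
    matching = maximum_matching(sharemap)
    matched_servers = set(matching.values())
    happiness = len(matching)  # A: servers holding unique shares
    return happiness, sorted(set(connected) - matched_servers)
```

A maximum matching is generally not unique, so which servers count as "in the matching" depends on the one that was actually used to compute A; the sorted iteration here only makes the example deterministic.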

If we detect some bug in the share placement, OTOH, then we should dump
as much information as possible, and not worry too much about the
niceness of the error message other than making clear it is a bug (for
example the message can give the complete 'before' and 'after' sharemaps).
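
A sketch of that check, assuming it runs only after an upload has already failed: if D >= B, the failure cannot be explained by too few servers accepting shares, so the report switches to a full dump. The function name and the sharemap layout (server id -> set of share numbers) are hypothetical.

```python
def placement_bug_report(a, c, d, before, after):
    """Return a verbose bug report, or None if the failure looks ordinary.

    Intended to be called only for a failed upload. Hypothetical helper,
    not part of the existing code.
    """
    b = c - a  # servers still needed, since A + B = C
    if d < b:
        return None  # ordinary failure; the short message applies
    lines = ["BUG in share placement: %d servers placed a share, "
             "which should have covered the %d still needed." % (d, b)]
    for label, sharemap in (("before", before), ("after", after)):
        lines.append("sharemap %s:" % label)
        for server in sorted(sharemap):
            lines.append("  %s: %s" % (server, sorted(sharemap[server])))
    return "\n".join(lines)
```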

> Also, this is further reason, in my mind, to make the default value of
> K be 1, simplifying the user experience for first time users in
> several different ways.

We shouldn't set the default for K to 1 without fixing #1293.

-- 
David-Sarah Hopwood  ⚥  http://davidsarah.livejournal.com
