[tahoe-dev] default values of K, H, N (was: I assumed each share would go to a different server...)

David-Sarah Hopwood david-sarah at jacaranda.org
Thu Jan 13 08:48:37 UTC 2011

On 2011-01-12 07:40, Zooko O'Whielacronx wrote:
> I think we should either change the default behavior to match the
> common user expectations, or else add documentation to, if possible,
> explain the surprising thing for them when they begin trying to use
> it.
> Note that another user, more than one year ago, reported the same confusion:
> http://tahoe-lafs.org/pipermail/tahoe-dev/2009-August/002494.html
> Which is why I created ticket #778 (servers of happiness).
> There are some reasons (mostly to do with performance and
> availability) why someone might want N > H, but the newbies seem to
> expect H == N. Perhaps we should set H == N in the defaults and then
> let more sophisticated users tune the (K, H, N) for their particular
> grid and their preferences?

Here is the argument against setting H == N:

Suppose that H is equal to the number of servers on the grid. Then the
loss (temporary or otherwise) of any single server would prevent uploads. So for
upload availability, H should be less than the number of servers on
the grid.
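A minimal sketch of the upload-availability point. It uses a simplified model, which is an assumption here and not Tahoe-LAFS's actual uploader logic: an upload needs at least H reachable servers, and disk space, read-only servers and share-placement details are ignored. The function name is hypothetical.

```python
def servers_that_can_be_down_for_upload(grid_size: int, h: int) -> int:
    """Hypothetical helper for a simplified model in which an upload
    needs at least h reachable servers.

    Returns how many servers may be unreachable while uploads still
    succeed, or -1 if the grid is too small to ever satisfy h.
    """
    if grid_size < h:
        return -1
    return grid_size - h


# A 10-server grid with H equal to the grid size: any outage blocks uploads.
print(servers_that_can_be_down_for_upload(10, 10))  # 0
# The same grid with H = 7 keeps accepting uploads with 3 servers down.
print(servers_that_can_be_down_for_upload(10, 7))   # 3
```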

In that case, setting N == H fails to make use of all the servers to
improve the preservation of files. (If you have more than 10 servers,
you might not need to put shares on all of them, but for grids of
10 servers or fewer, you probably do want each file's shares to be
distributed over all the servers you have.)

So to get the best preservation for a given upload availability,
H should be less than N.
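The preservation side can be sketched the same way. This sketch makes an optimistic assumption: the grid has at least N servers and each share lands on a distinct server, which the placement algorithm does not guarantee. Under that assumption a file stays recoverable while any K of the servers holding its shares survive, so it can lose N - K of them. The helper name is hypothetical. Holding H = 7 fixed, so that upload availability is unchanged:

```python
def best_case_losses_tolerated(k: int, n: int) -> int:
    """Hypothetical helper: best-case number of servers that can be lost.

    Assumes each of the n shares sits on its own server, so the file is
    recoverable as long as any k of those n servers remain.
    """
    if not 1 <= k <= n:
        raise ValueError("need 1 <= K <= N")
    return n - k


# H = 7 in both cases, so uploads tolerate the same number of outages.
print(best_case_losses_tolerated(k=3, n=7))   # 4  (N == H)
print(best_case_losses_tolerated(k=3, n=10))  # 7  (N > H)
```

This is a best case, not a guarantee, but it shows what is given up by stopping at N == H.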

> Speaking generally, I think there are at least three different
> desiderata that we could have for our default settings, and we
> probably can't have all of what we want:
> 1. (Safety) Users who entrust valuable data to it without changing the
> defaults won't lose integrity, confidentiality, or data-preservation.
> 2. (Unsurprisingness) Users will rarely be surprised by the default behavior.
> 3. (Performance and Features) Users will get good transfer speeds, the
> ability to migrate or rebalance files without having to re-encode
> them, better storage efficiency, higher fault-tolerance, etc.
> I would really like to prioritize them in this order.
> (Hm, in a sense Unsurprisingness is really the essence of Safety.
> Regardless of what the settings are, if the user understands the
> consequences of those settings then they won't be harmed.)
> I don't think default settings are a good way to accomplish
> desideratum 3 very well because the settings probably have to be tuned
> to the particular grid.

I agree. So we should concentrate on how the defaults satisfy 1 and 2.

> So I would favor some default settings like (1, 1, 1) or (1, 3, 3) or
> (3, 10, 10), because those seem to score higher on Unsurprisingness in
> my book.
> Honestly at the moment I think I favor (1, 1, 1). It works on any grid
> (even "the 1-server grid", which I imagine might turn out to be a
> valuable use case), the safety qualities should be obvious to any
> user, and it arranges for users to learn about the confidentiality and
> integrity properties first, and then separately to learn about the
> consequences of erasure-coding.

If H <= K, then files are not guaranteed to be preserved when even a
single server is lost. (With (1, 1, 1), for example, each file is stored
on exactly one server.) This was surprising to the user who wrote in,
and I think it would be surprising to most users.

In general, a grid can guarantee to tolerate the loss of max(0, H - K)
servers without losing any successfully uploaded files (assuming there
are no other losses of individual shares).
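Here is the max(0, H - K) bound applied to the (K, H, N) triples suggested earlier in the thread. This is a sketch: the helper name is hypothetical, and it assumes, as the bound itself does, that no other individual shares are lost.

```python
def guaranteed_losses_tolerated(k: int, h: int) -> int:
    """Hypothetical helper: servers that can be lost without losing any
    successfully uploaded file, assuming no other shares are lost."""
    return max(0, h - k)


for k, h, n in [(1, 1, 1), (1, 3, 3), (3, 10, 10), (3, 7, 10)]:
    print((k, h, n), guaranteed_losses_tolerated(k, h))
# (1, 1, 1) 0
# (1, 3, 3) 2
# (3, 10, 10) 7
# (3, 7, 10) 4
```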

So putting H about half-way between K and N gives a reasonably good
tradeoff between upload availability and preservation. That's why I
think the existing K = 3, H = 7, N = 10 defaults are reasonable for a
"production" node. (They may not be for a demo node.)

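The tradeoff can be tabulated for the current K = 3, N = 10 defaults on a hypothetical 10-server grid. This sketch assumes the same simplified model as the argument against H == N (an upload needs H reachable servers) and uses the guaranteed bound H - K for preservation. Under those assumptions the two tolerances always add up to (servers on the grid) - K, so each step up in H buys one server of preservation at the cost of one server of upload availability. The function name is hypothetical.

```python
def tradeoff(k: int, n: int, grid_size: int) -> list[tuple[int, int, int]]:
    """Hypothetical helper. For each H from k to min(n, grid_size), return
    (H, servers that may be down while uploads still succeed,
     servers that may be lost without losing any uploaded file).
    Simplified model: uploads need H reachable servers; the
    preservation guarantee is H - K."""
    rows = []
    for h in range(k, min(n, grid_size) + 1):
        upload_tolerance = grid_size - h
        preservation_tolerance = h - k
        rows.append((h, upload_tolerance, preservation_tolerance))
    return rows


for h, up, keep in tradeoff(k=3, n=10, grid_size=10):
    print(f"H={h:2d}  uploads survive {up} down  files survive {keep} lost")
```

Under this model, H = 7 keeps uploads working with up to 3 servers down and guarantees that files survive the loss of any 4 servers.
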
On the other hand, both the share placement algorithm and the documentation
about how these parameters work definitely need some attention in v1.9.0.

David-Sarah Hopwood  ⚥  http://davidsarah.livejournal.com
