[tahoe-dev] default values of K, H, N (was: I assumed each share would go to a different server...)

Zooko O'Whielacronx zooko at zooko.com
Wed Jan 12 07:40:21 UTC 2011


Dear Josh, Shawn, Greg, et al.:

While it warms my heart to see people teaching each other, it's not
"scalable" for new users to be surprised when the behavior doesn't
match their assumptions, then post on the mailing list and get an
explanation about why the actual behavior differs from their
assumptions.

I think we should either change the default behavior to match common
user expectations, or else add documentation that, if possible,
explains the surprising behavior to new users when they first begin
trying to use it.

Note that another user, more than one year ago, reported the same confusion:

http://tahoe-lafs.org/pipermail/tahoe-dev/2009-August/002494.html

Which is why I created ticket #778 (servers of happiness).

There are some reasons (mostly to do with performance and
availability) why someone might want N > H, but the newbies seem to
expect H == N. Perhaps we should set H == N in the defaults and then
let more sophisticated users tune the (K, H, N) for their particular
grid and their preferences?

Speaking generally, I think there are at least three different
desiderata that we could have for our default settings, and we
probably can't have all of what we want:

1. (Safety) Users who entrust valuable data to it without changing the
defaults won't lose integrity, confidentiality, or data-preservation.

2. (Unsurprisingness) Users will rarely be surprised by the default behavior.

3. (Performance and Features) Users will get good transfer speeds, the
ability to migrate or rebalance files without having to re-encode
them, better storage efficiency, higher fault-tolerance, etc.

I would really like to prioritize them in this order.

(Hm, in a sense Unsurprisingness is really the essence of Safety.
Regardless of what the settings are, if the user understands the
consequences of those settings then they won't be harmed.)

I don't think default settings are a good way to accomplish
desideratum 3, because the settings probably have to be tuned
to the particular grid. François has a grid with three physical
machines and 60- or 70-odd storage server processes. I have a grid (I
just set it up!) with eight storage server processes on a single
Amazon EC2 virtual machine. The volunteergrid1 has 17 physical servers
of heterogeneous size and performance, each one operated by a different
volunteer. In the future, more people might set up their own "personal
Tahoe-LAFS grid" consisting of only a single storage server owned by
them. There are no default settings that are optimal for all of these
cases.

Documentation is probably the best way to accomplish desideratum 3.
(Our documentation is already better than that of most open source
projects, but it also has lots of room for improvement. Volunteers
needed!)

So I would favor some default settings like (1, 1, 1) or (1, 3, 3) or
(3, 10, 10), because those seem to score higher on Unsurprisingness in
my book.

Honestly at the moment I think I favor (1, 1, 1). It works on any grid
(even "the 1-server grid", which I imagine might turn out to be a
valuable use case), the safety qualities should be obvious to any
user, and it arranges for users to learn about the confidentiality and
integrity properties first, and then separately to learn about the
consequences of erasure-coding.
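
To make the comparison between those candidate defaults a little more
concrete, here is a rough sketch in Python. It is not Tahoe-LAFS code:
the function name and the returned fields are made up for illustration,
and it assumes that a successful upload leaves shares on at least H
distinct servers, any K of which are enough to recover the file. It
also ignores per-share overhead when computing the expansion factor.

```python
from fractions import Fraction


def summarize(k, h, n):
    """Summarize an encoding setting (K, H, N).

    Hypothetical helper for illustration only. Assumes a successful
    upload places shares on at least H distinct servers, any K of
    which suffice to recover the file.
    """
    if not (1 <= k <= h <= n):
        raise ValueError("expected 1 <= K <= H <= N")
    return {
        # Stored bytes divided by file bytes, ignoring share overhead.
        "expansion": Fraction(n, k),
        # Fewest distinct servers the grid needs for an upload to succeed.
        "min_servers": h,
        # Servers that can disappear afterward with the file still recoverable.
        "tolerated_losses": h - k,
    }


if __name__ == "__main__":
    for params in [(1, 1, 1), (1, 3, 3), (3, 10, 10)]:
        s = summarize(*params)
        print(
            params,
            "expansion=%.2f" % float(s["expansion"]),
            "min_servers=%d" % s["min_servers"],
            "tolerated_losses=%d" % s["tolerated_losses"],
        )
```

Under those assumptions, (1, 1, 1) is the only one of the three that
can succeed on a one-server grid, and it tolerates no server loss at
all, which is at least easy to explain to a new user.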

Thoughts?

Regards,

Zooko

http://tahoe-lafs.org/trac/tahoe-lafs/ticket/778# "shares of
happiness" is the wrong measure; "servers of happiness" is better


