[tahoe-dev] tahoe-dev Digest, Vol 46, Issue 20

Josh Wilcox wilcoxjg at gmail.com
Wed Jan 12 22:22:37 UTC 2011

> Message: 6
> Date: Wed, 12 Jan 2011 00:40:21 -0700
> From: "Zooko O'Whielacronx" <zooko at zooko.com>
> To: Tahoe-LAFS development <tahoe-dev at tahoe-lafs.org>
> Subject: [tahoe-dev] default values of K, H, N (was: I assumed each
>        share would go to a different server...)
> Message-ID:
>        <AANLkTikEiOMwbfHW=WufvjjPXxqquo9J5REBq=_gG8k9 at mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8
> Dear Josh, Shawn, Greg, et al.:
> While it warms my heart to see people teaching each other, it's not
> "scalable" for new users to be surprised when the behavior doesn't
> match their assumptions, then post on the mailing list and get an
> explanation about why the actual behavior differs from their
> assumptions.
> I think we should either change the default behavior to match the
> common user expectations, or else add documentation to, if possible,
> explain the surprising thing for them when they begin trying to use
> it.
> Note that another user, more than one year ago, reported the same
> confusion:
> http://tahoe-lafs.org/pipermail/tahoe-dev/2009-August/002494.html
> Which is why I created ticket #778 (servers of happiness).
> There are some reasons (mostly to do with performance and
> availability) why someone might want N > H, but the newbies seem to
> expect H == N. Perhaps we should set H == N in the defaults and then
> let more sophisticated users tune the (K, H, N) for their particular
> grid and their preferences?
> Speaking generally, I think there are at least three different
> desiderata that we could have for our default settings, and we
> probably can't have all of what we want:
> 1. (Safety) Users who entrust valuable data to it without changing the
> defaults won't lose integrity, confidentiality, or data-preservation.
> 2. (Unsurprisingness) Users will rarely be surprised by the default
> behavior.
> 3. (Performance and Features) Users will get good transfer speeds, the
> ability to migrate or rebalance files without having to re-encode
> them, better storage efficiency, higher fault-tolerance, etc.
> I would really like to prioritize them in this order.
> (Hm, in a sense Unsurprisingness is really the essence of Safety.
> Regardless of what the settings are, if the user understands the
> consequences of those settings then they won't be harmed.)
> I don't think default settings are a good way to accomplish
> desideratum 3 very well because the settings probably have to be tuned
> to the particular grid. François has a grid with three physical
> machines and 60- or 70- odd storage server processes. I have a grid (I
> just set it up!) with eight storage server processes on a single
> Amazon EC2 virtual machine. The volunteergrid1 has 17 physical servers
> of heterogeneous size and performance, each one operated by a different
> volunteer. In the future, more people might set up their own "personal
> Tahoe-LAFS grid" consisting of only a single storage server owned by
> them. There are no default settings that are optimal for all of these
> cases.
> Documentation is probably the best way to accomplish desideratum 3.
> (Our documentation is already better than most open source projects,
> but it also has lots of room for improvement. Volunteers
> needed!)
> So I would favor some default settings like (1, 1, 1) or (1, 3, 3) or
> (3, 10, 10), because those seem to score higher on Unsurprisingness in
> my book.
> Honestly at the moment I think I favor (1, 1, 1). It works on any grid
> (even "the 1-server grid", which I imagine might turn out to be a
> valuable use case), the safety qualities should be obvious to any
> user, and it arranges for users to learn about the confidentiality and
> integrity properties first, and then separately to learn about the
> consequences of erasure-coding.
> Thoughts?
> Regards,
> Zooko
> http://tahoe-lafs.org/trac/tahoe-lafs/ticket/778# "shares of
> happiness" is the wrong measure; "servers of happiness" is better

  For a default how about H = N?

   I notice that in:


  the user used a one-server grid only in an attempt to create an error,
which would have happened had H = N been true in his case.
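
  To make that concrete, here is a rough sketch of the check I have in mind.
It is not Tahoe-LAFS code: the function names are made up for illustration,
and it assumes shares are spread round-robin, so that "happiness" is simply
the number of distinct servers holding at least one share. The real upload
logic is more involved.

```python
def servers_of_happiness(num_servers, total_shares):
    """Distinct servers that end up holding at least one share.

    Assumption (not the real placement algorithm): shares are dealt out
    round-robin, so the count is min(num_servers, total_shares).
    """
    return min(num_servers, total_shares)


def upload_succeeds(num_servers, k, h, n):
    """Return True if an upload with settings (K, H, N) would be accepted.

    Hypothetical helper: the upload is accepted only when the shares land
    on at least H distinct servers. Assumes 1 <= K <= H <= N.
    """
    if not (1 <= k <= h <= n):
        raise ValueError("expected 1 <= K <= H <= N")
    if num_servers < 1:
        return False
    return servers_of_happiness(num_servers, n) >= h


if __name__ == "__main__":
    # Candidate defaults from Zooko's message, tried on grid sizes
    # mentioned in the thread (1, 8, and 17 storage servers).
    candidates = [(1, 1, 1), (1, 3, 3), (3, 10, 10)]
    for servers in (1, 8, 17):
        for k, h, n in candidates:
            ok = upload_succeeds(servers, k, h, n)
            print(f"{servers:2d} server(s), (K, H, N) = ({k}, {h}, {n}): "
                  f"{'upload ok' if ok else 'error'}")
```

  Under those assumptions, a one-server grid with H = N = 10 reports an
error instead of quietly putting every share on the same machine, while
(1, 1, 1) is accepted on any grid.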

  Isn't having H < N a setup that trades reliability for performance?

  If so, then this is trading desideratum (1) (or some component thereof)
for desideratum (3).

  Seems like H = N meets (1) and (2) at a possible cost to (3).

  Given the way you've ordered your values this choice seems good.
