[tahoe-dev] #467 static-server-selection UI (was: web "control panel", static server selection UI)
chris at noncombatant.org
Tue Jan 25 04:36:44 UTC 2011
Brian Warner writes:
> Our assumption is that server reliability (and the user's opinion of it)
> is a fuzzy concept that the Tahoe client node cannot figure out on its
> own. It depends upon all sorts of complicated human things like whether
> the server operator is a good admin or a lazy one, a friend of yours or a
> vendor/customer, whether they'll hold shares as a favor for you, or in
> exchange for money, or if they'll delete the shares at the first sign of
> the disk getting full. So I think we need a way for the user to explain
> what they want to the client node, delegating a lot of the
> reliability-prediction work to a human.
As usual, I must disagree. :)
Remember, our aim is to achieve provider-independent security, where
reliability is a security property. The problem is not that reliability is
too fuzzily-defined, but that it is hyper-precisely defined. Why should our
reliability depend on the nature or behavior of the server operator or on
our relationship to her or on our bandwidth/latency to her? We seek to be
independent of that. Additionally, the problem most often exists between
keyboard and chair (especially in my case!), and we're really confusing the
hell out of the poor problem agent.
Always start with the user story, preferably including UI mockups, and nail
down the technical bits last.
Example: What if the user moved a slider on a continuum from Low Cost to
High Reliability, and the storage client redefined its erasure coding/share
creation/share storage policies to suit? Maximally Low Cost is k = m (like
RAID 0), maximally High Reliability is k = 1. Imagine further that m is a
function of the number of servers available on average, and that it too can
change. A moderately (but not maximally) High Reliability selection might
result in k = 3, m = 10, with all the shares reproduced on "virtual grids"
of size m, any of which can later be accepted as servers of shares.
(Think of RAID striped mirrors and mirrored RAID 5s and so on.)
Now, that's pretty hard to imagine implementing, I admit. For example,
changing k and m has been a pain point for people, and tuning them
dynamically just sounds hard.
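To make the slider idea concrete, here is a minimal sketch of only the easy
part: turning a slider position into a (k, m) pair. The function name, the
0.0 to 1.0 slider range, the linear interpolation, and setting m equal to the
current server count are all assumptions for illustration. It does not touch
the hard part (re-encoding existing shares when k or m changes) or the
"virtual grids" idea.

```python
def slider_to_encoding(slider, servers_available):
    """Map a slider position to a hypothetical (k, m) erasure-coding pair.

    slider = 0.0 is maximally Low Cost (k == m, like RAID 0).
    slider = 1.0 is maximally High Reliability (k == 1).
    """
    if not 0.0 <= slider <= 1.0:
        raise ValueError("slider must be between 0.0 and 1.0")
    if servers_available < 1:
        raise ValueError("need at least one server")
    # Assumption: m simply tracks the number of servers currently available.
    m = servers_available
    # Assumption: k falls linearly from m down to 1 as the slider moves right.
    k = m - round(slider * (m - 1))
    return k, m


def expansion_factor(k, m):
    """Storage cost relative to the plaintext size: m shares of size 1/k each."""
    return m / k
```

With 10 servers, the two ends of the slider give (10, 10) and (1, 10), and a
position of 7/9 gives the k = 3, m = 10 example above.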
My plan for Octavia is/was along these lines, but design and implementation
are made simpler by the lack of erasure coding. Octavia always mirrors every
file segment to as many servers as it can. This is dumb, high cost, and high
reliability, and it is ok because disk space is the cheapest computing
resource we have. Wasting money here allows us to save on UI and
implementation complexity, and also guarantees that the parallel-reads
optimization is always available (because read time for a network filesystem
is always a pain point). In Octavia, the Cost/Reliability slider merely
means "How many servers should I copy files to: 1, 5, or Overkill?"
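A rough sketch of what that three-position slider could reduce to. The names
here (OVERKILL, target_copies, storage_cost) are hypothetical and not taken
from Octavia itself; the sketch assumes "Overkill" means every server the
client currently knows about.

```python
OVERKILL = "overkill"  # hypothetical label for "every server we know of"


def target_copies(choice, known_servers):
    """Return how many servers each file segment should be mirrored to."""
    if choice == OVERKILL:
        return known_servers
    if choice not in (1, 5):
        raise ValueError("choice must be 1, 5, or OVERKILL")
    return choice


def storage_cost(segment_bytes, copies):
    """Plain mirroring stores the whole segment once per copy."""
    return segment_bytes * copies
```

The cost side of the trade-off is then just multiplication: a 1000-byte
segment mirrored to 5 servers occupies 5000 bytes in total.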
Then, rather than letting slow servers block uploads ("not happy yet!",
"slow server stopping us from reaching k!", etc.), we can merely provide a
status indicator (red, yellow, green) telling the user how close we are to
reaching their desired point on the Cost/Reliability slider.
Writes are cached locally (hence seemingly "fast"); each client is at least
a caching server to itself and possibly a server to friends in the grid.
This gets us status = Red.
Asynchronously, the client tries to copy segments to other servers,
gradually reaching status = Green. If a server is offline, the client just
keeps retrying (like an email server) until the server comes back. A slow
server is annoying (or completely ignorable, if enough faster servers are
available), rather than fatal.
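The Red-to-Green progression can be sketched as a retry queue. This is a
simplified, synchronous stand-in for the asynchronous behavior described
above, under several assumptions: the local cache does not count toward the
target, Yellow means "some remote copies, but fewer than the target", and
send is a hypothetical callable that returns True on success and False when
a server is offline or too slow. A real client would also wait between
retries rather than loop immediately.

```python
from collections import deque


def status(remote_copies, target):
    """Traffic-light status for one segment (thresholds are assumptions)."""
    if remote_copies >= target:
        return "green"
    if remote_copies > 0:
        return "yellow"
    return "red"  # only the local cached copy exists


def replicate(segment, servers, target, send, max_attempts=100):
    """Push `segment` to servers until `target` remote copies exist.

    Returns the list of statuses observed, starting with the status
    right after the local write.
    """
    pending = deque(servers)
    copies = 0
    history = [status(copies, target)]
    attempts = 0
    while pending and copies < target and attempts < max_attempts:
        server = pending.popleft()
        if send(server, segment):
            copies += 1
        else:
            # Offline or slow: requeue and try again later, like a mail queue.
            pending.append(server)
        history.append(status(copies, target))
        attempts += 1
    return history
```

A failing server delays Green but never blocks the write itself: the
segment is already stored locally before replicate is called.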
Now, we have a situation that normal users, expert users, and implementers
can all understand. All the words have their colloquial meanings: status,
green, yellow, red, 1, 5, overkill, cost, reliability. Advanced users can
look up "asynchronous" in the dictionary and think, "Ohh, that's why it's
always Red for a moment after I save something. Cool. One time it stayed Red
for like 10 minutes and I was like whoa! Then it turned Yellow and I was
Yes, we've pessimized storage efficiency and made fully Green writes slow
(because many copies), so the design is "dumb" or "wasteful" or
"embarrassing". But we've improved understandability (for all classes of
person), read/write performance, and provider-independent reliability.