[tahoe-dev] mutable file and directory safety in allmydata.org "tahoe" 0.9
zooko
zooko at zooko.com
Wed Mar 12 15:48:11 UTC 2008
Folks:
We need to release v0.9 before taking the time to either implement
Tahoe Lock Files (and Brian doesn't like the idea), or Recovery From
Colliding Writes (and I don't like the idea of relying on recovery as
a means of handling multi-writer).
Therefore, what is left to do for v0.9 is:
1. Fix the ?t=uri and ?t=set_children methods and the DELETE method
to check the directory's version number, do delta application of the
requested change, upload the new version of the directory, catch
UncoordinatedWriteError, re-download, re-apply the requested change,
upload, etc. until the upload succeeds. (To understand more about
this idiom, read on.)
2. Document the limitations on safe usage of v0.9.
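For illustration, the retry idiom in item 1 might look roughly like the
following Python sketch. It is a toy in-memory model, not Tahoe's code:
ToyDirectoryStore, modify_with_retry, and the max_attempts cap are
hypothetical names introduced here, and only UncoordinatedWriteError is
taken from the text above.

```python
class UncoordinatedWriteError(Exception):
    """Raised when the stored version changed since the caller last read it."""


class ToyDirectoryStore:
    """In-memory stand-in for one mutable directory (hypothetical, not Tahoe's API)."""

    def __init__(self):
        self.version = 0
        self.children = {}

    def download(self):
        # Return the current version number and a private copy of the children.
        return self.version, dict(self.children)

    def upload(self, based_on_version, new_children):
        # Refuse the write if a newer version was published in the meantime.
        if based_on_version != self.version:
            raise UncoordinatedWriteError(
                "expected version %d, found %d" % (based_on_version, self.version))
        self.children = dict(new_children)
        self.version += 1


def modify_with_retry(store, apply_delta, max_attempts=10):
    """Download, apply the requested change, upload; start over on collision.

    max_attempts is an assumption of this sketch; the text above says to
    repeat until the upload succeeds.
    """
    for _ in range(max_attempts):
        version, children = store.download()
        apply_delta(children)  # re-apply the same requested change each time
        try:
            store.upload(version, children)
            return store.version
        except UncoordinatedWriteError:
            continue  # someone else wrote first: re-download and try again
    raise UncoordinatedWriteError("gave up after %d attempts" % max_attempts)
```

A caller would pass the requested change as a function, for example
modify_with_retry(store, lambda children: children.pop("old", None)) for
a DELETE. Note that the version check in this toy is a single atomic
step, which a set of independent storage servers may not provide.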
The limitations on safe usage of v0.9 are summarized as follows:
a. Immutable files are really, really safe. They are as safe as we
could make them. If at least three out of the ten servers that the
client used when it uploaded the file are still in working order, then
you can get the file back.
b. Mutable files and directories are as safe as we could make them
-- just like immutable files -- except for the following two caveats:
i. Mutable files do not have the property of "write but don't
overwrite". Whenever you write a mutable file, you are overwriting
whatever is currently in that file, and there is no way for you to
know what was there. Therefore, for example, if you download a
mutable file, append some new data to the end of it, and write your
new version, there is no way for you to know whether someone else
uploaded a different change after you downloaded the old version and
before you uploaded your new version. If they did, then your upload
of a new version will silently wipe out their change.
ii. Mutable files have a ...
Argh. Folks: I just went to implement "robust application of
set_children", as per #1 above, and discovered *two* previously
unknown ways that multiple uncoordinated writes to a directory can
cause silent data loss. I see a way to prevent one of these newly
discovered problems by extending the mutable file write protocol --
it would probably take a few days because Brian and I have previously
discussed it and more or less know what to do. I see no way to
prevent the other newly discovered problem without a multiple-week
development effort.
Okay, let's back up a step. Basically what is going on here is that
we designed the Tahoe mutable file scheme with the explicit
requirement that the user never tries to do multiple writes at the
same time. Subsequently, allmydata.com started using it with
multiple writes (there was a Documentation Failure), and in our
testing nothing bad happened, and so we started saying "Oh, maybe
writes are sufficiently robust that it is okay to use them like this."
However, this was a mistake.
We already know of ways that using Tahoe that way can silently lose
data, depending on how many writers there are and how often they
write, and depending on when the network connections or the clients
or the servers crash.
Worse, we don't know whether there are other, more likely, ways that
using Tahoe this way can silently lose data. Nobody has gone back
over the whole design to understand what the implications of this
usage are, and when I just tried to implement a simple extension to
the wapi to allow a more convenient way to add children to a
directory, I found *two* more.
However, there is an easy solution. One sufficiently easy that we
can implement it today and release v0.9 on schedule. That solution
is to leave the Tahoe API with the same requirement that it has
always had since mutable files were first designed:
== The Prime Coordination Directive: "Don't Do That" ==
http://allmydata.org/trac/tahoe/browser/docs/mutable.txt?rev=2253#L48
and for users (i.e. allmydata.com) to make sure that they don't do
that. I will talk to Mike Booker to be sure, but I'm pretty sure
that allmydata.com can easily enough avoid uncoordinated writes in
the Allmydata 3.0 product, simply by having few or no shared-
writeable directories, or by creating a simple centralized lock
server when necessary.
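As a rough picture of what "a simple centralized lock server" could
mean, here is a minimal Python sketch of the lock table such a server
might keep. Everything in it (ToyLockServer, acquire, release, the
directory and client identifiers) is hypothetical and is not part of
Tahoe or of any allmydata.com product; it only shows the idea of
allowing one writer per directory at a time.

```python
import threading


class ToyLockServer:
    """Grants at most one write lock per directory at a time (hypothetical)."""

    def __init__(self):
        self._guard = threading.Lock()  # protects the table below
        self._holders = {}              # directory id -> client id holding the lock

    def acquire(self, directory_id, client_id):
        # Return True if the lock was granted, False if someone else holds it.
        with self._guard:
            if directory_id in self._holders:
                return False
            self._holders[directory_id] = client_id
            return True

    def release(self, directory_id, client_id):
        # Only the current holder may release; return True on success.
        with self._guard:
            if self._holders.get(directory_id) != client_id:
                return False
            del self._holders[directory_id]
            return True
```

A client would call acquire() before writing a shared directory and
release() afterwards. This sketch has no lease or timeout, so a client
that crashes while holding a lock would block other writers until
someone clears the entry by hand.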
Regards,
Zooko
P.S. Someday someone might search history for instances of the term
"LAUGFS", which stands for "Least AUthority Grid File System".
Hello, there, searcher from the future!