[tahoe-dev] mutable file and directory safety in allmydata.org "tahoe" 0.9

zooko zooko at zooko.com
Wed Mar 12 17:15:56 UTC 2008


The following is very brief, because I want to get on the phone with  
Mike Booker and learn how the Windows client actually works.

I feel a strong sense of urgency about the schedule of allmydata.org
"Tahoe" 0.9.0 and of allmydata.com 3.0.

I'm replying on tahoe-dev instead of over a private, faster voice
channel only in order to preserve the public record.

On Mar 12, 2008, at 10:45 AM, Brian Warner wrote:

> I propose to fix this (today) in the following way:

The fix you suggest was indeed the one that I started on when I
discovered the two new ways to lose data (described below).

Except for:

> I recommend that the dirnode delta operations (add_children() and friends)
> *not* attempt to perform retry at this time.  We can make writes safe from
> this blind overwrite bug by implementing update(), but continue to treat UCW
> as a user error and not feel an urgent need to protect the user from it. I
> believe that UCW will be rare enough for the next month that we don't need to
> go out of our way to hide them.

I don't understand -- what would the Windows user interface do if it  
got a UCWError exception?

>> Argh.  Folks: I just went to implement "robust application of
>> set_children", as per #1 above, and discovered *two* previously
>> unknown ways that multiple uncoordinated writes to a directory can
>> cause silent data loss.
> Could you describe these two new problems?

Okay, but just to be clear, I am *not* saying "We should not ship
uncoordinated multiple writers in Allmydata 3.0 because of these two
problems."  I am saying "We should not ship uncoordinated multiple
writers in Allmydata 3.0 because there are an unknown number of
problems, as demonstrated by the fact that I just found two without
even trying."

So, problem 1 with the "update" method that you described (in the
letter to which this is a reply) is this: you read directory version N
and write back directory version N+1, but someone else has also
(previously -- not even at the same time as you!) written a directory
version N+1.  If the "root hash" of your N+1 happens to sort
lexicographically higher than the root hash of their N+1, then you
will not get any indication that their version N+1 ever existed --
instead your version N+1 will silently overwrite theirs.

Problem 1 is the one that we could fix with a couple of days of work  
(including redoing some of the manual testing that Peter and others  
have been doing for the last week).  You and I have previously  
discussed how we could use < instead of <= in the testv-and-setv in  
order to detect collisions of this kind.  I'm not sure what other  
changes we would need to make to the mutable file write protocol.
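To make problem 1 concrete, here is a toy model of a test-and-set write
on a single server.  It assumes that each share's checkstring is just
the pair (seqnum, root hash) compared lexicographically, which is a
simplification and not Tahoe's real wire format; `ToyServer`,
`permissive_test` and `strict_test` are hypothetical names.  The
discussion above does not spell out exactly what the `<` compares, so
the strict variant assumes it applies to the sequence number alone.

```python
# Toy model of a test-and-set write to a single storage server.
# ASSUMPTION: a share's "checkstring" is just (seqnum, root_hash), compared
# as a Python tuple (i.e. lexicographically). Not Tahoe's real wire format.
from typing import Callable, Tuple

Checkstring = Tuple[int, bytes]
Test = Callable[[Checkstring, Checkstring], bool]


class ToyServer:
    """Holds one share's checkstring; the name is hypothetical."""

    def __init__(self, initial: Checkstring) -> None:
        self.current = initial

    def test_and_set(self, new: Checkstring, test: Test) -> bool:
        """Replace the share with `new` only if test(current, new) holds."""
        if test(self.current, new):
            self.current = new
            return True
        return False


def permissive_test(existing: Checkstring, new: Checkstring) -> bool:
    # "<=" on the whole tuple: an existing N+1 is overwritten whenever the
    # new root hash sorts at or above the existing one.
    return existing <= new


def strict_test(existing: Checkstring, new: Checkstring) -> bool:
    # ASSUMED stricter variant: the sequence number must strictly increase,
    # so an existing N+1 blocks any other N+1, whatever its root hash.
    return existing[0] < new[0]


theirs = (6, b"\x10" * 32)  # their version N+1, written earlier
mine = (6, b"\xf0" * 32)    # our version N+1; its hash happens to sort higher

server = ToyServer(theirs)
print(server.test_and_set(mine, permissive_test))  # True: theirs is silently lost

server = ToyServer(theirs)
print(server.test_and_set(mine, strict_test))      # False: collision is reported
```

Under this assumption the strict test would also refuse a retry of the
very same write, so a real protocol would need to accept an identical
checkstring as well; that refinement is left out here.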

Problem 2 is that when you do the read-back after detecting a UCW, if
the first 3 servers that you talk to all have your version N+1, then
you will treat that as the "current" version N+1, generate a version
N+2, and then upload your N+2, which was made without knowledge of the
other person's N+1.  Problem 2 is the one that I think would take a
few weeks to do right.  Ideally, your version N+2 should probably
come with a set of root hashes of predecessors, and the test-and-set
should say "This is the set of versions that I have already seen and
am intending to supersede -- if your current version is one of them
then please overwrite it with my new version.  If your current
version is not one of them then please do not overwrite it, and
return an error."

You and I tried to solve this one before and did not yet come up with  
a satisfactory solution.
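The predecessor-set test described for problem 2 can be sketched
roughly as follows.  This is a simplified, hypothetical model
(`ToyShareSlot`, `supersede` and `UncoordinatedWriteError` are made-up
names, not Tahoe APIs): a version is identified by its root hash, and
the writer sends the set of root hashes it has already seen.  The demo
replays problem 2: the writer saw only version N and its own N+1, so a
server still holding the other person's N+1 refuses the N+2.

```python
# Toy model of a test-and-set that carries the writer's set of seen versions.
# ASSUMPTIONS: a version is identified by its root hash; all names are
# hypothetical illustrations, not Tahoe APIs.
from dataclasses import dataclass
from typing import FrozenSet


class UncoordinatedWriteError(Exception):
    """The server holds a version that the writer never saw."""


@dataclass
class ToyShareSlot:
    seqnum: int       # carried along; not consulted by the test below
    root_hash: bytes

    def supersede(self, new_seqnum: int, new_root_hash: bytes,
                  seen: FrozenSet[bytes]) -> None:
        # Overwrite only if the current version is one the writer has seen;
        # otherwise leave the share untouched and report an error.
        if self.root_hash not in seen:
            raise UncoordinatedWriteError(
                f"current version {self.root_hash.hex()[:8]} was never seen")
        self.seqnum = new_seqnum
        self.root_hash = new_root_hash


# Problem 2 replayed: we read version N (hash A) and wrote our N+1 (hash M);
# someone else's N+1 (hash T) still sits on another server.
v_n, mine, theirs, mine_next = b"A" * 32, b"M" * 32, b"T" * 32, b"Q" * 32
seen = frozenset({v_n, mine})

server1 = ToyShareSlot(seqnum=6, root_hash=mine)
server1.supersede(7, mine_next, seen)       # accepted: M was seen

server2 = ToyShareSlot(seqnum=6, root_hash=theirs)
try:
    server2.supersede(7, mine_next, seen)   # refused: T was never seen
except UncoordinatedWriteError as err:
    print("refused:", err)
```

This sidesteps questions a real design would have to answer, such as
how the seen-set is kept bounded and how a writer should recover after
a refusal.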

> For the benefit of the non-allmydata folks: we haven't yet implemented
> directory sharing in the .com product (and when we do, we're planning to
> use directed pairwise one-reader-one-writer directories, which doesn't
> suffer from this concern because it doesn't give a write-cap to the
> recipient). So the main concern right now is a user who has an automated
> backup process writing a lot of data into a directory, at the same time
> that this user is using a web browser (on a different tahoe node) to
> modify those backup directories. As long as we continue in this approach
> (i.e. *not* taking advantage of tahoe's easily-shareable directory
> capabilities), then a per-account lock (respected by both the FUSE plugin
> and all web frontends) will be sufficient to completely avoid UCWs.
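A per-account lock of the kind described here might look roughly like
the following.  It is only a sketch under a loud assumption: since the
FUSE plugin and the web frontends run on different tahoe nodes, the
real lock would have to live in some shared service, and the single
in-process `AccountLockRegistry` (a hypothetical name) merely stands in
for it.

```python
# Toy per-account write lock shared by every frontend.
# ASSUMPTION: in a real deployment this registry would be a service reachable
# from all tahoe nodes; here one in-process object stands in for it.
# All names are hypothetical.
import threading
from contextlib import contextmanager
from typing import Dict, Iterator


class AccountLockRegistry:
    def __init__(self) -> None:
        self._guard = threading.Lock()
        self._locks: Dict[str, threading.Lock] = {}

    def _lock_for(self, account_id: str) -> threading.Lock:
        # Look up (or create) the account's lock while holding the guard,
        # so two frontends never end up with different locks for one account.
        with self._guard:
            return self._locks.setdefault(account_id, threading.Lock())

    @contextmanager
    def writing(self, account_id: str, timeout: float = 5.0) -> Iterator[None]:
        # Hold the account's lock for the whole read-modify-write cycle.
        lock = self._lock_for(account_id)
        if not lock.acquire(timeout=timeout):
            raise TimeoutError(f"account {account_id!r} is busy")
        try:
            yield
        finally:
            lock.release()


registry = AccountLockRegistry()
with registry.writing("alice"):               # e.g. the backup process
    try:
        with registry.writing("alice", timeout=0.1):  # e.g. a web frontend
            pass
    except TimeoutError as err:
        print(err)                            # refused while the backup writes
```

Because every writer for an account goes through the same lock, at most
one of them is modifying that account's directories at any moment, which
is the property the quoted paragraph relies on.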

Good summary.  This is what I want to work out in detail with Mike
Booker and you on the phone now.




> Or maybe even "laughfs" :).


I suspect that "laughfs" might be pronounced "laugh eff ess", but  
that "laugfs" will always be pronounced "laughs" (unless pronounced  
"l ow gufs").  ;-)
