[tahoe-dev] Tahoe Lock Files

zooko zooko at zooko.com
Tue Mar 4 00:20:47 UTC 2008


Folks:

We had a huge discussion on IRC and voice channels today about  
UncoordinatedWriteError.

When we designed Tahoe's mutable files, we explicitly decided to punt  
on the fearsome coordination/consistency problem in favor of offering  
excellent availability.  (Later, we were somewhat reassured when we  
saw the Amazon Dynamo paper which advocated something similar.)

We wrote "The Prime Directive of Uncoordinated Writes: Don't Do  
That!" [1], instructing programmers who use the Tahoe API to figure  
out some way to make sure that they never have two separate,  
uncoordinated processes trying to write to the same mutable file or  
directory.

But of course, Tahoe is gaining more use cases -- namely the MacFUSE  
layer thanks to Rob Kinninmont and the Windows SMB layer thanks to  
Mike Booker -- and we need a better solution than telling the upper  
layers to deal with it.

We sketched out three possible solutions on IRC and phone today:

1.  Make the handling of colliding writes more robust (we need to do
that anyway -- ticket #272), and rely on that for write coordination,
but don't rely on it "too hard" -- warn the application programmer
that it should not be used at high frequency (more than, say, once
every 30 seconds), or with many uncoordinated writers (more than,
say, 3).

2.  Implement lock servers -- give out a furl to a server which you
can talk to in order to acquire a lock on a mutable file.  Include
such furls in dirnodes, or otherwise make sure they are available
when needed.  Decide what to do when you can't reach that server.

3.  Use Tahoe storage servers as lock servers.  On the plus side, you  
know that enough of them are available (if you can write files at  
all), and using a bunch of them in a decentralized algorithm can help  
solve the availability problem with lock servers.  On the downside,  
this sounds complicated.  How would it work exactly?


Here is an idea that does #3 -- use Tahoe storage servers as lock
servers -- but in a nice simple way by re-using Tahoe mutable files
as black boxes.  I fleshed it out in my own mind after getting off
the phone with Brian and Mike just now.  I like this one!  I propose
to implement
this, or something like it, and make the Tahoe client use it so that  
the coder working at the next layer up (MacFUSE or Windows SMB) can  
simply rely on magic coordination at the cost of an occasional delay  
if his client has to wait for other clients to finish.


TAHOE LOCK FILES

When you give someone a write-cap to a mutable file-or-directory, M1,  
which you yourself are also intending to write into in the future,  
you also give them a write-cap to a mutable Tahoe lockfile, L1.

Thereafter, whenever you want to write to M1, you first read L1 to  
see if it is currently locked.  If L1 is empty (zero length), then M1  
is currently unlocked.

To lock M1, you pick a random 32-byte string and write that string  
into L1.  If you get an UncoordinatedWriteError, then you read L1 to  
see if your string was the winner of the write collision, and if not  
then do an exponential back-off and then re-read L1 to see if it is  
locked.  If you don't get an UncoordinatedWriteError, or if you do  
but then it turns out that your lock string was the winner, then that  
means (modulo certain assumptions about the Tahoe storage servers  
that will be more carefully documented later) that you are the only  
one who has a write lock on M1.  Go ahead and write your new M1, and  
then write the empty string into L1 to unlock it.  You are not  
allowed to hold a lock for more than 300 seconds, but fortunately it  
almost never takes more than 300 seconds to write a mutable file.   
(If it looks like it is going to take more than 300 seconds to write
M1, then you need to acquire a new lock by writing a newly generated
32-byte lock string into L1.)
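
To make the locking steps above concrete, here is a minimal Python
sketch.  It is only an illustration under assumptions:
InMemoryMutableFile is a hypothetical stand-in for a Tahoe mutable
file (not the real Tahoe API), UncoordinatedWriteError is defined
locally rather than imported from Tahoe, and the retry count and
back-off base are made-up defaults.  It does not cover breaking a
stale lock or re-acquiring a lock for a write longer than 300 seconds.

```python
import os
import time


class UncoordinatedWriteError(Exception):
    """Local stand-in for Tahoe's error of the same name."""


class InMemoryMutableFile:
    """Hypothetical stand-in for a Tahoe mutable file (not the real API)."""

    def __init__(self):
        self.contents = b""

    def read(self):
        return self.contents

    def write(self, data):
        self.contents = data


def try_lock(lockfile):
    """Try once to lock. Return our lock string on success, else None."""
    if lockfile.read() != b"":
        return None                  # non-empty L1: someone holds the lock
    token = os.urandom(32)           # random 32-byte lock string
    try:
        lockfile.write(token)
    except UncoordinatedWriteError:
        # Collision: re-read L1 to see whether our string was the winner.
        if lockfile.read() != token:
            return None
    return token


def acquire(lockfile, max_attempts=8, base_delay=0.5, sleep=time.sleep):
    """Retry try_lock with exponential back-off. Return token or None."""
    for attempt in range(max_attempts):
        token = try_lock(lockfile)
        if token is not None:
            return token
        sleep(base_delay * (2 ** attempt))   # delay doubles each attempt
    return None


def release(lockfile):
    """Unlock by writing the empty string into L1."""
    lockfile.write(b"")
```

A caller would do something like: token = acquire(L1); if token is
not None, write the new M1 and then call release(L1).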

If you read L1 and find it locked, you remember the random lock  
string that was in it and set a 300 second timer, and then re-read  
L1.  If it still has the same random string in it, then this means
that the client who locked it has failed, and you are allowed to
break the lock by overwriting it with your own random lock string.
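
The lock-breaking rule can be sketched in Python as follows.  Again
this is an illustration, not Tahoe code: read_lock and write_lock are
hypothetical callables standing in for reading and writing L1, and
the sketch leaves out handling of an UncoordinatedWriteError on the
final write (which would be handled the same way as when locking an
unlocked file).

```python
import os
import time

LOCK_TIMEOUT = 300  # seconds; the maximum time a lock may be held


def wait_or_break(read_lock, write_lock, timeout=LOCK_TIMEOUT,
                  sleep=time.sleep):
    """Return our new lock string if we broke a stale lock, else None.

    read_lock and write_lock are hypothetical stand-ins for reading
    and writing the lock file L1.
    """
    seen = read_lock()
    if seen == b"":
        return None              # not locked; use the normal locking path
    sleep(timeout)               # give the holder its full 300 seconds
    if read_lock() != seen:
        return None              # holder unlocked, or someone else re-locked
    token = os.urandom(32)       # same string after the timer: holder failed
    write_lock(token)            # break the lock with our own lock string
    return token
```

Comparing against the remembered string, rather than just checking
that L1 is non-empty, is what distinguishes a dead lock holder from a
different client that has legitimately locked M1 in the meantime.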

That's it.


There are interesting robustness details that I know of, and no doubt  
Rob or Brian can come up with more interesting robustness details,  
but as a solution to the write-coordination problem, Tahoe Lock Files  
are simple and modular enough that it just might work.


Regards,

Zooko

[1] http://allmydata.org/trac/tahoe/browser/docs/mutable.txt?rev=2145#L415

tickets mentioned in this e-mail:

http://allmydata.org/trac/tahoe/ticket/272


