[tahoe-dev] safety and Tahoe Lock Files
warner-tahoe at allmydata.com
Tue Mar 4 03:43:32 UTC 2008
Interesting! I'll think more deeply about this later; for now I have just a
couple of smaller practical issues.
> > When you give someone a write-cap to a mutable file-or-directory, M1,
> > which you yourself are also intending to write into in the future,
> > you also give them a write-cap to a mutable Tahoe lockfile, L1.
Since we don't know whether we're going to eventually share a mutable file
write-cap or not, really this means we need to double the length of the
write-caps, so that they always contain both M1 and L1.
> > Thereafter, whenever you want to write to M1, you first read L1 to
> > see if it is currently locked. If L1 is empty (zero length), then M1
> > is currently unlocked.
> > To lock M1, you pick a random 32-byte string and write that string
> > into L1.
This will slow down the common case, since we always have to write to the
lock file, even if contention is unlikely. I'm not sure how much of a
slowdown it represents, but my hunch is between 3x and 4x. The non-locking
scheme costs one write plus (if there's a collision) the recovery time. This
locking approach costs one read, three writes, plus (if there's a collision)
recovery time plus an extra read. Writes to tiny mutable files (over DSL to
our automated perfnet) currently take about 324ms, and reads are 71ms, so
this would increase a dirnode update (which has an extra read) from 395ms to
1.1s, nearly 3x.
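A quick back-of-the-envelope check of those numbers, as a sketch only. The
function and variable names are made up for illustration, the operation
counts are the ones given above, and collision recovery is ignored:

```python
# Measured latencies quoted above (tiny mutable files, over DSL).
WRITE_MS = 324
READ_MS = 71

def update_cost_ms(reads, writes):
    """Total latency of a collision-free update, in milliseconds."""
    return reads * READ_MS + writes * WRITE_MS

# Dirnode update without locking: the dirnode's own read plus one write.
plain_ms = update_cost_ms(reads=1, writes=1)

# Dirnode update with the lockfile: the dirnode's own read, plus the
# one read and three writes that the locking approach costs.
locked_ms = update_cost_ms(reads=2, writes=3)

slowdown = locked_ms / plain_ms

print(f"no lock:   {plain_ms} ms")
print(f"with lock: {locked_ms} ms")
print(f"slowdown:  {slowdown:.1f}x")
```

Since writes dominate, the result is mostly driven by the two extra writes
to L1.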
We already need to batch dirnode updates together because of their latency.
It would be great if we could find a way to avoid making them slower.
However, it may be unavoidable.
I'll think more about this tonight. The part that's bothering me right now is
that mutable files don't themselves provide atomic updates: that's part of
the joy of a distributed system :). So I'm not certain that seeing your own
lock value in L1 is a particularly reliable way to know that nobody else
grabbed the lock. After all, what if you just happened to contact a different
set of servers than the contender did?
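To make that worry concrete, here is a toy model, not Tahoe's real share
placement or API. Every name in it (try_lock, the list of per-server lock
slots, the index sets) is hypothetical. Each "server" holds one copy of
L1's contents, and a writer only ever talks to the servers it happens to
reach:

```python
import secrets

def try_lock(servers, reachable):
    """Attempt the quoted protocol against only the reachable servers.

    Returns the 32-byte lock token if this writer believes it holds the
    lock, or None if it saw someone else's lock.
    """
    # Step 1: read L1; any non-empty copy means "locked".
    if any(servers[i] for i in reachable):
        return None
    # Step 2: write a random 32-byte string into L1.
    token = secrets.token_bytes(32)
    for i in reachable:
        servers[i] = token
    # Step 3: read back and check that we see only our own value.
    seen = {servers[i] for i in reachable}
    return token if seen == {token} else None

# Six servers, each with an empty (unlocked) copy of L1.
servers = [b""] * 6

# Two writers that reach disjoint sets of servers.
alice = try_lock(servers, [0, 1, 2])
bob = try_lock(servers, [3, 4, 5])
both_hold = alice is not None and bob is not None

# A third writer whose set overlaps the others does notice the lock.
carol = try_lock(servers, [2, 3])

print("alice and bob both think they hold the lock:", both_hold)
print("carol was blocked:", carol is None)
```

In this model both disjoint writers read back their own value and proceed,
which is exactly the case where seeing your own lock value proves nothing.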
But, I like the idea of using a sacrificial piece of data (the L1 contents)
to avoid risking your valuable data (the M1 contents).