[tahoe-dev] a few thoughts about the future of leasedb

Zooko Wilcox-O'Hearn zooko at zooko.com
Wed Nov 21 16:36:42 UTC 2012


Folks:

I just reviewed (again) the design document for leasedb:

https://github.com/davidsarah/tahoe-lafs/blob/1818-leasedb/docs/proposed/leasedb.rst

Leasedb is implemented and passes unit tests (#1818), and we're
currently working on merging it with cloud-backend (#1819).

This is exciting, because it is the next step in LeastAuthority.com's
project that we're doing for DARPA — Redundant Array of Independent
Clouds.


Here are a few comments on the leasedb design — not issues which could
block the acceptance of this patch, but just topics for future
reference:

 • https://github.com/davidsarah/tahoe-lafs/blob/1818-leasedb/docs/proposed/leasedb.rst#design-constraints

   "Writing to the persistent store objects is in general not an
atomic operation. So the leasedb also keeps track of which shares are
in an inconsistent state because they have been partly written. (This
may change in future when we implement a protocol to improve atomicity
of updates to mutable shares.)"

   I'm not 100% sure, but I *think* that this use of leasedb could be
replaced in the future by the end-to-end 2-phase-commit that I
recently posted about (#1755). End-to-end 2-phase-commit requires more
complex service from the storage server than the current one-shot
updates to mutable files do, but it requires less state to be stored
in the leasedb, since the equivalent state would instead be held in the storage
backend plus the LAFS client. In E2E 2PC, the storage backend has to
be able to receive and store updates to a mutable file (including the
initial upload of a large mutable file, which is the same as a large
update to an initially empty mutable file), while retaining the option
of rolling back to the previous version. This means the storage server
has to write these updates into the storage backend in some
non-destructive way and then have a relatively efficient way to
"switch over" from the old to the new version.

   If the storage server is able to do that, then it might be nice if
it can do it without relying on state held in the leasedb, because
then loss or corruption of the leasedb won't result in the corruption
of any files.

   This isn't a big deal — the state related to this that is currently
kept in the leasedb isn't expensive for the storage server to
maintain, and loss or corruption of leasedbs will hopefully be rare.


 • https://github.com/davidsarah/tahoe-lafs/blob/1818-leasedb/docs/proposed/leasedb.rst#accounting-crawler

   "A 'crawler' is a long-running process that visits share container
files at a slow rate, so as not to overload the server by trying to
visit all share container files one after another immediately."

   Since I opened the following group of tickets, I've become happier
with the idea of removing almost all uses of the "crawler", leaving it
only one remaining job: generating the initial leasedb, or
reconstructing the leasedb if it has been lost or corrupted. I'm
waiting for Brian to notice these tickets and weigh in: #1833, #1834,
#1835, #1836.

   This would change the state machine in the leasedb design by changing two triggers:

   - STATE_STABLE → NONE; trigger: The accounting crawler noticed that
all the store objects for this share are gone. implementation: Remove
the entry in the leasedb.

     This edge would still be here, but the trigger would be
different. There would be no crawler noticing such things; instead,
this edge would be triggered when a client requests a share, the
storage server looks in the leasedb and sees that the share is listed
as present, but then, when it tries to read the share data, finds that
all of the share data is gone.

   - NONE → STATE_STABLE; trigger: The accounting crawler discovers a
complete share. implementation: Add an entry to the leasedb with
STATE_STABLE.

     Likewise, this edge would still be here, but the trigger would be
different. There would be no crawler noticing such things; instead,
this edge would be manually triggered by the server operator using an
"import" tool (probably option 4 from #1835). Both changed triggers
are sketched below.


 • https://github.com/davidsarah/tahoe-lafs/blob/1818-leasedb/docs/proposed/leasedb.rst#unresolved-design-issues

   "What happens if a write to store objects for a new share fails permanently?"

   I don't understand. If an attempt to write fails, how can you
distinguish between a temporary and permanent failure?

   "What happens if only some store objects for a share disappear unexpectedly?"

   Log it, remove the share entry from the leasedb, and leave what's
left of the share data alone? Because perhaps operators or developers
want to investigate the exact shape of the lossage/corruption.

   "Does the leasedb need to track corrupted shares?"

   This is the same question as the previous one — a corrupted share
is the same as a share with some of its objects missing. In the design
I'm currently envisioning — where we no longer have a crawler
discovering share-like things and trying to add them, and we rely on
the leasedb as the single source of truth for the presence *and*
absence of shares — we don't need to track corrupted shares. Just log
it, remove the entry from the leasedb, and leave the remains of the
share for post-mortem analysis.
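
   That policy fits in a few lines, and it covers the previous
question about partially-missing store objects as well. A sketch,
again with hypothetical table and function names:

    import logging

    log = logging.getLogger("storage")

    def handle_damaged_share(db, storage_index, shnum, reason):
        # A corrupted share and a share with some store objects missing
        # get the same treatment: log it, stop advertising the share by
        # removing its leasedb entry, and deliberately leave whatever
        # bytes remain in the backend for post-mortem analysis.
        log.warning("share %s:%d is damaged (%s); removing its leasedb"
                    " entry but leaving its store objects in place",
                    storage_index, shnum, reason)
        db.execute("DELETE FROM shares WHERE storage_index=? AND shnum=?",
                   (storage_index, shnum))
        db.commit()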


Regards,

Zooko

https://tahoe-lafs.org/trac/tahoe-lafs/ticket/1755# 2-phase commit
https://tahoe-lafs.org/trac/tahoe-lafs/ticket/1818# leasedb: track
leases in a sqlite database, not inside shares
https://tahoe-lafs.org/trac/tahoe-lafs/ticket/1819# cloud backend:
merge to trunk
https://tahoe-lafs.org/trac/tahoe-lafs/ticket/1833# storage server
deletes garbage shares itself instead of waiting for crawler to notice
them
https://tahoe-lafs.org/trac/tahoe-lafs/ticket/1834# stop using share
crawler for anything except constructing a leasedb
https://tahoe-lafs.org/trac/tahoe-lafs/ticket/1835# stop grovelling
the whole storage backend looking for externally-added shares to add a
lease to
https://tahoe-lafs.org/trac/tahoe-lafs/ticket/1836# use leasedb (not
crawler) to figure out how many shares you have and how many bytes


