[tahoe-dev] weekly Tahoe dev call report: 2012-07-10

Wed Jul 11 15:17:10 UTC 2012

On 7/11/12 8:06 AM, Terrell Russell wrote:
> For the cloud version of leasedb, if EBS has problems, which it is
> known to have had, and a leasedb is corrupted/locked/lost - is that
> okay?

I think it's "okay enough". One of the leasedb design criteria is:

 DB can be deleted and regenerated (with no shares lost), given enough
 time (e.g. a month or two), and tolerance of some "floating garbage"

That means that an EBS explosion will invoke the same code path as a "rm
BASEDIR/storage/leasedb.sqlite", which is the same code path as what you
get the first time you boot up 1.11 or 2.0 or whatever version includes
the new leasedb code. The node will crawl all the existing shares and
add temporary "starter leases" for them, which will keep those shares
alive until their actual owners come back for their periodic add-lease
(perhaps once every two weeks). After a month or two, the starter leases
will expire, allowing the floating garbage to go away.
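
A minimal sketch of that regeneration path, not the actual leasedb code. The
table layout, the function names, the "starter" owner label, and the 60-day
starter duration are all assumptions made for illustration:

```python
import sqlite3

DAY = 24 * 3600
STARTER_LEASE_SECONDS = 60 * DAY  # assumed value for "a month or two"

def rebuild_leasedb(conn, shares_on_disk, now):
    """Create an empty lease table and add a starter lease for every share found."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS leases (share_id TEXT, owner TEXT, expires REAL)"
    )
    conn.executemany(
        "INSERT INTO leases VALUES (?, 'starter', ?)",
        [(share_id, now + STARTER_LEASE_SECONDS) for share_id in shares_on_disk],
    )
    conn.commit()

def add_lease(conn, share_id, owner, now, duration):
    """Record a real lease, as an owner's periodic add-lease would."""
    conn.execute(
        "INSERT INTO leases VALUES (?, ?, ?)", (share_id, owner, now + duration)
    )
    conn.commit()

def garbage(conn, shares_on_disk, now):
    """Return the shares that no longer have any unexpired lease."""
    live = {
        row[0]
        for row in conn.execute(
            "SELECT share_id FROM leases WHERE expires > ?", (now,)
        )
    }
    return sorted(s for s in shares_on_disk if s not in live)

# The leasedb was lost: crawl the shares and hand out starter leases.
conn = sqlite3.connect(":memory:")
shares = ["share-A", "share-B"]
rebuild_leasedb(conn, shares, now=0)

# The owner of share-A comes back two weeks later; nobody claims share-B.
add_lease(conn, "share-A", "alice", now=14 * DAY, duration=90 * DAY)
```

In this sketch nothing is collectable while the starter leases are live, and
once they expire only the unclaimed share-B is reported as garbage.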

This should tolerate a few leasedb failures per year without really
impacting operations, which seems like a reasonable amount to ask from
EBS. Especially if the sqlite file is backed up to S3 on a regular
basis.

davidsarah: hm, now I'm not sure that it'd be safe to restore a stale
leasedb from a backup: I'm thinking of a sequence where:

 1: there's a soon-to-expire lease on a share
 2: a backup snapshot is taken
 3: someone adds a long-to-expire lease
 4: leasedb is lost
 5: leasedb is restored from the backup snapshot
 6: the short lease expires, and the share is deleted
 7: the long lease should still be valid, but the restored leasedb
    never recorded it
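
The sequence above can be replayed with a toy model. This is only a sketch:
the leasedb is a plain dict mapping a share to a list of expiry times, and
`gc`, `live_leases`, and the share name are hypothetical helpers, not Tahoe
APIs:

```python
import copy

DAY = 24 * 3600

def live_leases(leasedb, share, now):
    """Return the expiry times of leases on `share` that have not expired."""
    return [expires for expires in leasedb.get(share, []) if expires > now]

def gc(leasedb, shares_on_disk, now):
    """Delete every share for which leasedb holds no unexpired lease."""
    for share in list(shares_on_disk):
        if not live_leases(leasedb, share, now):
            shares_on_disk.discard(share)

shares_on_disk = {"share-A"}
leasedb = {"share-A": [2 * DAY]}           # 1: soon-to-expire lease
backup = copy.deepcopy(leasedb)            # 2: backup snapshot is taken
leasedb["share-A"].append(60 * DAY)        # 3: someone adds a long lease
client_view = copy.deepcopy(leasedb)       # what the lease holders believe
leasedb = None                             # 4: leasedb is lost
leasedb = copy.deepcopy(backup)            # 5: restored from the snapshot
gc(leasedb, shares_on_disk, now=3 * DAY)   # 6: short lease expires

share_deleted = "share-A" not in shares_on_disk
long_lease_valid = bool(live_leases(client_view, "share-A", 3 * DAY))  # 7
```

Both `share_deleted` and `long_lease_valid` end up true: the share is gone
even though its holder still has an unexpired lease.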

So maybe backups *aren't* appropriate, or at least when we restore from
a backup, we should still add starter leases to all shares (but also
preserve the leases from the backup, for accounting purposes).
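
A sketch of that restore-plus-starter-leases idea, under the same toy
assumptions: leases are (owner, expiry) tuples in a dict, and
`restore_with_starter_leases`, the "starter" owner label, and the 60-day
duration are hypothetical, not part of the real design:

```python
DAY = 24 * 3600
STARTER_OWNER = "starter"  # hypothetical label for starter leases

def restore_with_starter_leases(backup, shares_on_disk, now, starter_duration):
    """Keep the backed-up leases, and add a starter lease to every share on disk."""
    restored = {share: list(leases) for share, leases in backup.items()}
    for share in shares_on_disk:
        restored.setdefault(share, []).append(
            (STARTER_OWNER, now + starter_duration)
        )
    return restored

def has_live_lease(leasedb, share, now):
    """True if any lease on `share` has not yet expired."""
    return any(expires > now for _owner, expires in leasedb.get(share, []))

# The stale backup only knows about alice's short lease on share-A.
backup = {"share-A": [("alice", 2 * DAY)]}
restored = restore_with_starter_leases(
    backup, {"share-A", "share-B"}, now=1 * DAY, starter_duration=60 * DAY
)
```

Here alice's old lease is still present for accounting, and both shares stay
alive past day 2 because of the starter leases, which gives lease holders time
to renew.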

cheers,
 -Brian