[tahoe-dev] How do lease renewal and repair work?

Brian Warner warner at lothar.com
Tue Jan 11 19:33:27 UTC 2011


On 1/11/11 7:19 AM, Shawn Willden wrote:

> Specifically, I'm wondering what happens if a client running a
> deep-check --repair --add-lease tries to add a lease on an existing
> share and the storage server refuses the lease renewal?

That part of the code needs some work, on both sides. At present, the
storage server will never refuse a lease renewal, even if the server is
in readonly mode (server.py StorageServer.remote_add_lease and the
unused remote_renew_lease). And the client will ignore failures in the
remote_add_lease call (immutable.checker.Checker._add_lease_failed),
partly because older versions of the server didn't support
remote_add_lease, and we want repair to work anyway.

> Will the repairer assume that the unrenewed share needs to be placed
> somewhere else? Or will the client have to wait until the unrenewed
> share actually expires before the repairer will place another copy?

The current repairer won't notice a renewal-rejection. Since the share
will stick around until expiration actually kills it, the repairer won't
do anything special until it expires, at which point it'll create a new
share as usual.

We should change this, especially w.r.t. Accounting, since leases are
the basis of storage-usage calculation. Servers should reject lease
add/renewal requests when in readonly mode, or when the Account does not
allow the claiming of additional space. The upload-a-share and
add/renew-a-lease calls should be expanded to allow requesting a
specific duration on the lease (defaulting to one month, as before).
When repairing a file, the client should not be happy until all N shares
have a lease that will last at least as long as the client's configured
lease-duration value. We might need a "please tell me how long lease XYZ
will last" request. If a renewal request is rejected, and the existing
lease will expire too soon, the repairer should upload additional shares
to other servers.
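
To make the repairer rule above concrete, here is a rough sketch of the
decision it would have to make. None of these names exist in Tahoe today:
ShareLease, shares_needing_replacement, and plan_repair are hypothetical,
and the one-month default is approximated as 31 days purely for
illustration. It assumes the client already knows, for each share, whether
the renewal was accepted and (via the not-yet-existing "how long will lease
XYZ last" request) how much time is left on the existing lease.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumption: "one month" approximated as 31 days, in seconds.
DEFAULT_LEASE_DURATION = 31 * 24 * 60 * 60


@dataclass
class ShareLease:
    """Hypothetical per-share view the repairer would need."""
    server_id: str
    renewal_accepted: bool               # did the server accept add/renew?
    seconds_remaining: Optional[float]   # None if the server would not say


def shares_needing_replacement(
    leases: List[ShareLease],
    desired_duration: float = DEFAULT_LEASE_DURATION,
) -> List[str]:
    """Return server ids whose share cannot be relied on for desired_duration."""
    doomed = []
    for lease in leases:
        if lease.renewal_accepted:
            # Assumption: an accepted renewal covers the requested duration.
            continue
        remaining = lease.seconds_remaining
        # Rejected renewal: the share only counts if the existing lease
        # already lasts long enough. Unknown remaining time is treated
        # as "too soon".
        if remaining is None or remaining < desired_duration:
            doomed.append(lease.server_id)
    return doomed


def plan_repair(
    leases: List[ShareLease],
    total_shares_n: int,
    desired_duration: float = DEFAULT_LEASE_DURATION,
) -> int:
    """Number of additional shares to upload to other servers to reach N."""
    doomed = shares_needing_replacement(leases, desired_duration)
    healthy = len(leases) - len(doomed)
    return max(0, total_shares_n - healthy)
```

For example, with three existing shares where one renewal was rejected and
that lease has only five days left, plan_repair(leases, 3) would ask for one
replacement share.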

> The motivation is that I'm thinking about how storage nodes can
> withdraw gracefully from the grid. If the storage servers can refuse
> to renew leases and if the repairer assumes that unrenewed shares need
> to be placed elsewhere, then it should be very simple to create a new
> storage server configuration flag "withdrawing", which tells the
> storage server to refuse new shares, and also to refuse lease
> renewals. Then, with lease expiration turned on, all of the shares
> it's holding will eventually expire, but all of the clients who own
> those shares will have ample opportunity to relocate them. When the
> last of the withdrawing server's shares expires, then it can be shut
> down.
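
The proposed flag could be sketched roughly as follows. This is a toy model,
not the real StorageServer: the class, its method names, and the LeaseRefused
exception are all made up for illustration, and leases are reduced to a
single expiration time per storage index.

```python
class LeaseRefused(Exception):
    """Hypothetical error a withdrawing server would return to clients."""


class WithdrawingServerSketch:
    """Toy model of a server with a 'withdrawing' configuration flag."""

    def __init__(self, withdrawing=False):
        self.withdrawing = withdrawing
        self.leases = {}  # storage_index -> lease expiration time (seconds)

    def allocate_share(self, storage_index, now, duration):
        # A withdrawing server refuses new shares.
        if self.withdrawing:
            raise LeaseRefused("withdrawing: not accepting new shares")
        self.leases[storage_index] = now + duration

    def renew_lease(self, storage_index, now, duration):
        # A withdrawing server also refuses renewals, so that every
        # lease it holds eventually runs out.
        if self.withdrawing:
            raise LeaseRefused("withdrawing: not renewing leases")
        if storage_index not in self.leases:
            raise KeyError(storage_index)
        self.leases[storage_index] = now + duration

    def expire_leases(self, now):
        # Stand-in for lease expiration being turned on.
        expired = [si for si, exp in self.leases.items() if exp <= now]
        for si in expired:
            del self.leases[si]

    def safe_to_shut_down(self):
        # Once the last share has expired, the server can go away.
        return self.withdrawing and not self.leases
```

The point of the sketch is only the ordering: set the flag, keep serving
reads while clients relocate their shares, and shut down once
safe_to_shut_down() becomes true.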

That sounds like a great approach. Maybe "retired"/"retiring"? We've
also used "spin down" and "decommission" to describe this state in the
past. We've also kicked around the idea that storage servers should be
able to "abandon ship" on their own: upload their shares directly to
other servers (do the permuted-list thing on their own, remove
themselves from the result, find the best place to evacuate the share
to, upload the share, then delete their local copy). This could only
work for immutable shares, since the mutable write-enabler gets in the
way as usual, and it might interact weirdly with Accounting (the old
server would effectively be "paying" for the new share until the real
clients established new leases and took over ownership). But it'd
probably be more efficient: the share already exists, so no need to
re-encode the file.
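
As a rough illustration of the "abandon ship" target selection, something
like the following would do. The permutation here (sorting servers by a
SHA-256 hash of storage index plus server id) is a stand-in chosen for the
sketch and is not claimed to match the hash Tahoe actually uses.
The function names and the already_holding parameter are likewise
hypothetical.

```python
import hashlib


def permuted_servers(storage_index, server_ids):
    """Order servers pseudo-randomly but deterministically per storage index.

    Assumption: SHA-256 over (storage_index + server_id) stands in for
    the real permutation function.
    """
    return sorted(
        server_ids,
        key=lambda sid: hashlib.sha256(storage_index + sid).digest(),
    )


def evacuation_target(storage_index, server_ids, my_id, already_holding=()):
    """Pick the first server in permuted order that is not ourselves.

    already_holding is an optional refinement (an assumption, not part
    of the original idea): skip servers known to hold a share already.
    Returns None if no other server is available.
    """
    for sid in permuted_servers(storage_index, server_ids):
        if sid == my_id or sid in already_holding:
            continue
        return sid
    return None
```

The evacuating server would then upload the immutable share to that target
and delete its local copy only after the upload succeeds.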

cheers,
 -Brian


