[tahoe-dev] a few thoughts about the future of leasedb

Mon Nov 26 18:33:06 UTC 2012

On Wed, Nov 21, 2012 at 10:42 AM, David-Sarah Hopwood
<david-sarah at jacaranda.org> wrote:
>
> Remember that those share states are needed anyway to avoid race
> conditions between adding and removing shares. There are no
> additional states just to support marking of potentially
> inconsistent shares.

Good point. If we had #1833 and #1834, then wouldn't those race
conditions be solved more simply with an in-memory serialization in
the storage server?

https://tahoe-lafs.org/trac/tahoe-lafs/ticket/1833# storage server
deletes garbage shares itself instead of waiting for crawler to notice
them

https://tahoe-lafs.org/trac/tahoe-lafs/ticket/1835# stop grovelling
the whole storage backend looking for externally-added shares to add a
lease to
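
For illustration, in-memory serialization could look roughly like the
sketch below: one lock per storage index, so that an add and a remove
of the same share can never interleave. The class and method names are
hypothetical and are not Tahoe-LAFS APIs, and plain threads are only an
assumption made to keep the example self-contained.

```python
import threading
from collections import defaultdict


class ShareSerializer:
    """Hypothetical helper: serializes add/remove per storage index."""

    def __init__(self):
        self._guard = threading.Lock()              # protects the lock table
        self._locks = defaultdict(threading.Lock)   # one lock per storage index
        self.shares = set()                         # stand-in for the backend

    def _lock_for(self, storage_index):
        # Look up (or create) the lock for this storage index atomically.
        with self._guard:
            return self._locks[storage_index]

    def add_share(self, storage_index, shnum):
        with self._lock_for(storage_index):
            self.shares.add((storage_index, shnum))

    def remove_share(self, storage_index, shnum):
        with self._lock_for(storage_index):
            self.shares.discard((storage_index, shnum))
```

Operations on different storage indexes do not block each other here;
only operations on the same storage index are serialized.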

> Also, clients will need to support non-leasedb servers for a while.
> (I'm looking forward to the point where they can drop that support,
> since it will allow deleting the rest of the code that implemented
> renewal secrets.)

Good point. Maybe we should open a ticket to remind us to reconsider
when to drop support for that.

> I'm still quite keen on my suggested variation of option 3 on #1835, let's call it 3a):
>
> # If [a share that has been added directly to backend storage] is
>   ever requested, the server could then notice that it exists and
>   add it to the leasedb. In that case, doing a filecheck on that
>   file would be sufficient.
>
> I think you didn't want to do that because you thought there would
> be a performance advantage in treating the leasedb as authoritative.
> But the check for whether a share is on disk when it isn't in the
> leasedb is an uncommon case, and does not affect performance in the
> common case. (It shouldn't matter if servers take longer to report
> that they *don't* have a share, because a downloader should use the
> first k servers to respond. Actually, I think the current downloader
> might be waiting longer than that, but if so, that is easy to fix.)

Hm. I'm not sure about this. I wasn't thinking only of the
"performance" from the client's point of view, of how fast they can
download a file, but also of the "load" from the server's point of
view, of how much service they can deliver using one disk. I.e., how
many clients they can serve at once.

One of the tightest bottlenecks on that sort of service is probably
the number of disk seeks, which are slow and serial. So if it is
common to receive queries for "do you have shares of X?" when the
server doesn't have any shares of X, then it might be a significant
extra load for it to have to go check its filesystem to make sure
that it doesn't.

In general, if there is a *lot* of constant background noise of
clients asking if a server has shares that it doesn't have, then I
would be unhappy with translating that load into a constant churn of
disk seeks, and would much rather answer those queries just out of the
leasedb. On the other hand, if those are rare in practice then it
might be worth it to offer a way to import shares that is relatively
easy for the operator to do, because it doesn't require the use of any
extra tool besides "cp".
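
The trade-off can be made concrete with a small sketch that counts
filesystem checks under the two policies. Everything here is a
hypothetical model (sets standing in for the leasedb and the disk, a
counter standing in for disk seeks), not the actual storage server
code.

```python
class Server:
    """Hypothetical storage server with a fake leasedb and a fake disk."""

    def __init__(self, leasedb, disk, check_disk_on_miss):
        self.leasedb = set(leasedb)   # storage indexes the leasedb knows
        self.disk = set(disk)         # storage indexes present on disk
        self.check_disk_on_miss = check_disk_on_miss
        self.disk_lookups = 0         # rough proxy for disk seeks

    def has_shares(self, storage_index):
        if storage_index in self.leasedb:
            return True               # answered from the leasedb alone
        if not self.check_disk_on_miss:
            return False              # leasedb treated as authoritative
        # Option 3a: on a leasedb miss, look at the filesystem and, if
        # the share is there, record it so later queries hit the leasedb.
        self.disk_lookups += 1
        if storage_index in self.disk:
            self.leasedb.add(storage_index)
            return True
        return False
```

With the fallback enabled, every query for a share the server does not
hold costs a filesystem check, and repeated queries for the same absent
share cost one check each. With the leasedb treated as authoritative,
those queries cost none, but a share copied in with "cp" stays
invisible until something adds it to the leasedb.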

At the moment I am really liking the idea of erring on the side of
best efficiency: relying on leasedb as much as possible to reduce
both the load on the server and the time it takes to answer. I'm also
liking the conceptual simplicity of saying "We rely solely on leasedb
for metadata. The filesystem is never looked at except when we know
that we want data and leasedb says that the data is on the
filesystem."

The cost of that is that the server operator who wants to "import"
shares has to run a special "share importing" tool that updates the
leasedb. That doesn't sound like a big cost to me, possibly because I
don't see manual shuffling of shares among servers as a common use
case.
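
As a rough idea of how small such a tool could be, here is a sketch
that walks a share directory and registers what it finds in an SQLite
table. The table schema, the function name, and the simplified
directory layout (shares_dir/<storage_index>/<shnum>) are all
assumptions made for illustration and do not describe the real
leasedb.

```python
import os
import sqlite3


def import_shares(db, shares_dir):
    """Register share files under shares_dir in a hypothetical table.

    Returns the number of shares that were newly added.
    """
    db.execute(
        "CREATE TABLE IF NOT EXISTS shares ("
        " storage_index TEXT, shnum INTEGER, size INTEGER,"
        " PRIMARY KEY (storage_index, shnum))"
    )
    before = db.execute("SELECT COUNT(*) FROM shares").fetchone()[0]
    for storage_index in sorted(os.listdir(shares_dir)):
        si_dir = os.path.join(shares_dir, storage_index)
        if not os.path.isdir(si_dir):
            continue
        for name in sorted(os.listdir(si_dir)):
            if not name.isdigit():
                continue  # skip anything that is not a share number
            size = os.path.getsize(os.path.join(si_dir, name))
            # Shares that are already registered are left untouched.
            db.execute(
                "INSERT OR IGNORE INTO shares VALUES (?, ?, ?)",
                (storage_index, int(name), size),
            )
    db.commit()
    after = db.execute("SELECT COUNT(*) FROM shares").fetchone()[0]
    return after - before
```

Running it a second time over the same directory adds nothing, so an
operator could re-run it after each manual copy without harm.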

Regards,

Zooko