[tahoe-dev] accounting ideas

Brian Warner warner-tahoe at allmydata.com
Tue Jul 22 01:48:12 UTC 2008


We've been talking about Accounting some more recently, and we think
we've made some progress (mostly due to Zooko's continual encouragement
to separate upload accounting, storage accounting,
garbage-collection/reference-counting, and checker/repairer stuff).

The new idea has the following properties:

 * uploading a file or creating a new mutable file (e.g. mkdir) can only be
   done by suitably authorized users, and the user's "account identifier"
   (i.e. a public key) is attached to the new object. The server can quickly
   and efficiently report how much disk space is being used by any user,
   indexed by this identifier. By adding a separate "pet name" table, the
   server's report can look like:

    account [0e6a2fb] "Bob" is using 723MB, in 1031 files and 345 directories

 * uploaders cannot avoid having their new files accounted to them: they
   cannot dodge their quota.

 * once created, a file or directory is accounted to the uploader (Bob) until
   it is destroyed (by its reference count going to zero), even if Bob ceases
   to be able to reach the file. This could happen if Bob shares the file
   with Carol, then deletes his last reference to it. This is unfortunate,
   but tolerable, and there is a way to deal with it later.

 * reference counts are used to keep file/directory objects alive, one for
   each directory that references the object, and one for each user who wants
   to explicitly retain the object (e.g. as a rootcap). When the reference
   count drops to zero, the object is deleted.



The basic scheme is to separate "reference leases" (used to hold off garbage
collection) from "accounting labels" (used to figure out who is using up all
the space).

Each file/directory object has one or more reference leases. Anyone can
create a new lease on any object, and only the creator of the lease will have
the ability to cancel it later on. Each lease has a "lease secret", and the
storage server has two basic operations:

 add_or_renew_lease(storage_index, lease_secret)
 cancel_lease(storage_index, lease_secret)  ->  remaining_leases_p
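
To make the two operations concrete, here is a minimal in-memory sketch of a server-side lease table. The class name `LeaseTable`, the dict layout, and the use of `time.time()` for timestamps are illustrative assumptions, not Tahoe's share-file format; `remaining_leases_p` is modeled as a boolean telling the caller whether any leases are left.

```python
import time


class LeaseTable:
    """Hypothetical in-memory model of per-object reference leases."""

    def __init__(self):
        # storage_index -> {lease_secret: last_renewal_timestamp}
        self._leases = {}

    def add_or_renew_lease(self, storage_index, lease_secret, now=None):
        # Adding and renewing are the same operation: (re)stamp the lease.
        stamp = time.time() if now is None else now
        self._leases.setdefault(storage_index, {})[lease_secret] = stamp

    def cancel_lease(self, storage_index, lease_secret):
        """Cancel one lease; return True if other leases remain."""
        leases = self._leases.get(storage_index, {})
        # Only a caller presenting the exact secret removes the lease.
        leases.pop(lease_secret, None)
        if not leases:
            # Refcount reached zero: the server may now delete the shares.
            self._leases.pop(storage_index, None)
            return False
        return True
```

Anyone can add a lease, but since cancellation is keyed by the secret, only the lease's creator (or whoever can recompute its secret) can remove it.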

For leases that represent user rootcaps, the lease secret will be the hash of
a long-term secret and the storage index. For ones that represent parent
directories, the lease secret will be the hash of the directory's writecap
and the object's storage index, so that any writer of the directory can
cancel the children's leases.
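
The two kinds of lease secret can be sketched as hashes over a secret and the storage index. Plain SHA-256 over length-prefixed inputs, the tag strings, and the function names below are assumptions for illustration; Tahoe's real hashing conventions may differ.

```python
import hashlib


def _tagged_hash(tag: bytes, *parts: bytes) -> bytes:
    # Length-prefix each part so different splits of the input cannot collide.
    h = hashlib.sha256()
    for piece in (tag, *parts):
        h.update(len(piece).to_bytes(4, "big"))
        h.update(piece)
    return h.digest()


def rootcap_lease_secret(long_term_secret: bytes, storage_index: bytes) -> bytes:
    """Hypothetical secret for a lease a user holds to retain an object."""
    return _tagged_hash(b"rootcap-lease", long_term_secret, storage_index)


def parent_lease_secret(dir_writecap: bytes, child_storage_index: bytes) -> bytes:
    """Hypothetical secret for a parent directory's lease on a child."""
    return _tagged_hash(b"parent-lease", dir_writecap, child_storage_index)
```

Because the parent-lease secret depends only on the directory writecap and the child's storage index, any writer of the directory can recompute it and cancel the child's lease, while a mere reader cannot.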

Clients are responsible for adding a parent-lease to a child when they add it
to a parent directory. They are responsible for cancelling the child's lease
when they remove it from a parent directory. They are also responsible for
detecting when a directory's refcount has dropped to zero, and recursively
decreffing its children. (Note: there are some issues to work out involving
check-then-decref-then-recurse; it looks easiest to handle if we declare that
an orphaned dirnode will stay alive for at least a few minutes, so the client
can read it one last time and track down the children for decreffing.)
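
The client-side bookkeeping can be sketched against a toy in-memory grid. The `Node` type, the string stand-in for the hashed parent secret, and the `link`/`unlink` names are all hypothetical; the recursion is written as a worklist, which is equivalent. It relies on the grace period described above: a dirnode whose refcount just hit zero is read one last time to find its children.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Node:
    # Toy object: a directory has a writecap and children; a file has neither.
    writecap: str = ""
    children: List[str] = field(default_factory=list)
    leases: Set[str] = field(default_factory=set)


def parent_secret(parent_writecap: str, child_si: str) -> str:
    # Stand-in for hash(writecap, storage_index) to keep the sketch short.
    return f"{parent_writecap}|{child_si}"


def link(grid: Dict[str, Node], parent_writecap: str, child_si: str) -> None:
    # Adding a child to a directory adds the parent's lease on the child.
    grid[child_si].leases.add(parent_secret(parent_writecap, child_si))


def unlink(grid: Dict[str, Node], parent_writecap: str, child_si: str) -> List[str]:
    """Cancel a parent's lease and decref recursively; return deleted indexes.

    (Removing the entry from the parent directory itself is not modeled.)
    """
    deleted = []
    pending = [(parent_writecap, child_si)]
    while pending:
        writecap, si = pending.pop()
        node = grid.get(si)
        if node is None:
            continue  # already gone
        node.leases.discard(parent_secret(writecap, si))
        if node.leases:
            continue  # still referenced elsewhere
        # Refcount hit zero: read the dirnode one last time for its children.
        for grandchild in node.children:
            pending.append((node.writecap, grandchild))
        deleted.append(si)
        del grid[si]
    return deleted
```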

These leases will have timestamps, but they won't be automatically cancelled
if they go too long without renewal. The timestamps are a hedge against bugs
or partitions that prevent all shares from being decreffed. If we suspect
this is happening in sufficient quantities (i.e. more bytes of garbage than
we want to keep around), we do a big batch of recursive lease-renewal calls,
then look on the servers for the shares that have old timestamps.
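
The "renew everything, then look for old timestamps" check might look like this on a server. The mapping layout and the function name are hypothetical. An object counts as suspected garbage only if none of its leases was re-stamped since the renewal pass began; nothing is deleted automatically.

```python
from typing import Dict, List

# Hypothetical layout: storage_index -> {lease_secret: last_renewal_time}
LeaseMap = Dict[str, Dict[str, float]]


def find_suspected_garbage(leases: LeaseMap, renewal_started: float) -> List[str]:
    """Return storage indexes whose newest lease predates the renewal pass."""
    stale = []
    for si, by_secret in leases.items():
        # Anything still reachable should have been renewed during the pass.
        newest = max(by_secret.values(), default=0.0)
        if newest < renewal_started:
            stale.append(si)
    return sorted(stale)
```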


The accounting labels, in contrast, are associated with specific user
accounts: "Bob" and "Carol". Each is represented by a public key. These
labels are placed on shares by the upload operation, which, in an
accounting-enabled Tahoe grid, is performed through an account-specific facet
(Bob uses some mechanism to connect to the Bob facet, and thereafter all the
shares he uploads are given a "Bob" label).

These labels are tracked by the server in a table: every time a new file is
created, the size of the file is added to the "Bob" entry. Mutable files are
tracked too: when the mutable file grows by 50 bytes, Bob's account is
increased by the same amount. This way, the server has the ability to answer
questions about how much each user is consuming very quickly.
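
A sketch of that per-server table, assuming a string stands in for each account's public key and that each share remembers the label and size it was created with (the class and method names are hypothetical). Keeping running totals means a usage report is a dictionary lookup rather than a scan over every share.

```python
from collections import defaultdict
from typing import Dict, Tuple


class LabelLedger:
    """Hypothetical per-server table of bytes consumed per account label."""

    def __init__(self):
        self.bytes_used: Dict[str, int] = defaultdict(int)
        # storage_index -> (label, size) recorded at creation time
        self._shares: Dict[str, Tuple[str, int]] = {}

    def share_created(self, storage_index: str, label: str, size: int) -> None:
        self._shares[storage_index] = (label, size)
        self.bytes_used[label] += size

    def share_resized(self, storage_index: str, new_size: int) -> None:
        # Mutable files: charge (or refund) the difference to the creator.
        label, old_size = self._shares[storage_index]
        self._shares[storage_index] = (label, new_size)
        self.bytes_used[label] += new_size - old_size

    def share_deleted(self, storage_index: str) -> None:
        label, size = self._shares.pop(storage_index)
        self.bytes_used[label] -= size
```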

There will be an "add-label" operation, eventually, so if necessary we can do
a recursive walk of everything Bob owns and label it all. But we expect to
start by just relying upon the shares that were labelled at creation time.
The labels will also have a timestamp, which we can use in the future to
ignore old labels. "add-label" (with somebody else's label) and
"remove-label" are privileged operations.

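The permission rules for labels can be sketched as follows. The `caller` argument stands in for whichever account facet the request arrived through, and the `privileged` flag for whatever administrative authority the server recognizes; both, like the class name, are assumptions. Adding your own label is unprivileged, adding someone else's or removing any label is not, and the stored timestamps allow old labels to be ignored later.

```python
import time
from typing import Dict, Optional, Set


class LabelSet:
    """Hypothetical per-share labels: account label -> time applied."""

    def __init__(self):
        self.labels: Dict[str, float] = {}

    def add_label(self, caller: str, label: str, privileged: bool = False,
                  now: Optional[float] = None) -> None:
        # Anyone may add their own label; adding another's needs privilege.
        if label != caller and not privileged:
            raise PermissionError("adding another account's label is privileged")
        self.labels[label] = time.time() if now is None else now

    def remove_label(self, label: str, privileged: bool = False) -> None:
        if not privileged:
            raise PermissionError("remove-label is privileged")
        self.labels.pop(label, None)

    def current_labels(self, not_before: float) -> Set[str]:
        # Possible future use of timestamps: ignore labels older than a cutoff.
        return {name for name, t in self.labels.items() if t >= not_before}
```
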
My hunch is that we'll be able to fit all this lease and label information
into the existing share file format, since it would be great to avoid the
hassle of a version bump. It helps that the reference-lease does not require
separate renew and cancel secrets.

Issues to be dealt with:

 * we use shared secrets as renewal/cancel tokens, scoped to a single storage
   server (to prevent storage servers from obtaining authority over each
   other). When shares are migrated from one server to another, clients will
   need a way to update or otherwise deal with these out-of-date tokens.
 * if we end up using timestamps and expiration, we need to renew those
   timestamps, and we might want to delegate this work to some external party
   (a "renewal agent"). To do this safely, we need distinct tokens for renew
   and cancel, so that the renewal agent can't instead cancel the leases or
   labels. The original lease design used a hashing scheme that served one
   set of goals (being able to grant all your renewal authority to an agent,
   or all your cancel authority, or all authority for a single file, or just
   one-file renewal, or one-file cancel). The new design may not need to work
   this way, and we could derive the renewal token from the cancel token, and
   store less data in the share.
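
Deriving the renewal token from the cancel token could be sketched like this, assuming SHA-256 as the one-way function and a made-up domain-separation prefix (neither is specified here). The share keeps only the derived renew token: presenting it renews the lease, while cancelling requires the cancel token, whose hash must match the stored value. A renewal agent holding only the renew token therefore cannot cancel, and the share stores a single value per lease.

```python
import hashlib
import hmac


def renew_token_from_cancel(cancel_token: bytes) -> bytes:
    # One-way derivation: the cancel token yields the renew token, not vice versa.
    return hashlib.sha256(b"lease-renew-v1:" + cancel_token).digest()


class ShareLease:
    """Hypothetical share-side record that stores only the renew token."""

    def __init__(self, cancel_token: bytes, now: float):
        self.renew_token = renew_token_from_cancel(cancel_token)
        self.timestamp = now

    def renew(self, token: bytes, now: float) -> bool:
        if hmac.compare_digest(token, self.renew_token):
            self.timestamp = now
            return True
        return False

    def may_cancel(self, token: bytes) -> bool:
        # Cancelling requires the preimage of the stored renew token.
        return hmac.compare_digest(renew_token_from_cancel(token), self.renew_token)
```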

So far, we've come up with a few problems, but they seem solvable:

 * bugs or partitions result in garbage. There are three basic options:
   1: assume that partitions won't happen (i.e. they don't happen
      frequently enough to worry about, and most orphaned files get
      decreffed properly)
   2: tolerate the garbage that results (i.e. not enough space is
      consumed by orphaned >1-ref files to worry about)
   3: use timestamps, and find a comfortable tradeoff between renewal
      time and expiration time that gets:
      * low traffic (i.e. large renewal time)
      * low garbage (i.e. small expiration time)
      * high reliability (i.e. expiration time >> renewal time)
      Note that these goals conflict: no single choice of the two times
      achieves all three at once.
 * if Bob uploads a file, gives it to Carol, deletes his own references to
   it, then Bob may be unhappy that he's still being billed for it, while the
   storage provider may be unhappy that Carol is not being billed for it. If
   we think this becomes a problem, we may be able to convince Carol to
   perform a deep-label operation and add her label to it. This split
   lease-vs-label scheme does not provide suitable incentives to get Carol to
   want to do this (e.g. "we'll keep your files alive only if you let us
   charge you for that space"). But it is much simpler and more feasible than
   the schemes that did align these incentives.
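
The timestamp option's tradeoff can be made concrete with a little arithmetic. Here R is the renewal interval and E the expiration time (names chosen for illustration): renewal traffic scales with 1/R, an orphan lingers for at most E after its last renewal, and E/R approximates how many renewal rounds fit before a lease lapses, i.e. the reliability margin.

```python
from dataclasses import dataclass


@dataclass
class RenewalPolicy:
    renewal_days: float     # R: how often every live lease is renewed
    expiration_days: float  # E: how long an unrenewed lease survives

    def renewals_per_object_per_year(self) -> float:
        # Traffic: each live object is touched once per renewal interval.
        return 365.0 / self.renewal_days

    def worst_case_garbage_days(self) -> float:
        # An orphan stops being renewed and expires at most E later.
        return self.expiration_days

    def safety_margin(self) -> float:
        # Renewal intervals per expiration period; larger tolerates more misses.
        return self.expiration_days / self.renewal_days
```

For example, moving from R=7, E=30 to R=30, E=30 cuts traffic by about 4x but drops the margin from about 4.3 to 1.0; restoring the margin means raising E, which lets garbage linger longer.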

Note that storage quota and upload quota are not quite the same thing: if Bob
uploads the file, but later Carol lays claim to it, then Bob's on the hook
for the upload, but both of them are on the hook for the storage.
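
One way to picture the distinction, assuming each share records its size, its original uploader, and the set of labels currently claiming it (a hypothetical layout, as is the function name). Charging every claimant the full size is also an assumption; splitting it among them would be a different policy.

```python
from collections import defaultdict
from typing import Dict, Tuple


def quotas(shares: Dict[str, dict]) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Hypothetical: compute (upload_bytes, storage_bytes) per account."""
    upload: Dict[str, int] = defaultdict(int)
    storage: Dict[str, int] = defaultdict(int)
    for info in shares.values():
        # Upload quota: charged once, to whoever created the object.
        upload[info["uploader"]] += info["size"]
        # Storage quota: charged to every account label on the share.
        for label in info["labels"]:
            storage[label] += info["size"]
    return dict(upload), dict(storage)
```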

The tie-in we've considered with checker/repairer was that file-checking
wants to be done periodically and over all the files that you care about.
Coincidentally, lease/label renewal (in a system that uses expiration) is
something that wants to be done periodically and over all the files that you
care about. A deep-check-and-renew operation could do both at the same time.
But we'd prefer to operate without mandatory renewal (since automatic
expiration feels dangerous), so the plan is to start without it and find some
way to estimate the size of the garbage, then not worry about expiration
until/unless that garbage grows too large.
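
If expiration were ever adopted, the combined operation could be a single traversal that checks and renews each reachable object once. The callback names are hypothetical placeholders for the real checker and lease-renewal calls; the `seen` set keeps objects linked from several directories from being visited twice.

```python
from typing import Callable, List, Set


def deep_check_and_renew(
    root: str,
    children: Callable[[str], List[str]],
    check: Callable[[str], bool],
    renew: Callable[[str], None],
) -> List[str]:
    """Visit everything reachable from root once; return objects failing check."""
    unhealthy: List[str] = []
    seen: Set[str] = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        renew(node)  # one visit serves both purposes
        if not check(node):
            unhealthy.append(node)  # candidate for the repairer
        stack.extend(children(node))
    return unhealthy
```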


cheers,
 -Brian


