[tahoe-dev] Thinking about building a P2P backup system

Brian Warner warner-tahoe at allmydata.com
Fri Jan 9 03:53:58 UTC 2009


> I'd be shocked if encryption were a performance problem. Crypto stuff has
> been my day job for over a decade, so I'm well aware of how blisteringly
> fast AES is, and RSA isn't too bad as long as you're not doing too much of
> it (especially if you're doing mostly public key ops, not private key). I'd
> expect a lot bigger performance issue from the erasure coding (BTW: ever
> considered Tornado coding instead of Reed-Solomon?).

Incidentally, Tahoe will tell you how long the various parts of the
upload/download process took. The front webapi page has a link to "Recent
Uploads and Downloads", and each operation gets a separate page there. For
upload, that page includes the time consumed by encryption (AES) and encoding
(zfec). Sampling our prodnet right now, it looks like zfec is encoding at
8 MB/s, decoding the primary shares (i.e. joining strings) at 1.9 GB/s, and
decrypting at 7 MB/s. The real speed problem is the number of round trips
required by our not-yet-optimized download process, and the way it sticks
with the first servers to respond, even if they turn out to be slow or
stalled. The next layer of speed problems will probably be the
time spent by our encrypted transport layer ("Foolscap") to serialize
the share requests and responses, and the time spent by our
single-threaded python-based storage servers to grab shares off the
disk.
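
To make those numbers concrete, here is the arithmetic as a rough Python
sketch; the stage names, times, and file size below are made-up samples,
not the webapi's actual schema (the status pages report the per-stage
times directly):

    # Rough sketch: deriving per-stage throughput from an upload's timing
    # breakdown. All names and values here are hypothetical samples.

    def rate_MBps(nbytes, seconds):
        return (nbytes / 1e6) / seconds

    file_size = 100 * 1000 * 1000      # 100 MB upload (hypothetical)
    timings = {                        # seconds per stage (hypothetical)
        "encrypt (AES)": 14.3,
        "encode (zfec)": 12.5,
        "push to servers": 95.0,       # dominated by round trips
    }

    for stage, secs in sorted(timings.items()):
        print("%-16s %6.1f MB/s" % (stage, rate_MBps(file_size, secs)))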

> I'm thinking about bandwidth, both being able to rsync changes -- important
> because most home users' net connections are very asymmetric -- and to
> avoid hitting the network at all in the "Mom browsing my photos" case.

As Francois pointed out, we'd love to use something clever like rsync, but at
least the basic rsync algorithm assumes that one end has fast random access
to the file. We could pre-compute some hashes and store them next to the
file, but I don't think we could usefully do the fast rolling-checksum that
rsync wants.
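
For illustration only (this isn't Tahoe code): the fast rolling checksum
in question is an Adler-32-style sum that rsync slides across the file one
byte at a time, which is exactly what requires cheap access to the raw
bytes of the old copy:

    # Illustration only: a minimal Adler-32-style rolling checksum of the
    # kind the rsync algorithm slides across a file one byte at a time.
    # Sliding the window needs the old bytes themselves, which a Tahoe
    # storage server doesn't have in any convenient form.

    M = 1 << 16

    def weak_checksum(block):
        # block is a bytes object; returns the (a, b) pair for the window
        a = sum(block) % M
        b = sum((len(block) - i) * byte for i, byte in enumerate(block)) % M
        return a, b

    def roll(a, b, old_byte, new_byte, blocksize):
        # slide the window one byte: drop old_byte, append new_byte
        a = (a - old_byte + new_byte) % M
        b = (b - blocksize * old_byte + a) % M
        return a, b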

rsync is usually cited as a quick way to update an individual file in place.
We have some designs sketched out for mutable files that can be updated
efficiently, based upon Mercurial's 'revlog' format (which I think Git uses
too), but we haven't had the time to build any of them yet.

The other use of rsync is to update a directory in which you believe many of
the files are already in place. That case only requires storing a hash of
each file (or some other identifier) and generating the corresponding hash
locally. Tahoe basically does this already: the client hashes the file and
computes a "storage index", which determines both the order in which the
servers are queried and the index of the storage slot that will be accessed
on those servers. If a client proposes to upload a share that already exists
on the server (i.e. the server already has a share for that storage index),
the client will bypass uploading that share and will use the pre-existing one
instead.
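
In toy form, the upload-side check looks something like this (names and
details are invented; the real storage index is derived from the file's
encryption key, and shares are spread across several servers):

    # Toy sketch of the dedup-by-storage-index idea described above.
    # Details are invented; this is not the real upload protocol.

    import hashlib

    class ToyServer(object):
        def __init__(self):
            self.shares = {}            # storage_index -> share data

        def has_share(self, storage_index):
            return storage_index in self.shares

        def put_share(self, storage_index, data):
            self.shares[storage_index] = data

    def storage_index_for(file_contents):
        # stand-in: hash of the plaintext. The real derivation goes
        # through the file's encryption key.
        return hashlib.sha256(file_contents).hexdigest()[:32]

    def upload(server, file_contents):
        si = storage_index_for(file_contents)
        if server.has_share(si):
            return si                   # share already present: skip the upload
        server.put_share(si, file_contents)
        return si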

The directories themselves are just serialized lists of (childname,
childcap, metadata), stored in a mutable file. It turns out that we
spend about 10% of our recursive-tree-walking time just parsing this
structure, and it has no B-trees or hash tables or anything, so there's
a lot we could do to improve lookup speed for large directories.
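
Schematically, the layout and the linear-scan lookup look like this (the
JSON serialization here is just a stand-in, not Tahoe's actual encoding):

    # Minimal sketch of the directory layout described above: a flat list
    # of (childname, childcap, metadata) entries serialized into a single
    # mutable file. Lookup requires parsing the whole thing and scanning.

    import json

    def serialize_dirnode(entries):
        # entries: list of (childname, childcap, metadata) tuples
        return json.dumps(entries)

    def lookup(serialized, childname):
        for name, childcap, metadata in json.loads(serialized):
            if name == childname:       # O(n): no index, no hash table
                return childcap, metadata
        raise KeyError(childname)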

We've also discussed creating a local "backupdb", for use by a "tahoe backup"
command, or an extension of some sort. This backupdb would map pathname to
size/timestamp/hash/filecap, and could be used by an insufficiently paranoid
user (one who believes that any change to a file will modify either its size
or timestamp) to avoid even the hashing pass. This is the sort of backup tool
that I want for my own files: I'd like the "null backup" pass to take the
same amount of time/effort as a 'find /' operation.
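
A minimal sketch of that backupdb, assuming a sqlite table (the schema and
the upload_and_get_filecap() helper are hypothetical):

    # Sketch of the proposed backupdb: map pathname to size/mtime/filecap,
    # and skip hashing/uploading when size and timestamp are unchanged.

    import os, sqlite3

    db = sqlite3.connect("backupdb.sqlite")
    db.execute("CREATE TABLE IF NOT EXISTS files"
               " (path TEXT PRIMARY KEY, size INTEGER, mtime REAL,"
               "  filecap TEXT)")

    def backup_file(path, upload_and_get_filecap):
        st = os.stat(path)
        row = db.execute("SELECT size, mtime, filecap FROM files"
                         " WHERE path=?", (path,)).fetchone()
        if row and row[0] == st.st_size and row[1] == st.st_mtime:
            # unchanged by the size/timestamp test: reuse the old filecap
            return row[2]
        filecap = upload_and_get_filecap(path)
        db.execute("INSERT OR REPLACE INTO files VALUES (?,?,?,?)",
                   (path, st.st_size, st.st_mtime, filecap))
        db.commit()
        return filecap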

> I haven't had a chance to look through the code much yet. Is there an
> overview document somewhere that covers the structure?

docs/architecture.txt is the best starting point, along with the other
design notes in docs/ . Those are more design/architecture docs than a
codemap, though. The filenames map pretty closely to their functions: all
the immutable-file stuff is in src/allmydata/immutable/ , all the CLI tools
are in src/allmydata/scripts/ , and all the webapi code is in src/web/ .


cheers,
 -Brian


