[tahoe-dev] The airspeed of a disk-laden swallow (African)

Shawn Willden shawn-tahoe at willden.org
Mon Jun 1 02:16:35 UTC 2009


Has there been any discussion or thought about how hard it would be to 
bootstrap Tahoe storage nodes via sneakernet?

I'm finding myself putting a huge amount of effort into dealing with the 
problem of files "aging" between the time it's recognized that they need to 
be backed up and the time they actually get uploaded.  This "bootstrap" 
problem for a backup is a really big one given the amounts of data that most 
people I know have, and the limited upstream bandwidth that most of them have.

Since my backup work is focused primarily on "friendnets", sneakernet may be a 
very viable option.  "Never underestimate the bandwidth of a station wagon 
loaded with tapes hurtling down the highway", and all that.

If I could mount a 2 TB drive on my machine, generate 3-of-10 shares for 600 
GB of data and write them to the drive, and then take it around to my 
friends' homes copying the right 200 GB for each of them and appropriately 
registering it with their Tahoe node, I could eliminate over two *years* of 
uploading at my upstream data rates.  Having bootstrapped the storage nodes 
that way, I calculate that my connection could easily keep up with daily 
backups, even without using compressed deltas, and without a helper.  
Occasionally I might get a big surge in new data, causing it to take a few 
days (or weeks) to catch up again, and perhaps once in a while I might have 
to do the sneakernet thing again, but probably not.
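
To make the arithmetic above concrete, here is a rough back-of-the-envelope
sketch.  It ignores any per-share overhead, and the 250 kbit/s upstream rate
is purely an assumed figure for illustration (the message does not state the
actual rate); the function names are made up for this example.

```python
GB = 10**9

def erasure_coded_total(data_bytes, k, n):
    """Total bytes across all n shares of a k-of-n encoding (overhead ignored)."""
    return data_bytes * n // k

def upload_days(total_bytes, upstream_bits_per_sec):
    """Days needed to push total_bytes through the given upstream rate."""
    seconds = total_bytes * 8 / upstream_bits_per_sec
    return seconds / 86400

data = 600 * GB
total = erasure_coded_total(data, k=3, n=10)  # 2000 GB, fills the 2 TB drive
per_server = total // 10                      # 200 GB for each of 10 friends

# Assumption: 250 kbit/s upstream.  This is not a figure from the message.
days = upload_days(total, 250_000)

print(total // GB, "GB total,", per_server // GB, "GB per server")
print(round(days), "days of uploading, about", round(days / 365, 1), "years")
```

At that assumed rate the full set of shares takes a little over two years to
upload, which is consistent with the estimate above.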

Has anyone thought about this, or what would be involved?

On the client end, it seems like Tahoe could go through the normal process of 
identifying storage servers, generating shares, etc., but then simply write 
them to disk somewhere rather than uploading them, each share somehow labeled 
with the intended destination storage server.  Theoretically, it could even 
be mixed -- if sending to server foo, bar or baz, then write to this disk, 
otherwise transmit normally.
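
The mixed mode described above might look something like the following
sketch.  Nothing here is Tahoe's actual API: route_share, the spool directory
layout (one subdirectory per destination server) and the upload callback are
all hypothetical names invented for illustration.

```python
import os
import tempfile

def route_share(server_id, share_name, share_data,
                sneakernet_servers, spool_dir, upload):
    """Spool the share to disk if its server is on the sneakernet list,
    otherwise hand it to the normal upload path (hypothetical callback)."""
    if server_id in sneakernet_servers:
        # Label the share with its destination by directory name.
        dest_dir = os.path.join(spool_dir, server_id)
        os.makedirs(dest_dir, exist_ok=True)
        path = os.path.join(dest_dir, share_name)
        with open(path, "wb") as f:
            f.write(share_data)
        return path
    upload(server_id, share_name, share_data)
    return None

# Small demonstration with a temporary spool directory and a fake uploader.
spool = tempfile.mkdtemp()
sent = []

def fake_upload(server_id, share_name, share_data):
    sent.append((server_id, share_name))

offline = {"foo", "bar", "baz"}
spooled = route_share("foo", "share-0", b"abc", offline, spool, fake_upload)
direct = route_share("qux", "share-1", b"def", offline, spool, fake_upload)
```

Here the share for "foo" lands in the spool directory under foo/, while the
share for "qux" goes through the ordinary upload path.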

On the server end, either the data could be placed directly into the storage 
area, with whatever necessary bookkeeping updated, or else a simulated client 
could push the data to the server, but at IPC (or at least LAN) speeds.
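
As a very rough sketch of the first option (placing data directly into the
storage area), something like this could copy spooled share files into place.
It assumes a flat directory of share files and does none of the bookkeeping
mentioned above; it does not reflect Tahoe's real on-disk layout, and
import_spooled_shares is a hypothetical name.

```python
import os
import shutil
import tempfile

def import_spooled_shares(spool_dir, storage_dir):
    """Copy share files from spool_dir into storage_dir, skipping any
    share that is already present.  Returns the names that were copied."""
    os.makedirs(storage_dir, exist_ok=True)
    imported = []
    for name in sorted(os.listdir(spool_dir)):
        src = os.path.join(spool_dir, name)
        dst = os.path.join(storage_dir, name)
        if not os.path.isfile(src) or os.path.exists(dst):
            continue
        shutil.copy2(src, dst)
        imported.append(name)
    return imported

# Demonstration: two spooled shares, one of which the server already holds.
spool_in = tempfile.mkdtemp()
storage = tempfile.mkdtemp()
for share_name in ("share-0", "share-1"):
    with open(os.path.join(spool_in, share_name), "wb") as f:
        f.write(b"spooled")
with open(os.path.join(storage, "share-1"), "wb") as f:
    f.write(b"already here")

copied = import_spooled_shares(spool_in, storage)
```

Skipping shares that already exist keeps a repeated sneakernet visit from
overwriting anything the server already holds.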

Crazy idea, I know.  It sure would solve a lot of my problems, though.

Any thoughts on how difficult this would be to implement?

	Shawn


