[tahoe-dev] [tahoe-lafs] #796: write-only backup caps

Sat Aug 22 23:55:34 UTC 2009

#796: write-only backup caps
--------------------------+-------------------------------------------------
 Reporter:  warner        |           Owner:           
     Type:  enhancement   |          Status:  new      
 Priority:  major         |       Milestone:  undecided
Component:  code-mutable  |         Version:  1.5.0    
 Keywords:                |   Launchpad_bug:           
--------------------------+-------------------------------------------------
 David-Sarah Hopwood points out an even more interesting
 direction to take in a recent tahoe-dev posting:

  http://allmydata.org/pipermail/tahoe-dev/2009-August/002653.html

 The goal is to have one cap (used frequently and stored online)
 to do write-only backups, and a different cap (used only for
 recovery and stored offline) to perform the reads. The effect
 would be close to that of the Mac OS X shared public "Drop Box"
 folder, or of GPG-encrypting a piece of data to a public key
 whose private key is held offline: normally a one-way operation,
 but when you need to, you open up the vault and pull out the
 decryption key.

 This would be pretty cool. This ticket is to sketch out what the
 crypto layout would look like. #795 (append-only files) will be
 a starting point, and there will certainly be an asymmetric
 encryption/decryption keypair involved.

 From the UI point of view, you'd have some sort of magic
 append-only no-reading directory cap, which you keep in your
 private/aliases table. There would be a corresponding
 read-everything cap (or maybe just the full-fledged writecap;
 these could be stored separately), which you keep in a vault and
 only type in to test the system and to recover data. Then you
 type "tahoe backup ~ backup-appendonlycap:", and you expect that
 this unreadable "backup-appendonlycap:" object will acquire
 another child, with a timestamp name that is hopefully (but not
 guaranteed to be) unique.
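
 A minimal sketch of how such a child name could be chosen. Since
 the append-only cap cannot list existing children, the name cannot
 be checked for collisions, so this version adds a random suffix to
 the timestamp. The function name, the name format, and the suffix
 are all assumptions for illustration; none of this is what
 "tahoe backup" actually does.

```python
import secrets
from datetime import datetime, timezone


def snapshot_child_name(now=None, suffix_bytes=4):
    """Build a child name from a UTC timestamp plus a random suffix.

    Hypothetical helper: the holder of an append-only cap cannot read
    the directory, so uniqueness can only be made likely, not checked.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    # Normalize to UTC so the trailing "Z" is accurate.
    stamp = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    # Random hex suffix makes same-second collisions unlikely.
    return f"{stamp}-{secrets.token_hex(suffix_bytes)}"
```

 Two backups started in the same second would then collide only if
 their random suffixes also matched.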

 You might also like the unchanged-directory-sharing properties
 of "tahoe backup" to keep working, so that you don't spend a lot
 of time or disk on things that haven't changed. I don't know if
 it's possible to accomplish this without recording some
 information which would violate the no-reading properties of the
 parent. This would probably be easier to pull off if we have
 immutable directories (#607). I suspect that you'll still have
 to read and hash your whole disk, and generate the CHK
 identifiers, and then discover that they're already uploaded. So
 you might save the storage space and the upload bandwidth, but
 not the local disk IO.
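
 The "hash everything, then discover it's already there" flow could
 look roughly like the sketch below. It is only an illustration:
 content_id() is a plain SHA-256 over the file, not Tahoe's actual
 CHK derivation, and already_uploaded is a hypothetical stand-in
 for asking the grid whether the object exists.

```python
import hashlib


def content_id(path, chunk_size=1 << 20):
    """Hash a file's contents in chunks (stand-in for a CHK identifier)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # The whole file is still read from local disk; that cost is not avoided.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def needs_upload(path, already_uploaded):
    """Return True if the file's identifier is not yet known to the grid.

    already_uploaded is a hypothetical set of identifiers; a real client
    would query storage servers instead.
    """
    return content_id(path) not in already_uploaded
```

 Note that the savings are exactly the ones described above: the
 upload is skipped, but every byte is still read and hashed locally.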

 (hm, so the current backupdb would record the uploaded filecaps,
 which starts to violate the goals once the original file gets
 deleted and the backupdb doesn't also delete the stored filecap.
 But if your local filesystem allows you to attach metadata to
 the files you're backing up, then just attach the tahoe filecap
 and a ctime/mtime/filesize snapshot to the original file, so the
 filecap dies with the file. The backup process would look for
 this metadata, compare the ctime/mtime/size snapshot to decide
 if the cached filecap is stale, then upload or not. This would
 be pretty slick, actually, and I think several modern
 filesystems let you attach this sort of metadata (HFS+ for one).
 If you can attach metadata to directories, then you write the
 verifycap of the immutable dirnode last used for that directory.
 On each new backup, you figure out the new dirnode contents,
 hash them into the CHK key, hash *that*, and compare it against
 the verifycap. If they match, then boom, you now have the dirnode
 readcap to pass up to the parent; if they don't match, then you
 must upload the new version of that dirnode. This avoids keeping
 the old dircap cleartext around. The only remaining security
 issue is that you'd be keeping the individual filecaps around
 for old versions, until the next "tahoe backup" process came
 along and replaced them, but this is a much smaller exposure
 than the dirnodes. It would leak the following information: if
 an attacker gets a copy of your disk at time T=2, they might be
 able to learn the contents of modified-but-not-deleted files
 that were previously backed up at time T=1.)
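
 Both checks from the paragraph above can be sketched as follows.
 This assumes the metadata (snapshot, filecap, verifier) has been
 read back from wherever the filesystem stores it; how it gets
 attached is left out. All names here are hypothetical, and the
 tagged SHA-256 calls only stand in for Tahoe's real key and
 verifycap derivations.

```python
import hashlib
import os
from typing import NamedTuple, Optional


class Snapshot(NamedTuple):
    ctime_ns: int
    mtime_ns: int
    size: int


def take_snapshot(path) -> Snapshot:
    """Record the ctime/mtime/size triple used to detect changes."""
    st = os.stat(path)
    return Snapshot(st.st_ctime_ns, st.st_mtime_ns, st.st_size)


def cached_filecap_if_fresh(path, stored_snapshot, stored_filecap) -> Optional[str]:
    """Return the attached filecap if the file looks unchanged, else None."""
    if stored_snapshot is not None and take_snapshot(path) == stored_snapshot:
        return stored_filecap
    return None  # stale or missing metadata: the file must be uploaded


def dirnode_key(contents: bytes) -> bytes:
    """Stand-in for hashing dirnode contents into the CHK key."""
    return hashlib.sha256(b"key:" + contents).digest()


def verifier_of(key: bytes) -> bytes:
    """Stand-in for deriving the verifycap by hashing the key."""
    return hashlib.sha256(b"verify:" + key).digest()


def reuse_dirnode_key(new_contents: bytes, stored_verifier: bytes) -> Optional[bytes]:
    """Recompute the key; keep it only if it hashes to the stored verifier."""
    key = dirnode_key(new_contents)
    if verifier_of(key) == stored_verifier:
        return key  # unchanged directory: the readcap is recovered, no upload
    return None  # contents changed: upload the new dirnode
```

 The point of storing only the verifier is that someone who reads
 the directory's metadata learns nothing that lets them decrypt the
 dirnode; the key is only recoverable by someone who can rebuild
 the same contents.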

 It's probably ok for the "tahoe backup" process to upload files
 and create directories, generating temporary caps which it is
 obligated to forget after the top-level append operation. If the
 whole backup is created out of immutable objects, the only
 mutable slot is the topmost directory that holds the timestamped
 versions, and that's where the append-only operation would be
 used.

 I'm trying to imagine if it would make sense to add an
 "append-only" or "write-only-no-reading" column to the dirnode
 table (to provide something like "transitive append-only-ness").
 I'm not even sure if that's sane, so I'll put off thinking about
 it until later. (If you can't read, is "transitive" even
 defined?)

-- 
Ticket URL: <http://allmydata.org/trac/tahoe/ticket/796>
tahoe-lafs <http://allmydata.org>
secure decentralized file storage grid