[tahoe-dev] Keeping local file system and Tahoe store in sync

Shawn Willden shawn-tahoe at willden.org
Tue Feb 3 04:46:26 UTC 2009

On Monday 02 February 2009 08:58:42 pm zooko wrote:
> Brian has been posting about this on the issue tracker, e.g.:
> http://allmydata.org/trac/tahoe/ticket/598

Thanks.  It looks like his approach is sufficiently different from mine that 
I'm going to just keep going as I am.

Key differences are:

1.  With mine, mirroring the directory structure is optional.  Not mirroring 
it should make backups somewhat more efficient (and initial backups much more 
efficient) because there's no need to create all those dircaps.

2.  Forward-difference increments.  This should make uploading small changes 
to large files very efficient.

3.  Backupdb may be stored locally OR in the grid.  I haven't gotten far 
enough to test it yet, but I think the performance hit for storing it in the 
grid should be pretty small.  It may be zero in some cases.

4.  Backup of metadata (permissions, ACLs, resource forks, etc.) in addition
to file contents.  My ultimate goal is to be able to do whole-system backups
and restores, so this is essential.

5.  Smart handling of hardlinks and symlinks.

6.  A focus on the issue of initial, large uploads.  A backup session can be 
terminated and resumed, and reasonable timestamping of backups is maintained 
to facilitate a future "Time Machine"-like view.

7.  A general focus on efficiency.  Not micro-optimization, but structuring 
the backup process to avoid re-scanning, to facilitate streaming uploads, to 
minimize creation of mutable nodes (i.e. dirnodes), etc.
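
To illustrate what item 2 means, here is a minimal sketch of a forward
difference, assuming fixed-size blocks compared by SHA-256 hash.  This is not
the actual backup code: BLOCK_SIZE, block_hashes, forward_delta and
apply_delta are all hypothetical names chosen for the example, and a real
implementation would likely use a smarter delta algorithm.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size, chosen only for illustration


def block_hashes(data, block_size=BLOCK_SIZE):
    """Hash each fixed-size block of a file's contents."""
    return [hashlib.sha256(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)]


def forward_delta(old_hashes, new_data, block_size=BLOCK_SIZE):
    """Return (new_length, {block_index: block_bytes}) holding only the
    blocks of new_data that differ from the previously backed-up version."""
    changed = {}
    for idx, offset in enumerate(range(0, len(new_data), block_size)):
        block = new_data[offset:offset + block_size]
        is_new = idx >= len(old_hashes)
        if is_new or hashlib.sha256(block).digest() != old_hashes[idx]:
            changed[idx] = block
    return len(new_data), changed


def apply_delta(old_data, delta, block_size=BLOCK_SIZE):
    """Rebuild the new version from the old version plus a forward delta."""
    new_length, changed = delta
    # Truncate or zero-pad the old contents to the new length first.
    buf = bytearray(old_data[:new_length].ljust(new_length, b"\0"))
    for idx, block in changed.items():
        start = idx * block_size
        buf[start:start + len(block)] = block
    return bytes(buf)
```

With this scheme only the changed blocks need to be uploaded, and a restore
applies the deltas in order on top of the last full copy.  Note that
fixed-size blocks do not cope well with insertions that shift the rest of the
file; that limitation belongs to this sketch, not necessarily to the real
design.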

I should probably write up another design post, because I've made some major 
changes since my initial thoughts, but I think I'll get back to hacking 
instead :-)

> I think we should start adding tahoe-dev at allmydata.org to the Cc:
> line on trac tickets that are likely to be of interest to readers of
> the list.

That's a very good idea, since most of us probably don't follow the
Trac tickets closely.

