[tahoe-dev] Storing large trees on the grid

Shawn Willden shawn-tahoe at willden.org
Thu Jan 29 05:33:06 UTC 2009


On Wednesday 28 January 2009 08:02:54 pm Brian Warner wrote:
> We don't have
> any answers yet, but I imagine that the "backupdb" mentioned in #598 (and
> #597) could include a table that maps from (devno, inodeno) to filecap

My in-progress backup tool notices hard links and doesn't bother uploading 
them more than once.  When it lstats a file, it checks the nfiles attribute 
(the hard link count, i.e. st_nlink).  If nfiles > 1, it stores the inode and 
device number alongside the rest of the metadata, and also adds the file to a 
dict like { inode : [ filenames ] }.

When it uploads files (right now I'm just copying them to a different place in 
the file system, not actually uploading), it again notices files that have 
nfiles > 1.  It searches a set of inodes to see if this hardlinked inode has 
already been uploaded.  If so, it skips it.  If not, it does the upload and 
adds the inode to the set.
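
A minimal sketch of the two passes described above, not the actual tool. 
The function names scan_tree and upload_tree and the upload callback are 
hypothetical.  It keys on (st_dev, st_ino) rather than the inode alone, on 
the assumption that inode numbers are only unique within one device, and it 
skips anything that isn't a regular file, which is also an assumption.

```python
import os
import stat
from collections import defaultdict


def scan_tree(root):
    """Group hard-linked regular files under root by (device, inode)."""
    links = defaultdict(list)  # (st_dev, st_ino) -> [paths]
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            # Only files with more than one link need to be tracked.
            if stat.S_ISREG(st.st_mode) and st.st_nlink > 1:
                links[(st.st_dev, st.st_ino)].append(path)
    return links


def upload_tree(root, upload):
    """Call upload(path) once per distinct file, skipping repeated links."""
    seen = set()  # (st_dev, st_ino) of hard-linked files already uploaded
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if not stat.S_ISREG(st.st_mode):
                continue  # assumption: symlinks etc. are handled elsewhere
            if st.st_nlink > 1:
                key = (st.st_dev, st.st_ino)
                if key in seen:
                    continue  # this inode was already uploaded
                seen.add(key)
            upload(path)
    return seen
```

Something like upload_tree("/home/me", shutil.copy_target) would match the 
"copy to a different place" stage, where copy_target stands in for whatever 
one-argument callable does the copy.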

However, Ben's question makes me wonder how this would work for him, because 
the inode dict and set are in-memory structures.  It didn't occur to me that 
someone might have enough hard links to make those data structures too big. 
I'm not sure what sorts of limits dict and set have, nor how much overhead 
they have.
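
One rough way to get a feel for the overhead is to build a dict and a set 
of a given size and ask the interpreter how big they are.  This is only a 
sketch: container_overhead is a made-up name, the keys are plain integers 
rather than real inode data, and sys.getsizeof reports only the container's 
own table, not the keys and values it points to, so actual memory use will 
be higher.  The numbers also vary by Python version and platform.

```python
import sys


def container_overhead(n):
    """Return (dict_bytes, set_bytes) for containers holding n integer keys.

    Only the container's own hash table is measured; the key and value
    objects themselves are not included.
    """
    d = {i: None for i in range(n)}
    s = set(range(n))
    return sys.getsizeof(d), sys.getsizeof(s)


if __name__ == "__main__":
    for n in (1000, 100000, 1000000):
        dict_bytes, set_bytes = container_overhead(n)
        # Print approximate bytes of table overhead per entry.
        print(n, dict_bytes / n, set_bytes / n)
```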

	Shawn.


