[tahoe-dev] Storing large trees on the grid

Benjamin Jansen tahoe at w007.org
Wed Jan 28 02:35:04 UTC 2009


Hello,

I have a local tahoe node that is configured for no local storage and  
is attached to the allmydata.com storage grid. I am attempting to copy  
my BackupPC backend storage to the grid, so that I have an offsite copy.

I executed "tahoe cp -r -v . bupc:backuppc" a while ago... probably  
close to a week. After days and about 1.3M lines of "examining N of N"  
output, it said:

attaching sources to targets, 0 files / 1 dirs in root
targets assigned, 160300 dirs, 1058696 files
starting copy, 1058696 files, 160300 directories

Right now, it claims that it has copied about 50K files and 7700  
directories. If things keep going as they are, that means I have about  
5 months remaining. I'd rather not wait that long. :) I have a  
symmetric 15Mbit internet connection; most of the time, when I watch
a graph of traffic at my router, it's sitting at < 5KB/sec out. So,  
the bottleneck is definitely not my connection.
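
As a rough sanity check of those numbers (a back-of-the-envelope sketch
in Python; the 7-day copy duration is my assumption, not a measurement):

    # Check the "about 5 months" figure using the counts reported above.
    # The copy_days value is an assumption about how long the copy phase
    # has actually been running.
    total_files = 1_058_696
    copied_files = 50_000                 # "about 50K files" so far
    copy_days = 7                         # assumed length of the copy phase

    rate = copied_files / copy_days       # ~7,100 files/day
    remaining = total_files - copied_files
    print(f"~{remaining / rate:.0f} days, "
          f"~{remaining / rate / 30:.1f} months remaining")

That works out to roughly 140 days, which is consistent with the "about
5 months" estimate above.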

Based on my understanding of BackupPC's backend storage, most of those  
million files are hard links. Knowing what BPC is backing up, I'd say  
10-20% are unique files. Does "tahoe cp" recognize hard links and copy  
them as such?
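
If it helps, here is a rough way I could measure the fraction of unique
files locally before uploading, by counting distinct (device, inode)
pairs. This is just a sketch, the root path is a placeholder, and it is
not something tahoe itself provides:

    #!/usr/bin/env python3
    # Count unique inodes vs. total files under a tree, to estimate how
    # much of the BackupPC pool is hard links. The root is a placeholder.
    import os

    root = "/path/to/backuppc"
    total = 0
    unique = set()

    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue
            total += 1
            unique.add((st.st_dev, st.st_ino))

    if total:
        print(f"{total} files, {len(unique)} unique "
              f"({100 * len(unique) / total:.1f}%)")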

I thought about uploading a single tarball instead of each file. The
nature of what I'm storing makes it unlikely that I would want to access
an individual file anyway. However, my understanding is that tahoe
currently cannot store a 156GB file. Is that correct?
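
If that limit is real, one workaround I have been considering (sketched
below; the chunk size, file names, and alias path are placeholders, and
I have not tested this) is to split the tarball into fixed-size pieces
and upload each piece with "tahoe put":

    #!/usr/bin/env python3
    # Sketch: split a large tarball into fixed-size pieces and upload
    # each piece with "tahoe put". Names, sizes, and the destination
    # alias are placeholders; assumes bupc:backuppc-tarball exists.
    import os
    import subprocess

    TARBALL = "backuppc.tar"          # the ~156GB archive
    CHUNK = 4 * 1024**3               # 4 GiB per piece (arbitrary)
    BLOCK = 64 * 1024**2              # copy in 64 MiB blocks
    DEST = "bupc:backuppc-tarball"

    with open(TARBALL, "rb") as src:
        index = 0
        while True:
            part = f"{TARBALL}.{index:04d}"
            written = 0
            with open(part, "wb") as out:
                while written < CHUNK:
                    block = src.read(min(BLOCK, CHUNK - written))
                    if not block:
                        break
                    out.write(block)
                    written += len(block)
            if written == 0:
                os.remove(part)       # nothing left; drop the empty piece
                break
            subprocess.run(["tahoe", "put", part, f"{DEST}/{part}"],
                           check=True)
            os.remove(part)           # free local disk before the next piece
            index += 1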

I'd appreciate any advice on how I can speed this up - I'd like my
line to be the bottleneck. ;)

Thanks,
Ben
