[tahoe-dev] Keeping local file system and Tahoe store in sync
shawn-tahoe at willden.org
Wed Feb 4 02:11:33 UTC 2009
On Tuesday 03 February 2009 06:40:13 pm Brian Warner wrote:
> It's not fast, no.. in my experiments, hashing the whole disk is at least
> several hours, and sometimes most of the day. But I think we're both
> planning to use a cheap path+timestamp+size(+inode?) lookup table and give
> the user an option of skipping the hash when the timestamps are still the
Yes, except that I'd put it the other way around: the user has the option of
forcing a hash even when the timestamps are unchanged, because by default I
don't hash if the metadata matches.
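
A minimal sketch of that default, for illustration only. The table layout, the
function names, the `force_hash` flag and the use of plain SHA-256 are all
assumptions made for this example. They are not taken from the backup tool or
from Tahoe, which uses its own hashing scheme.

```python
import hashlib
import os


def file_key(path):
    """Cheap metadata fingerprint: mtime, size and inode (assumed fields)."""
    st = os.stat(path)
    return (st.st_mtime_ns, st.st_size, st.st_ino)


def sha256_of(path, chunk_size=1 << 20):
    """Hash the file contents in chunks. SHA-256 is a stand-in here."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def content_hash(path, table, force_hash=False):
    """Return (digest, rehashed).

    `table` is a hypothetical dict mapping path -> (metadata, digest).
    If the stored metadata matches and force_hash is False, the cached
    digest is reused and the file is not read.
    """
    meta = file_key(path)
    cached = table.get(path)
    if cached is not None and not force_hash and cached[0] == meta:
        return cached[1], False
    digest = sha256_of(path)
    table[path] = (meta, digest)
    return digest, True
```

The second call for an unchanged file returns the cached digest without
touching the data, which is the whole point given that the hashing is I/O
bound.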
If I could see a way to compress the hashing time further I would, but at
least on my machine, which I think is fairly typical, the scanning and
hashing is I/O bound, and there's obviously no way to avoid reading the data
> So, given a file on disk, you have to do almost the entire Tahoe upload
> process to find out what the eventual Tahoe readcap is going to be. This
> sounds like it's at odds with your plan to upload the "backuplog" before
> you finish uploading some of the actual data files. I'm not sure how to
> rectify this.
Hmmm. I knew I should have read that code... it's been on my to-do list for a
Yes, that does mess up the plan to upload the backuplog before the data. I
could still do it at the expense of increasing the size of the log, by
leaving the read caps out of the log entries and appending a table that maps
hashes to read caps, but that's unpleasant.
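
To make that layout concrete, here is a rough sketch of a log whose entries
carry only content hashes, with a separate hash-to-read-cap table appended.
The JSON structure, the field names and the cap strings are hypothetical
placeholders for this example, not the actual backuplog format or real Tahoe
caps.

```python
import json


def build_backuplog(entries, readcaps):
    """Serialize a log whose entries reference files by content hash only.

    entries:  list of (path, content_hash) pairs
    readcaps: dict of content_hash -> read cap, for files already uploaded
    Files not yet uploaded simply have no row in the appended table.
    """
    log = {
        "entries": [{"path": p, "hash": h} for p, h in entries],
        "readcaps": {h: readcaps[h] for _, h in entries if h in readcaps},
    }
    return json.dumps(log, sort_keys=True)


def resolve(log_text, path):
    """Look up the read cap for a path, or None if it is not known yet."""
    log = json.loads(log_text)
    for entry in log["entries"]:
        if entry["path"] == path:
            return log["readcaps"].get(entry["hash"])
    return None
```

The cost is visible in the structure: every uploaded file ends up with both
an entry and a table row, which is where the extra log size comes from.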
I suppose generating the full read cap early and doing the upload later could
still be a win for users whose machines have slow upstream connections.
Another option would be to put dircaps in the backuplog, but that would
require all those key pair generations, at least the first time.
This requires some thought.
Thanks for pointing that out though. I'm pretty sure that's the only
potentially-invalid assumption I'm making.