[tahoe-dev] tahoe backup re-uploads old files

Greg Troxel gdt at ir.bbn.com
Thu Mar 1 19:27:45 UTC 2012

Brian Warner <warner at lothar.com> writes:

> Yeah, that's a fair argument. I built "tahoe backup" because it seemed
> the best way to take advantage of tahoe's unique features. The
> orthogonal way to handle backups, as implemented in a zillion existing
> programs, generally expects a POSIX-like backend filesystem. Tahoe is
> both more and less than that:
> * it has immutable files and directories, which can safely be shared
>   between subsequent backups
> * modifying files is expensive, and new files should be written
>   all-at-once

True, but I wonder if that means that a tahoe-specific backup program is
needed, or just one that uses a mostly-POSIX filesystem in a careful
way, so that it's reasonably efficient for a class of filesystems.

> * tahoe files need to be checked/repaired/renewed every once in a while

That's what deep-check is for, and I don't think it needs to be part of
the backup program.

> Using "cp -r" into a FUSE-mounted Tahoe filesystem would miss all of
> this: each pass would try to re-copy pre-existing files (unless you
> build a backupdb to avoid it), each pass would duplicate existing
> directories,

That's true, but it's also an argument why 'cp -r' to an external HDD is
not a good backup scheme.

So I do think a backup program that is aware of the
write-file-don't-change-them notion and the sharing-of-existing-file
notion is needed.
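
As a rough illustration of the sharing-of-existing-file notion, here is a
minimal Python sketch of the kind of check such a program could make before
uploading. It is not how "tahoe backup" or its backupdb actually works: the
cache layout, the backup_file name, and the upload callable are all
assumptions made up for this example. The idea is only that a file whose
path, size, and mtime are unchanged can reuse the immutable cap from a
previous run instead of being written again.

```python
import os
from typing import Callable, Dict, Tuple

# Hypothetical cache key: (absolute path, size in bytes, mtime in ns).
# The value is whatever identifier the storage backend returned last time
# (for Tahoe this would be an immutable file cap, treated here as an
# opaque string).
Key = Tuple[str, int, int]


def backup_file(
    path: str,
    cache: Dict[Key, str],
    upload: Callable[[str], str],
) -> Tuple[str, bool]:
    """Return (cap, uploaded) for one file.

    `upload` is an assumed callable that writes the whole file in one
    pass and returns its identifier. It is only invoked when the file's
    path, size, or mtime differs from what the cache has seen.
    """
    st = os.stat(path)
    key = (os.path.abspath(path), st.st_size, st.st_mtime_ns)
    if key in cache:
        # Unchanged since an earlier run: share the existing immutable file.
        return cache[key], False
    cap = upload(path)  # new or modified: write it all at once
    cache[key] = cap
    return cap, True
```

Nothing in this depends on the backend being Tahoe; the same check would
serve any store where files are written once and then left alone.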

> and the FUSE layer would add a lot of overhead. (I've never
> really been content with FUSE-over-Tahoe, it basically works, but the
> impedance mismatch is just too great to make it a happy experience).

I'm not convinced of the FUSE overhead claim, but I think part of the
concern is that we don't have a first-class FUSE implementation -
playing with py-filesystem is on my todo list.  I've seen people run
glusterfs (on Linux and NetBSD), complaining about TCP performance
because they only get 40 MB/s instead of 75 MB/s (through FUSE), and
then get 75 when the driver bug is fixed.  tahoe's speed seems slow
enough that it's hard to believe that FUSE would slow it down much.

> Of course, it's also there because of Tahoe's historical origins in a
> backup-centric company.

A fair point.

> FWIW, "tahoe backup" is basically a standalone program that speaks the
> tahoe webapi to achieve backup tasks, that just happens to use bin/tahoe
> as an entry point, and is distributed along with the rest of tahoe. With
> some architectural changes, it could be a plugin (sort of like how "git
> foo" vectors off to a program named "git-foo", so adding shallow plugins
> is as easy as dropping a git-foo executable into your $PATH). If you
> were to write an independent backup program that took advantage of
> tahoe's unique features (instead of targeting a POSIX filesystem), it
> would probably look a lot like src/allmydata/scripts/tahoe_backup.py .

I didn't know that, but the command-line integration wasn't the root of
my complaint - it's the use of a fs-specific interface when I haven't
convinced myself that it's really necessary.
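
For reference, the "git foo" style of dispatch mentioned in the quoted text
can be sketched in a few lines of Python. This is purely hypothetical:
bin/tahoe does not do this today as far as this thread says, and the
"tahoe-" prefix and both function names are assumptions for illustration.

```python
import shutil
from typing import List, Optional, Sequence


def find_plugin(
    subcommand: str,
    prefix: str = "tahoe-",
    path: Optional[str] = None,
) -> Optional[str]:
    """Look for an executable named prefix+subcommand on the search path.

    `path` defaults to the process's $PATH when None, which is the
    standard behaviour of shutil.which.
    """
    return shutil.which(prefix + subcommand, path=path)


def plugin_argv(
    subcommand: str,
    args: Sequence[str],
    prefix: str = "tahoe-",
    path: Optional[str] = None,
) -> Optional[List[str]]:
    """Build the argv to exec for a plugin, or None if no plugin exists."""
    exe = find_plugin(subcommand, prefix=prefix, path=path)
    if exe is None:
        return None
    return [exe] + list(args)
```

A front end would try its built-in subcommands first and fall back to
plugin_argv() only for names it does not recognize.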

> There are some other, similar tools that I'd like to have: "tahoe
> mirror" to do one-way syncing of local-fs to tahoe-fs, "tahoe sync" to
> do a bidirectional sync (ala Dropbox). And then I'd like "tahoe backup"
> to be more integrated into the tahoe daemon (or into an "agent", as we
> discussed at the last Summit), to be run periodically and safely without
> me having to set up a cronjob for it. And "tahoe sync" could be driven
> by inotify/fseventsd-style events. But, I'd expect to need to make
> similar arguments about why such features should go into Tahoe itself,
> rather than being implemented in standalone tools, before putting
> serious time into writing them.

Interesting points, and someday we will both have enough copious spare
time to discuss....