[tahoe-dev] A tahoe user's Wish List
kyle at arbyte.us
Sun Jan 17 19:23:35 UTC 2010
Having used tahoe for a little while now, I've come up with a wish list of
things that would improve my user experience. Many of these may
already be tickets; I haven't browsed through the open tickets to look for
things like these. This looks like a top-ten list, but they're in no
particular order.
1) I wish tahoe would save symlinks as symlinks. Sometimes I create a
directory full of symlinks pointing to other files, in order not to waste
disk space on copies of the files. If I tahoe backup such a directory and
my hard drive crashes, I can't restore my pile of symlinks as a pile of
symlinks. I'll get extra copies of the files instead.
2) tahoe backup dies on broken symlinks. I would prefer if it happily
saved the broken symlink (as a symlink).
3) In general, tahoe stops when it encounters a serious error. Often it
could and should keep going. For example:
- tahoe backup stops on a locked file or a broken symlink. Don't stop!
Just skip the one file, and tell me.
- tahoe deep-check --repair stops when it can't repair a file. Don't
stop! Skip only what's necessary, and tell me.
(There's no practical way for me to resume a repair operation
immediately "after" the bad file.)
- tahoe backup gives a fatal BackupProcessingError if it sees a file, but
the file is deleted before tahoe backs it up.
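
To illustrate the "skip it and tell me" behavior asked for in item 3, here is a minimal Python sketch. It is not tahoe's code and uses no tahoe APIs; the function name collect_backup_candidates and the (path, reason) report format are made up for this example. It walks a tree, records broken symlinks and unreadable or vanished files with their paths, and keeps going.

```python
import os

def collect_backup_candidates(root):
    """Walk root and return (files, skipped) instead of stopping on the first problem.

    Hypothetical helper for illustration only; not part of tahoe.
    """
    files = []
    skipped = []  # (path, reason) pairs, so every message names the file involved

    def on_walk_error(err):
        # os.walk calls this when a directory cannot be listed
        skipped.append((err.filename, "cannot list directory: %s" % err.strerror))

    for dirpath, dirnames, filenames in os.walk(root, onerror=on_walk_error):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # A symlink whose target does not exist is reported, not fatal
            if os.path.islink(path) and not os.path.exists(path):
                skipped.append((path, "broken symlink"))
                continue
            try:
                # Covers locked files and files deleted after they were listed
                with open(path, "rb") as f:
                    f.read(1)
            except OSError as e:
                skipped.append((path, "unreadable: %s" % e.strerror))
                continue
            files.append(path)
    return files, skipped
```

A caller would process everything in files and print the skipped list at the end, so one bad file costs one file rather than the whole run.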
4) Error messages should always identify what file was being worked on
when the problem occurred. Sometimes they do, but often they don't.
5) I am nervous about synchronization between backupdb.sqlite and the
grid. As happened recently, if a large chunk of the grid goes down, some
files may go below their minimum number of shares. I could repair those
files by re-uploading them, but the backupdb assumes those files are fine.
So those files won't be reuploaded and they'll stay broken on the grid --
possibly forever. I'll think they're backed up, but they aren't! I'd like
to have some way for a deep-check to inject a dose of reality into my
backupdb.
6) For extra credit, I'd like behavior that's a hybrid between backup and
deep-check. Run a backup operation, but check every file as we go along,
and repair or re-upload the file if the grid doesn't have enough shares.
7) I'd like to be able to pass verify caps to deep-check on the command
line. (I heard that this is implemented, but not for the command line.)
8) I'd like to have better lease control. For example, I'd like to have
my Latest backup live "forever" but to assign garbage collection dates to
the earlier backups. This would allow me to keep, say, weekly backups for
the past month, and quarterly backups for the rest of the year. I would
expect to do my own scripting to make this happen. I realize I could
mostly get this with careful use of tahoe rm, but then my files are
inaccessible immediately, whereas I'd rather the older backups be allowed
to stick around until a garbage collection is run. "You may delete these
files at your convenience, but please keep them until you need the space."
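
The retention policy in item 8 can be sketched in Python as follows. This only decides which snapshots would be allowed to expire; it says nothing about how leases would actually be set or released, since that is the missing feature. The function name select_expirable is hypothetical, and the 30-day and 365-day cutoffs are my reading of "the past month" and "the rest of the year".

```python
from datetime import datetime, timedelta

def select_expirable(snapshots, now):
    """Return the snapshot timestamps that may be garbage collected.

    Illustrative policy (assumptions, not tahoe behavior):
      - the newest snapshot is always kept
      - up to 30 days old: keep the newest snapshot in each ISO week
      - up to 365 days old: keep the newest snapshot in each calendar quarter
      - anything older may expire
    """
    ordered = sorted(snapshots, reverse=True)  # newest first
    keep = set()
    seen_buckets = set()
    if ordered:
        keep.add(ordered[0])  # the latest backup lives "forever"
    for ts in ordered:
        age = now - ts
        if age <= timedelta(days=30):
            bucket = ("week",) + tuple(ts.isocalendar())[:2]  # (ISO year, ISO week)
        elif age <= timedelta(days=365):
            bucket = ("quarter", ts.year, (ts.month - 1) // 3)
        else:
            continue  # too old for either tier
        if bucket not in seen_buckets:
            # First hit per bucket is the newest one, because we iterate newest first
            seen_buckets.add(bucket)
            keep.add(ts)
    return [ts for ts in ordered if ts not in keep]
```

A script could run this over the timestamps of the archived backups and mark the returned ones as "delete at your convenience", leaving them readable until a garbage collection actually reclaims them.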
9) I wish the prodgrid helper was working. I've never been able to use it
(not even once) since I started using tahoe.
10) I wish tahoe backup --exclude could accept a pathname, not just a
filename. (I would like to --exclude $HOME/foo/bar as a named directory,
instead of excluding files named 'bar' wherever they may exist.)
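
Here is a small Python sketch of the matching rule item 10 asks for. It is not tahoe's implementation: the function is_excluded is hypothetical, and it assumes patterns are glob-style. A pattern containing a path separator is treated as a pathname and matches that path and everything beneath it; a pattern without one is compared against the file's own name only.

```python
import fnmatch
import os

def is_excluded(path, patterns):
    """Return True if path matches any exclude pattern.

    Hypothetical rule for illustration:
      - pattern with a path separator: match that path or anything under it
      - pattern without one: match against the final name component only
    """
    path = os.path.normpath(path)
    for pattern in patterns:
        if os.sep in pattern:
            # Pathname pattern such as ~/foo/bar
            pattern = os.path.normpath(os.path.expanduser(pattern))
            if (path == pattern
                    or path.startswith(pattern + os.sep)
                    or fnmatch.fnmatch(path, pattern)):
                return True
        elif fnmatch.fnmatch(os.path.basename(path), pattern):
            # Bare-name pattern such as bar or *.pyc
            return True
    return False
```

With this rule, excluding /home/u/foo/bar leaves /home/u/other/bar alone, while excluding plain bar still hits both.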