[tahoe-dev] [tahoe-lafs] #897: "tahoe backup" thinks "ctime" means "creation time"

tahoe-lafs trac at allmydata.org
Wed Jan 13 02:22:14 UTC 2010

#897: "tahoe backup" thinks "ctime" means "creation time"
 Reporter:  zooko                                    |           Owner:  warner   
     Type:  defect                                   |          Status:  new      
 Priority:  major                                    |       Milestone:  undecided
Component:  unknown                                  |         Version:  1.5.0    
 Keywords:  forward-compatibility docs tahoe-backup  |   Launchpad_bug:           

Comment(by warner):


 == using ctime/mtime in backupdb ==

 So, first, let's make the docs (source:docs/backupdb.txt#L84) clearer,
 by replacing the reference to "creation time, and modification time"
 with just "ctime/mtime". The backupdb does not care about the semantics
 of these timestamps. All it cares about is having a cheap
 sometimes-false-positive proxy for detecting changes to file contents.

 In particular, I'm not worried about trying to avoid re-uploading in the
 face of user-triggered changes to metadata that doesn't actually change
 file contents. If someone does a "chown" or "chmod" or "touch" on a
 bunch of files, I think they'll accept the fact that "tahoe backup" will
 subsequently do more work on those files than if they had not gone and
 run those commands.

 So I think that comparing the (size/mtime/ctime) tuple (specifically the
 {{{(stat.ST_SIZE, stat.ST_MTIME, stat.ST_CTIME)}}} tuple) will serve
 this purpose, regardless of what {{{os.stat(fn)[stat.ST_CTIME]}}}
 actually means. We could change the backupdb to record more
 semantically-accurate fields, and fill in some but not others depending
 upon which platform we were using, but since we're only comparing this
 data against itself, I don't see enough value in adding that complexity.
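
 A minimal sketch of that comparison, not the actual backupdb code: the
 function names {{{stat_fingerprint}}} and {{{probably_unchanged}}} are
 made up for illustration, and the recorded tuple is assumed to come from
 an earlier run.

```python
import os
import stat

def stat_fingerprint(path):
    """Return the (size, mtime, ctime) tuple for path.

    The meaning of ST_CTIME differs by platform, but that does not matter
    here: the tuple is only ever compared against an earlier tuple taken
    from the same file on the same machine.
    """
    s = os.stat(path)
    return (s[stat.ST_SIZE], s[stat.ST_MTIME], s[stat.ST_CTIME])

def probably_unchanged(path, recorded):
    """True if the current fingerprint equals the recorded one.

    A mismatch only means "look again": a chmod or touch changes the tuple
    without changing the file contents.
    """
    return stat_fingerprint(path) == recorded
```

 A mismatch costs some extra work on that file; a match is taken to mean
 the contents are the same as last time.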

 == putting timestamp metadata into backups created by "tahoe backup" ==

 As a separate issue, I guess I'm +0 on changing the metadata that "tahoe
 backup" creates to have more accurate names. Thanks to the patch from
 #628, "tahoe backup" is actually the only place that even reads local
 filesystem metadata (i.e. nearly every hit from
 {{{find src -name '*.py' |xargs grep os.stat}}} is a stat of one of
 tahoe's own internal files). "tahoe backup" currently
 does the simplistic thing of copying {{{stat.st_ctime}}} into
 {{{metadata["ctime"]}}}, etc.
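
 Roughly, the simplistic copy looks like the sketch below. This is an
 illustration rather than the real "tahoe backup" code: the helper name
 {{{local_metadata}}} is hypothetical, and I'm assuming the "etc." covers
 {{{mtime}}} in the same way.

```python
import os

def local_metadata(path):
    """Copy local stat timestamps into a metadata dict, names unchanged.

    Whatever the platform reports as st_ctime lands under "ctime", whether
    that is an inode-change time or a creation time.
    """
    s = os.stat(path)
    return {
        "ctime": s.st_ctime,
        "mtime": s.st_mtime,
    }
```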

 I'm not sure how to value timestamps (or other metadata) in backups.
 When you restore from a backup, do you expect all of the files to have
 the same creation/modification timestamps as they did on the original
 disk? The same permission bits? The same owner? The same inode numbers?
 The same {{{atime}}}? (I'd guess a survey would show users expecting
 these properties in descending order, from something like 70% of users
 for timestamps down to 1% of users for atime).

 But I think most users of a "tahoe cp" tool would expect the
 newly-generated local files to have all timestamps set to the present
 moment (as /bin/cp does), and for permission bits/owner to be set by the
 current umask setting/login.

 Other tools that I use for backup purposes (like version-control
 systems) don't record this metadata, because it doesn't generally make
 sense to restore it (when I do an 'svn update', I really don't want the
 timestamps of the newly-modified files to wind up in the past, because
 then my builds will get messed up. Likewise, changing the mode bits,
 other than sometimes the execute bit, is probably a bad idea).

 So this suggests that we'd need a special "tahoe restore" (or maybe an
 option on "tahoe cp", like /bin/cp's --preserve) to use this extended
 metadata. And then, if we had that, it would make sense for "tahoe
 backup" to record more accurate information about platform-specific
 timestamps, such that "tahoe cp --preserve tahoe:backups/Latest
 ./local-restore" could take your Unix-generated backup and copy it onto
 your Windows box and reset as much metadata as made sense.
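
 To make "reset as much metadata as made sense" concrete, here is a
 sketch of what the local side of such a restore might do. Nothing like
 this exists yet: {{{apply_preserved_metadata}}} and the {{{"mode"}}} key
 are hypothetical, and only {{{"mtime"}}} corresponds to something "tahoe
 backup" records today.

```python
import os

def apply_preserved_metadata(path, metadata):
    """Apply whatever recorded metadata can sensibly be restored to path.

    Missing keys are skipped, so a backup made on one platform can be
    restored on another with only the fields that carry over.
    """
    if "mtime" in metadata:
        # Leave the access time alone and set only the modification time.
        atime = os.stat(path).st_atime
        os.utime(path, (atime, metadata["mtime"]))
    if "mode" in metadata:
        # Hypothetical key: permission bits, if the backup recorded them.
        os.chmod(path, metadata["mode"])
    # os.utime cannot set a Unix ctime (inode-change time); the two
    # calls above will themselves update it.
```

 Note that a recorded {{{ctime}}} is not applied at all here, since
 {{{os.utime}}} only takes access and modification times.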

 Eh, I dunno.

 Incidentally, part of the "timestamps are unimportant" philosophy
 described above is embedded in "tahoe backup"'s design: if the local
 timestamps have changed but file contents have not, we won't upload
 anything new, so the backup snapshot will continue to have the same
 timestamps from the original upload. This may mean that you shouldn't
 put too much trust in the tahoe-side timestamp metadata anyways. We
 could change this to upload more frequently, but personally I prefer the
 performance wins of sharing directories between snapshots.

Ticket URL: <http://allmydata.org/trac/tahoe/ticket/897#comment:3>
tahoe-lafs <http://allmydata.org>
secure decentralized file storage grid
