[tahoe-dev] [tahoe-lafs] #897: "tahoe backup" thinks "ctime" means "creation time"

tahoe-lafs trac at allmydata.org
Wed Jan 13 07:47:07 UTC 2010

#897: "tahoe backup" thinks "ctime" means "creation time"
 Reporter:  zooko                                    |           Owner:  warner   
     Type:  defect                                   |          Status:  new      
 Priority:  major                                    |       Milestone:  undecided
Component:  unknown                                  |         Version:  1.5.0    
 Keywords:  forward-compatibility docs tahoe-backup  |   Launchpad_bug:           

Comment(by warner):

 Ok, Zooko and I had a long discussion about this in IRC. There's a bit of
 tension between three goals:

  1. preserving information, even if it is confusing or badly labeled, so
     future developers can figure out where the timestamps came from
  2. not confusing busy developers by perpetuating ambiguous labels like
     "ctime"
  3. hiding irrelevant platform details, making life easier for developers

 Goal 1 is about not trying to be too clever. The original problem here is
 that Python tries to be too clever and reports a windows os.stat field
 ({{{ftCreationTime}}} in the underlying API) as {{{st_ctime}}}, the same
 way that POSIX's st_ctime is reported. This decision was probably based on
 mistakenly believing that they have the same semantics, and a desire to
 hide irrelevant platform details from developers who shouldn't have to
 care.
 However, if they hadn't done that (i.e. report {{{st_creationtime}}} on
 windows and {{{st_ctime}}} on unix), then we'd have less-convenient but
 less-ambiguous os.stat results.

 Systems which try to hide details from developers can cause frustration,
 especially if the developers understand the quirks and foibles of the
 underlying system, because then the "helpful" intermediate layers are
 just getting in the way.

 To implement goal 1, we would copy all of the {{{os.stat()}}} fields into
 metadata as-is, and probably include an extra field (perhaps labeled
 {{{st_platform}}}) as a hint to cyber-historians who know better than we
 what os.stat returns on various platforms, and how to interpret it.
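
 As a rough illustration of the goal-1 approach, a minimal sketch might
 look like the following. The helper name raw_stat_metadata is made up for
 this example and is not part of Tahoe; it simply copies whatever st_*
 attributes Python exposes and adds the st_platform hint.

```python
import os
import sys


def raw_stat_metadata(path):
    """Copy every st_* attribute of os.stat(path) into a dict, unchanged."""
    st = os.stat(path)
    metadata = {}
    for name in dir(st):
        if name.startswith("st_"):
            metadata[name] = getattr(st, name)
    # Synthetic key (hypothetical layout): which platform produced the values.
    metadata["st_platform"] = sys.platform
    return metadata
```

 The set of keys will differ between platforms, which is the point: nothing
 is renamed or reinterpreted on the way in.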

 Goal 2 would be accomplished by never using the word "ctime" in our
 metadata, even though it's used in two other places (the {{{os.stat}}}
 return value, and
 POSIX's stat(2) call). Evidence suggests that the majority of developers
 believe the wrong thing about what POSIX's ctime means (and I've certainly
 been in this camp). So giving them a word other than "ctime" will either
 be more meaningful (e.g. if we called it posix-metadata-change-time) or will
 force them to look up our actual definition (e.g. if we called it
 tahoe-bagel-kumquat and dared them to search webapi.txt for details).

 Goal 3 would be accomplished by using a common, easy-to-understand word
 like "changetime" or "creationtime" for all platforms, despite whatever
 name is used by the underlying system call. POSIX and windows both return
 "mtime" values with (as far as I've been told) the same semantics. So it's
 probably fair to say that the fact that (A: POSIX stat() returns it in
 st_mtime, while B: windows returns it in ftModificationTime or something)
 is an "irrelevant platform detail", and that developers' lives are easier
 if this distinction is hidden from them.

 So, as a compromise between these goals, we settled on the following keys:

  * unix: (st_platform, st_dev, st_mode, st_ino.., modification-time,
    posix-change-time)
  * windows: (st_platform, st_dev, st_mode, st_ino.., modification-time,
    windows-creation-time)

 The synthetic "st_platform" key will contain {{{sys.platform}}}, so
 something like "linux2" or "darwin" or "win32". The hope is that this is a
 cheap way to provide some useful information to future developers and
 cyber-historians who want to interpret the rest of the st_* fields in some
 meaningful way.

 st_dev, st_mode, etc, will be copied directly from the os.stat call. Other
 attributes (perhaps platform-specific fields like OS-X's st_creator and
 st_type) will be copied here too.

 {{{modification-time}}} will be copied from st_mtime on all platforms,
 based on the conclusion that it represents the same concept on all
 platforms: the most recent time that the file's contents have been
 modified.

 {{{posix-change-time}}} will be present for files that came from a POSIX
 filesystem, and will be copied from st_ctime.

 {{{windows-creation-time}}} will be present for files that came from a
 windows filesystem, and will be copied from st_ctime.
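
 A sketch of how those three keys could be derived from a single os.stat
 result. The function name timestamp_keys and the platform test are
 assumptions made for illustration, not the actual "tahoe backup" code; it
 also assumes Python's historical behavior of reporting the windows
 creation time as st_ctime.

```python
import os
import sys


def timestamp_keys(st, platform=sys.platform):
    """Map an os.stat() result onto the key names proposed above."""
    keys = {"modification-time": st.st_mtime}
    if platform.startswith("win"):
        # On windows, Python has historically put the creation time here.
        keys["windows-creation-time"] = st.st_ctime
    else:
        # Assumption: every non-windows platform is treated as POSIX, where
        # st_ctime is the time of the last metadata (inode) change.
        keys["posix-change-time"] = st.st_ctime
    return keys
```

 Only one of the two ctime-derived keys is ever present for a given file,
 so a reader of the metadata never sees a bare "ctime".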

 Having longer and more-detailed names for the ctime values will help with
 goal 2 (help developers correctly interpret this field). Not calling them
 "ctime" will help developers who would otherwise misinterpret
 {{{posix-change-time}}} as if it were the mythical "posix-creation-time"
 that everyone really wants. We cannot achieve goal 3 here, because there is
 no common semantic between POSIX and windows.

 (note for future discussion: some POSIX-ish filesystems do provide
 creation-time, in the form of OS-X's st_birthtime, and supposedly
 something that ZFS offers. If we can determine that the semantics of these
 are the
 same, it could be argued that windows-creation-time should be renamed
 {{{creation-time}}}, and only populated on platforms that offer it, which
 would be st_birthtime from HFS+/OS-X, st_ctime on windows, and something
 on ZFS)

 (and note that, if we *cannot* determine that the semantics are the same,
 then we should probably refrain from trying to coerce them into the same
 field, lest we make the same mistake that Python's os.stat did, making
 life more difficult for somebody in the future who is trying to figure out
 whether a given file's so-called "creation-time" was really the ZFS notion,
 or the
 HFS+ notion, or whatever).
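
 To illustrate the caution in these two notes, here is a hypothetical
 helper (birthtime_if_offered is an invented name) that only reports a
 creation time when Python exposes st_birthtime, and keeps the source field
 name alongside the value rather than folding it into a generic key.
 Whether the HFS+ and ZFS values share semantics is left open, as above.

```python
import os


def birthtime_if_offered(st):
    """Return (source_field, value) if st has st_birthtime, else None."""
    value = getattr(st, "st_birthtime", None)
    if value is None:
        return None
    # Keep the source field name next to the value, so different platforms'
    # notions of "creation" are not silently merged into one field.
    return ("st_birthtime", value)


# Result depends on the platform: None where st_birthtime is not exposed.
result = birthtime_if_offered(os.stat("."))
```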

Ticket URL: <http://allmydata.org/trac/tahoe/ticket/897#comment:4>
tahoe-lafs <http://allmydata.org>
secure decentralized file storage grid
