[tahoe-dev] down with filesystems! up with the web! -- Re: [tahoe-lafs] #776: users are confused by "tahoe rm"

Zooko Wilcox-O'Hearn zooko at zooko.com
Mon Dec 28 19:59:38 UTC 2009


On Sunday, 2009-12-27, at 19:19 , Shawn Willden wrote:

> Indeed, the files are just as much deleted as they are in any Unix  
> file system.  The only difference is that in a Tahoe grid garbage  
> collection is much slower (really slow if the storage nodes have GC  
> turned off).

It's true that this same issue is present in any unix file system,  
but the speed of garbage collection is not the only difference.  An  
important difference is that every unix filesystem disallows hard  
links to directories.  (An exception that proves this rule is that  
Apple recently extended HFS to allow hardlinks to directories, but  
only with some specific limitations intended to prevent cycles, and  
only to support Time Machine backups.)  Also non-unix filesystems  
such as Windows and pre-unix Mac disallow hardlinks to directories,  
and even hardlinks to files.  This makes me suspicious that the  
designers of those systems had good reasons for this, and the fact  
that Tahoe-LAFS gaily allows hardlinks to any object is probably an  
example of fools rushing in where angels fear to tread.  That is:  
users are inherently confused by a "path-based filesystem"  
abstraction or a "folders-and-documents" abstraction built on top of  
an arbitrary directed graph.  The most successful filesystem products  
try to hide the arbitrary graph layer as much as possible, whereas
Tahoe-LAFS tries to expose it as much as possible.
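
To make the graph point concrete, here is a minimal Python sketch.  It
is not Tahoe-LAFS code: the dict-based model and the names `graph`,
`unlink`, and `reachable` are hypothetical, and "garbage" here simply
means "no longer reachable from the root", which is a simplification
of how any real grid decides what to collect.  It shows that removing
one link leaves an object alive as long as another link still reaches
it.

```python
# Hypothetical model: each directory maps child names to object ids.
# Files are modelled as nodes with no children.
graph = {
    "root": {"photos": "dirA", "backup": "dirB"},
    "dirA": {"cat.jpg": "file1"},
    "dirB": {"cat-copy.jpg": "file1"},  # second link to the same object
    "file1": {},
}


def unlink(graph, parent, name):
    """Remove one edge; the target object itself is untouched."""
    del graph[parent][name]


def reachable(graph, start):
    """Return the set of object ids reachable from start."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph[node].values())
    return seen


unlink(graph, "dirA", "cat.jpg")
# dirB still links to file1, so it is still reachable from the root.
still_alive = "file1" in reachable(graph, "root")

unlink(graph, "dirB", "cat-copy.jpg")
# No path to file1 remains; only now is it garbage in this model.
now_garbage = "file1" not in reachable(graph, "root")
```

In a strict tree the first `unlink` would already have orphaned the
file, which is the behaviour most users expect from "rm".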

Further cause for concern: many Unix users, even "power users", try  
to avoid the use of hardlinks whenever possible, considering them a  
confusing and error-prone feature.

Pretty gloomy picture.  But there is hope: The Web!

Suppose that users, instead of thinking of their Tahoe-LAFS-hosted
files and their Tahoe-LAFS directories as part of a
"folders-and-documents" abstraction or a unixy path-based
"filesystem", thought of them as a collection of web pages which
could have hyperlinks to one another.  Then there is no
more "impedance mismatch" between the abstraction in the user's head  
and the underlying graph structure.  No user is ever surprised that  
multiple web pages can point to the same web page, or that following  
a series of hyperlinks can take you in a circle.  No software  
intended for the Web assumes that the set of web pages that it will  
visit forms a perfectly hierarchical tree structure without cycles or  
converging links.
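
As an illustration of what Web-style software does instead of assuming
a tree, here is a small Python sketch of a crawl that remembers which
pages it has already seen.  The page names and the `links` table are
made up for the example; the technique is an ordinary breadth-first
traversal with a visited set.

```python
from collections import deque

# Hypothetical pages: "home" and "about" link to each other (a cycle),
# and both link to "contact" (converging links).
links = {
    "home": ["about", "contact"],
    "about": ["home", "contact"],
    "contact": [],
}


def crawl(links, start):
    """Breadth-first visit of every page reachable from start, each once."""
    visited = [start]
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links[page]:
            if target not in seen:  # skip pages we have already queued
                seen.add(target)
                visited.append(target)
                queue.append(target)
    return visited


order = crawl(links, "home")
```

Without the `seen` set the loop would follow home -> about -> home
forever; with it, each page is visited exactly once regardless of how
many links point at it.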

Basically, the Web has proven to be both a more powerful and a more  
user-friendly abstraction for managing collections of documents than  
the old path-based filesystem abstraction or the old
folders-and-documents abstraction.

Regards,

Zooko

P.S.  My brother Nejucomo says that it should be named "tahoe unlink"  
instead of "tahoe rm".  I think that that would be a good usability  
improvement.

P.P.S.  My wife Amber says that the only reason people limited  
filesystems to a tree structure is that many important algorithms  
that you might want to use on your filesystem would be inefficient on  
non-tree structures, but now that the Web has formed itself as a
non-tree structure we have been forced to develop heuristics and work
around such inefficiencies anyway.
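
A small Python sketch of the kind of algorithm this refers to, under
made-up data: summing sizes the way "du" would.  The names `sizes`,
`children`, `tree_total`, and `graph_total` are hypothetical.  The
plain recursive sum is only right on a tree; once two directories link
to the same object, it needs extra bookkeeping to count that object
once.

```python
# Hypothetical sizes and links; "shared" is linked from two directories.
sizes = {"root": 0, "a": 0, "b": 0, "shared": 100}
children = {
    "root": ["a", "b"],
    "a": ["shared"],
    "b": ["shared"],
    "shared": [],
}


def tree_total(node):
    """Recursive sum that is only correct when the structure is a tree."""
    return sizes[node] + sum(tree_total(c) for c in children[node])


def graph_total(node, seen=None):
    """Same sum, but each object is counted once by tracking a seen set."""
    if seen is None:
        seen = set()
    if node in seen:
        return 0
    seen.add(node)
    return sizes[node] + sum(graph_total(c, seen) for c in children[node])


naive = tree_total("root")   # counts "shared" once per link
exact = graph_total("root")  # counts "shared" once
```

The tree version also never terminates if the links form a cycle,
whereas the version with the seen set does, at the cost of memory
proportional to the number of objects visited.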

P.P.P.S.  See Mark Bernstein's blog entry on how Engelbart's vision  
for what is now The Web had a hierarchical principle and Nelson's had  
that "everything is deeply intertwingled":
http://www.markbernstein.org/Feb0301/Engelbart.html


