[tahoe-dev] notes about the pycon paper

Fri Mar 14 10:34:58 UTC 2008

Dear Brian:

http://allmydata.org/~warner/tahoe.html

Way to go on the pycon paper!  Here are a few notes.  They may sound  
negative, but only because I'm not taking the time to mention all the  
positive things -- all of the paper that I've read so far is chock  
full of good stuff.  Here are a few minor complaints.

  * zfec isn't just a Python wrapper around Rizzo's fec library -- I  
also changed the fec implementation itself in C.

  * Some people will probably assume that the word "DHT" implies  
scalable algorithms.  They may subsequently be disappointed if they  
learn that Tahoe doesn't have a scalable DHT.

  * Something about the term "Virtual Drive" bugs me, but I can't  
quite put my finger on it.  I conceive of a "drive" as being a  
container holding a monolithic bundle of data which is accessed  
through a single mount point (which is a "drive letter" on Windows).   
This just doesn't fit with my conception of the Tahoe decentralized  
filesystem.  I call the middle layer "the decentralized filesystem  
layer".

    Hm, except that I see that my terminology of "decentralized  
filesystem layer" is broader -- I sometimes think of the mutable and  
immutable slots as being part of the "filesystem", but you  
distinguish those from the "virtual drive layer" and call them the  
"DHT layer".  I guess the part that you call the "virtual drive  
layer" is what I call "directories".  :-)

  * "Each client stores a specific 'root directory'".  This is not  
strictly true -- the "virtual drive" (or "directories") layer has no  
conception of root directories; root directories are a concept of the  
application layer, and not all apps have them (WUIs don't, and CLIs
don't unless manually set up to do so).

    I think this is actually related to your and my difference of  
terminology about "the vdrive layer".  You define a "vdrive" (in the  
pycon paper) as the transitive closure of the filesystem which is  
reachable from a given directory (called a "root directory").  I  
think this notion is appealing but misleading -- thinking of the  
transitive closure of directories and files from a certain root  
directory as being in a single container is likely to confuse you,  
because other people can have some of those same directories and  
files in their containers.  Furthermore, you don't have only one root  
directory -- you sometimes want to consider the transitive closure of  
directories and files reachable from other starting directories, so  
therefore you can have some of those same directories and files in  
other containers yourself!
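
    To make the overlap concrete, here is a rough sketch.  It is only
an illustration under made-up assumptions: the node names, the
`children` mapping and the `reachable` helper are all hypothetical and
are not Tahoe's actual data structures or API.  It computes the
transitive closure from two different starting directories and shows
that the two "containers" share nodes.

```python
# Hypothetical in-memory model of a directory graph; not Tahoe's API.
def reachable(start, children):
    """Return the set of nodes reachable from `start` (its transitive closure)."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # already visited via another path
        seen.add(node)
        stack.extend(children.get(node, ()))
    return seen

# Directories map to the nodes they link to; files have no entry.
children = {
    "alice_root": ["alice_docs", "shared_dir"],
    "bob_root": ["bob_music", "shared_dir"],
    "shared_dir": ["report.txt"],
    "alice_docs": ["notes.txt"],
    "bob_music": [],
}

alice_view = reachable("alice_root", children)
bob_view = reachable("bob_root", children)

# The same directory and file (not copies) sit in both closures.
overlap = alice_view & bob_view

# Starting from a different directory gives Alice yet another closure
# that overlaps with her first one.
sub_view = reachable("shared_dir", children)
```

    In this toy model `overlap` contains `shared_dir` and `report.txt`,
and `sub_view` is a subset of `alice_view`, so neither closure behaves
like a separate, monolithic container.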

    Most people would be confused if there were two different  
"drives" (i.e. two different drive letters on their Windows machine)  
which had some of the same files and directories inside them.  (Note:  
I mean the same files and directories -- not identical copies, and  
not shortcuts or symlinks.)  People think of "drives" as being  
separate, monolithic and tree-structured (plus shortcuts/symlinks).   
Tahoe is not like that.  There are no "drives" in Tahoe.

    Obviously if you forbid all sharing and you never mount any  
starting directory but a distinguished one, then you can think of the  
transitive closure of files and directories as being a "virtual  
drive", but this is not a layer of Tahoe -- this is one possible  
application of Tahoe, and not a very good one.

    Actually we have a design for an application which allows sharing  
but which deliberately breaks shared links to mutable objects by  
performing deep copies whenever the user drags a node out of a  
friend's drive.  I'm not sure, but I think that in that hypothetical
application there would be a virtual drive.  But again, in Tahoe
itself there is no virtual drive layer.
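
    A minimal sketch of the difference between keeping a shared link
and deep-copying on drag, again with made-up names (plain dicts and
lists stand in for directories and mutable files; none of this is
Tahoe's API or the actual application design):

```python
import copy

# Hypothetical model: a directory is a dict mapping names to child nodes,
# and a mutable file is a one-element list holding its contents.
friend_dir = {"todo.txt": ["buy milk"]}
friend_root = {"stuff": friend_dir}

# Shared link: my directory points at the very same directory object.
my_root_shared = {"from_friend": friend_dir}

# The hypothetical application deep-copies the node when it is dragged out.
my_root_copied = {"from_friend": copy.deepcopy(friend_dir)}

# The friend later changes the mutable file.
friend_dir["todo.txt"][0] = "buy bread"

# The shared link sees the change; the deep copy does not.
shared_sees = my_root_shared["from_friend"]["todo.txt"][0]
copied_sees = my_root_copied["from_friend"]["todo.txt"][0]
```

    After the friend's change, `shared_sees` is "buy bread" while
`copied_sees` is still "buy milk": the copy no longer has any node in
common with the friend's tree, which is what would make it look like a
separate drive.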

Regards,

Zooko