[tahoe-dev] Use Tahoe as a real-time distributed file system?

Shawn Willden shawn at willden.org
Tue May 24 16:49:09 UTC 2011


On Tue, May 24, 2011 at 9:55 AM, Neil Aggarwal <neil at jammconsulting.com> wrote:

> Hello all:
>
> I have been reading the Tahoe docs and am a bit confused.
>
> I am looking for a distributed real-time filesystem.
>

Tahoe is a distributed file system.  Whether or not it is "real-time" is
somewhat debatable.  In particular, if two clients might write to the same
mutable file at the same time (directories are mutable files too), then
Tahoe on its own won't work.  Some mechanism outside Tahoe for serializing
updates is required.
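
One possible way to serialize updates outside Tahoe, sketched here only as an
illustration: give every path exactly one designated writer, so two servers
never modify the same file or directory at once.  The server names and both
helper functions are hypothetical and are not part of Tahoe.

```python
import hashlib

# Hypothetical server names, standing in for the eight servers in the question.
SERVERS = [f"server{i}" for i in range(1, 9)]


def designated_writer(path, servers=SERVERS):
    """Return the one server allowed to modify `path`.

    A stable hash (not Python's per-process randomized hash()) is used so
    that every server computes the same answer for the same path.
    """
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


def may_write(this_server, path, servers=SERVERS):
    """True only on the server that owns writes for `path`."""
    return this_server == designated_writer(path, servers)
```

A server would check may_write() before modifying a file or directory and
otherwise hand the change to the designated writer.  The trade-off is that
writes to a path stall while its designated writer is down, so this removes
the conflict but not every write-side point of failure.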


> Does Tahoe allow me to access it just like a regular
> filesystem?  For example, do I cd to a directory and
> list the files?
>

There are some FUSE modules that expose Tahoe through a standard file system
interface, but their quality is not high and they have some limitations.


> My network looks like this:
>
>  Colo 1                        Colo 2
> Server 1                        Server 5
> Server 2                        Server 6
> Server 3                        Server 7
> Server 4                        Server 8
>
> Colo 1 and 2 are separated by a large distance.
> The servers in each colo are on the same network.
> All servers will be running CentOS.
>
> Here is what I need:
> 1. Any server may create or modify a file
>        and changes should be immediately available
>        to the others.
>

As long as there's no chance of two servers trying simultaneously to modify
the same file or the same directory, Tahoe will do that.


> 2. I need to have no single points of failure.
>

Tahoe does this part very well.  This is one of Tahoe's main goals... not
only will there be no single point of failure, but the data will be spread
across the servers so that several of them could fail simultaneously without
affecting the availability of the data to the others.

Assuming you can work around the simultaneous-update problem, and assuming
that you can deal with the FUSE implementation imperfections (or with using
a different way to read/write data), and assuming that you're not looking
for extreme performance, then Tahoe will work.  I would suggest setting your
N (the number of shares of each file to create) to 8, the total number of
servers you have, and K (the number of shares needed to be able to retrieve
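a file) to something less than 4.  With one share on each server, each colo
holds only 4 of the 8 shares, so if K > 4 then losing the connection between
the data centers will mean the data is unavailable until the connection is
restored.  K = 4 would survive the split only while all four local servers
are up.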
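
The arithmetic behind that suggestion can be sketched as follows.  This is a
rough model, assuming exactly one share lands on each of the 8 servers (4 per
colo); the report() function is made up for illustration and is not a Tahoe
API.

```python
def report(k, n=8, per_colo=4):
    """Summarize what a given K means for N shares split evenly over two colos.

    Assumes one share per server, so a colo cut off from the other can reach
    only `per_colo` shares.
    """
    return {
        # Storage used relative to the original file size.
        "expansion": n / k,
        # Can an isolated colo still gather K shares on its own?
        "readable_during_partition": per_colo >= k,
        # Local servers that may also be down while isolated (0 if unreadable).
        "local_failures_tolerated_during_partition": max(per_colo - k, 0),
        # Servers that may fail overall while both colos are connected.
        "failures_tolerated_when_connected": n - k,
    }


for k in (3, 4, 5):
    print(k, report(k))
```

Under these assumptions K = 3 keeps files readable inside an isolated colo
even with one local server down, K = 4 leaves no margin, and K = 5 makes an
isolated colo unable to read anything.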

-- 
Shawn