[tahoe-dev] Any project related to network security in Tahoe LAFS project

Greg Troxel gdt at ir.bbn.com
Sat Oct 9 15:01:31 UTC 2010


Rahul Golwalkar <rahulgolwalkar at gmail.com> writes:

>     As a part of our curriculum we are supposed to contribute towards an
> open source project. So, can anyone suggest a Network Security  project
> available in Tahoe-LAFS which requires attention.
>     I have read most of the material related to Tahoe-LAFS available on the
> site.

I'm not Zooko, but as a semiregular ranter I'll offer my $0.02:

* availability in the face of flaky servers and networks

tahoe-lafs does very well at placing multiple shares of data and being
able to reconstitute the original data.  However, the code is not
currently robust against servers that appear to be present but are
flaky, and not entirely robust in the face of servers that come and go.
Integrity and confidentiality can be achieved through crypto, but
availability is much harder.

This project involves taking a system-level look at the issue of
availability under two assumptions: flaky servers and malicious servers.
Then, it involves code and perhaps protocol changes to mitigate any
problems that are apparent.

There are several specific issues already:

** non-responding servers

It's known that having a server that connects to the introducer but
doesn't respond to any queries slows 'tahoe check' to a crawl.  The
entire fix is not clear, but surely an important step is to have each
client scoreboard the behavior of servers and e.g. stop waiting for them
after they have been shown to be nonresponsive a few times.
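
A minimal sketch of such a scoreboard, in Python since that is what
tahoe-lafs is written in.  The class name, the method names and the
threshold of three timeouts are all made up for illustration; nothing
here is existing tahoe-lafs code.

```python
from collections import defaultdict


class ServerScoreboard:
    """Hypothetical per-client record of consecutive query timeouts."""

    def __init__(self, max_failures=3):
        # Assumed threshold: how many timeouts in a row we tolerate.
        self.max_failures = max_failures
        self.failures = defaultdict(int)  # server id -> consecutive timeouts

    def record_timeout(self, server_id):
        self.failures[server_id] += 1

    def record_response(self, server_id):
        # Any answer resets the count, so a recovered server is used again.
        self.failures[server_id] = 0

    def should_query(self, server_id):
        # Stop waiting on servers that have timed out too often in a row.
        return self.failures[server_id] < self.max_failures


board = ServerScoreboard(max_failures=3)
for _ in range(3):
    board.record_timeout("flaky")
board.record_response("good")
print(board.should_query("flaky"), board.should_query("good"))
```

A real version would probably also want to retry a skipped server now
and then, so that a server that comes back is noticed.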

** servers that won't accept shares

Currently, one sees the number of servers connected, but in the pubgrid
many of them are not taking shares.  This should be apparent in
monitoring, as the lack of awareness contributes to poor system-level
availability.

** mutable file repair

Currently, mutable file repair seems to place shares with an incremented
sequence number (seqN+1).  My opinion is that this is the wrong choice
and that missing shares of the current sequence number should instead be
regenerated and placed.  It would be interesting to build a simulator
that has (different) Poisson distributions for on and off periods and
run this against clients that place a hierarchy and periodically run
'tahoe deep-check --verify --repair --add-lease' or equivalent.
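
A rough starting point for the churn half of such a simulator might
look like the following.  It assumes on and off durations are
exponentially distributed (what a Poisson process gives) with different
means, one share per server, and a file that is retrievable whenever at
least k servers are up.  It does not model repair at all; the function
names and default parameters are invented for the sketch.

```python
import random


def simulate_server(rng, mean_up, mean_down, horizon):
    """Return the (start, end) intervals during which one server is up."""
    t, up, intervals = 0.0, True, []
    while t < horizon:
        # expovariate takes a rate, i.e. 1 / mean duration.
        duration = rng.expovariate(1.0 / (mean_up if up else mean_down))
        end = min(t + duration, horizon)
        if up:
            intervals.append((t, end))
        t, up = end, not up
    return intervals


def is_up(intervals, t):
    return any(start <= t < end for start, end in intervals)


def fraction_retrievable(num_servers=10, k=3, mean_up=20.0, mean_down=5.0,
                         horizon=1000.0, check_every=1.0, seed=0):
    """Fraction of periodic checks at which at least k servers are up."""
    rng = random.Random(seed)
    servers = [simulate_server(rng, mean_up, mean_down, horizon)
               for _ in range(num_servers)]
    checks = int(horizon / check_every)
    ok = sum(
        1 for i in range(checks)
        if sum(is_up(s, i * check_every) for s in servers) >= k
    )
    return ok / checks


print(fraction_retrievable())
```

The two repair policies (new sequence number vs. regenerating missing
shares) would then be compared by tracking which shares each server
holds across these up/down intervals.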

* read-only vs writable introducer caps

Currently there is a volunteergrid, but the introducer cap is guarded
because there is a potential leeching problem, in terms of storage used
vs provided to the group.  Having a read-only introducer cap would help;
this would let people connect to the grid and fetch shares but not
upload them.

* quotas

In a shared grid of multiple people, a natural desire is to make sure
everyone is being evenhanded in terms of resource consumption vs
provision, at least as soon as things become full.  Typical filesystems
have quotas, or someone runs du and yells at people, but in tahoe one
can't do that (and that's a feature).

A possible way to do this is to have leases on shares be associated with
some 'storage use capability', and perhaps this should be via digital
cash.  Someone who provides 1 TB of share storage for a month would
perhaps get 500G-months of share storage credits.
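
The bookkeeping for that example is simple enough to write down.  This
is only the arithmetic, assuming 1 TB = 1000 GB and the 2:1 ratio from
the example above; the function names are hypothetical, and the hard
parts (digital cash, tying credits to leases) are not addressed.

```python
# Assumed ratio, taken from the example: 1 TB-month provided earns
# 500 GB-months of credit.
CREDIT_RATIO = 0.5


def credits_earned(provided_gb, months, ratio=CREDIT_RATIO):
    """GB-months of storage credit earned for providing storage."""
    return provided_gb * months * ratio


def can_lease(balance_gb_months, share_gb, months):
    """True if the balance covers leasing share_gb for the given months."""
    return share_gb * months <= balance_gb_months


balance = credits_earned(provided_gb=1000, months=1)
print(balance, can_lease(balance, share_gb=200, months=2))
```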

The trick is to do this without breaking any of the security properties
tahoe-lafs already has.

* NAT problems

The pubgrid currently has servers that are unreachable via their
advertised addresses.  Storage servers with real addresses do connect to
them, surely because the NAT/FW-impaired servers connect out, but client
nodes cannot use these servers.

So, files placed by nodes offering storage cannot in general be
retrieved by nodes not offering storage, or if so they won't be healthy.

This problem is really a subcase of 'flaky servers'.

The challenge is to find some way to keep unreachable servers from being
part of the storage grid while not opening up any opportunities for an
adversary to manipulate the grid into a non-working state.

It's possible that a client-side fix not to advertise RFC1918 addresses
would take the edge off this problem.
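
The filtering itself is easy with Python's standard ipaddress module.
This sketch assumes advertised locations are plain 'host:port' strings,
which is a simplification; the function names are made up and this is
not how tahoe-lafs currently builds its advertisements.

```python
import ipaddress

# The three private IPv4 ranges defined by RFC 1918.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(host):
    """True if host is an IPv4 literal inside an RFC 1918 range."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return False  # not an IP literal, e.g. a hostname
    return any(addr in net for net in RFC1918_NETWORKS)


def advertisable(locations):
    """Drop 'host:port' entries whose host is an RFC 1918 address."""
    return [loc for loc in locations
            if not is_rfc1918(loc.rsplit(":", 1)[0])]


print(advertisable(["192.168.1.5:3456", "203.0.113.7:3456",
                    "10.1.2.3:3456", "example.org:3456"]))
```

Hostnames pass through untouched here, so a name that resolves to a
private address would still be advertised.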

Another approach is a distributed directory of performance data conveyed
back to the introducer.  Each node could sign a statement about each
storage node saying whether it can connect to that node and whether the
node is taking shares.  But publishing lots of data could have privacy
implications.
