[tahoe-dev] Issue with Unplugging One Node from private grid

Brian Warner warner at lothar.com
Mon Nov 29 20:57:45 UTC 2010


On 11/29/10 11:41 AM, Bostonian wrote:

> Based on these tests, here are my guesses:
> 
> i) when stopping tahoe, it pro-actively notifies introducer about its
>    leave.
> 
> ii) when unplugging the cable, introducer does not know that one node
>     leaves. As a result, it still tries to connect it when requests.

Yup. The specific issue is the TCP connections between the client and
each server. When a server process terminates, the OS kernel closes all
of its open sockets, which sends a TCP "FIN" packet to the client, so it
can tell that the connection is now closed. If the cable is unplugged
instead, the connection is merely unusable (perhaps temporarily).
Applications cannot reliably distinguish between this state and simple
network congestion or servers being slow.
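
To make the explicit-close case concrete, here is a minimal sketch using
only the Python standard library over a loopback TCP connection. It is not
Tahoe or Foolscap code, and the function name is made up for illustration.
It shows that once the peer's socket is closed, recv() returns an empty
byte string. With an unplugged cable no FIN ever arrives, so the same
recv() would simply keep waiting.

```python
import socket

def peer_closed_cleanly():
    """Return True if recv() reports EOF after the peer closes its socket."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
    listener.listen(1)

    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()

    # Closing the socket here stands in for what the kernel does to every
    # open socket when a server process exits: the peer is sent a FIN.
    server_side.close()
    listener.close()

    client.settimeout(5.0)            # safety net so the sketch cannot hang
    data = client.recv(1024)          # b"" means "the peer closed the connection"
    client.close()
    return data == b""
```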

There are two sorts of problems here, with different kinds of solutions
for each.

The first problem (which isn't the most important one) is how to detect
non-explicit connection loss. The best you can do here is a heuristic, by
sending a keepalive packet every once in a while, and abandoning any
connection that has been quiet for too long. Tahoe (or rather the
Foolscap library that it uses) pays attention to the "timeout.keepalive"
and "timeout.disconnect" settings in tahoe.cfg to control this. The
default value causes a keepalive to be sent roughly once every 6
minutes, and relies upon TCP's built-in SO_KEEPALIVE timer to give up on
lost connections (which can take anywhere from 10 minutes to several
hours, depending on the OS).
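
The heuristic itself is simple enough to sketch. The class below is a
hypothetical illustration, not Foolscap's implementation: the class name,
method names, and the numbers in the usage note are all assumptions. It
tracks when the peer was last heard from and decides whether to wait,
send a keepalive, or abandon the connection.

```python
class IdleMonitor:
    """Decide what to do with a connection based on how long it has been quiet."""

    def __init__(self, keepalive_after, disconnect_after=None):
        # Both thresholds are in seconds. disconnect_after=None means
        # "never give up here; leave that to TCP".
        self.keepalive_after = keepalive_after
        self.disconnect_after = disconnect_after
        self.last_heard = 0.0

    def heard_from_peer(self, now):
        # Call whenever any data arrives from the peer.
        self.last_heard = now

    def action(self, now):
        idle = now - self.last_heard
        if self.disconnect_after is not None and idle >= self.disconnect_after:
            return "disconnect"
        if idle >= self.keepalive_after:
            return "send-keepalive"
        return "wait"
```

For example, with illustrative thresholds of IdleMonitor(360, 1800), a
connection that has been silent for 400 seconds gets a keepalive, and one
that has been silent for 1800 seconds or more is dropped. Any reply to the
keepalive counts as hearing from the peer and resets the clock.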

The second problem (which is the real issue) is how Tahoe's
upload/download algorithms deal with very slow and/or "lost" servers
(i.e. servers which still look like they're connected, but which never
respond to queries). This is improving over time, but some operations
can still get stuck waiting for a server to respond, so a "lost" server
can cause it to stall for a long time (until TCP gives up on the
connection).

The recently rewritten immutable file downloader should tolerate lost
servers very well. The only case that might trip it up is when a server
becomes lost during the middle of a download. We have a plan to improve
this, but it requires deciding upon an "impatience" timer, the point at
which we give up waiting for one block and switch to using a different
server.
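
As a rough sketch of what an impatience timer could look like (this is
not the downloader's code, which is built on Twisted; the asyncio version
below and the names fetch_block, fetch, and impatience are assumptions
made for illustration): wait a bounded time for each server, then move on
to the next one.

```python
import asyncio

async def fetch_block(servers, fetch, impatience):
    """Try each server in turn, giving up on one after `impatience` seconds.

    `fetch` is a caller-supplied coroutine function taking a server and
    returning the block; it is a placeholder, not a Tahoe API.
    """
    for server in servers:
        try:
            return await asyncio.wait_for(fetch(server), timeout=impatience)
        except asyncio.TimeoutError:
            continue   # server looks lost or too slow; try the next one
    raise RuntimeError("no server answered in time")

async def demo_fetch(server):
    # Stand-in for a real block request: the "lost" server never answers.
    if server == "lost":
        await asyncio.sleep(3600)
    return f"block from {server}"
```

The hard part is not the mechanism but picking the value: too short and
you abandon servers that are merely slow, too long and a lost server
still stalls the download.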

Mutable file downloads (used for reading directories) should behave the
same way. The code is different, but I think it has a similar two-phase
behavior (servers that are lost at the beginning of the download are
handled well, but servers which become lost during the download can
cause a stall).

Immutable file uploads are probably more susceptible to stalls with lost
servers because the upload starts by asking a bunch of servers if
they're willing to hold a share, and there is no "impatience timer" to
switch over to different servers when a response is overdue. Likewise,
mutable file uploads (used when creating a directory) want to hear back
from many servers before they will start, so they can be stalled by lost
servers too.
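
For comparison, here is a hypothetical sketch of bounded waiting on the
upload side: ask every server at once, and stop as soon as enough have
accepted or a deadline passes. This is not how Tahoe's uploader behaves
today, and ask_servers, ask, needed, and deadline are illustrative names
only.

```python
import asyncio

async def ask_servers(servers, ask, needed, deadline):
    """Query all servers concurrently; return those that accept in time.

    `ask` is a caller-supplied coroutine function returning True when the
    server is willing to hold a share (a placeholder, not a Tahoe API).
    """
    tasks = {asyncio.ensure_future(ask(s)): s for s in servers}
    pending = set(tasks)
    accepted = []
    loop = asyncio.get_running_loop()
    give_up_at = loop.time() + deadline

    while pending and len(accepted) < needed:
        remaining = give_up_at - loop.time()
        if remaining <= 0:
            break
        done, pending = await asyncio.wait(
            pending, timeout=remaining, return_when=asyncio.FIRST_COMPLETED)
        for task in done:
            if task.exception() is None and task.result():
                accepted.append(tasks[task])

    # Stop waiting on servers that have not answered (possibly lost).
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    return accepted

async def demo_ask(server):
    # Stand-in for a real request: servers named "lost..." never answer.
    if server.startswith("lost"):
        await asyncio.sleep(3600)
    return True
```

The caller then has to decide what to do when fewer than `needed` servers
answered, which is exactly the availability-versus-consistency question.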

There are some interesting and deep issues here, particularly for
mutable files: you really want the client to try hard to find existing
shares when publishing an update, rather than quickly giving up and
allowing old shares to remain un-updated, because those old shares can
cause rollback and inconsistencies later. Deciding between write
availability and consistency is hard.


For the most part, we've designed Tahoe around the idea of fairly static
grids, where the server nodes stay up for long periods of time, and
connections are not coming up and going down quickly. To accommodate more
dynamic/transient grid membership, we'll need some new tools.

> This seems to an bug with Tahoe. Given the nature of the system, it
> will see this situation quite often. Is there any monitoring mechanism
> at introducer?

Not really. The Introducer is not the one paying attention to whether a
server is up or not. Its job is to tell all clients how to contact each
server, and once it's delivered that data, its job is done. The clients
themselves are responsible for managing connections to the servers.

cheers,
 -Brian


