[tahoe-dev] how to see performance numbers (Re: linuxpal updated)
warner at lothar.com
Fri Mar 25 16:22:51 UTC 2011
On 10/5/10 4:47 PM, Greg Troxel wrote:
> I have two theories:
> A) I ran 'find . -size +8192 | xargs rm' in the storage area on some
> nodes to reclaim space so I could repair 1 KB files. I don't think I
> did this on linuxpal, as it still has lots free, and I think the ones
> I did it on are fine, but the theory is that this causes DYHB to say
> yes from some db and then choke when asked for it. I don't really
> believe this theory.
You're right to not believe this theory :). Each DYHB query turns
directly into an os.listdir() and open() of the relevant share dir/file.
No DBs are involved. (The only DB we're using in tahoe so far is a SQLite
db on the CLI side, in 'tahoe backup'.)
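To make the "no DB" point concrete, here is a rough sketch of what a directory-backed lookup looks like. It is not the actual storage-server code: the function name shares_on_disk and the assumption that each share is a file named by its share number inside one per-file share directory are illustrative only.

```python
import os

def shares_on_disk(share_dir):
    """Return the sorted share numbers that can be listed and opened in share_dir.

    Hypothetical helper: assumes one file per share, named by its share number.
    There is no index or database, only what the filesystem reports right now.
    """
    try:
        names = os.listdir(share_dir)
    except OSError:
        # Directory missing or unreadable: the answer is simply "no shares".
        return []
    found = []
    for name in names:
        if not name.isdigit():
            continue  # ignore anything that is not named like a share number
        try:
            # Opening the file is the only check that the share is usable.
            with open(os.path.join(share_dir, name), "rb"):
                found.append(int(name))
        except OSError:
            pass
    return sorted(found)
```

In a model like this, a share file removed by hand just stops being reported on the next query, so there is no stale "yes" for the server to choke on later.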
> B) There is some firewall impairment. Evidence for this theory is
> that if I stop my client and then restart it, then a deep check fails
> on servers-of-happiness issues promptly, and I get responses in a
> reasonable time from sunpal7 and linuxpal.
> I'm looking at netstat on both ends, but I wonder if tahoe has any
> keepalives/makedeads in the connections to servers.
This sounds more plausible. We've historically had problems with
silently-disconnected TCP sessions (either caused by NAT table entries
being dropped or laptops being closed). There are tahoe.cfg options to
turn on keepalives (timeout.keepalive and timeout.disconnect in the
[node] section), but the
default tahoe.cfg leaves them blank, which tells the underlying foolscap
Tub to use its own defaults, which are keepalive=4*60 and
disconnect=None. This means every four minutes it will send a keepalive
if nothing else has been sent in the previous four minutes (so
worst-case is one message every 8 minutes), and it will never drop the
connection just because of a timeout. See ticket #521 for a discussion
about choosing timeout values. Maybe your firewall is silently dropping
the outbound connections after something like 5 minutes of inactivity.
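The timing argument can be checked with a little arithmetic. This is only a sketch of the behaviour described above, not foolscap code: the function names are made up for illustration, and the 5-minute firewall timeout is a guess, not a measured value.

```python
# Defaults as described in this message (seconds); names are illustrative.
DEFAULT_KEEPALIVE = 4 * 60
DEFAULT_DISCONNECT = None  # never drop the connection on a timeout

def worst_case_idle_gap(keepalive):
    """Longest silence on the wire, in seconds, or None if keepalives are off.

    The timer fires every `keepalive` seconds and sends only if nothing was
    sent during the previous interval, so traffic just after one tick can be
    followed by almost two full intervals of silence.
    """
    if keepalive is None:
        return None
    return 2 * keepalive

def survives_firewall(keepalive, firewall_idle_timeout):
    """True if the worst-case gap stays under the firewall's idle timeout."""
    gap = worst_case_idle_gap(keepalive)
    return gap is not None and gap < firewall_idle_timeout

if __name__ == "__main__":
    guessed_firewall_timeout = 5 * 60  # assumption for the example
    print(worst_case_idle_gap(DEFAULT_KEEPALIVE))
    print(survives_firewall(DEFAULT_KEEPALIVE, guessed_firewall_timeout))
    print(survives_firewall(2 * 60, guessed_firewall_timeout))
```

With the defaults the worst-case gap is 480 seconds, which is longer than a 300-second idle timeout, so a firewall like that could drop the session between keepalives. Under the same model, a keepalive of 120 seconds would keep the gap at 240 seconds.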