[tahoe-dev] Perf-related architecture question

Wed Jul 21 06:01:41 UTC 2010

Hi experts, sorry to trouble you all again, but my observations on my
local grid have made me curious about Tahoe's upload architecture.  I have so
far resisted learning Python, so I can't just read the code, and asking
here is educational for more people, anyway.  :)

I am running a helper, and I see that while the helper is fetching
ciphertext, the storage nodes see essentially no activity.  Makes
sense.  But what I don't understand is why it takes so long to fetch the
ciphertext.  The node performing the upload is connected to the same router
as the helper node, both via 100Mbps ethernet.  I see that the network
utilization is pretty consistent at around 2%.  So encryption must be the
bottleneck, right?  It doesn't appear so; CPU utilization is almost always
under 10%.  I'd like to understand what's slowing things down here -- it
looks like things ought to be able to run about 10x this speed.  Are there
a lot of serialized network round-trip messages in the upload protocol, or
something?
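
To make the round-trip guess concrete, here is a rough back-of-envelope
model, not a description of what Tahoe actually does.  It assumes the
ciphertext is fetched one chunk at a time, and that each chunk costs a fixed
delay (network round trip plus any processing at either end) before the next
one is requested.  The function names, the chunk sizes, and the delays are
all hypothetical; only the 100Mbps link speed comes from my setup.

```python
LINK_BYTES_PER_S = 100e6 / 8  # 100Mbps ethernet, in bytes per second


def serialized_throughput(chunk_bytes, per_chunk_delay_s,
                          link_bytes_per_s=LINK_BYTES_PER_S):
    """Bytes/s when each chunk is requested only after the previous one
    has fully arrived, so the fixed delay is paid once per chunk."""
    transfer_s = chunk_bytes / link_bytes_per_s
    return chunk_bytes / (per_chunk_delay_s + transfer_s)


def utilization(chunk_bytes, per_chunk_delay_s,
                link_bytes_per_s=LINK_BYTES_PER_S):
    """Fraction of the link's capacity that the serialized fetch uses."""
    rate = serialized_throughput(chunk_bytes, per_chunk_delay_s,
                                 link_bytes_per_s)
    return rate / link_bytes_per_s


if __name__ == "__main__":
    # Hypothetical chunk sizes and per-chunk delays, for illustration only.
    for chunk in (10_000, 50_000, 1_000_000):
        for delay in (0.001, 0.05, 0.2):
            print(f"chunk={chunk:>9} B  delay={delay:>5} s  "
                  f"utilization={utilization(chunk, delay):6.1%}")
```

Under this model, small chunks combined with a per-chunk delay much larger
than the wire time would leave both the network and the CPU mostly idle,
which is the pattern I am seeing.  Whether that is the real cause is exactly
what I am asking.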

I'm also curious about how the helper distributes shares to the storage
nodes.  In my configuration of 4 storage nodes, 3 are wired at 100Mbps and
1 is wireless.  It looks like when the helper is distributing shares, this
happens at roughly the same pace to all nodes, despite some nodes having
faster connections than others.  I would have expected the wired nodes to
finish receiving their shares significantly sooner than the wireless node. 
Maybe this is just another manifestation of the same thing causing the
ciphertext fetching to be slow, but I wanted to understand what's supposed
to happen.  Are shares supposed to be placed on storage nodes at roughly
the same time, or are the shares going to nodes with faster connections
allowed to race ahead?
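
The two possibilities in that question can be sketched as a toy model.
This is only an illustration under assumptions of mine: the "lock-step"
case assumes the sender pushes one segment's worth of data to every node and
waits for all of them before starting the next segment, and the
"independent" case assumes each node receives its share at its own link
speed.  The function names and numbers are hypothetical, and the model
ignores any shared bottleneck on the sender's side.

```python
def lockstep_finish_times(rates, share_bytes, n_segments):
    """Finish time per node when every segment waits for the slowest node."""
    block_bytes = share_bytes / n_segments
    # Each segment takes as long as the slowest node needs for its block.
    per_segment_s = max(block_bytes / rate for rate in rates)
    total_s = per_segment_s * n_segments
    return [total_s for _ in rates]


def independent_finish_times(rates, share_bytes):
    """Finish time per node when each node receives at its own pace."""
    return [share_bytes / rate for rate in rates]


if __name__ == "__main__":
    # Three fast nodes and one slow node, in arbitrary units per second.
    rates = [10, 10, 10, 2]
    print("lock-step:  ", lockstep_finish_times(rates, 100, 10))
    print("independent:", independent_finish_times(rates, 100))
```

In the lock-step case all four nodes finish together at the slow node's
pace, which matches what I observe; in the independent case the three fast
nodes would finish well ahead of the slow one.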

I can't "see" what the speed-limiting element in the system is, and
performance is a hobby of mine, so I'd like to understand.  :)

-- 
Kyle Markley