[tahoe-dev] Tahoe-LAFS v1.8 planning / Administrivia / Big Picture

Tue Nov 16 17:42:14 UTC 2010

On 11/15/10 3:02 AM, Francois Deppierraz wrote:
> 
> 1-of-3     58
> 2-of-3     63
> 3-of-3     58
> 10-of-30  210
> 20-of-60  394

Hm. Just before 1.8.0, I was using the JS/Protovis-based download
timeline visualization tools (which didn't get landed) to investigate
the overhead of large k (i.e. talking to lots of servers). I identified
a couple of potential performance hits in whatever ticket was hot at
that time. The one that comes to mind is the unnecessary eventual-sends
to do_loop() in the Share object, which I think scale with k and
cause it to recompute the desire/satisfaction bitmaps even though
nothing has changed.
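
To make the "recompute even though nothing has changed" concern
concrete, here is a minimal sketch of one way to skip redundant passes.
It is not the actual Share code: the class name CoalescingLoop and its
methods are hypothetical, and it uses a plain version counter instead
of eventual-sends, purely to illustrate the idea.

```python
class CoalescingLoop:
    """Hypothetical helper: run an expensive recompute only when the
    inputs have changed since the last pass."""

    def __init__(self, recompute):
        self._recompute = recompute   # the expensive callback
        self._version = 0             # bumped whenever inputs change
        self._seen_version = -1       # version handled by the last pass
        self.runs = 0                 # number of real recomputations

    def state_changed(self):
        # Call this when a request is added or a response arrives.
        self._version += 1

    def loop(self):
        # Returns True if a recompute actually ran, False if skipped.
        if self._seen_version == self._version:
            return False
        self._seen_version = self._version
        self.runs += 1
        self._recompute()
        return True


calls = []
demo = CoalescingLoop(lambda: calls.append("recomputed"))
demo.state_changed()
first = demo.loop()    # inputs changed, so this pass does the work
second = demo.loop()   # nothing changed since, so this pass is skipped
```

With k callers each poking the loop once per event, only the first call
after a real change would pay for the bitmap recomputation.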

It'd be worth getting those tools landed, or at least updating the
patch, and looking at these different cases to see if the slower forms
are spending a lot of time in that recomputation. The timeline view
should also make it clear if some servers are taking a lot longer to
respond than others.

(ooh, I just had an idea: what if the RIStorageServer response included
[local] timestamps indicating when the request was received and when
the response was sent? Perhaps in a separate/subsequent 'timeline of
recent events' message, so it could capture the time it took to
serialize the get-block response. You'd have to try to correct for clock
skew between the two systems, of course, but the idea would be to add
markers to the Protovis timeline view to show you that the server
provided the block of data at time=ABC even though the client didn't
start seeing that data until time=DEF. That would distinguish between a
server getting the request late, pulling data off the disk slowly, and
having its response delayed by the network. Simultaneous requests to
multiple servers get in each other's way, and somebody has to be last;
it might be nice to be able to blame that rather than blaming the
server.)
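
A rough sketch of the skew correction, assuming the server reports two
local timestamps per request (these fields do not exist in
RIStorageServer today; Sample, estimate_offset and breakdown are
made-up names). It borrows the NTP-style offset estimate and takes it
from the sample with the smallest network round trip, on the
assumption that this sample had roughly symmetric transit times. The
offset is then applied to the other samples to split each one into
request transit, server time, and response transit.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    client_sent: float      # client clock: request written
    server_received: float  # server clock: request arrived (hypothetical)
    server_sent: float      # server clock: response written (hypothetical)
    client_received: float  # client clock: response seen


def network_round_trip(s):
    # Total elapsed time minus time spent inside the server.
    # Both differences are taken on a single clock, so skew cancels.
    return (s.client_received - s.client_sent) - (s.server_sent - s.server_received)


def estimate_offset(samples):
    # NTP-style estimate of (server clock - client clock), taken from
    # the least-delayed sample. Assumes that sample's outbound and
    # return transit times were about equal.
    best = min(samples, key=network_round_trip)
    return ((best.server_received - best.client_sent)
            + (best.server_sent - best.client_received)) / 2.0


def breakdown(s, offset):
    # Map the server timestamps onto the client clock, then split.
    received = s.server_received - offset
    sent = s.server_sent - offset
    return {
        "request_transit": received - s.client_sent,
        "server_time": s.server_sent - s.server_received,
        "response_transit": s.client_received - sent,
    }


# Made-up numbers: server clock runs 100s ahead of the client.
quick = Sample(0.0, 100.01, 100.03, 0.04)   # fast, symmetric round trip
slow = Sample(1.0, 101.01, 101.06, 1.56)    # response stuck in the network
offset = estimate_offset([quick, slow])
parts = breakdown(slow, offset)
```

In this toy case the slow sample comes out as mostly response transit,
which is the "blame the network, not the server" marker the timeline
view could draw.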

Other than that, higher k means more network overhead (contacting more
servers, sending more messages, more contention overall), so I can
imagine *some* slowdown for large k. But I'd expect it to be small
compared to the overall data and bandwidth limits, whereas your data
shows something worse than that.

intrigued,
 -Brian