[tahoe-dev] atlasgrid performance testing results

Sat Sep 24 00:16:13 UTC 2011

On 9/23/11 2:57 PM, Zooko O'Whielacronx wrote:

> Perhaps this should be obvious, but why are MDMF retrievals and
> immutable ("CHK") retrievals currently different in this regard? As
> far as I understand they each proceed (in the main part of the process
> where they are downloading the bulk data) by sequentially requesting
> one block and then the next. Don't they?

Both work one segment at a time, but each segment requires the
main data block (segsize/k) and a bunch of hashes (the Merkle tree
"uncle chain"). The mutable-share storage API was designed later, with
more experience, and it got a readv() method. The earlier
immutable-share API only has a single-span read() method, so the
immutable downloader has to simulate readv() by sending a whole bunch of
separate read() calls, sometimes dozens per segment (especially for
large files with a deep hash tree). My theory is that the foolscap
marshalling overhead of those extra messages is significant.

What I really don't get is why MDMF seems immune to slowdowns with large
'k': even with readv(), the number of reads should still grow linearly
with k. I don't have a theory for that yet.
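
As a rough illustration of the message counts described above, here is a
toy model in Python. It is only a sketch: the function names are made up
for this example, and it assumes each share needs one data block plus one
Merkle uncle chain per segment, with every span costing a separate
message under read() and a single message under readv(). The real
downloaders fetch other hashes too and can skip ones they already hold,
so only the shape of the numbers means anything, not the absolute values.

```python
def uncle_chain_length(num_leaves):
    """Sibling hashes needed to check one leaf of a binary Merkle tree."""
    # ceil(log2(num_leaves)), computed with integer arithmetic only.
    return (num_leaves - 1).bit_length()


def messages_per_segment(k, num_segments, has_readv):
    """Toy count of storage-server requests needed to fetch one segment."""
    # Assumption: one data block plus one uncle chain per share.
    spans_per_share = 1 + uncle_chain_length(num_segments)
    # readv() sends all spans in one message; read() sends one per span.
    messages_per_share = 1 if has_readv else spans_per_share
    # k shares are needed to decode each segment.
    return k * messages_per_share


if __name__ == "__main__":
    # Columns: k, messages with read() only, messages with readv().
    for k in (3, 30, 60):
        print(k,
              messages_per_segment(k, 1024, False),
              messages_per_segment(k, 1024, True))
```

In this model both columns grow linearly with k; readv() only removes
the per-share multiplier, which is why the MDMF behaviour with large k
is still surprising.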

cheers,
 -Brian