[tahoe-dev] Tahoe benchmarking data

Brian Warner warner at lothar.com
Mon Jul 26 02:29:40 UTC 2010


Wow, what awesome data!

On 7/25/10 1:02 PM, Kyle Markley wrote:
> 
> For large files, tuning the parameters seems not to make a lot of
> difference. Increasing the pipeline_size helps, but quickly levels
> off, and changes in the segment_size then don't seem to matter either.
> 
> For small files, the results are surprising! The default settings
> (pipeline 50000 segsize 128KiB) give significantly better performance
> on the wired LAN than anything else I tried -- this is a huge outlier
> when compared to the other data points. What explains this? Why does
> the wireless network perform so much better than the wired network for
> small files (except for the single outlier)? Why are small segments so
> much worse on wireless for small files but not for large files?

Huh, that's pretty weird. It seems to suggest that the pipeline is doing
its job (hiding the latency), so that's a relief.

The fastest data rate you're seeing here is 64MiB/14.80s, so about
4.5MB/s or roughly 36Mbps, which is probably about the middle of
what you'd expect out of a 100Mbps ethernet (maybe a bit on the low
side, but not by much). Was the client CPU pegged during the upload? I
suspect you're CPU bound, but that overall your network is pretty close
to being saturated too.

The pipeline will really become important when your client-to-server
latency is more than the fraction of a millisecond that you probably get
on your LAN. How do ping times compare between the wired and the
wireless connections? On this network I'd expect them to be too close to
have a huge effect on tahoe's performance, although clearly there's *some*
kind of effect. The one thing that I know wireless networks are more
susceptible to is up/down interference: the ACK coming back from the
server will contend with the next data packet coming from the client
(it'd be worse if both client and server are on wireless connections).

Each upload involves a lot of roundtrips to find a home for each share:
these roundtrips are amortized better for large files. That's probably
the source of the large (10-20x) difference you're seeing between large
and small files. As for that 68.59 anomaly, I've got no idea... Have you
tried running that one a couple more times? Best-of-3 for each
datapoint? I can't imagine why *larger* pipelines should slow things
down. My only idea is that the up/down interference can get into some
kind of rhythm: if pipeline/segsizes are just right, the ACK arrives
during a gap in the upload's datastream, and causes less interference
than usual.

I'll try to take a look at the tcpdump data, but I'm gearing up for a
conference, so it might not be until next week.

cheers,
 -Brian


