Tahoe Storage Node causes high CPU usage on Marvell Armada 375

Brian Warner warner at lothar.com
Mon Nov 14 06:34:31 UTC 2016


On 11/13/16 4:13 AM, exception0x876 . wrote:

> As I read in the Tahoe-LAFS FAQ, the storage node does not require much
> CPU power, because all the CPU-heavy work is done on the gateway node.

Relatively speaking, that's true, although we were thinking about much
slower data rates when we wrote that :). Tahoe was originally written to
serve a home backup tool in 2005, and in those days, most people's
upstream links were pretty slow.

> However, while I transfer a file from the client (the gateway node is
> running on localhost), the tahoe process on the storage node goes up to
> 100% CPU. The transfer speed is around 20Mbps, while I get up to 1Gbps
> using iperf3 between the gateway and the storage node.

I haven't been able to test it on WAN links that fast. I know that
Foolscap, our network protocol, uses a lot of CPU to encode and decode
objects on the wire. It's a very general serialization scheme, with a
lot of options and parts that can be overridden on the fly: much too
flexible for our needs. I'd bet iperf3 is written to use as little CPU
as possible, so where iperf3 would be network-bound, Foolscap is likely
to be CPU-bound.
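A rough way to picture this: the transfer rate is capped by whichever resource saturates first, the link or the CPU. The sketch below is a minimal illustration under that assumption. The function names are hypothetical, and the 5000 and 20 Mbps "CPU ceilings" are made-up numbers for illustration, not measurements of iperf3 or Foolscap. One way to estimate a real ceiling is to divide the bytes a process handled by the CPU time it used (for example from `time.process_time()`).

```python
def cpu_limit_mbps(bytes_processed: int, cpu_seconds: float) -> float:
    """Highest rate (in Mbps) one core could sustain, given how many bytes
    a process handled in a measured amount of CPU time."""
    return bytes_processed * 8 / 1e6 / cpu_seconds


def effective_rate_mbps(link_mbps: float, cpu_ceiling_mbps: float) -> float:
    """Throughput is capped by whichever resource saturates first."""
    return min(link_mbps, cpu_ceiling_mbps)


def bottleneck(link_mbps: float, cpu_ceiling_mbps: float) -> str:
    """Name the resource that limits the transfer."""
    return "network-bound" if link_mbps <= cpu_ceiling_mbps else "CPU-bound"


if __name__ == "__main__":
    link = 1000.0  # roughly what iperf3 showed between the two hosts
    # Hypothetical: a tool doing very little per-byte work has a high ceiling.
    print("raw copy:", bottleneck(link, 5000.0), effective_rate_mbps(link, 5000.0))
    # Hypothetical: a serializer that saturates one core at ~20 Mbps.
    print("serializer:", bottleneck(link, 20.0), effective_rate_mbps(link, 20.0))
```

With the same 1Gbps link, the first case is limited by the network and the second by the CPU, which matches the pattern reported above.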

A smaller contributing factor is the expansion due to erasure coding:
with the default settings of k=3/N=10, uploads push about 3.3x (N/k =
10/3) more data than a plain 'scp' or other ordinary copy. (You could set
k=1/H=1/N=1 to disable this, for testing.)
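The expansion is just N/k: each of the N shares is roughly 1/k the size of the file. The sketch below works through that arithmetic. The helper names are hypothetical, and the byte count is a rough lower bound that ignores segment padding and per-share metadata such as hash trees. In tahoe.cfg these parameters are set in the `[client]` section as `shares.needed` (k), `shares.happy` (H) and `shares.total` (N).

```python
from fractions import Fraction


def expansion_factor(k: int, n: int) -> Fraction:
    """Total bytes pushed is about N/k times the file size."""
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= N")
    return Fraction(n, k)


def approx_bytes_pushed(file_bytes: int, k: int, n: int) -> int:
    """Rough lower bound: N shares of ceil(file_bytes / k) bytes each.
    Ignores segment padding and per-share metadata (hash trees, headers)."""
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= N")
    share_bytes = -(-file_bytes // k)  # integer ceiling division
    return share_bytes * n


if __name__ == "__main__":
    size = 100 * 1024 * 1024  # a 100 MiB file
    for k, n in [(3, 10), (1, 1)]:
        print(f"k={k} N={n}: x{float(expansion_factor(k, n)):.2f}, "
              f"~{approx_bytes_pushed(size, k, n) / 2**20:.0f} MiB pushed")
```

With k=1/N=1 the factor drops to 1, which is why that setting is handy for isolating the serialization and TLS costs in a test.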

> The above makes me believe the storage node still does transit
> encryption/decryption. After a quick look at the sources, I assumed
> it uses pycryptopp for this, which in turn uses crypto++.

Tahoe uses pycryptopp (and libcrypto++) for encrypting file objects, but
the resulting encrypted shares are then sent to storage servers over
Foolscap. Foolscap uses TLS, which it gets from pyOpenSSL, which in turn
uses OpenSSL. So the data is briefly encrypted twice: once by Tahoe
itself, and again by TLS while it is on the wire.

So any compilation options that speed up crypto++ will provide
performance improvements in the "encryption/encoding" phase of an
upload, while options that speed up openssl will improve the "pushing
shares" phase. We nominally display these timings separately in the
"Recent Uploads And Downloads" status page for each upload, but since
uploads are streaming, the pushing-shares phase is not a simple
measurement (it gets kind of mushed together with other network delays).
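Before tuning OpenSSL build options on a board like the Armada 375, it can help to check which OpenSSL build is in play and get a rough feel for how fast it runs on that CPU. The sketch below is only a rough proxy under stated assumptions: `ssl.OPENSSL_VERSION` reports the library the Python stdlib `ssl` module is linked against, which may differ from the one pyOpenSSL (and hence Foolscap) uses, and `hashlib` usually, but not always, delegates SHA-256 to OpenSSL. SHA-256 speed is not the speed of the bulk cipher TLS actually negotiates; `openssl speed` gives a more direct reading of that.

```python
import hashlib
import ssl
import time


def sha256_throughput_mib_s(total_mb: int = 64, chunk_kb: int = 64) -> float:
    """Hash about total_mb MiB of zero bytes in chunk_kb KiB pieces and
    return the observed rate in MiB/s. A rough CPU-speed probe only."""
    chunk = b"\x00" * (chunk_kb * 1024)
    n_chunks = max(1, (total_mb * 1024) // chunk_kb)
    h = hashlib.sha256()
    start = time.perf_counter()
    for _ in range(n_chunks):
        h.update(chunk)
    elapsed = time.perf_counter() - start
    hashed = n_chunks * len(chunk)
    return hashed / (1024 * 1024) / elapsed


if __name__ == "__main__":
    # The OpenSSL the stdlib ssl module links against; pyOpenSSL, which
    # Foolscap uses, may be linked against a different copy.
    print("stdlib ssl linked against:", ssl.OPENSSL_VERSION)
    print(f"SHA-256 via hashlib: {sha256_throughput_mib_s():.1f} MiB/s")
```

Running this on both the gateway and the storage node gives a quick, if crude, comparison of how much crypto headroom each CPU has.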

> So the question is: is this expected behavior, or am I missing
> something needed to do it right?

I haven't characterized the performance on such fast upload links, but
yeah, I'm not surprised that we've got CPU limitations at those rates.

We've been generally planning to migrate away from Foolscap and switch
to something simpler (which could then serialize/deserialize data more
efficiently), probably something closer to plain HTTP. I expect that'll
be the quickest, most effective performance improvement we could make.

hope that helps,
 -Brian
