[Tahoe-dev] upload performance numbers, testnet upgraded (and flushed)
warner-tahoe at lothar.com
Sat Jul 14 06:18:38 UTC 2007
I've just upgraded testnet to the most recent code, and have been playing
with larger uploads (now that they're finally possible). A couple of data
points follow. First, uploading a copy of the tahoe source tree (created with
'darcs dist'),
telling the node to copy the files directly from disk, using:
time curl -T /dev/null 'http://localhost:8011/vdrive/global/tahoe?t=upload&localdir=/home/warner/tahoe'
about 4.6MB of data
upload takes 117 seconds
about 30MB consumed on the storage servers
0.3 seconds per file, 3.3 files per second
39kB per second
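The per-file figures above follow directly from the totals; a quick sketch
(the file count of roughly 390 is inferred from the quoted rates, not stated
directly in this post):

```python
# Derive the small-file upload figures from the raw totals.
total_bytes = 4.6e6   # ~4.6MB of source tree data
seconds = 117         # total upload time
nfiles = 390          # approximate file count, implied by 3.3 files/sec

print(f"{total_bytes / seconds / 1e3:.0f} kB per second")  # ~39
print(f"{seconds / nfiles:.1f} seconds per file")          # ~0.3
print(f"{nfiles / seconds:.1f} files per second")          # ~3.3
```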
With the 3-out-of-10 encoding we're now using by default, we expect a 3.3x
expansion from FEC, so we'd expect those 4.6MB to expand to 15.3MB. The 30MB
that was actually consumed (a 2x overhead) is the effect of the 4096-byte
disk blocksize, since the tahoe tree contains a number of small files.
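The blocksize effect can be illustrated with a rough model (the 4096-byte
blocksize and 3-of-10 encoding are from above; the sample file size is
hypothetical, and per-share container overhead is ignored):

```python
import math

BLOCKSIZE = 4096   # filesystem block size on the storage servers
K, N = 3, 10       # 3-out-of-10 erasure coding

def stored_bytes(filesize):
    """Approximate on-disk cost of one file: N shares, each holding
    ceil(filesize / K) bytes of FEC data, with each share rounded up
    to a whole disk block."""
    share_size = math.ceil(filesize / K)
    per_share_on_disk = math.ceil(share_size / BLOCKSIZE) * BLOCKSIZE
    return N * per_share_on_disk

# A 1000-byte file needs only ~3.3kB of share data, but each of its
# 10 shares still occupies a full 4096-byte block: 40960 bytes on
# disk, a ~41x expansion instead of the nominal 3.33x.
print(stored_bytes(1000))                      # -> 40960
# Large files amortize the rounding and stay close to 3.33x:
print(stored_bytes(45_100_000) / 45_100_000)   # -> ~3.33
```

This is why a tree full of small files lands at ~2x the nominal expansion
while a single big file does not.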
Uploading a copy of a recent linux kernel (linux-2.6.22.1.tar.bz2, 45.1MB)
tests out the large-file performance, this time sending the bytes over the
network (albeit from the same host as the node), using an actual http PUT:
time curl -T linux-2.6.22.1.tar.bz2 'http://localhost:8011/vdrive/global/big/linux-2.6.22.1.tar.bz2'
1 new directory
45.1MB of data
upload takes 44 seconds
151MB consumed on the storage servers
1.04MB per second
The 3.3x expansion of a 45.1MB file would lead us to expect 150.3MB consumed,
so the 151MB that was actually consumed is spot on.
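The expected figure is just the 10/3 encoding expansion applied to the file
size, as a sanity check:

```python
# Expected storage consumption for the 45.1MB kernel tarball
# under 3-out-of-10 encoding.
filesize_mb = 45.1
k, n = 3, 10
expected_mb = filesize_mb * n / k
print(f"{expected_mb:.1f} MB expected")  # -> 150.3 MB, vs 151MB observed
```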
Downloading the kernel image took place at 4.39MBps on the same host as the
node, and at 4.46MBps on a separate host (the introducer).
Please note that these speed numbers are somewhat unrealistic: on our
testnet, we have three storage servers running on one machine, and an
introducer/vdrive-server running on a second. Both machines live in the same
cabinet and are connected to each other by a gigabit-speed network (not that
it matters, because the introducer/vdrive-server holds minimal amounts of
data). So what we're measuring here is the speed at which a node can do FEC
and encryption, and the overhead of Foolscap's SSL link encryption, and maybe
the rate at which we can write shares to disk (although these files are small
enough that the kernel can probably buffer them entirely in memory and then
write them to disk at its leisure).
Having storage servers on separate machines would be both better and worse:
worse because the shares would have to be transmitted over an actual wire
(instead of through the loopback interface), and better because then the
storage servers wouldn't be fighting with each other for access to the shared
disk and CPU. When we get more machines to dedicate to this purpose, we'll do
some more performance testing.