[tahoe-dev] Observations on Tahoe performance
shawn at willden.org
Tue Aug 25 12:47:54 UTC 2009
On Tuesday 25 August 2009 02:59:10 am Brian Warner wrote:
> The actual call that Foolscap makes is a transport.write(), which is
> implemented in Twisted by appending the outbound data to a list and
> marking the socket as writeable (so that select() or poll() will wake up
> the process when that data can become written).
Can't you pass a disk-backed buffer-like object to transport.write()? Perhaps
an mmap object? If there's a reason that doesn't work, then
transport.write() needs to either accept a file-like object or implement
disk-based buffering itself. Expecting the data to be small enough to be
queueable in RAM isn't a good idea, even with ubiquitous virtual memory
and gigabytes of physical RAM.
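
To make the disk-backed idea concrete, here is a rough sketch using only the
Python standard library. It is not Twisted's or Foolscap's API: the helper
name iter_mmap_chunks and the 64KB chunk size are assumptions for
illustration. It maps a file and hands out one bounded slice at a time, so a
caller could feed each slice to something like transport.write() only as the
previous one drains.

```python
import mmap
import os
import tempfile


def iter_mmap_chunks(path, chunk_size=64 * 1024):
    """Yield successive byte chunks of a file via mmap (hypothetical helper).

    Only one chunk is copied into process memory at a time; the rest stays
    on disk (or in the page cache) until it is sliced.
    """
    if os.path.getsize(path) == 0:
        return  # mmap cannot map an empty file
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            for offset in range(0, len(m), chunk_size):
                yield m[offset:offset + chunk_size]


def demo():
    # Write 200 KiB to a temp file, then read it back in 64 KiB chunks.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"x" * (200 * 1024))
        path = tmp.name
    try:
        return [len(chunk) for chunk in iter_mmap_chunks(path)]
    finally:
        os.remove(path)
```

Calling demo() returns the chunk lengths, which shows that no single read
exceeds chunk_size regardless of how large the file is.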
> So the kernel will consume 64KB, and the transport's list (in
> userspace/python/Twisted) will consume N/k*1GB. Badness.
> Whereas, if we just put off creating later segments until the earlier
> ones have been retired, we don't consume more than a segment's worth of
> memory at any one time.
Another advantage of delaying segment creation is that it will be necessary
for (someday) streaming uploads, which are an important feature, IMO. But,
as you mentioned with the 1.5.0 improvements, it is important to pipeline the
process and ensure that you always have enough buffered up to keep the
sending socket busy.
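
A minimal sketch of that trade-off, in plain Python rather than Twisted: create
segments lazily, but keep a small fixed number in flight so the sender never
goes idle. The names make_segments, pipelined_send and the depth parameter are
hypothetical, and send() is assumed to return a callable that waits for the
segment to be retired. Real Tahoe code would do this with Deferreds instead of
blocking calls.

```python
from collections import deque


def make_segments(total_size, segment_size):
    """Lazily yield (index, size) descriptors; a stand-in for segment data."""
    index = 0
    remaining = total_size
    while remaining > 0:
        size = min(segment_size, remaining)
        yield index, size
        index += 1
        remaining -= size


def pipelined_send(segments, send, depth=2):
    """Keep at most `depth` segments created but not yet retired.

    `send(segment)` is a hypothetical callable that starts a transfer and
    returns a zero-argument function that waits until it has been retired.
    Returns the peak number of segments that were in flight.
    """
    in_flight = deque()
    peak = 0
    source = iter(segments)
    while True:
        # Retire the oldest segment before creating the next one.
        if len(in_flight) >= depth:
            in_flight.popleft()()
        segment = next(source, None)  # segments are tuples, never None
        if segment is None:
            break
        in_flight.append(send(segment))
        peak = max(peak, len(in_flight))
    while in_flight:
        in_flight.popleft()()  # drain the tail
    return peak


def demo():
    sent, retired = [], []

    def send(segment):
        sent.append(segment)
        return lambda: retired.append(segment)

    peak = pipelined_send(make_segments(10, 4), send, depth=2)
    return peak, sent, retired
```

With depth=2, memory use is bounded by two segments however large the file
is, while one segment is always queued behind the one being sent.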
> We've always had low-memory-footprint as a goal
> for Tahoe, especially since the previous codebase which it replaced
I think that's a very important goal, especially since my strategy for making
GridBackup usable in homes without always-on desktop computers is to use home
router/access point devices as storage nodes. Physical RAM is still at a
premium on such devices.