[tahoe-dev] Uploading huge files

Zooko O'Whielacronx zooko at zooko.com
Thu Jan 27 07:04:54 UTC 2011


Shawn's and Greg's suggestions were excellent ones. In addition to
those, go to the welcome page of your gateway (e.g.
http://127.0.0.1:3456 by default), click on "Recent Uploads and
Downloads", and click on the upload event to see progress and
performance metrics.

Note that the first stage of the upload is transferring the entire
contents of the file from your client (the "tahoe" command-line tool,
a web browser, or in your case an (S)FTP client) to the gateway. The
upload doesn't appear in the "Recent Uploads and Downloads" list until
that first stage is complete and the entire contents of the file are
in the hands of the gateway.

If it never appears in the "Recent Uploads and Downloads" list then
perhaps it is failing before it reaches that stage. Maybe the gateway
is running out of temporary disk space (it stores the entire file in
temporary disk space while uploading it). Can FileZilla give you a
progress report telling you how much of the file has been uploaded?
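
One quick way to test the temp-space theory is to compare the size of
the file with the free space on the filesystem that holds the
gateway's temporary directory. Here is a minimal sketch in Python, to
be run on the gateway host. The helper name `has_room_for_upload` is
hypothetical (it is not part of Tahoe-LAFS), and it assumes the
gateway uses the system default temp directory, which may not match
your configuration:

```python
import shutil
import tempfile


def has_room_for_upload(file_size, temp_dir=None, free_bytes=None):
    """Return True if temp_dir has at least file_size bytes free.

    Hypothetical helper, not part of Tahoe-LAFS. temp_dir defaults to
    the system temp directory, which is only an assumption about where
    the gateway spools incoming files. free_bytes may be passed
    explicitly to skip the disk query.
    """
    if free_bytes is None:
        if temp_dir is None:
            temp_dir = tempfile.gettempdir()
        # disk_usage returns a (total, used, free) tuple in bytes.
        free_bytes = shutil.disk_usage(temp_dir).free
    return free_bytes >= file_size
```

Calling it with `os.path.getsize(...)` of the file you are trying to
upload tells you whether the gateway could hold the whole file at
once. A False result would point at temp space as the likely culprit.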

Please report back and let us know what you learn about this issue.

By the way, the fact that the gateway has to receive the entire
contents of the file from the client before it can begin uploading is
a limitation that I don't like. The advantage of doing it this way is
that the gateway can then compute the secure hash of the entire file
before it begins uploading. If the same file contents are already
stored on the storage servers then the gateway will short-circuit and
not re-upload the data. This is file-level deduplication and it is
great when that's what you want -- when you try to upload a large file
that has already been uploaded and your gateway is closer to your
client than the storage servers (or helper) are. (Ideally the gateway
should run on the same host as the client.)
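
To illustrate why the whole file has to arrive first, here is a
minimal sketch of file-level deduplication in Python. It is not the
actual Tahoe-LAFS algorithm: the real gateway uses convergent
encryption with a convergence secret, while this sketch uses a plain
SHA-256 digest and a dict standing in for the storage servers. The
names `file_digest`, `upload_with_dedup`, `stored`, and `do_upload`
are all hypothetical:

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Hash the whole file in chunks so it need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def upload_with_dedup(path, stored, do_upload):
    """Upload path unless identical contents are already stored.

    stored:    dict mapping digest -> identifier of the stored copy
               (a stand-in for asking the storage servers)
    do_upload: callable taking a path and returning an identifier
    Returns (identifier, uploaded); uploaded is False on a dedup hit.
    """
    # The digest is only known after the last byte has been read,
    # which is why the complete file must be on hand before this check.
    digest = file_digest(path)
    if digest in stored:
        return stored[digest], False
    stored[digest] = do_upload(path)
    return stored[digest], True
```

The short-circuit happens at the `if digest in stored` check. A
streaming uploader could not make that check up front, because it
would start sending data before the digest is known.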

The disadvantages are that it potentially takes much longer to upload
large files when they are *not* already stored, it adds failure modes
(as we see here--you have to investigate using multiple tools to find
out where in the process it is failing or taking too long), and it
requires temp space on the gateway, which precludes running the
gateway on a small router.

The main ticket to support streaming upload through the web gateway is
#320. The ticket to support streaming upload through the SFTP
interface is #1288.

Regards,

Zooko

http://tahoe-lafs.org/trac/tahoe-lafs/ticket/320# add streaming
(on-line) upload to HTTP interface
http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1288# support streaming
uploads in uploader


