[tahoe-dev] new Helper protocol, new upload results

Brian Warner warner-tahoe at allmydata.com
Wed Feb 6 09:36:00 UTC 2008


I've pushed a change which modifies the Helper protocol, so if you're
currently using a helper.furl, you need to update your tree and switch to a
new helper.furl. I've shut down the old helper ([6fyx] on port 58288) to
avoid confusion. The new helper.furl is:

 pb://basf4onby5p65zllbozg7te7lvahzrzd@tahoebs1.allmydata.com:41491/3xdkbraznpmqml6bi25n2lslbnvosan6

The benefit of the new Helper protocol is that it delivers more information
to the uploading client, specifically timing information. The
human+browser-facing tahoe web interface (lovingly known as the "wui", to
distinguish it from the RESTful program-facing web interface known as the
"wapi" that lives in the same server) has a "welcome page" which was improved
earlier today to display a bunch of information about the currently connected
peers (this was the outcome of the new Introducer protocol). It's been updated
again just a few hours ago, to add some more HTML forms at the bottom for
uploading "unlinked" files: files that exist in the grid (with a URI) but not
yet in any vdrive.

And, as of a few minutes ago, those upload-unlinked forms now land you on a
brand new "upload results" page. This page tells you lots of details about
the upload you just finished: which share went to which peer, which peer got
which shares, and lots of information about how long the various phases of
the upload took.

As an example, here's an upload of a 20MB file to a local test grid running
entirely on my workstation:

 Uploading File... done!
 Upload Results:
 
     * URI: URI:CHK:3smoxi6n19ntkfaczabct6d5gy:98oprqqru4hwuunmpxnr7shersq5u76rcqah9pwtrujfrie455oo:3:10:20401521
     * Download link: /uri/URI:CHK:3smoxi6n19ntkfaczabct6d5gy:98oprqqru4hwuunmpxnr7shersq5u76rcqah9pwtrujfrie455oo:3:10:20401521
     * Sharemap:
           o 0 -> Placed on [62ubehyu]
           o 1 -> Placed on [5yyqu2hb]
           o 2 -> Placed on [onjqtb3j]
           o 3 -> Placed on [vb7vm2mn]
           o 4 -> Placed on [dmnkfosf]
           o 5 -> Placed on [62ubehyu]
           o 6 -> Placed on [5yyqu2hb]
           o 7 -> Placed on [onjqtb3j]
           o 8 -> Placed on [vb7vm2mn]
           o 9 -> Placed on [dmnkfosf]
     * Servermap:
           o [dmnkfosf] got shares: 9,4
           o [onjqtb3j] got shares: 2,7
           o [vb7vm2mn] got shares: 8,3
           o [5yyqu2hb] got shares: 1,6
           o [62ubehyu] got shares: 0,5
     * Timings:
           o File Size: 20401521 bytes
           o Total: 18.35s (1.11MBps)
                 + Storage Index: 280ms (72.74MBps)
                 + [Contacting Helper]:
                 + [Helper Already-In-Grid Check]:
                 + [Upload Ciphertext To Helper]: ()
                 + [Helper Total]:
                 + Peer Selection: 360ms
                 + Encode And Push: 17.70s (1.17MBps)
                       # Cumulative Encoding: 2.22s (9.19MBps)
                       # Cumulative Pushing: 15.18s (1.34MBps)
                       # Send Hashes And Close: 163ms
 
 Return to the Welcome Page


This page is designed to display data for both Helper-assisted uploads and
non-assisted uploads. Some of the fields are only meaningful when a helper is
in use, and I don't yet know how to turn them off, which is why you see some
empty entries there.
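
The Sharemap and Servermap sections of the results page are two views of the
same placement data: one keyed by share number, the other by peer. As a rough
sketch (the function name invert_sharemap is made up for illustration and is
not taken from the Tahoe source), the second view can be derived from the
first like this, using the peer ids from the example upload:

```python
from collections import defaultdict

# Share number -> short peer id, copied from the Sharemap in the example.
sharemap = {
    0: "62ubehyu", 1: "5yyqu2hb", 2: "onjqtb3j", 3: "vb7vm2mn", 4: "dmnkfosf",
    5: "62ubehyu", 6: "5yyqu2hb", 7: "onjqtb3j", 8: "vb7vm2mn", 9: "dmnkfosf",
}

def invert_sharemap(sharemap):
    """Group share numbers by the peer that holds them (hypothetical helper)."""
    servermap = defaultdict(set)
    for shnum, peerid in sharemap.items():
        servermap[peerid].add(shnum)
    return dict(servermap)

servermap = invert_sharemap(sharemap)
for peerid in sorted(servermap):
    shares = ",".join(str(s) for s in sorted(servermap[peerid]))
    print("[%s] got shares: %s" % (peerid, shares))
```

Each of the five peers ends up holding two of the ten shares, which matches
the Servermap listing in the example.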

The "Storage Index" phase is the initial hashing to compute the CHK
encryption key. Then "Peer Selection" involves a bunch of roundtrips to
locate storage servers who are willing to hold our shares. "Encode And Push"
is the bulk of the upload work: we alternate between encoding a segment and
pushing the resulting shares to the servers. Then, once we've sent all the
shares, we send out a bunch of final hashes and finalize the buckets.
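
To make the "Cumulative Encoding" and "Cumulative Pushing" lines a bit more
concrete, here is a minimal sketch of an alternating encode/push loop that
accumulates time separately for each activity. The encode and push callables
are toy stand-ins, not Tahoe's erasure coder or network layer, and the real
uploader is asynchronous, so treat this only as an illustration of how the two
cumulative figures could be gathered:

```python
import time

def encode_and_push(segments, encode, push, clock=time.perf_counter):
    """Alternate encode/push per segment, accumulating time per activity.

    `encode` and `push` are hypothetical callables supplied by the caller.
    """
    timings = {"cumulative_encoding": 0.0, "cumulative_pushing": 0.0}
    for segment in segments:
        start = clock()
        shares = encode(segment)      # turn one segment into shares
        mid = clock()
        push(shares)                  # send those shares to the servers
        end = clock()
        timings["cumulative_encoding"] += mid - start
        timings["cumulative_pushing"] += end - mid
    return timings

# Toy stand-ins: "encoding" cuts a segment into 3 interleaved pieces,
# "pushing" just records them in a list.
pushed = []

def toy_encode(segment):
    return [segment[i::3] for i in range(3)]

def toy_push(shares):
    pushed.append(shares)

timings = encode_and_push([b"abcdef", b"ghijkl"], toy_encode, toy_push)
print(timings)
```

Passing the clock in as a parameter is only a convenience so the loop can be
exercised with a fake clock.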

The upload report includes rates (in bytes per second) for all phases that
operate on a significant amount of data (usually the whole file). This should
give a good idea of the relative cost of each phase. The phases shown without
a rate (such as Peer Selection) are dominated by network round-trip times
rather than bulk data processing.
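
As a sanity check on how such a rate can be derived, here is a small sketch
that divides the file size by the elapsed time and picks a unit. The function
name format_rate and the decimal thresholds (1000 and 1000000) are my
assumptions for illustration; the page's actual formatting code may differ.

```python
def format_rate(size_bytes, seconds):
    """Return a rate string such as '1.11MBps' (decimal units assumed)."""
    if not seconds:
        return ""                      # no timing available: leave it blank
    rate = size_bytes / seconds        # bytes per second
    if rate > 1_000_000:
        return "%.2fMBps" % (rate / 1_000_000)
    if rate > 1_000:
        return "%.1fkBps" % (rate / 1_000)
    return "%dBps" % rate

print(format_rate(20401521, 18.35))    # "Total" of the non-helper upload
print(format_rate(20401521, 195.05))   # "Total" of the helper upload
```

With the file size and "Total" times quoted in this message, this prints
1.11MBps and 104.6kBps, which agrees with the two reports.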

(Note that the Storage Index calculation involves both reading the file off
disk and performing a SHA-256 hash of its contents. I suspect that my test
above already had the whole file in cache.)
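
To show why this phase costs one full read plus one full hash of the file,
here is a simplified sketch that streams a file through SHA-256 in chunks.
This is not Tahoe's actual key derivation (which I am not reproducing here);
the function name hash_file and the 64kB chunk size are assumptions chosen
for the example.

```python
import hashlib
import os
import tempfile

def hash_file(path, chunk_size=64 * 1024):
    """Stream a file through SHA-256 without loading it all into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.digest()

# Small demonstration on a temporary file.
data = b"tahoe" * 100000
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
    path = tmp.name
try:
    digest = hash_file(path)
finally:
    os.remove(path)

print(digest.hex())
```

A second run over the same file will usually be served from the OS cache, so
it mostly measures hashing speed rather than disk speed.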


When the Helper is in use, we get a timing report like this:
 
 # Timings:
 
     * File Size: 20401521 bytes
     * Total: 195.05s (104.6kBps)
           o Storage Index: 330ms (61.75MBps)
           o [Contacting Helper]: 202ms
                 + [Helper Already-In-Grid Check]: 103ms
           o [Upload Ciphertext To Helper]: 188.84s (108.0kBps)
           o Peer Selection: 84ms
           o Encode And Push: 5.50s (3.95MBps)
                 + Cumulative Encoding: 948ms (21.51MBps)
                 + Cumulative Pushing: 4.22s (4.84MBps)
                 + Send Hashes And Close: 147ms
           o [Helper Total]: 194.53s

In this case, the "Contacting Helper" phase includes both roundtrips to the
helper and the helper's "is this file already in the grid" check (which
involves roundtrips to a number of storage servers). If the helper indicates
that the file isn't already present in the grid, we must perform the "Upload
Ciphertext To Helper" phase. Once the ciphertext is on the helper, the upload
runs as usual (with encode and push). The "Helper Total" time measures
everything after the storage index calculation.
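
The order of phases described above can be sketched as follows. FakeHelper
and upload_via_helper are made-up names for illustration only; they do not
reflect the real Helper protocol messages or method names, just the control
flow: check first, upload ciphertext only if needed, then let the helper
encode and push.

```python
class FakeHelper:
    """Hypothetical in-memory stand-in for a remote helper."""

    def __init__(self, known_storage_indexes=()):
        self.known = set(known_storage_indexes)
        self.ciphertext = {}

    def already_in_grid(self, storage_index):
        return storage_index in self.known

    def upload_ciphertext(self, storage_index, ciphertext):
        self.ciphertext[storage_index] = ciphertext

    def encode_and_push(self, storage_index):
        # Stands in for peer selection, encoding and pushing on the helper.
        self.known.add(storage_index)


def upload_via_helper(helper, storage_index, ciphertext):
    """Return the list of phases that actually ran, in order."""
    phases = ["contacting_helper"]         # includes the already-in-grid check
    if helper.already_in_grid(storage_index):
        return phases                      # nothing needs to be sent
    helper.upload_ciphertext(storage_index, ciphertext)
    phases.append("upload_ciphertext_to_helper")
    helper.encode_and_push(storage_index)
    phases.append("encode_and_push")
    return phases


helper = FakeHelper()
first = upload_via_helper(helper, "si-1", b"ciphertext bytes")
second = upload_via_helper(helper, "si-1", b"ciphertext bytes")
print(first)
print(second)
```

In this toy model the second upload of the same file stops after the
already-in-grid check, since the helper reports the file as present.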


I plan to add a bit more data to this page, specifically to provide some
nicknames on the server peerids (so you can see exactly how much of your file
lives on Alice's or Bob's server). I also plan to add a similar page for
mutable files.

Finally, I plan to add a virtually identical page for uploads that are
performed with a vdrive directory's "Upload" button: at the moment, this page
is only displayed for "unlinked" uploads. Note that this makes the POST
t=upload interface more human-oriented than machine-oriented: we may need to
think a bit about the various JavaScript frontends that might use POST
instead of PUT. (PUT will always be purely for programmatic frontends, but we
can provide both human-centric and machine-centric POST interfaces if
necessary).


cheers,
 -Brian


