[tahoe-dev] how to squeeze a CHK URI

Brian Warner warner at allmydata.com
Mon Sep 24 20:27:16 UTC 2007


> I know that the encoding parameters formerly had to be in the URI because
> we used them (along with the storage index) for locating shares in the
> "tahoe3" peer selection algorithm [1].

Right. Now that we use Tahoe2, we don't strictly need those in the URI. The
download algorithm would be to crawl the ring until you find any peer with
the storage index, grab the UEB from them, pull k from the UEB, then crawl to
find k-1 additional peers.
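
A minimal sketch of that crawl, for illustration only. It assumes the ring order is a per-file permutation obtained by hashing peerid plus storage index, and it models each peer as an in-memory dict; `permuted_peers`, `find_share_holders`, and the `{"ueb": {"k": ...}}` layout are hypothetical names, not the real Tahoe code.

```python
import hashlib

def permuted_peers(peer_ids, storage_index):
    # Assumption: the ring is ordered by hash(peerid + storage_index).
    return sorted(peer_ids,
                  key=lambda p: hashlib.sha256(p + storage_index).digest())

def find_share_holders(peers, storage_index):
    """peers maps peerid (bytes) -> {storage_index: {"ueb": {"k": int}}}.

    Walk the ring until some peer has the storage index, read k from
    its UEB, then keep walking until k holders are known in total.
    Returns (k, holders, queries); k is None if no peer had a share.
    """
    k = None
    holders = []
    queries = 0
    for peerid in permuted_peers(list(peers), storage_index):
        queries += 1
        share = peers[peerid].get(storage_index)
        if share is None:
            continue
        if k is None:
            # The first hit tells us k; nothing in the URI was needed.
            k = share["ueb"]["k"]
        holders.append(peerid)
        if len(holders) == k:
            break
    return k, holders, queries
```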

This would put a limit on the amount of parallelism we could achieve during
peer selection, adding an extra roundtrip or two, but we could probably deal
with some of that by speculatively contacting 5 peers in the hopes that k<=5.
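
To make the roundtrip cost concrete, here is a toy counter, assuming each round queries a batch of peers in parallel and that k becomes known as soon as any queried peer returns a UEB. The function name and the list-of-booleans model of the ring are made up for this sketch.

```python
def download_rounds(ring, k, first_batch=5):
    """ring: list of bools in crawl order, True if that peer has a share.

    Returns (rounds, success). first_batch=1 models the plain crawl,
    first_batch=5 models speculatively contacting 5 peers up front.
    """
    found = 0
    pos = 0
    rounds = 0
    batch = first_batch
    while found < k and pos < len(ring):
        rounds += 1
        answers = ring[pos:pos + batch]
        pos += len(answers)
        found += sum(answers)
        if found > 0:
            # k is known now, so ask only for the shares still missing.
            batch = k - found
        # Otherwise k is still unknown: retry with the same batch size.
    return rounds, found >= k
```

With every peer holding a share and k=3, the plain crawl takes two rounds (one to learn k, one to fetch the other two), while a first batch of 5 finishes in one. If k turns out to be 7, the speculative version still needs a second round.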

If we want room to use a different peer selection algorithm that *was*
sensitive to k and/or N, we'd need that information in the URI. Without it,
for something like Tahoe3, we'd start at 0 and try to find the first peer
(which would always be in the same place), but if that peer were missing then
in a large mesh we wouldn't have much of a hope of finding a peer.

I'm ok with removing k+N from the URI, but it will involve some code changes,
and the values are pretty small.

> I vaguely recall that Brian wanted filesize in the URI in order to make it
> easier for some kind of user to know the filesize without having to fetch a
> URI extension block, but now I'm not sure if that is correct.

> Brian: why do we have filesize in the URI?

The main reason was so that a "manifest" (derived from the read-capabilities
of all the files and directories that you can reach) could provide an easy
(but non-secure) way to measure how much space you're consuming. This is the
sort of thing that we can optimistically believe and then double-check later,
by checking on a random sample of them and pulling their UEBs (which is where
a harder-to-fake size exists).
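
The believe-then-verify idea could look roughly like this. It is a sketch that assumes the manifest is a list of (uri, claimed_size) pairs and that some caller-supplied function can fetch the size recorded in a file's UEB; both helper names are hypothetical.

```python
import random

def claimed_total(manifest):
    # Cheap but non-secure: just add up the sizes embedded in the URIs.
    return sum(size for _uri, size in manifest)

def spot_check(manifest, fetch_ueb_size, sample_size, rng=random):
    """Return the URIs in a random sample whose claimed size disagrees
    with the size found in the UEB (fetch_ueb_size is supplied by the
    caller and is assumed to do the expensive network fetch)."""
    sample = rng.sample(manifest, min(sample_size, len(manifest)))
    return [uri for uri, size in sample if fetch_ueb_size(uri) != size]
```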

Some of the proposals to remove information from the URI depend upon being
able to compute that information from other values, like the filesize.

Overall I'm +0 on keeping the filesize in the URI: it makes certain things a
lot easier (like displaying filesize in a directory listing, although we
could copy the filesize into the dirnode edge instead).

> Yet another way to squeeze URIs is to remove the scheme and  
> separators, leaving just the uncompressible bits:

Remember that we have dirnode URIs, and we'll eventually have SSK URIs, so we
need to leave some bits for uri-type and of course a version number. These
don't strictly need to be human-readable, but it might be nice to be able to
tell the difference between a directory and a file just by looking at the
URI.
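
One way to picture this: a purely hypothetical compact layout in which a single leading letter gives the URI type, a single digit gives the version, and everything after that is the base32 of the incompressible bits. None of these tags or the layout come from the actual Tahoe URI format; they are assumptions made for the sketch.

```python
import base64

# Hypothetical type tags, chosen only so a human can tell them apart.
TYPES = {"F": "file (CHK)", "D": "directory", "S": "SSK"}

def pack(kind, version, secret_bits):
    if kind not in TYPES or not 0 <= version <= 9:
        raise ValueError("unknown type or version out of range")
    body = base64.b32encode(secret_bits).decode("ascii").rstrip("=").lower()
    return f"{kind}{version}{body}"

def unpack(uri):
    kind, version, body = uri[0], int(uri[1]), uri[2:]
    if kind not in TYPES:
        raise ValueError("unknown type")
    # Restore the base32 padding that pack() stripped.
    padded = body.upper() + "=" * (-len(body) % 8)
    return kind, version, base64.b32decode(padded)
```

Two characters of overhead are enough here to distinguish a directory from a file at a glance while leaving the rest of the string as dense as base32 allows.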


cheers,
 -Brian


