[tahoe-dev] An idea for share placement

Wed Aug 19 21:59:17 UTC 2009

Shawn Willden wrote:

> The pathological case is that of a large grid and a file with less
> than K shares remaining. In that case, the client will query every
> server in the grid trying to find its shares.

Yeah, the downloader has to make a decision, about how much work it is
willing to do, versus the chance that that work do any good. We can try
to put more information out there (either statically in the filecap, or
dynamically on the servers) to help it make a better choice, but sooner
or later it will have to decide.

Part of this is a UI problem. Like our discussion about having uploads
fail if they can't be made reliable enough, perhaps the download API
should include a "how hard to try" parameter. The mutable-file API has
this, sort of (if you pass MODE_CHECK to the mapupdate step, it will ask
every single server, but the normal MODE_READ gives up after it sees
some number of servers without a copy). It's hard to imagine the UI for
this, though: "Dear user, I decided to stop looking for your precious
file, even though there are still 1.2 million servers I could ask,
because I don't want to do that much work. Sorry."  :-).

> One solution that has been suggested is to give each server holding a
> share a list of the other servers holding shares. That way, if a
> client can find just one share, it will know exactly where to look for
> the rest.

To be precise, it will have a much shorter list of where to look:
there's no guarantee that those hints will be up-to-date (particularly
if a repairer made some new shares because server X was offline, but was
unable to update the hints on server X, and then X comes back online).
It's all about increasing the odds that the servers at the top of your
list will have shares, to reduce the average search distance.

> That's good, but it doesn't address the pathological case. For a large
> grid, the client needs a way to know when to stop looking.
> 
> It occurs to me that perhaps we could add a little bit of information
> to the URI which tells the client how far down the list it must go
> before it should give up, and do this in a way that is stable in the
> presence of new nodes.

Keep in mind that this is all probabilistic. If you want to improve
reliability by allowing a Repairer complete freedom in where to place
new shares, there will be no such thing as a hard stop, or complete
stability in the presence of new nodes.

Ideally, there's a magical Repairer that runs constantly and
instantaneously, so all shares are always at the top of the search list.
But it's also quite likely that there is no repairer, and it would be
nice if grid growth didn't cause undamaged files to become
unretrieveable.

> I'm sure there are many ways to accomplish this, but the one that
> comes to my mind is simple hamming distance.

When we last kicked this idea around the office (maybe a year ago), the
working idea was to simply record an abbreviated prefix of the permuted
value of the last server that received a share (in fact, record a
logarithm of that value). Basically this would be the distance around
the permuted ring. If all N servers accept shares, and there are G
servers in the grid, then (on average) the last server will have a
permuted value of (2**256)*(N/G). This is equivalent to recording the
full serverid of the last server, except that you can approximate it
with fewer bits, probably 8-16. You'd effectively be recording the
percentage of the grid which was used during upload, which can be a good
proxy for the percentage of the grid that can be usefully explored
during download.

The nice thing about this metric is that, since new servers tend to get
added uniformly to the ring, the last server will still remain at about
the same place. If the grid grows (servers are added but not deleted)
and no repair happens, the percentage will stay the same: the downloader
will query more servers but will still stop at the right place. If the
grid changes (servers are added and deleted in equal numbers), the
downloader will query the same number of servers and still stop at the
right place. If the grid shrinks, the downloader will stop too early
(but presumeably it will still try at least N*epsilon servers, and can
be configured to only stop if it sees N*epsilon empty servers in a row
that are all past the cutoff point, and the problem is easier with a
smaller grid anyways).

If repair happens, the repairer will try to concentrate the shares at
the beginning of the permuted list, and if the grid has changed size,
then the static filecap's percentage indicator will become wrong. If the
grid grows, the percentage will be pessimistic (the repaired shares will
be clustered at the beginning of the list, but the percentage marker
will cover far more servers than necessary, but that's ok). If the
servers come and go but the overall grid size remains the same, then the
percentage continues to be correct. If the grid shrinks, again, the
downloader might stop too early.

I guess my issue with using a hamming distance is 1: it would require a
significant change to the way we currently permute the peerlist, and 2:
I don't have an intuitive grasp of how it behaves. (I have the same
problem every year when I try to remember how Kademlia's XOR metric
works.. I usually manage it, but then the understanding fades away). If
we sorted the peerlist by hamming distance, we'd need to use something
else as a secondary criteria, because the range of the hamming distance
function would be limited to 0..256, right? I like the load-balancing
properties of our permutation scheme.. I'd want to make sure any new
approach had similar properties.

cheers,
 -Brian