[tahoe-dev] Faster share retrieval by using probabilistic set-testing ?

David-Sarah Hopwood david-sarah at jacaranda.org
Mon Mar 5 16:16:35 UTC 2012


On 05/03/12 07:30, Johannes wrote:
> As far as I understand, if a k/N share is retrieved, the client has to
> ask each of at least k storage nodes whether they possess matching
> shares. Now, the simplest way is just to look whether a file name with
> the share key exists, which seem to happen above. This consumes a little
> bit time, as disk access is relatively slow.
>
> As a possible alternative, there exists an efficient algorithm 
> called "Bloom Filter" which is able to tell very fast whether a set
> member is *probably* there, this works by implementing a probabilistic
> set which can be held in memory and is much smaller than the actual
> list of shares.

Thanks for the suggestion, but I don't think the disk access time here is
likely to be significant relative to communication latency. Also remember
that the storage directory contents may be cached by the OS.

At some point we're probably going to change to using a database to keep
track of share metadata. Then, the metadata will be even more likely to
be cached for reads.

-- 
David-Sarah Hopwood ⚥

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 554 bytes
Desc: OpenPGP digital signature
URL: <http://tahoe-lafs.org/pipermail/tahoe-dev/attachments/20120305/dedf610c/attachment.asc>


More information about the tahoe-dev mailing list