[tahoe-dev] Question about convergence keys

Jeremy Fitzhardinge jeremy at goop.org
Wed Aug 13 02:19:24 UTC 2008

Brian Warner wrote:
> However, the null key is pretty guessable, so you're effectively allowing the
> whole world to participate in your "convergence domain".
> As zooko described elsewhere, the convergence domain is the set of people
> with whom you share two properties:
>  1: your uploads will converge with theirs, allowing you to save backend
>     storage space and bandwidth when uploading identical files
>  2: the other people will be able to mount a partial-information guessing
>     attack against your files: the public information about your uploaded
>     file (like the storage index)[1] will reduce the work they need to do

Yes, that's acceptable in the use case I'm considering.  Basically, I'm 
thinking that in the population of users we'd be dealing with, there 
would be a high likelihood of a large amount of shared content: OS 
installs, common media, etc.  That is, the data itself is probably 
public anyway.

I guess if you want to store a mixture of small, really confidential data 
and large, semi-confidential or public data, then you'd create two nodes 
with distinct convergence keys.  Or is there some more subtle way of 
achieving the same result?

> Also note that convergence is not necessarily as big a win as you might want.
> If both Alice and Bob have a bunch of identical files on their disk and are
> uploading them, then yeah, but in some quick tests on allmydata customer data
> we found the space savings to be less than 1%. You might want to do some
> tests first (hash all your files, have your friends do the same, measure the
> overlap) before worrying about sharing convergence secrets.

Yes, that would be an interesting experiment to perform anyway.
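
A rough sketch of how that measurement could be done, assuming plain
SHA-256 over whole files is a good enough stand-in for "identical file".
The function names hash_tree() and overlap_fraction() are made up for
illustration and are not part of Tahoe.  Each participant would run
hash_tree() over their own data and exchange the resulting tables:

```python
import hashlib
import os


def hash_tree(root):
    """Map SHA-256 hex digest -> size in bytes for each regular file under root.

    Identical files within one tree collapse to a single entry, which is
    also how they would converge on upload.
    """
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks, sockets, devices, etc.
            h = hashlib.sha256()
            size = 0
            try:
                with open(path, "rb") as f:
                    for chunk in iter(lambda: f.read(1 << 20), b""):
                        h.update(chunk)
                        size += len(chunk)
            except OSError:
                continue  # unreadable file: leave it out of the sample
            digests[h.hexdigest()] = size
    return digests


def overlap_fraction(mine, theirs):
    """Fraction of my (deduplicated) bytes whose content also appears in theirs."""
    total = sum(mine.values())
    if total == 0:
        return 0.0
    shared = sum(size for digest, size in mine.items() if digest in theirs)
    return shared / total
```

Note that exchanging the hash tables reveals which well-known files each
person holds, so it is only appropriate among people you would be willing
to share a convergence domain with anyway.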

> [2]: the actual specification is in allmydata/util/hashutil.py:132, in the
>      convergence_hash() function, and is:
>       t = "allmydata_immutable_content_to_key_with_added_secret_v1+"
>       t += netstring(convergence_secret)
>       t += netstring("%d,%d,%d" % (k,N,segsize))
>       return SHA256d(netstring(t) + file)
>      We use netstrings and SHA256d (instead of plain SHA256) to avoid "chosen
>      protocol attacks", which would allow two different files to wind up with
>      the same hash.

OK, that's what I was hoping.  The key isn't exactly the file hash, so 
knowing the bare file hash doesn't let you decrypt the file.
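
To check my understanding, here is a minimal Python 3 sketch that follows
the quoted pseudocode line by line.  It is not the actual hashutil.py
code: the helper names netstring(), sha256d() and convergence_key() are
my own, SHA256d is assumed to mean SHA-256 applied twice, and any
post-processing the real function does on the result is not shown.

```python
import hashlib


def netstring(s: bytes) -> bytes:
    """Encode s as a netstring: b'<length>:<bytes>,'."""
    return b"%d:%s," % (len(s), s)


def sha256d(data: bytes) -> bytes:
    """SHA-256 applied twice (assumed meaning of SHA256d)."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def convergence_key(secret: bytes, k: int, n: int, segsize: int,
                    data: bytes) -> bytes:
    """Derive a key from the file contents, the encoding parameters and
    the convergence secret, mirroring the quoted pseudocode."""
    t = b"allmydata_immutable_content_to_key_with_added_secret_v1+"
    t += netstring(secret)
    t += netstring(b"%d,%d,%d" % (k, n, segsize))
    return sha256d(netstring(t) + data)


# Illustrative values only: the file contents, secrets and encoding
# parameters below are made up for this example.
data = b"example file contents"
key_null = convergence_key(b"", 3, 10, 131072, data)
key_private = convergence_key(b"my private secret", 3, 10, 131072, data)
bare_hash = hashlib.sha256(data).digest()
```

Here key_null, key_private and bare_hash all differ, so two nodes with
distinct convergence secrets would not converge on the same file, and the
bare file hash on its own is not the key.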

