[tahoe-dev] Question about convergence keys

Jeremy Fitzhardinge jeremy at goop.org
Wed Aug 13 02:40:32 UTC 2008


zooko wrote:
> I'm glad to hear that you are experimenting with Tahoe.  Please do
> keep us informed of your impressions of it.

A couple of things spring to mind from looking at it, and I see that
they're things you've already thought about:

More comprehensive file metadata.  I'd like to do a semantically 
complete backup of unix/linux/macos, so I'd like to capture all the 
normal unix metadata, plus extended attributes such as those used by 
selinux.  http://allmydata.org/trac/tahoe/ticket/117 discusses it, but 
it looks like the current state is considered "good enough".  Is there 
anyone working on a more general metadata model?
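
As a rough sketch of the kind of per-file record I have in mind, in
Python.  The function name and the dict layout are made up for
illustration and are not anything Tahoe defines; the xattr calls only
exist on some platforms (Linux), so the sketch skips them elsewhere.

```python
import os
import stat


def capture_metadata(path):
    """Collect basic unix metadata plus extended attributes for one path.

    Hypothetical helper for illustration only; not part of Tahoe.
    """
    st = os.lstat(path)  # lstat: describe a symlink itself, not its target
    meta = {
        "type": stat.S_IFMT(st.st_mode),   # file type bits (regular, dir, ...)
        "mode": stat.S_IMODE(st.st_mode),  # permission bits
        "uid": st.st_uid,
        "gid": st.st_gid,
        "size": st.st_size,
        "mtime_ns": st.st_mtime_ns,
        "xattrs": {},
    }
    # os.listxattr/os.getxattr are only available on some platforms.
    if hasattr(os, "listxattr"):
        try:
            for name in os.listxattr(path, follow_symlinks=False):
                meta["xattrs"][name] = os.getxattr(
                    path, name, follow_symlinks=False
                )
        except OSError:
            # The filesystem may not support xattrs, or access is denied.
            pass
    return meta
```

On an selinux system the security label would show up here as the
"security.selinux" attribute.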

More awareness of network topology.  If I have a local LAN of machines 
which are participating in a grid, I'd like them to prefer to replicate 
to each other, and have a single, better-connected node responsible for 
syncing off-site.  I had two ideas for using zeroconf/bonjour/avahi:

   1. Use it to discover the correct upload helper for a given network. 
      No one upload helper is necessarily correct for a mobile node
      which moves around and connects to different networks.
   2. Use it to find other nodes on the same LAN, so that they can be
      used for preferential communication.  This would be particularly
      useful if the network is behind NAT and doesn't give hosts
      externally routable addresses; they would still be able to
      discover each other locally.

One thing we'd like to experiment with is "promiscuous replication", 
where mobile nodes may come into contact with each other over 
relatively ephemeral connections and still be able to usefully exchange 
data, even if they don't have any external network connection.  I don't 
know if it's actually a useful thing to do, or if it can be made 
workable, but it does offer a completely decentralized, peer-to-peer, 
ad hoc model of distributed storage.

>   Your concern seems valid
> if I understand it correctly.  Let me see if I understand: you're
> concerned that if the symmetric encryption key is just the secure hash
> of the file (in the case that the added convergence secret is not used
> or is the empty string), then there might be people who know the
> secure hash of the file but who oughtn't be allowed to get the
> plaintext of the file.  For example, users may have published the
> SHA-256 hashes of their files even though the files are private.
>   

Yes.

> I think this is a valid concern, and this is why we use a "tagged
> hash" instead of a normal hash of the file.  A "tagged hash" is just
> that we use some unambiguous prefix when hashing the file to separate
> hash values which are used for this purpose from hash values which are
> used for other purposes.  Your concern can be seen as an example of
> the "chosen protocol attack" [1], and tagging your hashes is one
> defense against that attack.
>   
Yes, interesting.  That addresses my concern.
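
To check my understanding of the tagged-hash idea, here is a minimal
Python sketch.  The tag string, the netstring-style framing, and the
16-byte truncation are all assumptions for illustration; the real tags
and construction are whatever hashutil.py [3] actually does.

```python
import hashlib

# Hypothetical tag, chosen for this example only.
CONVERGENCE_TAG = b"example-convergence-key-v1"


def netstring(data: bytes) -> bytes:
    """Length-prefix data so concatenated fields stay unambiguous."""
    return b"%d:%s," % (len(data), data)


def tagged_hash(tag: bytes, data: bytes) -> bytes:
    """SHA-256 over an unambiguous tag prefix followed by the data."""
    h = hashlib.sha256()
    h.update(netstring(tag))
    h.update(data)
    return h.digest()


def convergence_key(plaintext: bytes, secret: bytes = b"") -> bytes:
    """Derive a symmetric key from file contents plus an optional secret.

    With an empty secret, anyone holding the same plaintext derives the
    same key, but a plain SHA-256 of the file does not reveal it.
    """
    return tagged_hash(CONVERGENCE_TAG, netstring(secret) + plaintext)[:16]
```

The point being that a published hashlib.sha256(data).digest() of a
private file is a different value from convergence_key(data), so it
can't be used directly as the encryption key.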

> This practice of ours is lightly documented in architecture.txt [2],
> and you can see the actual implementation and which tags we use for
> which purposes and so on in hashutil.py [3].
>   

I should re-read architecture.txt.  I read it early in my 
experiments (i.e., yesterday) without quite having a handle on how 
everything fits together.

> P.S.  In a distantly related story I'm interested to see that this
> program claims that SHA-256 is less secure than Tiger-192: [4].
>   

I had a quick look over that table, but I have no idea about the 
validity of that analysis.

> I've added a link and some notes about that to the Tahoe Bibliography
> page: [5].
>
> I like Tiger-192 because it is so efficient -- one third as much CPU
> load as SHA-256 on these benchmarks [6] -- and because it emits
> 24-byte outputs instead of 32-byte outputs, which would fit better
> into nice small caps.
>   

That's definitely appealing.  I can imagine the hash function is going 
to be a large operational cost when working with large files.  That 
said, I think network bandwidth will be the big bottleneck regardless.

Thanks,
    J


