[tahoe-dev] how to encrypt and integrity-check with only one value

Mon Sep 7 07:48:53 UTC 2009

Zooko Wilcox-O'Hearn wrote:

> Now, convergent encryption could do both jobs with one value! If you
> let the symmetric key be the secure hash of the plaintext, then the
> reader could use the symmetric key to decrypt, then verify that the
> key was the hash of the plaintext.

In addition to the other reasons you listed, you might not be able to
use this because of alacrity: a CHK hash can't be validated until the
entire plaintext has been downloaded. OTOH, it's conceivable that you
could build up a plaintext merkle tree with about the same effort as the
normal CHK flat hash, and use the root of that as your encryption key,
and safely encrypt the plaintext hash tree in a way that lets you grab
it quickly (one node at a time). It'd be kinda complex, but that might
let you use CHK-like encryption keys that also gave you low-alacrity
integrity properties.
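
To make the merkle-tree idea a bit more concrete, here is a rough sketch (not
a worked-out design). It assumes SHA-256, fixed-size plaintext segments, and
made-up helper names (build_levels, proof, verify_segment); none of this is
existing Tahoe code. The root would serve as the encryption key, and a reader
holding it can validate one segment at a time from that segment's sibling
hashes. Encrypting the stored hash tree is left out.

```python
import hashlib

def leaf_hash(segment: bytes) -> bytes:
    # Domain-separate leaves from interior nodes.
    return hashlib.sha256(b"\x00" + segment).digest()

def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()

def build_levels(segments):
    """Return all tree levels, leaves first; the last level holds the root."""
    if not segments:
        raise ValueError("need at least one segment")
    level = [leaf_hash(s) for s in segments]
    levels = []
    while True:
        if len(level) > 1 and len(level) % 2:
            level = level + [level[-1]]  # pad odd levels by repeating the last hash
        levels.append(level)
        if len(level) == 1:
            return levels
        level = [node_hash(level[i], level[i + 1])
                 for i in range(0, len(level), 2)]

def proof(levels, index):
    """Sibling hashes needed to check the segment at `index`."""
    path = []
    for level in levels[:-1]:
        path.append(level[index ^ 1])
        index //= 2
    return path

def verify_segment(root, segment, index, path):
    """Check one segment against the root without seeing the other segments."""
    h = leaf_hash(segment)
    for sibling in path:
        h = node_hash(h, sibling) if index % 2 == 0 else node_hash(sibling, h)
        index //= 2
    return h == root
```

In this sketch the key would be build_levels(segments)[-1][0], and each
downloaded segment costs only about log2(N) extra hashes to check.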

> Here's my idea about ensuring both confidentiality and integrity with
> a single crypto value.

Ah, good, thanks for writing this up. I certainly like your scheme
better than the fragments of your scheme that I was able to reconstruct
from a memory of a vague conversation :-). I'll try to update
NewImmutableEncodingDesign in the next few days with your algorithm.

Some observations:

 * obviously the "v = H(ciphertext)" could+should be expanded to include
   our usual UEB scheme, with all integrity information (merkle trees,
   share hash trees, ideally even an encrypted form of the plaintext
   hash data) going into the UEB, and "v" being the hash of the UEB.
   David-Sarah's point about making verifycap=H(v,K1enc) is spot-on.

 * verifycap cannot be offline-derived from readcap: you have to run
   through part of the download process, fetch at least "v" and the
   K1enc value, derive K1, hash K1+v together to confirm that you really
   do get the readcap, then emit H(v+K1enc) as the verifycap. This makes
   manifest/repaircap generation really expensive (a network trip per
   file). One mitigation strategy would be to store both readcap and
   verifycap in dirnodes, effectively caching the verifycap computation.

 * what should the storage-index be? It clearly must be the hash of the
   readcap, otherwise readers cannot find the shares (or must carry
   around some extra value, negating the shortness of the readcap).

 * but since storage-index != verifycap (i.e. H(UEBhash+K1enc)), servers
   will be unable to completely validate their shares. They can confirm
   that everything (including K1enc, thanks to David-Sarah's suggestion)
   matches the verifycap, but they can't tell that the verifycap matches
   the storage-index under which the share is stored (i.e. they'd be
   unable to detect two swapped sharefiles). This permits the
   "roadblock" attack and generally misses our goals of allowing full
   server-side validation.

 * we can't determine the storage-index until after we've encoded the
   entire file (which generally means after we've uploaded it). So we
   need a new uploader protocol that lets us upload to an as-yet-unnamed
   slot, and then provide the slot's storage-index at the very end of
   the process. This is more work, but it isn't a huge deal.

 * we wouldn't be able to directly use our permuted-list Tahoe2
   peer-selection protocol, since we won't know the storage-index (and
   thus the permuted list) until after we've uploaded all the shares. I
   think we'd have to go with the "server-selection-index" idea: a much
   shorter string (since it only needs to provide load-balancing, not
   collision resistance), either randomly generated or derived from a
   salted CHK hash (and thus computable before encoding/upload), used to
   permute the peerlist. This string must be included in the readcap,
   increasing its length, but we could probably get away with maybe 20
   bits or so.
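
As an illustration of the server-selection-index idea in the last item, here
is a minimal sketch. It assumes SHA-256, a 20-bit SSI, and hypothetical
function names (ssi_from_salted_hash, permuted_servers); the real Tahoe2
permutation uses the storage-index and its own hash, so treat this only as
the shape of the thing.

```python
import hashlib

SSI_BITS = 20  # assumed width: enough for load-balancing, not collision resistance

def ssi_from_salted_hash(salt: bytes, plaintext_hash: bytes) -> int:
    """Derive a short SSI from a salted hash, computable before encoding."""
    digest = hashlib.sha256(salt + plaintext_hash).digest()
    # Keep only the top SSI_BITS bits of the first four bytes.
    return int.from_bytes(digest[:4], "big") >> (32 - SSI_BITS)

def permuted_servers(ssi: int, server_ids):
    """Order servers by H(ssi + serverid), in place of H(storage-index + serverid)."""
    ssi_bytes = ssi.to_bytes(3, "big")  # 20 bits fit in 3 bytes
    return sorted(server_ids,
                  key=lambda sid: hashlib.sha256(ssi_bytes + sid).digest())
```

A reader would pull the SSI out of the readcap and call permuted_servers()
with it to get the same server ordering the uploader used.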

So, while I like the one-cryptovalue trick, I'm unsatisfied with the
lack of both server-side validation and offline readcap-to-verifycap
attenuation, and the separate SSI value makes me slightly nervous.
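
To show why the readcap-to-verifycap step needs a network trip, here is a toy
sketch of that derivation. It assumes readcap = H(K1, v), SHA-256 for H, and
a hash-derived XOR pad as a stand-in for the real cipher that wraps K1; the
names make_caps, derive_verifycap and fetch_share_fields are hypothetical.

```python
import hashlib

def H(*parts: bytes) -> bytes:
    # Length-prefix each part so concatenation is unambiguous.
    h = hashlib.sha256()
    for p in parts:
        h.update(len(p).to_bytes(8, "big"))
        h.update(p)
    return h.digest()

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def make_caps(k1: bytes, v: bytes):
    """Uploader side: k1 is a 32-byte key, v the hash covering the ciphertext."""
    readcap = H(b"readcap", k1, v)
    k1enc = xor(k1, H(b"keywrap", readcap))  # toy key-wrap, NOT a real cipher
    return readcap, k1enc

def derive_verifycap(readcap: bytes, fetch_share_fields) -> bytes:
    """Reader side: needs v and K1enc from a server before it can attenuate."""
    v, k1enc = fetch_share_fields()          # this is the per-file network trip
    k1 = xor(k1enc, H(b"keywrap", readcap))
    if H(b"readcap", k1, v) != readcap:
        raise ValueError("share fields do not match the readcap")
    return H(b"verifycap", v, k1enc)
```

Caching the result next to the readcap in the dirnode would avoid repeating
the fetch for every manifest or repair pass.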

Incidentally, I kind of suspect that we could get away with longer
immutable readcaps if we had short directory readcaps, since I imagine
that people are more likely to share with dircaps (which get you
filenames) than with the raw filecaps. On the other hand, I fear that we
have even fewer tricks available for mutable encoding schemes, unless
semiprivate keys work out.

cheers,
 -Brian