[tahoe-dev] Removing the dependency of immutable read caps on UEB computation

Brian Warner warner at lothar.com
Sat Oct 3 07:26:16 UTC 2009


Shawn Willden wrote:
> Specifically, it contains:
> 
> 1.  The root of a Merkle tree on the file plaintext
> 2.  A flat hash of the file plaintext
> 3.  The root of a Merkle tree on the file ciphertext
> 4.  A flat hash of the file ciphertext
> 5.  Roots of Merkle trees on each share of the FEC-encoded ciphertext
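For concreteness, items 1-4 boil down to a flat hash plus a Merkle root over the segments of the plaintext or ciphertext. A rough Python sketch, using plain SHA-256 as a stand-in for Tahoe's tagged SHA-256d hashes, and a made-up segmenting:

```python
import hashlib

def h(data: bytes) -> bytes:
    # Plain SHA-256 standing in for Tahoe's tagged SHA-256d hashes.
    return hashlib.sha256(data).digest()

def merkle_root(leaves) -> bytes:
    # Pairwise-hash the leaf hashes up to a single root, duplicating
    # the last node on odd-sized levels.
    nodes = [h(leaf) for leaf in leaves]
    while len(nodes) > 1:
        if len(nodes) % 2:
            nodes.append(nodes[-1])
        nodes = [h(nodes[i] + nodes[i + 1])
                 for i in range(0, len(nodes), 2)]
    return nodes[0]

# Made-up segmenting of a small file:
segments = [b"segment-one", b"segment-two", b"segment-three"]
plaintext = b"".join(segments)

flat_hash = h(plaintext)           # item 2: flat hash of the plaintext
tree_root = merkle_root(segments)  # item 1: Merkle root over the segments
```

Items 3-5 are the same two constructions applied to the ciphertext and to each FEC share.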

Incidentally, we removed 1 and 2 forever ago, to squash the
partial-information-guessing-attack. We'd like to bring them back,
safely encrypted with the readcap, to detect integrity problems relating
to having the wrong key or having a buggy AES implementation.
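One plausible shape for "safely encrypted with the readcap" is to encrypt the plaintext hashes under a key derived from the readcap, so only readers can check them. In this sketch the keystream-XOR is a stand-in for the AES-CTR we would actually use, and the tag string is invented:

```python
import hashlib

def encrypt_hash(readcap_key: bytes, plain_hash: bytes) -> bytes:
    # Keystream-XOR stand-in for AES-CTR under a readcap-derived key;
    # the "hash-protection" tag is made up for this sketch.
    stream = hashlib.sha256(b"hash-protection:" + readcap_key).digest()
    return bytes(a ^ b for a, b in zip(plain_hash, stream))

decrypt_hash = encrypt_hash  # XOR with the same keystream inverts it
```

A downloader decrypts the stored hash with the readcap-derived key and compares it against the hash of the plaintext it actually produced; a mismatch points at a wrong key or a broken AES, which are exactly the failures we want to detect.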

> To address these issues, I propose splitting the UEB into two parts

Interesting. As you point out, I'm not sure I like the introduction of
an extra layer of caps (and an asymmetric key) into the immutable file
scheme. It raises the question: who should hold onto these caps? Where
should they put them? I suppose the original uploader of the file is the
special party who then has the ability to re-encode it, but they'll have
to store it somewhere, and it feels wasteful to put an extra layer of
caps in the dirnodes (along with the writecap, readcap, and
traversalcap) just to track an object that so few people will actually
be able to use.

Adding an asymmetric key might also introduce some new attack vectors.
If I give you a readcap and claim that it points to a certain contract,
and you sign that readcap to sign the contract, can I pull any tricks by
also holding on to this newly-introduced signing key? I guess if the
readcap covers UEB1, then I can't forge a document or cause you to sign
something else, but I can produce shares that will look completely valid
during fetch and decode but then fail the ciphertext check. That means I
can make it awfully hard to actually download the document (since
without an effective share hash, you can't know which shares were bad,
so you can't tell which other ones to try instead).

(the structure for this would probably put H(UEB1|VerifyKey) in the
readcap, and then store a signed UEB2 in each share).
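That layout might look roughly like this; the helper names are invented, and the toy signature (a keyed hash standing in for a real asymmetric scheme) is for illustration only:

```python
import hashlib

def make_readcap(ueb1: bytes, verify_key: bytes) -> bytes:
    # Hypothetical layout: the readcap commits to UEB1 together with
    # the verify key, so a reader authenticates both before trusting
    # anything signed.
    return hashlib.sha256(ueb1 + b"|" + verify_key).digest()

def toy_sign(signing_key: bytes, msg: bytes) -> bytes:
    # Toy stand-in for a real asymmetric signature (illustration only;
    # here the signing key and verify key are the same bytes).
    return hashlib.sha256(signing_key + msg).digest()

def toy_verify(verify_key: bytes, msg: bytes, sig: bytes) -> bool:
    return toy_sign(verify_key, msg) == sig

def check_share(readcap, ueb1, verify_key, ueb2, signature) -> bool:
    # Downloader: check UEB1+key against the readcap, then check the
    # signature over the (replaceable, re-encodable) UEB2.
    if make_readcap(ueb1, verify_key) != readcap:
        return False
    return toy_verify(verify_key, ueb2, signature)
```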

I guess we should figure out the use case here. Re-encoding the file is
something that you'd want to do when the grid has changed in size, such
that it is now appropriate to use different parameters than before,
right? And if you're changing 'k', then you'll certainly need to replace
all the existing shares. So the goal appears to be to do all the work of
uploading a new copy of the file, but allow the old caps to start
referencing the new version.

Deriving the filecap without performing FEC doesn't feel like a huge win
to me.. it's just a performance difference in testing for convergence,
right? And if you (or someone you trust) uploaded the file originally,
you (or they) could just retain a table mapping file hash to readcap
(like tahoe's backupdb), letting you do this file-to-filecap computation
even faster.
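The backupdb-style table is just a map from content hash to readcap; a minimal sketch (the class and method names here are invented):

```python
import hashlib

class BackupDB:
    # Minimal sketch of a backupdb-style cache: remember the readcap
    # that a given file's contents produced when first uploaded, so a
    # later convergence check is a single hash plus a lookup.
    def __init__(self):
        self._caps = {}

    def remember(self, contents: bytes, readcap: str):
        self._caps[hashlib.sha256(contents).digest()] = readcap

    def lookup(self, contents: bytes):
        # Returns the cached readcap, or None if we must encode/upload.
        return self._caps.get(hashlib.sha256(contents).digest())
```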

I certainly see more value in being able to change the encoding
parameters after the fact. But I'm kinda hopeful that there might be a
way to allow re-encoding without such a big change (perhaps by
allocating more space in the share-hash-tree, to allow same-k-bigger-N
changes).

I *am* intrigued by the idea of immutable files being just locked-down
variants of mutable files. A mutable-file readcap plus a hash of the
expected contents (i.e. H(UEB1)) would achieve this pretty well.. might
not be too much longer than our current immutable readcaps, and we could
keep the encoding-parameter-sensitive parts (UEB2) in the signed (and
therefore mutable) portion, so they could be changed later.
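In that scheme the immutable cap might be little more than the mutable readcap with H(UEB1) appended. A hypothetical layout, again with plain 32-byte SHA-256 as the stand-in hash:

```python
import hashlib

def immutable_cap(mutable_readcap: bytes, ueb1: bytes) -> bytes:
    # Hypothetical cap layout: a mutable readcap pinned to one
    # specific version by appending the hash of its contents, H(UEB1).
    return mutable_readcap + hashlib.sha256(ueb1).digest()

def check_version(cap: bytes, fetched_ueb1: bytes) -> bool:
    # The reader follows the embedded mutable readcap as usual, then
    # rejects any retrieved version whose UEB1 hash differs from the
    # pinned one.
    pinned = cap[-32:]
    return hashlib.sha256(fetched_ueb1).digest() == pinned
```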

cheers,
 -Brian



