[tahoe-dev] Removing the dependency of immutable read caps on UEB computation

Fri Oct 2 17:38:22 UTC 2009

I'd like to have a little discussion on whether or not it makes sense in the 
new immutable cap design to remove the dependency on UEB computation.

As background for any who aren't familiar with it, and to confirm my own 
understanding, the UEB, or URI Extension Block, is a block of hashes that 
provides strong, multi-way integrity verification of the immutable file.  
Specifically, it contains:

1.  The root of a Merkle tree on the file plaintext
2.  A flat hash of the file plaintext
3.  The root of a Merkle tree on the file ciphertext
4.  A flat hash of the file ciphertext
5.  Roots of Merkle trees on each share of the FEC-encoded ciphertext

That's a lot of hashes, and it provides strong integrity guarantees.  It 
provides a way to verify the integrity of the plaintext, the ciphertext and 
each encoded share of the ciphertext.  That's all very good.

A copy of the UEB is stored with each share.
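
To make the first four entries in that list concrete, here is a rough Python sketch of how the plaintext and ciphertext hashes could be computed. It is only an illustration under stated assumptions: it uses plain SHA-256, a simple binary Merkle tree, a 64-byte segment size, made-up "leaf:" and "node:" prefixes, and an XOR stand-in for encryption. None of this is Tahoe's actual hashing scheme or wire format. The per-share roots (item 5) are left out because they need the FEC-encoded shares.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    """Root of a simple binary Merkle tree over the given leaf blocks."""
    if not leaves:
        raise ValueError("need at least one leaf")
    # Distinct prefixes keep leaf hashes and interior hashes from colliding.
    level = [sha256(b"leaf:" + leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # pad odd levels by repeating the last node
        level = [sha256(b"node:" + level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]


def segments(data: bytes, size: int) -> list[bytes]:
    """Split data into fixed-size segments (one empty segment for empty data)."""
    return [data[i:i + size] for i in range(0, len(data), size)] or [b""]


plaintext = b"example file contents " * 10
ciphertext = bytes(b ^ 0x5A for b in plaintext)  # stand-in, NOT real encryption

hashes = {
    "plaintext_root": merkle_root(segments(plaintext, 64)),
    "plaintext_flat": sha256(plaintext),
    "ciphertext_root": merkle_root(segments(ciphertext, 64)),
    "ciphertext_flat": sha256(ciphertext),
}
```

Note that nothing here depends on the encoding parameters; only the share tree roots do.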

The current immutable read cap design embeds a hash of the UEB in the URI.  
Indeed, this 32-byte hash makes up most of the length of current 
immutable read caps.  David-Sarah Hopwood's Elk Point design applies Zooko's 
ideas about how to combine security and integrity parameters to make the UEB 
hash 'implicit' in the read and verify caps, but it's still present.

The disadvantage of including the UEB hash in the read and verify caps, 
whether explicitly or implicitly, is that it means that FEC coding must be 
completed before the caps can be generated.  This is unfortunate, because 
without that dependency it would be possible to compute read caps efficiently, separate 
from the upload process, and even long before the upload is performed.  I can 
think of many applications for that.

The larger issue, though, is that the present design binds a given read cap to 
a specific choice of encoding parameters.  This makes it impossible to change 
those parameters later, to accommodate changing reliability requirements 
or changing grid size/structure, without finding a way to update all extant 
copies of the original cap, wherever they may be held.

To address these issues, I propose splitting the UEB into two parts, one part 
that contains the plaintext and ciphertext hashes, and another that contains 
the share tree roots and the encoding parameters.  Call them UEB1 and UEB2.  
UEB1 and any values derived from it can then be computed without doing FEC 
computations, and without choosing specific encoding parameters.
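
A minimal sketch of the split, to show the property I'm after. The class names, field names, the "UEB1:" prefix and the serialization are all hypothetical, and the input hashes are dummy values; the only point is that the value bound into the cap is computed from UEB1 alone, so it stays the same whichever k-of-n encoding is chosen later.

```python
import hashlib
from dataclasses import dataclass


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


@dataclass(frozen=True)
class UEB1:
    """Content hashes only; computable before any FEC work is done."""
    plaintext_root: bytes
    plaintext_flat: bytes
    ciphertext_root: bytes
    ciphertext_flat: bytes

    def digest(self) -> bytes:
        # Every field is a fixed 32-byte hash, so plain concatenation
        # is unambiguous in this toy serialization.
        return h(b"UEB1:" + self.plaintext_root + self.plaintext_flat
                 + self.ciphertext_root + self.ciphertext_flat)


@dataclass(frozen=True)
class UEB2:
    """Encoding parameters and per-share Merkle roots; depends on FEC."""
    k: int
    n: int
    share_roots: tuple[bytes, ...]


# Dummy hash values standing in for the real plaintext/ciphertext hashes.
ueb1 = UEB1(h(b"pt-root"), h(b"pt-flat"), h(b"ct-root"), h(b"ct-flat"))
cap_hash = ueb1.digest()  # what the read/verify caps would commit to

# Two different encodings of the same file share the same cap_hash.
ueb2_3_of_10 = UEB2(3, 10, tuple(h(b"3/10 share %d" % i) for i in range(10)))
ueb2_5_of_20 = UEB2(5, 20, tuple(h(b"5/20 share %d" % i) for i in range(20)))
```

Re-encoding from 3-of-10 to 5-of-20 replaces the UEB2 but leaves cap_hash, and therefore every extant cap, untouched.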

Based on UEB1, a client with the verify cap can verify the assembled 
ciphertext and a client with the read cap can verify the decrypted plaintext.  
What they can't do is verify the integrity of a specific share.

Putting the UEB2 in the shares is the proximate solution to share validation, 
but raises the issue of how to validate the UEB2.  Since it would be 
undesirable to give anyone with read access to the file the ability to fake 
valid UEB2s, this requires introduction of an additional cap, a "share 
update" cap, which is not derivable from the read or verify caps.  I suppose 
you could also call it a "repair cap".

One way to do this, using the nomenclature from David-Sarah's Elk Point 
immutable diagram, is to add a W key, from which K1 is derived by hashing.  
In addition, an ECDSA key pair is derived from W.  The UEB2 is signed with 
the ECDSA private key, and the signature is the UEB2 verifier, stored with 
each share.  The "share update" cap would consist of the SID and the private 
key.  W could also be used as a 'master' cap from which all others can be 
derived.
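
A rough sketch of the derivation step only, under loud assumptions: the tag strings, the tagged-hash construction and the 32-byte sizes are made up for illustration, and the actual ECDSA key generation and signing are left out because they need a library outside the Python standard library. It is not the Elk Point derivation, just the general shape of "two independent secrets hashed out of W".

```python
import hashlib
import secrets


def tagged_hash(tag: bytes, data: bytes) -> bytes:
    """Domain-separated SHA-256: different tags give unrelated outputs."""
    return hashlib.sha256(tag + b"\x00" + data).digest()


def derive_from_W(W: bytes) -> tuple[bytes, bytes]:
    """Derive K1 and a seed for the signing key pair from W.

    Both tags are hypothetical. The seed would be fed to an ECDSA key
    generator (not shown) to obtain the private key that signs UEB2.
    """
    K1 = tagged_hash(b"hypothetical-K1", W)
    signing_seed = tagged_hash(b"hypothetical-signing-seed", W)
    return K1, signing_seed


W = secrets.token_bytes(32)  # random here; it could also be a content hash
K1, signing_seed = derive_from_W(W)
```

Because hashing is one-way, a holder of K1 cannot work back to W or across to the signing seed, which is what keeps the "share update" cap out of reach of readers, while a holder of W can derive everything.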

Another possibility is to use the Elk Point mutable structure and fix the 
content by including the UEB1 data in the information hashed to produce T|U 
and signed to produce Sig_KR.  To retain the idempotent-put characteristic of 
Tahoe immutable files, W can be a content hash, rather than a random value, 
and KD must be derived from W or omitted from the series of hashes that 
produces S. It may be valuable for both security analysis and code complexity 
to make mutable and immutable files be very similar in structure.

The obvious downside of both of those approaches is that they introduce a need 
for asymmetric signatures, where immutable files previously required only 
hashing and symmetric encryption.  I don't think there's any other way to 
maintain share integrity while removing the dependency of the caps on FEC 
parameters.

Personally, I think being able to re-structure the encoding without updating 
all of the caps is sufficient justification to accept the use of asymmetric 
signatures in immutable file buckets, and being able to generate caps without 
performing FEC computations is a very nice bonus.

Comments?

	Shawn.