[tahoe-dev] how to encrypt and integrity-check with only one value (was: Re: two reasons not to use semi-private keys in our new cap design)

Brian Warner warner at lothar.com
Fri Sep 11 10:04:53 UTC 2009


Zooko Wilcox-O'Hearn wrote:

>>  So we need a new uploader protocol that lets us upload to an as-yet-
>>  unnamed slot, and then provide the slot's storage-index at the very
>>  end of the process.
> 
> Remember that I really, really want this anyway, because this is  
> necessary to have "one-pass" == "on-line" upload.

Yeah, I'm ok with that. It requires a change to the protocol, which is
significant work, but the benefits are worth it.

> Argh!  You are right!  Another few bits needed in the readcap!  Boo  
> hoo.  :-(

Just to be clear, both the set-the-SI-afterwards uploader protocol and
the need for a separate SSI are for *immutable* files. We don't need the
extra SSI bits for *mutable* files, because we can derive everything we
need from "R" in the readcap.

> Re: separate SSI (server-selection-index) value, what makes you  
> nervous about it?

I guess it's just that we don't really know how long to make it, and
that it is tied closely into the server-selection policy, which means
the cap-format parse-tree gets fuzzy. We'll start by defining a cap
format which means "use Tahoe2, given an SSI from a field defined as
follows...". But then, to let the second person use a different policy,
we'll have to define a second cap format (with some sort of version
identifier so you can tell the difference), which means "use Tahoe17,
with this other SSI thingy from over here", or derives the SSI from an
existing field, or uses no SSI at all.
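
To make that concrete, here's a toy sketch of the versioned parse; the
"URI:MDMF:<version>:..." layout and the policy labels are invented for
illustration, not Tahoe's actual cap grammar:

    def parse_filecap(cap):
        # hypothetical layout: "URI:MDMF:<version>:<field>:<field>:..."
        parts = cap.split(":")
        version = parts[2]
        if version == "1":
            # version 1 means: use Tahoe2, with an explicit SSI field
            ssi = parts[3]
            return ("tahoe2", ssi, parts[4:])
        elif version == "2":
            # version 2 means: use some other policy, with the SSI
            # derived from an existing field (or absent entirely)
            return ("tahoe17", None, parts[3:])
        else:
            raise ValueError("unknown cap version: " + version)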

I'm all for crazy server-selection policies. It's just that I want
filecaps to include enough information to know which policy you're
expected to use. (I kind of want a gridid in there too, which is another
aspect of that policy decision). Otherwise the filecaps are
necessary-but-not-sufficient, and must be interpreted relative to
something (perhaps a grid and a single grid-wide policy).

[hm, but maybe this is ok: if you pass a filecap to someone using the
same grid+policy as you, then you can pass a short one, but when you
want to pass it to someone who doesn't know which grid+policy it's
coming from, you have to construct a larger cap that contains that extra
information. Sort of like fully-qualified domain names versus "www" in
the right context being interpreted as "www.allmydata.com"]

Anyways, I guess I'm anticipating confusion or contention in the filecap
format definition process, when we've got several different
server-selection algorithms competing for the shortest caps. But don't
worry, I'll get over that anxiety soon enough :).


before this line is for immutable files
======================================
after this line is for mutable files

> Re: server-side validation, what do you think of my proposal in [1]?   

I think I like it, but I'm still trying to get my head around it. I
think the terminology is confusing, and I need to draw up a picture.

How about I express your idea using David-Sarah's mutable picture, but
with the following definitions:

 A = H(K1enc, Kverify)
 B = H(R)

We can compute both from the writecap, with no server interaction. Then
make the following definitions:

 readcap = R + A[:20]
 server-selection-index = B
 verifycap = B + A
 storageindex = B + A
 partial-verifycap = B + A[:20]   (derivable offline from the readcap)
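
Concretely, the derivations might look like this sketch. The hash
function, input framing, and field widths are assumptions on my part,
and I'm reading the [:20] slices as bit counts, to match the "20-bit
advantage" language further down:

    import hashlib

    def H(*parts):
        # stand-in for a tagged hash; hashing each input separately is
        # a poor man's framing, so (x, yz) and (xy, z) can't collide
        h = hashlib.sha256()
        for p in parts:
            h.update(hashlib.sha256(p).digest())
        return h.digest()

    def truncate_bits(data, nbits):
        # keep only the first nbits bits (the X[:20] notation above)
        nbytes = (nbits + 7) // 8
        out = bytearray(data[:nbytes])
        if nbits % 8:
            out[-1] &= (0xFF << (8 - nbits % 8)) & 0xFF
        return bytes(out)

    def derive_values(R, K1enc, Kverify):
        A = H(K1enc, Kverify)
        B = H(R)
        return {
            "readcap":           R + truncate_bits(A, 20),
            "ssi":               B,
            "verifycap":         B + A,
            "storage_index":     B + A,
            "partial_verifycap": B + truncate_bits(A, 20),
        }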

When the share is uploaded, the client uses the SSI (B, or a truncated
version of it) to decide which servers to contact. Then it uses the SI
(B+A) to index the share.
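
This is roughly the permuted-list flavor of server selection; the exact
inputs that Tahoe2 hashes are an assumption here (H and truncate_bits
come from the sketch above):

    def upload_order(ssi, server_ids):
        # permute the server list deterministically from the SSI, then
        # walk it in that order when placing shares
        return sorted(server_ids, key=lambda sid: H(ssi, sid))

    # upload, schematically:
    #   for server in upload_order(B, all_servers):
    #       server.put_share(storage_index=B + A, share=...)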

The server, on its own, can check the share contents up to A, and can
compare A against the SI that it was provided. It has to accept the
value of B on faith. (We can put B inside the signature, but that merely
proves that somebody with the given privkey agreed to use that value; it
doesn't prove that B is actually the right value for that privkey.)
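
In sketch form, the server's admission check might look like this (the
32-byte split and the share accessors are assumptions):

    def server_accepts(storage_index, share):
        # split the SI back into its B and A halves
        B, A = storage_index[:32], storage_index[32:]
        if H(share.K1enc, share.Kverify) != A:
            return False  # share body doesn't match the A part of the SI
        if not share.signature_is_valid():
            return False  # hash/sig check over the signed prefix
        return True       # B is accepted on faith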

The readcap-holding client, when it wants to fetch the share, takes R
and computes B, truncates it to get the SSI, then contacts those
servers. Then it asks for all shares that have a storage index which
starts with (B+A[:20]).

(Note: this requires changes to the server-side storage layout, so it
can answer queries like this correctly. If the client uses an incomplete
SI, which might not be unique, the protocol must allow the server to
return multiple hits, which adds an extra roundtrip unless we do
something clever.)

(Also note that the writecap-holding client can submit the full B+A
storage index and avoid all this ambiguity, so the protocol should
probably allow queries with variable-length SI values.)
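
One plausible shape for that variable-length lookup, assuming the
server keeps its storage indexes sorted:

    import bisect

    class ShareIndex:
        def __init__(self):
            self._sis = []  # full storage indexes, kept sorted

        def add(self, si):
            bisect.insort(self._sis, si)

        def lookup_prefix(self, query):
            # return every SI that starts with the (possibly
            # incomplete) query; zero, one, or several hits are all
            # legal outcomes
            i = bisect.bisect_left(self._sis, query)
            hits = []
            while i < len(self._sis) and self._sis[i].startswith(query):
                hits.append(self._sis[i])
                i += 1
            return hits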

The job of the roadblock-attacker (who learns B+A by running a storage
server and seeing the client's first share upload) is to create some
alternate share, with a different pubkey Kverify' (so they can create a
valid signature that will pass the hash/sig check), such that:

 A' = H(K1enc, Kverify')
 A'[:20] == A[:20]
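
That search is cheap: with only 20 bits to match, about 2^20 keypair
generations suffice. A brute-force sketch, where make_keypair is a
hypothetical stand-in:

    def find_roadblock_key(K1enc, A_prefix):
        # expect ~2**20 iterations before the first 20 bits of A'
        # match the victim's A[:20]
        while True:
            Kverify_new, Ksign_new = make_keypair()  # hypothetical
            if truncate_bits(H(K1enc, Kverify_new), 20) == A_prefix:
                return Kverify_new, Ksign_new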

Then they submit this share, with SI = B+A', to the other storage
servers, who accept it as valid. When a readcap holder tries to
download, they'll submit their B+A[:20] as usual, and will be given the
attacker's share.

They'll notice the problem during their verification step. The
downloader will decrypt the correct K1 value (assuming the attacker left
K1enc unmodified; otherwise they'll get some other K1' value), but since
Kverify' is different, the R' = H(K1, Kverify') step will produce an
R' != R.
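
That check, in sketch form (decrypt_K1 is a hypothetical stand-in for
however K1enc gets unlocked with readcap material):

    def downloader_check(R, share):
        # R comes from the readcap; K1enc/Kverify come from the share
        K1 = decrypt_K1(R, share.K1enc)    # hypothetical helper
        return H(K1, share.Kverify) == R   # attacker's Kverify' fails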

In practice, this verification is done once per "prefix": the part of
the share that includes the sequence number, Merkle-tree root hash, the
other UEB data, Kverify, and the signature. If no funny business is
taking place, then all shares of a mutable file (for the same version)
will have identical prefixes. An attacker who only worked hard enough to
come up with one 20-bit collision will cause the downloader to see two
prefixes: one for the good shares, and one for the bad ones.
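
So the downloader can amortize the expensive check over distinct
prefixes, something like this (reusing downloader_check from above, and
assuming shares expose a .prefix field):

    def verify_shares(R, shares):
        verdicts = {}  # one signature/hash check per distinct prefix
        good = []
        for share in shares:
            if share.prefix not in verdicts:
                verdicts[share.prefix] = downloader_check(R, share)
            if verdicts[share.prefix]:
                good.append(share)
        return good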


So, someone with the writecap can verify everything, someone with the
readcap has a 20-bit advantage over the DoS'er, and someone with the
verifycap has a 128-bit advantage over the DoS'er. The storage-server
can verify everything except the "B" value, but the fact that readcap
holders know 20 bits of A makes it less valuable for an attacker to take
advantage of this gap. Disk errors that flip bits of B will go
unnoticed until the downloader is unable to locate its shares (since it
will be asking with the wrong SI). Disk errors that swap two shares and
also swap the "A" parts of their SIs (but leave the "B" parts alone)
will likewise not be detected until the downloader shows up.

I think this is good enough: those sorts of errors should be pretty
infrequent. The "B" bitflips could conceivably be tolerated by
downloaders if they had a way to ask for all shares with the given
A[:20] value and then try to decode each one, but that's certainly not
worth the effort.

I threw in the "partial-verifycap" there just to see what its properties
are... they aren't actually very useful. It can be derived offline from
the readcap. Someone who holds it has enough information to find the
shares, but they'll only get 20 bits of validation on them, which is
trivial. So we'd still be stuck with online attenuation, possibly
pre-computed and cached in the dirnodes or something.

I don't know what the storage-server share layout would need to look
like to support this sort of variable-length query, but I'm sure it's
doable. Likewise, a protocol that allows the DYHB ("do you have block")
queries to return either "yes, here's the data" (in one RTT), "no", or
"we have several: A,B,C, which would you like?" (using two RTTs) is
non-trivial but still feasible.
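
The three-way answer might be shaped like this (a sketch of the
response logic only, not a wire format; read_share is hypothetical, and
ShareIndex comes from the earlier sketch):

    def dyhb_response(index, si_query):
        hits = index.lookup_prefix(si_query)
        if not hits:
            return ("no",)
        if len(hits) == 1:
            return ("yes", read_share(hits[0]))  # one RTT, data included
        return ("several", hits)  # client picks one: a second RTT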


cheers,
 -Brian


