[tahoe-dev] why hyperelliptic curves?

Zooko O'Whielacronx zookog at gmail.com
Wed Sep 9 13:58:16 UTC 2009


On Fri, Sep 4, 2009 at 3:44 PM, Hal Finney <hal.finney at gmail.com> wrote:
>> Are there any other types of digital signature scheme which can have public keys shorter than 2*security level bits?
>
> Well I suggested the small DSA variant. David Hopwood showed a shortcoming but it sounded like it didn't apply to your usage.

Oh, of course.  I'm sorry I didn't refer to your letter [1].  I guess
the problem is that I don't understand it yet.  And I don't understand
why it isn't vulnerable to the generic attacks on
discrete-log-in-groups that Tanja Lange mentioned [2].  I'll study
your letter more.

> More generally, what would happen if you replaced an arbitrary public key with its hash, and used that for the URL; and then included the full public key (or at least information sufficient to regenerate it) in the signature? The sigs would get much bigger but the key could be much smaller. Does that tradeoff work for you? Are its security implications worth exploring?

Yes!  In fact that is exactly what we do in the current version of
Tahoe-LAFS.  See figure 2 on page 3 of lafs.pdf [3].  Maybe we should
continue to use this "hashes-in-caps" structure in the next version of
our caps.  It allows us to use whichever signature scheme we like best
[footnote 1].
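
Here is a minimal Python sketch of that "hashes-in-caps" pattern.  The
names and sizes are hypothetical (a 128-bit symmetric read-key, SHA-256
as the hash, and an abstract verify() callable standing in for whatever
signature scheme we pick) -- this is the shape of the idea, not the
actual Tahoe-LAFS cap format:

    import hashlib

    def make_readcap(readkey, pubkey):
        # The cap carries only a hash of the public key, not the key itself.
        return readkey + hashlib.sha256(pubkey).digest()

    def check_share(readcap, share_pubkey, share_data, share_sig, verify):
        # The full public key travels with the share.  The reader re-hashes
        # it and compares against the hash embedded in the cap before
        # trusting it, then checks the signature over the share data.
        readkey, expected_hash = readcap[:16], readcap[16:]
        if hashlib.sha256(share_pubkey).digest() != expected_hash:
            raise ValueError("public key does not match the cap")
        if not verify(share_pubkey, share_data, share_sig):
            raise ValueError("bad signature on share")
        return readkey  # use this to decrypt share_data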

The drawbacks to this approach are (a) it is complicated compared to
the semi-private-key-based design, (b) the simple implementation of it
(with a separate verification hash and symmetric key in each read-cap)
makes for read-caps just as large as those of the
semi-private-key-based design, (c) it requires access to the storage
servers in order to perform a "diminish" operation to get a read-cap
from a write-cap, and (d) if all of the storage servers die then not
only do you lose whatever was in the file but you also permanently
lose the ability to write new things into that file.

lafs.pdf [3] shows how the current mutable file cap design could be
simplified by using semi-private keys instead (see figure 3 on page
5).  The "complication gap" between the two structures grows larger
when we add the desire for deep-verify caps (what Brian has called
"traversal caps").  See Brian's letter [7] which is the best current
summary of the two cryptographic structures used to build a 4-level
hierarchy of caps (deep-write, deep-read, deep-verify,
shallow-verify).  (See also [8] for an argument against creating the
fourth level of the hierarchy -- shallow verify caps -- and instead
letting every storage server have a deep-verify cap for each share
that it stores.)

In my letter [9], to which Brian's [7] is a reply, I alluded to a
"compression trick" by which we could perhaps have smaller read-caps
in the "hashes-in-caps" structure.  In [7], Brian said that even
though I hadn't told him what "compression trick" I meant, he strongly
suspected I couldn't do that and also enable servers to verify the
correctness of the storage index.  He was right.  However, my
"compression trick" might save enough bits that we could then spend
some bits to add a separate storage index into the read-cap and still
have a shorter read-cap overall.

I've now finally posted the "compression trick" idea: [10].  Here is
how I was thinking of using it in mutable files.  Look at figure 2 on
page 3 of lafs.pdf [3] again.  See how the read-cap is the
concatenation of the read-key and the verify-cap?  What if it were
instead a single value that protects both the confidentiality of the
read-key and the integrity of the read-key and the verify-cap?

In his letter [7], Brian wrote: "there's no way to get a single bit to
perform both jobs (and still be able to derive a storage-index from
any cap)".  He's exactly right -- if we used this compression trick
then what would we use as a storage index?  (A storage index is the
means by which a client can indicate to a server *which* of the shares
that the server is holding is the one that the client wants.)

If the storage index is derivable using nothing but the read-key then
the storage server (which isn't allowed to see the read-key) will not
be able to verify that it is storing a given share under the correct
storage index. Imagine an attacker who asks a storage server to hold
onto a share of file A but tells the storage server that it belongs
under the storage index of file B (to which the attacker doesn't have
a write-cap), in order to mess up the storage server's handling of the
latter.  On the other hand if the storage index is *not* derivable
using nothing but the read-key, then the read-cap will need to include
other information in addition to the read-key so that the holder of
the read key will be able to explain to a storage server which share
he wants.
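
To make the first horn of that dilemma concrete, here is a toy sketch
(hypothetical class and method names) of a server whose storage index
is derived only from the read-key.  Since the server never sees the
read-key, it cannot recompute the index from the share itself and must
file the share under whatever index the uploader claims:

    class NaiveStorageServer:
        def __init__(self):
            self.shares = {}  # claimed storage index -> list of shares

        def put_share(self, claimed_storage_index, share):
            # Nothing here can be checked: an attacker can upload a share
            # of file A while claiming file B's storage index, polluting
            # the server's handling of file B.
            self.shares.setdefault(claimed_storage_index, []).append(share)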

However, this compression trick could save enough bits in the read-cap
that we could then add in some bits for storage index and still have a
shorter read-cap.  Suppose the read-cap were composed of a 128-bit
crypto value providing confidentiality and integrity for the read-key
and the verify-cap, plus 20 bits which are the first 20 bits of the
verify-cap.  The Tahoe-LAFS download protocol as of v1.5 consists of
two phases: in the first phase the downloader asks the servers "What
shares of the following storage index do you have?", and in the second
phase it asks a subset of the servers "Please send me share
such-and-such".
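
Roughly speaking, those two phases look like the following (hypothetical
method names, not the real storage-server remote interface):

    def download(servers, storage_index, k):
        # Phase 1: ask every server what shares of this storage index it has.
        offers = {s: s.get_share_numbers(storage_index) for s in servers}
        # Phase 2: fetch specific shares from a subset of the servers until
        # we have enough (k) of them to reconstruct the file.
        shares = []
        for server, share_numbers in offers.items():
            for shnum in share_numbers:
                shares.append(server.get_share(storage_index, shnum))
                if len(shares) == k:
                    return shares
        raise RuntimeError("not enough shares")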

With this read-cap of 128 bits of crypto plus a 20-bit storage index,
the reader would first calculate an identifier of the read-cap (i.e.
the secure hash of the read-cap) and then in the first phase of
download ask the storage server whether it is holding any shares with
the following 20-bit storage index and the following
identifier-of-the-read-cap.  (The server can check that the verify-cap
is correct -- i.e. that the verify-cap is linked to the public key
that was used to sign the share -- but it can't verify anything about
the identifier-hash-of-the-read-cap.  So whenever a writer writes a
share, the server checks the digital signature on the verify-cap but
just accepts the hash-of-the-read-cap blindly.)  The server's reply in
phase one would include all of the verify-caps that it holds which
match that storage index and hash-of-read-cap.  In the second phase,
the reader would ask for specific shares by their full verify-caps,
not just by their 20-bit prefixes.
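
In code it might look something like this (again hypothetical names;
fits() stands for whatever check binds a verify-cap to a read-cap under
the compression trick):

    import hashlib

    def download_with_compressed_cap(servers, readcap, prefix20, fits):
        # The reader reveals only a hash of the read-cap, never the
        # read-cap itself.
        readcap_id = hashlib.sha256(readcap).digest()
        for server in servers:
            # Phase 1: ask for verify-caps filed under this 20-bit storage
            # index and this identifier-of-the-read-cap.
            candidates = server.query(prefix20, readcap_id)
            # Phase 2: pick the verify-cap that this read-cap actually
            # binds to, then ask for that share by its full verify-cap.
            for verifycap in candidates:
                if fits(readcap, verifycap):
                    return server.get_share(verifycap)
        raise RuntimeError("no share matching this read-cap was found")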

We can get away with using a small number of bits such as 20 because
brute-forcing this value doesn't let an attacker violate anyone's
confidentiality or integrity -- all it does is let him impose a very
small added cost on the storage server and on the downloader.  It
costs an attacker approximately 2^20 computations to generate a
verify-cap whose first 20 bits match the first 20 bits of someone
else's verify-cap.  Having done so, the attacker can upload his new
share to the storage server.  The only effect is that when the
downloader asks the storage server if it has shares, it replies with
*two* verify-caps rather than one, and the downloader tries each of
them in turn to find the first one that fits his read-cap.  If the
attacker wants to add another colliding storage index, he'll have to
perform approximately another 2^20 computations to find one.  This
attacker would have spent about 2^20 times the cost of calculating a
verify-cap (which costs about 8000 CPU cycles) in order to cause the
downloader to try at most one extra verify-cap (which also costs
about 8000 CPU cycles).  He also forces the server and the downloader
to transfer an extra verify-cap, which is about 16 bytes.
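
For concreteness, here is the back-of-the-envelope arithmetic (the
8000-cycle figure is the one quoted above; the rest follows from it):

    attempts         = 2 ** 20      # expected tries to match a 20-bit prefix
    cycles_per_cap   = 8000         # rough cost of computing one verify-cap
    attacker_cycles  = attempts * cycles_per_cap   # ~8.4e9 cycles per collision
    downloader_extra = 1 * cycles_per_cap          # one extra verify-cap to try
    bytes_on_wire    = 16                          # one extra verify-cap sent
    print(attacker_cycles, downloader_extra, bytes_on_wire)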

(There is another attack someone could mount: spend massive CPU to
generate a whole bunch of collisions for their own file and then
upload them, just to stress the server's handling of many colliding
verify-cap prefixes, or to then ask someone to read their file in
order to stress the reader's handling of many colliding verify-cap
prefixes.  You'd have to be a pretty desperate DoS attacker to resort
to that kind of junk, though.)

What's the point of the added hash-of-read-cap that the server can't
verify?  To minimize accidental collisions among legitimate users.  A
typical storage server in 2009 probably holds a million shares, but
you could imagine that in a few years someone will want to run a
single storage server holding many millions or even billions of
shares.  (Note: you can always run multiple storage servers using
the same filesystem, so you don't *have* to put billions of shares
into a single storage server, but for practical reasons you might want
to.)  If legitimate writers include the real hash-of-the-read-cap when
writing shares then there will never be accidental collisions among
legitimate writers.
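
A quick birthday-bound calculation shows why: with only the 20-bit
index, accidental collisions among honest shares become routine once a
server holds millions of them, but adding a full hash-of-read-cap (say
256 bits, assuming the secure hash is something like SHA-256) makes
them vanish:

    def expected_collisions(n_shares, index_bits):
        # Birthday approximation: n*(n-1) / (2 * 2^bits) colliding pairs.
        return n_shares * (n_shares - 1) / (2 * 2 ** index_bits)

    print(expected_collisions(10**6, 20))        # ~4.8e5: 20 bits alone
    print(expected_collisions(10**9, 20 + 256))  # effectively zero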

Here is what read-caps look like in the three schemes --
semi-private-key ECDSA, hashes-in-caps, and
hashes-in-caps-with-compression-trick.  In all cases we get a 128-bit
security level for confidentiality and integrity.

For semi-private-key ECDSA that means we need an elliptic curve of
256 bits, so the read-cap would be 257 bits (256 bits for the
x-coordinate of the curve point and 1 bit for the sign).

Example: lafs:R_0PpJXAJEAGu1hNLzf1DpdoIrViDHJ5EHsm7LRJ1genKF

For hashes-in-caps, we need 128 bits for the symmetric read-key and
128 bits for the secure hash of the public key.

Example: lafs:R_Nnlb5qlTV2LhccUWEIy7qZrXiHZzBtHKJiasLtGOkxq

For hashes-in-caps-with-compression-trick, we could have 128 bits for
the value which secures both the symmetric read-key and the public
key, and another 20 bits for the prefix of the verify-cap.

Example: lafs:R_3AeIEJs52QXFMGYqalmuxSYx0
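
For a rough sense of how those bit counts turn into cap lengths, here
is a quick calculation assuming a base62 (alphanumeric) encoding like
the example caps above appear to use:

    import math

    def chars_needed(bits, alphabet_size=62):
        return math.ceil(bits / math.log2(alphabet_size))

    print(chars_needed(257))        # semi-private-key ECDSA:      44 chars
    print(chars_needed(128 + 128))  # hashes-in-caps:              43 chars
    print(chars_needed(128 + 20))   # with the compression trick:  25 chars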

I strongly value small caps.  I want Tahoe-LAFS to be widely used.
See ticket #217 for a litany of complaints from users about the
current caps (which take 130 characters, like this:
http://testgrid.allmydata.org:3567/uri/URI:DIR2-RO:j74uhg25nwdpjpacl6rkat2yhm:kav7ijeft5h7r7rxdp5bgtlt3viv32yabqajkrdykozia5544jqa
).  As another example, see twitter and identi.ca and associated "tiny
url" services.  As another example, observe that allmydata.com, after
funding the development of Tahoe-LAFS and its 130-character caps, then
went ahead and funded the development of a tiny-url service on top of
Tahoe-LAFS, which tiny-url service produced 68-character caps, like
this: //www.allmydata.com/webdrive/tiny/2cxq4ucnwqjkloykf9b7bbr1xrvc .
 (From my recollection of the discussions about the development of
that tiny-url service, the fact that the resulting tiny-urls are
revocable was considered a plus but not a requirement.  The fact that
they were smaller and more acceptable to users was the hard
requirement.)  It is hard to measure something like "How much does
end-user-aversion go up as random-character URL length goes up?", but
I'm sure that the curve is significant, and that 130 characters is way
past the tolerability level.  I also strongly suspect that a scheme
that fits in 32 characters, like lafs:R_3AeIEJs52QXFMGYqalmuxSYx0,
will enjoy more widespread use than a scheme that requires 50
characters, like lafs:R_Nnlb5qlTV2LhccUWEIy7qZrXiHZzBtHKJiasLtGOkxq .

However, I also strongly value simplicity of design and
implementation.  Too bad we can't have both!

Okay, I've laboriously typed up this letter over the course of about a
week while riding the bus to work, while on lunch-break, and during
the hour that elapses between the time my children fall asleep and the
time I fall asleep.  During this period, I've noticed that Brian
Warner and David-Sarah Hopwood have been busily writing back and forth
to each other, and I wouldn't be surprised if by now they've already
invented this idea or something even better.  :-)  But I'm going to go
ahead and send this note now, and I'll probably read Brian's and
David-Sarah's posts on the bus to work.  :-)

Regards,

Zooko

footnote 1: what signature scheme do I like best?  Some variant of
Rabin-Williams apparently has a good security proof [5], and Paul
Crowley's argument in favor of a security proof appeals to me.  AGL's
implementation thereof, rwb0fuz, has excellent verification speed [6]
-- almost 1/100 the cost of ecdsa-192 or hector!  However, it costs
2.5 times as much as ecdsa-192 or 11 times as much as hector to sign,
and it costs 100 times as much as ecdsa-192 or 500 times as much as
hector to generate a new keypair (using the benchmarks on the Intel Atom
chip in 64-bit mode).  Overall, it looks like ecdsa is the fastest
mature algorithm even ignoring its advantages in public key size, and
hector is a very promising candidate for the *next* next version of
Tahoe-LAFS caps.  ;-)

Thanks to Samuel Neves for help with brute force probabilities on IRC.

[1] http://allmydata.org/pipermail/tahoe-dev/2009-August/002709.html
[2] http://allmydata.org/pipermail/tahoe-dev/2009-August/002689.html
[3] http://allmydata.org/~zooko/lafs.pdf
[4] http://allmydata.org/trac/tahoe/browser/docs/specifications/mut.svg
[5] http://cr.yp.to/sigs/rwtight-20080201.pdf
[6] http://bench.cr.yp.to
[7] http://allmydata.org/pipermail/tahoe-dev/2009-July/002345.html
[8] http://allmydata.org/pipermail/tahoe-dev/2009-July/002302.html
[9] http://allmydata.org/pipermail/tahoe-dev/2009-July/002314.html
[10] http://allmydata.org/pipermail/tahoe-dev/2009-September/002796.html

tickets mentioned in this letter:
http://allmydata.org/trac/tahoe/ticket/217 # DSA-based mutable files
-- small URLs, fast file creation


