[tahoe-dev] [pycryptopp] #2: choose+implement EC-DSA KDF: deterministic generation of private key from small seed

pycryptopp trac at allmydata.org
Sun Feb 28 23:31:41 UTC 2010

#2: choose+implement EC-DSA KDF: deterministic generation of private key from
small seed
Reporter:  zooko        |           Owner:  zooko
    Type:  enhancement  |          Status:  new  
Priority:  major        |         Version:  0.4.0
Keywords:               |   Launchpad_bug:       

Comment(by warner):

 We chatted on IRC the other week about what the next step should be.
 upon and implementing the KDF seems like the best one. Testing the KDF is
 challenge. Here's my current favorite approach.

  * the C function (in {{{ecdsamodule.cpp}}}) to construct an ECDSA signing
    key should take a longint argument named {{{exponent=}}}, and raise a
    {{{ValueError}}} (or maybe {{{TypeError}}}) if its numeric value is
    outside the acceptable range (I think this is 1..q-2, where "q" is the
    size of the prime subgroup, but I may be wrong).
  * the C module should also export a constant, next to the constructor
    function, which allows Python code to learn what "q" is.
  * pycryptopp must also expose Poly1305 and/or XSalsa20, or whatever hash
    function we plan to use for the KDF.
  * the Python module should provide an {{{ECDSASigningKey}}} class, whose
    constructor takes an argument named seed=, which accepts a variable
   * internally, this constructor should call a python function named
     {{{KDF}}}, which takes the seed and "q", and invokes whatever hash
     functions, expansions, modulus computations, and try-try-again loops
     necessary until it comes up with an exponent in the correct range.
     it should call the C-based constructor.
   * we might like to say that omitting {{{seed=}}} causes the python code
     use {{{os.urandom(ceil(log2(q))))}}} as a seed.

 This moves the KDF logic out of C and into Python, where we can test it

 If we get concerned about speed, we could implement the KDF logic in C as
 well, but retain {{{exponent=}}} as an option (callers would either pass
 {{{exponent=}}} or {{{seed=}}} but not both), and then have tests which
 assert equality of keys generated with the C {{{seed=}}} argument against
 keys generated with {{{exponent=python.KDF(seed)}}}, to confirm that the
 C-based KDF code behaves the same way as the sample Python implementation.

 As to choice of KDF (specifically the "turn N random bits into a suitable
 random secret exponent" sort of derivation function), I think the four
 variants we discussed (ignoring choice of hash function) were:

  * 1 (try-try-again): guess=HASH(seed+counter), truncate to ceil(log2(q))
    bits, test against limit, if out of bounds then increment counter and
  * 2 (bigmodulo): expand the seed to perhaps 2*log2(q) bits, reduce it
    the limit, done. (actually it'd be more like e=1+(largesecret%(q-3)) or
  * 3 (truncate): expand the seed to floor(log2(q)) bits, done.
  * 4 (bigmodulo-and-try-try-again): expand the seed to 2*log2(q) bits, if
    that is larger than q*int(guess/q) (i.e. in the "tail"), try again,
    reduce modulo q and return

 The try-try-again choices (1 and 4) have the unfortunate property that,
 depending upon how close the group size is to a power of two, they might
 very hard to test (if q=2**128-9 then the chances of triggering the loop
 negligible, so writing a test to exercise that code would be hard). The
 opposite extreme is occasional cases of looping a large number of times
 q=2**128+9 then nearly half of the 129-bit keys would be out-of range,
 causing a loop, and although the average number of loops should be two,
 exponential distribution of loop counts means that sometimes you'll get 10
 100 or 1000 loops).

 The other choices (2 and 3) have the unfortunate property that there will
 some amount of bias in the resulting key, although at least David-Sarah
 believes that the 2-bigmodulo variant can reduce the bias down to a
 negligible size (less than a bit). 2-bigmodulo needs an efficient bigint
 modulo operation, which is probably not a big deal in the C code (and
 certainly not a problem up at the Python level). 3-truncate has up to one
 of bias, reducing a 128-bit key to more like a 127-bit key, but is
 super-simple and super-fast.

Ticket URL: <http://allmydata.org/trac/pycryptopp/ticket/2#comment:13>
pycryptopp <http://allmydata.org/trac/pycryptopp>
Python bindings for the Crypto++ library

More information about the tahoe-dev mailing list