[tahoe-dev] implementing Comb4P for serious use

Sun Jun 13 04:48:19 UTC 2010

Dear Dr. Anja Lehmann:

On Wed, Apr 21, 2010 at 5:35 AM, Anja Lehmann <anja.lehmann at minicrypt.de> wrote:
>
> thanks for your email. It would indeed be great to see our combiners used in
> practice or even in an IETF proposal! Just a few questions from my side ...
> who is actually "behind" that project? Is is just an open community or is
> there any company involved? And will all the implementations be open-source?

Why did you ask if there is a company involved? Did you apply for a
patent on Comb4P?

> From a theoretical point of view, truncating the final output harms a lot of
> security properties. It was shown that by truncating the output of a
> collision resistant hash function (considered as black-box) one would lose
> the collision-resistance guarantee [1]. A similar problem seems to exist
> with TCR and the MAC property as well. On the other hand, it is known that
> the PRF-property is closed under truncation, i.e., any truncated output of a
> PRF is still a pseudorandom value.

I'm aware of the theoretical result that you cite that a combiner of
two black-box hash functions can't preserve collision resistance while
outputting less than double-length outputs [1]. But in practice we
don't have two black boxes—we have two specific hash functions and a
combiner of them could in fact have collision resistance even with an
output length shorter than double-size. The performance and
human-factors advantages of a smaller output (e.g. 256 bits instead of
512 bits) may turn out to be enough that we need to consider a
construction which sacrifices those theoretical proofs but which seems
to be safe in practice.

For example, here is a notation for Comb4P (modelled on your thesis):

<i>₈ is the binary representation of the integer i in 8 bits.

Hⁿₓ(·)=Hₓ(<n>₈||·)

Hⁿ⊻(·)=Hⁿ₀(·)⊕Hⁿ₁(·)

Comb4P(M)=
    Tx←H⁰₁(M)
    Ty←Tx⊕H⁰₀(M)
    Tz←H²⊻(Ty)
    return Ty⊕H³⊻(Tz)||Tz

(Note: the only difference from your thesis is that this uses 8 bits
instead of 2 bits to separate the different uses of H.)

So given that we have Comb4P(M), then we might consider Comb4Pshorthash(M):

Comb4Pshorthash(M)=H⁴₂(Comb4P(M))

To me, it seems very unlikely that any real H⁴₂ (such as MD5, SHA1,
SHA-256, Tiger, or a SHA-3 finalist) would be susceptible to
collisions (much less other failures) when the inputs to it are so
constrained and so difficult for the attacker to manipulate.

A more efficient and simpler function would be Comb4Pshortxor:

Comb4Pshortxor(M)=left_half_of(Comb4P(M)) ⊕ right_half_of(Comb4P(M))

equivalently:

Comb4Pshortxor(M)=
    Tx←H⁰₁(M)
    Ty←Tx⊕H⁰₀(M)
    Tz←H²⊻(Ty)
    return Ty⊕H³⊻(Tz)⊕Tz

I agree of course that Comb4P is clearly better than Comb4Pshorthash
or Comb4Pshortxor when we can afford to store and use a full 512-bit
output value. Comb4P provably preserves four properties, and neither
Comb4Pshorthash nor Comb4Pshortxor can say that.

On the other hand, compare Comb4Pshortxor to H₀. It would seem
extremely unlikely that Comb4Pshortxor[H₀, H₁] is any weaker than H₀
in any real sense. (If it helps, imagine that H₀ could be SHA-256 and
H₁ could be a SHA-3 candidate such as Shabal-256.) By the same
argument, Comb4Pshortxor[H₀, H₁] is probably not any weaker than H₁.

So, while we should certainly keep in mind that Comb4P has provable
security properties that Comb4Pshort* lack, I think it is quite
reasonable to assume that using Comb4Pshort* is safer than using a
single hash function.

How much it matters whether the output is 256-bits or 512-bits is an
empirical question that remains to be seen. Our Google Summer of Code
Student Yu Xue will hopefully write benchmarks for Comb4P before the
summer is out [2]. Prof. Gligoroski has invited his students to
measure SHA-3 candidates in the context of Tahoe-LAFS [3] (see also my
follow-up: [4]). François Deppierraz has posted some measurements of
the current Tahoe-LAFS (which uses SHA-256d and AES-256) on his
low-power ARM system: [5].

Another reason that it might turn out to matter is human-factors
rather than performance. We use the outputs of hash function in our
keys, and the shorter those keys are, the better it is for user
interfaces which choose to expose those keys directly to users: [6].

We use secure hash functions in various ways in Tahoe-LAFS but in most
cases we use the secure hash function in a Merkle Tree.

In addition to the Merkle Trees that we already use, I recently got
very excited about the prospect of using a Merkle Signature Scheme
(e.g. [7]) for our "100 Year Cryptography" project's digital
signatures. Those schemes have traditionally been too inefficient in
CPU, especially for key generation, but I'm hoping that we can
overcome that performance problem. The concrete CPU performance of the
underlying hash function could make a signficant difference in whether
such a scheme is efficient enough for our use.

If we were to use Comb4P as the underlying hash function in a Merkle
Tree, the performance difference between a full 512-bit hash function
output and a 256-bit hash function output could be almost a factor of
two in the overall performance of the Merkle Tree.

On the other hand, our experiments and measurements might tell us that
the full Comb4P is efficient enough. I hope so!

Regards,

Zooko

[1] Krzysztof Pietrzak. Compression from Collisions, or why CRHF
Combiners have a Long Output. CRYPTO 2008.
[2] http://tahoe-lafs.org/pipermail/tahoe-dev/2010-May/004377.html
[3] http://tahoe-lafs.org/pipermail/tahoe-dev/2010-March/004108.html
[4] http://tahoe-lafs.org/pipermail/tahoe-dev/2010-April/004272.html
[5] http://tahoe-lafs.org/pipermail/tahoe-dev/2010-March/004150.html
[6] http://tahoe-lafs.org/trac/tahoe-lafs/ticket/882# Tahoe URIs and
gateway URLs are too long and ugly
[7] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.95.1374&rep=rep1&type=pdf