[tahoe-dev] Interesting hashing result
zooko at zooko.com
Sun Feb 15 16:04:33 UTC 2009
Thanks for the information. Could you give me a couple more details
-- what sizes of files are in your test set, how long does it take
with cold cache, and how long with warm cache?
Also, what is this signature that you are generating? Is it
something that is generated and used just by your backup tool?
Tahoe uses a strong hash function to generate the immutable file caps
from the ciphertext so that we get this property:
The "At Most One File Per Immutable File Cap" Property:
There can exist (in this universe, in the foreseeable future) at
most one file matching any given immutable file capability.
In cryptographic terms, this is called "collision-resistance".
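
To make the property concrete, here is a minimal sketch in Python. It assumes the usual definition of SHA-256d (SHA-256 applied to the SHA-256 digest of the input). The names toy_cap and matches_cap, and the "toy-cap:" string format, are made up for illustration; this is not Tahoe's actual cap derivation or cap encoding.

```python
import hashlib


def sha256d(data: bytes) -> bytes:
    """SHA-256d: SHA-256 applied to the SHA-256 digest of the data."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def toy_cap(ciphertext: bytes) -> str:
    """Derive a hypothetical capability string from the ciphertext.

    The "toy-cap:" prefix and hex encoding are assumptions made for
    this sketch only.
    """
    return "toy-cap:" + sha256d(ciphertext).hex()


def matches_cap(cap: str, candidate: bytes) -> bool:
    """Return True if the candidate ciphertext hashes to the given cap."""
    return toy_cap(candidate) == cap


if __name__ == "__main__":
    original = b"some ciphertext bytes"
    cap = toy_cap(original)
    print(matches_cap(cap, original))             # the original file matches
    print(matches_cap(cap, b"a different file"))  # a different file does not
```

If the hash is collision-resistant, nobody can find two different ciphertexts for which matches_cap returns True under the same cap, which is what the property above asks for.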
This property might not be needed for some applications, but it is
very tricky for application writers to know when they can safely rely
on a weaker property, such as the property that cryptographers call
(You, Shawn, might already be familiar with all this, but I'm
spelling it out for the benefit of other readers also.)
Unfortunately the SHA-256d secure hash function imposes a significant
CPU overhead. Currently I think that all actual uses of Tahoe are
I/O-bound and have ridiculously overpowered CPUs anyway, so I
don't think this CPU overhead is causing an actual performance
problem for anyone in practice, but hopefully Tahoe will move into
more and more use cases, and as it does this might become a problem.
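
For anyone who wants to check how much CPU time the hash costs on their own machine, a rough sketch like the following can help. It assumes SHA-256d means SHA-256 of the SHA-256 digest, and the helper name throughput_mib_per_s is hypothetical. It only times hashing of an in-memory buffer, so it says nothing about disk or network I/O, and the numbers will vary from machine to machine.

```python
import hashlib
import time


def sha256d(data: bytes) -> bytes:
    """SHA-256d: SHA-256 applied to the SHA-256 digest of the data."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def throughput_mib_per_s(hash_fn, data: bytes, repeats: int = 5) -> float:
    """Return the best observed hashing throughput in MiB per second."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        hash_fn(data)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    # Guard against a zero reading from a coarse timer.
    best = max(best, 1e-9)
    return len(data) / (1024 * 1024) / best


if __name__ == "__main__":
    buf = bytes(16 * 1024 * 1024)  # 16 MiB of zero bytes, already in memory
    print("SHA-256d: %.1f MiB/s" % throughput_mib_per_s(sha256d, buf))
```

Comparing the figure it prints against the read throughput of your disk or network link gives a rough idea of whether hashing or I/O is the bottleneck for your use case.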
In the future, we could switch to a faster hash function which is
still secure. I've been eyeing the Tiger hash function for a long
time -- it takes about 1/3 as many CPU cycles to hash things as
SHA-256 does and its output size (192 bits) is more fitting to the
rest of our system than SHA-256's 256-bit output size. However,
there is a good chance that Tiger could be proven to lack collision-
resistance in the foreseeable future, and I don't think taking that
risk is currently worth saving those CPU cycles.
In the year 2012 (hey, we're living in the future!), the new SHA-3
hash function will be chosen. That function will also, I hope,
require about 1/3 as many CPU cycles as SHA-256 does while being a
safer long-term bet.