[tahoe-dev] Interesting hashing result
shawn-tahoe at willden.org
Sat Feb 14 20:25:58 UTC 2009
Actually not all that surprising, but perhaps interesting enough to be worth
mentioning.
While looking a little at the performance of my file system scanning and
change detection code, I noticed that librsync signature generation is
about 4 times faster than SHAD-256 hashing. Since the signatures are about
1% of the size of the whole file, I get a 4-5x speedup by generating the
signature first and then computing the SHAD-256 hash of that for use as
my "content hash", compared to hashing the content and separately
generating the signature. (If signing a file takes time t, hashing it takes
about 4t, so the old approach costs about 5t, while the new one costs t plus
the time to hash a signature 1% of the file's size.) That is:

    H(SIG(content)) is 5x faster than H(content) + SIG(content)
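A minimal Python sketch of the two orderings, under stated assumptions: the signature is a simplified rsync-style one (fixed 2048-byte blocks, each contributing a 4-byte weak checksum and an 8-byte strong checksum, no header), BLAKE2b truncated to 8 bytes stands in for MD4 because `hashlib` may not offer MD4 on OpenSSL 3 builds, and SHAD-256 is assumed to mean SHA-256 applied twice. None of these names are librsync's API. In pure Python the weak-checksum loop is far slower than `hashlib`'s C SHA-256, so this shows the structure of the trick, not the speedup.

```python
import hashlib
import struct

# Assumed, simplified parameters (not librsync's actual signature format).
BLOCK_LEN = 2048   # bytes of content per signature block
STRONG_LEN = 8     # bytes of strong checksum kept per block


def weak_checksum(block: bytes) -> int:
    """rsync-style weak checksum: a = byte sum, b = position-weighted sum, both mod 2**16."""
    n = len(block)
    a = sum(block) & 0xFFFF
    b = sum((n - i) * x for i, x in enumerate(block)) & 0xFFFF
    return (b << 16) | a


def strong_checksum(block: bytes) -> bytes:
    """Stand-in for MD4 (assumption): BLAKE2b truncated to STRONG_LEN bytes."""
    return hashlib.blake2b(block, digest_size=STRONG_LEN).digest()


def signature(content: bytes) -> bytes:
    """SIG(content): a (weak, strong) checksum pair for each fixed-size block."""
    parts = []
    for off in range(0, len(content), BLOCK_LEN):
        block = content[off:off + BLOCK_LEN]
        parts.append(struct.pack(">I", weak_checksum(block)))
        parts.append(strong_checksum(block))
    return b"".join(parts)


def shad256(data: bytes) -> bytes:
    """Assumption: SHAD-256 means SHA-256(SHA-256(data))."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def scan_separately(content: bytes) -> tuple[bytes, bytes]:
    """H(content) and SIG(content): two full passes over the content."""
    return shad256(content), signature(content)


def scan_sig_first(content: bytes) -> tuple[bytes, bytes]:
    """H(SIG(content)): one full pass, then hash only the small signature."""
    sig = signature(content)
    return shad256(sig), sig


data = bytes(range(256)) * 100             # 25,600 bytes -> 13 blocks
content_hash, sig = scan_sig_first(data)
print(len(sig), len(sig) / len(data))      # 156 bytes, about 0.6% of the content
```

Because the content hash covers only the signature, two files with identical signatures necessarily get identical content hashes; this scheme detects exactly the changes the signature detects, no more.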
In most cases the hashing and signature operations are I/O bound anyway, so
this doesn't matter that much in terms of reducing scanning time. The only
reason I noticed it was that my tests were running repeatedly over the same
set of files, which ended up cached in memory.
Still, it seems worthwhile to use the more efficient method just to avoid
spending cycles that could be used elsewhere (or simply not spent, to reduce
power consumption).
The only possible concern here would be if the librsync signature algorithm
were to somehow fail to detect changes that SHA-256 alone would detect, or if
there were some way the cryptographic weaknesses of MD4 (the "strong" of
the two checksums used by librsync; the "weak" one is an Adler-32-style
rolling checksum) could be exploited.
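The weak checksum in rsync's design is a rolling checksum modeled on Adler-32: it can be slid along a file one byte at a time in O(1), which is what lets rsync find matching blocks at arbitrary offsets when computing a delta against a signature (signature generation itself only evaluates it at block boundaries). A minimal sketch of the rsync paper's formulation, assuming a modulus of 2**16; librsync's own rolling sum may differ in constants, and the function names here are hypothetical.

```python
M = 1 << 16  # the weak checksum is computed modulo 2**16


def weak_parts(block: bytes) -> tuple[int, int]:
    """Compute (a, b) for one window from scratch: O(window length)."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, n: int) -> tuple[int, int]:
    """Slide an n-byte window one byte to the right in O(1)."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M   # uses the already-updated a
    return a, b


def rolling_checksums(data: bytes, n: int) -> list[int]:
    """Weak checksum (b << 16 | a) of every n-byte window of data."""
    if len(data) < n:
        return []
    a, b = weak_parts(data[:n])
    sums = [(b << 16) | a]
    for k in range(len(data) - n):
        a, b = roll(a, b, data[k], data[k + n], n)
        sums.append((b << 16) | a)
    return sums


# The sums are not collision resistant: +1, -2, +1 on three adjacent bytes
# changes neither a nor b, so different blocks share a weak checksum.
block = bytes(range(10, 74))
forged = bytearray(block)
forged[5] += 1
forged[6] -= 2
forged[7] += 1
print(weak_parts(block) == weak_parts(bytes(forged)))
```

That last point is why each block also carries a strong checksum, and why MD4's strength is the part that matters for the second concern.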
On the first issue, rsync is very widely used and has proved itself very
reliable, so I'm not concerned about that.
On the second issue, I don't think there would be any security concerns
anyway, given the application here, but certainly any issues that could arise
should be addressed by the application of SHAD-256.
Anyway, thought this might be of interest to someone.