[tahoe-dev] Tahoe performance

Brian Warner warner-tahoe at allmydata.com
Thu Feb 19 22:19:42 UTC 2009

On Thu, 19 Feb 2009 13:30:55 -0600
Luke Scharf <luke.scharf at clusterbee.net> wrote:

> > When rsync decides that the file might have changed, over a regular
> > network (e.g. ssh), it uses a clever differencing algorithm.

> That clears it up a bit! I don't know any of rsync's tricks that I
> couldn't discover with lsof or ls -a....

http://samba.anu.edu.au/rsync/tech_report/ has a paper on the
technique, which also grew into Andrew Tridgell's PhD thesis. The
"rolling checksum" is the snazziest bit, along with the observation
that a fast (but no longer cryptographically-secure) hash like MD4 is
good enough for the use case, for two reasons. The first is that the
comparison scope is so small: one source file vs one destination file.
The second is that the threat model is so limited: the only party in a
position to take advantage of the hash's flaws is yourself. If you can
find two files A1 and A2 that have the same "strong" hash (as well as
the same weak checksums), and put A1 on the destination machine and A2
on the source, then 'rsync ./A2 remote:A1' would see matching
checksums, conclude there was nothing new to send, and fail to replace
A1 with A2. But they're both your own files anyway. Heck, MD*1* (if
there ever even was such a thing) would be good enough for this
purpose.
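
For the curious, here is a rough sketch of the weak "rolling checksum"
as I read it from the tech report, not rsync's actual C code. The
function names and the block size are made up for illustration. The
point is that sliding the window by one byte costs O(1) instead of
re-summing the whole block:

```python
M = 1 << 16  # modulus from the tech report's description (assumed here)


def weak_checksum(block):
    """Compute the two 16-bit halves (a, b) for one block from scratch."""
    n = len(block)
    a = sum(block) % M
    # Each byte is weighted by its distance from the end of the block.
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a, b, out_byte, in_byte, n):
    """Slide an n-byte window one byte to the right in constant time."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b


def combine(a, b):
    """Pack the two halves into a single 32-bit value."""
    return a + (b << 16)


def rolling_checksums(data, n):
    """Weak checksum of every n-byte window of data, at every offset."""
    if len(data) < n:
        return []
    a, b = weak_checksum(data[:n])
    sums = [combine(a, b)]
    for k in range(len(data) - n):
        a, b = roll(a, b, data[k], data[k + n], n)
        sums.append(combine(a, b))
    return sums
```

In the scheme the paper describes, a match on this cheap checksum is
only a hint: the "strong" hash of the block is then compared before the
block is treated as already present on the other side.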


