[tahoe-dev] [p2p-hackers] measure your convergence

Peter Secor secorp at secorp.net
Thu Mar 20 22:00:43 UTC 2008

Thanks Zooko, this looks cool.

Just as added information: one backup practice many people follow is to
copy their home directory every day into a directory named something
like "YYYY-MM-DD_myhomedir" on their virtual drive. This takes advantage
of per-user convergence, since files that have not changed since the
previous day's copy converge on what is already stored, without needing
to converge with other people's files.
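
A toy Python model shows why dated daily copies benefit from per-user
convergence. Purely for illustration, it assumes each stored file is
identified by an HMAC-SHA256 of its contents keyed with a per-user
convergence secret. The names `storage_id`, `stored_bytes`,
`alice_secret`, and `bob_secret` are hypothetical, and Tahoe's real key
derivation is not reproduced here.

```python
import hashlib
import hmac


def storage_id(convergence_secret: bytes, contents: bytes) -> str:
    """Toy content-derived ID: same secret + same bytes -> same ID.

    HMAC-SHA256 keyed by the user's convergence secret; a hypothetical
    simplification, not Tahoe's actual key derivation.
    """
    return hmac.new(convergence_secret, contents, hashlib.sha256).hexdigest()


def stored_bytes(uploads):
    """Bytes kept if uploads that map to the same ID are stored only once."""
    unique = {}
    for secret, contents in uploads:
        unique[storage_id(secret, contents)] = len(contents)
    return sum(unique.values())


alice_secret = b"alice-convergence-secret"  # hypothetical per-user secrets
bob_secret = b"bob-convergence-secret"
report = b"an unchanged file in the home directory\n" * 100

# Alice copies the same unchanged file into two dated snapshots,
# e.g. 2008-03-19_myhomedir and 2008-03-20_myhomedir.
day1 = storage_id(alice_secret, report)
day2 = storage_id(alice_secret, report)
print(day1 == day2)                            # True: her copies converge
print(day1 == storage_id(bob_secret, report))  # False: Bob's copy does not

# Two Alice snapshots plus Bob's identical file: two copies stored, not three.
print(stored_bytes([(alice_secret, report), (alice_secret, report),
                    (bob_secret, report)]) == 2 * len(report))  # True
```

Because the secret is per-user, an unchanged file that appears in many
of one user's snapshots collapses to a single stored copy, while a
byte-identical file uploaded by someone else stays separate.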


zooko wrote:
> Folks:
> Ever wondered how much storage space you would save if you and your  
> friends coalesced all of your identical files?
> Wonder no longer!  Now you can find out!  Install the "dupfilefind"  
> utility [*] and run it with command-line arguments like:
> dupfilefind --ignore-dirs="," --min-size=32 --profiles
> (It probably works on all operating systems.)
> It will recursively examine all files reachable from the current  
> working directory and spew out a series of "hashcode filesize" pairs,  
> where the hashcode is the least significant 8 bits of the adler32  
> checksum of the first 8192 bytes of the file.
> It will also mention whenever it finds two separate files on your  
> system which are identical with each other.
> Send the output to your friends, or to me -- zooko at zooko.com -- and  
> we'll find out approximately how many of your files are shared with  
> other people who submit results.  (Please compress your output with a  
> good compressor like 7zip or rzip or bzip2.)
> You take full responsibility for leaking all this information about
> your files -- namely the 8-bit adler32 sums of their first 8192
> bytes, and their file sizes.  Also, if duplicate files are
> detected on your system, their device numbers and inode numbers.
> Regards,
> Zooko
> [*] To install the dupfilefind utility, either download this tarball:
> http://pypi.python.org/packages/source/d/dupfilefind/dupfilefind-1.1.2.tar.gz#md5=af8de6f3ead053e326389a9a87b0a11d
> untar it, cd into the resulting directory, and run:
> python setup.py install
> or else install the easy_install tool:
> http://peak.telecommunity.com/DevCenter/EasyInstall#installing-easy-install
> and then run:
> easy_install dupfilefind
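
The measurement Zooko describes can be approximated in a few lines of
Python. This is a hedged sketch, not dupfilefind's source: it assumes
the hashcode is `zlib.adler32` over the first 8192 bytes masked with
`0xFF` (its least significant 8 bits), prints one "hashcode filesize"
line per regular file, and uses a hypothetical `min_size` parameter as
a stand-in for `--min-size` (skipping smaller files). To report two
separate files that are identical, it groups files by size, confirms
matches with a full SHA-256 digest, and uses each file's (device, inode)
pair to skip hard links, which are one file rather than two.

```python
import hashlib
import os
import stat
import sys
import zlib
from collections import defaultdict

PREFIX_BYTES = 8192  # only the first 8192 bytes feed the published hashcode


def hashcode(path):
    """Least significant 8 bits of the adler32 of the file's first 8192 bytes."""
    with open(path, "rb") as f:
        return zlib.adler32(f.read(PREFIX_BYTES)) & 0xFF


def full_digest(path):
    """SHA-256 of the whole file, used only to confirm exact duplicates."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def scan(root, min_size=32, out=None):
    """Print 'hashcode filesize' lines and return groups of identical files.

    min_size is a hypothetical stand-in for --min-size: files smaller
    than min_size bytes are skipped.
    """
    if out is None:
        out = sys.stdout
    by_size = defaultdict(list)  # file size -> [(path, (dev, ino)), ...]
    seen = set()                 # (dev, ino) pairs already counted
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
                if not stat.S_ISREG(st.st_mode) or st.st_size < min_size:
                    continue  # skip symlinks, devices, and tiny files
                key = (st.st_dev, st.st_ino)
                if key in seen:
                    continue  # a hard link to a file already counted
                seen.add(key)
                print(f"{hashcode(path)} {st.st_size}", file=out)
            except OSError:
                continue  # unreadable or vanished file
            by_size[st.st_size].append((path, key))

    duplicates = []
    for entries in by_size.values():
        if len(entries) < 2:
            continue  # a unique size cannot have an identical twin
        by_digest = defaultdict(list)
        for path, key in entries:
            try:
                digest = full_digest(path)
            except OSError:
                continue
            by_digest[digest].append((path, key))
        duplicates.extend(g for g in by_digest.values() if len(g) > 1)
    return duplicates
```

Calling `scan(".")` from the directory you want to examine prints the
pairs and returns a list of duplicate groups, each a list of
`(path, (device, inode))` tuples. Grouping by size first means the
full-file hash is computed only for files that could possibly match.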
