[tahoe-dev] wanted: a permanent copy of everything I've ever looked at through my web browser

Bill Janssen janssen at parc.com
Wed Feb 17 07:03:35 UTC 2010


I did this for UpLib last year.  I wrote (in Python-Cocoa) a small Mac
Menubar frob that sat there and watched (via appscript and System
Events) what you were doing, by asking System Events every second what
the "foremost" application was.  If it was a document handler (Xcode,
PowerPoint, Safari, iTunes, etc.), I then asked (via Python appscript
again) the application for the URL of the document it was working with
or looking at.  I then checked UpLib to see if I had a copy.  If not, I
asked the app for a copy, and stuck it in UpLib.  If so, I did a SHA
hash of the bits, to verify that the doc hadn't changed.  If it had, I
fetched a new copy from the app, and stashed that copy away, too, with a
pointer to the previous version.  Firefox is hard, because it doesn't
support AppleScript, but Camino does.

Pretty straightforward.
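
To make the check-hash-and-version step concrete, here is a rough Python
sketch.  It is not the UpLib code: VersionedStore, observe(), and the
in-memory dictionaries are hypothetical stand-ins for the repository, and
SHA-256 is an assumption (any SHA variant would do).  The caller is
assumed to have already obtained the URL and the document bytes from the
frontmost application.

```python
import hashlib


class VersionedStore:
    """Toy in-memory stand-in for a document repository (not UpLib's API)."""

    def __init__(self):
        self.versions = {}  # version id -> record for one stored copy
        self.latest = {}    # url -> id of the most recent version

    def observe(self, url, data):
        """Record `data` for `url` unless an identical copy is already stored.

        Returns the id of the version that now represents `url`.
        """
        # SHA-256 is an assumption; the point is only to detect changes.
        digest = hashlib.sha256(data).hexdigest()
        prev_id = self.latest.get(url)

        # Already have a copy and the bits are unchanged: nothing to do.
        if prev_id is not None and self.versions[prev_id]["digest"] == digest:
            return prev_id

        # First sighting, or the document changed: stash a new copy with a
        # pointer back to the previous version (None for the first copy).
        new_id = len(self.versions) + 1
        self.versions[new_id] = {
            "url": url,
            "digest": digest,
            "data": data,
            "previous": prev_id,
        }
        self.latest[url] = new_id
        return new_id
```

A once-a-second polling loop would then call observe() with whatever the
frontmost application reports; unchanged documents cost one hash and no
new storage.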

I believe UpLib satisfies your 4 goals, and is open source at
http://uplib.parc.com/, but that "watch-and-copy" bit of code isn't in
the released versions yet.  UpLib stores everything in non-lossy form,
and stores everything as files in the file system, so it should run just
fine on top of tahoe.

All that being said, I'm not sure you really want what you say you want.
Our experience is that explicit curation (that is, a button on your
browser that you press when you want to save something to your
collection) seems to work better.

