Whoosh + Tahoe = distributed search engine for documents in Tahoe

Okhin okhin at okhin.fr
Wed Jul 9 10:55:43 UTC 2014


> Hello,
> 
> At first I thought this was some "helper" to index and search all the
> data you put in Tahoe, but in fact it's a standard Whoosh instance that
> stores the data you feed to Whoosh inside a given Tahoe capa. So you can
> inherit Tahoe's distribution and security properties to distribute your
> pure-Python search; all you need is to share a capa.
> 
> That's awesome!
> 
> Do you use it on some useful (i.e. more than test files) corpus? What
> performance can we expect from it?

I'm working on the Telecomix broadcast system (broadcast.telecomix.org),
which started as a text file and ended up as a Django website storing
media in a small Tahoe grid.

And I'm planning to extend it into a general-purpose media library that
could be used, for instance, by media organisations. Or, you know, to
just rebuild YouTube for squatters.

From the performance point of view, Whoosh is surprisingly fast (it's
only core Python). But adding the Tahoe-LAFS storage mechanically adds
latency. I haven't had the opportunity to test it in a "real" life
situation.
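
To make the latency point concrete, here is a toy sketch, not the actual
Whoosh or Tahoe-LAFS code. It assumes a made-up key/value store
(CountingStore) standing in for a remote backend, a hypothetical
build_index/search pair, and an arbitrary per-read cost that is not a
measurement. It only shows that each query term costs one storage read,
so the backend's round-trip time is paid once per read.

```python
class CountingStore:
    """Hypothetical stand-in for a remote store; counts every read."""

    def __init__(self, per_read_latency_s):
        self._blobs = {}
        # Assumed cost of one round trip, in seconds (illustrative only).
        self.per_read_latency_s = per_read_latency_s
        self.reads = 0

    def put(self, key, value):
        self._blobs[key] = value

    def get(self, key, default=None):
        self.reads += 1  # each lookup would be one network round trip
        return self._blobs.get(key, default)

    def simulated_latency_s(self):
        # Total latency the reads so far would have cost.
        return self.reads * self.per_read_latency_s


def build_index(store, docs):
    """Write one posting list (sorted doc ids) per term into the store."""
    postings = {}
    for doc_id, text in docs.items():
        for term in set(text.lower().split()):
            postings.setdefault(term, set()).add(doc_id)
    for term, doc_ids in postings.items():
        store.put("term/" + term, sorted(doc_ids))


def search(store, query):
    """Return the sorted ids of documents containing every query term."""
    matches = None
    for term in query.lower().split():
        doc_ids = set(store.get("term/" + term, []))
        matches = doc_ids if matches is None else matches & doc_ids
    return sorted(matches or [])


if __name__ == "__main__":
    store = CountingStore(per_read_latency_s=0.05)  # arbitrary value
    build_index(store, {
        "a": "tahoe stores files",
        "b": "whoosh indexes files",
        "c": "tahoe and whoosh",
    })
    print(search(store, "tahoe whoosh"))
    print(store.reads, "reads,", store.simulated_latency_s(), "s simulated")
```

A real index does more reads per query than this (segments, term
dictionaries, stored fields), so the multiplier would be larger; only a
test on a real grid would give real numbers.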

-- 
With datalove,
Okhin
:(){ :|:& };:

More information about the tahoe-dev mailing list