Whoosh + Tahoe = distributed search engine for documents in Tahoe

Matthieu Rakotojaona matthieu.rakotojaona at gmail.com
Tue Jul 8 18:48:34 UTC 2014


Excerpts from Okhin's message of 2014-07-08 15:22:42 +0200:
> Ohai,
> 
> First post here.
> 
> I'm working on a Python module to extend Whoosh[1] with a Tahoe-LAFS
> storage engine.
> 
> Whoosh is a search engine in pure Python. It can be used to index
> documents and metadata and search through them - well, what a search
> engine is supposed to do.
> 
> Extending it with a storage layer over Tahoe lets you have a cap
> containing documents and their associated metadata, indexed by Whoosh,
> and access it directly from a client (with some Python code). That way
> you do not rely on a single directory to manage and search through the
> collection of documents.
> 
> The code is on Le Loop's GitLab[2] and I've published the module on
> PyPI[3].
> 
> If you have comments, patches, or ideas, they will be most welcome :)
> 
> --
> [1]: https://whoosh.readthedocs.org/en/latest/
> [2]: https://git.leloop.org/okhin/tahoe-whoosh
> [3]: https://pypi.python.org/pypi/Tahoe-whoosh

Hello,

At first I thought this was some "helper" to index and search all the
data you put in Tahoe, but in fact it's a standard Whoosh instance that
stores the data you feed to Whoosh inside a given Tahoe cap. So the
index inherits Tahoe's distribution and security properties, and to
distribute your pure-Python search all you need is to share a cap.
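
To check my understanding, here is a toy sketch of that idea. It is not
Whoosh's or Tahoe-whoosh's actual API: DictStorage, build_index and
search are hypothetical names, the "index" is a bare word-to-documents
map, and a plain dict stands in for the Tahoe directory behind a cap.
The only point is that the searcher needs nothing but access to the
store the indexer wrote to.

```python
import json
from collections import defaultdict


class DictStorage:
    """Hypothetical stand-in for a storage backend: a name -> bytes map.

    A Tahoe-backed storage would read and write these named blobs under
    a directory cap instead of a local dict.
    """

    def __init__(self, files=None):
        self.files = {} if files is None else files

    def write(self, name, data):
        self.files[name] = data

    def read(self, name):
        return self.files[name]


def build_index(storage, docs):
    """Build an inverted index (word -> sorted doc ids) and save it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            postings[word].add(doc_id)
    payload = {word: sorted(ids) for word, ids in postings.items()}
    storage.write("index.json", json.dumps(payload).encode("utf-8"))


def search(storage, word):
    """Look up a single word using only what is in the storage."""
    index = json.loads(storage.read("index.json").decode("utf-8"))
    return index.get(word.lower(), [])


# The indexer writes through one storage handle...
writer_side = DictStorage()
build_index(writer_side, {
    "a.txt": "Tahoe stores encrypted shares",
    "b.txt": "Whoosh is a search engine in pure Python",
})

# ...and a reader that was only handed the same underlying store
# (the analogue of being given the cap) can search it.
reader_side = DictStorage(writer_side.files)
hits = search(reader_side, "whoosh")
```

Here hits lists only the document that contains the word. In the real
module the store would presumably sit behind Whoosh's own storage
layer, which is what I am assuming the Tahoe backend plugs into.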

That's awesome!

Have you used it on a real corpus (i.e. more than test files)? What
performance can we expect from it?

-- 
Matthieu Rakotojaona