Tahoe as git remote

Greg Troxel gdt at ir.bbn.com
Fri Oct 23 13:20:58 UTC 2015


Jean-Rene David <tahoe-dev at levelnine.net> writes:

> I would like to use tahoe as a storage for my
> personal git repositories. The idea would be to
> have a local git repo for live work, and a
> tahoe-based bare remote where I could push my work
> and clone from. 

That seems sensible.

> From the little I understand of tahoe, this could
> be a viable use-case since most files in a git
> repo are small and immutable. 

I would say "most files are not written often".   Even a bare remote
will run gc and create new packs.

> The problem I would like to solve is how to get
> git to write its objects to tahoe. Is sshfs my
> only option? I read that this may not be entirely
> reliable. 

This is my biggest complaint about tahoe; the universal interface to
filesystems is through the OS VFS layer, and tahoe has been mostly
living in a world where people are expected to run special tahoe
commands.

Part of the reason is that tahoe has features that don't fit neatly
in the POSIX filesystem spec.   I see that as an opportunity to grow the
interface to do things that multiple filesystems need, rather than to
reject it and expect the world to somehow put tahoe-specific code in
various places.

> Are there other options? I'm perfectly willing to
> write some code. But I'm not sure where to start. 

I see two reasonable ways forward.  One is to test and/or debug the
sshfs approach, and perhaps finish the work on how authentication is
handled.  It would be reasonable to run filesystem tests on it.

The other is to implement a FUSE interface for tahoe.   This could be a
program in python that does tahoe ops using the existing code and takes
requests from FUSE.   This will run into the same issue that sshfs and
other distributed filesystems have: the POSIX interface allows arbitrary
writes to a file, which turns into a need for read-modify-write.  But
it's fairly normal to write to the cloud only when the close system call
happens, which means a usage pattern of open/write/write/write/close can
result in a put without having to get, and with only a single put.
Coda does this, and it worked reasonably well.
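
The write-on-close pattern can be sketched as a small buffering wrapper.
Note that `uploader` here is a stand-in for whatever actually talks to
the grid (my assumption for illustration), not an existing tahoe API:

```python
# Hedged sketch of write-on-close: every write() lands in a local
# buffer, and exactly one upload happens at close().  So the pattern
# open/write/write/write/close becomes a single put, with no get and
# no read-modify-write.  `uploader` is a hypothetical callable.
import io

class WriteOnCloseFile:
    def __init__(self, uploader):
        self._buf = io.BytesIO()
        self._uploader = uploader

    def write(self, data: bytes) -> int:
        # Local only: nothing touches the grid until close().
        return self._buf.write(data)

    def close(self) -> str:
        # One network operation for the whole open/.../close cycle;
        # returns whatever cap the uploader hands back.
        return self._uploader(self._buf.getvalue())
```

A FUSE layer would do this from its release/flush handlers; the sketch
only shows the buffering discipline itself.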

All that said, one of the things missing in tahoe is caching, where
copies of files from the grid are kept locally to make reads more
efficient.  In coda's case, there is write-back caching, so
open/write/close is fast, and then the changes are put back to the
servers.  But all of this raises the spectre of locking and conflicts -
which are quite avoidable if you only use the distributed fs from one
place at a time.
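
For tahoe specifically, the immutable case is the easy one: a CHK cap's
contents can never change, so a read cache for immutable caps needs no
invalidation at all.  A minimal sketch, assuming a hypothetical `fetch`
callable that does the actual grid read:

```python
# Hedged sketch of a local read cache.  Immutable caps (URI:CHK:...)
# are safe to cache forever; anything else is fetched every time,
# since caching it would need the invalidation machinery Coda has.
# `fetch` is a hypothetical stand-in for a real grid download.

class ImmutableReadCache:
    def __init__(self, fetch):
        self._fetch = fetch
        self._cache = {}

    def read(self, cap: str) -> bytes:
        if not cap.startswith("URI:CHK:"):
            # Mutable or unknown cap: no safe caching without
            # invalidation, so go to the grid every time.
            return self._fetch(cap)
        if cap not in self._cache:
            self._cache[cap] = self._fetch(cap)
        return self._cache[cap]
```

This is exactly the "read-only cache" case; the hard part Coda solves is
everything this sketch refuses to cache.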

It might be that caching should be a layered FUSE filesystem that
presents a cache to the user while using an uncached fs.  I think there
are read-only caches.  But this is tricky because once you have caching
you more or less need cache invalidation.  Coda has all this - when a
user on one system opens a file for write it gets a write lock from the
servers, and when it's written the other servers get notified and
invalidate their local caches (details fuzzy, but the point is right).


Greg
resident old-school Unix crank

