use case: sharing media content between orgs

Gulyás Attila toraritte at gmail.com
Wed Dec 2 19:56:53 UTC 2020


Hi Jean-Paul,

I truly appreciate your detailed answer to my underspecified (to say
the least) question and you raised excellent points!

> At a very  high level, this sounds plausible. It  is true that different
> organizations can collaborate to form a Tahoe-LAFS storage "grid".

Thanks for the confirmation!

> We talk  about the "grid" a  lot but you really  have to apply a  lot of
> abstractions to  create this concept  with Tahoe-LAFS right  now (though
> there's some work underway to turn it into a more concrete thing).

I  believe  I  understand  what you  mean  here  but
would  you be able to provide more  info  about  the
work  currently underway?  (Trac ticket,  blog post,
anything, just to make sure I got it right.)

> A  Tahoe-LAFS "grid"  is just  one or  more storage  servers being  used
> together and  you have a lot  of flexibility around where  those storage
> servers come from.

Yes,  this  is  the   main  thing  that  grabbed  my
attention.

> At the content level, how would different organizations offer content to
> save other organizations labor?

> One  area that  you  probably want  to  think about  more  is how  those
> organizations coordinate with  each other in their use  of these servers
> and the creation and consumption of content on them.

The  original idea  was that  organizations wouldn't
use the "grid" directly; a service would be built on
top of  it via the  Tahoe-LAFS REST API  that would
also  have  public  APIs  for  other  organizations'
application(s)  to  consume.   Tahoe-LAFS  would  be
a  "flat"  file/object  storage layer  (holding  the
metadata  and hierarchical  structure  as well?)  and
the  service would  also  serve  as access  control.
Using Tahoe-LAFS  this way would of  course lose all
its  distributed properties  because of  the central
service  on  top,  but  it  would  provide  a  cloud
vendor-agnostic layer.

An example  would be that Org_A's  volunteers record
the Safeway ads and upload them to this central
service;  other organizations  then check  whether a
current Safeway flyer has been recorded  for a given
week, and if there is one, they wouldn't have to
assign it to their volunteers.
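
As a rough sketch of that weekly check (the key format, the function
names, and the set of already-recorded keys are all made up for
illustration; none of this is a Tahoe-LAFS API):

```python
from datetime import date


def flyer_key(store: str, day: date) -> str:
    """Build a hypothetical per-week key such as 'safeway-2020-W49'."""
    # isocalendar() yields (ISO year, ISO week number, ISO weekday).
    iso_year, iso_week, _ = day.isocalendar()
    return f"{store.lower()}-{iso_year}-W{iso_week:02d}"


def needs_recording(store: str, day: date, recorded_keys: set) -> bool:
    """True if no organization has uploaded this store's flyer for that week."""
    return flyer_key(store, day) not in recorded_keys


# Keys some other organization has already uploaded (illustrative data only).
recorded = {"safeway-2020-W49"}
print(needs_recording("Safeway", date(2020, 12, 2), recorded))
print(needs_recording("Safeway", date(2020, 12, 9), recorded))
```

Keying on the ISO week is only one possible convention; stores whose ad
week does not start on Monday would need a different rule.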

All this may  be a very naive idea, and  the hope of
other organizations  being willing to participate in
such an endeavour may be a pipe dream anyway. (The governance
of  such  multi-organizational  collaboration  would
also be  tricky but  it's also off-topic  here.) The
main reason I looked at Tahoe-LAFS at first was that
when we were working on overhauling our 20+ year old
system, we started on Google Cloud Engine but had to
migrate to Azure; migrating  VMs was relatively easy
but each  vendor has vastly different  cloud storage
APIs,  limitations, specs,  etc. and  it would  be a
pain to go through this one more time.

> For example,  would it make more  sense to have  a fully open grid  or a
> private grid open only to participating organizations?

Thank  you  for  raising this  question  because  it
didn't  even  occur  to  me.   Audio  information
services  provide a  lot  of  public domain  content
(e.g.,  old  time  radio  shows,  store  ad  flyers,
newsletters)  but   they  are   also  able   to  air
copyrighted materials  pursuant to  17 U.S.C.  § 121
(https://uscode.house.gov/view.xhtml?req=granuleid:USC-prelim-title17-section121&num=0&edition=prelim).
Ideally,  the  former  would  be  made  available
to  anyone and  disseminated as  freely as  possible
whereas  only print-disabled  individuals should  be
able to access the latter.
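
To spell out the rule I have in mind for the central service, here is a
minimal sketch. The class names, the "verified" flag, and the two rights
categories are assumptions made for illustration, not an existing
implementation, and how eligibility would actually be verified is left
open:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Rights(Enum):
    PUBLIC_DOMAIN = "public_domain"  # e.g. old time radio shows, store ad flyers
    SECTION_121 = "section_121"      # copyrighted, shared under 17 U.S.C. § 121


@dataclass(frozen=True)
class Listener:
    name: str
    print_disabled_verified: bool = False  # hypothetical eligibility flag


def may_access(listener: Optional[Listener], rights: Rights) -> bool:
    """Public-domain items are open to anyone, including anonymous users;
    Section 121 items require a verified print-disabled listener."""
    if rights is Rights.PUBLIC_DOMAIN:
        return True
    return listener is not None and listener.print_disabled_verified


print(may_access(None, Rights.PUBLIC_DOMAIN))
print(may_access(Listener("guest"), Rights.SECTION_121))
print(may_access(Listener("member", True), Rights.SECTION_121))
```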

This is the part  where  I get  confused:  I guess  it
is  possible to  build the  above-mentioned  central
service  to reflect  these  requirements, but  would
there  have  to  be  2  separate  "grids"?  I  mean,
volunteers could  also pitch  in storage  nodes, but
that would also mean that  they would have access to
all the data as well, correct?

> Perhaps each organization maintains a directory (or directory hierarchy)
> which only  it can write to  but which all other  organizations can read
> from.  This might  be  done on  an  ad hoc  basis or  with  a tool  like
> magic-folder (which  is currently  very much a  work in  progress). Then
> organizations would browse the read-only directories shared with them by
> other organizations  to see  if the desired  content already  exists. If
> found, they can retrieve and use it.  If not, they can create and upload
> it.

Even though  I mentioned earlier that  this wouldn't
be  a  direct  access  "grid",  I  still  appreciate
this  description because  (1)  the central  service
notion  may  not  be  viable at  all  and  (2)  this
straightforward example  made me understand  some of
the  concepts  I  struggled with  when  reading  the
manual.
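
To check my understanding, the "browse another organization's read-only
directory" step could be sketched roughly as below. This assumes the
gateway's web API returns a directory listing for
GET /uri/$DIRCAP?t=json shaped like ["dirnode", {"children": {...}}],
which is how I read the web API documentation; the gateway address, the
capability string, and the file name are placeholders, and no network
call is made here:

```python
import json
from urllib.parse import quote


def listing_url(gateway: str, dircap: str) -> str:
    """URL asking a Tahoe-LAFS web gateway for a directory's JSON listing."""
    return f"{gateway.rstrip('/')}/uri/{quote(dircap, safe='')}?t=json"


def child_names(listing_json: str) -> set:
    """Extract child names from a t=json directory listing.

    Assumed shape: ["dirnode", {"children": {name: [type, info], ...}}].
    """
    node_type, info = json.loads(listing_json)
    if node_type != "dirnode":
        raise ValueError(f"expected a directory, got {node_type!r}")
    return set(info.get("children", {}))


# Placeholder read-only capability shared by another organization.
url = listing_url("http://127.0.0.1:3456/", "URI:DIR2-RO:abc:def")
print(url)

# Hand-written sample response standing in for what the gateway would return.
sample = '["dirnode", {"children": {"safeway-2020-W49.mp3": ["filenode", {"size": 123}]}}]'
print("safeway-2020-W49.mp3" in child_names(sample))
```

An organization would fetch the URL with any HTTP client, pass the
response body to child_names(), and only assign a recording to its
volunteers when the expected name is missing.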

> This seems  workable -  however, I  wonder if there  is an  advantage to
> using  Tahoe-LAFS over  another system.  For example,  Google Drive  and
> Dropbox would offer comparable experiences, I think, without the need to
> operate storage servers.

Oh, do  you mean  that Tahoe-LAFS  can be  used over
Google Drive, OneDrive, Dropbox, and their ilk? If yes
then it would  make things way simpler  (and I'm not
sure how I missed this...).

> Or,  to  avoid  proprietary,  centralized systems,  NextCloud  has  file
> storage and  sharing capabilities.  It is not  a distributed  system but
> it's easy  to find  a commercial  offering that  could be  shared across
> organizations. This comes  at a cost - but so  does operating Tahoe-LAFS
> storage servers,  and I suspect  NextCloud hosting is  price competitive
> (unless volunteer labor can be discounted, perhaps).

Just looked up NextCloud and  will have to look into
it some more but I really like how anyone could just
chip in to a  Tahoe-LAFS-based system. Then again,
if an organization  quits/goes out of business/etc.,
one  would have  to figure  out how  the lost  nodes
would affect the "grid"  (if the "grid" is too small,
that is, right?). I may also be oversimplifying here
(hello Dunning-Kruger effect).

> Or maybe it  is the case that the loose  group of organizations actually
> benefit  significantly  from  the  distributed nature  of  Tahoe-LAFS  -
> perhaps because the  operation of the software more  closely matches the
> relationships of the organizations to each other?

This is so spot on that I feel mad not being able to
put this into words myself.

Thanks again!
Attila


On Tue, Dec 1, 2020 at 2:13 PM Jean-Paul Calderone
<jean-paul+tahoe-dev at leastauthority.com> wrote:
>
> On Fri, Nov 27, 2020 at 3:45 AM Gulyás Attila <toraritte at gmail.com> wrote:
>>
>> Hi,
>>
>> Would it be a valid use case for Tahoe-LAFS to share media content
>> among organizations?
>>
>> To elaborate, there are many reading services for the blind all across
>> the States (https://en.wikipedia.org/wiki/Radio_reading_service) and
>> volunteer effort is duplicated for certain topics (e.g., grocery store
>> flyers for large chains are the same for every state, yet services
>> have their volunteers read these every week, independently of each
>> other). These services all use different storage solutions (i.e.,
>> on-site servers and different cloud vendors) and based on what I read
>> so far Tahoe-LAFS could help to bridge this gap: organizations that
>> choose to join would be able to contribute storage space to the grid
>> and everyone would have uniform access to the content.
>>
>> Am I missing something? Thanks in advance!
>
>
> Hi Gulyás,
>
> At a very high level, this sounds plausible.  It is true that different organizations can collaborate to form a Tahoe-LAFS storage "grid".  We talk about the "grid" a lot but you really have to apply a lot of abstractions to create this concept with Tahoe-LAFS right now (though there's some work underway to turn it into a more concrete thing).
>
> A Tahoe-LAFS "grid" is just one or more storage servers being used together and you have a lot of flexibility around where those storage servers come from.
>
> It's true that each volunteer/service organization could operate one or more storage servers and that a Tahoe-LAFS storage client could be configured to use any and all of these servers for storage.  One area that you probably want to think about more is how those organizations coordinate with each other in their use of these servers and the creation and consumption of content on them.
>
> For example, would it make more sense to have a fully open grid or a private grid open only to participating organizations?  A fully open grid can accept contributions of resources from more participants but it also gives out storage access to more participants as well.  A private grid might make more sense but comes with operational security requirements to ensure it remains private.
>
> At the content level, how would different organizations offer content to save other organizations labor?  Perhaps each organization maintains a directory (or directory hierarchy) which only it can write to but which all other organizations can read from.  This might be done on an ad hoc basis or with a tool like magic-folder (which is currently very much a work in progress).  Then organizations would browse the read-only directories shared with them by other organizations to see if the desired content already exists.  If found, they can retrieve and use it.  If not, they can create and upload it.
>
> This seems workable - however, I wonder if there is an advantage to using Tahoe-LAFS over another system.  For example, Google Drive and Dropbox would offer comparable experiences, I think, without the need to operate storage servers.  Or, to avoid proprietary, centralized systems, NextCloud has file storage and sharing capabilities.  It is not a distributed system but it's easy to find a commercial offering that could be shared across organizations.  This comes at a cost - but so does operating Tahoe-LAFS storage servers, and I suspect NextCloud hosting is price competitive (unless volunteer labor can be discounted, perhaps).
>
> Or maybe it is the case that the loose group of organizations actually benefit significantly from the distributed nature of Tahoe-LAFS - perhaps because the operation of the software more closely matches the relationships of the organizations to each other?
>
> Does this make sense?  Does it help?  I'm happy to consider any follow-up questions.
>
> Jean-Paul
>
>
>>
>>
>> Appreciatively,
>> Attila Gulyas  |  IT/Program Assistant
>> Email: agulyas at societyfortheblind.org
>> Phone: (916) 889-7510
>>
>> Access News helpdesk:
>> (916) 889-7519
>> accessnews at societyfortheblind.org
>>
>> Access News system:
>> (800) 665-4667
>> (916) 732-4000
>>
>> Society for the Blind
>> 1238 S Street
>> Sacramento, CA 95811
>>
>> SFTB Main Phone: (916) 452-8271
>> Fax: (916) 492-2483
>> www.societyfortheblind.org
>>
>> Our mission is to empower individuals living with low vision or
>> blindness to discover, develop and achieve their full potential.
>> _______________________________________________
>> tahoe-dev mailing list
>> tahoe-dev at tahoe-lafs.org
>> https://tahoe-lafs.org/cgi-bin/mailman/listinfo/tahoe-dev


