[tahoe-dev] big data usability question

Raoul Duke raould at gmail.com
Sat Nov 5 17:06:22 UTC 2011


if there is a better place to ask this kind of question, please let me know.

when you have small data, you can study it every time you want to do
something to it. you can pretty quickly e.g. run rsync over the whole
set of data vs. another set to see what the diffs are. there's a sense
of being able to see the whole thing, of being able to get a snapshot
of it, of having some kind of transactional feeling to it. even though
it isn't guaranteed to be race free, nevertheless it feels good enough
because you probably are the only one using the data, for example. and
you know where it all is, and it is all pretty quickly enumerable.
etc.

vs.

you have a f* ton of data and it takes rsync 3 hours to figure out
what is different, let alone to start the transfer of actual content
bits. and you are sharing the storage system with lots of other users.
if you are lucky you can get in and ls around, but in the bigger case
you can't because it is a slow remote object store. the underlying
storage system is e.g. spreading things across servers, with replicas,
and naming the stored objects by checksum rather than by their path in
the original file system tree the data came from, etc.
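
[to make the "names based on checksum" point concrete, here is a minimal sketch in python. it is an assumption-laden illustration, not how s3, swift, or tahoe actually work: it walks a local directory tree, records a sha-256 per file in a manifest, and diffs two manifests instead of re-reading both data sets. the names build_manifest and diff_manifests are hypothetical, and a remote object store would need its own listing api in place of os.walk.]

```python
import hashlib
import os


def build_manifest(root):
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            digest = hashlib.sha256()
            with open(full, "rb") as f:
                # Read in 1 MiB chunks so large files are never fully in memory.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    digest.update(chunk)
            manifest[rel] = digest.hexdigest()
    return manifest


def diff_manifests(old, new):
    """Return sorted lists of (added, removed, changed) relative paths."""
    old_paths, new_paths = set(old), set(new)
    added = sorted(new_paths - old_paths)
    removed = sorted(old_paths - new_paths)
    changed = sorted(p for p in old_paths & new_paths if old[p] != new[p])
    return added, removed, changed
```

[the manifest is only a snapshot of the moment it was built. if other users change the data afterwards, the diff is already stale, which is exactly the "probabilities rather than concretes" feeling.]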

thus,

it seems like you are now in a user experience where, just to reason
about your own collection of data, you have to start thinking in terms
of eventual this, that, and the other. it all feels like you are
dealing in probabilities rather than in concretes, as you were when
you were just doing rsync across 2 external usb drives on your desktop
in your home office.

like, how do i have a meaningful, happy, rewarding, informative,
correct, useful, etc. user experience when i try to rsync from s3
over and back and among openstack-swift or backblaze or whatever else?

[i have thoughts about it, but (a) they are half-baked, obvious,
off-the-cuff ones and (b) i haven't seen or heard of systems that have
the underlying features and implementation let alone the higher-level
ux to satisfy my questions.]

sincerely.


