[tahoe-dev] automatic repair/renewal : where should it go?
shawn at willden.org
Sat Aug 29 23:20:28 UTC 2009
Oops, forgot to address a couple of questions.
On Thursday 27 August 2009 02:18:52 am Brian Warner wrote:
> So.. does this seem reasonable? Can people imagine what the schema of
> this persistent store would look like? What sort of statistics or trends
> might we want to extract from this database, and how would that
> influence the data that we put into it? In allmydata.com's pre-Tahoe
> "MV" system, I really wanted to track some files (specifically excluded
> from repair) and graph how they degraded over time (to learn more about
> what the repair policy should be). It might be useful to get similar
> graphs out of this scheme. Should we / can we use this DB to track
> server availability too?
I think it would be very useful to track both share loss and server
availability, for lots of reasons even beyond the needs of the repairer.
I think the repairer should also track its backlog, and display that through
the web API (just a number, so no security concern), so that the user can see
when the repair process is falling behind.
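A minimal sketch of how such a backlog counter might work (all names here are hypothetical, not Tahoe's actual API — the point is just that the status page serves a bare number, which leaks nothing about the files themselves):

```python
import threading
from collections import deque

class RepairQueue:
    """Tracks objects awaiting repair; the backlog is just the queue depth."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = deque()

    def enqueue(self, cap):
        with self._lock:
            self._pending.append(cap)

    def pop(self):
        with self._lock:
            return self._pending.popleft() if self._pending else None

    def backlog(self):
        # A bare count reveals nothing about file contents or caps,
        # so it could be served from an unguarded status page.
        with self._lock:
            return len(self._pending)

def render_backlog(queue):
    # What a hypothetical /status?t=repair-backlog handler might return.
    return str(queue.backlog())
```

If the number keeps growing between checks, the repairer is falling behind.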
> How should the process be managed? Should there be a "pause" button? A
> "go faster" button? Where should bandwidth limits be imposed?
Hehe. Here's my wish list:
I'd like to see a global Tahoe bandwidth limitation. I'd like to be able to
specify up and down rate limits and ensure that all Tahoe operations fit
within those limits. I'd also like simple CLI and WebAPI controls to modify
those limits, to allow the creation of tools that dynamically adjust the
limits. In the absence of user configuration of the limits, I'd like Tahoe
to automatically determine the available bandwidth that can be used without
impacting latency, and auto-set the limits just below that level.
I'd also like to be able to specify soft allocations of those limits: X% for
the repairer, Y% for requests from other nodes, and the remainder for this
node's own work. If any of those categories is using less than its current
allocation, the other categories may borrow the unused portion.
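One way to get that behavior is a token bucket per category, with each category's unused refill spilling into a shared pool that anyone may borrow from. This is only a sketch under assumed names and numbers, not a proposal for Tahoe's actual upload/download code:

```python
import time

class SharedRateLimiter:
    """Global byte-rate limit with soft per-category allocations.

    Each category is guaranteed its fraction of the refill; tokens a
    category doesn't spend spill into a common pool that any category
    may borrow from.
    """

    def __init__(self, rate_bytes_per_sec, shares, clock=time.monotonic):
        # shares, e.g. {"repair": 0.2, "peers": 0.3, "local": 0.5};
        # the fractions should sum to 1.
        self.rate = rate_bytes_per_sec
        self.shares = shares
        self.buckets = {cat: 0.0 for cat in shares}
        self.common = 0.0
        self.clock = clock
        self.last = clock()

    def _refill(self):
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        for cat, frac in self.shares.items():
            self.buckets[cat] += self.rate * frac * elapsed
            # Cap each bucket at one second's worth of its allocation;
            # the overflow is what other categories may borrow.
            cap = self.rate * frac
            if self.buckets[cat] > cap:
                self.common += self.buckets[cat] - cap
                self.buckets[cat] = cap
        self.common = min(self.common, self.rate)

    def try_consume(self, category, nbytes):
        """Spend our own allocation first, then borrow from the pool."""
        self._refill()
        own = min(self.buckets[category], nbytes)
        rest = nbytes - own
        if rest <= self.common:
            self.buckets[category] -= own
            self.common -= rest
            return True
        return False  # not enough tokens; the caller should wait
```

The injectable clock is just for testability; a real implementation would also need to integrate with the transport layer so that sends actually block when `try_consume` fails.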
Yeah, that's well beyond what you were asking. If anyone is interested in
doing something like that, though, I'm willing to write the bandwidth
accounting code. One of these days I'll invest the time to actually
understand the upload/download code...
> Can we do
> all of this through the webapi? How can we make that safe? (i.e. does
> the status page need to be on an unguessable URL? how about the control
> page and its POST buttons?). And what's the best way to manage a
> loop-avoiding depth-first directed graph traversal such that it can be
> interrupted and resumed with minimal loss of progress? (this might be a
> reason to store information about every node in the DB, and use that as
> a "been here already, move along" reminder).
It might make for a large DB, but information about every node could provide
very useful statistics, as well as helping with restarts. Ideally, it would
be nice to have share location information as well as a log of the repair
work performed. With that and some statistical calculations, a lot could be
learned, and perhaps some useful predictions could be made as well.
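As a sketch of what such a DB might hold (hypothetical schema, using SQLite as a stand-in for whatever store Tahoe would actually embed): a per-node row records the latest check result and doubles as the "been here already" marker for resuming an interrupted traversal, while a separate repair log accumulates the history that the statistics would be computed from.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS nodes (
    cap           TEXT PRIMARY KEY,  -- verify-cap of the file/directory
    kind          TEXT,              -- 'file' or 'directory'
    last_checked  REAL,              -- timestamp of the most recent check
    shares_found  INTEGER,           -- shares seen at that check
    shares_needed INTEGER            -- 'k' for this object
);
CREATE TABLE IF NOT EXISTS repairs (
    cap           TEXT,
    started       REAL,
    finished      REAL,
    shares_before INTEGER,
    shares_after  INTEGER
);
"""

def open_db(path=":memory:"):
    db = sqlite3.connect(path)
    db.executescript(SCHEMA)
    return db

def record_check(db, cap, kind, shares_found, shares_needed, when):
    # One row per node; replacing it on each check keeps the DB bounded
    # by the number of nodes rather than the number of checks.
    db.execute(
        "INSERT OR REPLACE INTO nodes VALUES (?, ?, ?, ?, ?)",
        (cap, kind, when, shares_found, shares_needed))

def already_visited(db, cap, since):
    # The traversal skips any node already checked in the current pass,
    # which is what makes an interrupted walk cheap to resume.
    row = db.execute(
        "SELECT 1 FROM nodes WHERE cap = ? AND last_checked >= ?",
        (cap, since)).fetchone()
    return row is not None
```

Graphing `shares_found` over time for unrepaired files, as Brian described wanting from the old "MV" system, would then be a simple query against the `nodes` and `repairs` tables.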