[tahoe-dev] erasure coding makes files more fragile, not less
eugen at leitl.org
Wed Mar 28 17:22:52 UTC 2012
On Wed, Mar 28, 2012 at 09:00:56AM -0400, Shawn Willden wrote:
> The arguments you make are the basis for the approach I (successfully) pushed
> when we started the VG2 grid: We demand high uptime from individual
Can you please tell us more about the VG2 grid? I completely missed it.
> servers because the math of erasure coding works against you when the
> individual nodes are unreliable, and we ban co-located servers and prefer
> to minimize the number of servers owned and administered by a single person
> in order to ensure greater independence.
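The "math works against you" point can be made concrete with the binomial availability formula: a file encoded k-of-n is retrievable only if at least k shares sit on reachable servers. A rough sketch (assuming independent node failures, which real grids only approximate; the function name is mine, not Tahoe-LAFS API):

```python
from math import comb

def file_availability(k, n, p):
    """Probability that at least k of n shares sit on reachable
    servers, assuming each server is independently up with
    probability p."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k, n + 1))

# With reliable nodes, 3-of-10 encoding amplifies reliability:
print(file_availability(3, 10, 0.95))   # well above any single node
# With flaky nodes and a high-rate code (7-of-10), erasure coding
# works against you: the file is *less* available than one node.
print(file_availability(7, 10, 0.5))    # well below 0.5
```

The crossover is roughly at p = k/n: above it, more shares drive availability toward 1; below it, toward 0, which is why demanding high per-node uptime matters.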
> How has that worked out? Well, it's definitely constrained the growth rate
> of the grid. We're two years in and still haven't reached 20 nodes. And
It doesn't surprise me at all, since I've never heard a single squeak
about it in the usual channels. (And I'm moderately well-informed
in such matters.)
> although our nodes have relatively high reliability, I'm not sure we've
> actually reached the 95% uptime target -- my node, for example, was down
> for over a month while I moved, and we recently had a couple of outages
> caused by security breaches.
> However, we do now have 15 solid, high-capacity, relatively available (90%,
> at least) nodes that are widely dispersed geographically (one in Russia,
> six in four countries in Europe, seven in six states in the US; not sure
> about the other). So it's pretty good -- though we do need more nodes.
How large is the total storage capacity? What about introducer nodes, is
there just one?
> I can see two things that would make it an order of magnitude better:
> monitoring and dynamic adjustment of erasure-coding parameters.
> Monitoring is needed both to identify cases where file repairs need to be
> done before they become problematic and to provide the node reliability
> data required to dynamically determine erasure coding parameters.
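In its simplest form, dynamic adjustment could mean deriving the share count from the uptime figures that monitoring reports. A hypothetical sketch under that assumption (`choose_parameters` is illustrative, not an existing Tahoe-LAFS interface):

```python
from math import comb

def availability(k, n, p):
    """P(at least k of n independently-failing shares are reachable)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k, n + 1))

def choose_parameters(p, target, k=3, max_n=30):
    """Hypothetical tuner: given a measured per-node uptime p from
    monitoring, return the smallest total share count n for which
    k-of-n encoding meets the availability target, or None if no
    n <= max_n suffices."""
    for n in range(k, max_n + 1):
        if availability(k, n, p) >= target:
            return n
    return None

# Hitting six-nines availability takes far more shares on flaky nodes:
print(choose_parameters(0.9, 0.999999))  # 10 shares suffice
print(choose_parameters(0.5, 0.999999))  # 29 shares needed
```

A real tuner would also have to weigh storage expansion (n/k) and repair bandwidth, but even this crude version shows how measured reliability data could feed parameter selection.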