# [tahoe-dev] How many servers can fail?

Greg Troxel gdt at ir.bbn.com
Wed Oct 26 14:00:29 UTC 2011

Shawn Willden <shawn at willden.org> writes:

> I think it would simplify things greatly to further constrain share
> placement so that each server gets no more than one share, so that N,
> H and K all refer to servers.  I realize that there are some
> interesting things that can be achieved by setting N to be a multiple
> of the number of servers available, but in practice I don't think they
> add enough value to offset the conceptual complexity.

In a grid with a lot of servers S >> N, I can see your point.

I have a private grid that had 3 servers, and now has 4.  But I'm using
3/10 encoding, so that I can gradually add servers without having to
torque the world around.

I realize this is tricky business, but I've come to think that H isn't
the right concept.  What I really care about is knowing how many servers
can fail without causing me to lose files.  If I had 5 servers but was
using 3/10 (to enable migration to larger-N grids), then optimal share
placement would be 5x2, and I could lose 3/5 and reconstruct with any 2.
But with 4x1 and 1x6, I could lose at most 2/5 while being sure of
reconstruction.  Both situations have H=5.
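
The arithmetic in this comparison can be checked with a short sketch. The function names and the list-of-share-counts representation are illustrative assumptions, not anything in Tahoe-LAFS, and the skewed layout is taken to be four servers with one share each plus the remaining shares on the fifth.

```python
def max_tolerable_failures(shares_per_server, k):
    """Largest f such that losing ANY f servers still leaves >= k shares.

    Assumes the placement holds at least k shares in total.
    """
    # Worst case: the servers holding the most shares fail first.
    counts = sorted(shares_per_server, reverse=True)
    remaining = sum(counts)
    failures = 0
    for c in counts:
        if remaining - c < k:
            break
        remaining -= c
        failures += 1
    return failures


def servers_holding_shares(shares_per_server):
    """H-style count: number of servers holding at least one share."""
    return sum(1 for c in shares_per_server if c > 0)


K = 3  # shares needed to reconstruct (3/10 encoding)
even = [2, 2, 2, 2, 2]
skewed = [1, 1, 1, 1, 6]

for name, layout in (("even", even), ("skewed", skewed)):
    print(name,
          "H =", servers_holding_shares(layout),
          "tolerates", max_tolerable_failures(layout, K), "failures")
```

Both layouts report the same H, but the even one tolerates 3 server failures and the skewed one only 2, which is the gap that H by itself does not capture.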

It would be nice instead to have a way to express: place shares so that
I am guaranteed to be able to reconstruct if any R (for reconstruction)
servers which have a share have not failed.

For S >> N, then perhaps R = N - H.

For S << N, then H ensures that H servers have at least 1 share, whereas
R will tend to ensure that shares are spread evenly.

As an example, consider 3/10 encoding with R=2 and S=5.

Placing 2 shares each on 5 servers meets that goal, and placing 1 share
on one server, 3 on another, and 2 on each of the last 3 also does.  But
any less even distribution does not.  Here, there are many distributions
which meet the same H (even if H=S=5!) but only some distributions meet R=2.
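
A rough way to check this example is to enumerate the placements and test each one. In this sketch, `meets_r` and `placements` are hypothetical helper names, and the enumeration is restricted to placements where all 10 shares are placed and all 5 servers hold at least one share (so H=5 in every case); server order is ignored.

```python
def meets_r(shares_per_server, k, r):
    """True if ANY r share-holding servers together hold >= k shares."""
    holders = sorted(c for c in shares_per_server if c > 0)
    if len(holders) < r:
        return False
    # The r least-loaded holders are the worst case, so checking them suffices.
    return sum(holders[:r]) >= k


def placements(n, s, minimum=1):
    """Yield non-decreasing tuples of s counts, each >= minimum, summing to n."""
    if s == 1:
        if n >= minimum:
            yield (n,)
        return
    for first in range(minimum, n // s + 1):
        for rest in placements(n - first, s - 1, first):
            yield (first,) + rest


K, N, S, R = 3, 10, 5, 2
all_h5 = list(placements(N, S))
good = [p for p in all_h5 if meets_r(p, K, R)]

print(len(all_h5), "placements with H = 5;", len(good), "meet R = 2:")
for p in good:
    print(p)
```

Under those assumptions there are 7 distinct placements with H=5, and only the two named above, (2, 2, 2, 2, 2) and (1, 2, 2, 2, 3), satisfy R=2.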

So, if it turns out I'm not confused, my proposal is to replace H with
R.
