[tahoe-dev] Upload failures when shares.happy == shares.total

Shawn Willden shawn at willden.org
Thu Oct 20 17:17:25 UTC 2011


It's taken a year to grow the Volunteer Grid 2 to a point where I'm
ready to use it for my regular backups, but we have succeeded (yay!).
So yesterday I started uploading some 200 GB of data to it.  Things
went well for a few hours, then I started getting occasional
Unhappiness exceptions, saying that there weren't enough servers to
accept my shares.

That was odd because there were 14 nodes in the grid, all of them
active and available.  I was quite certain that all of them were
accepting shares and had plenty of storage (every node on the grid
has > 450 GB available, and many of them are close to 1 TB).  I had
shares.happy set to 12, but the exception reported that only 11
storage nodes were available to accept my shares.

I decided to temporarily set shares.total to 14 and lower shares.happy
to 11, so I could look at information about a successful upload to see
which of the 14 nodes weren't accepting data.  It turns out that *all*
of them were accepting shares; in fact, my uploads were happily
succeeding with 14 shares delivered.
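
A minimal sketch of that temporary change, assuming the encoding
parameters live in the [client] section of the node's tahoe.cfg.  Only
the two values mentioned above are shown; shares.needed is left at
whatever it was already set to.

```ini
[client]
# Temporary diagnostic settings: place up to 14 shares, but treat the
# upload as successful once 11 distinct servers hold shares.
shares.happy = 11
shares.total = 14
```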

My theory is that perhaps the uploader is attempting to contact only
shares.total servers, and perhaps one of them is responding a little
too slowly, making the uploader believe that fewer than shares.total
servers are accepting shares, so the upload fails.  With shares.total >
shares.happy, even if one server is a little slow, there are still
enough to allow the upload to continue... and the slowpoke ends up
responding and receiving a share.
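
To make the theory concrete, here is a toy model in Python.  It is
not Tahoe's actual uploader logic, only a sketch of the guess above.
The function names and the assumption that a slow server is simply
counted as unavailable are made up for illustration.

```python
def servers_counted(grid_size, shares_total, slow_servers):
    """Servers the uploader would count as accepting shares.

    Assumption (the theory above, not confirmed behavior): the uploader
    contacts at most `shares_total` servers, and any contacted server
    that answers too slowly is counted as unavailable.
    """
    contacted = min(grid_size, shares_total)
    slow = min(slow_servers, contacted)
    return contacted - slow


def upload_is_happy(grid_size, shares_total, shares_happy, slow_servers):
    """True if enough servers are counted to satisfy shares.happy."""
    counted = servers_counted(grid_size, shares_total, slow_servers)
    return counted >= shares_happy


if __name__ == "__main__":
    # 14-node grid, one slow responder among the contacted servers.
    print(upload_is_happy(14, shares_total=12, shares_happy=12, slow_servers=1))
    print(upload_is_happy(14, shares_total=14, shares_happy=11, slow_servers=1))
```

Under this model, shares.happy == shares.total leaves no slack: a
single slow server among the 12 contacted drops the count to 11, which
matches the number in the exception.  With shares.total = 14 and
shares.happy = 11, the same slow server still leaves 13 counted.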

Does that make sense?  Any thoughts on whether or not this behavior is "right"?

(Aside:  For anyone looking for a really good grid for backups, VG2 is
still looking for new members who are interested in a fast,
high-capacity, highly-available grid.  My backup is running between
200 and 300 KBps for large files (>2MB).  Key requirements to join are
that you have to contribute at least 500 GB of storage and you have to
commit to keeping your node available at least 95% of the time.  Fast
network connections are a plus.)

--
Shawn


