[tahoe-dev] dogfood tasting -- robustness of the volunteer grid

zooko zooko at zooko.com
Thu Apr 9 19:23:32 UTC 2009

Today there was a major Internet outage in California.  This means  
that the mail server for the tahoe-dev mailing list is unreachable,  
which means you won't read this until after the problem has been  
repaired, and by then you may well know more about what happened than  
I know now.

One interesting thing during this process was that I got to see how well  
three different Tahoe grids stayed up and provided service during the  
outage.  The first is the allmydata.com grid.  Alas, all of the  
allmydata.com storage servers were cut off from the Internet, or at  
least unreachable from my home here in Boulder, Colorado.  So were the  
http://allmydata.com web site, the Tahoe webapi server, the FTP  
server, etc.  Allmydata.com currently uses Tahoe to store  
customer files, and the massive robustness of Tahoe means that those  
files are likely to survive accidents -- they have good "durability"  
-- but the current network configuration means that they are not  
particularly more likely to be reachable during Internet outages --  
they have pretty normal "availability".  (Actually their availability  
is probably a bit better than most services, since disruptions which  
affect only one or a few of their storage servers do *not* reduce  
availability of their service to their customers.)

(Note: I am no longer an employee of allmydata.com and don't speak  
for them.  I am commenting on my observations or guesses about their  
situation.  Peter or Zandr could correct me if I am wrong.)

The next grid I looked at is the test grid.  This is running mostly  
on servers contributed by and operated by allmydata.com, plus whoever  
else boots up a Tahoe node and connects to it to experiment.  I  
couldn't connect to any of the servers on the test grid because the  
test grid introducer itself is inside the network outage area.  I'm  
pretty sure that even if I could reach the introducer, all of  
the files on the testgrid would be unreachable because so many  
testgrid servers are inside the outage area.

The next grid I looked at is the volunteer grid.  I was at a  
coffeeshop in Boulder when I started wondering about this, so I tried  
to connect my laptop to the volunteer grid.  Alas -- the volunteer  
grid introducer is running on my server -- nooxie.zooko.com -- which  
is in the same giant co-lo building in San Francisco as the  
allmydata.com servers.  This made me wish for decentralized  
introduction.  Fortunately, there are some students interested in  
hacking on decentralized introduction this summer, as prompted by the  
Google Summer of Code project.  More on that later.

Then I went back to my house, where I have a volunteer grid node  
already running.  Hooray!  It doesn't need the introducer, since it  
already knows the furls of all of the (current) volunteer grid  
servers.  Its status page looks something like this (attached as grid-status.html):

Nickname  Connected?  Since
SECORP-MAC  No  2009-04-09_10:58:20
SECORP_DOT_NET_02  Yes: to  2009-04-08_07:24:33
stockrt-terra  No  2009-04-09_10:58:20.338400  2009-04-07_16:47:54
draco (Zooko's Mac/PPC 867 MHz laptop)  No  2009-04-09_09:25:43
trid0  Yes: to seron.dyndns.org:3141  2009-04-08_07:30:41
SECORP_DOT_NET_04  Yes: to  2009-04-08_07:24:33
SECORP_DOT_NET_03  Yes: to  2009-04-08_07:24:33
yukyuk (Zooko's Athlon64 Linux workstation)  Yes: to (loopback)   
francois1 at tahoe.ctrlaltdel.ch  Yes: to   
SECORP_DOT_NET_01  Yes: to  2009-04-08_07:24:33
ndurner  Yes: to ent.merseine.nu:4961  2009-04-08_07:27:37.321217   
trelbox  Yes: to trel.dyndns.org:56034  2009-04-08_23:26:26.063565   
aogail-volunteergrid @ tigard.w007.org  No  2009-04-09_10:58:20.347164  2009-04-08_17:13:52

I ran a deep check of my directory of flac files, and here are the  
results, below, and attached as deep-check-flax.html.  Now that's  
more like it -- all of my files are still usable.  Just for kicks, I  
played some music for myself from the volunteer grid.

Once we have a map with little colored pins in it showing the locations  
of the volunteer grid servers, I will be able to visually imagine  
what sorts of internet outages the volunteergrid can tolerate without  
interrupting operation.

Hm...  I wonder if we could use one of those services that try to  
map your IP address to a geographic location to automatically  
generate such a map.
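
The question of which outages the grid can tolerate can be sketched in a few lines, under loud assumptions: the region names and the share placement below are hypothetical (the post does not say where any server is located), and the only rule used is that a k-of-N encoded file needs at least k reachable shares.

```python
def survives_outage(share_regions, k, down_regions):
    """Return True if at least k shares sit outside the regions that are down.

    share_regions: one region label per share of a single file (hypothetical data).
    k: the number of shares needed to recover the file (the "k" in k-of-N).
    """
    remaining = sum(1 for region in share_regions if region not in down_regions)
    return remaining >= k


def tolerable_single_region_outages(share_regions, k):
    """List the regions whose loss, on its own, leaves the file recoverable."""
    regions = sorted(set(share_regions))
    return [r for r in regions if survives_outage(share_regions, k, {r})]


# Hypothetical placement of the 6 shares of a 4-of-6 file.
placement = ["region-a", "region-a", "region-a", "region-b", "region-c", "region-d"]
print(tolerable_single_region_outages(placement, 4))
```

With this made-up placement, losing region-a removes three shares and leaves only three, one short of the four needed, while losing any other single region leaves five. A map of real server locations would supply the region labels.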



Relative Path                                      Healthy  Recoverable  Storage Index               Summary
<root>                                             False    True         tsxqypdes5ysfme6cuqebinmdy  Unhealthy: 8 shares (enc 6-of-9)
01 Battery.flac                                    False    True         efqq2yffg36yvclyd5wff5wwrm  Not Healthy: 5 shares (enc 4-of-6)
02 Master Of Puppets.flac                          True     True         inal62aoisoa4huma5fmzq3req
03 The Thing That Should Not Be.flac               False    True         4ub2wplyahagoner4npnzbshle  Not Healthy: 5 shares (enc 4-of-6)
04 Welcome Home (Sanitarium).flac                  False    True         jjrflgi4bpkodj5q7ygz6wz6oy  Not Healthy: 5 shares (enc 4-of-6)
05 Disposable Heroes.flac                          False    True         tdsby2jfikwsy67vfgoomax26e  Not Healthy: 5 shares (enc 4-of-6)
06 Leper Messiah.flac                              False    True         figj73pp7e7swislhr7bo3ol2u  Not Healthy: 5 shares (enc 4-of-6)
07 Orion.flac                                      False    True         crcijgwqs7wmzd4juhcu4xpy5u  Not Healthy: 5 shares (enc 4-of-6)
08 Damage Inc..flac                                False    True         tydsprhi2bkdnqhzxsdzpgbjdq  Not Healthy: 5 shares (enc 4-of-6)
13-of-16-fec                                       False    True         xzazzqtgytts6mp3iiv6tjrxdy  Unhealthy: 14 shares (enc 13-of-16)
13-of-16-fec/01 Battery.flac                       False    True         wdbbrh2y4p77y7jpflkbucuzvy  Not Healthy: 15 shares (enc 13-of-16)
13-of-26-fec                                       False    True         igmsz64hn6egnqu5c6e4bjmtsq  Unhealthy: 23 shares (enc 13-of-26)
13-of-26-fec/01 Battery.flac                       False    True         uwgvqn372sycxncb6p3ct6rd7i  Not Healthy: 23 shares (enc 13-of-26)
13-of-26-fec/02 Master Of Puppets.flac             False    True         uwqr6zia2qn2sgd2mxbwzui6ei  Not Healthy: 23 shares (enc 13-of-26)
13-of-26-fec/03 The Thing That Should Not Be.flac  False    True         xollsyamtkwcg3nxhxwl4qffyq  Not Healthy: 23 shares (enc 13-of-26)
13-of-26-fec/04 Welcome Home (Sanitarium).flac     False    True         2izwujkldbas3z2oryunctseoi  Not Healthy: 23 shares (enc 13-of-26)
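
The Healthy and Recoverable columns follow from the share counts in the Summary column. Here is a minimal sketch of that reading, assuming "healthy" means all N shares were found and "recoverable" means at least k were found. That assumption matches every row above, but the function names and the regular expression are illustrative and are not taken from the Tahoe source.

```python
import re

# Matches summary text as it appears in the table above,
# e.g. "Not Healthy: 5 shares (enc 4-of-6)".
SUMMARY_RE = re.compile(r"(\d+) shares \(enc (\d+)-of-(\d+)\)")


def classify(shares_found, k, n):
    """Return (healthy, recoverable, spare) for a k-of-n encoded file.

    Assumption: healthy = all n shares found; recoverable = at least k found.
    spare is how many more shares could be lost before recovery fails.
    """
    healthy = shares_found >= n
    recoverable = shares_found >= k
    spare = shares_found - k
    return healthy, recoverable, spare


def classify_summary(summary):
    """Parse a Summary cell and classify it; None if no share counts are present."""
    match = SUMMARY_RE.search(summary)
    if match is None:
        return None
    shares_found, k, n = (int(group) for group in match.groups())
    return classify(shares_found, k, n)


print(classify_summary("Not Healthy: 5 shares (enc 4-of-6)"))
print(classify_summary("Unhealthy: 23 shares (enc 13-of-26)"))
```

Read this way, the 4-of-6 files with 5 shares can lose one more share and still be recovered, while the 13-of-26 files with 23 shares can lose ten more.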
