[tahoe-dev] [tahoe-lafs] #686: Search for lost share resulted in a directory popping up at unexpected place

tahoe-lafs trac at allmydata.org
Sun Apr 26 13:36:12 UTC 2009

#686: Search for lost share resulted in a directory popping up at unexpected place
 Reporter:  [4-tea-2]  |           Owner:  nobody   
     Type:  defect     |          Status:  new      
 Priority:  major      |       Milestone:  undecided
Component:  unknown    |         Version:  1.4.1    
 Keywords:             |   Launchpad_bug:           
 I'm currently running a private test grid which, over the last few weeks,
 has grown to 20 nodes. As test data I'm using my audio folder, which I
 backed up in a few stages using "tahoe backup .../audio media:audio". The
 grid is running "3-of-5", since all of the nodes are pretty reliable and
 under my

 A couple of days ago, I ran "tahoe deep-check --add-lease media:" and got a
 summary indicating an unhealthy file. I ran a few more deep-checks until I
 found the affected file: "tahoe deep-check media:" did not give the file
 name, and "tahoe deep-check -v media:" gave it, but I didn't see it at the
 time because my "grep -v Healthy" filter also dropped the "Not Healthy"
 message ;). Finally, running deep-check from the WUI gave me the filename
 and the storage index.
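
 The filtering pitfall described above can be avoided by testing the status
 field rather than the bare word "Healthy". The following is only a minimal
 sketch: it assumes that healthy entries in the "tahoe deep-check -v" output
 end in ": Healthy" (the unhealthy format is taken from the message quoted
 in this ticket), and unhealthy_lines is a hypothetical helper name, not
 part of tahoe.

```python
def unhealthy_lines(report_lines):
    """Return the non-empty lines that do not end in ': Healthy'.

    Assumption: a healthy entry looks like '<path>: Healthy'. Anything
    else, including '<path>: Not Healthy: ...', is kept.
    """
    flagged = []
    for line in report_lines:
        text = line.rstrip("\n")
        if not text.strip():
            continue  # skip blank lines
        if text.endswith(": Healthy"):
            continue  # healthy entry, filter it out
        flagged.append(text)
    return flagged


if __name__ == "__main__":
    import sys
    # Example: tahoe deep-check -v media: | python filter_unhealthy.py
    for entry in unhealthy_lines(sys.stdin):
        print(entry)
```

 Unlike "grep -v Healthy", this keeps the "Not Healthy" entries, because
 the substring "Healthy" alone is not enough to discard a line.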

 Local file:
 .../audio/untagged or incomplete/Music/AIM/Aim - Fabriclive 17 (FLAC - CUE
 - EAC)/Aim - Fabriclive 17.wav

 Affected file in grid:
 media:audio/Archives/2009-04-17_23:04:36Z/untagged or
 incomplete/Music/AIM/Aim - Fabriclive 17 (FLAC - CUE - EAC)/Aim -
 Fabriclive 17.wav

 Message from "tahoe deep-check -v media:":
 audio/Archives/2009-04-17_23:04:36Z/untagged or incomplete/Music/AIM/Aim -
 Fabriclive 17 (FLAC - CUE - EAC)/Aim - Fabriclive 17.wav: Not Healthy: 4
 shares (enc 3-of-5)

 Checking the file from the WUI gave me the list of the available shares,
 1-4. Share 0 was gone.
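
 To put the "4 shares (enc 3-of-5)" status in numbers: with k-of-N
 encoding any k shares are enough to reconstruct the file, so the file
 here should still be recoverable with one share to spare. A small sketch
 that parses the message format quoted above (share_margin and the
 returned dictionary keys are hypothetical names chosen for illustration):

```python
import re

# Matches the status text quoted in this ticket, e.g.
# "Not Healthy: 4 shares (enc 3-of-5)".
STATUS_RE = re.compile(r"Not Healthy: (\d+) shares \(enc (\d+)-of-(\d+)\)")


def share_margin(message):
    """Parse a 'Not Healthy' message and report the redundancy left."""
    match = STATUS_RE.search(message)
    if match is None:
        return None
    found, needed, total = (int(group) for group in match.groups())
    return {
        "found": found,
        "needed": needed,
        "total": total,
        "missing": total - found,        # shares that could not be located
        "spare": found - needed,         # shares that may still be lost safely
        "recoverable": found >= needed,  # enough shares to rebuild the file
    }


info = share_margin("Aim - Fabriclive 17.wav: Not Healthy: 4 shares (enc 3-of-5)")
print(info["missing"], info["spare"], info["recoverable"])
```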

 Since I wanted to find out why the share vanished, zooko recommended
 searching the .flog files for the storage index. I found 35 incident
 reports. Most of those I checked were caused by connectivity problems
 (e.g. introducer not reachable, because I opened the firewall on the
 introducer only after installing and starting the tahoe node). None of
 the .flog files contained the storage index of the unhealthy file.
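
 A search like the one described above could be scripted roughly as
 follows. This is a sketch under assumptions: it does a raw byte search
 over files ending in ".flog" or ".flog.bz2" (decompressing the latter),
 and it only finds the storage index if the log stores it literally. If
 the logs record the index in a shortened or otherwise encoded form, a
 search of this kind would miss it. files_mentioning is a hypothetical
 helper name.

```python
import bz2
import os


def files_mentioning(root, needle):
    """Return sorted paths of log files under root that contain needle.

    needle must be a bytes object, e.g. the storage index as ASCII bytes.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not (name.endswith(".flog") or name.endswith(".flog.bz2")):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "rb") as handle:
                data = handle.read()
            if name.endswith(".bz2"):
                try:
                    data = bz2.decompress(data)
                except (OSError, ValueError):
                    continue  # damaged or truncated archive, skip it
            if needle in data:
                hits.append(path)
    return sorted(hits)
```

 An empty result only shows that the literal byte string does not occur;
 it does not prove that the logs say nothing about the share.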

 The file <storage idx>/0 wasn't physically present in any of the storage/
 folders on any of the nodes (/1, /2, /3, /4 were).
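
 The check for share files can be repeated on each node with a short
 script. The sketch below assumes only what the paragraph above states:
 that a share is a file named after its share number inside a directory
 named after the storage index, somewhere below a node's storage/ folder.
 find_shares and missing_shares are hypothetical helper names.

```python
import os


def find_shares(storage_roots, storage_index):
    """Map share number -> list of paths found below the given roots."""
    found = {}
    for root in storage_roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            if os.path.basename(dirpath) != storage_index:
                continue  # not the directory of this storage index
            for name in filenames:
                if name.isdigit():  # share files are named 0, 1, 2, ...
                    path = os.path.join(dirpath, name)
                    found.setdefault(int(name), []).append(path)
    return found


def missing_shares(found, total):
    """Return the share numbers in range(total) that were not found."""
    return [number for number in range(total) if number not in found]
```

 Since every node has its own storage/ folder, the script would have to
 run once per node (or over copies of the folders), and the per-node
 results merged before calling missing_shares with the N of the encoding
 (5 in this grid).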

 Well, it seems one of my nodes lost a share without good reason - could
 that happen when a node is restarted while a share is uploading?

 But here's the really weird thing:

 marc@bong:~$ tahoe ls -l media:audio
 drwx - Apr 13 00:02               Archives
 dr-x - Apr 13 00:05                 Latest
 drwx - Apr 25 00:59 untagged or incomplete
 marc@bong:~$ tahoe manifest media:audio/"untagged or incomplete"
 URI:DIR2:... Music
 URI:DIR2:... Music/AIM
 URI:DIR2:... Music/AIM/Aim - Fabriclive 17 (FLAC - CUE - EAC)

 For reasons which are a complete mystery to me, part of the directory
 structure of the file with the lost share appeared in the target folder of
 "tahoe backup .../audio media:audio".

 Not the whole directory tree was duplicated, only the folders leading to
 the affected file. The directory Music/ contains many more files and
 directories. Sadly, some of the filenames contain UTF-8 diacritics,
 triggering a "UnicodeEncodeError: 'ascii' codec can't encode character
 u'\xe4' in position 7: ordinal not in range(128)" when I try to "tahoe ls"
 the directory. I can access the files from the WUI, though.
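
 The error message quoted above is what Python raises when text
 containing a non-ASCII character is encoded with the ASCII codec. Where
 exactly this happens inside "tahoe ls" is not known from this report;
 the sketch below only reproduces the same class of error in isolation.
 It is written in Python 3 syntax, the file name is a made-up example
 with the character u'\xe4' at position 7, and both function names are
 hypothetical.

```python
# Hypothetical file name; '\u00e4' (a with diaeresis) sits at index 7.
name = "Sommerm\u00e4rchen"


def ascii_failure_position(text):
    """Return the index of the first character ASCII cannot encode."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError as err:
        return err.start
    return None


def safe_for_terminal(text, encoding="utf-8"):
    """Encode text for output, replacing anything the codec cannot handle."""
    return text.encode(encoding, errors="replace")


print(ascii_failure_position(name))
print(safe_for_terminal(name))
```

 Encoding with an explicit codec such as UTF-8 (or with errors="replace"
 for a more limited one) avoids the exception, which may be why the WUI
 can show the same names without trouble.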

 I have not tried to repair the unhealthy file yet, because I didn't want to
 spoil the chance of finding the original problem.

 I can supply additional info (incident reports etc.) if needed.

Ticket URL: <http://allmydata.org/trac/tahoe/ticket/686>
tahoe-lafs <http://allmydata.org>
secure decentralized file storage grid

