[tahoe-dev] Help uploading when file exists but needs repair

Kyle Markley kyle at arbyte.us
Wed Dec 1 06:58:50 UTC 2010


 Brian et al,

> Huh? Shouldn't the new upload just put new shares in place? I know our
> uploader isn't particularly clever in the face of existing shares (it
> will put multiple shares on one server, and in general not achieve the
> ideal diversity), but it shouldn't just fail.

 Ok; maybe I'm misunderstanding the failure.  Let's do a more robust 
 diagnosis.

 Start with this to clear out old cruft:
 rm ~/.tahoe/private/aliases
 rm ~/.tahoe/private/backupdb.sqlite
 tahoe create-alias $USER

 $ tahoe --version
 allmydata-tahoe: 1.8.1, foolscap: 0.5.1, pycryptopp: 0.5.25,
 zfec: 1.4.7, Twisted: 10.1.0, Nevow: 0.10.0, zope.interface: 3.6.1,
 python: 2.6.5,
 platform: OpenBSD-4.8-amd64-Genuine_Intel-R-_CPU_000_@_2.93GHz-64bit-ELF,
 sqlite: 3.6.23, simplejson: 2.1.2, argparse: 1.1, pycrypto: 2.3,
 pyOpenSSL: 0.11, pyutil: 1.7.12, zbase32: 1.1.2, setuptools: 0.6c11,
 pyasn1: 0.0.11a, pysqlite: 2.4.1

 $ tahoe backup -v --exclude-vcs --exclude=build --exclude=.darcs \
     --exclude=.python-eggs $HOME $USER:
 .... lots of normal-looking output, followed by ....
 uploading '/storage/_buildbot/.login'..
 Traceback (most recent call last):
   File "/usr/local/bin/tahoe", line 9, in <module>
     load_entry_point('allmydata-tahoe==1.8.1', 'console_scripts', 'tahoe')()
   File "/usr/local/lib/python2.6/site-packages/allmydata/scripts/runner.py", line 111, in run
   File "/usr/local/lib/python2.6/site-packages/allmydata/scripts/runner.py", line 97, in runner
   File "/usr/local/lib/python2.6/site-packages/allmydata/scripts/cli.py", line 513, in backup
   File "/usr/local/lib/python2.6/site-packages/allmydata/scripts/tahoe_backup.py", line 324, in backup
   File "/usr/local/lib/python2.6/site-packages/allmydata/scripts/tahoe_backup.py", line 117, in run
   File "/usr/local/lib/python2.6/site-packages/allmydata/scripts/tahoe_backup.py", line 193, in process
   File "/usr/local/lib/python2.6/site-packages/allmydata/scripts/tahoe_backup.py", line 304, in upload
 allmydata.scripts.common_http.HTTPError: Error during file PUT: 500 Internal Server Error
 Traceback (most recent call last):
   File "build/bdist.openbsd-4.8-amd64/egg/foolscap/call.py", line 674, in _done
   File "build/bdist.openbsd-4.8-amd64/egg/foolscap/call.py", line 60, in complete
   File "/usr/local/lib/python2.6/site-packages/Twisted-10.1.0-py2.6-openbsd-4.8-amd64.egg/twisted/internet/defer.py", line 318, in callback
     self._startRunCallbacks(result)
   File "/usr/local/lib/python2.6/site-packages/Twisted-10.1.0-py2.6-openbsd-4.8-amd64.egg/twisted/internet/defer.py", line 424, in _startRunCallbacks
     self._runCallbacks()
 --- <exception caught here> ---
   File "/usr/local/lib/python2.6/site-packages/Twisted-10.1.0-py2.6-openbsd-4.8-amd64.egg/twisted/internet/defer.py", line 441, in _runCallbacks
     self.result = callback(self.result, *args, **kw)
   File "/usr/local/lib/python2.6/site-packages/allmydata/immutable/upload.py", line 546, in _got_response
   File "/usr/local/lib/python2.6/site-packages/allmydata/immutable/upload.py", line 396, in _loop
   File "/usr/local/lib/python2.6/site-packages/allmydata/immutable/upload.py", line 561, in _failed
 allmydata.interfaces.UploadUnhappinessError: shares could be placed
 on only 3 server(s) such that any 2 of them have enough shares to
 recover the file, but we were asked to place shares on at least 4 such
 servers. (placed all 4 shares, want to place shares on at least 4
 servers such that any 2 of them have enough shares to recover the file,
 sent 4 queries to 4 peers, 4 queries placed some shares, 0 placed none
 (of which 0 placed none due to the server being full and 0 placed none
 due to an error))


 So it appears it's failing to upload the .login file.  The specific 
 error message doesn't make sense to me -- if all 4 queries placed some 
 shares, and 0 queries placed none, then why hasn't the file become 
 healthy?
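One way to read that message, assuming the 1.7+ definition of "servers of happiness" as the size of a maximum matching between servers and distinct share numbers: "4 queries placed some shares" counts servers that accepted anything, while happiness counts each server at most once and each share number at most once. A server holding two shares, or two servers holding copies of the same share, therefore contribute less than one might expect. The sketch below is a plain augmenting-path (Kuhn) matching written for illustration only; the function name happiness and the dict-of-lists input are hypothetical, not Tahoe's own API.

```python
from typing import Dict, Iterable, List, Mapping, Set


def happiness(placement: Mapping[str, Iterable[int]]) -> int:
    """Size of a maximum matching between servers and distinct share numbers.

    `placement` maps a server id to the share numbers it holds (a
    hypothetical input shape chosen for this sketch, not Tahoe's API).
    """
    # Normalise to lists so each server's shares can be scanned repeatedly.
    servers: Dict[str, List[int]] = {
        server: sorted(set(shares)) for server, shares in placement.items()
    }
    share_owner: Dict[int, str] = {}  # share number -> server matched to it

    def try_assign(server: str, seen: Set[int]) -> bool:
        for share in servers[server]:
            if share in seen:
                continue
            seen.add(share)
            # Take a free share, or evict its current owner if that owner
            # can be re-matched to some other share it holds.
            if share not in share_owner or try_assign(share_owner[share], seen):
                share_owner[share] = server
                return True
        return False

    # Each successful augmenting path grows the matching by one.
    return sum(1 for server in servers if try_assign(server, set()))


# Layout from the check output below (server ids shortened).
print(happiness({"juwm": [0, 3], "vjqc": [1], "47cs": [], "xxaj": []}))  # 2
# Hypothetical layout: every server accepted a share, yet happiness is 3.
print(happiness({"A": [0, 3], "B": [1], "C": [0], "D": [1]}))  # 3
```

Under this reading, "placed all 4 shares" and "4 queries placed some shares" can both be true while the matching still tops out at 3, which is what the uploader reports.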

 In this particular case I am able to locate a copy of that file on the
 grid. Note that one server holds two shares and two servers hold none;
 I don't know whether that's relevant. (I'd also like to learn how to be
 more certain that I'm looking at the correct object in the first
 place!) Below is the output of tahoe check --raw for what I believe is
 the corresponding file:

 {
  "results": {
   "needs-rebalancing": true,
   "count-shares-expected": 4,
   "healthy": false,
   "count-unrecoverable-versions": 0,
   "count-shares-needed": 2,
   "sharemap": {
    "0": [
     "juwmgssmwnhrhfdcpxxmrz3bghh37esx"
    ],
    "1": [
     "vjqcroalrgmft66mgiwfjug667fl6qjd"
    ],
    "3": [
     "juwmgssmwnhrhfdcpxxmrz3bghh37esx"
    ]
   },
   "count-recoverable-versions": 1,
   "servers-responding": [
    "vjqcroalrgmft66mgiwfjug667fl6qjd",
    "juwmgssmwnhrhfdcpxxmrz3bghh37esx",
    "47cslusczp3uu2kygodi3nlalcruscif",
    "xxaj2tgmnl7debjdpn4mgv2oks6pjjnx"
   ],
   "count-good-share-hosts": 2,
   "count-wrong-shares": 0,
   "count-shares-good": 3,
   "count-corrupt-shares": 0,
   "list-corrupt-shares": [],
   "recoverable": true
  },
  "storage-index": "dumi26otgmnemrypt3zlesxm5y",
  "summary": "Not Healthy: 3 shares (enc 2-of-4)"
 }
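The per-server picture (one server with two shares, two with none) can be read directly off this output: "sharemap" maps share numbers to server ids, so inverting it and comparing against "servers-responding" gives a server-by-server view. A small sketch, assuming only the JSON layout shown above; the helper name shares_per_server is hypothetical. In practice the dict could come from json.load(sys.stdin) with the output of tahoe check --raw piped in.

```python
import json
from typing import Dict, List


def shares_per_server(check: dict) -> Dict[str, List[int]]:
    """Invert results.sharemap (share number -> server ids) into
    server id -> sorted share numbers, keeping responding servers
    that hold no shares at all."""
    results = check["results"]
    by_server: Dict[str, List[int]] = {
        server: [] for server in results.get("servers-responding", [])
    }
    for shnum, servers in results["sharemap"].items():
        for server in servers:
            by_server.setdefault(server, []).append(int(shnum))
    return {server: sorted(shares) for server, shares in by_server.items()}


# Trimmed copy of the check output above.
sample = json.loads("""
{"results": {
  "sharemap": {"0": ["juwmgssmwnhrhfdcpxxmrz3bghh37esx"],
               "1": ["vjqcroalrgmft66mgiwfjug667fl6qjd"],
               "3": ["juwmgssmwnhrhfdcpxxmrz3bghh37esx"]},
  "servers-responding": ["vjqcroalrgmft66mgiwfjug667fl6qjd",
                         "juwmgssmwnhrhfdcpxxmrz3bghh37esx",
                         "47cslusczp3uu2kygodi3nlalcruscif",
                         "xxaj2tgmnl7debjdpn4mgv2oks6pjjnx"]}}
""")

for server, shares in sorted(shares_per_server(sample).items()):
    print(server, shares or "(no shares)")
```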

 What do the expert folk make of this situation?

-- 
 Kyle Markley


