[tahoe-dev] String encoding in tahoe

Dan McNair glucnac at gmail.com
Tue Dec 23 03:31:19 UTC 2008


On Mon, Dec 22, 2008 at 18:33, zooko <zooko at zooko.com> wrote:

> Okay, after testing on my Macbook Pro, I committed François's patch
> [1], and some related patches of my own [2, 3, 4].  This fixed the
> cli tests on Ubuntu Feisty -- hooray!  But it broke the test on
> cygwin, GNU/OpenSolaris, Windows, and ArchLinux -- boo!  See the
> buildbot for details [5].


Got ArchLinux working again with a simple fix outside of the Tahoe source.

I had all of my locale environment variables set to "C", so when os.stat()
was called, it automatically converted its unicode argument using the
'ascii' encoding. (At least, I assume it is still a unicode object at that
point; I only glanced at the source, and figured it wouldn't be trying to
call encode() otherwise.) The conversion failed, because ASCII can encode
only its 128 basic characters (A-Za-z0-9, common punctuation, and control
codes) and nothing outside that range.

So I switched my system over to using "en_US.UTF-8" for the locale, and now
os.stat() (I assume) is autoconverting to UTF-8, which can represent the
special characters in the test filename, and all is well.
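
For illustration, the difference between the two locales can be sketched
roughly as follows. This is only a minimal sketch in modern Python 3, not
the code path os.stat() actually takes; the helper name try_encode and the
filename are hypothetical stand-ins, not taken from the Tahoe test suite.

```python
def try_encode(filename, encoding):
    """Return the encoded bytes, or None if the encoding cannot represent the name."""
    try:
        return filename.encode(encoding)
    except UnicodeEncodeError:
        return None

# Hypothetical non-ASCII filename, standing in for the one used by the cli tests.
name = "caf\u00e9.txt"

print(try_encode(name, "ascii"))   # None
print(try_encode(name, "utf-8"))   # b'caf\xc3\xa9.txt'
```

With the "C" locale the implicit conversion behaves like the 'ascii' case;
with "en_US.UTF-8" it behaves like the 'utf-8' case.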

There's a big chance something similar may be happening on the OpenSolaris
box, a smaller chance it's related to the cygwin failure, and I have no idea
whether the failure on Windows has anything at all to do with it.

> I guess that it is incorrect to assume that the Python strings that
> appear in sys.argv are utf-8 encoded.  They could be in some other
> encoding.


My guess is this is locale-specific on most POSIX platforms.
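
If that guess is right, decoding command-line bytes with the locale's
preferred encoding rather than a hard-coded UTF-8 would look something like
the sketch below. It is written for Python 3 (where sys.argv is already
decoded for you), so treat it as an illustration of the idea only; the names
decode_arg and describe_encodings are hypothetical and not part of Tahoe.

```python
import locale
import sys

def decode_arg(raw, encoding=None):
    """Decode raw argument bytes using the given encoding, or the locale's
    preferred encoding when none is given (an assumption, per the guess above)."""
    if encoding is None:
        encoding = locale.getpreferredencoding(False)
    return raw.decode(encoding)

def describe_encodings():
    """Report the encodings Python derives from the current environment."""
    return {
        "locale": locale.getpreferredencoding(False),
        "filesystem": sys.getfilesystemencoding(),
    }

# The same text arrives as different bytes depending on the sender's locale.
print(decode_arg(b"caf\xc3\xa9", "utf-8") == decode_arg(b"caf\xe9", "latin-1"))  # True
print(describe_encodings())
```

Decoding UTF-8 bytes as latin-1 (or the reverse) either fails or silently
produces the wrong characters, which is why assuming a single encoding for
sys.argv is risky.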

Your mileage may vary,

Dan