[tahoe-dev] Unicode issues review

Shawn Willden shawn-tahoe at willden.org
Tue Feb 17 18:52:10 UTC 2009


On Tuesday 17 February 2009 10:31:35 am zooko wrote:
> Ugh -- you mean to tell me that the filesystem itself might not know
> what encoding a filename is in?

Yep!

On most (all?) Unix-style systems, the locale is an environment setting with
a system-wide default that can be overridden per user (or even per shell).
The file system knows nothing about the encoding used; it just coughs up the
bytes and relies on higher layers to make sense of them, per the current
locale.
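
To make the "bytes plus current locale" point concrete, here is a minimal
Python sketch. The helper name list_names is made up for illustration; it
only assumes that os.listdir() returns raw bytes when handed a bytes path,
and that locale.getpreferredencoding() reports the encoding the current
locale implies.

```python
import locale
import os


def list_names(path):
    """Return (raw_bytes, decoded_or_None) for every entry in `path`.

    The raw bytes are what the file system hands back. The decoded form
    is only our guess, based on the current locale's preferred encoding.
    """
    encoding = locale.getpreferredencoding(False)
    results = []
    # Passing a bytes path makes os.listdir() return names as bytes.
    for raw in os.listdir(os.fsencode(path)):
        try:
            decoded = raw.decode(encoding)
        except UnicodeDecodeError:
            # The name does not make sense in this locale.
            decoded = None
        results.append((raw, decoded))
    return results
```

Running the same function under a different LANG or LC_ALL setting can give
a different decoded column for exactly the same bytes.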

There's also no enforcement, by any layer, really, that any of the file
names make sense in the current locale, or any other.
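
As a rough illustration of why that matters, the sketch below tries one byte
string under several encodings. The helper interpret, the sample name, and
the choice of EUC-KR are assumptions for the example, not anything taken
from a real directory.

```python
def interpret(raw, encodings=("utf-8", "euc-kr", "latin-1")):
    """Decode `raw` under each encoding; None means it did not decode."""
    out = {}
    for enc in encodings:
        try:
            out[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            out[enc] = None
    return out


# Hypothetical file name, written by a tool that assumed EUC-KR.
raw = "한글.txt".encode("euc-kr")   # b'\xc7\xd1\xb1\xdb.txt'
views = interpret(raw)
# views["utf-8"]   is None          (not valid UTF-8)
# views["euc-kr"]  is "한글.txt"     (what the author meant)
# views["latin-1"] is "ÇÑ±Û.txt"    (decodes fine, but is gibberish)
```

Nothing in the bytes themselves says which of these readings is the right
one.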

Even if the file system did know the encoding on a per-name basis, there's
still no guarantee that names in other encodings won't slip in, because there
are plenty of ways files can be transferred by tools that don't worry about
name encodings.

> In that case, examining the 
> directory with "ls" or a gooey file browser would show gibberish,
> right?

See the attached screenshot.  This is from my machine.  The name would be 
meaningless to me even if I knew what the encoding was, because it's Korean.  
The content, however, is quite useful to me, so I'm just happy that my system 
and application software let me open and use it, even with the bizarre name.  
Maybe I should rename it, but it's from a CVS repository and I don't want it 
to show up on the "cvs update" list.
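
One way software can keep such a file usable is to carry the undecodable
bytes through unchanged. The sketch below uses Python's "surrogateescape"
error handler for that. It is only an illustration, not a description of
what my desktop actually does, and the byte string is a made-up example.

```python
def to_text(raw):
    """Decode as UTF-8, smuggling undecodable bytes through as surrogates."""
    return raw.decode("utf-8", "surrogateescape")


def to_bytes(text):
    """Reverse to_text(), restoring the original bytes exactly."""
    return text.encode("utf-8", "surrogateescape")


# Hypothetical name that is not valid UTF-8.
weird = b"\xc7\xd1\xb1\xdb.txt"
as_text = to_text(weird)
# The text form looks odd on screen, but it round-trips without loss,
# so the file can still be opened by its original bytes.
restored = to_bytes(as_text)
```

The display is still gibberish, but no information is lost, so opening,
copying, and renaming keep working.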

These kinds of things are common when you work with people from around the 
world.  Almost everyone tries to stick to English, but stuff slips through.

	Shawn.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: snapshot1.png
Type: image/png
Size: 38827 bytes
Desc: not available
URL: <http://tahoe-lafs.org/pipermail/tahoe-dev/attachments/20090217/c212446f/attachment.png>

