~sschwarzer/ftputil#53: 
FTPHost.walk fails when the argument is a unicode string and the tree contains non-ASCII characters

When FTPHost.walk is used to examine a filesystem tree that contains a non-ASCII character in any name, and the argument passed in is a unicode string, the walk method raises a UnicodeDecodeError.

Imagine this directory structure:

some_dir
    some_stränge_file

then

ftp_host.walk(u"some_dir")

will cause a UnicodeDecodeError in posixpath like:

   File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/posixpath.py", line 70, in join
     path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xdf in position 41: ordinal not in range(128)
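
The failure can be sketched without an FTP server. This is a minimal illustration, not ftputil code: `join_like_python2` is a hypothetical helper, and the explicit `.decode("ascii")` stands in for the implicit coercion Python 2 performs when a unicode string is concatenated with a byte string. The byte 0xe4 assumes the server sent "ä" in Latin-1; the actual remote encoding is unknown.

```python
def join_like_python2(directory, name_bytes):
    """Mimic Python 2's implicit coercion when joining unicode and bytes."""
    # Python 2 decodes the byte string with the ASCII codec before
    # concatenating it with the unicode string. Here that step is explicit.
    return directory + u"/" + name_bytes.decode("ascii")


# Hypothetical name as it might arrive from the server (Latin-1 assumed).
remote_name = b"some_str\xe4nge_file"

try:
    join_like_python2(u"some_dir", remote_name)
except UnicodeDecodeError as exc:
    # The ASCII codec rejects the first byte outside the 7-bit range.
    error = exc
```

With a byte-string argument such as "some_dir", no decoding takes place and the join succeeds, which is why only unicode arguments trigger the problem.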

(original report by Henning Hraban Ramm - thanks!)

Status
RESOLVED FIXED
Submitter
schwa (unverified)
Assigned to
No-one
Submitted
13 years ago
Updated
13 years ago
Labels
bug library

schwa (unverified) 13 years ago

As pointed out in this mail, FTP has no concept of encodings. Because the encoding of the directory and file names on the remote side is unknown, there's no convenient solution.

At the moment, I think the most appropriate approach is to have any method that accepts remote paths fail as early as possible if it gets a unicode string that can't be converted to ASCII.

A solution might be something like:

def walk(self, root):
    # If `root` isn't ASCII, fail now instead of later.
    # Otherwise, continue with a byte string.
    root = str(root)
    ...
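
The same fail-early idea could be pulled into a helper so that every path-accepting method can share it. This is only a sketch under assumptions: `ensure_ascii_path` is a hypothetical name, not part of ftputil, and it uses an explicit `encode("ascii")` rather than `str()` so that the behavior doesn't depend on the Python version.

```python
def ensure_ascii_path(path):
    """Return `path` as a byte string, or raise if it isn't pure ASCII."""
    if isinstance(path, bytes):
        # Already a byte string; pass it through unchanged.
        return path
    # Raises UnicodeEncodeError for any non-ASCII character, so the
    # caller fails immediately instead of somewhere deep in the walk.
    return path.encode("ascii")
```

A method like walk would call this on its argument first. A unicode "some_dir" passes and continues as a byte string, while a unicode path containing "ä" is rejected before any FTP command is sent.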

schwa (unverified) 13 years ago

I put this off until ftputil 2.5.1. I'll have to go through all the methods and see how they're affected, and I don't want to delay the release of ftputil 2.5 final even more after the apparently long beta phase.
