Opened 4 years ago
Closed 4 years ago
#100 closed defect (fixed)
Wrong handling of non-ASCII characters in paths under Python 2
Reported by: | schwa | Owned by: | schwa |
---|---|---|---|
Priority: | major | Milestone: | 3.3 |
Component: | Library | Version: | 3.2 |
Keywords: | listdir, byte strings, unicode strings, UnicodeEncodeError, Python 2, Python 3 | Cc: |
Description
For ticket #96, I was looking into the exact behavior of ftputil when encoding and decoding file system paths for FTPHost.listdir
.
It turned out that under Python 2 ftputil doesn't work correctly with paths that contain non-ASCII characters. The intended behavior is:
Path argument is a byte string:
- Characters are sent as they are, even if they are non-ASCII characters (i. e. character codes greater than 127).
- Returned lines are byte strings (latin1-encoded, i. e. the data sent by the server) if the
listdir
argument was a byte string.
Path argument is a unicode string:
- Unicode strings are assumed to be decoded from latin1. For example, to get the contents of the directory
b'\xc3\xa4bc'
(corresponding to "äbc"), the unicode string must beb'\xc3\xa4bc'.decode("latin1")
.
- Returned lines are unicode strings (decoded assuming latin1 encoding).
This behavior is indeed seen when ftputil runs under Python 3. (By the way, the handling of unicode strings is strange, but it's used to be compatible with Python 3's ftplib.FTP
implementation. If ftputil.FTPHost
would behave differently, it couldn't work with non-ASCII paths that were created with ftplib.FTP
under Python 3.)
Under Python 2, pure ASCII strings work as above regardless of being byte strings or unicode strings. However, when the strings contain characters with codes greater than 127, FTPHost.listdir
raises a UnicodeEncodeError
, again both for byte strings and unicode strings.
Change History (4)
comment:1 Changed 4 years ago by
Status: | new → assigned |
---|
comment:2 Changed 4 years ago by
comment:3 Changed 4 years ago by
The problem with the unit test is that it uses a mock session, that is, it doesn't use ftplib.FTP
to access a real FTP server. Instead, the mock dir
method returns lines from a multi-line string. Here's the mock implementation of ftplib.FTP.dir
:
def dir(self, *args): """ Provide a callback function for processing each line of a directory listing. Return nothing. """ # The callback comes last in `ftplib.FTP.dir`. if isinstance(args[-1], collections.Callable): # Get `args[-1]` _before_ removing it in the line after. callback = args[-1] args = args[:-1] else: callback = None # Everything before the path argument are options. path = args[-1] if DEBUG: print("dir: {0}".format(path)) path = self._transform_path(path) if path not in self.dir_contents: raise ftplib.error_perm dir_lines = self.dir_contents[path].split("\n") for line in dir_lines: if callback is None: print(line) else: callback(line)
Only tests in test/test_real_ftp.py
actually talk to an FTP server. I'm going to add a test there.
comment:4 Changed 4 years ago by
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
The problem should be fixed now.
- Add a session factory adapter in [5a4abb73e8a1] and modify
FTPHost
to use the adapter in [350fad454a39].
- Allow
ftputil.test.mock_ftplib.MockSession.transfercmd
to take arest
argument because this argument is forwarded by the adapted session [086cea309114].
- Fix: Adapt the session, not the session factory [3bdb8233c7f2]. If the session factory is a function instead of a class, we can't inherit from it.
There's a unit test
test.test_host.TestAcceptEitherUnicodeOrBytes.test_listdir
:As far as I can tell this should catch the problematic behavior in the ticket description. I need to find out why the test doesn't uncover the problem.