Opened 3 years ago

Closed 3 years ago

#100 closed defect (fixed)

Wrong handling of non-ASCII characters in paths under Python 2

Reported by: schwa Owned by: schwa
Priority: major Milestone: 3.3
Component: Library Version: 3.2
Keywords: listdir, byte strings, unicode strings, UnicodeEncodeError, Python 2, Python 3 Cc:

Description

For ticket #96, I was looking into the exact behavior of ftputil when encoding and decoding file system paths for FTPHost.listdir.

It turned out that under Python 2 ftputil doesn't work correctly with paths that contain non-ASCII characters. The intended behavior is:

Path argument is a byte string:

  • Characters are sent as they are, even if they are non-ASCII characters (i. e. character codes greater than 127).
  • Returned lines are byte strings (latin1-encoded, i. e. the data sent by the server) if the listdir argument was a byte string.

Path argument is a unicode string:

  • Unicode strings are assumed to be decoded from latin1. For example, to get the contents of the directory b'\xc3\xa4bc' (corresponding to "äbc"), the unicode string must be b'\xc3\xa4bc'.decode("latin1").
  • Returned lines are unicode strings (decoded assuming latin1 encoding).

This behavior is indeed seen when ftputil runs under Python 3. (By the way, the handling of unicode strings is strange, but it's used to be compatible with Python 3's ftplib.FTP implementation. If ftputil.FTPHost would behave differently, it couldn't work with non-ASCII paths that were created with ftplib.FTP under Python 3.)

Under Python 2, pure ASCII strings work as above regardless of being byte strings or unicode strings. However, when the strings contain characters with codes greater than 127, FTPHost.listdir raises a UnicodeEncodeError, again both for byte strings and unicode strings.

Change History (4)

comment:1 Changed 3 years ago by schwa

  • Status changed from new to assigned

comment:2 Changed 3 years ago by schwa

There's a unit test test.test_host.TestAcceptEitherUnicodeOrBytes.test_listdir:

    def test_listdir(self):
        """Test whether `listdir` accepts either unicode or bytes."""
        host = self.host
        as_bytes = ftputil.tool.as_bytes
        host.chdir("/home/file_name_test")
        # Unicode
        items = host.listdir("ä")
        self.assertEqual(items, ["ö", "o"])
        #  Need explicit type check for Python 2
        for item in items:
            self.assertTrue(isinstance(item, ftputil.compat.unicode_type))
        # Bytes
        items = host.listdir(as_bytes("ä"))
        self.assertEqual(items, [as_bytes("ö"), as_bytes("o")])
        #  Need explicit type check for Python 2
        for item in items:
            self.assertTrue(isinstance(item, ftputil.compat.bytes_type))

As far as I can tell this should catch the problematic behavior in the ticket description. I need to find out why the test doesn't uncover the problem.

comment:3 Changed 3 years ago by schwa

The problem with the unit test is that it uses a mock session, that is, it doesn't use ftplib.FTP to access a real FTP server. Instead, the mock dir method returns lines from a multi-line string. Here's the mock implementation of ftplib.FTP.dir:

    def dir(self, *args):
        """
        Provide a callback function for processing each line of a
        directory listing. Return nothing.
        """
        # The callback comes last in `ftplib.FTP.dir`.
        if isinstance(args[-1], collections.Callable):
            # Get `args[-1]` _before_ removing it in the line after.
            callback = args[-1]
            args = args[:-1]
        else:
            callback = None
        # Everything before the path argument are options.
        path = args[-1]
        if DEBUG:
            print("dir: {0}".format(path))
        path = self._transform_path(path)
        if path not in self.dir_contents:
            raise ftplib.error_perm
        dir_lines = self.dir_contents[path].split("\n")
        for line in dir_lines:
            if callback is None:
                print(line)
            else:
                callback(line)

Only tests in test/test_real_ftp.py actually talk to an FTP server. I'm going to add a test there.

comment:4 Changed 3 years ago by schwa

  • Resolution set to fixed
  • Status changed from assigned to closed

The problem should be fixed now.

  • Allow ftputil.test.mock_ftplib.MockSession.transfercmd to take a rest argument because this argument is forwarded by the adapted session [086cea309114].
  • Fix: Adapt the session, not the session factory [3bdb8233c7f2]. If the session factory is a function instead of a class, we can't inherit from it.
Note: See TracTickets for help on using tickets.