Opened 9 years ago

Closed 8 years ago

#69 closed defect (fixed)

Error parsing directory holding items with names consisting of space characters only

Reported by: ftputiluser Owned by: schwa
Priority: major Milestone: 2.8
Component: Library Version: 2.7.1
Keywords: listdir, whitespace, parser Cc: e-wyze@…

Description (last modified by schwa)

In case if there is an entry with an empty name or a name consisting of spaces only in the certain folder, listdir() is unable to correctly parse the response from FTP server. This leads to ValueError exception.

Use case of the server response (after 23:35 there are 4 spaces):

drwx------    2 1021     601          4096 Nov 14 23:35     
d-w-------    5 1021     601          4096 Oct 11  2011 test

Any attempt to list these items with listdir() or to remove the holding folder with rmtree() raises the exception:

Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
  File "C:\Python27\lib\site-packages\ftputil\", line 759, in rmtree
    names = self.listdir(path)
  File "C:\Python27\lib\site-packages\ftputil\", line 863, in listdir
    return self._stat._listdir(path)
  File "C:\Python27\lib\site-packages\ftputil\", line 602, in _listdir
    return self.__call_with_parser_retry(self._real_listdir, path)
  File "C:\Python27\lib\site-packages\ftputil\", line 578, in __call_with_parser_retry
    result = method(*args, **kwargs)
  File "C:\Python27\lib\site-packages\ftputil\", line 460, in _real_listdir
    for stat_result in self._stat_results_from_dir(path):
  File "C:\Python27\lib\site-packages\ftputil\", line 436, in _stat_results_from_dir
  File "C:\Python27\lib\site-packages\ftputil\", line 297, in parse_line
    year_or_time, name = self._split_line(line)
ValueError: need more than 8 values to unpack


215 UNIX Type: L8

Looks strange but at my server I was able to create a new folder with an empty name using FTPHost.mkdir(" ").

Regards, Alexander

Change History (10)

comment:1 Changed 9 years ago by schwa

Keywords: listdir whitespace added
Status: newassigned

Hi Alexander, thanks for the report.

That's quite an unusual use case, I suppose, but I agree that ftputil ideally should be able to handle the situation. This reminds me of ticket #60, which was caused by leading whitespace in a file name. Please read the comments there.

The basic problem is that ftputil only gets the directory listing from the server; this listing is the same you get with ls in a command line FTP client. As far as I know there's no formal standard for this listing. It just happens to be the same for most FTP servers by convention.

An attempt to solve your problem might be to change ftputil's parsing routines to use regular expressions instead of just line.split. But if I, for example, assume exactly one space between a timestamp and a file name and a server puts two spaces instead of one between the timestamp and the name, every filename would seem to have a leading space.

Interestingly, at least some FTP servers also don't like whitespace in names. When I tried to create a directory consisting of only one space, the FTP server I use for testing changed the space to an underscore.

I closed ticket #60 as WONTFIX for the above reasons. I'll probably do the same with this one unless you have a good idea how leading or trailing whitespace in file names could be handled.

That said, if I close this one as WONTFIX, the problem with leading/trailing whitespace should be mentioned in the documentation and the way of reporting the problem should be improved (e. g. by raising a ParserError and hint at the whitespace problem in the exception message).

comment:2 in reply to:  1 Changed 9 years ago by ftputiluser

Thanks for the reply, Stefan.

At least I have one idea regarding this. Since it is almost impossible to make universal parsing of a directory listing (which is more than understandable), what if we add a possibility to skip an item with inappropriate name. I think it is better to list all directory items without a "bad" item than not to list the directory at all. I believe that could be done using exception handling for ValueError.

I am a Windows user and I utilize Total Commander and WinSCP for working with FTP. Both software list all items except the one with the "bad" name. However, they are unable to remove the directory holding that item, but that's not a big problem IMO.

To be on the safe side we can add a setting (globally or for listdir only) of skipping "bad" entries or raising a ParserError.

comment:3 Changed 9 years ago by schwa

Description: modified (diff)

Hey, that's an excellent idea. Thanks a lot! :-)

I don't know yet if I'll add the flag, but skipping problematic entries instead of aborting the whole listing operation definitely makes sense to me. I wonder why I didn't have this idea myself.

comment:4 Changed 9 years ago by schwa

Keywords: parser added
Summary: Error parsing directory holding items with empty namesError parsing directory holding items with names consisting of space characters only

comment:5 Changed 9 years ago by schwa

I've committed changeset [09f76157a11a] which makes ftputil raise a ParserError instead of a ValueError if the name consists of whitespace only.

comment:6 Changed 9 years ago by schwa

Regarding ignoring invalid lines:

The bad news is that I won't implement this for now. The documented behavior of parse_line is that it returns a ParseResult instance or raises a ParserError if it can't parse the line.

The good news is that there's a relatively simple workaround. You can use the documented parser API to implement a custom parser which ignores invalid lines (although this isn't the intended use for ignores_line):

# Untested code!

import ftputil
from ftputil import ftp_error
from ftputil import ftp_stat

class TolerantUnixParser(ftp_stat.UnixParser):
    This parser ignores invalid lines which would otherwise abort
    parsing of a directory listing with a `ParserError`.

    def ignores_line(self, line):
        if super(TolerantUnixParser, self).ignores_line(line):
            return True
            # Time shift shouldn't be of interest here.
        except ftp_error.ParserError:
            return True
            return False

ftp_host = ftputil.FTPHost(...)

A disadvantage of the approach is that each valid line is parsed twice. However, this probably won't matter much in comparison to network latency, unless you're on a network with gigabit-per-second bandwidth.

comment:7 in reply to:  6 Changed 9 years ago by schwa

Replying to schwa:

The bad news is that I won't implement this for now. The documented behavior of parse_line is that it returns a ParseResult instance or raises a ParserError if it can't parse the line.

Actually, I've implemented something in changeset [6f7b16432a38] *, but I'm not really happy with it for the reason in the commit message. Do you have other ideas?

I think it's a good idea to ignore parser errors only via an explicitly-set parser. If ignoring invalid lines would be the default, the automatic parser switching from Unix to MS format would no longer work. (ftputil tries the Unix parser first, and if this raises a ParserError, tries the parser for the Microsoft format. Setting a parser explicitly disables this automatic switching.)

* Please ignore the TolerantUnixParser in I only wrote this there to see how it would look like before I pasted it into the issue tracker in the previous comment. The class wasn't intended to go into the actual ftputil code.

comment:8 Changed 9 years ago by schwa

Milestone: 2.8

comment:9 Changed 8 years ago by schwa

I reverted the mentioned change [6f7b16432a38] because I think it's rather awkward (see the commit message). If a user really wants to ignore whitespace-only names, he/she can still use the custom parser workaround from comment 6.

comment:10 Changed 8 years ago by schwa

Resolution: fixed
Status: assignedclosed

I set this ticket to "fixed" because I changed the ValueError to the documented ParserError (see comment 5).

Note: See TracTickets for help on using tickets.