~sschwarzer/ftputil#69: 
Error parsing directory holding items with names consisting of space characters only

In case if there is an entry with an empty name or a name consisting of spaces only in the certain folder, listdir() is unable to correctly parse the response from FTP server. This leads to ValueError exception.

Use case of the server response (after 23:35 there are 4 spaces):

drwx------    2 1021     601          4096 Nov 14 23:35     
d-w-------    5 1021     601          4096 Oct 11  2011 test

Any attempt to list these items with listdir() or to remove the holding folder with rmtree() raises the exception:

Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
  File "C:\Python27\lib\site-packages\ftputil\__init__.py", line 759, in rmtree
    names = self.listdir(path)
  File "C:\Python27\lib\site-packages\ftputil\__init__.py", line 863, in listdir
    return self._stat._listdir(path)
  File "C:\Python27\lib\site-packages\ftputil\ftp_stat.py", line 602, in _listdir
    return self.__call_with_parser_retry(self._real_listdir, path)
  File "C:\Python27\lib\site-packages\ftputil\ftp_stat.py", line 578, in __call_with_parser_retry
    result = method(*args, **kwargs)
  File "C:\Python27\lib\site-packages\ftputil\ftp_stat.py", line 460, in _real_listdir
    for stat_result in self._stat_results_from_dir(path):
  File "C:\Python27\lib\site-packages\ftputil\ftp_stat.py", line 436, in _stat_results_from_dir
    self._host.time_shift())
  File "C:\Python27\lib\site-packages\ftputil\ftp_stat.py", line 297, in parse_line
    year_or_time, name = self._split_line(line)
ValueError: need more than 8 values to unpack

SYST:

215 UNIX Type: L8

Looks strange but at my server I was able to create a new folder with an empty name using FTPHost.mkdir(" ").

Regards, Alexander

Status
RESOLVED FIXED
Submitter
ftputiluser (unverified)
Assigned to
No-one
Submitted
11 years ago
Updated
11 years ago
Labels
bug library

schwa (unverified) 11 years ago · edit

Hi Alexander, thanks for the report.

That's quite an unusual use case, I suppose, but I agree that ftputil ideally should be able to handle the situation. This reminds me of ticket #60, which was caused by leading whitespace in a file name. Please read the comments there.

The basic problem is that ftputil only gets the directory listing from the server; this listing is the same you get with ls in a command line FTP client. As far as I know there's no formal standard for this listing. It just happens to be the same for most FTP servers by convention.

An attempt to solve your problem might be to change ftputil's parsing routines to use regular expressions instead of just line.split. But if I, for example, assume exactly one space between a timestamp and a file name and a server puts two spaces instead of one between the timestamp and the name, every filename would seem to have a leading space.

Interestingly, at least some FTP servers also don't like whitespace in names. When I tried to create a directory consisting of only one space, the FTP server I use for testing changed the space to an underscore.

I closed ticket #60 as WONTFIX for the above reasons. I'll probably do the same with this one unless you have a good idea how leading or trailing whitespace in file names could be handled.

That said, if I close this one as WONTFIX, the problem with leading/trailing whitespace should be mentioned in the documentation and the way of reporting the problem should be improved (e. g. by raising a ParserError and hint at the whitespace problem in the exception message).

ftputiluser (unverified) 11 years ago · edit

Thanks for the reply, Stefan.

At least I have one idea regarding this. Since it is almost impossible to make universal parsing of a directory listing (which is more than understandable), what if we add a possibility to skip an item with inappropriate name. I think it is better to list all directory items without a "bad" item than not to list the directory at all. I believe that could be done using exception handling for ValueError.

I am a Windows user and I utilize Total Commander and WinSCP for working with FTP. Both software list all items except the one with the "bad" name. However, they are unable to remove the directory holding that item, but that's not a big problem IMO.

To be on the safe side we can add a setting (globally or for listdir only) of skipping "bad" entries or raising a ParserError.

schwa (unverified) 11 years ago · edit

Hey, that's an excellent idea. Thanks a lot! :-)

I don't know yet if I'll add the flag, but skipping problematic entries instead of aborting the whole listing operation definitely makes sense to me. I wonder why I didn't have this idea myself.

schwa (unverified) 11 years ago · edit

I've committed changeset 11506029e0290b45ad2873d48720da16fcb5619c which makes ftputil raise a ParserError instead of a ValueError if the name consists of whitespace only.

schwa (unverified) 11 years ago · edit

Regarding ignoring invalid lines:

The bad news is that I won't implement this for now. The documented behavior of parse_line is that it returns a ParseResult instance or raises a ParserError if it can't parse the line.

The good news is that there's a relatively simple workaround. You can use the documented parser API to implement a custom parser which ignores invalid lines (although this isn't the intended use for ignores_line):

# Untested code!

import ftputil
from ftputil import ftp_error
from ftputil import ftp_stat


class TolerantUnixParser(ftp_stat.UnixParser):
    """
    This parser ignores invalid lines which would otherwise abort
    parsing of a directory listing with a `ParserError`.
    """

    def ignores_line(self, line):
        if super(TolerantUnixParser, self).ignores_line(line):
            return True
        try:
            # Time shift shouldn't be of interest here.
            self.parse_line(line)
        except ftp_error.ParserError:
            return True
        else:
            return False


ftp_host = ftputil.FTPHost(...)
ftp_host.set_parser(TolerantUnixParser())

A disadvantage of the approach is that each valid line is parsed twice. However, this probably won't matter much in comparison to network latency, unless you're on a network with gigabit-per-second bandwidth.

schwa (unverified) 11 years ago · edit

Replying to schwa:

The bad news is that I won't implement this for now. The documented behavior of parse_line is that it returns a ParseResult instance or raises a ParserError if it can't parse the line.

Actually, I've implemented something in changeset 935ddb1b0d14a9958606660fa0a2158210b1d287 *, but I'm not really happy with it for the reason in the commit message. Do you have other ideas?

I think it's a good idea to ignore parser errors only via an explicitly-set parser. If ignoring invalid lines would be the default, the automatic parser switching from Unix to MS format would no longer work. (ftputil tries the Unix parser first, and if this raises a ParserError, tries the parser for the Microsoft format. Setting a parser explicitly disables this automatic switching.)

* Please ignore the TolerantUnixParser in ftp_stat.py. I only wrote this there to see how it would look like before I pasted it into the issue tracker in the previous comment. The class wasn't intended to go into the actual ftputil code.

schwa (unverified) 11 years ago · edit

I reverted the mentioned change 935ddb1b0d14a9958606660fa0a2158210b1d287 because I think it's rather awkward (see the commit message). If a user really wants to ignore whitespace-only names, he/she can still use the custom parser workaround from comment 6.

schwa (unverified) 11 years ago · edit

I set this ticket to "fixed" because I changed the ValueError to the documented ParserError (see comment 5).

Register here or Log in to comment, or comment via email.